Categories
Uncategorized

Posting on AIDL

(This is adapted from my post on AIDL. I decided to turn it into a blog post so that I can easily refer to it.)

In this post, I want to address the issue of posting on AIDL, and more particularly why your posts are sometimes deleted or have their comments turned off, or why you sometimes see sad faces from me.

To preface this: we are now a fairly big group (6500+), and both Waikit Lau and I want to keep the group public.  We want you to have the freedom to post your thoughts first without going through us, the administrators.

Of course, “with great power comes great responsibility”. Such freedom of posting brings a lot of abuse. AIDL is constantly spammed, or at the very least over-posted. So our counter-measure is to check whether every post is 1) relevant, 2) non-commercial and 3) correct. All three criteria require some judgment calls, but this is how I interpret them:

  1. Relevancy: Your post has to be related to AI or DL. Since ML is a fundamental skill for DL and could be a part of AI, ML posts are welcome. This rules out many posts, even though they can be “interesting” by other standards. Note that we have always been a *niche group*. So even if your post has a very interesting theory about general relativity, we can’t quite keep it.
  2. Non-commercial: If you post anything related to $, we can only allow it on Saturday. Solicitations for surveys or conferences, and links to sites which require signups, all implicitly involve money. We have very strict rules on such commercial posts, so be advised that you should only post them on Saturday.
  3. Correctness: This criterion is meant to tackle fake news, which is rampant on Facebook. This is perhaps the part which frustrates most people, because if you are not careful, the web can easily trick you into believing falsehoods. Just consider the recent news that “AI is causing job losses”. Once you go into the sources, most of the reputable ones were actually arguing that “Automation/Computerisation is causing job losses”. “Automation” and “AI” are very different concepts, and I can’t quite see how a discussion of “automation” is relevant to us (back to Point 1).

Failing to meet these 3 criteria usually explains why some posts are gone. If the post has many likes/shares, I might consider just giving a “sad face” comment. Most of the time though, I would just delete it. Sorry if this appears rude, but it’s a necessity given our group’s volume.

So how do you avoid these issues? My suggestion is to check your post carefully. Is it the original source? If it is not, does the text distort the original? Does the post contain any click-bait? Remember that if you post something at AIDL, you will be asked to give us the source. From time to time, I will also fact-check sensational statements.

One last point: admittedly, such curatorship requires subjective judgment, and frankly I could be wrong. So if you feel strongly about your post, do PM me. I will make time to review your post together with you.

Thanks,
Arthur Chan


Thoughts from Your Humble Administrators – Jan 22 2017

This week at AIDL:

Must-Read (not really): Kristen Stewart’s involvement in AI (a style transfer paper). It makes people wonder whether she can get the lowest Erdos-Bacon number. 🙂

If you like this message, subscribe to the Grand Janitor Blog’s RSS feed. You can also find me (Arthur) at twitter, LinkedIn, Plus, Clarity.fm.  Together with Waikit Lau, I maintain the Deep Learning Facebook forum.  Also check out my awesome employer: Voci.


Thoughts from Your Humble Administrators – Jan 15

I (Arthur) have been traveling, but there are several newsworthy events:

  • How should we view Microsoft’s acquisition of Maluuba?  (First brought up by Zubair Ahmed.) My speculation is that MS is trying to tackle unresolved issues in both QA and reading comprehension.  Maluuba recently opened up their NewsQA dataset.  The goal-oriented dialogue dataset also sounds fairly interesting.
  • For the most part though, Maluuba, just like DeepMind and MetaMind, is a “research startup”.   They generate no bookings. Thus you may think big companies are trying to snatch 1) the research and 2) the researchers in these kinds of startups.   The rest is perhaps really hype……
  • There are two pieces of news on deep-learning-based poker bots, one from the University of Alberta (DeepStack) and one from CMU (CMU new, the Verge).  CMU’s Libratus is cleaning up against the pros with a crushing difference in earnings, collecting $81,716 to the humans’ $7,228.    Libratus’ details are out of reach, but DeepStack’s method is quite similar to AlphaGo’s: a DNN is built as the approximation function.
  • Several interesting resources this week:
  • Finally, perhaps the most interesting event within our group: Waikit Lau and I are going to hold an online session.  So far, AIDLers seem to be very interested.  (The response is overwhelming. 🙂 )  We will disclose more details in the next week or so.

Must read: Times interview with Andrew Ng



Thoughts from Your Humble Administrators – Jan 8, 2017

What we have been thinking last week:

I (Arthur) am traveling, thus the late issue.

  • Our group was once called simply “Deep Learning”.  Of course, all of you have read many articles of the “Deep Learning is/is not hype” type.  So you may wonder how to call B.S. on those articles.
  • There is a trick: does the author mention that deep learning’s viewpoint is to “automatically learn a representation”, instead of relying on human experts or “feature engineering”?   If an article fails to mention this point, then that article isn’t worth its salt.
  • “But hey Arthur, isn’t that just your definition of what deep learning is?”  I know you will say that. 🙂  Not really.  This point happens to be mentioned by Hinton in Lecture 2 of his Coursera class, and in Chapter 1 of “Deep Learning” written by Goodfellow.   Goodfellow went on to explain that one of the original meanings of “deep” is that there is a “deep hierarchical representation”.   That’s why DNNs are good and CNNs are good. RNNs? If you expand one, it is deep too.  So perhaps that’s why it is also part of “deep learning”.
  • My point is: it doesn’t really matter whether someone is for or against deep learning; what you want to look at is their arguments.  Do they know what they are talking about?  Hint: most of them don’t, even authors whom we allow to post in this forum.  But we only ask for relevancy, so if the OP has opinions, we let it go.  You are warned, though, about the validity of some of the blog posts in the forum.
  • CES 2017 is happening.  To AIDLers, perhaps the most relevant part is all the intelligent agents and virtual assistants.   At least to me, the one which has led the trend recently is Amazon Echo/Dot, which is enabled by Alexa.    Not only should the Amazon team’s ASR capability be on par with Apple, MS and Google; Alexa also leads in terms of long-distance speech recognition (based on beam-forming) as well as keyword wakeup (i.e. the user can trigger recognition by saying “Alexa” instead of pressing buttons as in old iOS).  Those are impressive features, and tough to make work well technically.
  • One thing to point out: while ASR is very impressive in many of these virtual assistants, the dialogue system is still lacking the “soul”.   This goes for all chatbots: working in a specific domain is okay, but you will quickly notice that it is not real. Jose Diaz asked us if a universal, real virtual assistant will one day be possible. I can only say it is in a universe far, far away.

Must read: all posts about Mario.  Why? We love him since we were young! 🙂

Arthur


Categories
deep learning deep neural network DNN

Reading Michael Nielsen’s “Neural Networks and Deep Learning”

Introduction

Let me preface this article: after I wrote my top five list on deep learning resources, one oft-asked question is “What are the Math prerequisites to learn deep learning?”   My first answer is Calculus and Linear Algebra, but then I will qualify that certain techniques of Calculus and Linear Algebra are more useful than others,  e.g. you should already know gradients, differentiation, partial differentiation and Lagrange multipliers, and you should know matrix differentiation and preferably the trace trick, eigen-decomposition and such.    If your goal is to understand machine learning in general, then having good skills in integration and some knowledge of analysis helps, e.g. the 1-2 star problems of Chapter 2 of PRML [1] require some knowledge of advanced functions such as the gamma and beta functions.   Having some Math would help you go through these questions more easily.

Nevertheless, I find that people who want to learn Math first before approaching deep learning miss the point.  Many engineering topics were not motivated by pure mathematical pursuit.  More often than not, an engineering field is motivated by a physical observation. Mathematics is more like an aid to imagine and create a new solution.  Take the case of deep learning: if you listen to Hinton, he would often say he tries to first come up with an idea and makes it work mathematically later.    His insistence on working on neural networks in the era of kernel methods stems more from his observation of the brain.   “If the brain can do it, how come we can’t?” should be a question you ask every day when you run a deep learning algorithm.   I think these observations are fundamental to deep learning, and you should go through the arguments for why people think neural networks are worthwhile in the first place.   Reading classic papers from Wiesel and Hubel helps. Understanding the history of neural networks helps.  Once you read these materials, you will quickly grasp the big picture of much of the development of deep learning.

Having said that, I think there are certain topics which are fundamental in deep learning.   They are not necessarily very mathematical.  For example, I will name back propagation [2] as a very fundamental concept which you want to get good at.   Now, you may think that’s silly.    “I know backprop already!”  Yes, backprop is probably in every single machine learning class.  It will easily give you the illusion that you have mastered the material.    But you can always learn more about a fundamental concept, and back propagation is important both theoretically and practically.  You will encounter back propagation whether you are a user of deep learning tools, a writer of a deep learning framework or an innovator of new algorithms.  So a thorough understanding of backprop is very important, and one course is not enough.

This very long digression finally brings me to the great introductory book, Michael Nielsen’s Neural Networks and Deep Learning (NNDL).    The reason why I think Nielsen’s book is important is that it offers an alternative discussion of back propagation as an algorithm.   So I will use the rest of the article to explain why I appreciate the book so much and recommend that nearly all beginning or intermediate learners of deep learning read it.

First Impression

I first learned about “Neural Networks and Deep Learning” (NNDL) while going through Tensorflow’s tutorial.   My first thought was “ah, another blogger trying to cover neural networks”, i.e. I didn’t think it was promising.   At that time, there were already plenty of articles about deep learning.  Unfortunately, they often repeated the same topics without bringing anything new.

Synopsis

Don’t make my mistake!  NNDL is a great introductory book which balances the theory and practice of deep neural networks.    The book has 6 chapters:

  1. Using neural nets to recognize handwritten digits – the basics of neural networks, a basic implementation in Python (network.py)
  2. How the backpropagation algorithm works –  various explanations of back propagation
  3. Improving the way neural networks learn – standard improvements to simple back propagation, another implementation in Python (network2.py)
  4. A visual proof that neural nets can compute any function – the universal approximation theorem without the Math, plus fun games in which you can approximate functions yourself
  5. Why are deep neural networks hard to train?  – practical difficulties of using back propagation, vanishing gradients
  6. Deep Learning  – convolutional neural networks (CNN), the final implementation based on Theano (network3.py), recent advances in deep learning (circa 2015).

The accompanying Python scripts are the gems of the book. network.py and network2.py can run in plain old Python.   You need Theano for network3.py, but I think the strength of the book really lies in network.py and network2.py (Chapters 1 to 3), because if you want to learn CNNs, Karpathy’s lectures probably give you more bang for your buck.

Why I Like Nielsen’s Treatment of Back Propagation

Reading Nielsen’s exposition of neural networks was the sixth time I learned the basic formulation of back propagation [see footnote 3].  So what’s the difference between his treatment and my other reads then?

Forget about my first two reads, because I didn’t care enough about neural networks to know why back propagation is so named.   But my later reads pretty much gave me the same impression of neural networks: “a neural network is merely a stack of logistic functions.    So how do you train the system?  Oh, just differentiate the loss function; the rest is technicalities.”   Usually the books will guide you to verify certain formulae in the text.   Of course, you will be guided to deduce that the “error” is actually “propagating backward” through the network.   Let us call this the network-level view.   In a network-level view, you really don’t care about how individual neurons operate.   All you care about is seeing the neural network as yet another machine learning algorithm.

The problem with the network-level view is that it doesn’t quite explain a lot of phenomena about back propagation.  Why is it so slow sometimes?  Why do certain initialization schemes matter?  Nielsen does an incredibly good job of breaking down the standard equations into 4 fundamental equations (BP1 to BP4 in Chapter 2).  Once you interpret them, you will realize “Oh, saturation is really a big problem in back propagation” and “Oh, of course you have to initialize the weights of a neural network with non-zero values.  Or else nothing propagates/back-propagates!”    These insights, while not mathematical in nature and understandable with college calculus, give a deeper understanding of back propagation.
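For reference, here is a sketch of the four equations, written in what I understand to be the notation of Chapter 2 of NNDL (do check it against the book). It assumes a cost C, an activation function σ, a weighted input z^l and activation a^l = σ(z^l) at layer l, and the error δ^l_j defined as ∂C/∂z^l_j, with L the output layer:

```latex
\begin{align}
\delta^L &= \nabla_a C \odot \sigma'(z^L) \tag{BP1} \\
\delta^l &= \left( (w^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l) \tag{BP2} \\
\frac{\partial C}{\partial b^l_j} &= \delta^l_j \tag{BP3} \\
\frac{\partial C}{\partial w^l_{jk}} &= a^{l-1}_k \, \delta^l_j \tag{BP4}
\end{align}
```

Both observations can be read off directly. The factor σ'(z) appears in BP1 and BP2, so when a sigmoid neuron saturates, σ'(z) is close to 0, the error δ shrinks and learning slows down. And if every weight is zero, the term (w^{l+1})^T δ^{l+1} in BP2 is zero, so no error reaches the hidden layers at all.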

Another valuable part of Nielsen’s explanation is that it comes with an accessible implementation.  His first implementation (network.py) is 74 lines of idiomatic Python.   By adding print statements to his code, you will quickly grasp how a lot of these daunting equations are implemented in practice.  For example, as an exercise, you can try to identify how he implements BP1 to BP4 in network.py.    It’s true that there are other books and implementations of neural networks, but the description and implementation don’t always come together.  Nielsen’s presentation is a rare exception.
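To give a flavour of that exercise, below is a minimal NumPy sketch of back propagation with the four equations marked in comments. It is my own illustration and not Nielsen’s network.py: the function names, the quadratic cost and the list-of-matrices layout are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def backprop(weights, biases, x, y):
    """Gradients of the quadratic cost C = 0.5 * ||a_L - y||^2 for one example.

    weights[i] has shape (n_out, n_in); biases[i], x and y are column vectors.
    This layout is an assumption of this sketch.
    """
    # Forward pass: keep every weighted input z and every activation a.
    activation = x
    activations = [x]
    zs = []
    for w, b in zip(weights, biases):
        z = w @ activation + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)

    grad_b = [np.zeros_like(b) for b in biases]
    grad_w = [np.zeros_like(w) for w in weights]

    # BP1: error at the output layer.
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grad_b[-1] = delta                        # BP3
    grad_w[-1] = delta @ activations[-2].T    # BP4

    # BP2: push the error backward, one layer at a time.
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        grad_b[-l] = delta                         # BP3
        grad_w[-l] = delta @ activations[-l - 1].T  # BP4
    return grad_b, grad_w
```

If you call it with all-zero weights, the gradient for the first layer’s weights comes out as exactly zero, which is the “nothing back-propagates” observation in code form.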

Other Small Things I Like

  • Nielsen correctly points out that the Del symbol in machine learning is more of a convenient device, rather than carrying its more usual meaning as the Del operator in Math.
  • In Chapter 4, Nielsen discusses the universal approximation property of neural networks.  Unlike standard textbooks, which point you to a bunch of papers with daunting math, Nielsen created a JavaScript demo which allows you to approximate functions yourself (!), which I think is a great way to learn the intuition behind the theorem.
  • He points out that it’s important to differentiate the activation from the weighted input.  In fact, this point is one thing which can confuse you when reading a derivation of back propagation, because textbooks usually use different symbols for activation and weighted input.
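To make the last point concrete, here is a tiny sketch of a single sigmoid layer that keeps the weighted input z and the activation a as separate variables. The numbers are made up for illustration, and the names z, a and a_prev are assumptions that loosely follow NNDL’s symbols.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical weights, biases and incoming activations for one layer.
w = np.array([[0.5, -1.0],
              [2.0, 0.25]])
b = np.array([0.1, -0.2])
a_prev = np.array([1.0, 3.0])

z = w @ a_prev + b   # weighted input: what goes INTO the nonlinearity
a = sigmoid(z)       # activation: what comes OUT of the nonlinearity

# The derivative is a function of z, but for the sigmoid it can also be
# written in terms of a. Derivations switch between the two forms freely.
sigma_prime_from_z = sigmoid(z) * (1.0 - sigmoid(z))
sigma_prime_from_a = a * (1.0 - a)
```

Both expressions give the same numbers. This is why one derivation may write σ'(z) while another writes a(1 - a) for the same quantity, and why it pays to check which of the two a given symbol stands for.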

There are many such insightful comments in the book. I encourage you to read it and discover them.

Things I don’t like

  • There are many exercises in the book.  Unfortunately, there are no answer keys.  In a way, this makes Nielsen more of an old-style author who encourages readers to think.   I guess this is something I don’t always like, because spending time thinking about one single problem forever doesn’t always give you better understanding.
  • Chapter 6 gives the final implementation in Theano.  Unfortunately, there is not much introductory material on Theano within the book.    I think this is annoying but forgivable: as Nielsen pointed out, it’s hard to introduce Theano in an introductory book.  I would think anyone interested in Theano should probably go through the standard Theano tutorials here and here.

Conclusion

All in all, I highly recommend Neural Networks and Deep Learning to any beginning or intermediate learner of deep learning.  If this is the first time you are learning back propagation, NNDL is a great general introductory book.   If you are like me and already know a thing or two about neural networks, NNDL still has a lot to offer.

Arthur

[1] In my view, PRML’s problem sets have 3 ratings: 1-star, 2-star and 3-star.  1-star problems usually require college-level Calculus and patient manipulation; 2-star problems require some creative thought in problem solving or knowledge beyond basic Calculus.  3-star problems are more long-form questions and can contain multiple 2-star questions in one.   For your reference, I solved around 100 out of the 412 questions.  Most of them are 1-star questions.

[2] The other important concept in my mind is gradient descent, and it is still an active research topic.

[3] The 5 reads before: I “learnt” it once back at HKUST, read it in Mitchell’s book, read it in Duda and Hart, learnt it again from Ng’s lectures, and read it again in PRML.  My 7th was to learn it from Karpathy’s lectures; he presents the material in yet another way.  So it’s worth your time to look at them.


Categories
deep learning Machine Learning

Facebook Artificial Intelligence/Deep Learning Group @ 1000 Members

I (Arthur) always remember comp.speech and comp.speech.research, where I was able to cross paths with many great developers/researchers.   Another fond memory of mine related to discussion forums was with CMU Sphinx, a large vocabulary speech recognizer: many of its users later became very advanced and spawned numerous projects.   You always learn something new from people around the world.  That is the reason why the Internet is really, really great.

Fast forward to now: wow, searching for a solid discussion forum for deep learning is hard.   Many of them, on Facebook or LinkedIn, are really spammy.  I tried Plus for a while, but for the most part no one dug my messages. (My writing style? 🙂 )  So when Waikit Lau, an old friend and veteran startup investor/mentor/helper, asked me to help admin the group, I was more than happy to oblige.

Yes, you heard it right: the Artificial Intelligence & Deep Learning Group is a curated discussion forum. We reject spammers and ads, and only blog posts which are relevant to us are allowed.

Alright everyone does it, I might as well:
WE ARE 1000 MEMBERS STRONG!
WE ARE 1000 MEMBERS STRONG!
WE ARE 1000 MEMBERS STRONG!

(Just kidding, we are not really chasing a bigger group, but more quality discussion.)

So come join us.  We are very happy to chat with you on deep learning.

Arthur and Waikit

You might also like Learning Machine Learning,  Some Personal Experience and Learning Deep Learning, My Top-5 List.


Categories
deep learning Machine Learning reinforcement learning

Learning Deep Learning – My Top-Five List

Many people have been nagging me to write a beginner guide on deep learning.    Geez, that’s a difficult task – there are so many tutorials, books, lectures to start with, and the best way to start highly depends on your background, knowledge and skill sets.  So it’s very hard to give a simple guideline.

In this post, I will do something less ambitious: I gather what I think are the top-5 most important resources to let you start learning deep learning.   Check out the “Philosophy” section on why this list is different from other lists you have seen.

Philosophy

There are many lists of resources for deep learning.  To name a few: the “Awesome” list and the Reddit machine learning FAQ. I think they are quality resources, and it’s fair to ask why I started “Top-Five” a year ago.

Unlike all the deep learning resource lists you have seen, “Top-Five” is not meant to be an exhaustive list.  Rather, it assumes you have only a limited amount of time to study and gather resources while learning deep learning.    For example, suppose you like to learn through on-line classes.  Each machine/deep learning class would likely take you 3 months to finish. It will take you a year to finish all the classes.   As a result, having priorities is good.  For instance, without any guidance, reading Goodfellow’s Deep Learning would confuse you.   A book such as Bishop’s Pattern Recognition and Machine Learning (PRML) would likely be a better “introductory book”.

Another difference between the Top-Five list and other resource lists is that the resources are curated. Unless specified, I have either finished the material myself or at least gone through it once.  So for classes, I have at least audited the whole lecture series once.  For books, I have probably browsed them once. In a way, this is more of an “Arthur’s list” than a set of disorganized links.  You will also see a short commentary on why (IMO) they are useful.

Which Top-Five?

As the number of sections in my list grows, it’s fair to ask which resources you should spend time on first.   That’s a tough question because people differ in their learning preferences.  My suggestion is to start with the following:

  1. Taking classes – by far I think this is the most effective way to learn.  Listening plus doing homework usually teaches you a lot.
  2. Book Reading – this is important because lectures usually only summarize a subject.   Only when you read through a certain subject do you start to get a deeper understanding.
  3. Playing with Frameworks – this allows you to actually create some deep learning applications, and turn some of your knowledge into real-life practice.
  4. Blog Reading – this is useful, but you had better know which blogs to read (look at the section “Blogs You Want To Read”).  In general, there are just too many blog writers these days, and they might only have a murky understanding of the topic.   Reading those would only make you feel more confused.
  5. Joining Forums and asking questions – this is where you can dish out some of your ideas and ask for comments.  Once again, the quality of the forum matters.   So take a look at the section “Facebook Forums”.

Lectures/Courses

Basic Deep Learning (Also check out “The Basic-Five”)

These are the must-take courses if you want to learn the basic jargon of deep learning.   Ng’s, Karpathy’s and Socher’s classes teach you basic concepts, but they have a theme of building applications.   Silver’s class links deep learning concepts with reinforcement learning. So after these 4 classes, you should be able to talk about deep learning well and work with some basic applications.

  1. Andrew Ng’s Coursera Machine Learning class
    • You need to walk before you run.   Ng’s class is the best beginner class on machine learning in my opinion.  Check out this page for my review.
  2. Andrew Ng’s deeplearning.ai Specialization
  3. Fei-Fei Li and Andrej Karpathy’s Computer Vision class (Stanford cs231n 2015/2016)
    • I listened through the lectures once.  Many people just call this Karpathy’s class, but it is also co-taught by another experienced graduate student, Justin Johnson.  For the most part this is the class for learning CNNs;  it also brings you to the latest technology in more difficult topics such as image localization, detection and segmentation.
  4. Richard Socher’s Deep Learning and Natural Language Processing (Stanford cs224d)
    • I listened to the whole lecture series once. The first few lectures were very useful for me when I tried to understand RNN and LSTM.   This might also be the best set of lectures for learning Socher’s recursive neural network. Compared to Karpathy’s class, Socher’s places more emphasis on mathematical derivation.  So if you are not familiar with matrix differentiation, this would be a good class to start with and get your feet wet.
  5. David Silver’s Reinforcement Learning
    • This is a great class taught by the main programmer of AlphaGo.  It starts from the basics of reinforcement learning such as DP-based methods, then proceeds to more difficult topics such as Monte-Carlo and TD methods, as well as function approximation and policy gradient.   It takes quite a bit of understanding even if you already have a background in supervised learning.   As RL is being used in more and more applications, this class should be a must-take for all of you.

You should also consider:

  • Fast.ai’s Deep Learning for Coders
    • a class which has generally good reviews.  I would suggest you read Arvind Nagaraj’s post, which compares deeplearning.ai and fast.ai.
  • Theories of Deep Learning
    • Or Stanford Stat 385, which is one of the theory classes on deep learning.
  • Hugo Larochelle’s Neural Network class
    • by another star-level innovator of the field.  I have only heard Larochelle lecture in a deep learning class, but he is more succinct and to the point than many.
  • MIT Self Driving 6.S094
    • See the description in the section on Reinforcement Learning.
  • Nando de Freitas’s class on Machine/Deep Learning
    • I haven’t had a chance to go through this one, but it is for both beginners and more advanced learners.  It covers topics such as reinforcement learning and siamese networks.    I also think this is the class to take if you want to use Torch as your deep learning language.
Intermediate Deep/Machine Learning

The intermediate courses are meant to be the more difficult set of classes.  They are much more difficult to finish – Math is necessary. There are also many confusing concepts, even if you already have a Master’s degree.

  1. Hinton’s Neural Network Machine Learning
    •  While the topics are advanced, Prof. Hinton’s class is probably the one which can teach you the most about the philosophical difference between deep learning and general machine learning.  The first time I audited the class, in October 2016, his explanation of models based on statistical mechanics blew my mind.   I finished the course around April 2017, which resulted in a popular review post. Unfortunately, due to the difficulty of the class, it is ranked lower in this list. (It was ranked 2nd, then 4th on the Basic Five, but I found that it requires deeper understanding than Karpathy’s, Socher’s and Silver’s.  Later on, when deeplearning.ai came out, I shifted Prof. Hinton’s course to the Intermediate classes.)
  2. Daphne Koller’s Probabilistic Graphical Model
    • If you want to understand tougher concepts in models such as DBN, you want to have some background in Bayesian networks as well.  If that’s the route you like, Koller’s class is for you.  But this class, just like Hinton’s NNML, is notoriously difficult and not for the faint of heart – you will be challenged on probability concepts (Course 1), graph theory and algorithms (Course ) and parameter estimation (Course 3).
Reinforcement Learning

Reinforcement learning has a deep history of its own, and you can think of it as having a heritage from both computer science and electrical engineering.

My understanding of RL is fairly shallow, so I can only tell you which are the easier classes to take, but all of these classes are more advanced. Georgia Tech CS8803 should probably be your first. Silver’s is fun, and it’s based on Sutton’s book, but be ready to read the book in order to finish some of the exercises.

  1. Udacity’s Reinforcement Learning 
    • This is a class which is jointly published with Georgia Tech, and you can take it as the advanced course CS8803.  I took Silver’s class first, but I found that the material in this class provides a non-deep-learning take which is quite refreshing if you are starting out in reinforcement learning.
  2. David Silver’s Reinforcement Learning
    • See description in the “Basic Deep Learning” section.
  3. MIT Self Driving 6.S094
    • A specialized class on self-driving.  The course is mostly computer vision, but there is one super-entertaining exercise on self-driving, for which you most likely want to use RL to solve the problem. (Here is some quick impression about the class.)

You should also consider:

I heard good things about them……
  • Oxford Deep NLP 2017 
    • This is perhaps the second class on deep learning for NLP. I found the material interesting because it covers material which wasn’t covered by Socher’s class.  I haven’t taken it yet, so I will comment later.
  • Statistical Computing by Nicholas Zabaras
    • looks super interesting, and most of the material is actually ML-based.
  • CMU CS11-747 Neural Networks and NLP
    •  A great set of lectures by Graham Neubig. Neubig has written a few useful tutorials on DL in NLP, so I add his class as a more promising candidate here as well.
  • NYU Deep Learning class at 2014
    • Prof. Yann LeCun.  To me this is an important class, with similar importance to Prof. Hinton’s class, mostly because Prof. LeCun is one of the earliest experimenters with BackProp and SGD.  Unfortunately, these NYU lectures were removed.   But do check out the slides.
  • Also from Prof. Yann LeCun, Deep Learning inaugural lectures.
  • Berkeley’s Seminar on Deep Learning: by Prof.  Ruslan Salakhutdinov, an early researcher on unsupervised learning.
  • University of Amsterdam Deep Learning
    • If you have already audited cs231n and cs224d, perhaps the material here is not too new, but I found it useful to have a second source when I looked at some of the material.   I also like the presentation of back-propagation, which is more mathematical than in most beginner classes.
  • Special Topics in Deep Learning
    • I found it a great resource if you want to drill into more esoteric topics in deep learning.
  • Deep Learning for Speech and Language
    • Out of my own curiosity about speech recognition: this course is perhaps the only one I can find on DL for ASR.   If you happen to stumble on this paragraph, I’d say most software you find on-line is not really applicable in real life.  The only exceptions are discussed in this very old article of mine.
For reference
Great Preliminaries
More on Basic Machine Learning (Unsorted)
More AI than Machine Learning (Unsorted)
More about the Brain:

I don’t have much, but you can take a look at my other list on Neuroscience MOOCs.

Books

I wrote quite a bit on the Recommended Books Page.   In a nutshell,  I found that classics such as PRML and Duda and Hart are still must-reads in the world of deep learning.   But if you still want a list, alright then……

  1. Michael Nielsen’s Deep Learning Book: or NNDL, highly recommended by many.  This book is very suitable for beginners who want to understand the basic insights of simple feed-forward networks and their setups.    Unlike most textbooks, it doesn’t quite go through the Math until it has given you some intuition.   While I only went through it recently, I highly recommend that all of you read it.  Also see my read on the book.
  2. PRML : I love PRML!  Do go to read my Recommended Books Page to find out why.
  3. Duda and Hart:  I don’t like it as much as PRML, but it’s my first machine learning Bible.  Again, go to my Recommended Books Page to find out why.
  4. The Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville: This is the book for deep learning, but it’s hardly for beginners. I recently browsed through the book. Here are some quick impressions.
  5. Natural Language Understanding with Distributed Representation by Kyunghyun Cho. This is mainly for NLP people, but it’s important to note how differently NLP is seen from a deep learning point of view.

Others: Check out my Recommended Books Page. For beginners, I found Mitchell’s and Domingos’s books quite interesting.

Frameworks

  1. TensorFlow: the most popular, though it can be daunting to install. Also check out TFLearn. Keras has lately become the de facto high-level layer.
  2. Torch :  very easy to use even if you don’t know Lua.   It also leads you to great tutorials.  Also check out PyTorch.
  3. Theano : grandfather of deep learning frameworks, also check out Lasagne.
  4. Caffe : probably the fastest among the generic frameworks.  It takes you a while to understand the setup/syntax.
  5. Neon: the very speedy neon is optimized for modern cards. I don’t have a benchmark between Caffe and neon yet, but its MNIST training feels very fast.

Others:

  • deeplearning4j: obviously in Java, but I heard there is great support for enterprise machine learning.

Tutorials

  1. Theano Tutorial: a great set of tutorials, and you can run them on a CPU.
  2. Tensorflow Tutorial: a very comprehensive set of tutorials. I don’t like it as much as Theano’s because some tasks require compilation, which could be fairly painful.
  3. char-rnn: not exactly a tutorial, but if you want to have fun with deep learning, you should train at least one char-rnn. Note that a word-based version is available. The package is also optimized now as torch-rnn. I think char-rnn is also great starting code for intermediate learners to learn Torch.
  4. Misc: generally running the examples of a package can teach you a lot.  Let’s say this is one item.

Others: I also found the Learning Guide from YerevaNN’s lab to be fairly impressive. There is a ranked resource list on several different topics, which is similar in spirit to my list.

Mailing Lists

  1. (Shameless Plug) AIDL Weekly  Curated by me and Waikit Lau, AIDL weekly is a tied-in newsletter of the AIDL Facebook group. We provide in-depth analysis of weekly events of AI and deep learning.
  2. Mapping Babel Curated by Jack Clark.  I found it entertaining and well-curated.  Clark is more in the journalism space and I found his commentary thoughtful.
  3. Data Machina This is a link-only newsletter. The links are of good quality.

Of course, there are more newsletters than these three, but I don’t normally recommend them. One reason is that many “curators” don’t always read the original sources before they share the links, which sometimes inadvertently spreads fake news to the public. In Issue #4 of AIDL Weekly, I described one such incident. So you are warned!

Facebook Forums

That’s another category I am going to plug shamelessly. It has to do with the fact that most Facebook forums have too much noise and their administrators pay too little attention to the group.

  1. (Shameless Plug) AIDL This is a forum curated by me and Waikit. We like our forum because we actively curate it, delete spam and facilitate discussion within the group. As a result it has become one of the most active groups. It has 10k+ members. As of this writing, we have a tied-in mailing list as well as a weekly show.
  2. Deep Learning Deep Learning is comparable in size to AIDL, but less active, perhaps because the administrators use Korean. I still find some of the links interesting and used the group a lot before administering AIDL.
  3. Deep Learning/AI Curated by Sid Dharth and Ish Girwan. DLAI follows a very similar philosophy and Sid controls posting tightly. I think his group will be one of the up-and-coming groups next year.
  4. Strong Artificial Intelligence This is less about deep learning and more about AI. It is perhaps the biggest FB group on AI. Its membership has stabilized, but posting is solid and there is still some life in the discussion. I like the more philosophical end of the posts, which AIDL usually refrains from.

Non-trivial Mathematics You should Know

Due to popular demand, in this section I will say a bit about the most relevant math which you need to know. Everyone knows that math is useful, and yes, subjects like Calculus, Linear Algebra, Probability and Statistics are super useful too. But I think they are too general, so I will name several specific topics which turn out to be very useful, but are not very well taught in school.

  1. Bayes’ Theorem: Bayes’ theorem is important, and not only as a simple rule which you will use all the time. The high school version usually just asks you to reverse the conditional probabilities. But once it is applied in reasoning, you will need to be very clear on how to interpret terms such as likelihood and prior. It’s also very important to know what the term Bayesian really means, and why some people see it as better than the frequentist view. If you don’t know Bayes’ rule, all this thinking is going to get very confusing.
  2. Properties of Multi-variate Gaussian Distribution: The one-dimensional Gaussian distribution is an interesting mathematical object. If you try to integrate it, you will quickly find that it is one of those integrals you can’t evaluate in a trivial way. That’s the point where you want to learn the probability integral and how it was integrated. Of course, once you need to work on multi-variate Gaussians, you will need to learn further properties such as diagonalizing the covariance matrix and all that jazz. Those are non-trivial math. But if you master them, it will help you work through the more difficult problems in PRML.
  3. Matrix differentiation: You can differentiate all right, but once it comes to vectors/matrices, even the notation seems to be different from your college Calculus. No doubt, matrix differentiation is seldom taught in school. So always refer to a useful guide such as The Matrix Cookbook, and you will be less confused. (The Matrix Reference Manual is also good.)
  4. Calculus of Variations: If you want to find the best value which optimizes a function, you use Calculus; if you want to find the best function/path which optimizes a functional, you use the Calculus of Variations. For the most part, the Euler-Lagrange equation is what you need.
  5. Information theory: information theory is widely used in machine learning. More importantly, its reasoning and thinking can be found everywhere. e.g. Why do you want to optimize cross-entropy instead of squared error? Not only do the two losses penalize incorrect outputs differently; you can also think of cross-entropy as learning from the surprise of a mistake.
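To put some numbers on the “surprise” reading of cross-entropy, here is a minimal sketch in plain Python. The function names and the probabilities are my own choices for illustration; the only assumption is a classifier that assigns probability p to the correct class.

```python
import math

def cross_entropy(p_correct):
    """Cross-entropy loss for one example: the surprisal -log(p) of the correct class."""
    return -math.log(p_correct)

def squared_error(p_correct):
    """Squared error against a target of 1 for the correct class."""
    return (1.0 - p_correct) ** 2

# Compare the two losses as the model gets more and more wrong.
for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p(correct)={p:<5} cross-entropy={cross_entropy(p):.3f} "
          f"squared error={squared_error(p):.3f}")
```

As p goes to zero the squared error can never exceed 1, while the cross-entropy keeps growing: the more surprised the model is by the right answer, the larger the loss.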

Blogs You Should Read

  1. Chris Olah’s Blog Olah has a great ability to explain very difficult mathematical concepts to a lay audience. I benefited greatly from his articles on LSTM and computational graphs. He also made me understand that learning topology is fun and profitable.
  2. Andrej Karpathy’s Blog If you haven’t read “The Unreasonable Effectiveness of Recurrent Neural Networks“, you should. Karpathy’s articles show both great enthusiasm for the topic and a very good grasp of the principles. I also like his article on reinforcement learning.
  3. WildML Written by Denny Britz. He is perhaps less well-known than either Olah or Karpathy, but he explains many topics well. For example, I enjoy his explanation of GRU/LSTM a lot.
  4. Tombone’s Computer Vision Blog Written by Tomasz Malisiewicz. This is one of the first few blogs I read about computer vision. Malisiewicz has great insight into machine learning algorithms and computer vision. Many of his articles give insightful comments on the relationships between ML techniques.
  5. The Spectator Written by Shakir Mohamed. This is my go-to page on mathematical statistics as well as the theoretical basis of deep learning techniques. Check out his thoughts on what makes an ML technique deep, as well as his tricks in machine learning.

That’s it for now. Check back on this page, as I might update it with more content. Arthur

This post was first published at http://thegrandjanitor.com/2016/08/15/learning-deep-learning-my-top-five-resource/.

You might also like Learning Machine Learning,  Some Personal Experience.

If you like this message, subscribe to the Grand Janitor Blog’s RSS feed. You can also find me at Twitter, LinkedIn, Plus, Clarity.fm. Together with Waikit Lau, I maintain the Deep Learning Facebook forum. Also check out my awesome employer: Voci.

 

(20160817): I changed the title a couple of times, because this is more like a top-5 list of lists. So I retitled the post as “top-five resource”, then “top-five”, and now I have settled on “top-five list”, which is a misnomer but close enough.

(20160817): Fixed a couple of typos/wording issues.

(20160824): Add a section on important Math to learn.

(20160826): Fixed Typos, etc.

(20160904): Fixed Typos

(20161002): Changed the section on books to link to my article on NNDL.   Added a section on must-follow blogs.

(20170128): As I went deep into Socher’s lectures, I boosted his class ranking to number 3. I also moved Karpathy’s lecture up to rank number 2. I think Silver’s class is important but the material is too advanced, and perhaps of less importance for deep learning learners. (It is more about reinforcement learning when you look at it closely.) Hinton’s class is absolutely crucial but it requires more mathematical understanding than Karpathy’s class. Thus the ranking.

I also added 2 more classes (NYU, MIT) to check out and 2 more as references (VTech and UA).

(20161207): Added descriptions of Li, Karpathy and Johnson’s class. Added description of Silver’s class.

(20170310): Added “Philosophy”, “Top-Five of Top-Five”, “Top-Five Mailing List”, “Top-Five Forums”. Adjusted description of Socher’s class, linked a quick impression of Goodfellow’s “Deep Learning”.

(20170312): Add Oxford NLP class, Berkeley’s Deep RL into the mix.

(20170319): Add the Udacity’s course into the mix.  I think next version I might have a separate section on reinforcement learning.

(20170326): I did another rewrite over the last two weeks, mainly because there were many new lectures released during Spring 2017. Here is a summary:

  • I separated the whole “Courses/Lectures” section into two tracks: “Basic Deep Learning” and “Reinforcement Learning”. It’s more a decluttering of links. I also believe reinforcement learning should be a separate track because it requires more specialized algorithms.
  • On the “Basic Deep Learning” track, the ranking has changed. It was Ng’s, cs231n, cs224d, Hinton’s, Silver’s; now it becomes Ng’s, cs231n, cs224d, Silver’s, Hinton’s. As I went deep into Hinton’s class, I found that it has more difficult concepts. Both Silver’s and Hinton’s classes are more difficult than the first 3 IMO.
  • I also give some basic description of the U. of Amsterdam’s class. I don’t know much about it yet, but it’s refreshing because it gives a different presentation from the “Basic 5” I recommend.

(20170412): I finished Hinton’s NNML, added Berkeley CS294-131 into the mix.

(20170620): Links up “Top-5” List with “Basic 5”.  Added a list of AI, added link to my MOOC list.

(20170816): Added deeplearning.ai into Basic 5.  It becomes the new official recommendation to AIDL newcomers.

(20171126): Added several ML classes. Added Stats 385 into the considered list.

Appendix:
Links to process: http://ai.berkeley.edu/lecture_videos.html

Categories
deep learning Machine Learning Math Programming Thought

How To Get Better At X (X = Programming, Math, etc ) ……

Here are some of my reflections on how to improve at work.

So how would you get better at X?

X = Programming

  • Trace code of smart programmers, learn their tricks,
  • Learn how to navigate codebase using your favorite editors,
  • Learn algorithms better, learn math better,
  • Join an open source project,  first contribute, then see if you can maintain,
  • Always be open to learn a new language.

X = Machine Learning

X = Reading Literature

  • Read every day, make it a thing.
  • Browse arXiv’s summaries as if they were more than daily news.
  • Ask questions on social networks, Plus or Twitter, listen to other people,
  • Teach people a concept; it makes you consolidate your thoughts and helps you realize what you don’t really know.

X = Unix Administration

  • Google is your friend.
  • Listen to experienced administrators; their perspective can be very different – e.g. admins usually care about security more than you. Listen to them and think about whether your solution incorporates their thoughts.
  • Every time you solve a problem, put it in a notebook.  (Something which Tadashi Yonezaki at Scanscout taught me.)

X = Code Maintenance

  • Understand the code building process; see it as a part of your job to learn it intimately,
  • Learn multiple types of build systems: learn autoconf, cmake, bazel. Learn them, because by knowing them you can start to compile and eventually really hack a codebase.
  • Learn version control, learn Git. Don’t say you don’t need one; it would only inhibit your speed.
  • Learn multiple types of version control systems: CVS, SVN, Mercurial and Git. Learn why some of them are bad (CVS), some of them are better but still bad (SVN).
  • Send out a mail whenever you are making a release, make sure you communicate clearly what you plan to do.

X = Math/Theory

  • Focus on one topic. For example, I am very interested in machine learning these days, so I am reading Bishop.
  • Don’t be cheap, buy the bibles in the field. Get Thomas Cover if you are studying information theory. Read Serge Lang on linear algebra.
  • Solve one problem a day, maybe more if you are bored and sick of raising dumbbells.
  • Re-read a formulation of a certain method.  Re-read a proof.   Look up different ways of how people formulate and prove something.
  • Paraphrasing Ian Stewart – you always look silly before your supervisor. But always remember that once you have studied to the graduate level, you cannot be too stupid. So what learning math/theory takes is gumption and perseverance.

X = Business

  • Business has its mechanisms, so don’t dismiss it as fluffy before you learn the details,
  • Listen to your BD, listen to your sales, listen to your marketing friends. They are your important colleagues and friends.

X = Communication

  • Stand in other people’s shoes, that is to say: be empathetic,
  • I think it was Atwood who said (paraphrasing): It’s easy to be empathetic toward people in need, but it’s difficult to be empathetic toward annoying and difficult people. Ask yourself these questions,
    • Why would a person become difficult and annoying in the first place? Do they have a reason?
    • Are you big enough to help these difficult and annoying people? Even if they could be toxic?
  • That said, communication is a two-way street, and there are indeed hopeless situations. Take it in stride, and spend your time helping friends/colleagues who are in need.

X = Anything

Learning is a life-long process, so be humble and ready to be humbled.

Arthur

 

 

 

Categories
deep learning Machine Learning

Learning Machine Learning – Some Personal Experience

Introduction

Some context: a good friend of mine, Waikit Lau, started a Facebook group called “Deep Learning“. It is a gathering place for many deep learning enthusiasts around the globe, and so far it is almost 400 members strong. Waikit kindly gave me admin rights to the group; I have been able to interact with all the members since, and have had a lot of fun.

When asked “Which topic do you like to see in “Deep Learning”?”, surprisingly enough, “Learning Deep Learning” is the topic most members would like to see more of. So I decided to write a post, summarizing my own experience of learning deep learning, and machine learning in general.

My Background

Not everyone could predict the advent of deep learning, and neither could I. I was trained as a specialist in automatic speech recognition (ASR), with half of my time focused on research (at HKUST, CMU, BBN) and the other half on implementation (Speechworks, CMUSphinx). That is reflected in my current role, Principal Speech Architect, in which my research-to-implementation split is around 50-50. If you are being nice to me, you can say I was quite familiar with standard modeling in speech recognition, with passable programming skills. Perhaps what I gained from ASR is more an understanding of languages and linguistics, which I would describe as cool party tricks. But real-life speech recognition only uses a little linguistics [1].

To be frank though, while ASR uses a lot of machine learning techniques such as GMM, HMM and n-grams, my skills in general machine learning were clearly lacking. For a while, I didn’t have an acute sense of dangerous issues such as over- and under-fitting, nor was I able to foresee the rise of deep neural networks in so many different fields. So when my colleagues started to tell me, “Arthur, you got to check out this Microsoft’s work using deep neural network!” I was mostly suspicious at the time and couldn’t really fathom its importance. Obviously I was too specialized in ASR – if I had ever given deeper thought to the “universal approximation theorem“, the rise of DNN would have made a lot of sense to me. I can only blame myself for my ignorance.

That is a long digression. So long story short: I woke up about 4 years ago and said “screw it!” I decided to “empty my cup” and learn again. I decided to learn everything I could about neural networks, and machine learning in general, again. So this article is about some of the lessons I learned.

Learning The Jargons

If you are an absolute beginner, the best way to start is to take a good online class. For example, Andrew Ng’s machine learning class (my review) would be a very good place to start, because Ng’s class is generally known to be gentle to beginners.

Ideally you want to finish the whole course; from there you will have some basic understanding of what you are doing. For example, you want to know that “Oh, if I want to make a classifier, I need a train set and a test set; and it’s absolutely wrong that they are the same”. Now this is a rather deep thought, and actually there are people I know who just take a shortcut and use the training set as the test set. (Bear in mind, they or their loved ones suffer eventually. 🙂 ) If you don’t know anything about machine learning, learning how to set up a data set is the absolute minimum you want to learn.
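To make the train/test point concrete, here is a minimal sketch in plain Python. The helper name `split_train_test`, the 80/20 ratio and the toy data are all my own assumptions for illustration; they are not taken from Ng’s class or from any particular toolkit.

```python
import random

def split_train_test(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and cut it into disjoint train and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy (feature, label) pairs, hypothetical data for illustration only.
data = [(x, x % 2) for x in range(100)]
train, test = split_train_test(data)

# The whole point: no example may appear on both sides.
overlap = set(train) & set(test)
print(len(train), len(test), len(overlap))
```

Whatever toolkit you end up using, the check at the end is the habit worth keeping: measure accuracy only on examples the classifier has never seen.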

You would also want to know some basic machine learning methods such as linear regression, logistic regression and decision trees. Most methods you will use in practice require these techniques as building blocks. e.g. If you don’t really know logistic regression, understanding neural networks would be much tougher. If you don’t understand linear classifiers, understanding support vector machines would be tough too. If you have no idea what a decision tree is, no doubt you will be confused about random forest.

Learning basic classifiers also equips you with an intuitive understanding of core algorithms, e.g. you will need to know stochastic gradient descent (SGD) for many things you do in DNN.

Once you go through your first class, there are two things you want to do: one is to actually work on a machine learning problem, the other is to learn more about certain techniques. So let me split them into two sections:
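As a rough illustration of what SGD looks like on the simplest possible model, here is a sketch that fits a one-variable linear regression in plain Python. The function name, learning rate, epoch count and the noiseless toy data (y = 2x + 1) are my own assumptions, picked so the example converges quickly; real data and real toolkits need more care.

```python
import random

def sgd_linear_regression(points, lr=0.05, epochs=500, seed=0):
    """Fit y ~ w * x + b by stochastic gradient descent on the squared loss."""
    rng = random.Random(seed)
    points = list(points)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(points)          # visit the examples in a new random order
        for x, y in points:
            err = (w * x + b) - y    # prediction error on this single example
            w -= lr * err * x        # gradient of 0.5 * err**2 with respect to w
            b -= lr * err            # gradient of 0.5 * err**2 with respect to b
    return w, b

# Hypothetical noiseless data generated from y = 2x + 1.
points = [(x, 2.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]
w, b = sgd_linear_regression(points)
print(w, b)
```

The same update-one-example-at-a-time loop, with a different model and loss, is what sits underneath most DNN training.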

How To Work On Actual Machine Learning Problems

Where Are The Problems?

If you are still in school and specialize in machine learning, chances are you are funded by an agency, so more than likely you already have a task. My suggestion for you is to learn as much as you can about your own problem, and make sure you master all the latest techniques first, because that will help your daily job and career.

On the other hand, what if you did not major in machine learning? For example, what if you were an experienced programmer in the first place, and are now shifting your attention to machine learning? The simple answer is Kaggle. Kaggle is a multi-purpose venue where you can learn and compete in machine learning. You can also start from basic tasks such as MNIST or CIFAR-10 to first hone your skills.

Another good source of basic machine learning tasks is the tutorials of machine learning toolkits. For example, Theano’s deeplearning.net tutorial was my first taste of MNIST; from there I also followed the tutorial to train the IMDB sentiment classifier as well as a polyphonic music generator.

My only criticism of Kaggle is that it lacks the most challenging problems you can find in the field. e.g. At the time when ImageNet was not yet solved, I would have hoped that a large-scale computer vision competition would be held at Kaggle. And now that machine reading is the most acute problem, I would hope that there are tasks which everyone in the world would try to tackle.

If you share my concerns, then consider other evaluation sources. In your field, there has got to be a competition or two held every year. Join them, and make sure you gain experience from these competitions. By far, I think it is the fastest way to learn.

Practical Matter 1 – Linux Skills

For the most part, what I found tripping up many beginners is Linux skills, especially software installation. For that I would recommend you use Ubuntu. Much machine learning software can be installed with a simple apt-get. If you are into Python, try out Anaconda Python, because it will save you a lot of time in software installation.

Also remember that Google is your friend. Before you feel frustrated about a certain glitch and give up, always turn to Google, paste your error message, and see if you find an answer. Ask on forums if you still can’t resolve your issue. Remember, working on machine learning requires you to have certain problem-solving skills. So don’t feel deterred by small things.

Oh, you ask what if you are using Windows? Nah, switch to Linux; a majority of machine learning tools run on Linux anyway. Many people would also recommend Docker. So far I have heard both good and bad things about it, so I can’t say if I like it or not.

Practical Matter 2 – Machines

Another showstopper for many people is compute. I will say though, if you are a learner, the computational requirement can be just a simple dual-core desktop with no GPU cards. Remember, a lot of powerful machine learning tools were developed before GPU cards became trendy. e.g. libsvm is mostly CPU-based software and all of Theano’s tutorials can be completed within a week on a decent CPU-only machine. (I know because I did that before.)

On the other hand, if you have to do a moderate-size task, then you should buy a decent GPU card. A GTX980 would be a choice consumer card; for a more supported workstation-grade card, the Quadro series would be nice. Of course, if you can come up with 5k, then go for a Tesla K40 or K80. The GPU card you use directly affects your productivity. If you know how to build a computer, consider DIYing one. Tim Dettmers has a couple of articles (e.g. here) on how to build a decent machine for deep learning. Though you might never reach the performance of an 8-GPU-card monster, you will be able to test with pleasure all the standard techniques including DNN, CNN and LSTM.

Once You Have a Taste

For the most part, your first few tasks will teach you quite a lot of machine learning.   Then the next problem you will encounter is how do you progressively improve your classifier performance.  I will address that next.

How To Learn Different Machine Learning Methods

As you might already know, there are many ways to learn machine learning. Some will approach it mathematically and try to come up with an analysis of how a machine learning technique works. That’s what you will learn when you go through school training, i.e. say a 2-3 year master’s program, or the first 3-4 years of a PhD program.

I don’t think there is anything wrong with that type of learning. But machine learning is also a discipline which requires real-life experimental data to confirm your theoretical knowledge. An overly theoretical approach would sometimes hurt your learning. That said, you will need both practical and theoretical understanding to work well in practice.

So what should you do?  I will say machine learning should be learned through 3 aspects, they are

  1. Running the Program,
  2. Hacking the Source Code,
  3. Learning the Math (i.e. Theory).

Running the Program – A Thinking Man Guide

In my view, by far the most important skill in machine learning is to run a certain technique. Why? Isn’t the theory important too? Why don’t we first derive an algorithm from first principles, and then write our own program?

In practice, I found that starting with a top-down approach, i.e. going from theory to implementation, can work. But most of the time, you will easily pigeonhole yourself into a certain technique, and won’t quite see the big picture of the field.

Another flaw of the top-down approach is that it assumes you would understand more from just the principles. In practice, you might need to deal with multiple types of classifiers at work, and it’s hard to understand their principles in a timely manner. Besides, having practical experience of running a technique will teach you aspects of it. For example, have you run libsvm on a million data points, with each vector having a thousand dimensions? Then you will notice that the type of algorithm used to find support vectors makes a huge difference. You will also appreciate why many practitioners from big companies would suggest beginners learn random forest early, because in practice random forest is the faster and more scalable solution.

Let me sort of bite my tongue: while this stage is meant to be practice, you should try very hard to feel and understand a certain technique. If you are new, this is also the stage where you should ask whether general principles such as bias vs variance work in your domain.

What is the mistake beginners can make while using a technique? I think the biggest is deciding to run certain things without thinking about why; that’s detrimental to your career. For example, many people would read a paper, pick up all the techniques the author used, then rush to rerun all these experiments themselves. While this is usually what people do in evaluations/competitions, it is a big mistake in a real industrial scenario. You should always think about whether a technique would work for you – “Is it accurate but too slow?”, “Is its performance good but takes up too much memory?”, “Is there a good integration route which fits our existing codebase?” Those are all tough questions you should answer in practice.

I hope you get an impression from me that being practical in machine learning requires a lot of thinking too. Only when you master this aspect of knowledge are you ready to take up the more difficult parts of our work, i.e. changing the code, the algorithm and even the theory itself.

Hacking the Source Code

I believe the more difficult task, after you successfully run an experiment, is to change the algorithm itself. Mastery of using a program perhaps ties to your general skills in Linux, whereas mastery of source code ties to your coding skills in lower-level languages such as C/C++/Java.

Making the source code work requires the ability to read and understand a source code base, a valuable skill in practice. Reading a code base requires a more specialized type of reading – you want to keep notes on a source file and make sure you understand each of the function calls, which could go many levels deep. gdb is your friend, and your reading session should be based on both gdb and eyeballing the source code. Set conditional breakpoints and display important variables. These are the tricks. And at the end, make sure you can spell out the big picture of the program – What does it do? What algorithm does it implement? Where are the important source files? And more importantly, if I was the one who wrote the program, how would I write it?

What I said so far applies to all types of programs. For machine learning, this is a stage where you should focus on just the algorithm. e.g. you can easily implement SGD for linear regression without understanding the math. So why would you want to decouple the math from the process then? The reason is that there are always multiple implementations of the same technique and each implementation can be based on slightly different theories. Once again, chasing down the theory would take you too much time.

And do not underestimate the work required to learn the math behind even the simplest technique in the field. Consider just linear regression, and consider how people have thought about it as 1) optimizing the squared loss, 2) a maximum likelihood problem [2]; then you will notice it is not as simple a topic as you learned in Ng’s class. While I love the math, would not knowing the math affect your daily work? Not in most circumstances. On the other hand, there will be situations where you want to just focus on implementations. That’s why decoupling theory and practice is good thinking.
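For the curious, here is a short sketch of how the two views of linear regression connect. It rests on one assumption that is mine and not stated above: each target is the linear prediction plus independent Gaussian noise with a fixed variance. The symbols (w for the weights, sigma for the noise level, N for the number of examples) are just my notation.

```latex
% Assumed noise model: y_i = w^\top x_i + \epsilon_i, with \epsilon_i \sim \mathcal{N}(0, \sigma^2) i.i.d.
\begin{aligned}
\log p(y_1, \dots, y_N \mid x_1, \dots, x_N, w)
  &= \sum_{i=1}^{N} \log \mathcal{N}\!\left(y_i \mid w^\top x_i, \sigma^2\right) \\
  &= -\frac{N}{2} \log\!\left(2 \pi \sigma^2\right)
     - \frac{1}{2 \sigma^2} \sum_{i=1}^{N} \left(y_i - w^\top x_i\right)^2 .
\end{aligned}
% The first term does not depend on w, so
\hat{w}_{\mathrm{ML}}
  = \arg\max_{w} \log p(y \mid x, w)
  = \arg\min_{w} \sum_{i=1}^{N} \left(y_i - w^\top x_i\right)^2 .
```

So under that noise assumption the maximum likelihood solution and the least-squares solution are the same weights; change the noise model and the two views part ways.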

Learning The Math and The Theory

That brings us to our final stage of learning – the theory of machine learning. Man, this is such a tough thing to learn, and I don’t really do it well myself. But I can share with you some of my experience.

First thing first, as I am an advocate of bottom-up learning in machine learning, why would we want to learn any theory at all?

In my view, there are several uses of theory,

  1. Simplify your practice: e.g. knowing the direct method of linear regression would save you a lot of typing compared with implementing one using SGD.
  2. Identify BS: e.g. You have a data set with two classes with priors 0.999:0.001; your colleague has created a classifier with 99.8% accuracy and decides he has done his job.
  3. Identify redundant ideas: someone in marketing and sales asks why we can’t create more data points by squaring every element of the data point. You should know how to answer, “That is just polynomial regression.”
  4. Have fun with theory and the underlying mathematics,
  5. Think of a new idea
  6. Brag before your colleagues and show how smart you are. 

(There is no 6.  Don’t try to understand theory because you want to brag.  And for that matter, stop bragging.)
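Items 1 and 2 in the list are easy to check with a few lines of Python. This is only a sketch under my own assumptions: the function names are hypothetical, the regression is one-dimensional, and the label list is a made-up data set with exactly the 0.999:0.001 priors from item 2.

```python
def fit_line_closed_form(points):
    """Direct (closed-form) least-squares slope and intercept for 1-D data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def majority_baseline_accuracy(labels):
    """Accuracy of a 'classifier' that always predicts the most common label."""
    most_common = max(set(labels), key=labels.count)
    return labels.count(most_common) / len(labels)

# Item 1: no learning rate, no epochs, just two sums.
points = [(x, 2.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]
slope, intercept = fit_line_closed_form(points)

# Item 2: with priors 0.999 : 0.001, doing nothing already beats 99.8%.
labels = [0] * 999 + [1] * 1
baseline = majority_baseline_accuracy(labels)
print(slope, intercept, baseline)
```

The baseline is the useful part: before celebrating an accuracy number, compare it with what the priors alone would give you.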

So now we have established that theory can be useful. How do you learn it? By far I think the most important means are listening to good lectures, reading papers, and actually doing the math.

With lectures, your goal is to gather insight from experienced people. So I would recommend Ng’s class as the first class, then Hinton’s Neural Networks For Machine Learning. I also heard Koller’s class on Graphical Models is good. If you understand Mandarin, H. T. Lin’s classes on support vector machines are perhaps the best.

On papers, subscribe to arxiv.org today, get an RSS feed for yourself, and read at least the headlines daily to learn what’s new. That’s where I first learned many of the important concepts of the last few years: LSTM, LSTM with attention, highway networks etc.

If you are new, check out the “Awesome resources”, like Awesome Deep Learning, that’s where you find all basic papers to read.

And eventually you will find that just listening to lecture and reading papers don’t explain enough, this is the moment you should go to the “Bible”.   When I say Bible, we are really talking about 7-8 textbook which are known to be good in the field:

If you have to start with one book, consider either Pattern Classification by Duda and Hart or Pattern Recognition and Machine Learning (PRML) by C. M. Bishop. (Those are also the only ones I have read in depth.) In my view, the former is suitable for third-year undergraduates or graduate students to tackle. There are many computer exercises, so you will get plenty of both math problem solving and programming. PRML is more for advanced graduate students, such as PhD students. PRML is known to be more Bayesian; in a way, it’s more modern.

And do the math, especially for the first few chapters, where you may be frustrated by the more advanced calculus problems. Note, though, that the exercises in both Duda and Hart and PRML are guided. Try to spread this kind of math exercise out over time; for example, I try to spend 20-30 minutes a day tackling one problem in PRML. Write down all of your solutions and attempts in a notebook. You will benefit greatly from this effort: you will gain valuable insights into different techniques, including their theory, their motivations, their implementations, and their notable variants.

Finally, if you have a tough time with the math, don’t stay on the same problem forever. If you can’t solve a problem after a week, look it up on Google, or go to a standard text such as Solved Problems in Analysis. There is no shame in looking up the answers if you have tried.

Conclusion

No one can hit the ground running and train one of Google’s “convolutional LSTMs” on 80000 hours of data in one day. Nor can one easily come up with the very smart ideas of using a multiplier in an RNN (i.e. LSTM), using attention to do sequence-to-sequence learning, or reformulating a neural network such that a very deep one is trainable. It is hard enough to understand the fundamentals of concepts such as LSTM or CNN, not to mention innovating on them.

But you have got to start somewhere. In this article I told you my story of how I started and restarted this learning process. I hope you can join me in learning. Just like all of you, I am looking forward to seeing what deep learning will bring to humanity. And rest assured, you and I will enjoy the future more because we understand more of what goes on behind the scenes.

You might also like Learning Deep Learning – My Top Five List.

Arthur

 

[1] As Fred Jelinek said, “Every time I fire a linguist, the performance of our speech recognition system goes up.” (https://en.wikiquote.org/wiki/Fred_Jelinek)


Some Thoughts on Learning Machine Learning/Data Science

I have been refreshing myself on various aspects of machine learning and data science. For the most part it has been a very nice experience. What I like most is that I am finally able to grok much of the machine learning jargon people talk about. That jargon gave me a lot of trouble even as a mere practitioner of machine learning, because most people just assume you have some understanding of what they mean.

Here is a little secret: all this jargon ranges from very shallow to very deep. For instance, “lasso” just means setting the exponent of the regularization term to 1, i.e. penalizing the absolute values of the weights. I always think it’s just that people don’t want to say the mouthful “set the exponent of the regularization term to 1”, so they came up with “lasso”.
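Written out, the “exponent” in question is the q in the regularized least-squares objective below. This is a sketch in my own notation (weights w, inputs x_n, targets y_n, regularization strength λ are all assumed symbols), in the general form you will find in textbooks such as PRML:

```latex
\min_{w}\; \frac{1}{2}\sum_{n=1}^{N}\bigl(y_n - w^{\top}x_n\bigr)^{2}
\;+\; \frac{\lambda}{2}\sum_{j=1}^{M}\lvert w_j\rvert^{q}
```

Setting q = 1 gives the lasso; setting q = 2 gives the more familiar quadratic (ridge) regularizer.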

Then there is the bias-variance trade-off. Now here is a concept which is very hard to explain well. What opened my mind is what Andrew Ng said in his Coursera lecture: “just forget the terms bias and variance”. Then he moves on to talk about over- and under-fitting, a much easier concept to understand. And then he leads you to think: when a model underfits, we have an estimator that has “huge bias”, and when the model overfits, the estimator allows too much “variance”. Over- and under-fitting can be visualized. Anyone who understands polynomial regression would understand what overfitting is. That easily leads you to a eureka moment: “Oh, complex models can easily overfit!” That’s actually the key to understanding the whole phenomenon.
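If you want to see the polynomial-regression picture for yourself, here is a small sketch in Python with NumPy. The sine target, the noise level and the choice of 10 training points are assumptions of mine, picked only for illustration. It fits polynomials of degree 1, 3 and 9 and reports the error on the training points and on held-out points.

```python
import numpy as np

def true_fn(x):
    """Hypothetical ground truth that the noisy samples are drawn from."""
    return np.sin(2 * np.pi * x)

def poly_fit_errors(x_train, y_train, x_test, y_test, degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)    # only 10 noisy training points
x_test = np.linspace(0.05, 0.95, 50)   # held-out points in between
y_train = true_fn(x_train) + rng.normal(scale=0.2, size=x_train.shape)
y_test = true_fn(x_test) + rng.normal(scale=0.2, size=x_test.shape)

errors = {d: poly_fit_errors(x_train, y_train, x_test, y_test, d)
          for d in (1, 3, 9)}
for degree, (train_mse, test_mse) in errors.items():
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

The training error can only go down as the degree goes up, and the degree-9 polynomial passes through all 10 training points. The straight line is the “huge bias” case. For the degree-9 fit, compare its test error with its near-zero training error: that gap is what people mean by “variance”.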

Not only are people getting better at explaining different concepts; several important ideas are also enunciated better. E.g. reproducibility is huge, and it should be huge in machine learning as well. Yet even now you see junior scientists at entry level ignore all the important measures that make their work reproducible. That’s a pity. In speech recognition, for example, I remember there was a dark time when training a broadcast news model was so difficult, despite the fact that we knew people had done it before. How much time do people waste repeating other people’s work?

Nowadays, perhaps I would just tell younger scientists to take Johns Hopkins’ “Reproducible Research”. No kidding. Pay $49 to finish that class.

Anyway, that’s my rambling for today. Before I go: I have been actively engaged in Facebook’s Deep Learning group. It turns out many of the forum users would love to hear more about how to learn deep learning. Perhaps I will write up more in the future.

Arthur