Category Archives: deep learning

Learning Deep Learning: The "Basic Five" - Five Beginner Classes on Deep Learning

By onojeghuo from Unsplash, CC0

I have been self-learning deep learning for a while: informally from 2013, when I first read Hinton's "Deep Neural Networks for Acoustic Modeling in Speech Recognition" and worked through Theano, and more "formally" through various classes since Summer 2015, when I was freshly promoted to Principal Speech Architect [5].   It's no exaggeration to say that deep learning changed my life and career.   I have been more active than in my previous life.  E.g., if you are reading this, you were probably directed here from the very popular Facebook group AIDL, which I admin.

So this article was written at the time I finished watching an older version of Richard Socher's cs224d online [1].  That, together with Ng's, Hinton's, Li and Karpathy's, and Silver's, makes up the five classes I recommended in my now widely circulated "Learning Deep Learning - My Top-Five List".    I think it's fair to give this set of classes a name: the Basic Five. Because, IMO, they are the first five classes you should go through when you start learning deep learning.

In this post I will say a few words on why I chose these five classes as the Five. Compared to more established bloggers such as Karpathy, Olah or Denny Britz, I am more of a learner in this space [2]: experienced, perhaps, yet still a learner.  So this article, like my others, stresses learning.  What can you learn from these classes? And, less talked about but just as important: what are the limitations of learning online?   As a learner, I think these are interesting discussions, so here you go.

What are the Five?

Just to be clear, here are the classes I recommend:

  1. Andrew Ng's Coursera Machine Learning - my review,
  2. Fei-Fei Li and Andrej Karpathy's Convolutional Neural Networks for Visual Recognition, or Stanford cs231n 2015/2016,
  3. Richard Socher's Deep Learning and Natural Language Processing or Stanford cs224d,
  4. David Silver's Reinforcement Learning,
  5. Hinton's Neural Network and Machine Learning - my review.

And the ranking is the same as in my Top-Five List.  Out of the five, four have official video playlists published online for free [6]. For a small fee, you can finish Ng's and Hinton's classes with certification.

How Much I Actually Went Through the Basic Five

Many beginner articles come with a gigantic set of links.   The authors usually expect you to click through all of them (and learn through them?). When you scrutinize such a list, it can amount to more than 100 hours of video watching, and perhaps up to 200 hours of work.  I don't know about you, but I would doubt whether the authors really went through the list themselves.

So it's only fair for me to first tell you what I've actually done with the Basic Five as of this first writing (May 13, 2017):

Course: My Progress

  • Ng's "Machine Learning": Finished the class in its entirety, without certification.
  • Li and Karpathy's "Convolutional Neural Networks for Visual Recognition" (cs231n): Listened through the class lectures ~1.5 times. Haven't done any of the homework.
  • Socher's "Deep Learning for Natural Language Processing" (cs224d): Listened through the class lectures once. Haven't done any of the homework.
  • Silver's "Reinforcement Learning": Listened through the class lectures 1.5 times. Only worked out a few starter problems from Denny Britz's companion exercises.
  • Hinton's "Neural Networks for Machine Learning": Finished the class in its entirety, with certification. Listened through the class ~2.5 times.

This table is likely to be updated as I go deeper into a certain class, but it should tell you the limitations of my reviews.  For example, while I have watched all the class videos, only for Ng's and Hinton's classes have I finished the homework.   That means my understanding of two of the three "Stanford Trinity" classes [3] is weaker, and my understanding of reinforcement learning is not as solid.   On the other hand, together with my work at Voci, Hinton's class gives me stronger insight than the average commenter on topics such as unsupervised learning.

Why The Basic Five? And Three Millennial Machine Learning Problems

Taking classes is for learning, of course.  The five classes certainly give you the basics, and the fundamentals of deep learning, if that's what you love to learn (also take a look at footnote [7]).  The five are not the only classes I sat through in the last 1.5 years, so the choice is not arbitrary.  So, oh yeah: these are the things you want to learn. Got it? That's my criterion. 🙂

But that's what a thousand other bloggers would tell you as well. I want to give you a more interesting reason.  Here you go:

Go back in time to the year 2000.  That was when Google had just launched its search engine; there was no series of Google products, and there was surely no ImageNet. What were the most difficult problems in machine learning?   I think you would see three of them:

  1. Object classification,
  2. Statistical machine translation,
  3. Speech recognition.

So what's so special about these three problems?  Back in 2000, all three were known to be hard problems.  They represent three seemingly different data structures:

  1. Object classification: 2-dimensional, dense arrays of data.
  2. Statistical machine translation (SMT): discrete symbols, seemingly related by loose rules humans call grammars and translation rules.
  3. Automatic speech recognition (ASR): 1-dimensional time series; it has similarities to object classification (through the spectrogram), and it is loosely bound by rules such as the dictionary and word grammar.
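As a toy illustration (entirely made-up data, my own sketch, just to show the shapes), here is how the three data structures look in numpy:

```python
import numpy as np

# 1. Object classification: a dense 2-D array, e.g. a 28x28 grayscale image.
image = np.random.rand(28, 28)

# 2. Statistical machine translation: discrete symbols (word indices),
#    loosely related by grammar rules; here, a sentence of 6 token ids.
sentence = np.array([4, 17, 2, 93, 5, 1])

# 3. Speech recognition: a 1-D time series; through the spectrogram it
#    becomes a 2-D (time x frequency) array, similar to an image.
waveform = np.random.rand(16000)                  # 1 second of 16 kHz audio
frames = waveform.reshape(-1, 400)                # 40 frames of 400 samples
spectrogram = np.abs(np.fft.rfft(frames, axis=1)) # shape (40, 201)

print(image.shape, sentence.shape, waveform.shape, spectrogram.shape)
```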

And you would recall that all three problems drew interest from governments, big institutions such as the Big Four, and startup companies.  If you mastered one of them, you could make a living. Moreover, once you learned them well, you could transfer the knowledge to other problems.  For example, handwritten character recognition (HWR) resembles ASR, and conversational agents work similarly to SMT.  That is because the three problems are great metaphors for many other machine learning problems.

Now, okay, let me tell you one more thing: even now, there are people who still make (or try to make) a living by solving these three problems, because I never said they were solved.  E.g., what if we increase the number of classes from 1000 to 5000?  What if, instead of Switchboard, we work on conference speech or speech from YouTube? What if I ask you to translate so well that even a human cannot tell the difference?  That should convince you: "Ah, if there is one method that could solve all three of these problems, learning that method would be a great idea!"

And as you can guess, deep learning is that one method; it revolutionized all three fields [4].  That's why you want to take the Basic Five.  The Basic Five is not meant to make you a top researcher in the field of deep learning; rather, it teaches you just the basics.   And at this point in your learning, knowing a powerful template for solving problems is important.  You will also find that going through the Basic Five lets you read the majority of deep learning papers these days.

So here's why I chose the Five: Ng's and NNML are the essential basics of deep learning.   Li and Karpathy's teaches you object classification up to the state of the art.  Socher's teaches you where deep learning stands in NLP; it forays a little into SMT and ASR, but gives you enough to start.

My explanation excludes Silver's reinforcement learning, which admittedly is the odd one out of the herd.   I added Silver's class because RL is increasingly used even in traditionally supervised learning tasks. And of course, to know the place of RL, you need a solid understanding of it.  Silver's class is perfect for the purpose.

What You Actually Learn

In a way, the Basic Five also reflects what's really important when learning deep learning.  I will list out 8 points here, because they are repeated among the different courses:

  1. Basics of machine learning: this is mostly from Ng's class.  But themes such as bias-variance are repeated in NNML and Silver's class.
  2. Gradient descent: its variants (e.g. ADAM), its alternatives (e.g. second-order methods); it's a never-ending study.
  3. Backpropagation: how to view it? As optimizing a function, as a computational graph, as the flowing of gradients.  Different classes give you different points of view, so don't skip it even if you have learned it once.
  4. Architectures: the big three families are DNN, CNN and RNN; why some of them emerged and re-emerged in history; the details of how they are trained and structured.  None of the courses teaches you everything, but going through all five will teach you enough to survive.
  5. Image-specific techniques: not just classification, but localization/detection/segmentation (as in cs231n 2016 L8, L13). Not just convolution, but "deconvolution", and why we don't like that it's called "deconvolution". 🙂
  6. NLP-specific techniques: word2vec, GloVe, and how they are applied to NLP problems such as sentiment classification.
  7. (Advanced) Basics of unsupervised learning: mainly from Hinton's class, and mainly techniques from 5 years ago such as RBM, DBN, DBM and autoencoders.  But they are the basics if you want to learn more advanced ideas such as GANs.
  8. (Advanced) Basics of reinforcement learning: mainly from Silver's class, from DP-based methods to Monte Carlo and TD.
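As a small companion to points 2 and 3, here is a one-hidden-layer network trained with vanilla gradient descent and hand-derived backprop, in plain numpy. This is my own toy sketch (random data, made-up sizes), not homework code from any of the classes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))               # 8 samples, 3 features
y = rng.normal(size=(8, 1))               # regression targets
W1 = rng.normal(size=(3, 4))              # input -> hidden weights
W2 = rng.normal(size=(4, 1))              # hidden -> output weights
lr, losses = 0.05, []

for step in range(200):
    # Forward pass: linear -> tanh -> linear, then mean squared loss
    h = np.tanh(X @ W1)
    pred = h @ W2
    losses.append(((pred - y) ** 2).mean())

    # Backward pass: the chain rule, seen as gradient flowing backward
    d_pred = 2 * (pred - y) / len(X)
    d_W2 = h.T @ d_pred
    d_h = d_pred @ W2.T
    d_W1 = X.T @ (d_h * (1 - h ** 2))     # tanh'(a) = 1 - tanh(a)^2

    # Vanilla gradient descent; variants such as ADAM modify this update
    W1 -= lr * d_W1
    W2 -= lr * d_W2

print(losses[0], losses[-1])              # the loss should drop
```

Every class in the Basic Five walks through some version of this loop; what differs is the point of view they give you on it.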

The Limitation of Autodidacts

By the time you finish the Basic Five, if you have genuinely learned something from them, recruiters will start to knock on your door. What you think and write about deep learning will appeal to many people.   Perhaps you will start to answer questions on forums? Or you might even write LinkedIn articles which get many Likes.

All good, but be cautious! During my year of administering AIDL, I've seen many people who purportedly took many deep learning classes, but within a few minutes of discussion, I could point out holes in their understanding.    Some, after some probing, turned out to have taken only one class in its entirety.  So they don't really grok deeper concepts such as backpropagation.   In other words, they could still improve, but they just refuse to.   No wonder: with the hype of deep learning, many smart fellows just choose to start a company or code without really taking the time to grok the concepts well.

That's a pity.  And all of us should be aware that self-learning is limited.  If you take a formal education path, like going to grad school, most of the time you will sit with people who are as smart as you and willing to point out your issues daily.   So any of your weaknesses will be revealed sooner.

You should also be aware that while deep learning is being hyped, your holes of misunderstanding are unlikely to be uncovered.  That has nothing to do with whether you hold a job.   Many companies just want to hire someone to work on a task, and expect you to learn while working.

So what should you do?  My first advice is to be humble, and be aware of the Dunning-Kruger effect.  Self-learning usually gives people the intoxicating feeling that they have learned a lot.  But learning a lot doesn't mean you know everything.  There are always higher mountains; you are doing yourself a disservice if you stop learning.

My second thought is that you should try out your skills.  E.g., it's one thing to know about CNNs; it's another to run training with ImageNet data.   If you are smart, the former takes a day.  The latter takes much planning, a powerful machine, and some training runs to get even AlexNet trained.

My final advice is to talk with people and understand your own limitations.  E.g., after reading many posts on AIDL, I noticed that while many people understand object classification well enough, they don't really grasp the basics of object localization/detection.  In fact, neither did I after my first pass through the videos.   So what did I do? I just went through the videos on localization/detection again and again until I understood [8].

After the Basic Five.......

So some of you will ask, "What's next?" Yes, you finished all these classes, as if there were nothing more to learn! Shake that feeling off!  There are tons of things you still want to learn.  So I list out several directions you can go:

  • Completionist: As of this first writing, I still haven't done all the homework in all five classes. Note that doing the homework can really help your understanding, so if you are like me, I suggest you go back to the homework and test your understanding.
  • Intermediate Five:  You have just learned the basics, so it's time for the next level.   I don't have a concrete idea of the next 5 classes yet, but for now I would go with Koller's Probabilistic Graphical Models, Columbia's edX CSMM 102x, Berkeley's Deep Reinforcement Learning, Udacity's Reinforcement Learning, and finally Oxford's Deep NLP 2017.
  • Drilling the Basics of Machine Learning: This goes in another direction: work on your fundamentals.  For that, you can work on Math topics forever.  I would say the more important and non-trivial parts are perhaps Linear Algebra, Matrix Differentiation and Topology.  Also check out this very good link on how to learn college-level Math.
  • Specializing in one field: If you want to master just one field out of the Three Millennial Machine Learning Problems I mentioned, it's important to keep looking at specialized classes on computer vision or NLP.   Since I don't want to clutter this point, I will discuss the relevant classes/materials in future articles.
  • Writing:  That's what many of you have been doing, and I think it helps further your understanding.  One thing I would suggest is to always write something new, something you would want to read yourself.  For example, there are too many blog posts on Computer Vision Using TensorFlow in the world.  So why not write one which is all about what people don't know?  For example: practical transfer learning for object detection, or what deconvolution is, or a literature review of a non-trivial architecture such as Mask R-CNN, comparing it with existing encoder-decoder structures.  Writing this kind of article takes more time, but remember: quality trumps quantity.
  • Coding/GitHubbing: There is a lot of room for re-implementing ideas from papers and open-sourcing them.  It is also a very useful skill, as many companies need it to reproduce trendy deep learning techniques.
  • Research:  If you genuinely understand deep learning, you might see that many techniques need refinement.  Indeed, there are currently plenty of opportunities to come up with better techniques.   Of course, writing papers at the level of a professional researcher is tough, and it's out of my scope.  But only when you can publish will people give you respect as part of the community.
  • Framework: Hacking a framework at the C/C++ level is not for the faint of heart.  But if you are my type, who loves low-level coding, trying to come up with a framework yourself could be a great way to learn more.  E.g., check out Darknet, which is, surprisingly, in C!


So here you go: the complete Basic Five, what they are, why they are basic, and where you go from here.   In a way, this is also a summary of what I have learned so far from various classes since June 2015.   As in my other posts, if I learn more in the future, I will keep this post updated.  I hope this post keeps you learning deep learning.

Arthur Chan

[1] Before 2017, there was no coherent set of Socher's class videos available online; sadly, there was also no legitimate version.  So the version I refer to is a mixture of the 2015 and 2016 classes.   Of course, you may now find a legitimate 2017 version of cs224n on YouTube.

[2] My genuine expertise is speech recognition; unfortunately, that's not a topic I can share much about due to IP issues.

[3] "Stanford Trinity" is a term I learned from the AI Playbook List from Andreseen Howoritz's list.

[4] Some of you (e.g. from AIDL) would jump up and say, "No way! I thought NLP wasn't solved by deep learning yet!" That's because you are a lost soul, misinformed by misinformed blog posts.  ASR was the first field to be tackled by deep learning, dating back to 2010.  And most systems you see in SMT are seq2seq-based.

[5] I have been in the business of speech recognition since 1998, when I worked on a voice-activated project for my undergraduate degree back at HKUST.  It was a mess, but that's how I started.

[6] As for the last one, you can always search for it on YouTube.  Of course, it is not legitimate for me to share it here.

[7] I also audited and took several other classes.

[8]  It's still a subject that *I* could explore further.  For example, just the logistics seem hard enough to set up.

* * *
If you like this message, subscribe to the Grand Janitor Blog's RSS feed.  You can also find me on Twitter, LinkedIn and Plus.  Together with Waikit Lau, I maintain the Deep Learning Facebook forum.  Also check out my awesome employer: Voci.

* * *


20170513: First version finished

20170514: Fixed many typos.  Rewrite/add some paragraphs.  Ready to publish.




Links of My Reviews

Since I started to re-learn machine learning, I have written several review articles on various classes, books and resources.   Here is a collection of links:

For the Not-So-Uninitiated: Review of Ng's Coursera Machine Learning Class

One Algorithm to rule them all - Reading "The Master Algorithm"

Radev's Coursera Introduction to Natural Language Processing - A Review

Learning Deep Learning - My Top-Five List

Learning Machine Learning - Some Personal Experience

A Review on Hinton's Coursera "Neural Networks and Machine Learning"

Reading Michael Nielsen's "Neural Networks and Deep Learning"


A Review on Hinton's Coursera "Neural Networks and Machine Learning"

Cajal's drawing of chick cerebellum cells, from Estructura de los centros nerviosos de las aves, Madrid, 1905

For me, finishing Hinton's deep learning class, Neural Networks and Machine Learning (NNML), was a long-overdue task. As you know, the class was first launched back in 2012. I was not so convinced by deep learning back then. Of course, my mind changed around 2013, but by then the class was archived. Not until 2 years later did I decide to take Andrew Ng's class on ML, and finally I was able to loop through Hinton's class once. But only last October, when the class relaunched, did I decide to take it again: i.e., watch all the videos a second time, finish all the homework and get passing grades for the course. As you will read in my journey, this class is hard.  Some videos I watched 4-5 times before groking what Hinton said. Some assignments made me take long walks to think them through. Finally I made it through all 20 assignments, and even bought a certificate for bragging rights. It was a refreshing, thought-provoking and satisfying experience.

So this piece is my review of the class: why you should take it and when.  I also discuss one question which floats around forums from time to time: given all the deep learning classes available now, is Hinton's class outdated?   Or is it still the best beginner class? I will chime in on the issue at the end of this review.

The Old Format Is Tough

I admire people who finished this class in Coursera's old format.  NNML is well known to be much harder than Andrew Ng's Machine Learning, as multiple reviews have said (here, here).  Many of my friends with PhDs couldn't quite follow what Hinton said in the last half of the class.

No wonder: when Karpathy reviewed it in 2013, he noted that there was an influx of non-MLers working on the course. For newcomers, it must be mesmerizing to understand topics such as energy-based models, which many people have a hard time following.   Or what about the deep belief network (DBN), which people these days still mix up with the deep neural network (DNN)?  And quite frankly, I still don't grok some of the proofs in Lecture 15 even after going through the course, because deep belief networks are difficult material.

The old format only allowed 3 attempts per quiz, with tight deadlines, and you only had one chance to finish the course.  One homework assignment requires deriving the matrix form of backprop from scratch.  All of this made the class unsuitable for busy individuals (like me), and more suited to second- or third-year graduate students, or experienced practitioners with plenty of time (but who has that?).

The New Format Is Easier, but Still Challenging

I took the class last October, when Coursera had changed most classes to the new format, which allows students to re-take them.  [1]  It strips out some of the difficulty, which makes it more suitable for busy people.   That doesn't mean you can go easy on the class: for the most part, you will need to review the lectures, work out the Math, draft pseudocode, etc.   The homework that requires you to derive backprop is still there.  The upside: you still get all the fun of deep learning. 🙂 The downside:  you shouldn't expect to get through the class without spending 10-15 hours/week.

Why the Class is Challenging -  I: The Math

Unlike Ng's class and cs231n, NNML is not easy for beginners without a background in calculus.   The Math is still not too difficult: mostly differentiation with the chain rule, intuition about what the Hessian is, and, more importantly, vector differentiation.  But if you have never learned these, the class will be over your head.  Take at least Calculus I and II before you join, and know some basic identities from the Matrix Cookbook.

Why the Class is Challenging - II:  Energy-based Models

Another reason the class is difficult is that the last half is all based on so-called energy-based models, i.e. models such as the Hopfield network (HopfieldNet), the Boltzmann machine (BM) and the restricted Boltzmann machine (RBM).  Even if you are used to the math of supervised learning methods such as linear regression, logistic regression or even backprop, the Math of the RBM can still throw you off.   No wonder: many of these models have physical origins, such as the Ising model.  Deep learning research also frequently uses ideas from Bayesian networks, such as explaining away.  If you have no background in either physics or Bayesian networks, you will feel quite confused.
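To give a flavor of why the Math feels different, here is the standard RBM energy function and its hidden-unit conditional, sketched in numpy. This is my own illustration of the textbook formulation (made-up sizes and weights), not course material:

```python
import numpy as np

rng = np.random.default_rng(1)
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # visible-hidden weights
b = np.zeros(n_visible)                                # visible biases
c = np.zeros(n_hidden)                                 # hidden biases

def energy(v, h):
    """Standard RBM energy: E(v, h) = -b.v - c.h - v.W.h
    (lower energy = more probable joint configuration)."""
    return -b @ v - c @ h - v @ W @ h

def p_h_given_v(v):
    """P(h_j = 1 | v) = sigmoid(c_j + v.W[:, j]); the conditional
    factorizes thanks to the 'restricted' (bipartite) structure."""
    return 1.0 / (1.0 + np.exp(-(c + v @ W)))

v = rng.integers(0, 2, size=n_visible).astype(float)   # a binary visible vector
h = rng.integers(0, 2, size=n_hidden).astype(float)    # a binary hidden vector
print(energy(v, h), p_h_given_v(v))
```

Nothing here is a loss function you minimize directly; training (e.g. contrastive divergence) works with this energy instead, which is exactly what makes the second half of the class feel foreign.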

In my case, I spent quite some time Googling and reading through the relevant literature.  That powered me through some of the quizzes, but I don't pretend I understand these topics, because they can be deep and unintuitive.

Why the Class is Challenging - III: Recurrent Neural Network

If you learn RNNs these days, probably from Socher's cs224d or by reading Mikolov's thesis, LSTM would easily be your only thought on how to resolve exploding/vanishing gradients in RNNs.  Of course, there are other ways: the echo state network (ESN) and Hessian-free methods.  They are seldom talked about these days.   Again, their formulations are quite different from standard methods such as backprop and gradient descent.  But learning them gives you breadth, and makes you ask whether the status quo is the right thing to do.
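A toy numpy experiment (my own, not from any class) shows the exploding/vanishing problem directly: backprop through time keeps multiplying the gradient by the recurrent Jacobian, so its norm shrinks or blows up geometrically with the number of time steps.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
grad = rng.normal(size=n)     # gradient arriving at the last time step
norms = {}

# Toy recurrent matrices with a known spectrum: largest singular value
# < 1 makes the gradient vanish; > 1 makes it explode.
for scale, label in [(0.5, "vanishing"), (1.5, "exploding")]:
    W = scale * np.eye(n)
    g = grad.copy()
    for t in range(50):       # 50 time steps of backprop through time
        g = W.T @ g
    norms[label] = np.linalg.norm(g)

print(norms)                  # one norm near zero, one astronomically large
```

LSTM, ESN and Hessian-free methods are three quite different answers to this same geometric decay/growth.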

But is it Good?

You bet! Let me elaborate in the next section.

Why is it good?

Suppose you just want to use some of the fancier tools in ML/DL. I guess you can just go through Andrew Ng's class, test out a bunch of implementations, then claim yourself an expert; that's what many people do these days.  In fact, Ng's Coursera class is designed to give you a taste of ML, and indeed, you should be able to wield many ML tools after the course.

That said, you should realize your understanding of ML/DL would still be .... rather shallow.  Maybe you are thinking, "Oh, I have a bunch of data, let's throw it into Algorithm X!", or "Oh, we just want to use XGBoost, right! It always gives you the best results!"   You should realize performance numbers aren't everything.  It's important to understand what's going on with your model.   You easily make costly, short-sighted and ill-informed decisions when you lack understanding.  It happens to many of my peers, to me, and sadly even to some of my mentors.

Don't make that mistake!  Always seek better understanding! Try to grok.  If you only do Ng's neural network assignment, you would still wonder how it can be applied to other tasks.   Go for Hinton's class, feel perplexed by what the Prof said, and iterate.  Then you will start to build up a better understanding of deep learning.

Another, more technical, note: if you want to learn deep unsupervised learning, I think this should be the first course as well.   Prof. Hinton teaches you the intuition behind many of these machines, and you will also have the chance to implement them.   For models such as the Hopfield net and the RBM, it's quite doable if you know basic Octave programming.
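To show how doable these models really are, here is a minimal Hopfield network sketch in numpy (my own toy example, not code from the course): store one +/-1 pattern with the Hebbian rule, then recover it from a corrupted copy.

```python
import numpy as np

pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])  # one stored +/-1 pattern

# Hebbian storage: W = p p^T with the diagonal zeroed (no self-connections)
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0)

# Corrupt two bits, then run synchronous sign updates until the state settles
state = pattern.copy()
state[0] *= -1
state[3] *= -1
for _ in range(5):
    state = np.sign(W @ state)

print(np.array_equal(state, pattern))  # the stored pattern is recovered
```

The update rule is just a few lines; the interesting part, which the lectures cover, is why this dynamics descends an energy function and when stored patterns are stable.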

So it's good, but is it outdated?

Learners these days are perhaps luckier; they have plenty of choices for learning a deep topic such as deep learning.   Just check out my own "Top 5-List".   cs231n, cs224d and even Silver's class are great contenders for the second class.

But I still recommend NNML.  There are four reasons:

  1. It is deeper and tougher than other classes.  As I explained before, NNML is tough, not exactly mathematically (Socher's and Silver's Maths are also non-trivial), but conceptually; energy-based models and the different ways to train RNNs are some examples.
  2. Many concepts in ML/DL can be seen in different ways.  For example, bias/variance is a trade-off for frequentists, but is seen as a "frequentist illusion" by Bayesians.    The same can be said about concepts such as backprop and gradient descent.  Once you think hard about them, they are tough concepts.    So one reason to take a class is not just to learn a concept, but to look at it from a different perspective.  In that sense, NNML fits the bucket perfectly.  I found myself thinking about Hinton's statements during many long promenades.
  3. Hinton's perspective: Prof. Hinton was mostly on the losing side of ML for the last 30 years.   But he persisted. From his lectures, you get a feeling of how and why he started a certain line of research, and perhaps, ultimately, how you might research something yourself in the future.
  4. Prof. Hinton's delivery is humorous.   Check out his take in Lecture 10 on why physicists worked on neural networks in the early 80s.  (Note: he was a physicist before working on neural networks.)

Conclusion and What's Next?

All in all, Prof. Hinton's "Neural Networks and Machine Learning" is a must-take class.  All of us, beginners and experts included, will benefit from the professor's perspective and the breadth of the subject.

I do recommend absolute beginners first take Ng's class, and perhaps some Calculus I and II, plus some Linear Algebra, Probability and Statistics; that would make the class more enjoyable (and perhaps doable) for you.  In my view, both Karpathy's and Socher's classes are easier second classes than Hinton's.

If you finish this class, make sure you check out the other fundamental classes.  Check out my post "Learning Deep Learning - My Top 5 List"; you will find plenty of ideas for what's next.   A special mention here is Daphne Koller's Probabilistic Graphical Models, which I found equally challenging, and which will perhaps give you some insight into very deep topics such as the Deep Belief Network.

Another suggestion: maybe you can take the class again. That's what I plan to do, about half a year from now; as I mentioned, I don't understand every single nuance of the class.  But I think understanding will come by my 6th or 7th time through the material.

Arthur Chan

[1] To me, this makes a lot of sense for both the course's preparers and the students, because students can take more time to really go through the homework, and the preparers can monetize their class indefinitely.


(20170410) First writing
(20170411) Fixed typos. Smooth up writings.
(20170412) Fixed typos
(20170414) Fixed typos.


Some Quick Impression of Browsing "Deep Learning"

(Redacted from a post I wrote back in Feb 14 at AIDL)
I had some leisure time lately to browse "Deep Learning" by Goodfellow for the first time. Since it is known as the bible of deep learning, I decided to write a short afterthought post; it's in point form and not too structured.

  • If you want to learn the zen of deep learning, "Deep Learning" is the book. In a nutshell, "Deep Learning" is an introductory-style textbook on nearly every contemporary field in deep learning. It has a thorough chapter on backprop, and perhaps the best introductory material on SGD, computational graphs and convnets. So the book is very suitable for those who want to further their knowledge after going through 4-5 introductory DL classes.
  • Chapter 2 is supposed to go through the basic Math, but it's unlikely to cover everything the book requires. PRML Chapter 6 seems to be a good preliminary before you start reading the book. If you don't feel comfortable with matrix calculus, perhaps you want to read "Matrix Algebra" by Abadir as well.
  • There are three parts to the book. Part 1 is all about the basics: math, basic ML, backprop, SGD and such. Part 2 is about how DL is used in real-life applications. Part 3 covers research topics, such as E.M. and graphical models in deep learning, and generative models. All three parts deserve your time. The Math and general ML in Part 1 may be better replaced by a more technical text such as PRML, but the rest of the material is deeper than the popular DL classes. You will also find relevant citations easily.
  • I enjoyed Parts 1 and 2 a lot, mostly because they are deeper and filled with interesting details. What about Part 3? While I don't quite grok all the Math, Part 3 is strangely inspiring. For example, I noticed a comparison of graphical models and NNs. There is also how E.M. is used in latent models. And of course, there is an extensive survey of generative models, covering difficult models such as the deep Boltzmann machine, the spike-and-slab RBM and many variations. Reading Part 3 makes me want to learn classical machine learning techniques, such as mixture models and graphical models, better.
  • So I will say you will enjoy Part 3 if you are:
    1. a DL researcher in unsupervised learning and generative models, or
    2. someone who wants to squeeze out the last bit of performance through pre-training, or
    3. someone who wants to compare other methods, such as mixture models or graphical models, with NNs.

Anyway, that's what I have for now. Maybe I will write a fuller summary in a blog post later on, but enjoy these random thoughts for now.


You might also like the resource page and my top-five list.   Also check out Learning machine learning - some personal experience.

AIDL Pinned Post V2

(Just want to keep a record for myself.)

Welcome! Welcome! We are the most active FB group for Artificial Intelligence/Deep Learning, or AIDL. Many of our members are knowledgeable so feel free to ask questions.

We have a tied-in newsletter, and a YouTube channel with a (kinda) weekly show, "AIDL Office Hour".

Posting rules are strict at AIDL: your post has to be relevant, accurate and non-commercial (FAQ Q12). Commercial posts are only allowed on Saturday. If you don't follow this rule, you might be banned.


Q1: How do I start AI/ML/DL?
A: Step 1: Learn some math and programming.
Step 2: Take some beginner classes, e.g. try out Ng's Machine Learning.
Step 3: Find some problems to play with; Kaggle has tons of such tasks.
Iterate the above three steps until you get bored. From time to time, you can share what you learn.

Q2: What is your recommended first class for ML?
A: Ng's Coursera class; the Caltech edX class and the UW Coursera class are also pretty good.

Q3: What are your recommended classes for DL?
A: Go through at least 1 or 2 ML classes, then go for Hinton's, Karpathy's, Socher's, Larochelle's and de Freitas'. For deep reinforcement learning, go with Silver's and Schulman's lectures. Also see Q4.

Q4: How do you compare different resources on machine learning/deep learning?
A: (Shameless self-promotional plug) Here is an article, "Learning Deep Learning - Top-5 Resources", written by me (Arthur) on different resources and their prerequisites. I refer to it a couple of times at AIDL, and you might find it useful: …/learning-deep-learning-my-top…/ . Reddit's machine learning FAQ has another list of great resources as well.

Q5: How do I use machine learning technique X with language L?
A: Google is your friend. You might also see a lot of us referring you to Google from time to time. That's because your question is best solved by Google.

Q6: Explain concept Y. List 3 properties of concept Y.
A: Google. Also, we don't do your homework. If you can't Google the term, though, it's fair to ask questions.

Q7: What are the most recommended resources on deep learning for computer vision?
A: cs231n; the 2016 edition is the one I would recommend. Most other resources you will find are derivative in nature or have glaring problems.

Q8: What are the prerequisites of Machine Learning/Deep Learning?
A: Mostly linear algebra and Calculus I-III. In linear algebra, you should be good at eigenvectors and matrix operations. In calculus, you should be quite comfortable with differentiation. You might also want a primer on matrix differentiation before you start, because it's a topic seldom touched in an undergraduate curriculum.
Some people also argue that topology is important, and that a physics or biology background could help. But they are not crucial to start.

Q9: What are the cool research papers to read in Deep Learning?
A: We think songrotek's list is pretty good: …/Deep-Learning-Papers-Reading-Roadmap. Another classic reading list is also worth your time.

Q10: What is the best/most recommended language for Deep Learning/AI?
A: Python is usually cited as a good language because it has the best library support. Most Python ML libraries link against C/C++, so you get both flexibility and speed.
Others also cite Java (deeplearning4j), Lua (Torch), Lisp, Golang and R. It really depends on your purpose: practical concerns such as code integration and your familiarity with a language usually dictate your choice. R deserves special mention because it is widely used in sibling fields such as data science and is gaining popularity.

Q11: I am bad at Math/Programming. Can I still learn A.I/D.L?
A: Mostly you can tag along, but at a certain point, if you don't understand the underlying math, you won't be able to understand what you are doing. The same goes for programming: if you never implement or trace an algorithm yourself, you will never truly understand why it behaves a certain way.
So what if you feel you are bad at math? Don't beat yourself up too much. Take Barbara Oakley's class "Learning How to Learn", and you will learn how to approach tough subjects such as mathematics, physics and programming.

Q12: Would you explain more about AIDL's posting requirement?
A: This is a frustrating topic for many posters, despite their good intentions. I suggest you read through this blog post before you post anything.


Thoughts From Your Humble Administrators - Feb 5, 2017

Last week:

Libratus is the biggest news item this week.  In retrospect, it's probably as huge as AlphaGo.   The surprising part is that it has nothing to do with deep learning.   So it's worth our time to look at it closely.

  • We learned that Libratus crushed human professional players in heads-up no-limit hold'em (NLH).  How does it work?  Perhaps the Wired and the Spectrum articles tell us the most.
    • First of all, NLH is not as widely played as Go, but it is interesting because people play it for real money.  And we are talking about big money: at the World Series of Poker's yearly tournament, all top-10 finishers become instant millionaires. Among pros, hold'em is known as the "Cadillac of Poker," a phrase coined by Doyle Brunson, which implies that mastering hold'em is the key skill in poker.
    • Limit hold'em is a game pros generally think of as "chess"-like.  Polaris from the University of Alberta bested humans with three wins back in 2008.
    • NLH had not fallen until now, so let's think about how you would model NLH in general. In NLH, the number of game states is around 10^165, close to Go's.  Since the game has only a handful of betting rounds ("streets"), you easily get into what players of other games call the end-game.   It's just that, given the large number of possible bet sizes, the game states blow up very easily.
    • So at run time you can only evaluate a portion of the game tree.    Since betting is effectively continuous, bet sizes are usually discretized so that evaluation is tractable with your compute; this is known as "action abstraction".  An actual bet that falls outside the abstraction is called an "off-tree" bet, and off-tree bets are mapped back onto in-tree actions at run time, a step known as "action translation".   Of course, there are different types of tree evaluation.
    • Now, what is the merit of Libratus, why does it win? There seem to be three distinct factors, the first two of which are about the end-game.
      1. There is a new end-game solver which features a new criterion for evaluating the game tree, called Reach-MaxMargin.
      2. Also in the paper, the authors suggest a way to solve an end-game given the opponent's actual bet size.  So they no longer use action translation to map an off-tree bet into the game abstraction.  This considerably reduces "regret".
    • What is the third factor? As it turns out, in past human-computer matches, humans were able to exploit the machine easily by noticing its betting patterns.   So the CMU team used an interesting strategy: every night, the team would manually tune the system so that repeated betting patterns were removed.   That confused the human pros.  Dong Kim, the player who did best against the machine, felt like he was dealing with a different machine every day.
    • These seem to be the reasons why the pros were crushed.  Notice that this is a rematch: the pros won by a small margin back in 2015, but this time the result shows a 99.8% chance that the machine is beating humans.  (I am handwaving here because you need to talk about big-blind sizes to talk about winnings.  Unfortunately I couldn't look it up.)
    • To me, this Libratus win comes very close to saying that computers can beat the best tournament heads-up players.  But poker players will tell you the best players are cash-game players, and heads-up play is not representative because the bread-and-butter games usually have 6 to 10 players. So we will probably hear more about poker bots in the future.
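To make "action abstraction" and "action translation" concrete, here is a toy sketch. This is my own illustration with made-up bet fractions, not Libratus's actual method; real bots use more sophisticated (often randomized) mappings.

```python
# Toy sketch of action abstraction + action translation (illustrative only,
# not Libratus's algorithm). Bets are restricted to a few fractions of the
# pot; any off-tree bet is mapped to the nearest in-tree bet size.

def abstract_actions(pot):
    """Action abstraction: allow only a handful of bet sizes (pot fractions)."""
    return [round(pot * f) for f in (0.5, 1.0, 2.0)]

def translate(bet, pot):
    """Action translation: map an off-tree bet to the nearest abstracted bet."""
    return min(abstract_actions(pot), key=lambda a: abs(a - bet))

if __name__ == "__main__":
    pot = 100
    print(abstract_actions(pot))   # in-tree bets: [50, 100, 200]
    print(translate(130, pot))     # an off-tree bet of 130 maps to 100
```

A real bot would usually randomize this mapping so opponents cannot exploit the rounding, but the deterministic nearest-neighbor version shows the idea, and it also shows why solving the end-game for the actual bet size (factor 2 above) removes a source of error.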

Anyway, that's what I have this week.  We will resume our office hour next week.  Waikit will tell you more in the next couple of days.


Thoughts From Your Humble Administrators - Jan 29, 2017

This week at AIDL:

Must-read:  I would read the Stanford article and the Deep Patient paper in tandem.


Reading Michael Nielsen's "Neural Networks and Deep Learning"


Let me preface this article: after I wrote my top-five list on deep learning resources, one oft-asked question was "What are the math prerequisites for learning deep learning?"   My first answer is calculus and linear algebra, but then I qualify that certain techniques of calculus and linear algebra are more useful than others.  E.g. you should already know gradients, differentiation, partial differentiation and Lagrange multipliers; you should know matrix differentiation, preferably the trace trick, eigen-decomposition and such.    If your goal is to understand machine learning in general, then good skills in integration and some knowledge of analysis help. E.g. the 1-2 star problems of Chapter 2 of PRML [1] require some knowledge of special functions such as the gamma and beta functions.   Having this math background helps you go through such questions more easily.
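To show what "matrix differentiation" and the "trace trick" look like in practice, here are a few standard identities, stated in denominator layout; these are textbook facts, not specific to any one of the books mentioned here:

```latex
% Common matrix-calculus identities used throughout ML derivations
\frac{\partial}{\partial \mathbf{x}}\,\mathbf{a}^\top\mathbf{x} = \mathbf{a},
\qquad
\frac{\partial}{\partial \mathbf{x}}\,\mathbf{x}^\top\mathbf{A}\,\mathbf{x}
  = (\mathbf{A} + \mathbf{A}^\top)\,\mathbf{x},
\qquad
\frac{\partial}{\partial \mathbf{A}}\,\operatorname{tr}(\mathbf{A}\mathbf{B})
  = \mathbf{B}^\top
```

If you can derive the second identity from scratch and recognize when the third one (the trace trick) applies, you are in good shape for most derivations you will meet in ML texts.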

Nevertheless,  I find that people who want to learn the math first before approaching deep learning miss the point.  Many engineering topics were not motivated by pure mathematical pursuit.  More often than not, an engineering field is motivated by a physical observation, and mathematics is more an aid to imagine and create a new solution.  So it is with deep learning.  If you listen to Hinton, he often says he tries to come up with an idea first and make it work mathematically later.    His insistence on working on neural networks at the height of kernel methods stems from his observation of the brain.   "If the brain can do it, how come we can't?" is a question you should ask every day when you run a deep learning algorithm.   I think these observations are fundamental to deep learning.  And you should go through the arguments for why people thought neural networks were worthwhile in the first place.   Reading the classic papers of Hubel and Wiesel helps. Understanding the history of neural networks helps.  Once you read these materials, you will quickly grasp the big picture of much of the development of deep learning.

That said, I think there are certain topics which are fundamental in deep learning, and they are not necessarily very mathematical.  For example, I would name back propagation [2] as a very fundamental concept you want to get good at.   Now, you may think that's silly: "I know backprop already!"  Yes, backprop is probably in every single machine learning class, and that easily gives you the illusion that you have mastered the material.    But you can always learn more about a fundamental concept, and back propagation is important both theoretically and practically.  You will encounter it as a user of deep learning tools, as a writer of a deep learning framework, or as an innovator of new algorithms.  So a thorough understanding of backprop is very important, and one course is not enough.

This very long digression finally brings me to the great introductory book, Michael Nielsen's Neural Networks and Deep Learning (NNDL).    The reason I think Nielsen's book is important is that it offers an alternative discussion of back propagation as an algorithm.   So I will use the rest of this article to explain why I appreciate the book so much and why I recommend that nearly all beginning and intermediate learners of deep learning read it.

First Impression

I first learned about "Neural Networks and Deep Learning" (NNDL) while going through Tensorflow's tutorial.   My first thought was "ah, another blogger trying to cover neural networks", i.e. I didn't think it was promising.   At that time, there were already plenty of articles about deep learning.  Unfortunately, they often repeated the same topics without bringing anything new.


Don't make my mistake!  NNDL is a great introductory book which balances the theory and practice of deep neural networks.    The book has 6 chapters:

  1. Using neural networks to recognize digits - the basics of neural networks, and a basic implementation in Python (
  2. How the backpropagation algorithm works -  various explanations of back propagation
  3. Improving the way neural networks learn - standard improvements over simple back propagation, and another implementation in Python (
  4. A visual proof that neural nets can compute any function - the universal approximation theorem without the math, plus a fun game in which you approximate functions yourself
  5. Why are deep neural networks hard to train?  - practical difficulties of using back propagation, and vanishing gradients
  6. Deep Learning  - convolutional neural networks (CNN), the final implementation based on Theano (, and recent advances in deep learning (circa 2015).

The accompanying Python scripts are the gems of the book. and run in plain-old Python; you need Theano for   But I think the strength of the book really lies in and (Chapters 1 to 3), because if you want to learn CNNs, Karpathy's lectures probably give you more bang for your buck.

Why I Like Nielsen's Treatment of Back Propagation

Reading Nielsen's exposition was the sixth time I learned the basic formulation of back propagation [see footnote 3].  So what's the difference between his treatment and my other reads?

Forget about my first two reads, because back then I didn't care about neural networks enough to ask why back propagation is so named.   But my later reads pretty much gave me the same impression of neural networks: "a neural network is merely a stacking of logistic functions.    So how do you train the system?  Oh, just differentiate the loss function; the rest is technicalities."   Usually a book will guide you to verify certain formulae in the text, and of course you will be led to deduce that the "error" is actually "propagating backward" through the network.   Let us call this the network-level view.   In the network-level view, you really don't care how individual neurons operate.   All you care about is seeing the neural network as yet another machine learning algorithm.

The problem with the network-level view is that it doesn't quite explain a lot of phenomena about back propagation.  Why is it so slow sometimes?  Why do certain initialization schemes matter?  Nielsen does an incredibly good job of breaking the standard derivation down into four fundamental equations (BP1 to BP4 in Chapter 2).  Once you interpret them, you will realize "Oh, saturation is really a big problem in back propagation" and "Oh, of course you have to initialize the weights of a neural network with non-zero values, or else nothing propagates/back-propagates!"    These insights, while not deeply mathematical in nature and understandable with college calculus, give you a deeper understanding of back propagation.
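For reference, here are the four equations in Nielsen's notation, where z^l = w^l a^{l-1} + b^l is the weighted input, a^l = σ(z^l) the activation, δ^l the "error" at layer l, and ⊙ the element-wise product:

```latex
\begin{align}
\delta^L &= \nabla_a C \odot \sigma'(z^L)
  && \text{(BP1: error at the output layer)}\\
\delta^l &= \big((w^{l+1})^\top \delta^{l+1}\big) \odot \sigma'(z^l)
  && \text{(BP2: error propagated backward)}\\
\frac{\partial C}{\partial b^l_j} &= \delta^l_j
  && \text{(BP3: gradient w.r.t.\ biases)}\\
\frac{\partial C}{\partial w^l_{jk}} &= a^{l-1}_k\,\delta^l_j
  && \text{(BP4: gradient w.r.t.\ weights)}
\end{align}
```

Notice the σ'(z) factors in BP1 and BP2: when a sigmoid neuron saturates, σ'(z) ≈ 0 and the error stops flowing, which is exactly the saturation problem mentioned above. Likewise, all-zero weights make the BP2 term vanish, which is why initialization matters.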

Another valuable part of Nielsen's explanation is that it comes with an accessible implementation.  His first implementation ( is 74 lines of idiomatic Python.   By adding print statements to his code, you will quickly grasp how a lot of these daunting equations are implemented in practice.  For example, as an exercise, you can try to identify how he implements BP1 to BP4 in    It's true that there are other books and implementations of neural networks,  but the description and the implementation don't always come together.  Nielsen's presentation is a rare exception.
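To give a flavor of what such an implementation looks like, here is my own condensed sketch in the spirit of Nielsen's (not his actual code): a forward pass that caches the weighted inputs and activations, followed by BP1 to BP4 for a quadratic cost.

```python
import numpy as np

# Condensed backprop sketch for a fully connected sigmoid network with
# quadratic cost C = 0.5 * ||a - y||^2. Weights and biases are lists of
# NumPy arrays, one per layer, as in Nielsen's

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

def backprop(weights, biases, x, y):
    # Forward pass: cache weighted inputs z^l and activations a^l.
    a, activations, zs = x, [x], []
    for w, b in zip(weights, biases):
        z = w @ a + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    # BP1: output-layer error (gradient of the quadratic cost is a - y).
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grads_w = [None] * len(weights)
    grads_b = [None] * len(biases)
    # BP3 and BP4 at the output layer.
    grads_b[-1] = delta
    grads_w[-1] = delta @ activations[-2].T
    # BP2: push the error backward, applying BP3/BP4 at each layer.
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        grads_b[-l] = delta
        grads_w[-l] = delta @ activations[-l - 1].T
    return grads_w, grads_b
```

Adding a print of `delta` inside the loop shows the "error propagating backward" quite literally, which is exactly the exercise Nielsen's text invites you to do.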

Other Small Things I Like

  • Nielsen correctly points out that the Del symbol in machine learning is more a convenient device than the Del operator in its usual mathematical sense.
  • In Chapter 4,  Nielsen mentions the universal approximation property of neural networks.  Unlike standard textbooks, which point you to a bunch of papers with daunting math, Nielsen created a JavaScript demo that lets you approximate functions yourself (!), which I think is a great way to learn the intuition behind the theorem.
  • He points out that it's important to distinguish the activation from the weighted input.  In fact,  this is one thing that can confuse you when reading a derivation of back propagation, because different textbooks use different symbols for the activation and the weighted input.

There are many insightful comments like these in the book; I encourage you to read it and discover them.

Things I don't like

  • There are many exercises in the book.  Unfortunately, there are no answer keys.  In a way, this makes Nielsen an old-style author who encourages readers to think.   I don't always like this, because dwelling on a single problem forever doesn't always give you better understanding.
  • Chapter 6 gives the final implementation in Theano.  Unfortunately, there is not much introductory material on Theano within the book.    I think this is annoying but forgivable; as Nielsen points out, it's hard to introduce both Theano and deep learning in one introductory book.  Anyone interested in Theano should probably go through the standard Theano tutorials.


All in all,  I highly recommend Neural Networks and Deep Learning to any beginning or intermediate learner of deep learning.  If this is the first time you are learning back propagation,  NNDL is a great general introductory book.   If you are like me and already know a thing or two about neural networks, NNDL still has a lot to offer.


[1] In my view, PRML's problem sets have 3 ratings: 1-star, 2-star and 3-star.  1-star usually requires college-level calculus and patient manipulation; 2-star requires some creative thought in problem solving or knowledge beyond basic calculus; 3-star questions are more long-form and can contain multiple 2-star questions in one.   For your reference, I have solved around 100 of the 412 questions, most of them 1-star.

[2] The other important concept in my mind is gradient descent, and it is still an active research topic.

[3] The five reads before: I "learnt" it once back in HKUST, read it in Mitchell's book, read it in Duda and Hart, learned it again from Ng's lecture, and read it once more in PRML.  My seventh will be Karpathy's lecture; he presents the material in yet another way, so it's worth your time to look at all of them.


Facebook Artificial Intelligence/Deep Learning Group @ 1000 Members

I (Arthur) still remember comp.speech and comp.speech.research, where I crossed paths with many great developers and researchers.   Another fond memory of mine related to discussion forums is CMU Sphinx, a large-vocabulary speech recognizer; many of its users later became very advanced and spawned numerous projects.   You always learn something new from people around the world.  That is why the Internet is really, really great.

Fast-forward to now: searching for a solid discussion forum for deep learning is hard.   Many of them, on Facebook or LinkedIn, are really spammy.  I tried Plus for a while, but for the most part no one dug my messages. (My writing style? 🙂 )  So when Waikit Lau, an old friend and veteran startup investor/mentor/helper, asked me to help admin the group, I was more than happy to oblige.

Yes, you heard it right: the Artificial Intelligence & Deep Learning group is a curated discussion forum.  We reject spammers and ads, and only blog posts that are relevant to us are allowed.

Alright, everyone does it, so I might as well:

(Just kidding, we are not really chasing for a bigger group, but more quality discussion.)

So come join us.  We are very happy to chat with you about deep learning.

Arthur and Waikit

You might also like Learning Machine Learning,  Some Personal Experience and Learning Deep Learning, My Top-5 List.


Learning Deep Learning - My Top-Five List

Many people have been nagging me to write a beginner's guide on deep learning.    Geez, that's a difficult task - there are so many tutorials, books and lectures to start with, and the best way to start depends highly on your background, knowledge and skill set.  So it's very hard to give a simple guideline.

In this post, I will do something less ambitious: I gather what I think are the top-5 most important resources for starting to learn deep learning.   Check out the "Philosophy" section on why this list is different from other lists you have seen.


There are many lists of deep learning resources.  To name a few: the "Awesome" list and the Reddit machine learning FAQ. I think they are quality resources, so it's fair to ask why I started "Top-Five" around a year ago.

Unlike all the deep learning resource lists you have seen, "Top-Five" is not meant to be exhaustive.  Rather, it assumes you have only a limited amount of time to study and gather resources while learning deep learning.    For example, suppose you like to learn through online classes.  Each machine/deep learning class would likely take you 3 months to finish, so it would take more than a year to finish all the classes.   As a result, having priorities is good.   For instance, without any guidance, reading Goodfellow's Deep Learning would confuse you; a book such as Bishop's Pattern Recognition and Machine Learning (PRML) would likely be a better "introductory book".

Another difference between the Top-Five list and other resource lists is that the resources are curated. Unless specified otherwise, I have finished the material myself: for classes I have at least audited the whole lecture series once, and for books I have at least browsed through once. In a way,  this is more an "Arthur's list" than a pile of disorganized links.  You also get a short commentary on why (IMO) each resource is useful.

Which Top-Five?

As the number of sections in my list grows, it's fair to ask which resources you should spend time on first.   That's a tough question, because humans differ in how they prefer to learn.  My suggestion is to start with the following:

  1. Taking classes - by far the most effective way to learn, I think.  Listening plus doing homework usually teaches you a lot.
  2. Book reading - this is important because lectures usually only summarize a subject.   Only when you read through a subject do you start to gain a deeper understanding.
  3. Playing with frameworks - this allows you to actually create some deep learning applications and turn some of your knowledge into real-life results.
  4. Blog reading - this is useful, but you had better know which blogs to read (look at the section "Blogs You Want To Read").  In general, there are just too many blog writers these days, and many have only a murky understanding of the topic.   Reading those will only make you more confused.
  5. Joining forums and asking questions - this is where you can dish out some of your ideas and ask for comments.  Once again, the quality of the forum matters, so take a look at the section "Facebook Forums".


Basic Deep Learning (Also check out "The Basic-Five")

These are the must-take courses if you want to learn the basic jargon of deep learning.   Ng's, Karpathy's and Socher's classes teach you basic concepts, with a theme of building applications.   Silver's class links deep learning concepts with reinforcement learning. So after these four classes, you should be able to talk about deep learning well and work on some basic applications.

My only note is on Hinton's class: while it is very useful, it is also fairly demanding.  Another general class, such as Hugo Larochelle's Neural Networks, could be a more appropriate starting point.

  1. Andrew Ng's Coursera Machine Learning class: you need to walk before you run.   Ng's class is the best beginner class on machine learning, in my opinion.  Check out this page for my review.
  2. Fei-Fei Li and Andrej Karpathy's Computer Vision class (Stanford cs231n 2015/2016):  I listened through the lectures once.  Many people just call this Karpathy's class, but it is also co-taught by another experienced graduate student, Justin Johnson.  For the most part this is the class for learning CNNs, and it also brings you to the latest technology in more difficult topics such as image localization, detection and segmentation.
  3. Richard Socher's Deep Learning and Natural Language Processing (Stanford cs224d): another class I haven't had a chance to fully go through, but the first few lectures were very useful for me when I tried to understand RNNs and LSTMs.   This might also be the best set of lectures for learning Socher's recursive neural networks. Compared to Karpathy's class, Socher's places more emphasis on mathematical derivation.  So if you are not familiar with matrix differentiation, this would be a good class to start with and get your hands wet.
  4. David Silver's Reinforcement Learning: a great class taught by the main programmer of AlphaGo.  It starts from the basics of reinforcement learning, such as DP-based methods, then proceeds to more difficult topics such as Monte-Carlo and TD methods, as well as function approximation and policy gradients.   It takes quite a bit of effort to understand even if you already have a background in supervised learning.   As RL appears in more and more applications, this class should be a must-take for all of you.
  5. Hinton's Neural Networks:  while the topics are advanced, Prof. Hinton's class is probably the one which can teach you the most about the philosophical difference between deep learning and general machine learning.  When I first audited the class in October 2016, his explanations of models based on statistical mechanics blew my mind.   I finished the course around April 2017, which resulted in a popular review post. Unfortunately, due to the difficulty of the class, it is ranked lower on this list.  (It was ranked 2nd, then 4th, but I found that it requires deeper understanding than Karpathy's, Socher's and Silver's, so I moved it lower.)

You should also consider:

  • Hugo Larochelle's Neural Networks class: by another star-level innovator in the field.  I have only heard Larochelle lecture once in a deep learning class, but he is more succinct and to the point than many.
  • Daphne Koller's Probabilistic Graphical Models: if you want to understand tougher concepts in models such as DBNs, you want some background in Bayesian networks as well.  If that's the route you like, Koller's class is for you.  But this class, just like Hinton's NNML, is notoriously difficult and not for the faint of heart.
  • MIT Self-Driving 6.S094: see the description in the Reinforcement Learning section.
  • Nando de Freitas's class on Machine/Deep Learning: I haven't had a chance to go through this one, but it serves both beginners and more advanced learners.  It covers topics such as reinforcement learning and siamese networks.    I also think this is the class to take if you want to use Torch as your deep learning language.
Reinforcement Learning

Reinforcement learning has a deep history of its own; you can think of it as having heritage from both computer science and electrical engineering.

My understanding of RL is fairly shallow, so I can only tell you which classes are easier to take; all of these classes are fairly advanced. Georgia Tech CS8803 should probably be your first. Silver's is fun, and it's based on Sutton's book, but be ready to read the book in order to finish some of the exercises.

  1. Udacity's Reinforcement Learning: this class is jointly published with Georgia Tech, and you can take it as the advanced course CS8803.  I took Silver's class first, but I found that this class provides a non-deep-learning take, which is quite refreshing if you are starting out in reinforcement learning.
  2. David Silver's Reinforcement Learning: see the description in the "Basic Deep Learning" section.
  3. MIT Self-Driving 6.S094: a specialized class on self-driving.  The course is mostly computer vision, but there is one super-entertaining exercise on self-driving which you most likely want to solve with RL. (Here is some quick impression of the class.)

You should also consider:

I have heard good things about these:
  • Oxford Deep NLP 2017: this is perhaps the second full class on deep learning for NLP. I find the material interesting because it covers topics that weren't covered in Socher's class.  I haven't taken it yet, so I will comment later.
  • NYU's Deep Learning class from 2014: by Prof. Yann LeCun.  To me this is an important class, of similar importance to Prof. Hinton's, mostly because Prof. LeCun is one of the earliest experimenters with backprop and SGD.  Unfortunately, the NYU lectures were removed, but do check out the slides.
  • Also from Prof. Yann LeCun: his Deep Learning inaugural lectures.
  • Berkeley's Seminar on Deep Learning: by Prof. Ruslan Salakhutdinov, an early researcher in unsupervised learning.
  • University of Amsterdam's Deep Learning: if you have already audited cs231n and cs224d, perhaps the material here is not too new, but I found it useful to have a second source when looking at some of the material.   I also like its presentation of back-propagation, which is more mathematical than in most beginner classes.
  • Special Topics in Deep Learning: a great resource if you want to drill into more esoteric topics in deep learning.
  • Deep Learning for Speech and Language: more for my own curiosity about speech recognition. This course is perhaps the only one I can find on DL for ASR.   If you happen to stumble on this paragraph, I'd say most speech software you find online is not really applicable in real life; the only exceptions are discussed in this very old article of mine.
For reference
More AI than Machine Learning (Unsorted)
More about the Brain:

I don't have much, but you can take a look at my other list on Neuroscience MOOCs.


Books

I wrote quite a bit on the Recommended Books page.   In a nutshell, I found that classics such as PRML and Duda and Hart are still must-reads in the world of deep learning.   But if you still want a list, alright then......

  1. Michael Nielsen's Neural Networks and Deep Learning: or NNDL, highly recommended by many.  This book is very suitable for beginners who want to understand the basic insights of simple feed-forward networks and their setups.    Unlike most textbooks, it gives you intuition before going through the math.   While I only went through it recently, I highly recommend all of you read it.  Also see my read on the book.
  2. PRML: I love PRML!  Do read my Recommended Books page to find out why.
  3. Duda and Hart:  I don't like it as much as PRML, but it was my first machine learning bible.  Again, go to my Recommended Books page to find out why.
  4. The Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville:  this is the book on deep learning, but it's hardly for beginners.   I recently browsed through the book; here are some quick impressions.
  5. Natural Language Understanding with Distributed Representation by Kyunghyun Cho: this is mainly for NLP people, but it's important to note how differently NLP is seen from a deep learning point of view.

Others: check out my Recommended Books page.  For beginners, I found Mitchell's and Domingos's books quite interesting.


  1. Tensorflow : most popular, and could be daunting to install, also check out TFLearn.  Keras became the de-facto high-level layer lately.
  2. Torch :  very easy to use even if you don't know Lua.   It also leads you to great tutorials.  Also check out PyTorch.
  3. Theano : grandfather of deep learning frameworks, also check out Lasagne.
  4. Caffe : probably the fastest among the generic frameworks.  It may take you a while to understand the setup/syntax.
  5. Neon : the very speedy Neon is optimized for modern cards. I don't have a benchmark between Caffe and Neon yet, but its MNIST training feels very fast.


  • deeplearning4j: obviously in Java, but I heard it has great support for enterprise machine learning.


  1. Theano Tutorial:  a great set of tutorials, and you can run them on a CPU.
  2. Tensorflow Tutorial : a very comprehensive set of tutorials.  I don't like it as much as Theano's because some tasks require compilation, which could be fairly painful.
  3. char-rnn:  not exactly a tutorial, but if you want to have fun with deep learning, you should train at least one char-rnn.   Note that a word-based version is available.  The package is also available in an optimized form as torch-rnn.  I think char-rnn is also a great starting codebase for intermediate learners to learn Torch.
  4. Misc: generally running the examples of a package can teach you a lot.  Let's say this is one item.

Others: I also found the Learning Guide from YerevaNN's lab to be fairly impressive.  It has ranked resource lists on several different topics, which is similar in spirit to my list.

Mailing Lists

  1. (Shameless Plug) AIDL Weekly  Curated by me and Waikit Lau, AIDL Weekly is a tied-in newsletter of the AIDL Facebook group. We provide in-depth analysis of the week's events in AI and deep learning.
  2. Mapping Babel Curated by Jack Clark.  I found it entertaining and well-curated.  Clark is more in the journalism space and I found his commentary thoughtful.
  3. Data Machina This is a links-only newsletter, but the links are high quality.

Of course, there are more newsletters than these three, but I don't normally recommend them.   One reason is that many "curators" don't always read the original sources before they share the links, which sometimes inadvertently spreads fake news to the public.   In Issue #4 of AIDL Weekly, I described one such incident.  So you are warned!

Facebook Forums

That's another category I am going to plug shamelessly.  It has to do with the fact that most Facebook forums have too much noise and administrators pay too little attention to their groups.

  1. (Shameless Plug) AIDL This is a forum curated by me and Waikit.  We like our forum because we actively curate it, delete spam and facilitate discussion within the group.  As a result it has become one of the most active groups, with 10k+ members.  As of this writing, we have a tied-in mailing list as well as a weekly show.
  2. Deep Learning  Deep Learning is comparable in size to AIDL, but less active, perhaps because the administrators post in Korean.  I still find some of the links interesting, and I used the group a lot before administering AIDL.
  3. Deep Learning/AI Curated by Sid Dharth and Ish Girwan.  DLAI follows a very similar philosophy, and Sid controls posting tightly.  I think his group will be one of the up-and-coming groups next year.
  4. Strong Artificial Intelligence  This is less about deep learning and more about AI.   It is perhaps the biggest FB group on AI; its membership has stabilized, but posting is solid and there is still some life in the discussion. I like the more philosophical end of the posts, which AIDL usually refrains from.

Non-trivial Mathematics You should Know

Due to popular demand, in this section I will say a bit about the most relevant Math you need to know.   Everyone knows that Math is useful, and yes, stuff like Calculus, Linear Algebra, Probability and Statistics is super useful too.  But I think those subjects are too general, so I will name several specific topics which turn out to be very useful, but are not very well taught in school.

  1. Bayes' Theorem:  Bayes' theorem is important not only as a simple rule which you will use all the time.   The high school version usually just asks you to reverse conditional probabilities. But once it is applied in reasoning, you will need to be very clear about how to interpret terms such as likelihood and prior. It's also very important to know what the term Bayesian really means, and why people see it as better than frequentist.   If you don't know Bayes' rule, all this thinking is going to get very confusing.
  2. Properties of the Multi-variate Gaussian Distribution:  The one-dimensional Gaussian distribution is an interesting mathematical quantity.  If you try to integrate it, you will quickly find it is one of those integrals that cannot be done in a trivial way.   That's the point where you want to learn the probability integral and how it is evaluated.   Of course, once you need to work with the multi-variate Gaussian, you will need to learn further properties such as diagonalizing the covariance matrix and all that jazz.   That is non-trivial Math.   But if you master it, it will help you work through the more difficult problems in PRML.
  3. Matrix differentiation : You can differentiate all right, but once it comes to vectors and matrices, even the notation seems different from your college Calculus.  No doubt, matrix differentiation is seldom taught in school.   So always refer to a useful guide such as the Matrix Cookbook, and you will be less confused.
  4. Calculus of Variations: If you want to find the value which optimizes a function, you use Calculus; if you want to find the function/path which optimizes a functional, you use Calculus of Variations. For the most part, the Euler-Lagrange equation is what you need.
  5. Information theory:  information theory is widely used in machine learning.  More importantly, its style of reasoning can be found everywhere.  e.g. Why do you want to optimize cross-entropy instead of square error?  Not only does square error over-penalize incorrect outputs; you can also think of cross-entropy as learning from the surprise of a mistake.

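To make items 1, 2, 4 and 5 above concrete, here are the standard formulas in question, written in generic notation (my choice of symbols, not from any particular textbook):

```latex
% Bayes' theorem: posterior = likelihood x prior / evidence
P(\theta \mid x) = \frac{P(x \mid \theta)\,P(\theta)}{P(x)}

% The probability (Gaussian) integral, which has no elementary antiderivative
\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}

% Euler-Lagrange equation for a functional J[y] = \int L(x, y, y')\,dx
\frac{\partial L}{\partial y} - \frac{d}{dx}\frac{\partial L}{\partial y'} = 0

% Cross-entropy between a target distribution p and a predicted distribution q
H(p, q) = -\sum_i p_i \log q_i
```

If these four look familiar and you can derive or apply each one, you are in good shape for most deep learning material.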
Blogs You Should Read

  1. Chris Olah's Blog  Olah has a great capability to express very difficult mathematical concepts to a lay audience.   I greatly benefited from his articles on LSTM and computational graphs.   He also made me understand that learning topology is fun and profitable.
  2. Andrej Karpathy's Blog  If you haven't read "The Unreasonable Effectiveness of Recurrent Neural Networks", you should.   Karpathy's articles show both great enthusiasm for the topic and a very good grasp of the principles.    I also like his article on reinforcement learning.
  3. WildML Written by Denny Britz, who is perhaps less well-known than either Olah or Karpathy, but he enunciates many topics well. For example, I enjoyed his explanation of GRU/LSTM a lot.
  4. Tombone's Computer Vision Blog Written by Tomasz Malisiewicz.  This was one of the first blogs I read about computer vision. Malisiewicz has great insight into machine learning algorithms and computer vision.   Many of his articles give insightful comments on the relationship between ML techniques.
  5. The Spectator written by Shakir Mohamed.  This is my go-to page on mathematical statistics as well as the theoretical basis of deep learning techniques.  Check out his thoughts on what makes an ML technique deep, as well as his tricks in machine learning.

That's it for now. Check back on this page, as I might update it with more content. Arthur

This post is first published at

You might also like Learning Machine Learning,  Some Personal Experience.

If you like this message, subscribe to the Grand Janitor Blog's RSS feed.  You can also find me on Twitter, LinkedIn and Plus.  Together with Waikit Lau, I maintain the Deep Learning Facebook forum.  Also check out my awesome employer: Voci.


(20160817): I changed the title a couple of times, because this is more like a top-5 list of lists. So I retitled the post "top-five resource", then "top-five", and now I've settled on "top-five list", which is a misnomer but close enough.

(20160817): Fixed a couple of typos/wording issues.

(20160824): Added a section on important Math to learn.

(20160826): Fixed Typos, etc.

(20160904): Fixed Typos

(20161002): Changed the section on books to link to my article on NNDL.   Added a section on must-follow blogs.

(20170128): As I went deeper into Socher's lectures, I boosted his class ranking to number 3.  I also made Karpathy's lecture rank number 2. I think Silver's class is important, but the material is too advanced and perhaps less important for deep learning learners.  (It is more about reinforcement learning when you look at it closely.)  Hinton's class is absolutely crucial, but it requires more mathematical understanding than Karpathy's class.  Thus the ranking.

I also added 2 more classes (NYU, MIT) to check out and 2 more as references (VTech and UA).

(20161207): Added descriptions of Li, Karpathy and Johnson's class.   Added a description of Silver's class.

(20170310): Added "Philosophy", "Top-Five of Top-Five", "Top-Five Mailing List" and "Top-Five Forums".  Adjusted the description of Socher's class, and linked a quick impression of Goodfellow's "Deep Learning".

(20170312): Added the Oxford NLP class and Berkeley's Deep RL into the mix.

(20170319): Added Udacity's course into the mix.  I think the next version might have a separate section on reinforcement learning.

(20170326): I did another rewrite over the last two weeks, mainly because many new lectures were released during Spring 2017. Here is a summary:

  •  I separated the "Courses/Lectures" session into two tracks: "Basic Deep Learning" and "Reinforcement Learning". It's more a decluttering of links. I also believe reinforcement learning should be a separate track because it requires more specialized algorithms.
  • On the "Basic Deep Learning" track, the ranking has changed. It was Ng's, cs231n, cs224d, Hinton's, Silver's; now it is Ng's, cs231n, cs224d, Silver's, Hinton's. As I went deeper into Hinton's class, I found that it has more difficult concepts. Both Silver's and Hinton's classes are more difficult than the first 3, IMO.
  • I also give a basic description of the U. of Amsterdam's class. I don't know much about it yet, but it's refreshing because it gives a different presentation from the "Basic 5" I recommend.

(20170412): I finished Hinton's NNML, and added Berkeley CS294-131 into the mix.

(20170620): Linked up the "Top-5" List with the "Basic 5".  Added a list of AI, and added a link to my MOOC list.
