Learning Deep Learning: The "Basic Five" - Five Beginner Classes on Deep Learning

By onojeghuo from Unsplash, CC0

I have been self-learning deep learning for a while: informally since 2013, when I first read Hinton's "Deep Neural Networks for Acoustic Modeling in Speech Recognition" and played with Theano, and more "formally" through various classes since Summer 2015, when I was freshly promoted to Principal Speech Architect [5].   It's no exaggeration to say deep learning changed my life and career.   I have been more active than in my previous life.  For example, if you are reading this, you were probably directed here from the very popular Facebook group, AIDL, which I administer.

I wrote this article just after finishing an older version of Richard Socher's cs224d online [1].  That class, together with Ng's, Hinton's, Li and Karpathy's, and Silver's, makes up the five classes I recommended in my now widely circulated "Learning Deep Learning - My Top-Five List".    I think it's fair to give this set of classes a name - the Basic Five - because, in my opinion, they are the first five classes you should go through when you start learning deep learning.

In this post I will say a few words on why I chose these five classes. Compared to more established bloggers such as Karpathy, Olah or Denny Britz, I am more of a learner in this space [2], experienced perhaps, yet still a learner.  So this article, like my others, stresses learning.  What can you learn from these classes? And, less talked about but just as important: what are the limitations of learning online?   As a learner, I think these are interesting discussions, so here you go.

What are the Five?

Just to be clear, here are the classes I'd recommend:

  1. Andrew Ng's Coursera Machine Learning - my review,
  2. Fei-Fei Li and Andrej Karpathy's Convolutional Neural Networks for Visual Recognition or Stanford cs231n 2015/2016,
  3. Richard Socher's Deep Learning for Natural Language Processing or Stanford cs224d,
  4. David Silver's Reinforcement Learning,
  5. Hinton's Neural Networks for Machine Learning - my review.

The ranking is the same as in my Top-Five List.  Out of the five, four have official video playlists published online for free [6]. For a small fee, you can finish Ng's and Hinton's classes with certification.

How Much I Actually Went Through the Basic Five

Many beginner articles come with a gigantic set of links.   The authors apparently expect you to click through all of them (and learn from them?). When you scrutinize such a list, it can amount to more than 100 hours of video watching, and perhaps up to 200 hours of work.  I don't know about you, but I would doubt whether the authors really went through the list themselves.

So it's only fair for me to first tell you what I've actually done with the Basic Five as of this first writing (May 13, 2017):

  • Ng's "Machine Learning": Finished the class in its entirety, without certification.
  • Li and Karpathy's "Convolutional Neural Networks for Visual Recognition" (cs231n): Listened through the class lectures about 1.5 times; haven't done any of the homework.
  • Socher's "Deep Learning for Natural Language Processing" (cs224d): Listened through the class lectures once; haven't done any of the homework.
  • Silver's "Reinforcement Learning": Listened through the class lectures 1.5 times; only worked out a few starter problems from Denny Britz's companion exercises.
  • Hinton's "Neural Networks for Machine Learning": Finished the class in its entirety, with certification; listened through the class about 2.5 times.

This list is likely to be updated as I go deeper into certain classes, but it should tell you the limitations of my reviews.  For example,  while I have watched all the class videos, I have finished the homework only for Ng's and Hinton's classes.   That means my understanding of two of the three "Stanford Trinity" classes [3] is weaker, and my understanding of reinforcement learning is not as solid.   On the other hand, together with my work at Voci, Hinton's class gives me stronger insight than the average commenter on topics such as unsupervised learning.

Why The Basic Five? And Three Millennial Machine Learning Problems

Taking classes is for learning, of course.  The five classes certainly give you the basics, which is exactly what you want if you love learning the fundamentals of deep learning. And take a look at footnote [7]: the five are not the only classes I sat through in the last 1.5 years, so their choice is not arbitrary.  So, oh yeah, those are the things you want to learn. Got it? That's my criterion. 🙂

But that's what a thousand other bloggers would tell you as well. I want to give you a more interesting reason.  Here you go:

Go back in time to the year 2000.  That was when Google had just launched its search engine, there was no long series of Google products, and there was certainly no ImageNet. What were the most difficult problems for machine learning?   I think you would see three of them:

  1. Object classification,
  2. Statistical machine translation,
  3. Speech recognition.

So what's so special about these three problems?  If you think about it, back in 2000 all three were known to be hard problems.  They represent three seemingly different data structures (sketched in code right after the list):

  1. Object classification - a 2-dimensional, dense array of data,
  2. Statistical machine translation (SMT) - discrete symbols, seemingly related by loose rules humans call grammars and translation rules,
  3. Automatic speech recognition (ASR) - a 1-dimensional time series, which has similarities to object classification (through the spectrogram) and is also loosely bound by rules such as the dictionary and word grammar.
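To make the contrast concrete, here is a rough sketch of what the three kinds of data typically look like as arrays. It is entirely illustrative: the shapes, the toy vocabulary and the crude "spectrogram" are my own assumptions, not taken from any of the classes.

```python
import numpy as np

# Object classification: a dense 2-D (plus channels) array, e.g. a 224x224 RGB image.
image = np.random.rand(224, 224, 3)

# Statistical machine translation: a sequence of discrete symbols (word ids from a vocabulary).
vocab = {"<s>": 0, "the": 1, "cat": 2, "sat": 3, "</s>": 4}
sentence = np.array([vocab["<s>"], vocab["the"], vocab["cat"], vocab["sat"], vocab["</s>"]])

# Automatic speech recognition: a 1-D time series (e.g. one second of 16 kHz audio),
# often converted into a spectrogram, which looks image-like (time x frequency).
waveform = np.random.randn(16000)
spectrogram = np.abs(np.fft.rfft(waveform.reshape(100, 160), axis=1))  # crude 100-frame "spectrogram"

print(image.shape, sentence.shape, waveform.shape, spectrogram.shape)
```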

You would also recall that all three problems drew interest from governments, big institutions such as the Big Four, and startup companies.  If you mastered one of them, you could make a living. Moreover, once you learned them well, you could transfer the knowledge to other problems.  For example, handwritten character recognition (HWR) resembles ASR, and conversational agents work similarly to SMT.  That is because the three problems are great metaphors for many other machine learning problems.

Now, okay, let me tell you one more thing: even today, people still make (or try to make) a living by solving these three problems, because I never said they were solved.  For instance, what if we increase the number of classes from 1,000 to 5,000?  What if, instead of Switchboard, we work on conference speech or speech from YouTube? What if I ask you to translate so well that even a human cannot distinguish your output from a human translation?  That should convince you: "Ah, if there is one method that could solve all three of these problems, learning that method would be a great idea!"

And as you can guess, deep learning is the one method that has revolutionized all three fields [4].  Now that's why you want to take the Basic Five.  The Basic Five is not meant to make you a top researcher in deep learning; rather, it teaches you just the basics.   And at this point of your learning, knowing a powerful template for solving problems is important.  You will also find that going through the Basic Five lets you read the majority of the deep learning literature these days.

So here's why I chose the Five: Ng's class and NNML give you the essential basics of deep learning.   Li and Karpathy's teaches you object classification up to the state of the art.  Socher's teaches you where deep learning stands in NLP; it forays a little into SMT and ASR, but gives you enough to start.

My explanation so far excludes Silver's reinforcement learning class, which is admittedly the odd one out of the herd.   I added Silver's class because RL is increasingly used even in traditionally supervised learning tasks. And of course, to know the place of RL, you need a solid understanding of it.  Silver's class is perfect for that purpose.

What You Actually Learn

In a way, this also reflects what's really important when learning deep learning.  So I will list eight points here, because they are repeated across the different courses.

  1. Basics of machine learning:  this comes mostly from Ng's class, but themes such as bias-variance are repeated in NNML and Silver's class.
  2. Gradient descent: its variants (e.g. Adam), its alternatives (e.g. second-order methods); it's a never-ending study (see the short sketch after this list).
  3. Backpropagation: how should you view it? As optimizing a function, as a computational graph, as the flow of gradients?  Different classes give you different points of view, so don't skip it even if you have learned it once.
  4. Architecture: the big three families are DNN, CNN and RNN; why some of them emerged and re-emerged in history; the details of how they are trained and structured.  None of the courses teaches you everything, but going through the five will teach you enough to survive.
  5. Image-specific techniques: not just classification, but localization/detection/segmentation (as in cs231n 2016 L8, L13). Not just convolution, but "deconvolution" and why we don't like that it is called "deconvolution". 🙂
  6. NLP-specific techniques: word2vec, GloVe, and how they are applied to NLP problems such as sentiment classification.
  7. (Advanced) Basics of unsupervised learning: mainly from Hinton's class, and mainly techniques from about five years ago such as RBM, DBN, DBM and autoencoders, but they are the basics if you want to learn more advanced ideas such as GANs.
  8. (Advanced) Basics of reinforcement learning: mainly from Silver's class, from DP-based methods to Monte Carlo and TD.
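To make points 2 and 3 a bit more concrete, here is a minimal sketch of my own (not taken from any of the courses) of vanilla gradient descent with a hand-derived backprop step for logistic regression; the data and the learning rate are made up for the example.

```python
import numpy as np

# Toy data: 100 points, 3 features, binary labels (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

w = np.zeros(3)
lr = 0.1

for step in range(500):
    z = X @ w                      # forward pass
    p = 1.0 / (1.0 + np.exp(-z))   # sigmoid
    # Backprop for the cross-entropy loss: dL/dz = p - y, chain rule gives dL/dw.
    grad = X.T @ (p - y) / len(y)
    w -= lr * grad                 # vanilla gradient descent update

print("learned weights:", w)
```

Adam and the other variants mentioned above only change how `grad` is turned into an update; the forward/backward structure stays the same.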

The Limitation of Autodidacts

By the time you finish the Basic Five, if you genuinely learn something from them, recruiters will start to knock on your door. What you think and write about deep learning will appeal to many people.   Perhaps you start to answer questions on forums? Or you might even write LinkedIn articles that get many Likes.

All good, but be cautious! During my year of administering AIDL, I've seen many people who purportedly took many deep learning classes, but after a few minutes of discussion I could point out holes in their understanding.    Some, after a little probing, turned out to have taken only one class in its entirety, so they don't really grok deeper concepts such as backpropagation.   In other words, they could still improve, but they just refuse to.   No wonder: with the hype around deep learning, many smart fellows choose to start a company or start coding without really taking the time to grok the concepts well.

That's a pity.  And all of us should be aware that self-learning is limited.  If you take a formal education path, like going to grad school, most of the time you will sit with people who are as smart as you and are willing to point out your issues daily.   So any weaknesses you have will be revealed sooner.

You should also be aware that while deep learning is being hyped, the holes in your understanding are unlikely to be uncovered.  That has little to do with whether you hold a job: many companies just want to hire someone to work on a task and expect you to learn while working.

So what should you do then?  My first piece of advice is to be humble and be aware of the Dunning-Kruger effect.  Self-learning usually gives people the intoxicating feeling that they have learned a lot.  But learning a lot doesn't mean you know everything.  There are always higher mountains; you are doing yourself a disservice if you stop learning.

The second thought is that you should try out your skills.  For example, it's one thing to know about CNNs; it's another to run training on ImageNet data.   If you are smart, the former takes a day.  The latter takes much planning, a powerful machine, and real effort to get even AlexNet trained.

My final advice is to talk with people and understand your own limitations.  For example, after reading many posts on AIDL, I noticed that while many people understand object classification well enough, they don't really grasp the basics of object localization/detection.  In fact, neither did I after my first pass through the videos.   So what did I do?
I just went through the videos on localization/detection again and again until I understood [8].

After the Basic Five.......

Some of you will ask, "What's next?" Yes, you finished all these classes, and it may feel as if there is nothing left to learn. Shake that feeling off!  There are tons of things you still want to learn.  So here are several directions you can go:

  • Completionist: As of this first writing, I still haven't done all the homework for all five classes. Doing the homework really helps your understanding, so if you are like me, I suggest you go back to the homework and test yourself.
  • Intermediate Five:  You have just learned the basics, so it's time to move to the next level.   I don't have a concrete list of the next five classes yet, but for now I would go with Koller's Probabilistic Graphical Models, Columbia's EdX CSMM 102x, Berkeley's Deep Reinforcement Learning, Udacity's Reinforcement Learning, and finally Oxford Deep NLP 2017.
  • Drilling the Basics of Machine Learning: This goes in another direction - work on your fundamentals.  For that, you can study Math topics forever.  I would say the more important and non-trivial parts are perhaps Linear Algebra, Matrix Differentiation and Topology.  Also check out this very good link on how to learn college-level Math.
  • Specialize in one field: If you want to master just one of the Three Millennial Machine Learning Problems I mentioned, it's important to keep looking at specialized classes in computer vision or NLP.   Since I don't want to clutter this point, I will discuss the relevant classes/material in future articles.
  • Writing:  That's what many of you have been doing, and I think it furthers your understanding.  One thing I would suggest is to always write something new, something you would want to read yourself.  For example, there are already too many blog posts on Computer Vision Using TensorFlow in the world.  So why not write one about what people don't know?  For example, practical transfer learning for object detection; or what "deconvolution" actually is; or a literature review of a non-trivial architecture such as Mask R-CNN, compared with existing encoder-decoder structures.  Writing this kind of article takes more time, but remember: quality trumps quantity.
  • Coding/GitHubbing: There is a lot of room for re-implementing ideas from papers and open-sourcing them.  It is also a very useful skill, as many companies need it to reproduce trendy deep learning techniques.
  • Research:  If you genuinely understand deep learning, you will see that many techniques need refinement.  Indeed, there are currently plenty of opportunities to come up with better techniques.   Of course, writing papers at the level of a professional researcher is tough and out of my scope.  But only when you can publish will people respect you as part of the community.
  • Framework: Hacking a framework at the C/C++ level is not for the faint of heart.  But if you are my type and love low-level coding, trying to come up with a framework yourself could be a great way to learn more.  For example, check out Darknet, which is written, surprisingly, in C!

Conclusion

So here you go: the complete Basic Five - what they are, why they are basic, and where you can go from here.   In a way, it's also a summary of what I have learned so far from various classes since June 2015.   As with my other posts, if I learn more in the future, I will keep this post updated.  I hope this post keeps you learning deep learning.

Arthur Chan

Footnote:
[1] Before 2017, there was no coherent set of Socher's lectures available online; sadly, there was also no legitimate version.  So the version I refer to is a mixture of the 2015 and 2016 classes.   Of course, you can now find a legitimate 2017 version of cs224n on YouTube.

[2] My genuine expertise is speech recognition; unfortunately, that's not a topic I can share much about due to IP issues.

[3] "Stanford Trinity" is a term I learned from the AI Playbook List from Andreseen Howoritz's list.

[4] Some of you (e.g. from AIDL) will jump up and say, "No way! I thought NLP wasn't solved by deep learning yet!" That's because you are a lost soul, misinformed by misinformed blog posts.  ASR was the first field tackled by deep learning, dating back to around 2010, and most SMT systems you see now are seq2seq based.

[5] I have been in the business of speech recognition since 1998, when I worked on a voice-activated project for my undergraduate degree back at HKUST.  It was a mess, but that's how I started.

[6] As for the last one, you can always search for it on YouTube.  Of course, it is not legit for me to share it here.

[7] I also audit,

I also took,

[8]  It's still a subject that *I* could explore further.  For example, just the logistics seem hard enough to set up.

* * *
If you like this post, subscribe to the Grand Janitor Blog's RSS feed.  You can also find me on Twitter, LinkedIn, Plus, and Clarity.fm.  Together with Waikit Lau, I maintain the Deep Learning Facebook forum.  Also check out my awesome employer: Voci.

* * *

History:

20170513: First version finished

20170514: Fixed many typos.  Rewrite/add some paragraphs.  Ready to publish.

-------------

If you like this post, you might also like:

Learning Deep Learning - My Top-Five List

A Review on Hinton's Coursera "Neural Networks and Machine Learning"

For the Not-So-Uninitiated: Review of Ng's Coursera Machine Learning Class

Learning Machine Learning - Some Personal Experience

 

Some Quick Impressions on the MIT DL4SDC Class by Lex Fridman

Many of you might know about the MIT DL4SDC class by Lex Fridman. Recently I listened through the five videos and decided to write a "quick impression" post. I usually write these "impression posts" when I have only gone through part of a class's material. So here you go:

* 6.S094, compared to Stanford cs231n or cs224d, is more of a short class; it takes less than 6 hours to watch through all the materials.

* Around 40-50% of the class is spent on basic material such as backprop or Q-learning. Because the class is short, the treatment of these topics feels incomplete. For example, you might want to listen to Silver's class for a systematic understanding of RL and the place of Q-learning, and to Karpathy's lectures in cs231n for the basics of backprop, then finish Hinton's or Socher's to completely grok it. But again, this is a short class; you really can't expect too much.

Actually, I like Fridman's stance on these standard algorithms: he asks the audience tough questions about whether the human brain ever behaves like backprop or RL.

* The rest of the class is mostly on SDC: planning with RL, steering with an all-in-one CNN. The gem (Lecture 5) is Fridman's own research on driver state. If you don't have much time, I think that's the lecture you want to sit through.

* My experience doesn't include the two very interesting homework assignments, DeepTraffic and DeepTesla. I have heard great stories about both from students, but unfortunately I never tried playing with them.

That's what I have. Hope the review is useful for you. 🙂

"ML Engineer" vs "Data Scientist"

(Redacted from a conversation between me and Gautam.)

Q: "Guys, what is the difference between ML engineer and a data scientist? How they work together? How their work activity differ? Can you walk through with an use case example?"

A: (From Arthur)

"Generally, it is hard to decide what a title means unless you know a bit about the nature of the job, usually it is described in the job description.

But you can still ask what these terms usually imply. So here is my take:

ML vs. data: Usually there is a part about testing/integrating an algorithm and a part about analyzing the data. It's hard to say what the proportion is on each side. But high-dimensional data is less amenable to simple exploratory analysis, so people tend to use the term "ML" more there, which means "your job is to run/tune algorithms for us - fun for you, right?" If you are looking at table-based data, then it's likely to be a "data" type of job.

Engineer vs. scientist: In a large organization, there is usually a difference between the person who comes up with the mathematical model (scientist) and the person who controls the production platform (engineer). For example, if you are solving a prediction problem, the "scientist" is usually the one who comes up with a model, while the "engineer" is the one who creates the production system. So you can think of them as the "R" and the "D" in the organization.

IMO, healthy companies balance R&D. So you will find that many companies have "junior", "senior", "principal", "director", or "VP" prefixed to both tracks of titles.

You will sometimes see terms such as "Programmer" or "Architect" replacing "engineer"/"scientist". "Programmer" implies the job is more coding-related, i.e. the person who actually writes code. "Architect" is rarer; architects usually oversee big-picture issues among programmers, or act as a bridge between the R and D organizations."

My Social Network Policy

For years, my social networks haven't followed a single theme.  For the most part I have a variety of interests and don't feel like pushing any news... until deep learning came along.   Since Waikit and I started AIDL on Facebook, with a newsletter and a YouTube channel, I have also started to see more traffic coming to thegrandjanitor, and of course more legit followers on Twitter.

In any case, here are a couple of ways you can find me:
Facebook: I am private on Facebook. I don't PM, but you can always find me at the AIDL group.
LinkedIn:  On the other hand, I am very public on LinkedIn. So feel free to contact me at https://www.linkedin.com/in/arthchan2003/ .
Twitter: I am quite public on Twitter https://twitter.com/arthchan2003.
Plus: Not too active, but yeah I am there https://plus.google.com/u/0/+ArthurChan.

Talk to you~
Arthur

 

Links of My Reviews

Since I started to re-learn machine learning, I have written several review articles on various classes, books and resources.   Here is a collection of links:

For the Not-So-Uninitiated: Review of Ng's Coursera Machine Learning Class

One Algorithm to rule them all - Reading "The Master Algorithm"

Radev's Coursera Introduction to Natural Language Processing - A Review

Learning Deep Learning - My Top-Five List

Learning Machine Learning - Some Personal Experience

A Review on Hinton's Coursera "Neural Networks and Machine Learning"

Reading Michael Nielsen's "Neural Networks and Deep Learning"

Arthur

Good Old AI vs DNN - A Question from an AIDL Member

Redacted from this discussion at AIDL.

From Ardian Umam (shortened, rephrased):
"Now I'm taking AI course in my University using Peter Norvig and Stuart J. Russell textbook. In the same time, I'm learning DNN (Deep Neural Network) for visual recognition by watching Standford's Lecure on CNN (Convolutional Neural Networks) knowing how powerful a DNN to learn something from dataset. Whereas, on AI class, I'm learning about KB (Knowledge Base) including such as Logical Agent, First Order Logic that in short is kind of inferring "certain x" from KB, for example using "proportional resolution".

My question : "Is technique like what I learn in AI class I describe above good in solving real AI problem?" I'm still not get strong intuition about what I study in AI class in real AI problem."

Our exchange:

My answer: "We usually call "Is technique .... real AI problem?" GOAI (Good Old Artificial Intelligence). So your question is weather GOAI is still relevant.

Yes, it is. Let's take search as an example. More complicated systems usually have certain components based on search. For example, many speech recognizers these days still use the Viterbi algorithm, which is a large-scale search. NNMT-type techniques still require some kind of stack decoding. (Edit: I originally wrote beam search, but I am not quite sure.)

More importantly, you can see many things as a search. Take optimization of a function: you can solve it with calculus, but in practice you actually use a search algorithm to find the best solution. Of course, in real life you rarely apply beam search to optimization, but the idea of search gives you a better feel for what many ML algorithms are like."
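(To make the "optimization as a search" point concrete, here is a minimal sketch of my own, not part of the original exchange: a naive greedy/random search that minimizes a made-up function by proposing nearby candidates and keeping improvements.)

```python
import random

def f(x):
    # A made-up function to minimize.
    return (x - 3.0) ** 2 + 1.0

def greedy_search(start, step=0.1, iters=1000):
    best_x, best_val = start, f(start)
    for _ in range(iters):
        # Propose a nearby candidate and keep it only if it improves the objective.
        candidate = best_x + random.uniform(-step, step)
        val = f(candidate)
        if val < best_val:
            best_x, best_val = candidate, val
    return best_x, best_val

print(greedy_search(start=0.0))   # should land near x = 3
```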

AU: "Ah, I see. Thank you Arthur Chan for your reply. Yes, for search, it is. Many real problems now are still utilizing search approach to solve. As for "Knowledge, reasoning" (Chapter 3 in the Norvig book) for example using "proportional resolution" to do inference from KB (Knowledge Base), is it still relevant?"

My Answer: "I think the answer is it is and it is not. Here is a tl;dr answer:

It is not: many practical systems these days are probabilistic, which makes Part V of Norvig's book *feel* more relevant now. Most people in this forum are ML/DL fans. That's probably the first impression you would get these days.

But then, it is also relevant. In what sense? There are perhaps three reasons. First, it allows you to talk with people who learned A.I. in the last generation, because people in their 50s-60s (a.k.a. your boss) learned to solve AI problems with logic. So if you want to talk with them, learning logic/knowledge-type systems helps. Also, in AI no one knows which topics will revive. For example, fractals are among the least talked-about topics in our community now, but you never know what will happen in the next 10-20 years. So keeping breadth is a good thing.

Then there is the part about how you think about search. In Norvig and Russell's book, the first few search problems are about solving logic problems such as first-order logic or chess. While these are used in fewer systems, compared to search that requires probabilities they are much easier to understand. For example, you may have heard of people writing their first chess engine in their teens, but I have heard of no one writing a (good) speech recognizer or machine translator before grad school.

The final reason is perhaps more theoretical: many DL/ML systems you use, yeah... they are powerful, but not all of them make decisions humans understand. They are not *interpretable*. That's a big problem. So how to link these systems to GOAI-type work is still a research problem."

A Review on Hinton's Coursera "Neural Networks and Machine Learning"

Cajal's drawing of chick cerebellum cells, from Estructura de los centros nerviosos de las aves, Madrid, 1905

For me, finishing Hinton's deep learning class, Neural Networks for Machine Learning (NNML), was a long overdue task. As you know, the class was first launched back in 2012. I was not so convinced by deep learning back then. Of course, my mind changed around 2013, but by then the class was archived. Not until two years later, after I decided to take Andrew Ng's class on ML, was I finally able to loop through Hinton's class once. Only last October, when the class relaunched, did I decide to take it again: watch all the videos a second time, finish all the homework and get passing grades for the course. As you can tell from my journey, this class is hard.  Some videos I watched 4-5 times before grokking what Hinton said. Some assignments made me take long walks to think them through. Finally I made it through all 20 assignments, and even bought a certificate for bragging rights. It was a refreshing, thought-provoking and satisfying experience.

So this piece is my review of the class: why you should take it, and when.  I also discuss one question that has been floating around forums from time to time: given all the deep learning classes available now, is Hinton's class outdated?   Or is it still the best beginner class? I will chime in on the issue at the end of this review.

The Old Format Is Tough

I admire people who finished this class in Coursera's old format.  NNML is well known to be much harder than Andrew Ng's Machine Learning, as multiple reviews have said (here, here).  Many of my friends who have PhDs cannot quite follow what Hinton says in the last half of the class.

No wonder: when Karpathy reviewed it in 2013, he noted an influx of non-MLers working on the course. For newcomers, it must be bewildering to understand topics such as energy-based models, which many people have a hard time following.   Or what about deep belief networks (DBN), which people these days still mix up with deep neural networks (DNN)?  Quite frankly, I still don't grok some of the proofs in Lecture 15 even after going through the course, because deep belief networks are difficult material.

The old format only allowed three attempts per quiz, with tight deadlines, and you only had one chance to finish the course.  One homework required deriving the matrix form of backprop from scratch.  All of this made the class unsuitable for busy individuals (like me); it was more for second- to third-year graduate students, or experienced practitioners with plenty of time (but who has that?).

The New Format Is Easier, but Still Challenging

I took the class last October, when Coursera had changed most classes to the new format, which allows students to retake them.  [1]  That strips out some of the difficulty, but it makes the class more suitable for busy people.   It doesn't mean you can go easy on the class: for the most part, you still need to review the lectures, work out the math, draft pseudocode, etc.   The homework that requires you to derive backprop is still there.  The upside: you can still have all the fun of deep learning. 🙂 The downside:  you shouldn't expect to get through the class without spending 10-15 hours a week.

Why the Class is Challenging -  I: The Math

Unlike Ng's class and cs231n, NNML is not easy for beginners without a background in calculus.   The math is still not too difficult: mostly differentiation with the chain rule, intuition about what the Hessian is, and, more importantly, vector differentiation - but if you have never learned these, the class will be over your head.  Take at least Calculus I and II before you join, and know some basic identities from the Matrix Cookbook.
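To give a flavor of the kind of derivation involved (my own illustrative example, not an excerpt from the course), here is the chain-rule computation for the gradient of the logistic loss with respect to a weight vector:

```latex
% Logistic regression: z = w^\top x, \; p = \sigma(z), \; L = -\bigl[y \log p + (1-y)\log(1-p)\bigr]
\frac{\partial L}{\partial w}
  = \frac{\partial L}{\partial p}\,\frac{\partial p}{\partial z}\,\frac{\partial z}{\partial w}
  = \left(\frac{p - y}{p(1-p)}\right)\bigl(p(1-p)\bigr)\,x
  = (p - y)\,x .
```

The homework derivations are longer (matrices instead of vectors, more layers), but they are built from exactly this kind of step.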

Why the Class is Challenging - II:  Energy-based Models

Another reason the class is difficult is that the last half is all based on so-called energy-based models, i.e. models such as the Hopfield network, the Boltzmann machine (BM) and the restricted Boltzmann machine (RBM).  Even if you are used to the math of supervised learning methods such as linear regression, logistic regression or backprop, the math of RBMs can still throw you off.   No wonder: many of these models have physical origins, such as the Ising model.  Deep learning research also frequently uses ideas from Bayesian networks, such as explaining away.  If you have no background in either physics or Bayesian networks, you will feel quite confused.
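For reference, here is the standard textbook definition of the RBM's energy and joint distribution (the usual formulation, not a quote from the lectures); this is the kind of object the second half of the class manipulates:

```latex
E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h},
\qquad
p(\mathbf{v}, \mathbf{h}) = \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{Z},
\qquad
Z = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})},
```

where v and h are the visible and hidden units, W the weight matrix, and a, b the biases. The partition function Z is what makes exact training intractable and motivates approximations such as contrastive divergence.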

In my case, I spent quite some time Googling and reading through the relevant literature, which powered me through some of the quizzes, but I don't pretend I understand those topics, because they can be deep and unintuitive.

Why the Class is Challenging - III: Recurrent Neural Network

If you learn about RNNs these days, probably from Socher's cs224d or by reading Mikolov's thesis, LSTM is easily the only method you would think of for resolving exploding/vanishing gradients in an RNN.  Of course, there are other ways: echo state networks (ESN) and Hessian-free methods, which are seldom talked about these days.   Again, their formulation is quite different from standard methods such as backprop and gradient descent.  But learning them gives you breadth and makes you ask whether the status quo is the right thing to do.
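For context (my own shorthand summary, not lifted from the course), the exploding/vanishing gradient problem comes from backpropagation through time multiplying one Jacobian per time step:

```latex
\frac{\partial \mathcal{L}_t}{\partial \mathbf{h}_k}
  = \frac{\partial \mathcal{L}_t}{\partial \mathbf{h}_t}
    \prod_{i=k+1}^{t} \frac{\partial \mathbf{h}_i}{\partial \mathbf{h}_{i-1}} .
```

When the norms of these Jacobians are consistently below 1, the product shrinks exponentially (vanishing gradients); when they are above 1, it blows up (exploding gradients). LSTM, ESN and Hessian-free optimization are three rather different answers to the same problem.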

But is it Good?

You bet! Let me justify that statement in the next section.

Why is it good?

Suppose you just want to use some of the fancier tools in ML/DL.  I guess you can just go through Andrew Ng's class, test out a bunch of implementations, then declare yourself an expert - that's what many people do these days.  In fairness, Ng's Coursera class is designed to give you a taste of ML, and indeed you should be able to wield many ML tools after the course.

That said, you should realize your understanding of ML/DL is still... rather shallow.  Maybe you are thinking, "Oh, I have a bunch of data, let's throw it into Algorithm X!" or "Oh, we just want to use XGBoost, right? It always gives the best results!"   You should realize that performance numbers aren't everything.  It's important to understand what's going on with your model.   You can easily make costly, short-sighted and ill-informed decisions when you lack understanding.  It happens to many of my peers, to me, and sadly even to some of my mentors.

Don't make that mistake!  Always seek better understanding! Try to grok.  If you have only done Ng's neural network assignment, by now you are probably still wondering how it can be applied to other tasks.   Go for Hinton's class, feel perplexed by what the Prof says, and iterate.  Then you will start to build up a better understanding of deep learning.

Another, more technical note:  if you want to learn deep unsupervised learning, I think this should be your first course as well.   Prof. Hinton teaches you the intuition behind many of these machines, and you will also have the chance to implement them.   For models such as the Hopfield net and the RBM, it's quite doable if you know basic Octave programming.
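To show how doable it is, here is a minimal Hopfield network sketch of my own in NumPy rather than Octave (the two stored patterns are made up, and there is no attempt at efficiency): Hebbian learning plus asynchronous updates that recover a stored pattern from a corrupted one.

```python
import numpy as np

# Two made-up binary patterns (+1/-1) to store.
patterns = np.array([
    [ 1, -1,  1, -1,  1, -1],
    [ 1,  1,  1, -1, -1, -1],
])
n = patterns.shape[1]

# Hebbian learning: W is the sum of outer products, with a zero diagonal.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)

def recall(state, steps=50, rng=np.random.default_rng(0)):
    state = state.copy()
    for _ in range(steps):
        i = rng.integers(n)                      # asynchronous update: pick one unit
        state[i] = 1 if W[i] @ state >= 0 else -1
    return state

noisy = np.array([ 1, -1, -1, -1,  1, -1])       # first pattern with its third unit flipped
print(recall(noisy))                              # should converge to [ 1 -1  1 -1  1 -1]
```

A real RBM takes more care (stochastic hidden units, contrastive divergence), but the spirit of implementing these energy-based models by hand is similar.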

So it's good, but is it outdated?

Learners these days are perhaps luckier: they have plenty of choices for learning a deep topic such as deep learning.   Just check out my own "Top-5 List".   cs231n, cs224d and even Silver's class are great contenders for your second class.

But I still recommend NNML.  There are four reasons:

  1. It is deeper and tougher than other classes.  As I explained before, NNML is tough not so much mathematically (the math in Socher's and Silver's classes is also non-trivial) but conceptually; energy-based models and the different ways to train RNNs are two examples.
  2. Many concepts in ML/DL can be seen in different ways.  For example, bias/variance is a trade-off for frequentists, but it is seen as a "frequentist illusion" by Bayesians.    The same can be said about concepts such as backprop and gradient descent.  Once you really think about them, they are tough concepts.    So one reason to take a class is not just to learn a concept, but to look at it from a different perspective.  In that sense, NNML fits the bill perfectly.  I found myself thinking about Hinton's statements during many long promenades.
  3. Hinton's perspective - Prof. Hinton was mostly on the losing side of ML for the last 30 years, but he persisted.  From his lectures, you get a feeling for how and why he started a certain line of research, and perhaps ultimately for how you might research something yourself in the future.
  4. Prof. Hinton's delivery is humorous.   Check out his take in Lecture 10 on why physicists worked on neural networks in the early 80s.  (Note: he was a physicist before working on neural networks.)

Conclusion and What's Next?

All in all, Prof. Hinton's "Neural Networks for Machine Learning" is a must-take class.  All of us, beginners and experts included, will benefit from the professor's perspective and the breadth of the subject.

I do recommend that you first take Ng's class if you are an absolute beginner, and perhaps some Calculus I and II, plus some Linear Algebra, Probability and Statistics; that would make the class more enjoyable (and perhaps more doable) for you.  In my view, both Karpathy's and Socher's classes are easier second classes than Hinton's.

If you finish this class, make sure you check out the other fundamental classes.  Check out my post "Learning Deep Learning - My Top 5 List"; you will have plenty of ideas for what's next.   A special mention here is Daphne Koller's Probabilistic Graphical Models, which I found equally challenging, and which will perhaps give you some insight into very deep topics such as Deep Belief Networks.

Another suggestion: maybe you can take the class again. That's what I plan to do about half a year from now - as I mentioned, I don't understand every single nuance of the class, but I think understanding will come around my 6th or 7th time through the material.

Arthur Chan

[1] To me, this makes a lot of sense for both the course's preparers and the students, because students can take more time to really go through the homework, and the preparers can monetize the class for an indefinite period of time.

History:

(20170410) First writing
(20170411) Fixed typos. Smooth up writings.
(20170412) Fixed typos
(20170414) Fixed typos.

If you like this post, subscribe to the Grand Janitor Blog's RSS feed. You can also find me (Arthur) on Twitter, LinkedIn, Plus, and Clarity.fm. Together with Waikit Lau, I maintain the Deep Learning Facebook forum.  Also check out my awesome employer: Voci.

List of Neuroscience MOOCs

I have always had a side interest in neuroscience.  So here's just a note to myself: I am taking Coursera's Computational Neuroscience for fun.   This is the second time I have looped through the course, now with more understanding of neural networks, and it feels like a completely different class.  The others are just casual interests - one day I will go through them one by one.

Just a disclaimer: unlike with my Top-5 List in deep learning,  I have only an amateurish understanding of computational neuroscience, and only cursory experience of each of these classes. This might change when I finish a significant portion of them (definition: 50%+).  But for now, caveat emptor!

Computational Neuroscience:

Neurobiology:

Perception:

Measurement and BCI

Medical Neuroscience:

Behavioral Neuroscience

Other Biophysics/Biomedical Engineering-Related

Cognitive Neuroscience

Uncategorized:

  1. Coursera's Neuroeconomics
  2. Coursera's Neuromarketing

Other Interesting Sources of Information: OpenCulture.

Arthur

(Edit at 20170524) Categorized almost all classes into sub-categories.  Not entirely sure I am right, but on the computational neuroscience ("theoretical") side, things look clear enough.
(Edit at 20170522) Finished UW's Computational Neuroscience.  Changed the ranking so that Synapses, Neurons and Brains got a higher ranking.
(Edit at 20170501) Made another four classes under the category of "Neurobiology".
(Edit at 20170427) Created five classes under the umbrella of "Computational Neuroscience".  For the most part they are more quantitative than the other classes.

My Volunteer Work at MGH

A couple of years ago,  I was having a beer with my neighbors Catherine and Mark.   We started to talk about what we do at work. I tried to brag about my background in speech recognition... but it turns out Catherine and Mark are an impressive power couple in neuroscience: Catherine is an MD at MGH, specializing in pediatric epilepsy and neurophysiology; Mark is Prof. Kramer at BU, focused on computational neuroscience.   Since then, I have begged them to teach me the basics about human brains, and they were very kind to share their knowledge; later I became a volunteer researcher at MGH.

We ended up doing a project together.  In layman's terms, I chipped in to help them evaluate a couple of signal detectors, where the signals are relevant to epilepsy.   Catherine and Mark were kind enough to include me in their publication in the Journal of Neuroscience Methods.  Their method ends up improving on the speed of the previous method, and combined with localization techniques, such advances will help provide a cure for epilepsy patients, especially children.

Currently the method is still human-assisted; perhaps the next step is to use ML methods to improve the system.   Hopefully I can help more in the future.

Arthur