Time flies. I finished Course 4 around a month ago and finally have a chance to write a full review. Course 4 is different from the first three deeplearning.ai courses, which focused on fundamental understanding of deep learning topics such as back-propagation (Course 1), tuning hyperparameters (Course 2) and deciding which improvement strategy is best (Course 3). Course 4 is more about an important application of deep learning: computer vision.
Focusing on computer vision subjects Course 4 to a distinct challenge as a course: how does Course 4 measure up against other existing computer vision classes? Would it be comparable with greats such as Stanford's cs231n? To answer these questions, I will compare Course 4 and cs231n in this article. My goal is to answer how you would choose between the two classes in your learning process.
Convolutional Neural Network In the Context of Deep Learning
Convolutional neural networks (CNNs) have a very special place in deep learning. For the most part, you can think of a CNN as an interesting special case of a vanilla feed-forward network with tied parameters. Computationally, you can parallelize it much better than techniques such as recurrent neural networks. Of course, it is prominent in image classification (since LeNet-5). But it is also frequently used in sequence modeling such as speech recognition and text classification (check out cs224n for details). More importantly, I guess, image classification is also used as a template for development in many newer applications. That makes learning CNNs sort of mandatory for students of deep learning.
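To make the "feed-forward network with tied parameters" view concrete, here is a minimal numpy sketch. It is not taken from either course; the input `x`, the filter `w` and the matrix `W` are made-up illustrations. It builds the dense weight matrix that a 1-D convolutional layer corresponds to, and checks that both give the same output.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy 1-D input (hypothetical)
w = np.array([0.5, -1.0, 2.0])            # one shared 3-tap filter (hypothetical)

# Dense-layer view: each output unit gets its own row of weights, but
# every row reuses the same three values, shifted by one position.
n_out = len(x) - len(w) + 1
W = np.zeros((n_out, len(x)))
for i in range(n_out):
    W[i, i:i + len(w)] = w

dense_out = W @ x                              # ordinary feed-forward layer
conv_out = np.correlate(x, w, mode="valid")    # sliding-filter layer

print(dense_out)
print(conv_out)
```

The matrix `W` has 15 entries but only 3 free parameters. That sharing is what "tied" means here, and it is also why the operation parallelizes so well.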
Learning Deep-Learning-based Computer Vision before deeplearning.ai
Interestingly enough, there is a rather standard option for learning deep-learning-based computer vision online. Yes! You guessed it right! It is cs231n, which was taught by then-Stanford PhD candidate Andrej Karpathy in 2015/16. To recap, cs231n is not only a good class for computer vision, it is also a good class for learning the basics of deep learning. Also, as the now-famous Dr. Karpathy said, it has probably one of the best explanations of back-propagation. My only criticism of the class (as I mentioned in earlier reviews) is that as a first class in deep learning, it is too focused on image recognition. But as a first class in deep-learning-based computer vision, I think it was the best.
Course 4: Convolutional Neural Networks Briefly
Would Course 4 change my opinion about cs231n then? I guess we should look at it in perspective. Comparing Course 4 with cs231n is comparing apples and oranges. Course 4 is a month-long class which is suitable for absolute beginners. If you look into it, Course 4 is basically a quick introductory class. Week 1 focuses on what a CNN is, Weeks 2 and 3 talk about two prominent applications: image classification and object detection. Week 4 is about fun stuff such as face verification and neural style transfer.
Many people I know finished the class within 3 days of its start. cs231n, on the other hand, is a semester-long course which contains ~18 hours of video to watch, with more substantial (and difficult) homework problems. It is more suitable for people who already have at least one or two full machine learning courses under their belt.
So my take is that Course 4 can be a good first class in deep-learning-based computer vision, but it is not a replacement for cs231n. If you only take Course 4, you will find that there is still a lot in computer vision you don't grok. My advice is that you should then audit cs231n afterward, or else your understanding will still have holes.
What if I already took cs231n? Would Course 4 still help me?
Absolutely. While Course 4 is much shorter, remember that a lot of deep learning concepts are obscure. It doesn't hurt to learn the same thing in different ways. And Course 4 offers different perspectives on several topics:
For starters, Course 4, just like all other deeplearning.ai courses, has homework which requires code verification at every step. As I argued in an earlier review, that's a huge plus for learning.
Then there is the treatment of individual topics. I found Ng's treatment of object detection refreshing - the more conventional view (which cs231n took) was to start from RCNN and its two faster variants, then bring up YOLO. But Andrew just decided to go with YOLO instead. Notice that neither of the classes gave a detailed description of the algorithm. (Reading the paper is probably the best.) But YOLO is indeed more practical than the RCNN variants.
The Week 4 applications, such as face verification and Siamese networks, were actually new to me. Andrew also gives a very nice explanation of why neural style transfer really works.
As always, even a new note on old topics matters. E.g. this is the first time I became aware that the convolution in deep learning is different from convolution in signal processing (see Week 1). I also found Andrew's notes on various image classification papers to be gems. Even if you have read those papers, I do suggest you listen to him again.
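The convolution point is easy to see in a few lines of numpy. This is a small illustration with made-up arrays, not code from the course. Signal-processing convolution flips the kernel before sliding it, while the "convolution" in most deep learning frameworks slides the kernel as-is, which signal-processing people would call cross-correlation.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy input (hypothetical)
kernel = np.array([1.0, 0.0, -1.0])            # toy filter (hypothetical)

# Signal-processing convolution: the kernel is flipped before sliding.
conv = np.convolve(signal, kernel, mode="valid")

# Deep-learning-style "convolution": no flip, i.e. cross-correlation.
xcorr = np.correlate(signal, kernel, mode="valid")

# Flipping the kernel by hand turns one operation into the other.
xcorr_via_conv = np.convolve(signal, kernel[::-1], mode="valid")

print(conv)    # [2. 2. 2.]
print(xcorr)   # [-2. -2. -2.]
```

Since the filter weights are learned anyway, the network can simply learn the flipped filter, so the distinction does not matter for training. It does matter if you port a hand-designed filter between the two worlds.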
Since I admin an unofficial forum for the course, I learned that there are fairly obvious problems with the courses. For example, back in December when I took the course, there was one homework where you needed to submit an algorithm which wouldn't match the notebook. There was also a period of time when submission was very slow, and I needed to fix the file downloading to straighten it out. I do think those are frustrating issues. Hopefully, by the time you read this article, the staff has already fixed them.
To be fair, even the great NNML by Hinton has glitches here and there in its homework. So I am not entirely surprised that glitches happen in deeplearning.ai. Of course, I would still highly recommend the class.
There you have it - I reviewed Course 4 of deeplearning.ai. Unlike earlier parts of the specialization, Course 4 has a very obvious competitor: cs231n. And I don't quite see Course 4 as the one course you can take to master computer vision. My belief is that you need to go through both Course 4 and cs231n to have a reasonable understanding.
But as a first class in DL-based computer vision, I still think Course 4 has tremendous value. So once again I highly recommend you all take the class.
As a final note, I was able to catch up on reviews for all classes in deeplearning.ai. Now all eyes are on Course 5, and currently (as of Jan 23), it is set to launch on Jan 31. Before that, do check out our forums AIDL and Coursera deeplearning.ai for more discussion!
First published at http://thegrandjanitor.com/2018/01/24/review-of-ngs-deeplearning-ai-course-4-convolutional-neural-networks/
 Funny enough, while I went through all cs231n 2016 videos a while ago, I never wrote a review about the course.
 As a side note, I think it has to do with Andrew and the staff probably rushing to create the class. That's why I was actually relieved when I learned that Course 5 will be released in January. Hopefully this gives the staff more time to perfect the class.
As you might know, deeplearning.ai courses were released in two batches. The first batch contains Courses 1 to 3. Only recently (as of November 15) was Course 4, "Convolutional Neural Networks", released. And Course 5 is supposedly to be released in late November. So Course 3, "Structuring Machine Learning Projects", was more the "final" course in the first batch. It is also a good pause between the first and second halves of the specialization: the first half is more the foundation of deep learning, whereas the second half is more about applications of deep learning.
So here you are, learning something new in deep learning. Isn't it time to apply this newfound knowledge? Course 3 says "Hold on!" It turns out that before you start to do machine learning, you need to slow down and think about how to plan a task.
In fact, in practice, Course 3 is perhaps the most important course in the specialization. The math in Course 1 may be tougher, and Course 5 could have difficult concepts such as RNNs or LSTMs which are hard to grok. They are also longer than Course 3 (which only lasts 2 weeks). But in the grand scheme of things, they are not as important as Course 3. I am going to discuss why.
What do you actually do as an ML Engineer?
Let me digress a bit: I know many of my readers are young college students who are looking for careers in data science or machine learning. But what do people actually do in the business of machine learning or AI? I think this is a legit question because I was very confused when I first started out.
Oh well, it really depends on how much you are on the development side or the research side of your team. Terms like "research" and "development" can have various meanings depending on the title. But you can think of "researchers" as the people who try to get a new technique working - usually the criterion is whether it beats the status quo on a measure such as accuracy. "Developers", on the other hand, are the people who come up with a production implementation. Many ML jobs really sit somewhere on the spectrum between "developer" and "researcher". E.g. I am usually known for my skill as an architect. That usually means I have knowledge of both sides. My quote on my skills is usually "50% development and 50% research". There are also people who are highly specialized on either side. But I will focus more on the research side in this article.
So, What do you actually do as an ML Researcher then?
Now I can see a lot of you jump up and say "OH I WANT TO BE A RESEARCHER!" Yes, because doing research is fun, right? You just need to train some models and beat the baseline and write a paper. BOOM! In fact, if you are good, you just need to ask people to do your research. Woohoo, you are happy and done, right?
Oh well, in reality, good researchers are usually fairly good coders themselves. Especially in an applied field such as machine learning, my guess is that out of 100 researchers in an institute, there is perhaps 1 person who is really a "thinking staff", i.e. they do nothing other than come up with new theories or write proposals. Just like you, I admire the life of a pure academician. But in our time, you usually have to be both very smart and very lucky to be one of them. (There is a long explanation here, but it is out of the scope of this article.)
"Okay, okay, got it..... so can we start to have some fun now? We just need to do some coding, right?" Not really. The first step, before you can work on fun stuff such as modeling or implementing a new algorithm, is to clean up data. Say you work on fraudulent transaction detection: the first step is to load a giant table somewhere so that you can query it and get the training data. Then you want to clean the data and massage it so that it can be an input to the ML engine. Notice that by themselves these tasks can be non-trivial as well.
Course 3: Structuring Machine Learning Projects
Then there you are: after you code and clean up your data, you finally have some time to do machine learning. Notice that your time after all these "chores" is actually quite limited. That makes how to use your time effectively a very important topic. And that is why you want to take Course 3: Andrew teaches you the basics of how to allocate time and resources in a deep learning task. E.g. how large should your train/validation/test sets be? When should you stop your development? What is human performance? What if there are mismatches between your train set and test set? If you are stuck, should you tune your hyperparameters more? Or should you regularize?
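One way to picture the "tune more or regularize?" decision is the comparison of human-level, training and dev-set error that the course discusses. The sketch below is only a rough illustration under my own assumptions: the function name `diagnose`, the tie-breaking rule and the error numbers are all hypothetical, and real projects need more judgment than a two-way comparison.

```python
def diagnose(human_err, train_err, dev_err):
    """Return which gap is larger: avoidable bias or variance.

    avoidable bias = training error - human-level error
    variance       = dev error - training error
    """
    avoidable_bias = train_err - human_err
    variance = dev_err - train_err
    if avoidable_bias >= variance:
        # Fitting the training set better is the bigger win:
        # e.g. a larger model, longer training, better optimization.
        return "bias", avoidable_bias, variance
    # Generalization is the bigger problem:
    # e.g. regularization or more training data.
    return "variance", avoidable_bias, variance


# Two illustrative scenarios with the same train/dev errors.
print(diagnose(human_err=0.01, train_err=0.08, dev_err=0.10)[0])    # bias
print(diagnose(human_err=0.075, train_err=0.08, dev_err=0.10)[0])   # variance
```

The same 8% training error leads to opposite decisions depending on what human performance is, which is why the question "What is human performance?" shows up in the list above.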
In a way, Course 3 is reminiscent of Weeks 6 and 11 of "Machine Learning". Basically, what you learn is how to make good "meta-decisions" for all the projects you will work on in your lifetime. I also think it's the right stuff for your ML career.
One final note: as you might notice in my last two reviews, I usually try to compare deeplearning.ai with other classes. But Course 3 is quite unique, so you might only find similar material in machine learning courses which focus on theory. Ng's treatment differs in two ways. First, what he gives is practical and easy-to-understand advice. Second, his advice is focused on deep learning. While the principles are similar, working on deep learning usually implies special circumstances - such as performance close to human level, or low performance on both the train and test sets. Those scenarios did appear in the past, but only in cutting-edge ML evaluations involving the best ML teams. So you don't normally hear about them in a course, but now Andrew tells you all. Isn't that worth the price of $49? 🙂
So here you have it. This is my review of Course 3 of deeplearning.ai. Surprisingly even to me, I actually wrote more than I expected for this two-week course. Perhaps the main reason is that I really wish this course had been there, say, 3 years ago. It would have changed the course of some projects I developed.
Maybe it's too late for me..... but if you are early in deep learning, do recognize the importance of Course 3, or any advice you hear similar to what Course 3 teaches. It will save you much time - not just on one ML task but on the many ML tasks you will work on in your career.
In your life, there are times you think you know something, yet genuine understanding seems to elude you. It's always frustrating, isn't it? For example, why can seemingly simple concepts such as gradients or regularization throw us off when we have been learning them since Day 1 in machine learning?
In programming, there's a term called "grok". Grokking something usually means that you not only know the term, but also have an intuitive understanding of the concept. Or as in "Zen and the Art of Motorcycle Maintenance", you just try to dive deep into a concept, as if it is a journey...... For example, if you really think about speech recognition, then you would realize the frame independence assumption is very important, because it simplifies the problem in both search and parameter estimation. Yet it certainly introduces a modeling error. These small things which are not mentioned in classes or lectures are things you need to grok.
That brings us to Course 2 of deeplearning.ai. What are you grokking in this Course? After you take Course 1, should you take Course 2? My answer is yes and here is my reasoning.
Really, What is Gradient Descent?
Gradient descent is a seemingly simple subject - say you want to find the minimum of a convex function, so you follow the gradient downhill and after many iterations, you eventually hit the minimum. Sounds simple, right?
Of course, then you start to realize that functions are normally not convex, that they are n-dimensional, and that there can be plateaus. Or you follow the gradient, but it happens to point in a wrong direction! So you zigzag as you try to descend. It's a little bit like descending a real mountain, except you can't really see in n-dimensional space!
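Even the convex case shows the zigzag. Here is a minimal sketch in plain Python. The bowl f(x, y) = x^2 + 10y^2, the starting point and the learning rate are arbitrary choices made for illustration, not anything from the course. Because the bowl is ten times steeper in y than in x, each step overshoots in y and flips its sign while x only creeps toward zero.

```python
def grad(x, y):
    # Gradient of the convex bowl f(x, y) = x**2 + 10 * y**2.
    return 2.0 * x, 20.0 * y


def gradient_descent(x, y, lr=0.09, steps=100):
    """Plain gradient descent; returns every visited point."""
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
    return path


path = gradient_descent(x=5.0, y=1.0)

# The first few y-coordinates alternate in sign: that is the zigzag.
print([round(py, 3) for _, py in path[:5]])
final_x, final_y = path[-1]
print(final_x, final_y)   # both end up very close to 0
```

With a learning rate above 0.1 the y-coordinate would diverge on this bowl, and with a much smaller one the x-coordinate would take forever. That tension is what momentum-style methods try to ease.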
That explains the early difficulty of deep learning development - stochastic gradient descent (SGD) was just too slow back in 2000 for DNNs. That resulted in very interesting research on restricted Boltzmann machines (RBMs), which were stacked and used to initialize DNNs. RBMs are a prominent subject of Hinton's NNML after Lecture 8. This approach, known as pretraining, is still used in some recipes in speech recognition as well as financial prediction.
But we are not doing RBMs any more! In fact, research on RBMs is not as fervent as it was in 2008. Why? It has to do with people simply understanding SGD better and being able to run it better. Part of it is initialization, e.g. Glorot's and He's initialization. Part of it is how gradient descent is done - Adam is our current best.
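For reference, the two initialization schemes differ only in how they scale the random weights. The sketch below uses the normal-distribution variants as they are commonly stated: variance 2 / (fan_in + fan_out) for Glorot and 2 / fan_in for He. The function names, layer sizes and seed are my own illustrative choices, not a framework API.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the example is repeatable


def glorot_normal(fan_in, fan_out):
    # Glorot (Xavier): variance 2 / (fan_in + fan_out).
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_out, fan_in))


def he_normal(fan_in, fan_out):
    # He: variance 2 / fan_in, usually paired with ReLU units.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_out, fan_in))


W_glorot = glorot_normal(fan_in=512, fan_out=256)
W_he = he_normal(fan_in=512, fan_out=256)

print(W_glorot.std(), W_he.std())
```

The point of both schemes is to keep the scale of activations and gradients roughly constant from layer to layer, so that SGD has something usable to work with at the very first step.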
So how do you learn this stuff? Before Ng's deeplearning.ai class, I would say knowledge like this was spread out across courses such as cs231n or cs224n. But as I mentioned in my Course 1 review, those are really courses with specific applications in mind. Or you can read Michael Nielsen's Neural Networks and Deep Learning. Of course, Nielsen's work is a book, so it really depends on whether you have the patience to work through the details while reading. (Also see my review of the book.)
Now you don't have to. The one-stop shop is Course 2. Course 2 covers the material I just mentioned, such as initialization and gradient descent, as well as deeper concepts such as regularization and batch normalization. That is why I recommend you keep on taking the course after you finish Course 1. If you take the class, and are also willing to read Sebastian Ruder's review of SGD or Gabriel Goh's Why Momentum Really Works, you will be much ahead of the game.
As a note, I also like how Andrew breaks down many of the SGD algorithms as smoothing algorithms. That's a new insight for me even after I have used SGD many times.
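One way to read the smoothing view: momentum-style methods keep an exponentially weighted moving average of recent gradients and step along that average instead of the raw, noisy gradient. Below is a minimal plain-Python sketch of such an average with bias correction. The function name `ewma`, the value of `beta` and the toy gradient sequence are assumptions made for illustration.

```python
def ewma(values, beta=0.9):
    """Exponentially weighted moving average with bias correction."""
    avg = 0.0
    smoothed = []
    for t, v in enumerate(values, start=1):
        avg = beta * avg + (1.0 - beta) * v
        # Early averages are biased toward 0; dividing by
        # (1 - beta**t) corrects for that.
        smoothed.append(avg / (1.0 - beta ** t))
    return smoothed


# A "gradient" that keeps flipping sign, like the zigzag direction
# of a badly conditioned problem.
noisy_grads = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
smooth = ewma(noisy_grads)
print([round(s, 3) for s in smooth])
```

After the first step the smoothed values are much smaller in magnitude than the raw ones, so the oscillating component largely cancels out, while a component that consistently points the same way would pass through almost unchanged.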
Is it hard?
Nope. As math goes, Course 1 is probably the toughest. Of course, even in Course 1, you will finish the coursework faster if you don't overthink the problems. Most notebooks have the derived results for you. On the other hand, if you do want to derive the formulae, you need decent skill in matrix calculus.
Is it Necessary to Understand These Details? Also, Top-Down vs Bottom-Up Learning, which is Better?
A legitimate question here is: well, in the current state of deep learning, we have so many toolkits which have already implemented techniques such as Adam. Do I really need to dig so deep?
I do think there are always two views in learning. One is top-down, which in deep learning perhaps means reading a bunch of papers, learning the concepts and seeing if you can wrap your head around them. The fast.ai class is one example. And 95% of current AI enthusiasts are following such a path.
What's the problem with the top-down approach? Let me go back to my first paragraph: do you really grok something when you learn it top-down? I frequently can't. In my work life, I have also heard senior people say that top-down is the way to go. Yet when I went ahead to check whether they truly understood an implementation, they frequently couldn't give a satisfactory answer. That happens to a lot of senior technical people who later turn more to management. Literally, they have lost their touch.
On the other hand, every time I pop up an editor and write an algorithm, I gain tremendous understanding! For example, I was once asked to write a forward inference in C - you had better know what you are doing when you write in C! In fact, I have come to the opinion these days that you have to implement an algorithm once before you can claim you understand it.
So how come there are two sides to this opinion then? One of my speculations is that back in the 80s/90s, students were often taught to get a program right in the first writing. That creates the mindset that you have to think up a perfect program before you start to write one. Of course, in ML, such a mindset is highly impractical because the ML development process is really experimental. You can't always assume you have perfected the settings before you try something.
Another equally dangerous mindset is to say "if you are too focused on details, then you miss the big picture and won't come up with something new!" I heard this a lot when I first did research, and it's close to the most BS-ty thing I've heard. If you want to come up with something new, the first thing you should learn is all the details of existing work. The so-called "big picture" and "details" are always interconnected. That's why in the AIDL forum, we never see young kids who say "Oh I have this brand new idea, which is completely different from all previous works!" go anywhere. You always learn how to walk before you run. And knowing the details has no downsides.
Perhaps this is my long reasoning for why Ng's class is useful for me, even after I have read much of the literature. I distrust people who only talk about theory but don't show any implementation.
This concludes my review of Course 2. Many people, after they take Course 1, just decide to take Course 2. I don't blame them, but you always want to ask if your time is well spent.
To me though, taking Course 2 is not just about understanding more of deep learning. It is also my hope to grok some of the seemingly simple concepts in the field. I hope that my review is useful, and I will keep you all posted when my Course 3 review is done.
 As Pirsig said - it's really not about motorcycle maintenance.
 Strictly speaking, it is the conditional frame independence assumption. But practitioners in ASR frequently just call it the frame independence assumption.
As you all know, Prof. Ng has a new specialization on Deep Learning. I wrote about the course extensively yet informally, which includes two "Quick Impressions" before and after I finished Courses 1 to 3 of the specialization. I also wrote three posts just on Heroes of Deep Learning, including Prof. Geoffrey Hinton, Prof. Yoshua Bengio, and Prof. Pieter Abbeel and Dr. Yuanqing Lin. And Waikit and I started a study group, Coursera deeplearning.ai (C. dl-ai), focused on just the specialization. This is my full review of Course 1 after I finished watching all the videos. I will give a description of what the course is about, and why you want to take it. There are already a few very good reviews (from Arvind and Gautam). I will write based on my experience as the admin of AIDL, as well as a deep learning learner.
The Most Frequently Asked Question in AIDL
If you don't know, AIDL is one of the most active Facebook groups on the matter of A.I. and deep learning. So what is the most frequently asked question (FAQ) in our group then? Well, nothing fancy:
How do I start deep learning?
In fact, we get asked that question daily, and I have personally answered it more than 500 times. Eventually I decided to create an FAQ - which basically points back to "My Top-5 List", which gives a list of resources for beginners.
The Second Most Important Class
That brings us to the question: what should be the most important class to take? Oh well, for 90% of learners these days, I would first recommend Andrew Ng's "Machine Learning", which is good for both beginners and more experienced practitioners (like me). Lucky for me, I took it around 2 years ago and have benefited from the class ever since.
But what's next? What would be a good second class? That's always the question on my mind. Karpathy's cs231n comes to mind, or maybe Socher's cs224[dn] is another choice. But they are too specialized in their subfields. E.g. if you view them from the standpoint of general deep learning, the material on model architecture in both classes is incomplete.
Or you can think of a general class such as Hinton's NNML. But the class confuses even PhD friends I know. Indeed, asking beginners to learn restricted Boltzmann machines is just too much. The same can be said for Koller's PGM. Hinton's and Koller's classes, to be frank, are quite advanced. It's better to take them if you already know the basics of ML.
That narrows us down to several choices which you might already have considered: the first is fast.ai by Jeremy Howard, the second is the deep learning specialization from Udacity. But in my view, those classes also seem to miss something essential - e.g. fast.ai adopts a top-down approach. But that's not how I learn. I always love to approach a technical subject from the ground up. E.g. if I want to study string search, I would want to rewrite some classic algorithms such as KMP. And for deep learning, I always think you should start with a good implementation of back-propagation.
That's why for a long time, the Top-5 List picked cs231n and cs224d as the second and third class. They are the best I can think of after researching ~20 DL classes. Of course, deeplearning.ai changes my belief that either cs231n or cs224d should be the best second class.
Learning Deep Learning by Program Verification
So what's so special about deeplearning.ai? Just like Andrew's Machine Learning class, deeplearning.ai follows an approach I would call program verification. What that means is that instead of guessing whether your algorithm is right just by staring at the code, deeplearning.ai gives you an opportunity to come up with your own implementation, provided that its output matches the official one.
Why is it important then? First off, let me say that not everyone believes this is the right approach. E.g. back when I started, many well-intentioned senior scientists told me that such a matching approach is not really good experimentally. Because supposing your experiment has randomness, you should simply run your experiment N times and calculate the variance. Matching would remove this experimental aspect of your work.
So I certainly understand the point of what the scientists said. But then, in practice, it is a huge pain in the neck to verify whether your program is correct. That's why in most of my work I adopt the matching approach. You need to learn a lot about the numerical properties of algorithms this way. But once you follow this approach, you will also get ML tasks done efficiently.
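In code, the matching approach usually boils down to two habits: fix the random seed so the inputs are identical on every run, then compare your output against a reference within a numerical tolerance. The sketch below only illustrates the idea and is not how the course's grader actually works. `reference_sigmoid` stands in for an "official" implementation and `my_sigmoid` for a learner's version; both names are hypothetical.

```python
import numpy as np


def reference_sigmoid(z):
    # Stand-in for the official implementation a grader would hold.
    return 1.0 / (1.0 + np.exp(-z))


def my_sigmoid(z):
    # A learner's version, written differently (via tanh).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


rng = np.random.default_rng(seed=1)   # fixed seed: same inputs every run
z = rng.normal(size=(3, 4))

# Exact equality is too strict for floating point, so match within a
# tolerance instead.
matches = np.allclose(my_sigmoid(z), reference_sigmoid(z), rtol=1e-7, atol=1e-9)
print(matches)
```

Choosing the tolerance is where the "numerical properties" come in. Two mathematically identical formulas can differ in the last few bits, and you need a feel for how large that difference can legitimately be.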
But can you learn in another way? Nope, you have got to have some practical experience in implementation. Many people would advocate learning by just reading papers, or just by running pre-prepared programs. I always think that's missing the point - you lose a lot of understanding if you skip the implementation.
What do you Learn in Course 1?
For the most part, implementing the feed-forward (FF) algorithm and the back-propagation (BP) algorithm from scratch. Since most of us are just using frameworks such as TF or Keras, such from-scratch implementation experience is invaluable. The nice thing about the class is that the mathematical formulation of BP is fine-tuned such that it is suitable for implementation in Python numpy, the course's designated language.
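To give a flavor of what such an implementation looks like, here is a compact numpy sketch of a forward and backward pass for a network with one tanh hidden layer and a sigmoid output. It is my own minimal illustration, not the course's assignment code: the layer sizes, random data and variable names are assumptions. It ends with a numerical check of one gradient entry.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))                   # 2 features, 5 examples as columns
Y = (rng.random(size=(1, 5)) > 0.5) * 1.0     # random binary labels
W1, b1 = rng.normal(size=(3, 2)) * 0.5, np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)) * 0.5, np.zeros((1, 1))


def forward(W1, b1, W2, b2):
    A1 = np.tanh(W1 @ X + b1)                      # hidden activations
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))     # sigmoid output
    m = X.shape[1]
    loss = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    return loss, (A1, A2)


def backward(W2, cache):
    A1, A2 = cache
    m = X.shape[1]
    dZ2 = A2 - Y                                   # sigmoid + cross-entropy
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)             # tanh'(z) = 1 - tanh(z)^2
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2


loss, cache = forward(W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(W2, cache)

# Numerical check of one entry of dW1 by central differences.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (forward(W1_plus, b1, W2, b2)[0]
           - forward(W1_minus, b1, W2, b2)[0]) / (2 * eps)
print(dW1[0, 0], numeric)
```

The two printed numbers should agree to many decimal places. If they don't, the backward pass has a bug, and that kind of check is about the cheapest debugging tool you have when you write BP by hand.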
Wow, Implementing Back Propagation from scratch? Wouldn't it be very difficult?
Not really. In fact, many members finish the class in less than a week. So the key here: while many of us call it a from-scratch implementation, in fact it is highly guided. All the tough matrix differentiation is done for you. There are also strong hints on what numpy functions you should use. At least for me, the homework is very simple. (Also see the footnote.)
Do you need to take Ng's "Machine Learning" before you take this class?
That's preferable but not mandatory. Although without knowing the more classical view of ML, you won't be able to understand some of the ideas in the class, e.g. the difference in how bias and variance are viewed. In general, all good-old machine learning (GOML) techniques are still used in practice. Learning them doesn't seem to have any downsides.
You may also notice that both "Machine Learning" and deeplearning.ai cover neural networks. So will the material be duplicated? Not really. deeplearning.ai guides you through the implementation of multi-layer deep neural networks, which IMO requires a more careful and consistent formulation than a simple network with one hidden layer. So doing both won't hurt, and in fact it's likely that you will have to implement a certain method multiple times in your life anyway.
Wouldn't this class be too Simple for Me?
So here is another question you might ask: if the class is so simple, does it even make sense to take it? The answer is a resounding yes. I am quite experienced in deep learning (~4 years by now) and I have been learning machine learning since college. I still found the course very useful, because it offers many useful insights which only industry experts know. And of course, when a luminary such as Andrew speaks, you do want to listen.
In my case, I also wanted to take the course so that I can write reviews about it and my colleagues at Voci can ask me questions. But even with that in mind, I still learned several new things through listening to Andrew.
That's what I have so far. Follow us on Facebook at AIDL; I will post reviews of the later courses in the future.
 So what is a true from-scratch implementation? Perhaps one where you write everything in C, even the matrix manipulation part?
I have been taking a break from deep learning, and I am quite into graphical models (GM) lately. So that's why I am gathering resources for understanding various concepts of GM.
Here are some useful courses one can use. They are not sorted or categorized; it's just useful for me to look through them later.
Note that except for Koller's class, not all of the following classes have video available.
Daphne Koller's Probabilistic Graphical Models on Coursera. This is perhaps the best yet the most difficult one. All quizzes and exams are filled with trick questions which can challenge even very experienced MLers.
What is the Difference between Deep Learning and Machine Learning?
Usually I don't write a full blog post to answer a member's question. But what is "deep" is such a fundamental concept in deep learning, yet there are many well-meaning but incorrect answers floating around. So I think it is a great idea to answer the question clearly and hopefully dispel some of the misconceptions as well. Here is a cleaned-up and expanded version of my comment on the thread.
Deep Learning is Just a Subset of Machine Learning
First of all, deep learning is just a subset of the techniques of machine learning. You may have heard from many "Deep Learning Consultant" types: "deep learning is completely different from machine learning". But when we talk about "deep learning" these days, we are really talking about "neural networks which have more than one hidden layer". Since neural networks are just one type of ML technique, it doesn't make any sense to call DL "different" from ML. It might work for marketing purposes, but the thought is clearly misleading.
Deep Learning is a kind of Representation Learning
So now we know that deep learning is a kind of machine learning. We still can't quite answer why it is special. So let's be more specific: deep learning is a kind of representation learning. What is representation learning? Representation learning is the opposite of another school of thought/practice: feature engineering. In feature engineering, humans are supposed to hand-craft features to make machines work better. If you have done Kaggle before, this should be obvious to you: sometimes you just want to manipulate the raw inputs and create new features to represent your data.
Yet in some domains which involve high-dimensional data such as images, speech or text, hand-crafting features was found to be very difficult. E.g. using HOG-type approaches to do computer vision usually takes 4-5 years of a PhD student's time. So here we come back to representation learning - can a computer automatically learn good features?
What is a "Deep" Technique?
Now we come to the part about why deep learning is "deep" - usually we call a method "deep" when we are optimizing a nested function in the method. So for example, if you express such a function as a graph, you will find that it has multiple layers. The term "deep" really describes such "nestedness". That should explain why we typically call any artificial neural network (ANN) with more than 1 hidden layer "deep". Or as the general saying goes, "deep learning is just neural networks which have more layers".
(Another appropriate term is "hierarchical". See the footnote for more detail.)
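The "nestedness" is easy to write down. In the toy plain-Python sketch below, each layer is a one-dimensional affine map followed by tanh, and a "deep" model is just the composition f3(f2(f1(x))). The helper names `layer` and `compose`, and all the weights, are made up for illustration.

```python
import math


def layer(weight, bias):
    # One "layer": an affine map followed by a nonlinearity.
    return lambda x: math.tanh(weight * x + bias)


def compose(*fs):
    # compose(f1, f2, f3)(x) == f3(f2(f1(x)))
    def composed(x):
        for f in fs:
            x = f(x)
        return x
    return composed


f1, f2, f3 = layer(1.0, 0.0), layer(2.0, -1.0), layer(0.5, 0.3)
deep = compose(f1, f2, f3)

print(deep(0.7))
print(f3(f2(f1(0.7))))   # same value: the model is one nested function
```

Training a deep model means optimizing the parameters of all the inner functions at once, and the chain rule applied through this nesting is exactly what back-propagation computes.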
This is also the moment when Karpathy in cs231n shows you the multi-layer CNN, in which features are automatically learned from the simplest to the more complex ones. Eventually your last layer can just differentiate them using a linear classifier, as there is a "deep" structure that learns the right features (for the last layer). Note the key term here is "automatic": all these Gabor-filter-like features are not hand-made. Rather, they are the results of back-propagation.
Is there Anything which is "Deep" but not a Neural Network?
Yes and no. It depends on who you talk to. If you talk with ANN researchers/practitioners, they would just tell you "deep learning is just neural networks which have more than 1 hidden layer". Indeed, if you think from their perspective, the term "deep learning" could just be a short form. Yet as we just said, you can also call other methods "deep". So the adjective is not totally void of meaning. But many people would also tell you that because "deep learning" has become such a marketing term, it can now mean many different things. I will say more in the next section.
Also the term "deep learning" has been there for a century. Check out Prof. Schmidhuber's thread for more details?
"No Way! X is not Deep but it is also taught in Deep Learning Class, You made a Horrible Mistake!"
I said it with much authority and I know some of you guys would just jump in and argue:
"What about word2vec? It is nothing deep at all, but people still call it Deep learning!!!" "What about all wide architectures such as "wide-deep learning"?" "Arthur, You are Making a HORRIBLE MISTAKE!"
Indeed, the term "deep learning" is being abused these days. More learned people, on the other hand, are usually careful about calling certain techniques "deep learning". For example, in the cs224d 2015/2016 lectures, Dr. Richard Socher was quite cautious about calling word2vec "deep". His supervisor, Prof. Chris Manning, who is an authority in NLP, is known to dispute whether deep learning is always useful in NLP, simply because some recent advances in NLP are not really due to deep learning.
I think these cautions make sense. Part of it is that calling everything "deep learning" just blurs what really should be credited for a certain technical improvement. The other part is we shouldn't see deep learning as the only type of ML we want to study. There are many ML techniques, and some of them are more interesting and practical than deep learning. For example, deep learning is not known to work well in small-data scenarios. Would I just yell at my boss and say "Because I can't use deep learning, I can't solve this problem!"? No, I would just test out random forests, support vector machines, GMMs and all these nifty methods I learned over the years.
Misleading Claim About Deep Learning (I) - "Deep Learning is about Machine Learning Methods which use a lot of Data!"
So now we come to the arena of misconceptions. I am going to discuss two claims which many people have been drumming up about deep learning. But neither of them is the right answer to the question "What is the Difference between Deep and Machine Learning?"
The first one you have probably heard all the time: "Deep Learning is about ML methods which use a lot of data". Or people would tell you "Oh, deep learning just uses a lot of data, right?" This sounds about right; deep learning these days does use a lot of data. So what's wrong with the statement?
Here is the answer: while deep learning does use a lot of data, before deep learning, other techniques used tons of data too! e.g. Speech recognition before deep learning, i.e. HMM+GMM, can use up to 10k hours of speech. Same for SMT. And you can do SVM+HOG on Imagenet. And more data is always better for those techniques as well. So if you say "deep learning uses more data", then you forget that the older techniques can also use more data.
What you can claim is that "deep learning is a more effective way to utilize data". That's very true, because once you get into either GMM or SVM, they have scalability issues. GMM scales badly when the amount of data is around 10k hours. SVM (with RBF kernel in particular) is super tough/slow to use when you have ~1 million data points.
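A rough back-of-the-envelope sketch shows why kernel methods get painful at that scale. It assumes a naive solver that stores the full n-by-n kernel (Gram) matrix in 64-bit floats; real SVM solvers avoid materializing the whole matrix and cache parts of it instead, so treat this as an illustration of the quadratic growth, not a measurement. The function names are hypothetical.

```python
def dense_kernel_matrix_bytes(n_points, bytes_per_entry=8):
    """Memory for a full n x n kernel matrix (assumes float64 entries)."""
    return n_points * n_points * bytes_per_entry

def human_readable(num_bytes):
    """Format a byte count using decimal (powers of 1000) units."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000

for n in (10_000, 100_000, 1_000_000):
    print(n, human_readable(dense_kernel_matrix_bytes(n)))
```

Going from 10 thousand to 1 million points multiplies the matrix size by 10,000, which is the kind of scaling that mini-batch training of a neural network does not suffer from.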
Misleading Claim About Deep Learning (II) - "Deep Learning is About Using GPU and Having Data Center!"
This particular claim is different from the previous "Data Requirement" claim, but we can debunk it in a similar manner. The reason why it is wrong? Again, before deep learning, people already had GPUs to do machine learning. For example, you can use a GPU to speed up GMM. Before deep learning was hot, you needed a cluster of machines to train an acoustic model or a language model for speech recognition. You also needed tons of RAM to train a language model for SMT. So calling GPU/Data Center/RAM/ASIC/FPGA a differentiator of deep learning is just misleading.
You can say though: "Deep Learning has changed the computational model from a distributed network model to more of a single-machine-centric paradigm (in which each machine has one GPU). But later approaches also tried to combine CPU and GPU processing together".
Conclusion and "What you say is Just Your Opinion! My Theory makes Equal Sense!"
Indeed, you should always take what you read on-line with a grain of salt. Being critical is a good thing, and having your own opinion is good. But you should also try to avoid equivocating on an issue. Meaning: sometimes things have only one side, but you insist there are two equally valid answers. If you do so, you are perhaps making a logical error in your thinking. And a lot of people who made claims such as "deep learning is learning which uses more data and a lot of GPUs" are probably making such thinking errors.
That said, I would suggest you read several good sources to judge my answer, they are:
In any case, I hope that this article helps you. I thank Bob for asking the question. Armaghan Rumi Naik has debunked many misconceptions in the original thread - his understanding of machine learning is clearly above mine and he was able to point out mistakes from other commenters. The thread is worth your reading time.
 See "Last Words: Computational Linguistics and Deep Learning"
Generally, whether DL is useful in NLP is a widely disputed topic. Take a look at Yoav Goldberg's view on some recent GAN results on language generation. AIDL Weekly #18 also gave an exposé on the issue.
Perhaps another useful term is "hierarchical". In the case of ConvNet the term is right on. As Eric Heitzman comments at AIDL: "(deep structure) They are *not* necessarily recursive, but they *are* necessarily hierarchical since layers always form a hierarchical structure." After Eric's comment, I think both "deep" and "hierarchical" are fair terms to describe methods in "deep learning". (Of course, "hierarchical learning" is a much poorer marketing term.)
In an earlier draft, I used the term "recursive" to describe the term "deep", which, as Eric Heitzman at AIDL pointed out, is not entirely appropriate. "Recursive" gives people the feeling that the function calls itself, but the actual functions are more "nested". As a result, I removed the term "recursive" and just call the function a "nested function".
Of course, you should be aware that my description is not too mathematically rigorous either. (I guess it is a fair wordy description though.)
20170709 at 6: fix some typos.
20170711: fix more typos.
20170711 at 7:05 p.m.: I got a feedback from Eric Heitzman who points out that the term "recursive" can be deceiving. Thus I wrote footnote .
I have been self-learning deep learning for a while, informally from 2013 when I first read Hinton's "Deep Neural Networks for Acoustic Modeling in Speech Recognition" and through Theano, more "formally" from various classes since the summer of 2015 when I got freshly promoted to Principal Speech Architect. It's not an exaggeration to say that deep learning changed my life and career. I have been more active than in my previous life. e.g. If you are reading this, you were probably directed from the very popular Facebook group, AIDL, which I admin.
So this article was written at the time I finished watching an older version of Richard Socher's cs224d on-line. That, together with Ng's, Hinton's, Li and Karpathy's and Silver's, are the 5 classes I recommended in my now widely-circulated "Learning Deep Learning - My Top-Five List". I think it's fair to give this set of classes a name - the Basic Five. Because IMO, they are the first five classes you should go through when you start learning deep learning.
In this post I will say a few words on why I chose these five classes as the Five. Compared to more established bloggers such as Karpathy, Olah or Denny Britz, I am more of a learner in the space, experienced perhaps, yet still a learner. So this article and my others usually stress learning. What can you learn from these classes? Less talked about, but just as important: what is the limitation of learning on-line? As a learner, I think these are interesting discussions, so here you go.
What are the Five?
Just to be clear, here are the classes I'd recommend:
And the ranking is the same as I wrote in the Top-Five List. Out of the five, four have an official video playlist published on-line for free. With a small fee, you can finish Ng's and Hinton's classes with certification.
How much I actually Went Through the Basic Five
Many beginner articles come with a gigantic set of links. The authors usually expect you to click through all of them (and learn through them?) When you scrutinize the list, it could amount to more than 100 hours of video watching, and perhaps up to 200 hours of work. I don't know about you, but I doubt whether the authors really went through the list themselves.
So it's fair for me to first tell you what I've actually done with the Basic Five as of the first writing (May 13, 2017).
Ng's "Machine Learning"
Finished the class in entirety without certification.
Li and Karpathy's "Convolutional Neural Networks for Visual Recognition" or cs231n
Listened through the class lectures ~1.5 times. Haven't done any of the homework.
Socher's "Deep Learning for Natural Language Processing" or cs224d
Listened through the class lecture once. Haven't done any of the homework.
Silver's "Reinforcement Learning"
Listened through the class lectures 1.5 times. Only worked out a few starter problems from Denny Britz's companion exercises.
Hinton's "Neural Network for Machine Learning"
Finished the class in entirety with certification. Listened through the class ~2.5 times.
This table is likely to be updated as I go deeper into a certain class, but it should tell you the limitation of my reviews. For example, while I have watched through all the class videos, only for Ng's and Hinton's classes have I finished the homework. That means my understanding of two of the three "Stanford Trinities" is weaker, nor is my understanding of reinforcement learning solid. Together with my work at Voci, Hinton's class gives me stronger insight than the average commenter on topics such as unsupervised learning.
Why The Basic Five? And Three Millennial Machine Learning Problems
Taking classes is for learning of course. The five classes certainly give you the basics, if you love to learn the fundamentals of deep learning. (And take a look at footnote .) The five are not the only classes I sat through in the last 1.5 years, so the choice is not arbitrary. So oh yeah, those are the stuff you want to learn. Got it? That's my criterion. 🙂
But that's what other one thousand bloggers would tell you as well. I want to give you a more interesting reason. Here you go:
Go back in time to the year 2000. That was the time Google had just launched their search engine, there was no series of Google products, and surely there was no Imagenet. What were the most difficult problems for machine learning? I think you would see three of them:
Object classification, statistical machine translation (SMT), and automatic speech recognition (ASR).
So what's so special about these three problems then? If you think about it, back in 2000, all three were known to be hard problems. They represent three seemingly different data structures -
Object classification - 2-dimensional, dense array of data
Statistical machine translation (SMT) - discrete symbols, seemingly related by loose rules humans call grammars and translation rules
Automatic speech recognition (ASR) - 1-dimensional time series, has similarity to both object classification (through the spectrogram), and loosely bound by rules such as dictionary and word grammar.
And you would recall all three problems drew interest from the government, big institutions such as the Big Four, and startup companies. If you master one of them, you can make a living. Moreover, once you learn them well, you can transfer the knowledge to other problems. For example, handwritten character recognition (HWR) resembles ASR, and conversational agents work similarly to SMT. That is simply because the three problems are great metaphors for many other machine learning problems.
Now, okay, let me tell you one more thing: even now, there are people still making (or trying to make) a living by solving these three problems. I never said they are solved. e.g. What if we increase the number of classes from 1000 to 5000? What if, instead of Switchboard, we work on conference speech or speech from Youtube? What if I ask you to translate so well that even a human cannot distinguish it? That should convince you: "Ah, if there is one method that could solve all three problems, learning that method would be a great idea!"
And as you can guess, deep learning is that one method which revolutionized all three fields. Now that's why you want to take the Basic Five. The Basic Five is not meant to make you a top researcher in the field of deep learning; rather it teaches you just the basics. And at this point of your learning, knowing a powerful template for solving problems is important. You would also find that going through the Basic Five makes you able to understand the majority of deep learning problems these days.
So here's why I chose the Five: Ng's and NNML are the essential basics of deep learning. Li and Karpathy's teaches you object classification up to the state of the art. Socher's, in turn, teaches you where deep learning stands in NLP; it forays into SMT and ASR only a little bit, but you have enough to start.
My explanation excludes Silver's reinforcement learning. That admittedly is the odd one out. I add Silver's class because RL is increasingly used even in traditionally supervised learning tasks. And of course, to know the place of RL, you need a solid understanding. Silver's class is perfect for the purpose.
What You Actually Learn
In a way, the Basic Five also reflect what's really important when learning deep learning. So I will list out 8 points here, because they are repeated among different courses.
Basics of machine learning: this is mostly from Ng's class. But themes such as bias-variance are repeated in NNML and Silver's class.
Gradient descent: its variants (e.g. Adam), its alternatives (e.g. second-order methods), it's a never-ending study.
Backpropagation: how to view it? As optimizing a function, as a computational graph, as the flow of gradients. Different classes give you different points of view. And don't skip them even if you have learned it once.
Architecture: The big three families are DNN, CNN and RNN. Why some of them emerge and re-emerge in history. The details of how they are trained and structured. None of the courses would teach you everything, but going through the five will teach you enough to survive.
Image-specific techniques: not just classification, but localization/detection/segmentation (as in cs231n 2016 L8, L13). Not just convolution, but "deconvolution" and why we don't like that it is called "deconvolution". 🙂
NLP-specific techniques: word2vec, GloVe, how they are applied in NLP problems such as sentiment classification.
(Advanced) Basics of unsupervised learning: mainly from Hinton's, and mainly about techniques from 5 years ago such as RBM, DBN, DBM and autoencoders, but they are the basics if you want to learn more advanced ideas such as GAN.
(Advanced) Basics of reinforcement learning: mainly from Silver's class, from the DP-based model to Monte-Carlo and TD.
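As a small illustration of the gradient descent and backpropagation points in the list above, here is a sketch on a toy network with one hidden unit, y_hat = w2 * tanh(w1 * x), and a squared-error loss. The network, the starting weights and the learning rate are all assumptions chosen for illustration; none of the classes uses this exact example. The backward pass is just the chain rule applied from the loss back to each weight, and a finite-difference check is a common way to test it.

```python
import math

def forward(w1, w2, x):
    h = math.tanh(w1 * x)   # hidden activation
    y_hat = w2 * h          # network output
    return h, y_hat

def loss(w1, w2, x, y):
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2

def backward(w1, w2, x, y):
    """Backpropagation: apply the chain rule from the loss back to w1, w2."""
    h, y_hat = forward(w1, w2, x)
    d_yhat = y_hat - y              # dL/dy_hat
    d_w2 = d_yhat * h               # dL/dw2
    d_h = d_yhat * w2               # dL/dh
    d_a = d_h * (1.0 - h * h)       # through tanh: d tanh(a)/da = 1 - tanh(a)^2
    d_w1 = d_a * x                  # dL/dw1
    return d_w1, d_w2

def numerical_grad(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    g1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
    return g1, g2

def gradient_descent(w1, w2, x, y, lr=0.1, steps=200):
    """Plain gradient descent: step against the gradient at each iteration."""
    for _ in range(steps):
        g1, g2 = backward(w1, w2, x, y)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2

x, y = 1.0, 0.5            # a single made-up training example
w1, w2 = 0.3, -0.2         # made-up starting weights
initial_loss = loss(w1, w2, x, y)
trained_w1, trained_w2 = gradient_descent(w1, w2, x, y)
final_loss = loss(trained_w1, trained_w2, x, y)
```

Variants such as Adam change only how the update inside `gradient_descent` is computed from the gradients; the backward pass stays the same.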
The Limitation of Autodidacts
By the time you finish the Basic Five, if you have genuinely learned something from them, recruiters will start to knock on your door. What you think and write about deep learning will appeal to many people. Perhaps you start to answer questions on forums? Or you might even write LinkedIn articles which get many Likes.
All good, but be cautious! During my year of administering AIDL, I've seen many people who purportedly took many deep learning classes, but within a few minutes of discussion, I can point out holes in their understanding. Some, after some probing, turned out to have taken only 1 class in entirety. So they don't really grok deeper concepts such as back propagation. In other words, they could still improve, but they just refuse to. No wonder: with the hype of deep learning, many smart fellows just choose to start a company or code without really taking time to grok the concepts well.
That's a pity. And all of us should be aware that self-learning is limited. If you decide to take a formal education path, like going to grad school, most of the time you will sit with people who are as smart as you and willing to point out your issues daily. So any of your weaknesses will be revealed sooner.
You should also be aware that as deep learning is being hyped, the holes in your understanding are unlikely to be uncovered. That has nothing to do with whether you work in a job. Many companies just want to hire someone to work on a task, and expect you to learn while working.
So what should you do then? I guess my first advice is to be humble and be aware of the Dunning-Kruger effect. Self-learning usually gives people an intoxicating feeling that they have learned a lot. But learning a lot doesn't mean you know everything. There are always higher mountains, and you are doing yourself a disservice if you stop learning.
The second thought is you should try out your skills. e.g. It's one thing to know about CNN, it's another to run a training with Imagenet data. If you are smart, the former takes a day. The latter takes much planning, a powerful machine, and some practice to get even Alexnet trained.
My final advice is to talk with people and understand your own limitations. e.g. After reading many posts on AIDL, I notice that while many people understand object classification well enough, they don't really grasp the basics of object localization/detection. In fact, I didn't either, even after the first pass through the videos. So what did I do?
I just went through the videos on localization/detection again and again until I understood.
After the Basic Five.......
So some of you would ask "What's next?" Yes, you finished all these classes, and it feels as if you can't learn any more! Shake that feeling off! There are tons of things you still want to learn. So I list out several directions you can go:
Completionist: As of the first writing, I still haven't really done all the homework in all five classes. Notice that doing homework can really help your understanding, so if you are like me, I would suggest you go back to the homework and test your understanding.
Drilling the Basics of Machine Learning: So this goes in another direction - let's work on your fundamentals. For that, you can study math topics forever. I would say the more important and non-trivial parts are perhaps Linear Algebra, Matrix Differentiation and Topology. Also check out this very good link on how to learn college-level math.
Specialize in one field: If you want to master just one single field out of the Three Millennial Machine Learning Problems I mentioned, it's important for you to keep on looking at specialized classes on computer vision or NLP. Since I don't want to clutter this point, let's say I will discuss the relevant classes/material in future articles.
Writing: That's what many of you have been doing, and I think it helps further your understanding. One thing I would suggest is to always write something new and something you want to read yourself. For example, there are too many blog posts on Computer Vision Using Tensorflow in the world. So why not write one which is all about what people don't know? For example, practical transfer learning for object detection. Or what is deconvolution? Or a literature review on some non-trivial architecture such as Mask-RCNN, comparing it with existing encoding-decoding structures? Writing this kind of article takes more time, but remember quality trumps quantity.
Coding/Githubbing: There is a lot of room for re-implementing ideas from papers and open-sourcing them. It is also a very useful skill, as many companies need it to reproduce many trendy deep learning techniques.
Research: If you genuinely understand deep learning, you might see that many techniques need refinement. Indeed, currently there are plenty of opportunities to come up with better techniques. Of course, writing papers at the level of a professional researcher is tough and it's out of my scope. But only when you can publish would people give you respect as part of the community.
Framework: Hacking at the C/C++ level of a framework is not for the faint of heart. But if you are my type who loves low-level coding, trying to come up with a framework yourself could be a great way to learn more. e.g. Check out Darknet, which is, surprisingly, written in C!
So here you go. The complete Basic Five: what they are, why they are basic, and where you go from here. In a way, it's also a summary of what I have learned so far from various classes since Jun 2015. As with my other posts, if I learn more in the future, I will keep this post updated. I hope this post keeps you learning deep learning.
 Before 2017, there was no coherent set of Socher's class available on-line. Sadly there was also no legitimate version. So the version I refer to is a mixture of 2015 and 2016 classes. Of course, you may find a legitimate 2017 version of cs224n on Youtube.
My genuine expertise is speech recognition; unfortunately that's not a topic I can share much about due to IP issues.
Some of you (e.g. from AIDL) would jump up and say "No way! I thought that NLP wasn't solved by deep learning yet!" That's because you are one lost soul, misinformed by misinformed blog posts. ASR was the first field tackled by deep learning, dating back to 2010. And most systems you see in SMT are seq2seq-based.
I have been in the business of speech recognition since 1998, when I worked on a voice-activated project for my undergraduate degree back at HKUST. It was a mess, but that's how I started.
And the last one, you may always search for it on Youtube. Of course, it is not legit for me to share it here.
(Redacted from a conversation between me and Gautam.)
Q: "Guys, what is the difference between ML engineer and a data scientist? How they work together? How their work activity differ? Can you walk through with an use case example?"
A: (From Arthur)
"Generally, it is hard to decide what a title means unless you know a bit about the nature of the job, usually it is described in the job description.
But then you can ask what these terms usually imply. So here is my take:
ML vs data: Usually there is the part of testing/integrating an algorithm and the part of analyzing the data. It's hard to say what the proportion is between the two sides. But high-dimensional data is less amenable to simple exploratory analysis. So usually people would use the term "ML" more, which means "your job is to run/tune algorithms for us, fun for you right?" But if you are looking at table-based data, then it's likely to be a "data" type of job.
Engineer vs scientist: In a large organization, there is usually a difference between the one who comes up with the mathematical model (scientist) vs the one who controls the production platform (engineer). e.g. If you are solving a prediction problem, usually the "scientist" is the one who comes up with a model, but the "engineer" is the one who creates the production system. So you can think of them as the "R" and the "D" in the organization.
IMO, healthy companies usually balance R&D. So you would find a lot of companies have "junior", "senior", "principal", "director", "VP" prefixed to both tracks of titles.
You will sometimes see terms such as "Programmer" or "Architect" replacing "engineer"/"scientist". "Programmer" implies their job is more coding-related, i.e. the one who actually writes code. "Architect" is rare; they usually oversee big-picture issues among programmers, or act as a balance between R&D organizations."