In your life, there are times when you think you know something, yet genuine understanding seems to elude you. It's always frustrating, isn't it? For example, why do seemingly simple concepts such as gradients or regularization still throw us off, even though we have been learning them since Day 1 of machine learning?
In programming, there's a term called "grok". Grokking something usually means that not only do you know the term, but you also have an intuitive understanding of the concept. Or, as in "Zen and the Art of Motorcycle Maintenance", you just try to dive deep into a concept, as if it were a journey...... For example, if you really think about speech recognition, you would realize the frame independence assumption is very important: it simplifies the problem in both search and parameter estimation, yet it certainly introduces a modeling error. These small things, which are not mentioned in classes or lectures, are what you need to grok.
That brings us to Course 2 of deeplearning.ai. What do you grok in this course? After you take Course 1, should you take Course 2? My answer is yes, and here is my reasoning.
Really, What is Gradient Descent?
Gradient descent is a seemingly simple subject - say you want to find a minimum of a convex function, so you follow the gradient downhill and, after many iterations, you eventually hit the minimum. Sounds simple, right?
Of course, you soon realize that functions are normally not convex, that they are n-dimensional, and that there can be plateaus. Or you follow the gradient, but it happens to point in a wrong direction, so you zigzag as you descend. It's a bit like descending a real mountain, except you can't see in n-dimensional space!
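To make the zigzag concrete, here is a tiny sketch of plain gradient descent on a badly scaled 2-D quadratic (my own toy example, not from the course): with a learning rate near the stability limit, the steep direction oscillates while the shallow direction crawls.

```python
import numpy as np

# f(x, y) = 0.5 * (x**2 + 10 * y**2): a badly scaled quadratic bowl
def grad(p):
    return np.array([p[0], 10.0 * p[1]])

p = np.array([10.0, 1.0])
lr = 0.18  # near the stability limit 2/10 = 0.2, so the y-coordinate zigzags
for _ in range(100):
    p = p - lr * grad(p)

print(p)  # both coordinates creep toward the minimum at (0, 0)
```

Print `p` inside the loop and you will see the y-coordinate flip sign on every step - exactly the zigzagging described above.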
That explains the early difficulty of deep learning development - stochastic gradient descent (SGD) was just too slow for DNNs back in 2000. That led to very interesting research on the restricted Boltzmann machine (RBM), which was stacked and used to initialize a DNN - a prominent subject of Hinton's NNML after Lecture 8. This pretraining is still used in some recipes in speech recognition as well as financial prediction.
But we are not doing RBM any more! In fact, research on RBMs is not as fervent as it was in 2008. Why? It has to do with people simply understanding SGD better and being able to run it better - it has to do with initialization, e.g. Glorot's and He's initialization. It also has to do with how gradient descent is done - Adam is our current best.
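As an illustration of why initialization matters: He initialization simply scales random weights by sqrt(2/n_in) so that ReLU activations neither explode nor vanish with depth. A minimal numpy sketch (the layer sizes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(n_in, n_out):
    # He et al. (2015): draw from N(0, 2/n_in) to keep ReLU activations well-scaled
    return rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)

W = he_init(784, 256)  # hypothetical layer: 784 inputs, 256 units
print(W.std())         # should sit near sqrt(2/784), roughly 0.05
```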
So how do you learn this stuff? Before Ng's deeplearning.ai class, I would say knowledge like this was spread out over courses such as cs231n or cs224n. But as I mentioned in my Course 1 review, those are really courses with specific applications in mind. Or you can read Michael Nielsen's Neural Networks and Deep Learning. Of course, Nielsen's work is a book, so it really depends on whether you have the patience to work through the details while reading. (Also see my review of the book.)
Now you don't have to. The one-stop shop is Course 2. Course 2 actually covers the material I just mentioned, such as initialization and gradient descent, as well as deeper concepts such as regularization and batch normalization. That's why I recommend you keep taking the course after you finish Course 1. If you take the class, and are also willing to read Sebastian Ruder's review of SGD or Gabriel Goh's Why Momentum Really Works, you will be much ahead of the game.
As a note, I also like how Andrew breaks down many of the SGD variants as smoothing algorithms. That was a new insight for me, even after I had used SGD many times.
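That smoothing view is easy to see in code: momentum is just an exponentially weighted moving average of noisy minibatch gradients. Below is my own minimal illustration (not Andrew's notation) on a 1-D quadratic with artificial gradient noise:

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_grad(w):
    # gradient of f(w) = 0.5 * w**2, plus noise to mimic minibatch sampling
    return w + rng.normal(scale=0.1)

w, v = 5.0, 0.0
beta, lr = 0.9, 0.1  # beta smooths over roughly 1 / (1 - beta) = 10 past gradients

for _ in range(300):
    v = beta * v + (1 - beta) * noisy_grad(w)  # exponential moving average
    w = w - lr * v

print(w)  # hovers near the minimum at 0, despite the noise
```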
Is it hard?
Nope. As far as the math goes, Course 1 is probably the toughest. Of course, even in Course 1, you will finish the coursework faster if you don't overthink the problems. Most notebooks have the derived results for you. On the other hand, if you do want to derive the formulae yourself, you need decent skill in matrix calculus.
Is it Necessary to Understand These Details? Also, Top-Down vs. Bottom-Up Learning: Which is Better?
A legitimate question here is: in the current state of deep learning, we have so many toolkits which already implement techniques such as Adam. Do I really need to dig so deep?
I think there are always two approaches to learning. One is top-down, which in deep learning perhaps means reading a bunch of papers, learning the concepts and seeing if you can wrap your head around them. The fast.ai class is one of them, and 95% of current AI enthusiasts follow such a path.
What's the problem with the top-down approach? Let me go back to my first paragraph - do you really grok something when you learn it top-down? I frequently can't. In my work life, I have also heard senior people say that top-down is the way to go. Yet when I went ahead and checked whether they truly understood an implementation, they frequently couldn't give a satisfactory answer. That happens to a lot of senior technical people who later turn to management. Literally, they lose their touch.
On the other hand, every time I pop open an editor and write an algorithm, I gain tremendous understanding! For example, I was once asked to write forward inference in C - you'd better know what you are doing when you write in C! In fact, I have come to the opinion these days that you have to implement an algorithm once before you can claim you understand it.
So how come there are two sides to this opinion? One of my speculations is that back in the 80s/90s, students were often taught to get a program right in the first draft. That creates a mindset that you have to think up a perfect program before you start to write one. Of course, in ML such a mindset is highly impractical, because the ML development process is really experimental. You can't always assume you have perfected the settings before you try something.
Another equally dangerous mindset is to say "if you are too focused on details, then you will miss the big picture and won't come up with something new!" I heard this a lot when I first did research, and it's close to the most BS-ty thing I've ever heard. If you want to come up with something new, the first thing you should learn is all the details of existing works. The so-called "big picture" and "details" are always interconnected. That's why in the AIDL forum, we never see the young kids who say "Oh, I have this brand-new idea, completely different from all previous works!" go anywhere. You always learn how to walk before you run, and knowing the details has no downsides.
Perhaps this is my long-winded reason why Ng's class is useful for me, even after I have read much of the literature. I distrust people who only talk about theory but don't show any implementation.
This concludes my review of Course 2. Many people, after they take Course 1, just decide to take Course 2. I don't blame them, but you should always ask whether your time is well spent.
To me, though, taking Course 2 is not just about understanding more of deep learning. It is also my hope to grok some of the seemingly simple concepts in the field. I hope my review is useful, and I will keep you all posted when my Course 3 review is done.
 As Pirsig said - it's really not about motorcycle maintenance.
 Strictly speaking, it is the conditional frame independence assumption. But practitioners in ASR frequently just call it the frame independence assumption.
 Also see HODL's interview with Ruslan Salakhutdinov; his first-hand account covers the rise and fall of the RBM.
As you all know, Prof. Ng has a new specialization on Deep Learning. I wrote about the course extensively yet informally, including two "Quick Impressions" before and after I finished Courses 1 to 3 of the specialization. I also wrote three posts just on Heroes of Deep Learning, covering Prof. Geoffrey Hinton, Prof. Yoshua Bengio, Prof. Pieter Abbeel and Dr. Yuanqing Lin. And Waikit and I started a study group, Coursera deeplearning.ai (C. dl-ai), focused on just the specialization. This is my full review of Course 1 after finishing all the videos. I will describe what the course is about and why you want to take it. There are already a few very good reviews (from Arvind and Gautam). I will write based on my experience as the admin of AIDL, as well as a deep learning learner.
The Most Frequently Asked Question in AIDL
If you don't know, AIDL is one of the most active Facebook groups on the matter of A.I. and deep learning. So what is the most frequently asked question (FAQ) in our group? Well, nothing fancy:
How do I start deep learning?
In fact, we get asked that question daily, and I have personally answered it more than 500 times. Eventually I decided to create an FAQ - which basically points back to "My Top-5 List", a list of resources for beginners.
The Second Most Important Class
That brings us to the question: what should be the most important class to take? Oh well, for 90% of learners these days, I would first recommend Andrew Ng's "Machine Learning", which is good for both beginners and more experienced practitioners (like me). Luckily for me, I took it around 2 years ago and have benefited from the class ever since.
But what's next? What would be a good second class? That has always been the question on my mind. Karpathy's cs231n comes to mind, or maybe Socher's cs224[dn] is another choice. But they are too specialized in their subfields. E.g., if you view them from the perspective of general deep learning, the material in both classes on model architecture is incomplete.
Or you can think of a general class such as Hinton's NNML. But that class confuses even PhD friends I know. Indeed, asking beginners to learn restricted Boltzmann machines is just too much. The same can be said of Koller's PGM. Hinton's and Koller's classes, to be frank, are quite advanced. It's better to take them once you already know the basics of ML.
That narrows us down to several choices which you might already be considering: first is fast.ai by Jeremy Howard, second is the deep learning specialization from Udacity. But in my view, those classes also seem to miss something essential - e.g., fast.ai adopts a top-down approach, and that's not how I learn. I always love to approach a technical subject from the ground up. E.g., if I want to study string search, I would want to rewrite some classic algorithms such as KMP. And for deep learning, I always think you should start with a good implementation of back-propagation.
That's why, for a long time, my Top-5 List picked cs231n and cs224d as the second and third classes. They were the best I could come up with after researching ~20 DL classes. Of course, deeplearning.ai changed my belief that either cs231n or cs224d should be the best second class.
Learning Deep Learning by Program Verification
So what is so special about deeplearning.ai? Just like Andrew's Machine Learning class, deeplearning.ai follows an approach I would call program verification. What that means is that instead of guessing whether your algorithm is right just by staring at the code, deeplearning.ai gives you an opportunity to come up with your own implementation, provided that it matches the official one.
Why is that important? First off, let me say that not everyone believes this is the right approach. E.g., back when I started, many well-intentioned senior scientists told me that such a matching approach is not really good experimentally: if your experiment has randomness, you should simply run it N times and calculate the variance. Matching would remove this experimental aspect of your work.
So I certainly understand the scientists' point. But in practice, it was a huge pain in the neck to verify whether your program was correct. That's why in most of my work I adopt the matching approach. You learn a lot about the numerical properties of an algorithm this way, and once you follow this approach, you will also get ML tasks done efficiently.
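Gradient checking is a classic instance of this matching mindset: instead of staring at your backprop code, you match the analytic gradient against a finite-difference estimate. A minimal sketch on a toy loss:

```python
import numpy as np

def loss(w):
    return np.sum(w ** 3)              # toy "loss" function

def analytic_grad(w):
    return 3 * w ** 2                  # hand-derived gradient

def numeric_grad(w, eps=1e-5):
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2 * eps)  # central difference
    return g

w = np.array([1.0, -2.0, 0.5])
diff = np.abs(analytic_grad(w) - numeric_grad(w)).max()
print(diff)  # tiny: the two gradients "match", so the derivation is likely right
```

If the two disagree beyond roughly 1e-6 here, you almost certainly have a bug in the hand-derived gradient.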
But can you learn another way? Nope, you've got to have some practical experience in implementation. Many people advocate learning just by reading papers, or just by running pre-prepared programs. I always think that misses the point - you lose a lot of understanding if you skip the implementation.
What do you Learn in Course 1?
For the most part, implementing the feed-forward (FF) and back-propagation (BP) algorithms from scratch. Since most of us just use frameworks such as TF or Keras, such from-scratch implementation experience is invaluable. The nice thing about the class is that the mathematical formulation of BP is fine-tuned so that it is suitable for implementation in Python numpy, the course's designated language.
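To give a flavor of what such an exercise looks like (my own toy sketch, not the course's notebook code), here is the forward and backward pass of a one-hidden-layer ReLU network in numpy, fitting a few random points with plain gradient descent:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 3))        # 4 samples, 3 features
y = rng.standard_normal((4, 1))        # regression targets

W1 = rng.standard_normal((3, 5)) * 0.5
W2 = rng.standard_normal((5, 1)) * 0.5

losses = []
for _ in range(1000):
    # forward pass
    h = np.maximum(0.0, X @ W1)        # ReLU hidden layer
    pred = h @ W2
    losses.append(float(np.mean((pred - y) ** 2)))

    # backward pass: hand-derived gradients of the MSE loss
    dpred = 2.0 * (pred - y) / len(y)
    dW2 = h.T @ dpred
    dh = dpred @ W2.T
    dW1 = X.T @ (dh * (h > 0))         # gate the gradient through the ReLU

    W1 -= 0.05 * dW1
    W2 -= 0.05 * dW2

print(losses[0], losses[-1])  # the loss should drop as the tiny net fits 4 points
```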
Wow, Implementing Back Propagation from scratch? Wouldn't it be very difficult?
Not really; in fact, many members finish the class in less than a week. So here is the key: while many of us call it a from-scratch implementation, it is in fact highly guided. All the tough matrix differentiation is done for you, and there are strong hints on which numpy functions you should use. At least for me, the homework was very simple. (Also see Footnote )
Do you need to take Ng's "Machine Learning" before you take this class?
That's preferable but not mandatory, although without knowing the more classical view of ML, you won't be able to understand some of the ideas in the class - e.g. the different ways bias and variance are viewed. In general, all good-old machine learning (GOML) techniques are still used in practice, and learning them doesn't seem to have any downsides.
You may also notice that both "Machine Learning" and deeplearning.ai cover neural networks. So is the material duplicated? Not really. deeplearning.ai guides you through the implementation of multi-layer deep neural networks, which IMO requires a more careful and consistent formulation than a simple network with one hidden layer. So doing both won't hurt, and in fact you will likely have to implement a certain method multiple times in your life anyway.
Wouldn't this class be too Simple for Me?
So another question you might ask: if the class is so simple, does it even make sense to take it? The answer is a resounding yes. I am quite experienced in deep learning (~4 years by now) and I have been learning machine learning since college. I still found the course very useful, because it offers many insights which only industry experts know. And of course, when a luminary such as Andrew speaks, you do want to listen.
In my case, I also wanted to take the course so that I could write reviews about it and my colleagues at Voci could ask me questions. But even with that in mind, I still learned several new things from listening to Andrew.
That's what I have so far. Follow us on Facebook AIDL, I will post reviews of the later courses in the future.
 So what is a true from-scratch implementation? Perhaps you write everything in C, even the matrix manipulation part?
If you like this message, subscribe to the Grand Janitor Blog's RSS feed. You can also find me (Arthur) on Twitter, LinkedIn, Plus and Clarity.fm. Together with Waikit Lau, I maintain the Deep Learning Facebook forum. Also check out my awesome employer: Voci.
Nov 29, 2017: revised the text once. Mostly rewriting the clunky parts.
Oct 16, 2017: fixed typos and misc. changes.
Oct 14, 2017: first published
(I wrote it back in Feb 14, 2017.)
I have had some leisure time lately to browse "Deep Learning" by Goodfellow for the first time. Since it is known as the bible of deep learning, I decided to write a short afterthought post; the notes are in point form and not too structured.
* If you want to learn the zen of deep learning, "Deep Learning" is the book. In a nutshell, "Deep Learning" is an introductory-style textbook on nearly every contemporary field in deep learning. It has a thorough chapter on backprop, and perhaps the best introductory material on SGD, computational graphs and convnets. So the book is very suitable for those who want to further their knowledge after going through 4-5 introductory DL classes.
* Chapter 2 is supposed to go through the basic math, but it's unlikely to cover everything the book requires. PRML Chapter 6 seems to be a good preliminary before you start reading the book. If you don't feel comfortable with matrix calculus, perhaps you want to read "Matrix Algebra" by Abadir as well.
* There are three parts to the book. Part 1 is all about the basics: math, basic ML, backprop, SGD and such. Part 2 is about how DL is used in real-life applications. Part 3 is about research topics such as E.M. and graphical models in deep learning, and generative models. All three parts deserve your time. The math and general ML in Part 1 may be better replaced by a more technical text such as PRML, but the rest of the material is deeper than the popular DL classes. You will also find relevant citations easily.
* I enjoyed Parts 1 and 2 a lot, mostly because they are deeper and filled with interesting details. What about Part 3? While I don't quite grok all the math, Part 3 is strangely inspiring. For example, I noticed a comparison of graphical models and NNs, and a discussion of how E.M. is used in latent models. Of course, there is an extensive survey of generative models, covering difficult models such as the deep Boltzmann machine, spike-and-slab RBM and many variations. Reading Part 3 makes me want to learn classical machine learning techniques, such as mixture models and graphical models, better.
* So I will say you will enjoy Part 3 if you are:
- a DL researcher in unsupervised learning and generative models, or
- someone who wants to squeeze out the last bit of performance through pre-training, or
- someone who wants to compare other methods, such as mixture models or graphical models, with NNs.
Anyway, that's all I have. Maybe I will write a fuller summary in a blog post later on, but enjoy these random thoughts for now.
(I am editing my site, so I decide to separate the book list into a separate page.)
I am often asked what the best beginner books on machine learning are. Here I list several notable references, which are usually known as "Bibles" of the field. Also read the comments on why they are useful and how you might read them.
Pattern Recognition and Machine Learning by Christopher Bishop
One of the most popular and useful references in general machine learning, it is also the toughest book to read on this list. Generally known as PRML, Pattern Recognition and Machine Learning is a comprehensive treatment of several important and relevant machine learning techniques, such as neural networks, graphical models and boosting. There are in-depth discussions as well as supplementary exercises on each technique.
The book is very Bayesian, and rightly so, because Bayesian thinking is very useful in practice. E.g., its treatment of bias-variance is as a "frequentist illusion", which is a more advanced viewpoint than what most beginner classes offer. (I think only Hinton's class fairly discusses the merits of the Bayesian approach.)
While it is a huge tome, I would still consider it a beginner book, because it doesn't really touch on all the important issues in every technique - e.g., there is no in-depth discussion of sequential minimal optimization (SMO) for SVMs. It is also not a deep learning/deep neural network book; for that, Goodfellow and Bengio's book seems a much better read.
If you want to reap the benefits of this book, consider doing the exercises at the back of the book. Sure, it will take you a while, but doing any one of the exercises will give you incredible insight into how different machine learning techniques work.
Pattern Classification 3rd Edition by R. Duda, P.E. Hart and D.G Stork
Commonly known as "Duda and Hart", its 2nd edition, titled "Pattern Classification and Scene Analysis", was better known as the bible of pattern classification. Of course, nowadays "machine learning" is the trendier term, and in my view the two topics are quite similar.
The book is a highly technical (and perhaps terse) description of machine learning, which I found more senior scientists usually referred to back when I was working at Raytheon BBN.
Compared to PRML, I find "Duda and Hart" slightly outdated, but its treatment of linear classifiers is still very illuminating. The 3rd edition is updated with computer exercises. Since I usually learn an algorithm by looking directly at either the original paper or source code, I found these exercises less useful. But some of my first mathematical drilling in pattern recognition (back in the 2000s) did come from the guided exercises of this book, so I still recommend it to beginners.
Machine Learning by Tom Mitchell
Compared to PRML and Duda & Hart, Mitchell's book is much shorter and more concise, and thus more readable. It is also more "rule-based", so there are discussions of concept learning, decision trees, etc.
If you want to read an entire book on machine learning, this could be your first choice. Both PRML and Duda & Hart are not for the faint of heart. While Mitchell's book is perhaps less relevant for today's purposes, I still found its discussions of decision trees and artificial neural networks very illuminating.
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos
You can think of it as popular science non-fiction. It's also a great introduction to several schools of thought in machine learning.
Books I Have Heard Are Good
- Hastie/Tibshirani/Friedman's Elements of Statistical Learning
- Barber's Bayesian Reasoning and Machine Learning
- Murphy's Machine Learning: a Probabilistic Perspective
- MacKay's Information Theory, Inference and Learning Algorithms
- Goodfellow/Bengio/Courville's Deep Learning - the only one on this list which is related to deep learning. (See my impression here.)
More Advanced Books (i.e. they are good, but I don't fully grok them)
- Perceptrons: An Introduction to Computational Geometry, Expanded Edition by Marvin Minsky and Seymour Papert - an important book which changed the history of neural network development.
- Parallel Models of Associative Memory by Geoff Hinton - another book of historical interest.
Hi Guys, I recently finished the coursework of Synapses, Neurons and Brains (SNB below). Since neuroscience is really not my expertise, I just want to write a "Quick Impression" post to summarize what I learned:
* Idan Segev is an inspiring professor, and you can feel his passion for the topic of the brain throughout the class.
* Prof. Segev is fond of computational neuroscience, and thus of the simulation approach to the connectome. Hence, perhaps, the topics taught in the class: the Hodgkin-Huxley model, Rall's cable model, dendritic computation and the Blue Brain Project.
* Compared to Fairhall and Rao's computational neuroscience course, which has a general sense of applying ML approaches/thinking to neuroscience, SNB has a stronger emphasis on discussing motivating neuroscientific experiments, such as Hubel and Wiesel's when discussing the neocortex, and the "squid experiment" when developing the HH model. So I found it very educational.
* The course also touches on seemingly more philosophical issues such as "Can you download a mind?", "Is it possible to read a mind?" and "Is there such a thing as free will?" Prof. Segev presents his point of view and supporting experiments. I don't want to spoil it; check it out if you like.
* Finally, on the coursework: it's all multiple choice and you can retry up to 10 times per 8 hours, but this is a tough course to pass. The course features many multi-answer multiple-choice questions and doesn't give you any feedback on your mistakes, so you need to understand the course material quite well to get them right.
* Some students even complained that some of the questions don't make sense - I think that goes a bit too far. But it's fair to say that the course hasn't really been well maintained in the last 2 years or so, and you don't really see any mentors chime in to help students. That could be a downside for all of us learners.
* But I would say I still learned a lot in the process. So I do recommend listening through the lectures if you are into neuroscience. Maybe what you should decide is whether you want to finish all the coursework.
That's what I have. Enjoy!
Re Dorin Ioniţă (also see a longer write-up, Sergey Zelvenskiy's post): whenever people ask me about an AI winter, I can't help but think of online poker in 2008 and web programming in 2000. But let me just focus on web programming.
Around 1995-2001, people kept telling you that "web programming" was the future. Many young people were told that if you knew HTML and CGI programming, you would have a bright future. That's not untrue. In fact, if you got good at web programming in 2000, you probably started a company and made a decent living for... 3-4 years. But then competition arose, and many colleges started to include the web in their core curriculum - as a result, web programming is a fairly common skill nowadays. I am not saying it is not useful, but you are usually competing with 100 programmers for one job.
So back to AI. Since we started to realize AI/DL can be useful, everyone is jumping on the bandwagon. Of course, there are more senior people who have been 'there' - who joined a couple of DARPA projects or worked at ML startups years before deep learning. But most newcomers are frankly young college kids who are trying to build a future with AI/DL. (Check out our forum.) For them, I am afraid it's likely that they will encounter a future similar to that of web programmers in 2000: the supply of labor will one day surpass the demand. So it's very likely that data science/machine learning is not their final destination.
So am I arguing there is an AI winter coming? Not in the old, classical sense of "AI winter", when research funding dried up. It's more about AI as a product in a product cycle: like every technology, it will go through a hype cycle, and one day, when the reality of the product doesn't meet expectations, things will just bust. It's just the way it is. We can argue to death whether deep learning is different or not, but you should know that every technology follows a similar hype cycle. Some last longer, some don't. We will have to wait and see.
For the OP: if you are asking for career advice, here is something I learned from poker (tl;dr story) and many other things in life - if you are genuinely smart, you can always learn a new topic quicker than other people. That's usually what determines whether you can make a living. The rest is luck, karma and whether you buy beers for your friends.
Rephrase: How to come up with an idea in A.I. or Machine Learning?
1. What are other people doing, and is it possible to make a twist on it?
2. What is a problem which *you* want to solve in your life? Then think: is there any way AI/ML can help you? Everyone has one - e.g., I really want to make an old-style, Nintendo Final Fantasy-type game. But drawing the bitmap character graphics takes an insane amount of time. So is there any way A.I. can help me? Yes, one potential idea is to create an image generator.
Would these ideas work? Who knows? But that's how you come up with ideas - you ignore the feasibility part for the moment. If you find it really hard to come up with ideas, chances are you are too caught up in the technical field. Read some books, listen to music, make some art and daydream a bit. Then the ideas will come.
- Connectionist Temporal Classification <- the book
- But I found that Graves's thesis is easier to follow - e.g., the definitions of alpha and beta in the book don't make sense to me.
- A few of Alex Graves's papers. (here, here, here)
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin - Baidu's production system based on CTC
- Flat Start Training of CD-CTC-SMBR LSTM RNN Acoustic Models
- Very good explanation on the Math by Andrew Gibiansky: http://andrew.gibiansky.com/blog/machine-learning/speech-recognition-neural-networks/
- A comprehensive explanation of CTC on Distill.
- Attention-based seq2seq model:
- END-TO-END ATTENTION-BASED LARGE VOCABULARY SPEECH RECOGNITION
- Work from Bengio's group
- Listen, Attend and Spell by William Chan (his thesis)
- A very good presentation by Markus Nussbaum-Thom.
- Wav2letter: https://www.openreview.net/pdf?id=BkUDvt5gg
- EESEN: https://github.com/srvk/eesen
- Stanford-CTC: https://github.com/amaas/stanford-ctc
- Warp-CTC from Baidu: https://github.com/baidu-research/warp-ctc
- Mozilla's implementation
- Neon's implementation
For reference, here are some papers on the hybrid approach:
Hours are one of the taboo topics in the tech industry. I can say a couple of things, hopefully not fluffy:
- Most hours are self-reported, so from a data perspective they are really unclean. Funny story: since I was 23, I have worked on weekends regularly, so in my past jobs there were times when I noted down the hours of colleagues who claimed to work 60+ hours. What really happened is they only worked 35-40. Most of them were stunned when I gave them the measurement, and a few of them refused to talk with me afterwards. (Oh, I worked for some of them too.)
- Then there is the question of what it means to work long hours (60+ hours). Practically, you should wonder why that's the case. How come one can't just issue a Unix command to solve a problem? Or, if you know what you are doing, how come writing a 2000-word note takes more than 8 hours? How come it takes such a long time to solve your weekly issues? If we talk about coding, it also doesn't make sense: once you have the breakdown of a coding problem, you just have to solve it iteratively in small chunks. Usually that doesn't take more than 2 hours.
- So here is a realistic portrait of the respectable people I have worked with who feel like they work long hours. What do they actually do?
1. They do some work every day, even on holidays/vacations/weekends.
2. They respond to you even at hours such as 1 or 2 a.m.
3. They look agitated when things go wrong in their projects.
- Now, once you really analyze these behaviors, they don't really prove that the person works N hours. What they really mean is that the person is available all the time. As for the agitation part, it makes more sense to say, "Oh, this guy probably has an anger issue, but at least he cares."
- Sadly, there are also many people who really do work more than 40 hours, but they are also among the least effective people I have ever known.
- I should mention the more positive side of long hours: first off, learning. My guess is that is what the job description really means - you spend all your spare moments learning. You might code daily, but if you don't learn, your speed won't improve at all. So this extra cost of learning is always worth paying, and that's why we always encourage members to learn.
- Before I go: I actually follow the scheduling method from "Learning How to Learn", i.e. I take frequent breaks after 45-60 minutes of intense work. And my view of productivity is to continuously learn, because new skills usually improve your workflow. Some of my past employers had huge issues with my approach, so you should understand that my view is biased.
- I would also add that there are individuals who can really work 80 hours and actually code. Usually they are either obliged by culture, influenced by drugs or shaped by their very special genes.
Hope this helps,