# Some Notes on Building a DL-Machine (Installing CUDA 8.0 on an Ubuntu 14.04)

I mostly just followed this guide by Slav Ivanov.  The build is for a friend, so nothing fancy.  The only difference is that my friend got a Titan X instead of a 1080, and he requires Ubuntu 14.04.

As with any Linux installation, you can't follow the instructions as if they were cast in stone.   So what did I do differently?  Here is what I did:

1. sudo apt-get update
2. sudo apt-get --assume-yes install tmux build-essential gcc g++ make binutils
   sudo apt-get --assume-yes install software-properties-common
   sudo apt-get --assume-yes install git

(Notice that unlike Slav, I didn't do an apt-get upgrade, because upgrading can easily break the CUDA 8.0 installation later on.)

3. Here is a major difference: Slav suggested installing CUDA directly. No, no, no.  What you should do is make sure the driver for your graphics card is installed first, and Ubuntu/Nvidia has good support for that.  Following this thread, I found that a Titan X requires updating the driver to nvidia-367.  So I just did sudo apt-get install nvidia-367.
4. At this point, if you reboot, you will notice that 14.04 recognizes the display card. Usually that means the display is at the right resolution.   (If the driver is not installed properly, you will see a display with oversized icons, etc.)
5. Now you can test your setup by typing nvidia-smi.  Normally the screen looks like this one.  If you are running within a GUI, there should be at least one process running on the GPU.
6. All good.  Now that you have the driver for the display card, you can really follow Slav's procedure:
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_8.0.61-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-toolkit-8.0
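After these commands finish, the toolkit usually lands in /usr/local/cuda-8.0 (the default location for the .deb packages).  A minimal sketch of the environment setup you would typically append to ~/.bashrc, assuming that default path:

```shell
# Assumes the .deb install placed CUDA in the default /usr/local/cuda-8.0
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

After sourcing ~/.bashrc, nvcc --version should report release 8.0.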
7. This is the point where I stopped, and I left it to my friend to install more software.   Usually, installing the display card driver and CUDA are the toughest steps in a Linux deep learning build, so the rest should be quite smooth.

Arthur Chan

Acknowledgement: Thanks for the great post by Slav Ivanov!

# Review of Ng's deeplearning.ai Course 4: Convolutional Neural Networks

(You can find my reviews on previous courses here: Course 1, Course 2 and Course 3. )

Time flies; I finished Course 4 around a month ago and finally have a chance to write a full review.   Course 4 is different from the first three deeplearning.ai courses, which focused on a fundamental understanding of deep learning topics such as back propagation (Course 1), tuning hyperparameters (Course 2) and deciding which improvement strategy is best (Course 3).  Course 4 is more about an important application of deep learning: computer vision.

Focusing on computer vision subjects Course 4 to a distinct challenge as a course: how does it stack up against existing computer vision classes?   Would it be comparable to the greats such as Stanford's cs231n?  To answer these questions, I will compare Course 4 and cs231n in this article.   My goal is to help you choose between the two classes in your learning process.

# Convolutional Neural Network In the Context of Deep Learning

Convolutional neural networks (CNN) have a very special place in deep learning.   For the most part, you can think of a CNN as an interesting special case of a vanilla feed-forward network with its parameters tied. Computationally, you can parallelize it much better than techniques such as recurrent neural networks.   Of course, it is prominent in image classification (since LeNet-5).   But it is also frequently used in sequence modeling, such as speech recognition and text classification (check out cs224n for details).   More importantly, since image classification is also used as a template for development in many newer applications, learning CNNs is all but mandatory for students of deep learning.
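To make the "feed-forward network with tied parameters" view concrete, here is a minimal numpy sketch (the names are my own, not from any course): every output position applies the same small weight matrix, and that sharing is exactly the parameter tying.

```python
import numpy as np

def conv2d(image, kernel):
    # Naive "valid" convolution as used in deep learning (really cross-correlation).
    # The same kernel weights are reused at every position: tied parameters.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((2, 2)) / 4.0          # a 2x2 averaging filter
print(conv2d(x, k).shape)          # (3, 3)
```

Unrolling the loops into one big matrix multiply recovers the "special case of a feed-forward layer" view, just with most weights forced to zero or forced equal.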

# Learning Deep-Learning-based Computer Vision before deeplearning.ai

Interestingly enough, there is a rather standard option for learning deep-learning-based computer vision online.   Yes! You guessed it right! It is cs231n, which used to be taught by the then-Stanford PhD candidate Andrej Karpathy in 2015/16.   [1]  To recap, cs231n is not only a good class for computer vision; it is also a good class for learning the basics of deep learning.   As the now-famous Dr. Karpathy said, it probably has one of the best explanations of back-propagation.    My only criticism of the class (as I mentioned in earlier reviews) is that, as a first class, it is too focused on image recognition.   But as a first class in deep-learning-based computer vision, I think it was the best.

# Course 4: Convolutional Neural Networks Briefly

Would Course 4 change my opinion about cs231n then?   I guess we should look at it in perspective.   Comparing Course 4 with cs231n is comparing apples and oranges.  Course 4 is a month-long class suitable for absolute beginners.   If you look into it, Course 4 is basically a quick introductory class: Week 1 focuses on what a CNN is, Weeks 2 and 3 talk about two prominent applications, image classification and object detection, whereas Week 4 is about fun stuff such as face verification and image style transfer.

Many people I know finished the class within 3 days of its start.   cs231n, in contrast, is a semester-long course which contains ~18 hours of video to watch, with more substantial (and difficult) homework problems.   It is more suitable for people who already have at least one or two full machine learning courses under their belt.

So my take is that Course 4 can be a good first class in deep-learning-based computer vision, but it is not a replacement for cs231n.  If you only take Course 4, you will find that there is still a lot in computer vision you don't grok.   My advice is to audit cs231n afterward; otherwise your understanding will still have holes.

# What if I already took cs231n? Would Course 4 still help me?

Absolutely.   While Course 4 is much shorter, remember that a lot of deep learning concepts are obscure, and it doesn't hurt to learn the same thing in different ways.    Course 4 also offers different perspectives on several topics:

• For starters, Course 4, just like all other deeplearning.ai courses, has homework which requires code verification at every step.  As I argued in an earlier review, that's a huge plus for learning.
• Then there is the treatment of individual topics.  I found Ng's treatment of object detection refreshing: the more conventional view (which cs231n took) was to start from R-CNN and its two faster variants, then bring up YOLO.   But Andrew just decided to go with YOLO instead.   Notice that neither class gives a detailed description of the algorithm (reading the paper is probably best), but YOLO is indeed more practical than the R-CNN variants.
• Week 4's applications, such as face verification and Siamese networks, were actually new to me.   Andrew also gives a very nice explanation of why image style transfer really works.
• As always, even a new note on an old topic matters.  E.g., this is the first time I became aware that convolution in deep learning is different from convolution in signal processing (see Week 1).   I also found that Andrew's notes on various image classification papers are gems.  Even if you have read those papers, I suggest you listen to him again.
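The difference in the last point is easy to demonstrate: signal-processing convolution flips the kernel before sliding it, while the "convolution" in deep learning layers is really cross-correlation, with no flip.  A tiny numpy sketch (my own example, not from the course):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
k = np.array([1.0, 0.0, -1.0])

# Signal-processing convolution: the kernel is flipped before sliding.
sp_conv = np.convolve(x, k, mode='valid')        # array([2.])

# Deep-learning "convolution" is cross-correlation: no flip.
# Flipping the kernel ourselves makes np.convolve compute it.
dl_conv = np.convolve(x, k[::-1], mode='valid')  # array([-2.])

print(sp_conv, dl_conv)
```

For a symmetric kernel the two coincide, which is why the distinction is easy to miss; for learned (generally asymmetric) kernels it doesn't matter in practice, since the network simply learns the flipped weights.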

# Weakness(es)

Since I admin an unofficial forum for the course,  I learned that there are some fairly obvious problems with the courses.   For example, back in December when I took the course, there was one homework where the algorithm you needed to submit wouldn't match the notebook.   There was also a period of time when submission was very slow, and I had to fix the file downloading to straighten it up.   I do think those are frustrating issues.  Hopefully, by the time you read this article, the staff will have fixed them. [2]

To be fair, even the great NNML by Hinton has glitches here and there in its homework.   So I am not entirely surprised that glitches happen in deeplearning.ai.   Of course, I would still highly recommend the class.

# Conclusion

There you have it: my review of Course 4 of deeplearning.ai.  Unlike the earlier courses, Course 4 has a very obvious competitor: cs231n.  And I don't quite see Course 4 as the one course you can take to master computer vision.   My belief is that you need to go through both Course 4 and cs231n to have a reasonable understanding.

But as a first class in DL-based computer vision, I still think Course 4 has tremendous value.  So once again I highly recommend you all to take the class.

As a final note, I was able to catch up on reviews for all released classes in deeplearning.ai.  Now all eyes are on Course 5, and currently (as of Jan 23) it is set to launch on Jan 31.  Before that, do check out our forums, AIDL and Coursera deeplearning.ai, for more discussion!

Arthur Chan

First published at http://thegrandjanitor.com/2018/01/24/review-of-ngs-deeplearning-ai-course-4-convolutional-neural-networks/

If you like this message, subscribe to the Grand Janitor Blog's RSS feed. You can also find me (Arthur) on Twitter, LinkedIn, Plus, or Clarity.fm. Together with Waikit Lau, I maintain the Deep Learning Facebook forum.  Also check out my awesome employer: Voci.

Footnotes:

[1] Funnily enough, while I went through all the cs231n 2016 videos a while ago, I never wrote a review of the course.

[2] As a side note, I think it has to do with Andrew and the staff probably rushing to create the class.   That's why I was actually relieved when I learned that Course 5 will be released in January.  Hopefully this gives the staff more time to perfect the class.

# Review of Ng's deeplearning.ai Course 3: Structuring Machine Learning Projects

(Also see my review of Course 1 and Course 2.)

As you might know, deeplearning.ai courses were released in two batches.  The first batch contains Courses 1 to 3, and only recently (as of November 15) was Course 4, "Convolutional Neural Networks", released.  Course 5 is supposed to be released in late November.   So Course 3, "Structuring Machine Learning Projects", was the "final" course of the first batch.  It is also a good pause between the first and second halves of the specialization:  the first half was more about the foundations of deep learning, whereas the second half is more about applications of deep learning.

So here you are, learning something new in deep learning.  Isn't it time to apply this newfound knowledge?  Course 3 says "Hold on!"  It turns out that before you start to do machine learning, you need to slow down and think about how to plan a task.

In fact, in practice, Course 3 is perhaps the most important course in the whole specialization.   The math in Course 1 may be tougher, and Course 5 has difficult concepts such as RNNs and LSTMs which are hard to grok.  Those courses are also longer than Course 3 (which lasts only 2 weeks). But in the grand scheme of things, they are not as important as Course 3.  I am going to discuss why.

# What do you actually do as an ML Engineer?

Let me digress a bit: I know many of my readers are young college students who are looking for careers in data science or machine learning.  But what do people actually do in the business of machine learning or AI?   I think this is a legitimate question, because I was very confused when I first started out.

Oh well, it really depends on how much you are on the development side or the research side of your team.  Terms like "research" and "development" can have various meanings depending on the title.  But you can think of "researchers" as the people who try to get a new technique working; usually the criterion is whether it beats the status quo, such as accuracy performance. "Developers", on the other hand, are the people who come up with a production implementation.     Many ML jobs really lie somewhere on the spectrum between "developer" and "researcher".   E.g., I am usually known for my skill as an architect, which usually means I have knowledge on both sides.  My quote on my skills is usually "50% development and 50% research".  There are also people who are highly specialized on either side.  But I will focus more on the research side in this article.

# So, What do you actually do as an ML Researcher then?

Now I can see a lot of you jump up and say "OH I WANT TO BE A RESEARCHER!"  Yes, because doing research is fun, right?   You just need to train some models and beat the baseline and write a paper.  BOOM! In fact, if you are good, you just need to ask people to do your research.  Woohoo, you are happy and done, right?

Oh well, in reality, good researchers are usually fairly good coders themselves.   Especially in an applied field such as machine learning, my guess is that out of 100 researchers in an institute, there is perhaps 1 person who is really a "thinking staff", i.e. someone who does nothing other than come up with new theory or write proposals.   Just like you, I admire the life of a pure academician.  But in our time, you usually have to be both very smart and very lucky to be one of them. (There is a tl;dr explanation here, but it is out of scope for this article.)

"Okay, okay, got it..... so can we start to have some fun now?   We just need to do some coding, right?" Not really. The first step, before you can work on fun stuff such as modeling or implementing a new algorithm, is to clean up data.   Say you work on fraudulent transaction detection: the first step is to load a giant table somewhere so that you can query it and get the training data.  Then you want to clean the data and massage it so that it can be an input to an ML engine.   Notice that these tasks can be non-trivial by themselves.

# Course 3: Structuring Machine Learning Projects

Then there you are: after you code and clean up your data, you finally have some time to do machine learning.    Notice that your time after all these "chores" is actually quite limited.    That makes using your time effectively a very important topic.   And here comes why you want to take Course 3: Andrew teaches you the basics of how to allocate time and resources in a deep learning task.   E.g., how large should your train/validation/test sets be?  When should you stop your development?   What is human-level performance?   What if there are mismatches between your train set and test set?   If you are stuck, should you tune your hyperparameters more, or should you regularize?

In a way, Course 3 is reminiscent of "Machine Learning"'s Week 6 and Week 11: basically, what you learn is how to make good "meta-decisions" for all the projects you will work on in your lifetime.  I also think it's the right stuff for your ML career.

One final note: as you might notice in my last two reviews, I usually try to compare deeplearning.ai with other classes.   But Course 3 is quite unique, so you might only find similar material in machine learning courses which focus on theory.   Ng's treatment is unique in two ways: first, the advice he gives is practical and easy to understand.  Second, his advice is focused on deep learning, even when we are talking about general principles.   Working on deep learning usually implies special circumstances, such as being close to human-level performance, or having low performance on both train and test sets.  Those scenarios did appear in the past, but only in cutting-edge ML evaluations involving the best ML teams.  So you don't normally hear about them in a course, but now Andrew tells you all.  Isn't that alone worth the price of \$49? 🙂

# Conclusion

So here you have it.  This is my review of Course 3 of deeplearning.ai.  Surprisingly even to me, I actually wrote more than I expected for this two-week course.   Perhaps the main reason is that I really wish this course had been there, say, 3 years ago.  It would have changed the course of some projects I developed.

Maybe it's too late for me..... but if you are early in deep learning, do recognize the importance of Course 3, or any advice you hear similar to what Course 3 teaches.  It will save you much time, not just on one ML task but on the many ML tasks you will work on in your career.

Arthur Chan


# Review of Ng's deeplearning.ai Course 2: Improving Deep Neural Networks

(Also see my reviews of Course 1 and Course 3.)

In your life, there are times you think you know something, yet genuine understanding seems to elude you.  It's always frustrating, isn't it?   For example, why do seemingly simple concepts such as gradients or regularization keep throwing us off, even though we have been learning them since Day 1 of machine learning?

In programming, there's a term called "grok".  Grokking something usually means that not only do you know the term, but you also have an intuitive understanding of the concept.    Or, as in "Zen and the Art of Motorcycle Maintenance" [1], you just try to dive deep into a concept, as if it is a journey...... For example, if you really think about speech recognition, you would realize that the frame independence assumption [2] is very important, because it simplifies the problem in both search and parameter estimation.  Yet it certainly introduces a modeling error.  These small things, which are not mentioned in classes or lectures, are things you need to grok.

That brings us to Course 2 of deeplearning.ai.  What do you grok in this course?  After you take Course 1, should you take Course 2?  My answer is yes, and here is my reasoning.

# Really, What is Gradient Descent?

Gradient descent is a seemingly simple subject: say you want to find a minimum of a convex function, so you follow the gradient downhill, and after many iterations you eventually hit the minimum.  Sounds simple, right?

Of course, then you start to realize that functions are normally not convex, they are n-dimensional, and there can be plateaus.  Or you follow the gradient, but it happens to point in a bad direction, so you zigzag as you try to descend.   It's a little bit like descending from a real mountain, except you really can't see in n-dimensional space!

That explains the early difficulty of deep learning development: stochastic gradient descent (SGD) was just too slow for DNNs back in the 2000s. That led to very interesting research on the restricted Boltzmann machine (RBM), which was stacked and used to initialize DNNs (a prominent subject of Hinton's NNML after Lecture 8), i.e. pretraining, which is still being used in some recipes in speech recognition as well as financial prediction.

But we are not doing RBM any more! In fact, research on RBMs is not as fervent as it was in 2008. [3] Why? It has to do with people simply understanding SGD better and running it better: it has to do with initialization, e.g. Glorot's and He's initialization.   It also has to do with how gradient descent is done; ADAM is our current best.
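As an illustration of what these initialization schemes do (a sketch with my own naming, not code from any course), He et al.'s scheme draws weights with variance 2/fan_in, chosen to keep activation variance roughly constant through ReLU layers:

```python
import numpy as np

def he_init(fan_in, fan_out, rng=None):
    # He et al. initialization: zero-mean Gaussian with variance 2 / fan_in,
    # designed so activations through ReLU layers neither blow up nor vanish.
    if rng is None:
        rng = np.random.default_rng(0)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W = he_init(512, 256)
print(W.std())  # close to sqrt(2/512), about 0.0625
```

Glorot's scheme is the same idea with variance scaled by fan_in and fan_out together, which suits tanh/sigmoid layers better.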

So how do you learn this stuff?  Before Ng's deeplearning.ai class, I would say such knowledge was spread out over courses such as cs231n or cs224n.  But as I mentioned in my Course 1 review, those are really courses with specific applications in mind.  Or you can go read Michael Nielsen's Neural Networks and Deep Learning.   Of course, Nielsen's work is a book, so it really depends on whether you have the patience to work through the details while reading.  (Also see my review of the book.)

Now you don't have to.  The one-stop shop is Course 2.  It actually covers the material I just mentioned, such as initialization and gradient descent, as well as deeper concepts such as regularization and batch normalization.   That makes me recommend that you keep taking the courses after you finish Course 1.  If you take the class, and are also willing to read Sebastian Ruder's review of SGD or Gabriel Goh's Why Momentum Really Works, you will be much ahead of the game.

As a note, I also like how Andrew breaks down many of the SGD variants as smoothing algorithms.   That was a new insight for me, even after I had used SGD many times.
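That smoothing view can be sketched in a few lines (my own toy example, not from the course): momentum keeps an exponential moving average of past gradients and steps along that average instead of along the raw gradient.

```python
import numpy as np

def grad(w):
    # gradient of f(w) = 0.5 * w**2, whose minimum is at w = 0
    return w

w, v = 5.0, 0.0
lr, beta = 0.1, 0.9
for _ in range(500):
    # exponential smoothing of the gradient: this is the momentum term
    v = beta * v + (1 - beta) * grad(w)
    w -= lr * v

print(abs(w) < 1e-3)
```

RMSProp and ADAM apply the same exponential smoothing trick, to the squared gradient and to both quantities respectively.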

# Is it hard?

Nope.  As far as the math goes, Course 1 is probably the toughest.  Of course, even in Course 1, you will finish the coursework faster if you don't overthink the problems; most notebooks have the derived results for you.  On the other hand, if you do want to derive the formulae yourself, you do need decent skill in matrix calculus.

# Is it Necessary to Understand These Details? Also: Top-Down vs. Bottom-Up Learning, Which is Better?

A legitimate question here is: in the current state of deep learning, we have so many toolkits which already implement techniques such as ADAM.  Do I really need to dig this deep?

I do think there are always two approaches to learning.  One is top-down, which in deep learning perhaps means reading a bunch of papers, learning the concepts and seeing if you can wrap your head around them.  The fast.ai class is one of them, and 95% of current AI enthusiasts are following such a path.

What's the problem with the top-down approach?  Let me go back to my first paragraph: do you really grok something when you learn it top-down?  I frequently can't.   In my work life, I have also heard senior people say that top-down is the way to go.  Yet when I went ahead to check whether they truly understood an implementation, they frequently couldn't give a satisfactory answer.  That happens to a lot of senior technical people who later turn to management.   Literally, they lose their touch.

On the other hand, every time I pop open an editor and write an algorithm, I gain tremendous understanding!   For example, I was once asked to write forward inference in C, and you'd better know what you are doing when you write in C!   In fact, these days I have come to the opinion that you have to implement an algorithm once before you can claim you understand it.

So how come there are two sides to this opinion then?  One of my speculations is that back in the 80s/90s, students were often taught to write a program correctly in the first draft.  That created the mindset that you have to think up a perfect program before you start to write one.   Of course, in ML, such a mindset is highly impractical, because the ML development process is really experimental.  You can't always assume you can perfect the settings before you try something.

Another equally dangerous mindset is to say, "If you are too focused on details, then you'll miss the big picture and won't come up with something new!"  I heard this a lot when I first did research, and it's close to the most BS-ty thing I've heard.  If you want to come up with something new, the first thing you should do is learn all the details of existing works.  The so-called "big picture" and "details" are always interconnected.  That's why in the AIDL forum, the young kids who say "Oh, I have this brand new idea, which is completely different from all previous works!" never go anywhere.  You always learn to walk before you run, and knowing the details has no downsides.

Perhaps these are my long-winded reasons why Ng's class is useful for me, even after I have read much of the literature.  I distrust people who only talk about theory but never show an implementation.

# Conclusion

This concludes my review of Course 2.  Many people, after they take Course 1, just decide to take Course 2.  I don't blame them, but you always want to ask whether your time is well spent.

To me, though, taking Course 2 is not just about understanding more about deep learning.  It is also my hope to grok some of the seemingly simple concepts in the field.   I hope my review is useful, and I will keep you all posted when my Course 3 review is done.

Arthur

Footnotes:
[1] As Pirsig said - it's really not about motorcycle maintenance.

[2] Strictly speaking, it is conditional frame independence assumption.  But practitioners in ASR frequently just called it frame independence assumption.

[3] Also see HODL's interview with Ruslan Salakhutdinov; his account of the rise and fall of the RBM is first-hand.

# Review of Ng's deeplearning.ai Course 1: Neural Networks and Deep Learning

(See my reviews on Course 2 and Course 3.)

As you all know, Prof. Ng has a new specialization on deep learning. I wrote about the courses extensively yet informally, including two "Quick Impressions" before and after I finished Courses 1 to 3 of the specialization.  I also wrote three posts just on Heroes of Deep Learning, covering Prof. Geoffrey Hinton, Prof. Yoshua Bengio, and Prof. Pieter Abbeel and Dr. Yuanqing Lin.    Waikit and I also started a study group, Coursera deeplearning.ai (C. dl-ai), focused on just the specialization.    This is my full review of Course 1 after watching all the videos.   I will describe what the course is about and why you would want to take it.   There are already a few very good reviews (from Arvind and Gautam).  I will write based on my experience as the admin of AIDL, as well as a deep learning learner.

# The Most Frequently Asked Question in AIDL

If you don't know, AIDL is one of the most active Facebook groups on the subject of A.I. and deep learning.  So what is the most frequently asked question (FAQ) in our group?  Well, nothing fancy:

How do I start deep learning?

In fact, we get asked that question daily, and I have personally answered it more than 500 times.   Eventually I decided to create an FAQ, which basically points back to my "Top-5 List", a list of resources for beginners.

# The Second Most Important Class

That brings us to the question: what is the most important class to take?   Oh well, for 90% of learners these days, I would first recommend Andrew Ng's "Machine Learning", which is good both for beginners and for more experienced practitioners (like me).  Luckily for me, I took it around 2 years ago and have benefited from the class ever since.

But what's next? What would be a good second class?  That's always the question on my mind.   Karpathy's cs231n comes to mind, or maybe Socher's cs224[dn] is another choice.    But they are too specialized in their subfields.   E.g., if you view them from the standpoint of general deep learning, the material in both classes on model architecture is incomplete.

Or you could consider a general class such as Hinton's NNML.  But that class confuses even PhD friends of mine.  Indeed, asking beginners to learn restricted Boltzmann machines is just too much.   The same can be said of Koller's PGM.   Hinton's and Koller's classes, to be frank, are quite advanced; it's better to take them once you already know the basics of ML.

That narrows us to several choices which you might already be considering: the first is fast.ai by Jeremy Howard, the second is the deep learning specialization from Udacity.   But in my view, those classes also seem to miss something essential.  E.g., fast.ai adopts a top-down approach, but that's not how I learn.  I always love to approach a technical subject from the ground up.  E.g., if I want to study string search, I would want to rewrite some classic algorithms such as KMP.  And for deep learning, I always think you should start with a good implementation of back-propagation.

That's why for a long time, my Top-5 List picked cs231n and cs224d as the second and third classes.   They are the best I could think of after researching ~20 DL classes.    Of course, deeplearning.ai has changed my belief that cs231n or cs224d should be the best second class.

# Learning Deep Learning by Program Verification

So what is so special about deeplearning.ai? Just like Andrew's "Machine Learning" class, deeplearning.ai follows an approach I would call program verification.   What that means is that instead of guessing whether your algorithm is right just by staring at the code, deeplearning.ai gives you the opportunity to come up with your own implementation, provided that it matches the official one.

Why is this important?  First off, let me say that not everyone believes this is the right approach.   E.g., back when I started, many well-intentioned senior scientists told me that such a matching approach is not really good experimentally: if your experiment has randomness, you should simply run it N times and calculate the variance, and matching would remove this experimental aspect of your work.

I certainly understand the scientists' point.  But in practice, it is a huge pain in the neck to verify whether your program is correct.  That's why in most of my work I adopt the matching approach.  You need to learn a lot about the numerical properties of an algorithm this way, but once you follow this approach, you will also get ML tasks done efficiently.
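A tiny example of this matching style of verification (the function names are mine): check a new implementation against a trusted reference with a numerical tolerance rather than by staring at the code.

```python
import numpy as np

def softmax_ref(z):
    # straightforward reference implementation
    e = np.exp(z)
    return e / e.sum()

def softmax_stable(z):
    # candidate implementation: subtracting the max improves numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
# the "program verification" step: the candidate must match the reference
assert np.allclose(softmax_ref(z), softmax_stable(z))
print("implementations match")
```

The tolerance in np.allclose is what makes this workable for floating-point code, where bitwise equality is too strict a target.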

But can you learn in another way? Nope, you have to have some practical experience in implementation.  Many people advocate learning just by reading papers, or just by running pre-prepared programs.  I always think that misses the point: you lose a lot of understanding if you skip implementation.

# What do you Learn in Course 1?

For the most part, implementing the feed-forward (FF) and back-propagation (BP) algorithms from scratch.  Since most of us just use frameworks such as TF or Keras, such from-scratch implementation experience is invaluable.  The nice thing about the class is that the mathematical formulation of BP is fine-tuned so that it is suitable for implementation in Python numpy, the course's designated language.
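To give a flavor of what such a from-scratch exercise looks like (this is my own minimal sketch, not the course's notebook), here is a one-hidden-layer network trained on XOR with hand-written forward and backward passes:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the classic problem a network without a hidden layer cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    # forward pass
    a1 = np.tanh(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    # backward pass; with cross-entropy loss, dL/dz2 simplifies to (a2 - y)
    dz2 = (a2 - y) / len(X)
    dW2 = a1.T @ dz2; db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (1 - a1 ** 2)   # tanh'(z) = 1 - tanh(z)**2
    dW1 = X.T @ dz1; db1 = dz1.sum(axis=0, keepdims=True)
    # gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

loss = -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))
print("final loss:", loss)
```

The course derives the matrix shapes of dW1, dW2, etc. for you; the exercise is mostly translating those formulae into numpy correctly.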

# Wow, Implementing Back Propagation from scratch?  Wouldn't it be very difficult?

Not really.  In fact, many members finish the class in less than a week.  So here is the key: when many of us call it a from-scratch implementation, it is in fact highly guided.  All the tough matrix differentiation is done for you, and there are strong hints on which numpy functions you should use.   At least for me, the homework was very simple. (Also see footnote [1].)

# Do you need to take Ng's "Machine Learning" before you take this class?

That's preferable but not mandatory.  Without knowing the more classical view of ML, though, you won't be able to understand some of the ideas in the class, e.g. the different ways bias and variance are viewed.   In general, all good-old machine learning (GOML) techniques are still used in practice, so learning them doesn't have any downsides.

You may also notice that both "Machine Learning" and deeplearning.ai cover neural networks.   So is the material duplicated?  Not really.  deeplearning.ai guides you through the implementation of multi-layer deep neural networks, which IMO requires a more careful and consistent formulation than a simple network with one hidden layer.  So doing both won't hurt, and in fact it's likely that you will have to implement a certain method multiple times in your life anyway.

# Wouldn't this class be too Simple for Me?

So another question you might ask: if the class is so simple, does it even make sense to take it?   The answer is a resounding yes.  I am quite experienced in deep learning (~4 years by now) and have been learning machine learning since college.  I still found the course very useful, because it offers many insights which only industry experts know.  And of course, when a luminary such as Andrew speaks, you do want to listen.

In my case, I also wanted to take the course so that I could write reviews about it and my colleagues at Voci could ask me questions.  But even with that in mind, I still learned several new things just by listening to Andrew.

# Conclusion

That's what I have so far.   Follow us on Facebook at AIDL; I will post reviews of the later courses in the future.

Arthur

[1] So what is a true from-scratch implementation? Perhaps one where you write everything in C, even the matrix manipulation part?

If you like this message, subscribe to the Grand Janitor Blog's RSS feed. You can also find me (Arthur) on Twitter, LinkedIn, Plus, and Clarity.fm. Together with Waikit Lau, I maintain the Deep Learning Facebook forum.  Also check out my awesome employer: Voci.

History:
Nov 29, 2017: revised the text once. Mostly rewriting the clunky parts.
Oct 16, 2017: fixed typos and made misc. changes.
Oct 14, 2017: first published

# Some Useful Links on Neural Machine Translation

Some good resources for NNMT

Tutorial:

A bit special: Tensor2Tensor uses a novel architecture instead of a pure RNN/CNN decoder/encoder.   It gives a surprisingly large gain, so it's likely that it will become a trend in NMT in the future.

Important papers:

• Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation by Cho et al. (link) - A very innovative and smart paper by Kyunghyun Cho.  It also introduces the GRU.
• Sequence to Sequence Learning with Neural Networks by Ilya Sutskever et al. (link) - By Google's researchers, and perhaps the first to show that an NMT system is comparable to the traditional pipeline.
• Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (link)
• Neural Machine Translation by Jointly Learning to Align and Translate by Dzmitry Bahdanau et al. (link) - The paper which introduced attention.
• Neural Machine Translation by Minh-Thang Luong (link)
• Effective Approaches to Attention-based Neural Machine Translation by Minh-Thang Luong et al. (link) - On how to improve the attention approach with local attention.
• Massive Exploration of Neural Machine Translation Architectures by Britz et al. (link)
• Recurrent Convolutional Neural Networks for Discourse Compositionality by Kalchbrenner and Blunsom (link)
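
Since the Cho et al. paper above is also where the GRU was introduced, a minimal numpy sketch of one GRU step may help; the parameter names are my own, biases are omitted for brevity, and note that conventions differ on which gate keeps the old state (this follows Cho et al. 2014):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU update: update gate z, reset gate r, candidate state
    h_tilde, then an interpolation between old and candidate states.
    In Cho et al.'s formulation, z keeps the old state."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev)
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev)
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev))
    return z * h_prev + (1 - z) * h_tilde

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
# Input-to-hidden matrices start with "W", hidden-to-hidden with "U".
p = {k: rng.standard_normal((d_h, d_in if k.startswith("W") else d_h)) * 0.1
     for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}

h = np.zeros(d_h)
for t in range(5):                    # run over a toy 5-step sequence
    h = gru_step(rng.standard_normal(d_in), h, p)
print(h.shape)  # (3,)
```

The reset gate lets the unit drop its history when starting a new phrase; the update gate lets gradients flow through long spans, which is why the GRU became a standard encoder/decoder building block in NMT.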

Important Blog Posts/Web page:

Summarization:

Usage in Dialogue System:

Others: (unsorted, and seem less important)

# Some Preliminary Resources for Deep Learning On NLP

A list of Deep Learning on NLP resources - unsorted.

• fasttext
• https://github.com/shashankg7/Deep-Learning-for-NLP-Resources
• https://github.com/oxford-cs-deepnlp-2017
• https://github.com/keon/awesome-nlp#implementation

# Quick Impression on deeplearning.ai's "Heroes of Deep Learning" with Prof. Yoshua Bengio

Quick Impression on deeplearning.ai's "Heroes of Deep Learning". This time it is the interview with Prof. Yoshua Bengio. As always, don't post any copyrighted material here at the forum!

* Out of the 'Canadian Mafia', Prof. Bengio is perhaps the least known among the three. Prof. Hinton and Prof. LeCun have their own courses, and as you know they work for Google and Facebook respectively. While Prof. Bengio does work for MS, his role is more that of a consultant.

* You may know him as one of the coauthors of the book "Deep Learning". But then again, who really understands that book, especially Part III?

* Whereas Prof. Hinton strikes me as an eccentric polymath, Prof. Bengio is more of a conventional scholar. He was influenced by Hinton in his early study of AI, which at the time was mostly expert-system based.

* That perhaps explains why everyone seems to leave his interview out, which I found very interesting.

* He named several of his group's contributions, and most of what he named were fundamental results: Glorot and Bengio 2010, now widely called Xavier initialization; attention in machine translation; his early work on neural network language models; and, of course, the GAN from Goodfellow. All are more technical results. But once you think about these ideas, they are about understanding, rather than trying to beat the current records.

* Then he said a few things about early deep learning research that surprised me. First, on depth: as it turns out, the benefit of depth was not as clear in the early 2000s. That's why when I graduated with my Master's (2003), I had never heard of the revival of neural networks.

* And then there was the doubt about using ReLU, which is the current-day staple of convnets. But the reason makes so much sense: ReLU is not smooth at all points of R, so would that cause a problem? Anyone who knows some calculus would rationally have that doubt.

* His idea on learning deep learning is also quite on point - he believes you can learn DL in 5-6 months if you have the right training, i.e. a good computer science and math education. Then you can pick up DL by taking courses and reading proceedings of ICML.

* Finally, there is his current research on the fusion of neural networks and neuroscience. I found this part fascinating. Is backprop really used in the brain as well?
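
On the ReLU point above: ReLU is indeed not differentiable at zero, but in practice implementations simply pick a subgradient there, and training works fine. A tiny numpy sketch (my own illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    """The usual convention: gradient 1 for x > 0, 0 otherwise.
    At x == 0 the derivative does not exist; any value in [0, 1]
    is a valid subgradient, and implementations just pick one."""
    return (x > 0).astype(float)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))       # the kink at x = 0 is where smoothness fails
print(relu_grad(x))  # the gradient jumps from 0 to 1 across x = 0
```

Since the input to a unit is almost never exactly zero during training, the single non-smooth point turns out not to matter, which is why the early rational doubt proved unfounded.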

That's what I have. Hope you enjoy!

# Quick Impression on deeplearning.ai (After Finishing Coursework)

Following experienced guys like Arvind Nagaraj and Gautam Karmakar, I just finished all the coursework for deeplearning.ai. I haven't finished all the videos yet. But it's a good idea to write another "impression" post.

* It took me about 10 days of clock time to finish all the coursework. The actual work only took around 5-6 hours. I guess my experience speaks for many veteran members at AIDL.
* python numpy has its quirks. But if you know R or Matlab/Octave, you are good to go.
* The assignments of Course 1 guide you through building an NN "from scratch". Course 2 guides you through implementing several useful initialization/regularization/optimization algorithms. They are quite cute - you mostly just fill in the right code in python numpy.
* I quoted "from scratch" because you don't actually need to write your own matrix routines. So this "from scratch" is quite different from writing an NN package "from scratch in C", in which you probably need to write some matrix-manipulation code and derive a set of formulas for your codebase. Ng's course gives you a taste of how these programs feel. In that regard, perhaps the next best thing is Michael Nielsen's NNDL book.
* Course 3 is quiz-only, so it is by far the easiest to finish. Just like Arvind and Gautam, I think it is the most intriguing course within the series (so far), because it gives you a lot of big-picture advice on how to improve an ML system. Some of this advice was new to me.
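
To give a flavor of what such a numpy "from scratch" exercise looks like (this is my own toy sketch, not the actual course assignment), here is a one-hidden-layer network trained on XOR with hand-derived gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the classic task a network with no hidden layer cannot solve.
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)   # shape (2, 4)
Y = np.array([[0, 1, 1, 0]], dtype=float)   # shape (1, 4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 tanh units, one sigmoid output unit.
W1 = rng.standard_normal((4, 2)) * 0.5; b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)) * 0.5; b2 = np.zeros((1, 1))
lr, m = 0.5, X.shape[1]
losses = []

for _ in range(5000):
    # Forward pass.
    A1 = np.tanh(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    losses.append(-np.mean(Y * np.log(A2 + 1e-12)
                           + (1 - Y) * np.log(1 - A2 + 1e-12)))
    # Backward pass; cross-entropy + sigmoid makes dZ2 simple.
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m; db2 = dZ2.mean(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)      # tanh'(z) = 1 - tanh(z)^2
    dW1 = dZ1 @ X.T / m; db1 = dZ1.mean(axis=1, keepdims=True)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(losses[0], "->", losses[-1])   # loss drops as the net learns
```

Note that the matrix multiplications themselves are delegated to numpy; a true from-scratch C implementation would have to supply those too, which is exactly the distinction drawn above.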

Anyway, that's what I have. Once I watch all the videos, I will also come up with a full review. Before that, go check out our study group "Coursera deeplearning.ai"!

Thanks,
Arthur Chan​

# Quick Impression on deeplearning.ai Heroes of Deep Learning - Geoffrey Hinton

So I was going through deeplearning.ai. You know we started a new FB group on it? We haven't made it public yet, but yes, we are very excited.

Now one thing you might notice about the class is that there are these optional lectures in which Andrew Ng interviews luminaries of deep learning. Those lectures, in my view, are very different from the course lectures. Most of the topics mentioned are research, and beginners would find them very perplexing. So I think these lectures deserve separate sets of notes. I still call it a "quick impression" because usually I do around 1-2 layers of literature search before I'd say I grok a video.

* Sorry I couldn't post the video because it is copyrighted by Coursera, but it should be very easy for you to find. Of course, respect our forum rules and don't post the video here.

* This is a very interesting 40-min interview with Prof. Geoffrey Hinton. Perhaps it should also be seen as optional material after you finish his Coursera class NNML.

* The interview is at research level. That means you would understand more if you took NNML or read part of Part III of "Deep Learning".

* There is some material you have heard from Prof. Hinton before, including how he became an NN/brain researcher, how he came up with backprop, and why he is not the first one to come up with it.

* There is also some material that was new to me, like why his and Rumelhart's paper was so influential. Oh, it has to do with his first experience with a marriage relationship (Lecture 2 of NNML).

* The role of Prof. Ng in the interview is quite interesting. Andrew is also a giant in deep learning, but Prof. Hinton is more the founder of the field. So you can see that Prof. Ng was trying to understand several of Prof. Hinton's thoughts, such as 1) Does back-propagation appear in the brain? 2) The idea of capsules, which are a distributed representation of a feature vector and allow a kind of what Hinton calls "agreement". 3) Unsupervised learning such as VAE.

* On Prof. Hinton's favorite ideas, and not to my surprise: 1) Boltzmann machines, 2) stacking RBMs into an SBN, 3) variational methods. I frankly don't fully understand Pt. 3, but L10 to L14 of NNML are all about Pts. 1 and 2. Unfortunately, not everyone loves to talk about Boltzmann machines - they are not as hot as GANs, and are perceived as not useful at all. But if you want to understand the origin of deep learning, and one way to pre-train your DNN, you should take NNML.

* Prof. Hinton's advice on research is also very entertaining - he suggests you don't always read up on the literature first, which according to him is good for creative researchers.

* The part I like most is Prof. Hinton's view of why computer science departments are not catching up on teaching deep learning. As always, his words are penetrating. He said, "And there's a huge sea change going on, basically because our relationship to computers has changed. Instead of programming them, we now show them, and they figure it out."

* Indeed, when I first started out at work, thinking as an MLer was not regarded as cool - programming was cool. But things are changing. And we at AIDL are embracing the change.

Enjoy!

Arthur Chan