All posts by grandjanitor

A Read on "Deep Neural Networks for Acoustic Modeling in Speech Recognition" by Hinton et al.

This is a read on "Deep Neural Networks for Acoustic Modeling in Speech Recognition" by Hinton et al.

* This is the now-classic paper in deep learning: it is the first time people confirmed that deep learning can improve ASR significantly. It is important in the fields of both deep learning and ASR. It's also one of the first papers I read on deep learning back in 2012-13.

* Many people trace the origin of deep learning to image recognition, e.g. many kids would tell you stories about ImageNet, AlexNet and the history from then on. But the first important application of deep learning is perhaps speech recognition.

* So what was going on with ASR before deep learning then? For the most part, if you could come up with a technique that cuts a state-of-the-art system's WER by 10% relative, your PhD thesis was good. If your technique could consistently beat previous techniques on multiple systems, you usually got a fairly good job at a research institute in the Big 4.

* The only technique I recall doing better than a 10% relative improvement is discriminative training. It got ~15% in many domains. That happened back in 2003-2004. In ASR, the term "discriminative training" has a very complicated connotation, so I am not going to explain it much here. This just gives you the context of how powerful deep learning is.

* You might be curious what "relative improvement" is. E.g. suppose your original WER is 18% and you improve it to 17%; then your relative improvement is 1%/18% = 5.56%. So a 10% relative improvement really means you go down to 16.2%. (Yes, ASR is that tough.)
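To make the arithmetic concrete, here is a tiny sketch in plain Python (the numbers are just the ones from the example above):

    def relative_improvement(baseline_wer, new_wer):
        # Relative WER improvement: the absolute drop divided by the baseline.
        return (baseline_wer - new_wer) / baseline_wer

    # 18% WER improved to 17% is only ~5.56% relative.
    print(relative_improvement(0.18, 0.17))   # 0.0555...

    # A 10% *relative* improvement over an 18% baseline lands at 16.2% WER.
    print(0.18 * (1 - 0.10))                  # 0.162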

* So here comes replacing the GMM with a DNN. These days, it sounds like a no-brainer, but back then it was a huge deal. Many people in the past tried to stuff various ML techniques in to replace the GMM, but no one could successfully beat the GMM-HMM baseline. So this is innovative.

* Now, on how the GMM replacement is set up - the ancestor of this work traces back to Bourlard and Morgan's "Connectionist Speech Recognition", in which the authors came up with a context-independent HMM system by replacing VQ scores with a shallow neural network. At that time, the units were chosen to be CI states.

* Hinton's, and perhaps Deng's, thinking is interesting: the units were chosen to be context-dependent states. Now that's a real change, and it reflects how modern HMM systems are trained.
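For context, here is my own recap (a minimal sketch, not code from the paper) of the standard hybrid trick that makes a posterior-producing network usable inside an HMM decoder: divide the network's state posteriors by the state priors to get scaled likelihoods.

    import numpy as np

    def scaled_likelihoods(posteriors, state_priors):
        # Hybrid DNN-HMM trick: the network outputs P(state | frame), but the
        # HMM decoder wants p(frame | state). Dividing the posterior by the
        # state prior gives a likelihood up to a state-independent constant,
        # which can stand in for the GMM score.
        return posteriors / state_priors

    # Toy numbers: 3 CD states, priors estimated from the training alignment.
    posteriors = np.array([0.7, 0.2, 0.1])   # softmax output for one frame
    priors     = np.array([0.5, 0.3, 0.2])   # relative frequency of each state
    print(scaled_likelihoods(posteriors, priors))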

* Then there is how the network is really trained. Here you can see the early deep learners' stress on pre-training, because training was very expensive at that point. (I suspect GPUs weren't used.)

* Then there is the use of (cross-)entropy to train the model. Later on, in other systems, many people moved to sentence-level criteria for training. So in this sense, the paper is dated.

* None of this is trivial work. But the result is stellar: we are talking about an 18%-33% relative gain (p.14). To ASR people, that's unreal.

* The paper also foresees some future uses of DNNs, such as bottleneck features and articulatory features. You probably know the former already. The latter is more esoteric in ASR, so I am not going to talk about it much.

Anyway, that's what I have. Enjoy the reading!

A Read on "Regularized Evolution for Image Classifier Architecture Search"

(First appeared in AIDL-LD and AIDL Weekly.)

This is a read on "Regularized Evolution for Image Classifier Architecture Search", which is the paper version of AmoebaNet, the latest result in AutoML (see also this page: https://research.googleblog.com/…/using-evolutionary-automl…).

* If you recall, Google has already had several results on how to use RL and evolution strategies (ES) to discover model architectures in the past, e.g. NASNet is one of the examples.

* So what's new? The key idea is the so-called regularized evolution strategy. What does it mean?

* Basically it is a tweak of the more standard tournament selection, commonly used as a means of selecting individuals out of a population. (https://en.wikipedia.org/wiki/Tournament_selection)

* Tournament selection is not too difficult to describe:
- Choose random individuals from the population.
- Choose the best candidate according to certain optimizing criterion.

You can also use a probabilistic scheme to decide whether to use the second- or third-best candidate. You might also think of it as throwing away the worst N candidates.

* The AutoML paper calls this original method, by Miller and Goldberg (1995), the non-regularized evolution method.

* What is "regularized" then? Instead of throwing away the worst N candidates, the authors propose to throw away the oldest-trained candidate.
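If it helps, here is a minimal sketch of my own in plain Python (not the paper's code; mutate and fitness are stand-ins for architecture mutation and train-from-scratch evaluation) contrasting the two variants:

    import random
    from collections import deque

    def evolve(initial_population, cycles, sample_size, mutate, fitness,
               regularized=True):
        # Toy evolution loop. Each cycle: sample `sample_size` individuals,
        # copy-and-mutate the best one, then remove either the oldest
        # individual (regularized / aging evolution) or the worst one
        # (plain tournament-style selection).
        population = deque(initial_population)        # left end = oldest
        for _ in range(cycles):
            sample = random.sample(list(population), sample_size)
            parent = max(sample, key=fitness)
            population.append(mutate(parent))
            if regularized:
                population.popleft()                              # discard the oldest
            else:
                population.remove(min(population, key=fitness))   # discard the worst
        return max(population, key=fitness)

    # Toy usage: individuals are numbers, "accuracy" peaks at 1.0.
    best = evolve([random.random() for _ in range(20)], cycles=200, sample_size=5,
                  mutate=lambda x: x + random.gauss(0, 0.1),
                  fitness=lambda x: -abs(x - 1.0))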

* Now, you won't see a justification of why this method is better until the "Discussion" section. Okay, let's go with the authors' intended flow. As it turns out, the regularized method is better than the non-regularized method. E.g. on CIFAR-10, the evolved model is ~10% relatively better than either man-made models or NASNet. On ImageNet, it performs better than Squeeze-and-Excitation Net as well as NASNet. (Squeeze-and-Excitation Net was the ILSVRC 2017 winner.)

* One technicality when you read the paper is the G-X datasets: they are actually the gray-scale versions of the normal X data, e.g. G-CIFAR-10 is the gray-scale version of CIFAR-10. The authors' intention is probably twofold: 1) to scale the problem down, and 2) to avoid overfitting to only the standard test sets of the problems.

* Now, these are all great. But how come the "regularized" approach is better then? How would the authors explain it?

* I don't want to come up with a hypothesis. So let me just quote the last paragraph here: "Under regularized evolution, all models have a short lifespan. Yet, populations improve over longer timescales (Figures 1d, 2c,d, 3a–c). This requires that its surviving lineages remain good through the generations. This, in turn, demands that the inherited architectures retrain well (since we always train from scratch, the weights are not heritable). On the other hand, non-regularized tournament selection allows models to live infinitely long, so a population can improve simply by accumulating high-accuracy models. Unfortunately, these models may have reached their high accuracy by luck during the noisy training process. In summary, only the regularized form requires that the architectures remain good after they are retrained."

* And also: "Whether this mechanism is responsible for the observed superiority of regularization is conjecture. We leave its verification to future work."

A Read on "A Neural Attention Model for Abstractive Sentence Summarization" by A.M. Rush, Sumit Chopra and Jason Weston.

(First appeared in AIDL-LD and AIDL Weekly.)

This is a read on the paper "A Neural Attention Model for Abstractive Sentence Summarization" by A.M. Rush, Sumit Chopra and Jason Weston.

* Video: https://vimeo.com/159993537 . Github: https://github.com/facebookarchive/NAMAS

* The paper was written in 2015, and is more of a classic paper on NN-based summarization. It was published slightly later than the classic papers on NN-based translation, such as those written by Cho or Bahdanau. We assume you have some basic understanding of NN-based translation and attention.


* If you haven't worked on summarization, you can broadly think of techniques as extractive or abstractive. Given the text you want to summarize, "extractive" means you only use words from the input text, whereas "abstractive" means you can use any words you like, even words which are not in the input text.

* So this is why summarization is seen as a similar problem to translation: you just think of it as a "translation" from the original text to the summary.

* Section 2 is a fairly nice mathematical background of summarization. One thing to note: the video also brings up the noisy-channel formulation. But as Rush said, their paper completely does away with the noisy channel and does a direct mapping instead.

* The next nuance you want to look at is the type of LM and the encoder used. That can all be found in Section 3. E.g. it uses the feed-forward NNLM proposed by Bengio. Rush mentioned that he was trying an RNNLM, but at that time he got only a small gain. It feels like he could probably get better results if an RNNLM were used.

* Then there is the type of encoder; there is a nice comparison between bag-of-words and attention models. Since there are word embeddings, the "bag-of-words" is actually all the input words embedded down to a certain size. The attention model, on the other hand, is what we know today: it contains a weight matrix P which maps the context to the input.

* Here is an insightful note from Rush: "Informally we can think of this model as simply replacing the uniform distribution in bag-of-words with a learned soft alignment, P, between the input and the summary."
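Here is a minimal numpy sketch of that contrast (my own illustration with made-up shapes, not the authors' code): the bag-of-words encoder averages the input embeddings uniformly, while the attention encoder replaces the uniform weights with a learned soft alignment that depends on the summary context through P.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def bow_encoder(x_emb):
        # Bag-of-words encoder: uniform average of the input word embeddings.
        # x_emb: (input_len, emb_dim)
        return x_emb.mean(axis=0)

    def attention_encoder(x_emb, context_emb, P):
        # Attention encoder: the uniform weights become a learned soft
        # alignment over input positions, conditioned on the summary context.
        # x_emb: (input_len, emb_dim), context_emb: (ctx_dim,), P: (emb_dim, ctx_dim)
        scores = x_emb @ P @ context_emb     # one score per input position
        alignment = softmax(scores)          # soft alignment, sums to 1
        return alignment @ x_emb             # weighted average of the inputs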

* Section 4 is more on decoding. Since a Markov assumption was made in Section 2, decoding is simplified quite a lot. The authors were using beam search, so you can use tricks such as path combination (a sketch follows below).
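To illustrate why the Markov assumption helps, here is a small toy beam search of my own (not the paper's code): because the score of the next word depends only on the last few summary words, two partial summaries ending in the same context can be recombined, keeping only the better-scoring one.

    import heapq

    def beam_search(score_next, vocab, length, beam_width, context_size=2):
        # score_next(context, word) -> log-prob of `word` given only the last
        # `context_size` words (the Markov assumption). Hypotheses that share
        # the same recent context are recombined: only the best one survives,
        # since their futures are scored identically.
        beam = [(0.0, [])]                     # (total log-prob, word list)
        for _ in range(length):
            candidates = {}
            for logp, words in beam:
                context = tuple(words[-context_size:])
                for w in vocab:
                    new_logp = logp + score_next(context, w)
                    new_context = tuple((words + [w])[-context_size:])
                    best = candidates.get(new_context)
                    if best is None or new_logp > best[0]:
                        candidates[new_context] = (new_logp, words + [w])  # path combination
            beam = heapq.nlargest(beam_width, candidates.values(), key=lambda h: h[0])
        return max(beam, key=lambda h: h[0])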

* Another cute thing is that the authors also come up with a method to make the summarization more extractive. For that, they use a log-linear model to also weigh features such as unigram-to-trigram overlaps. See Section 5.
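Roughly, the reranking score becomes a weighted sum of the neural log-probability and a few hand-crafted extractive features. A toy version (the feature names and weights below are purely illustrative, not the paper's exact setup) might look like:

    def loglinear_score(nn_logprob, features, weights):
        # Log-linear combination: neural score plus weighted extractive features.
        return nn_logprob + sum(weights[k] * v for k, v in features.items())

    # Example: count how many unigrams/bigrams/trigrams of the candidate
    # summary also appear in the input text (this favors extraction).
    features = {"unigram_overlap": 7, "bigram_overlap": 3, "trigram_overlap": 1}
    weights  = {"unigram_overlap": 0.5, "bigram_overlap": 1.0, "trigram_overlap": 2.0}
    score = loglinear_score(-12.3, features, weights)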

* Why would the authors want to make the summarization more extractive? That probably has to do with the metric: ROUGE usually favors words which are extracted from the input text.

* We will stop at this point. Here are several interesting commentaries about the paper.

mathsyouth's: https://github.com/mathsyouth/awesome-text-summarization
Denny Britz': https://github.com/…/neural-attention-model-for-abstractive…

https://arxiv.org/abs/1509.00685

Some Notes on Building a DL-Machine (Installing CUDA 8.0 on an Ubuntu 14.04)

I mostly just followed this link by Slav Ivanov. The build is for a friend, so nothing fancy.  The only thing different is that my friend got a Titan X instead of a 1080, and he requires Ubuntu 14.04.

As a rule of Linux installation, you can't always follow the instructions as if they were cast in stone.   So what did I do differently?  Here is what I did:

  1. sudo apt-get update
  2. sudo apt-get --assume-yes install tmux build-essential gcc g++ make binutils
    sudo apt-get --assume-yes install software-properties-common
    sudo apt-get --assume-yes install git

    (Notice that, unlike Slav, I didn't do an upgrade, because upgrading seems to easily screw up the CUDA 8.0 installation later on.)

  3.  So this is a major difference: Slav suggested installing CUDA directly. No, no, no.  What you should do is make sure the driver of your graphics card is installed first.  And Ubuntu/Nvidia has good support for that.  Following this thread, I found that installing a Titan requires updating the driver to nvidia-367.  So I just did an apt-get install nvidia-367.
  4. At this point, if you reboot, you will notice that 14.04 recognizes the display card. Usually what that means is the display is in the right resolution.   (If the driver is not installed properly, you will find a display with oversized icons, etc.)
  5. So now you can test your setup by typing nvidia-smi.  Normally the screen would look like this one.  If you are running within a GUI, there should be at least one process running on the GPU.
  6. Now all is good: you have the driver of the display card, so you can really follow Slav's procedure:
    wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_8.0.61-1_amd64.deb
    sudo dpkg -i cuda-repo-ubuntu1404_8.0.61-1_amd64.deb
    sudo apt-get update
    sudo apt-get install cuda-toolkit-8.0
  7. This is the point where I stopped, and I left it to my friend to install more software.   Usually, installing the display card driver and CUDA are the toughest steps in a Linux DL-machine build.  So the rest should be quite smooth.

Arthur Chan

Acknowledgement: Thanks to Slav Ivanov for the great post!

Quick Impression on Waikit Lau's Crypto and Blockchain forum and MIT

* Hosted by Coach Wei. Our own Waikit Lau presented on blockchain, cryptocurrency and ICOs. Brian Subiranna and I were invited as guest panelists for the Q&A forum. Brian is planning to create a class on blockchain at MIT.

* About the forum: Coach Wei launched the MIT-Tsinghua Summit on Dec 30 last year. So this talk is part of the series: https://www.linkedin.com/…/mit-tsinghua-innovation-summit-…/

* And Waikit, as you might know, has been a successful serial entrepreneur and angel, and has been involved in several ICOs. He also co-admins the AIDL forum.

* In my view, Waikit gave a great presentation on the excitement around blockchain and cryptos. His ~50-min presentation had a couple of gists:
- The rise of protocol coins such as Ethereum.
- The potential business opportunity is comparable to the development of HTTP. Waikit used the metaphor that blockchain can be seen as TCP/IP, whereas applications built on top of blockchain can be thought of as HTTP.
- The current ambiguity about how ICOs should be regulated. Or more generally: should cryptos be seen as commodities or securities?

* Gems from the Q&A session. The crowd had many sharp questions for the panel. Here is a summary:

Q. Is there any value in blockchain without cryptocurrency?
A. Generally yes, from the panel. E.g. most chains can exist without the idea of mining. Mining probably makes more sense when parties within the network don't trust each other.

Q. What is the current state of decentralized exchanges?
A. Panel: Still under development. A lot of things need to happen to motivate a large one.

Q. Would quantum computing be a threat to blockchain?
A. Panel (Arthur): It could be, yet current quantum computing still has several technical roadblocks to solve before it is usable for applications, e.g. creating stabilized inputs for the QC. There are also counter-technologies such as quantum cryptography being developed. So we can't quite say QC would just kill blockchain even if it were developed.

Q. Should a chain be controlled by one or multiple parties?
A. Yet another issue which is hard to predict. Looking at the development of Ethereum and Bitcoin, having a benevolent dictator seems to make the Ethereum community more unified. But the fact that Bitcoin's community is so segmented that the buyers/users now have more say might motivate the adoption of speed-up algorithms.

Q. Would speeding up mining speed up transactions?
A. Unlikely. What would likely improve transaction speed are technologies such as Lightning.

That's what I have.  You can find the original thread at

https://www.facebook.com/groups/blockchainnation/

Arthur

Quick Impression of Course 5 of deeplearning.ai

Hey hey! As you know, Course 5 is just out. As always, I checked out the class to give you a quick impression of what it is about. So far, this new 3-week class looks very exciting. Here is my takeaway. Remember, I haven't started the class yet, but this should give you a sense of its scope and extent.

* Course 5 is mostly focused on sequence models. That includes the more mysterious models such as RNNs, GRUs and LSTMs. You will go through standard ideas such as vanishing gradients, which were actually first identified in RNNs, then go through GRU and LSTM afterward.

* The sequence of coverage is nice: covering GRU first, then LSTM, doesn't quite follow the historical order (Hochreiter & Schmidhuber first proposed LSTM in '97; Cho came up with GRU in 2014), but such an order makes more sense for pedagogical purposes. I don't want to spoil it, but this is also how Socher approaches the subject in cs224n. That makes me believe this is likely a course which would teach you RNNs well.
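If you want a peek ahead, a single GRU step is small enough to write out in numpy. Here is a bare sketch of the standard equations (my own notation, not necessarily the course's):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_step(x, h_prev, p):
        # One GRU step. x: (input_dim,), h_prev: (hidden_dim,).
        # p holds weight matrices W* (hidden x input), U* (hidden x hidden)
        # and bias vectors b* for the update gate, reset gate and candidate.
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])   # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])   # reset gate
        h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])
        return (1.0 - z) * h_prev + z * h_tilde   # blend old state and candidate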

* Week 1 will be all about RNNs, then Weeks 2 and 3 will be about word vectors and end-to-end structures. Would one week be enough for each topic? Not at all. But Andrew seems to give all the essentials of each topic: word2vec/GloVe for word vectors, the standard encoder-decoder structure for end-to-end schemes. Most examples are based on machine translation, which I think is appropriate. Other applications such as image captioning or speech recognition are possible, but they usually have details which are tough to cover in a first class.

* Would this class be everything you need on NLP? Very unlikely. You still need to take cs224n, or even the Oxford class, to get good. But just as Course 4 is a good intro to computer vision, this will be a good intro to NLP and, in general, to any topic which requires sequence modeling, such as speech recognition, stock analysis or DNA sequence analysis.

The course link can be found at https://www.coursera.org/learn/nlp-sequence-models

Hope this "Quick Impression" helps you!

https://www.coursera.org/specializations/deep-learning

Review of Ng's deeplearning.ai Course 4: Convolutional Neural Networks

(You can find my reviews on previous courses here: Course 1, Course 2 and Course 3. )

Time flies: I finished Course 4 around a month ago and finally have a chance to write a full review.   Course 4 is different from the first three deeplearning.ai courses, which focused on a fundamental understanding of deep learning topics such as back-propagation (Course 1), tuning hyperparameters (Course 2) and deciding which improvement strategy is best (Course 3).  Course 4 is more about an important application of deep learning: computer vision.

Focusing on computer vision subjects Course 4 to a distinct challenge as a course: how does it stack up against existing computer vision classes?   Would it be comparable with the greats such as Stanford's cs231n?  To answer these questions, I will compare Course 4 and cs231n in this article.   My goal is to help you decide between the two classes in your learning process.

Convolutional Neural Network In the Context of Deep Learning

Convolutional neural networks (CNNs) have a very special place in deep learning.   For the most part, you can think of a CNN as an interesting special case of a vanilla feed-forward network with parameters tied.  Computationally, you can parallelize it much better than techniques such as recurrent neural networks.   Of course, it is prominent in image classification (since LeNet-5), but it is also frequently used in sequence modeling such as speech recognition and text classification (check out cs224n for details).   More importantly, since image classification also serves as a template of development for many newer applications, learning CNNs is sort of mandatory for students of deep learning.
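To make "a feed-forward network with parameters tied" concrete, here is a tiny numpy sketch of my own: a 1-D convolution gives the same output as a dense layer whose weight matrix repeats one small filter on every row.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # a length-5 input "signal"
    w = np.array([0.2, 0.5, 0.3])             # one shared 3-tap filter

    # Convolutional view: slide the same filter across the input.
    conv_out = np.array([w @ x[i:i+3] for i in range(len(x) - 2)])

    # Feed-forward view: a dense layer whose weight matrix repeats the same
    # filter on every row (all other entries tied to zero).
    W_dense = np.zeros((3, 5))
    for i in range(3):
        W_dense[i, i:i+3] = w
    dense_out = W_dense @ x

    assert np.allclose(conv_out, dense_out)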

Learning Deep-Learning-based Computer Vision before deeplearning.ai

Interestingly enough, there is a rather standard option for learning deep-learning-based computer vision online.   Yes! You guessed it right! It is cs231n, which used to be taught by the then Stanford PhD candidate Andrej Karpathy in 2015/16.   [1]  To recap, cs231n is not only a good class for computer vision, it is also a good class for learning the basics of deep learning.   Also, as the now-famous Dr. Karpathy said, it probably has one of the best explanations of back-propagation.    My only criticism of the class (as I mentioned in earlier reviews) is that, as a first class, it is too focused on image recognition.   But as a first class of deep-learning-based computer vision, I think it was the best.

Course 4: Convolutional Neural Networks Briefly

Would Course 4 change my opinion about cs231n then?   I guess we should look at it in perspective.   Comparing Course 4 with cs231n is comparing apples and oranges.  Course 4 is a month-long class which is suitable for absolute beginners.   If you look into it, Course 4 is basically a quick introductory class.  Week 1 focuses on what a CNN is, Weeks 2 and 3 talk about two prominent applications: image classification and object detection, whereas Week 4 is about fun stuff such as face verification and style transfer.

Many people I know finished the class within 3 days of when it started.   Whereas cs231n is a semester-long course which contains ~18 hours of video to watch, with more substantial (and difficult) homework problems.   It is more suitable for people who already have at least one or two full machine learning courses under their belt.

So my take is that Course 4 can be a good first class of deep-learning-based computer vision, but it is not a replacement for cs231n.  If you take only Course 4, you will find that there is still a lot in computer vision you don't grok.   My advice is to audit cs231n afterward, or else your understanding will still have holes.

What if I already took cs231n? Would Course 4 still helps me?

Absolutely.   While Course 4 is much shorter, remember that a lot of deep learning concepts are obscure; it doesn't hurt to learn the same thing in different ways.    And Course 4 offers different perspectives on several topics:

  • For starters, Course 4, just like all other deeplearning.ai courses, has homework which requires code verification at every step.  As I argued in an earlier review, that's a huge plus for learning.
  • Then there is the treatment of individual topics.  I found Ng's treatment of object detection refreshing - the more conventional view (which cs231n took) is to start from R-CNN and its two faster variants, then bring up YOLO.   But Andrew just decided to go with YOLO instead.   Notice that neither class gives a detailed description of the algorithm (reading the paper is probably best), but YOLO is indeed more practical than the R-CNN variants.
  • Week 4's applications, such as face verification and Siamese networks, were actually new to me.   Andrew also gives a very nice explanation of why style transfer really works.
  • As always, even a new note on old topics matters.  E.g. this is the first time I became aware that the convolution in deep learning is different from convolution in signal processing (see Week 1, and the sketch after this list).   I also found that Andrew's notes on various image classification papers are gems.  Even if you have read those papers, I do suggest you listen to him again.
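To spell out that last point with a toy numpy check of my own (not from the course): what deep learning frameworks call "convolution" is really cross-correlation; signal-processing convolution flips the kernel first.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    k = np.array([1.0, 0.0, -1.0])

    # Signal-processing convolution flips the kernel before sliding it.
    print(np.convolve(x, k, mode="valid"))    # [ 2.  2.]

    # What DL layers compute is cross-correlation: no kernel flip.
    print(np.correlate(x, k, mode="valid"))   # [-2. -2.]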

Weakness(es)

Since I admin an unofficial forum for the course,  I learned that there are fairly obvious problems with the course.   For example, back in December when I took the course, there was one homework where the algorithm you need to submit wouldn't match the notebook.   There was also a period of time when submission was very slow, and I needed to fiddle with the file downloading to straighten it out.   I do think those are frustrating issues.  Hopefully, by the time you read this article, the staff will have fixed them. [2]

To be fair, even the great NNML by Hinton had glitches here and there in its homework.   So I am not entirely surprised that glitches happen in deeplearning.ai.   Of course, I would still highly recommend the class.

Conclusion

There you have it - I have reviewed Course 4 of deeplearning.ai.  Unlike the earlier courses, Course 4 has a very obvious competitor: cs231n.  And I don't quite put Course 4 as the one course you can take to master computer vision.   My belief is you need to go through both Course 4 and cs231n to have a reasonable understanding.

But as a first class of DL-based computer vision, I still think Course 4 has tremendous value.  So once again I highly recommend you all take the class.

As a final note, I was able to catch up on reviews for all released classes in deeplearning.ai.  Now all eyes are on Course 5, and currently (as of Jan 23) it is set to launch on Jan 31.  Before that, do check out our forums AIDL and Coursera deeplearning.ai for more discussion!

Arthur Chan

First published at http://thegrandjanitor.com/2018/01/24/review-of-ngs-deeplearning-ai-course-4-convolutional-neural-networks/

If you like this message, subscribe to the Grand Janitor Blog's RSS feed. You can also find me (Arthur) on Twitter, LinkedIn, Plus and Clarity.fm. Together with Waikit Lau, I maintain the Deep Learning Facebook forum.  Also check out my awesome employer: Voci.

Footnotes:

[1] Funnily enough, while I went through all the cs231n 2016 videos a while ago, I never wrote a review of the course.

[2] As a side note, I think it has to do with Andrew and the staff probably rushing to create the class.   That's why I was actually relieved when I learned that Course 5 would be released in January.  Hopefully this gives the staff more time to perfect the class.

 

 

Some Notes on OpenMined.org

Yesterday I watched the introduction video of OpenMined by Andrew Liam Trask. Wow, it's so interesting. Some notes:

* First of all, unlike similar projects, there is no ICO. It's a genuine open-source project with a vibrant community.

* The idea is quite explainable. It's a marketplace between data scientists and miners who want to provide data. The key question Trask tried to figure out is how to protect the privacy of users' data while at the same time allowing data scientists to train models securely.

* At first I thought it was just federated learning combined with deep learning. But Trask has augmented the idea with homomorphic encryption and smart contracts. I am still in the process of learning the last two concepts, but briefly: homomorphic encryption allows a model to be sent securely to miners without getting stolen, whereas having smart contracts would genuinely allow an open marketplace.

* What if a miner tries to come up with fake data? This is actually an FAQ on OpenMined.org. As it turns out, a data scientist can also specify a test set in the smart contract. This ensures the data uploaded by a miner actually improves the model.

* Another question I asked in their Slack community is how everyone gets paid without a coin. Currently the project relies on USD, ETH and BTC. Fair enough.

Quick Impression of Princeton's Coursera "Bitcoin and Cryptocurrency Technologies"

Recently I have gone through all the video lectures of Princeton's Coursera class "Bitcoin and Cryptocurrency Technologies" (BCT). Just as with many of the courses I took in AI/DL, I want to write some notes on the class and what it is about. But since I won't call myself deep in the subject yet, and I haven't looked at the HW (no time...), I decided to just write a quick impression.

- If you recall, Greg Dubela, Waikit Lau and I started B&C as a satellite of AIDL. Greg Dubela is more a practicing programmer, and Waikit Lau has quite a bit of experience with bitcoin, but I learned about cryptos mostly for fun. When we first started the group, part of the goal was to direct off-topic traffic from AIDL to a proper place. We have ended up with 5k members now.

- And of course, the most frequently asked question at B&C is "What really is bitcoin?". (*)

- You may think of bitcoin as just a currency. If you have a bit of money, you can just trade it. That's pretty much what 99% of people should know about bitcoin.

- But then you can also see bitcoin as a kind of mineral. It just happens that your computer is the equipment to mine it. Again, depending on whether you like the idea of mining, that's also how many people understand bitcoin.

- To really look inside bitcoin though, you need to understand that it is a decentralized system of currency, as opposed to the central banking system we are using today.

- If you want to understand bitcoin at this level, then BCT is your class. You will first learn how the basic structure of the bitcoin blockchain works.

- Building on the idea of the blockchain, you will also learn how bitcoin transactions really work, what the role of miners is, how mining works (a toy illustration follows below), whether the blockchain really provides anonymity, what the role of the community is in bitcoin, and who really controls bitcoin.
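As a taste of the "how mining works" part, here is a toy proof-of-work loop of my own (a deliberate simplification, nothing like real Bitcoin's block header or difficulty): miners search for a nonce that makes the block's hash fall below a target.

    import hashlib

    def mine(block_data, difficulty_bits=16):
        # Toy proof-of-work: find a nonce so that SHA-256(block_data + nonce)
        # falls below a target with `difficulty_bits` leading zero bits.
        # Real Bitcoin double-hashes a structured header with a far harder target.
        target = 1 << (256 - difficulty_bits)
        nonce = 0
        while int(hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest(), 16) >= target:
            nonce += 1
        return nonce

    print(mine("some transactions"))   # the "work" is the search for this nonce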

- And finally it's about altcoins, which I believe are more relevant to our time, as the combined altcoin market value is on par with bitcoin's. So what are they? Why are they there? The course will not cover all the famous altcoins we know today, but it will give you the basics.

- I found the content of the class illuminating. And I highly recommend that anyone, especially those with a general CS background, take BCT before doing anything related to cryptos. You will find yourself more educated than your peers on many issues, even on basic operations such as trading and mining.

- My only criticism of the course is that it dates back to 2015. Of course, things have changed dramatically in the last 2 years: events such as the bitcoin hard forks and the rise of Ethereum were all unforeseen by the lecturers. That doesn't devalue the class at all. As for me, it just gives me a historical perspective on the technology.

- If you enjoy the course, go ahead and read the book by Arvind Narayanan as well; once again, it is a great intro to the subject.

That's what I have, hope my impression helps.

Note:

* In fact, you may say "Should I buy bitcoin at $?" is the most frequently asked question. But of course those types of questions are banned by our strong rules.

Review of Ng's deeplearning.ai Course 3: Structuring Machine Learning Projects

(Also see my review of Course 1 and Course 2.)

As you might know, the deeplearning.ai courses were released in two batches.  The first batch contains Courses 1 to 3.  Only recently (as of November 15) was Course 4, "Convolutional Neural Networks", released, and Course 5 is supposedly coming in late November.   So Course 3, "Structuring Machine Learning Projects", was the "final" course in the first batch.  It also marks a good pause between the first and second halves of the specialization:  the first half was more the foundation of deep learning, whereas the second half is more about applications of deep learning.

So here you are, learning something new in deep learning; isn't it time to apply this new-found knowledge?  Course 3 says "Hold on!"  It turns out that before you start to do machine learning,  you need to slow down and think about how to plan the task.

In fact, in practice, Course 3 is perhaps the most important course in the whole specialization.   The math in Course 1 may be tougher, and Course 5 may have difficult concepts such as RNNs or LSTMs which are hard to grok.  Those courses are also longer than Course 3 (which only lasts 2 weeks). But in the grand scheme of things, they are not as important as Course 3.  I am going to discuss why.

What do you actually do as an ML Engineer?

Let me digress a bit: I know many of my readers are young college students who are looking for careers in data science or machine learning.  But what do people actually do in the business of machine learning or AI?   I think this is a legit question because I was very confused when I first started out.

Oh well, it really depends on how much you are on the development side or the research side of your team.  Terms like "research" and "development" can have various meanings depending on the title.  But you can think of "researchers" as the people who try to get a new technique working - usually the criterion is whether it beats the status quo, e.g. in accuracy. "Developers", on the other hand, are the people who come up with a production implementation.     You can think of many ML jobs as really lying somewhere on the spectrum between "developer" and "researcher".   E.g.  I am usually known for my skills as an architect.  That usually means I have knowledge on both sides.  My quote on my skills is usually "50% development and 50% research".  There are also people who are highly specialized on either side.  But I will focus more on the research side in this article.

So, What do you actually do as an ML Researcher then?

Now I can see a lot of you jump up and say "OH I WANT TO BE A RESEARCHER!"  Yes, because doing research is fun, right?   You just need to train some models and beat the baseline and write a paper.  BOOM! In fact, if you are good, you just need to ask people to do your research.  Woohoo, you are happy and done, right?

Oh well, in reality, good researchers are usually fairly good coders themselves.  Especially in an applied field such as machine learning, my guess is that out of 100 researchers in an institute, there is perhaps 1 person who is really "thinking staff", i.e. doing nothing other than coming up with new theory or writing proposals.   Just like you, I admire the life of a pure academician.  But in our time, you usually have to be both very smart and very lucky to be one of them. (There is a tl;dr explanation here, but it is out of the scope of this article.)

"Okay, okay, got it..... so can we start to have some fun now?   We just need to do some coding, right? " Not really, the first step before you can work on fun stuffs such as modeling, or implement new algorithm, is to clean-up data.   So say if you work on a fraudulent transaction detection, the first is to load a giant table somewhere so that you can query it and get the training data.  Then you want to clean the data, and massage the data so that it can be an input of ML engine.   Notice that by themselves these tasks can be non-trivial as well.

Course 3: Structuring Machine Learning Projects

Then there you are: after you code and clean up your data, you finally have some time to do machine learning.    Notice that your time after all these "chores" is actually quite limited.    That makes how to use your time effectively a very important topic.   And here comes why you want to take Course 3: Andrew teaches you the basics of how to allocate time/resources in a deep learning task.   E.g. how large should your train/validation/test sets be?  When should you stop your development?   What is human-level performance?   What if there are mismatches between your train and test sets?   If you are stuck, should you tune your hyperparameters more? Or should you regularize?

In a way, Course 3 is reminiscent of "Machine Learning"'s Week 6 and Week 11: basically, what you try to learn is how to make good "meta-decisions" for all the projects you will work on in your lifetime.  I also think it's the right stuff for your ML career.

One final note: as you might notice in my last two reviews, I usually try to compare deeplearning.ai with other classes.   But Course 3 is quite unique, so you might only find similar material in machine learning courses which focus on theory.   And Ng's treatment is unique:  first, what he gives is practical and easy-to-understand advice.  Second, his advice focuses on deep learning - while the principles are similar, working on deep learning usually implies special circumstances, such as being close to human performance, or having low performance on both your train and test sets.  Those scenarios did appear in the past - but only in cutting-edge ML evaluations involving the best ML teams.  So you don't normally hear about them in a course,  but now Andrew tells you all.  Isn't that worth the price of $49? 🙂

Conclusion

So here you have it.  This is my review of Course 3 of deeplearning.ai.  Surprisingly, even to me, I actually wrote more than I expected for this two-week course.   Perhaps the main reason is that I really wish this course had been there, say, 3 years ago.  It would have changed the course of some projects I developed.

Maybe it's too late for me..... but if you are early in deep learning, do recognize the importance of Course 3, or any advice you hear similar to what Course 3 teaches.  It will save you much time - not just on one ML task but on the many ML tasks you will work on in your career.

Arthur Chan

If you like this message, subscribe to the Grand Janitor Blog's RSS feed. You can also find me (Arthur) on Twitter, LinkedIn, Plus and Clarity.fm. Together with Waikit Lau, I maintain the Deep Learning Facebook forum.  Also check out my awesome employer: Voci.