Tips for Completing Course 1 of

For people who got stuck in Course 1, here are some tips:

  • Most assignments are straightforward, and you can finish each one within 30 minutes. The key is not to overthink them. If you find yourself deriving the equations, you are probably not reading the question carefully.
  • When in doubt, the best tool to help you is the Python print statement. Checking the size and shape of a NumPy matrix always gives you insight.
  • I know a lot of reviewers claim that the exercise is supposed to teach you neural networks "from scratch". So... it depends on what you mean. Ng's assignments have bells and whistles built for you, so you are not really doing them out of nothing. If you wrote everything in C with no reference, then yes, it would be much harder. But that's not Ng's exercise. Once again, this goes back to the point that the assignments are straightforward. No need to overthink them.
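As an illustration of the print tip above, here is a minimal sketch. The variable names (X, W1, b1, Z1) and the layer sizes are assumptions chosen for the example and are not taken from the actual assignment.

```python
import numpy as np

# Hypothetical sizes for illustration: 3 input features, 4 hidden units, 5 examples.
n_x, n_h, m = 3, 4, 5

rng = np.random.default_rng(0)
X = rng.standard_normal((n_x, m))     # one column per training example
W1 = rng.standard_normal((n_h, n_x))  # weights of a single layer
b1 = np.zeros((n_h, 1))               # bias, broadcast across the m columns

Z1 = W1 @ X + b1                      # linear part of a forward pass

# When a result looks wrong, print the shapes before anything else.
print("X :", X.shape)
print("W1:", W1.shape)
print("b1:", b1.shape)
print("Z1:", Z1.shape)
```

If Z1 does not come out as (hidden units, examples), the usual suspects are a missing transpose or a bias stored as a 1-D array instead of a column.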

Hope this helps!

Arthur Chan

Quick Impression on Heroes of Deep Learning - Geoffrey Hinton

So I was going through You know we started a new FB group on it? We haven't made it public yet, but yes, we are very excited.
Now, one thing you might notice about the class is that there are optional lectures in which Andrew Ng interviews luminaries of deep learning. Those lectures, in my view, are very different from the course lectures. Most of the topics mentioned are research topics, and beginners would find them perplexing. So I think these lectures deserve separate sets of notes. I still call this a "quick impression" because I usually do around 1-2 layers of literature search before I'd say I grok a video.
* Sorry, I can't post the video because it is copyrighted by Coursera, but it should be very easy for you to find. Of course, respect our forum rules and don't post the video here.
* This is a very interesting 40-minute interview with Prof. Geoffrey Hinton. Perhaps it should also be seen as optional material after you finish his class NNML on Coursera.
* The interview is at research level. That means you will understand more if you have taken NNML or read some of Part III of Deep Learning.
* There is some material you have heard from Prof. Hinton before, including how he became an NN/brain researcher, how he came up with backprop, and why he was not the first one to come up with it.
* There is also some that is new to me, like why his and Rumelhart's paper was so influential. Oh, it has to do with his first experiment on marriage relationships (Lecture 2 of NNML).
* The role of Prof. Ng in the interview is quite interesting. Andrew is also a giant in deep learning, but Prof. Hinton is more the founder of the field. So you can see Prof. Ng trying to understand several of Prof. Hinton's thoughts, such as 1) Does back-propagation appear in the brain? 2) The idea of capsules, which are a distributed representation of a feature vector and allow a kind of what Hinton calls "agreement". 3) Unsupervised learning such as VAE.
* On Prof. Hinton's favorite ideas, and not to my surprise: 1) Boltzmann machines, 2) stacking RBMs into an SBN, 3) variational methods. I frankly don't fully understand Pt. 3. But then Lectures 10 to 14 of NNML are all about Pt. 1 and 2. Unfortunately, not everyone loves to talk about Boltzmann machines - they are not as hot as GANs, and are perceived as not useful at all. But if you want to understand the origin of deep learning, and one way to pre-train your DNN, you should take NNML.
* Prof. Hinton's advice on research is also very entertaining - he suggests you don't always read up on the literature first, which according to him is good for creative researchers.
* The part I like most is Prof. Hinton's view of why computer science departments are not catching up on teaching deep learning. As always, his words are penetrating. He said, "And there's a huge sea change going on, basically because our relationship to computers has changed. Instead of programming them, we now show them, and they figure it out."
* Indeed, when I first started out at work, thinking as an MLer was not regarded as cool - programming was cool. But things are changing. And we at AIDL are embracing the change.
Arthur Chan

Quick Impression on

Fellows, as you all know by now, Prof. Andrew Ng has started a new Coursera Specialization on Deep Learning. Many of you came to me today and asked for my take on the class. As a rule, I usually don't comment on a class unless I know something about it. (Search for my "Learning Deep Learning - Top 5 Lists" for more details.) But I'd like to make an exception for the Good Professor's class.
So here is my quick take after browsing through the specialization curriculum:
* Only Courses 1 to 3 are published now. They are short classes, more like 2-4 weeks each. It feels like the Data Science Specialization, so it should be good for beginners. Assuming that Courses 4 and 5 are long, 4 weeks each, we are talking about 17 weeks of study.
* Unlike Ng's standard ML class, Python is the default language. That's good in my view, because close to 80-90% of practitioners are using Python-based frameworks.
* Courses 1-3 have around 3 weeks of curriculum that overlaps with Lectures 2-3 of "Intro to Machine Learning". Course 1's goal seems to be implementing an NN from scratch. Course 2 is on regularization. Course 3 is on different methodologies of deep learning, and it's short, only 2 weeks long.
* Courses 4 and 5 are about CNN and RNN.
* So my general impression is that it is a more comprehensive class, comparable with Hugo Larochelle's lectures as well as Hinton's lectures. Yet the latter two classes are known to be more difficult. Hinton's class, in particular, is known to confuse even PhDs. That shows one of the values of this new DL class: it is a great transition from "Intro to ML" to more difficult classes such as Hinton's.
* But how does it compare with other similar courses, such as Udacity's DL nanodegree? I am not sure yet, but the price seems more reasonable if you go the Coursera route. Assuming we are talking about 5 months of study, you are paying $245.
* I also found that many existing beginner classes put too much emphasis on running scripts, but avoid linking more fundamental concepts such as bias/variance with DL, or going deep into models such as ConvNets and RNNs. cs231n does a good job on ConvNets, and cs224n teaches you RNNs. But they seem to be more difficult than Ng's or Udacity's class. So again, Ng's class sounds like a great transition class.
* My current take: 1) I am going to take the class myself. 2) It's very likely this new class will change my class recommendations on the Top-5 list.
I hope this is helpful for all of you.
Arthur Chan

AIDL Postings Relevant to "Threats from AGI" and Other Misc. Thoughts

Thoughts from your Humble Administrators @Aug 8, 2018 (tl;dr)
Last week was crazy - talk about FB killing AI agents that invented a language was all over the place. I believe AIDL Weekly scooped this one - we fact-checked such claims back in #18, then again in #23. Of course, anyone who works in the AI/DL/ML business would instantly smell a rat when hearing the term "killing" an AI agent. When 30+ outlets are talking about it and none of the reports come directly from practicing researchers, that's the point at which you should start to doubt rationally.
That said, many people come to me and passionately argue that the threat of AGI is a thing *now*, and that we should just talk about it to avoid future issues for humanity. Since I am an Acting Admin of the group, I think it's important to let you know my take.
* First of all, as long as your post is about A.I., we will keep your post regardless of your view. But we would still ask you to post brain-related topics at CNAGI, and automation-related posts are OoT. Remember, automation is a superset of A.I., and automation can mean large machinery, writing a for-loop, using Excel macros, etc. Also, if you are too spammy, it's likely we will curb your posts.
* Then there is your posting - I will not judge you, but I strongly suggest you just run some deep/machine learning training yourself - for the most part, these "agents" are Unix/Windows processes these days. Btw, just as Mundher Alshabi and I discussed - you can always kill the process. (Unix: 'kill -9', Windows: Open "Control Panel"........)
* Some insist that they *don't need any experience* to reason that machines are malicious. Again, I will not judge you. But you should understand that it's much harder to take your opinion seriously. Read up on serious work then. Bostrom's Superintelligence is harder to counter. Kurzweil's LOAR is an interesting economic theory, but his predictions in AI/ML are just too lousy for pros to take seriously.......
* Some also insist that because a certain famous person says so, it must be true. Again, I will not judge you. But be careful: "argument from authority" is a dangerous way to reason.
* Finally, I hope all of you read up on what the "Dunning-Kruger effect" is. Basically, it is a dangerous cognitive bias, and only when you reflect deeply about intelligence, human or machine, do you understand that all of us are affected by it.
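To make the "you can always kill the process" remark concrete, here is a minimal sketch using only Python's standard library. The child process that merely sleeps is a hypothetical stand-in for a training job; nothing here comes from any real system discussed in the news.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a training job: a child Python process that just sleeps.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print("started 'agent' with pid", proc.pid)

time.sleep(0.5)        # let it run briefly
proc.kill()            # SIGKILL on Unix (like `kill -9`); terminates the process on Windows
proc.wait(timeout=10)  # reap the child so the exit status is available

stopped = proc.poll() is not None
print("agent stopped:", stopped)
```

The point of the sketch is only that an "agent" in today's sense is an ordinary operating-system process, and the operating system decides whether it keeps running.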
Good Luck! And keep enjoying AIDL!
Arthur Chan

Some Resources for Graphical Models

I have been taking a break from deep learning, and I am quite into graphical models (GM) lately. That's why I am gathering resources for understanding various concepts of GM.

Here are some useful courses. They are not sorted or categorized; I list them mainly so I can look through them later.


Note that, except for Koller's class, not all of the following classes have videos available.


A Closer Look at "The Post-Quantum Mechanics of Conscious Artificial Intelligence"

As always, AIDL admins routinely look at whether a certain post should stay on our forum. Our criteria have 3 pillars: relevant, non-commercial and accurate. (Q13 of AIDL FAQ)

This time I look at "The Post-Quantum Mechanics of Conscious Artificial Intelligence". The video was brought up by an AIDL member, and he recommended we start from the 40-minute mark.
So I listened through the video as recommended.

Indeed, the post is non-commercial for sure. And yes, it mentions AGI via Roger Penrose, so it is relevant to AIDL. But is it accurate? I'm afraid my lack of a physics background trips me up, and I would have to say "I cannot decide" on the topic. Occasionally new science comes in a form no one understands yet, so calling something inaccurate without knowing is not appropriate.

As a result this post stays. But please keep on reading.

That said, I don't mind giving a *strong* response to the video, for the following 3 reasons:

1, According to Wikipedia, most of Dr. Jack Sarfatti's theories and work are not *peer-reviewed*. He has been out of academia since 1975. Most of his work is speculative, and most of it is self-published(!). There is no experimental proof of what he said. He was asked several times about his thinking in the video, and he just said "You will know that it's real". That's a sign that he doesn't really have evidence.

2, Then there is the idea of "Post-Quantum Mechanics". What is it? The information we can get is really scanty. I can only find one group that seems to be dedicated to such study, as in here. Since I can't quite decide whether the study is valid, I would say "I can't judge". But I also couldn't find any other group that actively supports the theory. So maybe we should call the theory at best "an interesting hypothesis". And Sarfatti builds his argument on the existence of a "Post-Quantum Computer". What is it? Again, I cannot quite find the answer online.

Also, you should be aware that current quantum computers have limited capability. D-Wave quantum computing is based on quantum annealing, and many dispute whether it is true quantum computing. In any case, both "conventional" quantum computing and quantum annealing have nothing to do with a Post-Quantum Computer. That again should make you feel very suspicious.

3a, Can all these interesting theories be the mechanism of the brain or AGI? In the video, Sarfatti mentioned brain/AGI four times. He makes two points, and I will counter each of them. The first is that if you believe Penrose's theory that neurons are related to quantum entanglement, then his own theory, based on post-quantum mechanics, would be huge. But if you listen to serious computational neuroscientists, they are very cautious about whether quantum theory is the basis of neuronal exchange of information. There is plenty of experimental evidence that neurons operate by electrical or chemical signals, but those work at a much bigger scale than quantum mechanics. So why Penrose suggested this has made many learned people scratch their heads.

3b, Then there is the part about the Turing machine. Sarfatti believes that because a "post-quantum computer" is so powerful, it must be the mechanism used by the brain. What's wrong with such an argument? First, no one knows what a "post-quantum computer" is, as I just mentioned in point 2. But even if it is powerful, that doesn't mean the brain has to follow such a mechanism. The same can be said of our current quantum computing technologies.

Finally, Sarfatti himself believes that it is a "leap of faith" to believe that consciousness is a wave. I admire his passion for speculating about the world of science and human intelligence. Yet I also learned from reading Gardner's "Fads and Fallacies" that many pseudoscientists have charismatic personalities.

So Members, Caveat Emptor.


What is the Difference between Deep Learning and Machine Learning?

AIDL member Bob Akili asked (rephrased):

What is the Difference between Deep Learning and Machine Learning?

Usually I don't write a full blog post to answer a member's question. But what "deep" means is such a fundamental concept in deep learning, and yet there are many well-meaning but incorrect answers floating around. So I think it is a great idea to answer the question clearly and hopefully dispel some of the misconceptions as well. Here is a cleaned-up and expanded version of my comment on the thread.

Deep Learning is Just a Subset of Machine Learning

First of all, as you might have read on the internet, deep learning is just a subset of machine learning. Many "Deep Learning Consultant" types would tell you deep learning is completely different from machine learning. But when we talk about "deep learning" these days, we are really talking about "neural networks which have more than one layer". Since the neural network is just one type of ML technique, it doesn't make any sense to call DL "different" from ML. It might work for marketing purposes, but the idea is clearly misleading.

Deep Learning is a kind of Representation Learning

So now we know that deep learning is a kind of machine learning. We still can't quite answer why it is special. So let's be more specific: deep learning is a kind of representation learning. What is representation learning? Representation learning is the opposite of another school of thought/practice: feature engineering. In feature engineering, humans are supposed to hand-craft features to make the machine work better. If you have done Kaggle before, this should be obvious to you: sometimes you just want to manipulate the raw inputs and create new features to represent your data.

Yet in some domains which involve high-dimensional data, such as images, speech or text, hand-crafting features was found to be very difficult. For example, using HOG-type approaches for computer vision usually takes 4-5 years of a PhD student's time. So here we come back to representation learning - can a computer automatically learn good features?

What is a "Deep" Technique?

Now we come to why deep learning is "deep": we usually call a method "deep" when it optimizes a nested function. For example, if you express such a function as a graph, you will find that it has multiple layers. The term "deep" really describes this "nestedness". That explains why we typically call any artificial neural network (ANN) with more than 1 hidden layer "deep". Or, as the general saying goes, "deep learning is just neural networks with more layers".

(Another appropriate term is "hierarchical". See footnote [3] for more detail.)
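The "nestedness" can be sketched in a few lines of Python. This is only an illustration under assumed choices: the helper names (make_layer, compose), the tanh non-linearity and the layer sizes are hypothetical and are not taken from any particular course or library.

```python
from functools import reduce

import numpy as np


def make_layer(W, b):
    """Return one layer f_i(x) = tanh(W x + b)."""
    def layer(x):
        return np.tanh(W @ x + b)
    return layer


def compose(layers):
    """Turn [f_1, ..., f_n] into the nested function x -> f_1(f_2(... f_n(x)))."""
    def model(x):
        # The innermost function f_n is applied first, so walk the list backwards.
        return reduce(lambda acc, f: f(acc), reversed(layers), x)
    return model


rng = np.random.default_rng(0)
f_3 = make_layer(rng.standard_normal((4, 3)), np.zeros((4, 1)))  # innermost: 3 -> 4
f_2 = make_layer(rng.standard_normal((4, 4)), np.zeros((4, 1)))  # middle:    4 -> 4
f_1 = make_layer(rng.standard_normal((2, 4)), np.zeros((2, 1)))  # outermost: 4 -> 2

model = compose([f_1, f_2, f_3])
x = rng.standard_normal((3, 1))

y = model(x)
y_manual = f_1(f_2(f_3(x)))  # the same computation, written out by hand
print(y.shape, np.allclose(y, y_manual))
```

Each f_i has its own parameters, which is why "nested" describes the structure better than "recursive": no single function is being applied to its own output.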

This is also the point at which Karpathy in cs231n shows you a multi-layer CNN in which features are learned automatically, from the simplest to the more complex ones. Eventually the last layer can separate the classes with just a linear classifier, because there is a "deep" structure that learns the right features for it. Note the key term here is "automatic": all these Gabor-filter-like features are not hand-made. Rather, they are the result of back-propagation [3].

Is there Anything which is "Deep" but not a Neural Network?

Actually, there are plenty: deep Boltzmann machines, deep belief networks, deep Gaussian processes. They are still discussed under unsupervised learning with neural networks, but I have always found that knowledge of graphical models is more important for understanding them.

So is Deep Learning also a Marketing Term?

Yes and no. It depends on who you talk to. If you talk with ANN researchers/practitioners, they would just tell you "deep learning is just a neural network which has more than 1 hidden layer". Indeed, if you think from their perspective, the term "deep learning" could just be a short-form. Yet as we just said, you can also call other methods "deep", so the adjective is not totally void of meaning. But many people would also tell you that because "deep learning" has become such a marketing term, it can now mean many different things. I will say more in the next section.

Also, the term "deep learning" has been around for a long time. Check out Prof. Schmidhuber's thread for more details.

"No Way! X is not Deep but it is also taught in Deep Learning Class, You made a Horrible Mistake!"

I said it with much authority and I know some of you guys would just jump in and argue:

"What about word2vec? It is nothing deep at all, but people still call it Deep learning!!!"  "What about all wide architectures such as "wide-deep learning"?" "Arthur, You are Making a HORRIBLE MISTAKE!"

Indeed, the term "deep learning" is being abused these days. More learned people, on the other hand, are usually careful about calling certain techniques "deep learning". For example, in the cs224d 2015/2016 lectures, Dr. Richard Socher was quite cautious about calling word2vec "deep". His supervisor, Prof. Chris Manning, who is an authority in NLP, is known to dispute whether deep learning is always useful in NLP, simply because some recent advances in NLP are not really due to deep learning [1][2].

I think these cautions make sense. Part of it is that calling everything "deep learning" just blurs what really should be credited for a certain technical improvement. The other part is that we shouldn't see deep learning as the only type of ML we want to study. There are many ML techniques, and some of them are more interesting and practical than deep learning. For example, deep learning is not known to work well in small-data scenarios. Would I just yell at my boss and say "Because I can't use deep learning, I can't solve this problem!"? No, I would just test out random forests, support vector machines, GMMs and all the nifty methods I have learned over the years.

Misleading Claim About Deep Learning (I) - "Deep Learning is about Machine Learning Methods which use a lot of Data!"

So now we come to the arena of misconceptions. I am going to discuss two claims which many people have been drumming up about deep learning. Neither of them is the right answer to the question "What is the Difference between Deep and Machine Learning?"

The first one you have probably heard all the time: "Deep Learning is about ML methods which use a lot of data". Or people would tell you "Oh, deep learning just uses a lot of data, right?" This sounds about right, since deep learning these days does use a lot of data. So what's wrong with the statement?

Here is the answer: while deep learning does use a lot of data, other techniques used tons of data before deep learning too! For example, speech recognition before deep learning, i.e. HMM+GMM, can use up to 10k hours of speech. The same goes for SMT. And you can do SVM+HOG on ImageNet. More data is always better for those techniques as well. So if you say "deep learning uses more data", you forget that the older techniques can also use more data.

What you can claim is that "deep learning is a more effective way to utilize data". That's very true, because once you get into either GMM or SVM, you run into scalability issues. GMM scales badly when the amount of data is around 10k hours. SVM (with an RBF kernel in particular) is super tough/slow to use when you have ~1 million data points.
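One way to get a feel for the ~1 million point figure is a back-of-the-envelope calculation of how large a dense kernel (Gram) matrix would be. This is a rough sketch under assumptions: the function name is hypothetical, entries are taken to be 8-byte floats, and practical solvers typically avoid materializing the whole matrix, so treat it as an illustration of quadratic growth rather than a description of any specific library.

```python
def gram_matrix_bytes(n_points, bytes_per_entry=8):
    """Memory for a dense n x n kernel matrix (assumes 8-byte float entries)."""
    return n_points * n_points * bytes_per_entry


for n in (10_000, 100_000, 1_000_000):
    gib = gram_matrix_bytes(n) / 2**30
    print(f"{n:>9,} points -> {gib:,.1f} GiB")
```

At one million points the dense matrix alone would need 8 * 10^12 bytes, roughly 8 TB, which hints at why kernel methods become painful at that scale.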

Misleading Claim About Deep Learning (II) - "Deep Learning is About Using GPU and Having Data Center!"

This particular claim is different from the previous "Data Requirement" claim, but we can debunk it in a similar manner. Why is it wrong? Again, before deep learning, people already used GPUs to do machine learning. For example, you can use a GPU to speed up GMM. Before deep learning was hot, you needed a cluster of machines to train an acoustic model or language model for speech recognition. You also needed tons of RAM to train a language model for SMT. So calling GPU/Data Center/RAM/ASIC/FPGA a differentiator of deep learning is just misleading.

You can say, though, that "deep learning has changed the computational model from a distributed network model to a more single-machine-centric paradigm (in which each machine has one GPU), but later approaches also tried to combine CPU and GPU processing".

Conclusion and "What you say is Just Your Opinion! My Theory makes Equal Sense!"

Indeed, you should always take what you read online with a grain of salt. Being critical is a good thing, and having your own opinion is good. But you should also try to avoid equivocating on an issue. Meaning: sometimes things have only one side, but you insist there are two equally valid answers. If you do so, you are perhaps making a logical error in your thinking. And a lot of people who make claims such as "deep learning is learning which uses more data and a lot of GPUs" are probably making such thinking errors.

That said, I would suggest you read several good sources to judge my answer. They are:

  1. Chapter 1 of Deep Learning.
  2. Shakir's Machine Learning Blog on a Statistical View of Deep Learning.  In particular, part VI, "What is Deep?"
  3. Tombone's post on Deep Learning vs Machine Learning vs Pattern Recognition

In any case, I hope that this article helps you. I thank Bob for asking the question. Armaghan Rumi Naik debunked many misconceptions in the original thread - his understanding of machine learning is clearly above mine, and he was able to point out mistakes from other commenters. The thread is worth your reading time.


[1] See "Last Words: Computational Linguistics and Deep Learning"
[2] Generally, whether DL is useful in NLP is a widely disputed topic. Take a look at Yoav Goldberg's view on some recent GAN results on language generation. AIDL Weekly #18 also gave an expose on the issue.
[3] Perhaps another useful term is "hierarchical". In the case of ConvNet the term is right on. As Eric Heitzman comments at AIDL:
"(deep structure) They are *not* necessarily recursive, but they *are* necessarily hierarchical since layers always form a hierarchical structure." After Eric's comment, I think both "deep" and "hierarchical" are fair terms to describe methods in "deep learning". (Of course, "hierarchical learning" is a much poorer marketing term.)
[4] In an earlier draft, I used the term "recursive" to describe the term "deep", which, as Eric Heitzman at AIDL pointed out, is not entirely appropriate. "Recursive" gives people the feeling that the function is self-recursive, i.e. $f(f(\ldots f(f(*))))$, but the actual functions are more "nested", like $f_1(f_2(\ldots f_{n-1}(f_n(*))))$. As a result, I removed the term "recursive" and just call the function a "nested function".
Of course, you should be aware that my description is not too mathematically rigorous either. (I guess it is a fair wordy description though.)

20170709 at 6: fix some typos.

20170711: fix more typos.

20170711 at 7:05 p.m.: I got feedback from Eric Heitzman, who points out that the term "recursive" can be deceiving. Thus I wrote footnote [4].

If you like this message, subscribe to the Grand Janitor Blog's RSS feed. You can also find me (Arthur) at Together with Waikit Lau, I maintain the Deep Learning Facebook forum. Also check out my awesome employer: Voci.

List of Bitcoin/Blockchain Resources

As AIDL grew, once in a while people would talk about how blockchain would affect AI or deep learning. Currently it is still a long shot, but blockchain by itself is a very interesting technology and it deserves our notice.

Here are some resources you may use to learn about blockchain. Unlike the "Top 5 List" for AIDL, I don't really understand the technology too well. But also unlike the "List of Neuroscience MOOC", Greg Dubela did give me a lot of recommendations on what you should read up on. Thus this post is also used as a resource post in "Blockchain Nation".

Introductory Videos:

  • (2 minutes) This video explains the purpose of blockchain, and the promise it makes, in 2 minutes.
  • This 6-part series from Dash School is a great introduction to what blockchain is, how it is governed, and several fundamental concepts. Greg highly recommends the series.


Blockchain is still a new development, so it's harder to find a MOOC that can teach you the whole thing in its entirety. We found a couple of exceptions:


Visualizing Blockchain

Different Cryptos: (under construction)

Learning blockchain these days usually means knowing the characteristics of different coins. Here is a list of interesting ones.

  • Bitcoin
  • Litecoin
  • Ripple
  • Ethereum Classic
  • Ethereum
  • Dogecoin
  • Freicoin

As I said before, we are really no experts on the topic. But as of 20170705, I am taking the Princeton class, and I find it quite promising: it gets into the details of how blockchain really works.

To be reviewed:

  • Someone also brought up the University of Nicosia's introductory MOOC on bitcoin. I haven't seen many reviews yet, so let's decide later.
  • Khan Academy:
  • Berkeley "Dive Deep into Ethereum"
  • Udemy's Bitcoin class:
  • A list of very useful Bitcoin classes:
  • A series from CRI :