Fellows, as you all know by now, Prof. Andrew Ng has started a new Coursera Specialization on Deep Learning. Many of you came to me today and asked for my take on the class. As a rule, I usually don't comment on a class unless I know something about it. (Search for my "Learning Deep Learning - Top 5 Lists" for more details.) But I'd like to make an exception for the Good Professor's class.
> git clone https://github.com/bitcoin/bitcoin.git
> cd bitcoin
> git checkout v0.14.2
> ./autogen.sh
> ./configure --without-gui --disable-tests --disable-wallet
> make -j 4
I have been taking a break from deep learning, and I am quite into graphical models (GM) lately. That's why I am gathering resources for understanding various GM concepts.
Here are some useful courses. They are not sorted or categorized; the list is mainly here so that I can look through them later.
Note that, apart from Koller's class, not all of the following classes have videos available.
- Daphne Koller's Probabilistic Graphical Models on Coursera. This is perhaps the best, yet the most difficult, one. All quizzes and exams are filled with trick questions which can challenge even very experienced MLers.
- The modern Stanford version taught by Stefano Ermon. Also check out the class notes, which are quite accessible.
- The Brown class. I found the tutorial lectures quite useful. It also points to various book chapters for different concepts.
- A Short Course on Graphical Models by Mark A. Paskin. This is interesting because it is a short, 3-lecture class that covers the most basic concepts.
- PRML Chapter 8 (Graphical Models)
- Koller's "Probabilistic Graphical Models: Principles and Techniques" - fairly dense, but yet again it seems to have the best information.
- Michael Jordan's unfinished book on graphical models.
As always, AIDL admins routinely look at whether certain posts should stay on our forum. Our criterion has 3 pillars: relevant, non-commercial and accurate. (Q13 of AIDL FAQ)
This time I looked at "The Post-Quantum Mechanics of Conscious Artificial Intelligence". The video was brought up by an AIDL member, who recommended that we start from the 40-minute mark.
So I listened through the video as recommended.
Indeed, the post is non-commercial for sure. And yes, it mentions AGI and Roger Penrose, so it is relevant to AIDL. But is it accurate? I'm afraid my lack of a physics background trips me up, so my judgment on the topic is "I cannot decide". Occasionally new science comes in a form no one understands yet, so calling something inaccurate without knowing is not appropriate.
As a result this post stays. But please keep on reading.
That said, I don't mind giving a *strong* response to the video, for the following 3 reasons:
1, According to Wikipedia, most of Dr. Jack Sarfatti's theories and work are not *peer-reviewed*. He left academia in 1975. Most of his work is speculative, and most of it is self-published(!). There is no experimental proof of what he said. He was asked several times about his thoughts in the video, and he just said "You will know that it's real". That's a sign that he doesn't really have evidence.
2, Then there is the idea of "Post-Quantum Mechanics". What is it? The information we can get is really scanty. I can only find one group which seems to be dedicated to such study, as in here. Since I can't quite decide if the study is valid, I would say "I can't judge." But I also couldn't find any other group which actively supports such a theory. So maybe we should call the theory, at best, "an interesting hypothesis". And Sarfatti builds his argument on the existence of a "Post-Quantum Computer". What is it? Again, I cannot quite find the answer on-line.
Also you should be aware that current quantum computers have limited capability. D-Wave's quantum computing is based on quantum annealing, and many dispute whether it is true quantum computing. In any case, neither "conventional" quantum computing nor quantum annealing has anything to do with a "Post-Quantum Computer". That, again, should make you feel very suspicious.
3a, Can all these interesting theories be the mechanism of the brain or AGI? In the video, Sarfatti mentioned brain/AGI four times. He makes two points, and I will counter each of them. The first is that if you believe in Penrose's theory that neurons are related to quantum entanglement, then his own theory, based on post-quantum mechanics, would be huge. But once you listen to serious computational neuroscientists, they are very cautious about quantum theory as the basis of neuronal exchange of information. There is much experimental evidence that neurons operate by electrical or chemical signals, but those operate at a much bigger scale than quantum mechanics. So why Penrose suggested this has made many learned people scratch their heads.
3b, Then there is the part about the Turing machine. Sarfatti believes that because a "post-quantum computer" is so powerful, it must be the mechanism used by the brain. So what's wrong with such an argument? First, no one knows what a "post-quantum computer" is, as I just mentioned in point 2. But even if it is powerful, that doesn't mean the brain has to follow such a mechanism. The same can be said of our current quantum computing technologies.
Finally, Sarfatti himself believes that it is a "leap of faith" to believe that consciousness is a wave. I admire his passion for speculating about the world of science/human intelligence. Yet I also learned from reading Gardner's "Fads and Fallacies" that many pseudoscientists have charismatic personalities.
So Members, Caveat Emptor.
AIDL member Bob Akili asked (rephrased):
What is the Difference between Deep Learning and Machine Learning?
Usually I don't write a full blog message to answer a member's question. But what is "deep" is such a fundamental concept in deep learning, yet there are many well-meaning but incorrect answers floating around. So I think it is a great idea to answer the question clearly and hopefully dispel some of the misconceptions as well. Here is a cleaned up and expanded version of my comment to the thread.
Deep Learning is Just a Subset of Machine Learning
First of all, as you might have read on the internet, deep learning is just a subset of machine learning. Many "Deep Learning Consultant" types would tell you that deep learning is completely different from machine learning. When we talk about "deep learning" these days, we are really talking about "neural networks which have more than one hidden layer". Since the neural network is just one type of ML technique, it doesn't make any sense to call DL "different" from ML. It might work for marketing purposes, but the thought is clearly misleading.
Deep Learning is a kind of Representation Learning
So now we know that deep learning is a kind of machine learning, but we still can't quite answer why it is special. So let's be more specific: deep learning is a kind of representation learning. What is representation learning? It is the opposite of another school of thought/practice: feature engineering. In feature engineering, humans are supposed to hand-craft features to make the machine work better. If you have done Kaggle before, this should be obvious to you: sometimes you just want to manipulate the raw inputs and create new features to represent your data.
Yet in some domains which involve high-dimensional data such as images, speech or text, hand-crafting features was found to be very difficult. E.g., using HOG-type approaches for computer vision usually takes 4-5 years of a PhD student's time. So here we come back to representation learning: can a computer automatically learn good features?
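To make "hand-crafting a feature" concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the toy XOR data, the function names and the hand-picked weights are all hypothetical and do not come from any Kaggle problem or library. The point is that a linear classifier cannot fit XOR on the raw inputs, but it can once a human adds the product feature.

```python
# Hypothetical toy data: XOR labels, which no linear rule on (x1, x2) alone can fit.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

def hand_crafted_features(x1, x2):
    """Feature engineering: a human decides that the product x1*x2 is useful."""
    return (x1, x2, x1 * x2)

def linear_classifier(features, weights, bias):
    """Predict 1 when the weighted sum is positive, else 0."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0

# Weights picked by hand for this illustration:
# x1 + x2 - 2*x1*x2 equals XOR on 0/1 inputs.
weights, bias = (1.0, 1.0, -2.0), -0.5

predictions = [linear_classifier(hand_crafted_features(x1, x2), weights, bias)
               for x1, x2 in points]
print(predictions)
```

In representation learning, the extra feature would not be written by a person; the model is expected to discover something equivalent from the data.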
What is a "Deep" Technique?
Now we come to the part on why deep learning is "deep". Usually we call a method "deep" when we are optimizing a nested function in the method. For example, if you express such a function as a graph, you will find that it has multiple layers. The term "deep" really describes such "nestedness". That should explain why we typically call any artificial neural network (ANN) with more than 1 hidden layer "deep". Or, as the general saying goes, "deep learning is just a neural network which has more layers".
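The "nestedness" can be sketched in a few lines of Python. This is a rough illustration only: the layer sizes, the tanh non-linearity and the fixed parameter values (W1, b1, ...) are assumptions I made up for the example, and nothing is being trained here.

```python
import math

def layer(weights, bias, inputs):
    """One layer: a weighted sum per unit followed by a tanh non-linearity."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)]

# Hypothetical fixed parameters: 2 inputs, two hidden layers of 2 units, 1 output.
W1, b1 = [[0.5, -0.3], [0.8, 0.1]], [0.0, 0.1]
W2, b2 = [[1.0, -1.0], [0.2, 0.4]], [0.0, 0.0]
W3, b3 = [[0.7, 0.7]], [0.0]

def network(x):
    # The "depth" is the nesting: layer 3 applied to layer 2 applied to layer 1.
    return layer(W3, b3, layer(W2, b2, layer(W1, b1, x)))

output = network([1.0, 2.0])
print(output)
```

Drawn as a graph, each call to `layer` is one layer of nodes, and training would mean optimizing all the parameters of this nested function together.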
(Another appropriate term is "hierarchical". See footnote  for more detail.)
This is also the moment when Karpathy in cs231n shows you a multi-layer CNN in which features are automatically learned, from the simplest to more complex ones. Eventually your last layer can just differentiate the classes using a linear classifier, because there is a "deep" structure that learns the right features for it. Note the key term here is "automatic": all these Gabor-filter-like features are not hand-made. Rather, they are the results of back-propagation.
Is there Anything which is "Deep" but not a Neural Network?
Actually, there are plenty: deep Boltzmann machines, deep belief networks, deep Gaussian processes. They are still discussed under unsupervised learning with neural networks, but I always find that knowledge of graphical models is more important for understanding them.
So is Deep Learning also a Marketing Term?
Yes and no. It depends on who you talk to. If you talk with ANN researchers/practitioners, they would just tell you "deep learning is just a neural network which has more than 1 hidden layer". Indeed, from their perspective, the term "deep learning" could just be a short form. Yet as we just said, you can also call other methods "deep", so the adjective is not totally void of meaning. But many people would also tell you that because "deep learning" has become such a marketing term, it can now mean many different things. I will say more in the next section.
Also, the term "deep learning" has been around for decades. Check out Prof. Schmidhuber's thread for more details.
"No Way! X is not Deep but it is also taught in Deep Learning Class, You made a Horrible Mistake!"
I said it with much authority and I know some of you guys would just jump in and argue:
"What about word2vec? It is nothing deep at all, but people still call it Deep learning!!!" "What about all wide architectures such as "wide-deep learning"?" "Arthur, You are Making a HORRIBLE MISTAKE!"
Indeed, the term "deep learning" is being abused these days. More learned people, on the other hand, are usually careful about calling certain techniques "deep learning". For example, in the cs224d 2015/2016 lectures, Dr. Richard Socher was quite cautious about calling word2vec "deep". His supervisor, Prof. Chris Manning, who is an authority in NLP, is known to dispute whether deep learning is always useful in NLP, simply because it is unclear whether some recent advances in NLP are really due to deep learning.
I think these cautions make sense. Part of it is that calling everything "deep learning" just blurs what really should be credited for a certain technical improvement. The other part is that we shouldn't see deep learning as the only type of ML we want to study. There are many ML techniques, and some of them are more interesting and practical than deep learning. For example, deep learning is not known to work well in small-data scenarios. Would I just yell at my boss and say "Because I can't use deep learning, I can't solve this problem!"? No, I would just test out random forests, support vector machines, GMMs and all the nifty methods I have learned over the years.
Misleading Claim About Deep Learning (I) - "Deep Learning is about Machine Learning Methods which use a lot of Data!"
So now we come to the arena of misconceptions. I am going to discuss two claims which many people have been drumming up about deep learning. Neither of them is the right answer to the question "What is the Difference between Deep Learning and Machine Learning?"
The first one you have probably heard all the time: "Deep Learning is about ML methods which use a lot of data". Or people would tell you "Oh, deep learning just uses a lot of data, right?" This sounds about right; deep learning these days does use a lot of data. So what's wrong with the statement?
Here is the answer: while deep learning does use a lot of data, other techniques used tons of data before deep learning too! E.g., speech recognition before deep learning, i.e. HMM+GMM, could use up to 10k hours of speech. The same goes for SMT. And you can do SVM+HOG on ImageNet. More data is always better for those techniques as well. So if you say "deep learning uses more data", then you forget that the older techniques can also use more data.
What you can claim is that "deep learning is a more effective way to utilize data". That's very true, because once you get into either GMM or SVM, they have scalability issues. GMM scales badly when the amount of data is around 10k hours. SVM (with an RBF kernel in particular) is super tough/slow to use when you have ~1 million data points.
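One way to get a feel for the SVM scalability issue is a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the full n x n kernel matrix is stored densely as 8-byte floats; practical SVM solvers avoid holding the whole matrix in memory, so treat this as intuition for why the problem grows quadratically rather than as a description of any particular solver. The function name is my own.

```python
def dense_kernel_matrix_bytes(n_points, bytes_per_entry=8):
    """Memory for a full n x n kernel (Gram) matrix stored as float64 (assumed)."""
    return n_points * n_points * bytes_per_entry

for n in (10_000, 100_000, 1_000_000):
    gib = dense_kernel_matrix_bytes(n) / 2**30
    print(f"{n:>9,} points -> {gib:,.1f} GiB")
```

Going from 10k to 1 million points multiplies the number of kernel entries by 10,000, which is why a kernel method that is comfortable on a small data set becomes painful at that scale.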
Misleading Claim About Deep Learning II - "Deep Learning is About Using GPU and Having Data Center!"
This particular claim is different from the previous "Data Requirement" claim, but we can debunk it in a similar manner. The reason why it is wrong? Again, before deep learning, people already used GPUs to do machine learning. For example, you can use a GPU to speed up GMM. Before deep learning became hot, you needed a cluster of machines to train an acoustic model or a language model for speech recognition. You also needed tons of RAM to train a language model for SMT. So calling GPU/Data Center/RAM/ASIC/FPGA a differentiator of deep learning is just misleading.
You can say, though, "Deep learning has changed the computational model from a distributed network model to a more single-machine-centric paradigm (in which each machine has one GPU). But later approaches also tried to combine CPU and GPU processing."
Conclusion and "What you say is Just Your Opinion! My Theory makes Equal Sense!"
Indeed, you should always take what you read on-line with a grain of salt. Being critical is a good thing, and having your own opinion is good. But you should also try to avoid equivocating on an issue. Meaning: sometimes things have only one side, but you insist there are two equally valid answers. If you do so, you are perhaps making a logical error in your thinking. And a lot of people who make claims such as "deep learning is learning which uses more data and a lot of GPUs" are probably making such thinking errors.
That said, I would suggest you read several good sources to judge my answer:
- Chapter 1 of Deep Learning.
- Shakir's Machine Learning Blog on a Statistical View of Deep Learning. In particular, part VI, "What is Deep?"
- Tombone's post on Deep Learning vs Machine Learning vs Pattern Recognition
In any case, I hope that this article helps you. I thank Bob for asking the question. Armaghan Rumi Naik debunked many misconceptions in the original thread. His understanding of machine learning is clearly above mine, and he was able to point out mistakes from other commenters. The thread is worth your reading time.
 See "Last Words: Computational Linguistics and Deep Learning"
 Generally, whether DL is useful in NLP is a widely disputed topic. Take a look at Yoav Goldberg's view on some recent GAN results on language generation. AIDL Weekly #18 also gave an exposé on the issue.
 Perhaps another useful term is "hierarchical". In the case of ConvNets the term is right on. As Eric Heitzman commented at AIDL:
"(deep structure) They are *not* necessarily recursive, but they *are* necessarily hierarchical since layers always form a hierarchical structure." After Eric's comment, I think both "deep" and "hierarchical" are fair terms to describe methods in "deep learning". (Of course, "hierarchical learning" is a much poorer marketing term.)
 In an earlier draft, I used the term "recursive" to describe the term "deep", which, as Eric Heitzman at AIDL pointed out, is not entirely appropriate. "Recursive" gives people the feeling that the function calls itself, but the actual functions are more "nested". As a result, I removed the term "recursive" and just call the function a "nested function".
Of course, you should be aware that my description is not too mathematically rigorous either. (I guess it is a fair wordy description, though.)
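For readers who prefer symbols, the distinction can be written out as below. The notation is my own illustration and is not taken from the original thread: f stands for a single function, while f_1, f_2, f_3 with parameters \theta_1, \theta_2, \theta_3 stand for different layers.

```latex
% Self-recursive: the same function is applied to its own output.
y = f(f(f(x)))

% Nested: a composition of different functions, one per layer,
% each with its own parameters.
y = f_3\bigl(f_2\bigl(f_1(x;\theta_1);\theta_2\bigr);\theta_3\bigr)
```

The second form is the one that "deep" refers to here.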
20170709 at 6: fix some typos.
20170711: fix more typos.
20170711 at 7:05 p.m.: I got feedback from Eric Heitzman, who points out that the term "recursive" can be deceiving. Thus I wrote footnote .
If you like this message, subscribe to the Grand Janitor Blog's RSS feed. You can also find me (Arthur) at twitter, LinkedIn, Plus, Clarity.fm. Together with Waikit Lau, I maintain the Deep Learning Facebook forum. Also check out my awesome employer: Voci.
Here's an awesome post from Prof. Tao:
As AIDL grew, once in a while people would talk about how blockchain would affect AI or deep learning. Currently it is still a long shot, but blockchain by itself is a very interesting technology and it deserves our notice.
Here are some resources you may use to learn about blockchain. Unlike the "Top 5 List" for AIDL, I don't really understand the technology too well. But also unlike the "List of Neuroscience MOOC", Greg Dubela did give me a lot of recommendations on what you should learn. Thus this post is also used as a resource post in "Blockchain Nation".
- (2 minute) This video: explaining the purpose of blockchain in 2 minutes, and the promise it makes.
- This 6-part series from Dash School is a great introductory series on what Blockchain is, how it is governed, and several fundamental concepts. Greg highly recommends the series.
Blockchain is still a new development, so it's harder to find a MOOC which can teach you the whole thing in its entirety. We found there are a couple of exceptions:
- Coursera's Bitcoin and Cryptocurrency from Princeton
- Stanford CS251's Bitcoin class. While this is not a MOOC, you can access all the notes and homework. They are high-quality.
- Coursera's Cryptography I and Cryptography II, which provide you with fairly good basics of cryptography.
- Mastering Bitcoin by Andreas M. Antonopoulos
Different Cryptos: (under construction)
Learning blockchain these days usually means knowing the different characteristics of different coins. Here is a list of interesting ones.
- Ethereum Classic
As I said before, we are really no experts on the topic. But as of 20170705, I am taking the Princeton class, and I find it quite promising: it gets into the details of how blockchain really works.
To be reviewed:
- Someone also brought up University of Nicosia's Introductory MOOC on bitcoin. I haven't seen many reviews yet, so let's decide later.
- Khan Academy: https://www.khanacademy.org/economics-finance-domain/core-finance/money-and-banking/bitcoin/v/bitcoin-what-is-it
- Berkeley "Dive Deep into Ethereum" https://docs.google.com/document/d/1ejYCWkHQIRInXB4VifoHevom8CWVJ69zFVfP4J2fjSU/edit
- Udemy's Bitcoin class: https://www.udemy.com/bitcoin-or-how-i-learned-to-stop-worrying-and-love-crypto/
- A list of very useful Bitcoin classes: https://www.udemy.com/courses/search/?q=Bitcoin&p=1
- A series from CRI : https://www.youtube.com/channel/UCgo7FCCPuylVk4luP3JAgVw
This is an impression post on Coursera's "Computational Neuroscience" by Rao and Fairhall: (Crossposted in both AIDL and CNAGI)
- Usually I write an impression post when I audit a class, but a full blog post when I have completed all the homework.
- In this case, while I actually finished "Computational Neuroscience", I am not qualified enough to comment on some neuroscientific concepts such as Hodgkin-Huxley models, cable theory or brain plasticity, so I will stay at the "impression" level.
- Strictly speaking, CN is more an OoT for AIDL, but we are all curious about the brain, aren't we?
- It's a great class if you know ML but want to learn more about the brain. It's also great if you know something about the brain but want to know how the brain is similar to modern-day ML.
- You learn nifty concepts such as spike-triggered averages and neuronal coding/decoding, and of course main dishes such as HH models and cable theory.
- I only learned these topics amateurishly, and there are around 3-4 classes I might take to further my knowledge. But it is absolutely interesting. E.g., this class is very helpful if you want to understand the difference between biological neural networks and artificial neural networks. You will also get insights on why ML/DL people don't just use more biologically realistic models in ML problems.
- My take: while this is not a core class for us ML/DLers, it is an interesting class to take, especially if you want to sound smarter than "We just run CNN with our CIFAR-10 data". In my case, it humbles me a lot because I now know that we humans just don't really understand our brain that well (yet).
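For a taste of one of the concepts mentioned above, here is a rough Python sketch of a spike-triggered average. It is my own simplified reading of the idea, not code from the class: the function name, the toy stimulus and the convention of averaging the `window` samples strictly before each spike are all assumptions made for the illustration.

```python
def spike_triggered_average(stimulus, spikes, window):
    """Average the `window` stimulus samples that precede each spike.

    stimulus: list of floats, one per time step
    spikes:   list of 0/1 flags, same length as stimulus
    window:   number of time steps before the spike to include
    """
    segments = [stimulus[t - window:t]
                for t, fired in enumerate(spikes)
                if fired and t >= window]
    if not segments:
        raise ValueError("no spike has a full window of stimulus before it")
    n = len(segments)
    return [sum(seg[i] for seg in segments) / n for i in range(window)]

# Hypothetical toy data: the neuron fires right after the stimulus ramps up.
stimulus = [0.0, 1.0, 2.0, 0.0, 0.0, 1.0, 2.0, 0.0]
spikes   = [0,   0,   0,   1,   0,   0,   0,   1]
sta = spike_triggered_average(stimulus, spikes, window=3)
print(sta)
```

The averaged window gives a picture of what the stimulus tends to look like just before the neuron fires.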
Hope you enjoy this "impression"!
Some misadventures on MacOS X:
- Making System Calls From Assembly in Mac OS X from FiloScottie. (Felix's Blog is pretty good too.)
- AT&T vs Intel Syntax
- gcc -v
- radare2 is better compiled from source.
- Your system nasm is probably too old, but compiling a newer one from source can easily solve the problem.