(Repost) Recommended books in Machine Learning/Deep Learning.

(I am editing my site, so I decided to separate the book list into its own page.)

I am often asked what the best beginner books on machine learning are.  Here I list several notable references, which are usually known as "Bibles" in the field.   Also read the comments on why they are useful and how you may read them.

Machine Learning:


Pattern Recognition and Machine Learning by Christopher Bishop

One of the most popular and useful references in general machine learning.   It is also the toughest book to read on this list.   Generally known as PRML, Pattern Recognition and Machine Learning is a comprehensive treatment of several important and relevant machine learning techniques such as neural networks, graphical models and boosting.   There are in-depth discussions as well as supplementary exercises on each technique.

The book is very Bayesian, and rightly so, because Bayesian thinking is very useful in practice.   E.g. it treats bias-variance as the "frequentist illusion", which is a more advanced viewpoint than what most beginner classes would give you. (I think only Hinton's class fairly discusses the merits of the Bayesian approach.)

While it is a huge tome, I would still consider it a beginner book, because it doesn't really touch all the important issues of every technique.  E.g. there is no in-depth discussion of sequential minimal optimization (SMO) for SVMs.   It is also not a deep learning/deep neural network book.  For that, the Goodfellow/Bengio/Courville book seems a much better read.

If you want to reap the benefits of this book, consider doing the exercises in it.  Sure, it will take you a while, but doing any one of the exercises will give you incredible insight into how different machine learning techniques work.

Pattern Classification, 2nd Edition by R. Duda, P.E. Hart and D.G. Stork

Commonly known as "Duda and Hart", its 1st edition, titled "Pattern Classification and Scene Analysis", was better known as the bible of pattern classification.  Of course, nowadays "machine learning" is the trendier term, and in my view the two topics are quite similar.

The book is a highly technical (and perhaps terse) description of machine learning, one that I found more senior scientists usually referred to back when I was working at Raytheon BBN.

Compared to PRML, I found "Duda and Hart" slightly outdated, but its treatment of linear classifiers is still very illuminating.   The 2nd edition adds computer exercises.   Since I usually learn an algorithm by looking directly at either the original paper or the source code, I found these exercises less useful.   But some of my first mathematical drilling in pattern recognition (back in the 2000s) did come from the guided exercises of this book, so I still recommend it to beginners.

Machine Learning by Tom Mitchell

Compared to PRML and Duda & Hart, Mitchell's book is much shorter and more concise, and thus more readable.  It is also more "rule-based", so there are discussions of concept learning, decision trees, etc.

If you want to read an entire book on machine learning, this could be your first choice.   Both PRML and Duda & Hart are not for the faint of heart.    While Mitchell's book is perhaps less relevant for today's purposes, I still found its discussion of decision trees and artificial neural networks very illuminating.

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos

You can think of it as popular-science non-fiction.   It's also a great introduction to several schools of thought in machine learning.

Books I have heard are good:

  1. Hastie/Tibshirani/Friedman's Elements of Statistical Learning
  2. Barber's Bayesian Reasoning and Machine Learning
  3. Murphy's Machine Learning: a Probabilistic Perspective
  4. MacKay's Information Theory, Inference and Learning Algorithms
  5. Goodfellow/Bengio/Courville's Deep Learning  - the only one on this list which is related to deep learning. (See my impression here.)

More Advanced Books (i.e., they are good, but I don't fully grok them.)

  1. Perceptrons: An Introduction to Computational Geometry, Expanded Edition by Marvin Minsky and Seymour Papert - an important book which changed the history of neural network development.
  2. Parallel Models of Associative Memory by Geoff Hinton - another book of historical interest.

A Quick Impression on the Course "Synapses, Neurons and Brains"

Hi Guys, I recently finished the coursework of Synapses, Neurons and Brains (SNB below). Since neuroscience is really not my expertise, I just want to write a "Quick Impression" post to summarize what I learned:

* Idan Segev is an inspiring professor, and you can feel his passion for the topic of the brain throughout the class.

* Prof. Segev is fond of computational neuroscience, and thus of the simulation approach to the connectome. That perhaps explains the topics taught in the class: the Hodgkin-Huxley model, Rall's cable model, dendritic computation, and the Blue Brain Project.

* Compared to Fairhall and Rao's computational neuroscience course, which is generally about applying ML approaches and thinking to neuroscience, SNB puts a stronger emphasis on the motivating neuroscientific experiments: Hubel and Wiesel's work when discussing the neocortex, and the "squid experiment" when developing the HH model. So I found it very educational.

* The course also touches on seemingly more philosophical issues such as "Can you download a mind?", "Is it possible to read minds?", and "Is there such a thing as free will?" Prof. Segev presents his point of view and the supporting experiments. I don't want to spoil it; check it out if you like.

* Finally, on the coursework: it's all multiple choice and you can try up to 10 times per 8 hours, but this is still a tough course to pass. The course features many multi-answer multiple-choice questions and doesn't give you any feedback on your mistakes, so you need to understand the course material quite well to get them right.

* Some students even complained that some of the questions don't make sense - I think that goes a bit too far. But it's fair to say that the course hasn't been well maintained in the last 2 years or so, and you don't really see any mentors chime in to help students. That could be a downside for all of us learners.

* But I would say I still learned a lot in the process, so I do recommend listening through the lectures if you are into neuroscience. Maybe what you should decide is whether you want to finish all the coursework.

That's what I have. Enjoy!

Re AIDL Member: The Next AI Winter

Re Dorin Ioniţă (see also the longer write-up in Sergey Zelvenskiy's post): Whenever people ask me about an AI winter, I can't help but think of online poker in 2008 and web programming in 2000. But let me just focus on web programming.

Around 1995-2001, people kept telling you "web programming" was the future. Many young people were told that if you knew HTML and CGI programming, you would have a bright future. That's not entirely untrue. In fact, if you got good at web programming in 2000, you probably started a company and made a decent living for .... 3-4 years. But then competition arose, and many colleges started to include web development in their core curriculum - as a result, web programming is now a very common skill. I am not saying it is not useful - but you are usually competing with 100 programmers for one job.

So back to AI. Since we started to realize AI/DL can be useful, everyone is jumping onto the bandwagon. Of course, there are more senior people who have been 'there': they joined a couple of DARPA projects or worked in ML startups years before deep learning. But most newcomers are, frankly, young college kids trying to build a future with AI/DL. (Check out our forum.) For them, I am afraid it's likely that they will face a future similar to web programmers' in 2000. The supply of labor will one day surpass the demand. So it's very likely that data science/machine learning is not their final destination.

So am I arguing there is an AI winter coming? Not in the old, classical sense of an "AI winter" in which research funding dried up. It's more that AI as a product, just like every technology, will go through a hype cycle, and one day, when the reality of the product doesn't meet expectations, things will just bust. It's just the way it is. We can argue to the death about whether deep learning is different. But you should know every technology follows a similar hype cycle. Some last longer, some don't. We will have to wait and see.

For OP: If you are asking for career advice, though, here is something I learned from poker (long story) and many other things in life - if you are genuinely smart, you can always pick up a new topic quicker than other people. That's usually what determines whether you can make a living. The rest is luck, karma, and whether you buy beers for your friends.

How to Think of A New Idea in A.I.?

Rephrased:  How do you come up with an idea in A.I. or Machine Learning?

Answer:
1. Look at what other people are doing: is it possible to put a twist on it?

2. What is a problem that *you* want to solve in your life? Then think: is there any way AI/ML can help you? Everyone has some - e.g. I would really like to make an old-style, Nintendo-era Final Fantasy-type game, but drawing the bitmap character graphics takes an insane amount of time. So is there any way A.I. can help me? Yes, one potential idea is to create an image generator (a hypothetical code sketch follows below).

Would these ideas work? Who knows? But that's how you come up with ideas. You ignore the feasibility part for the moment. If you feel it is really hard for you to come up with ideas, chances are you are too caught up with the technical field. Read some books, listen to music, make some art and daydream a bit. Then ideas will come.
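To make that "image generator" idea a bit more concrete, here is a minimal, purely hypothetical sketch: a tiny untrained network that maps random noise vectors to 16x16 sprite-like bitmaps. Every name and layer size here is an illustrative guess of mine, not a working game-art pipeline; a real project would still need training data and a proper objective (e.g. a GAN or autoencoder loss).

```python
# Hypothetical sketch only: a tiny generator mapping noise to 16x16 "sprites".
# It is untrained; a real project would add data and a GAN/autoencoder loss.
import torch
import torch.nn as nn

class SpriteGenerator(nn.Module):
    def __init__(self, noise_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 16 * 16),
            nn.Tanh(),                       # pixel values in [-1, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z).view(-1, 1, 16, 16)

if __name__ == "__main__":
    generator = SpriteGenerator()
    noise = torch.randn(4, 32)               # a batch of 4 random noise vectors
    sprites = generator(noise)                # random-looking bitmaps until trained
    print(sprites.shape)                      # torch.Size([4, 1, 16, 16])
```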

Arthur

Some Resources on End-to-End Sequence Prediction

Important Papers:

Unsorted:

Important Implementations:

For reference, here are some papers on the hybrid approach:

Some Thoughts on Hours

Hours are one of the taboo topics in the tech industry. I can say a couple of things, hopefully not fluffy:

  • Most hours are self-reported, so from a data perspective they're really unclean. Funny story: since I was 23, I have worked on weekends regularly, so in my past jobs there were times when I noted down the actual hours of colleagues who claimed to work 60+ hours. What really happened is that they only worked 35-40. Most of them were stunned when I gave them the measurement. A few of them refused to talk with me afterward. (Oh, I worked for some of them too.)
  • Then there is the question of what working long hours (60+) really means, and practically you should wonder why that's the case. How come one can't just issue a Unix command to solve the problem? Or, if you know what you are doing, how come writing a 2000-word note takes more than 8 hours? How come it takes such a long time to solve your weekly issues? If we talk about coding, it also doesn't make sense, because once you have the breakdown of a coding problem, you just solve the pieces iteratively in small chunks. Usually each chunk doesn't take more than 2 hours.
  • So here is a realistic portrait of the respectable people I have worked with who seem to work long hours. What do they actually do?
    1. They do some work every day, even on holidays/vacations/weekends.
    2. They respond to you even at hours such as 1 or 2 a.m.
    3. They look agitated when things go wrong in their projects.
  • Now, once you really analyze these behaviors, none of them proves that the person works N hours. What they really show is that the person is switched on all the time. As for the agitation, it makes more sense to say, "Oh, this guy probably has an anger issue, but at least he cares."
  • Sadly, there are also many people who really do work more than 40 hours, yet they are among the least effective people I have ever known.
  • I should mention the more positive side of long hours: learning. My guess is that is what the job description really means - you spend all your spare moments learning. You might code daily, but if you don't learn, your speed won't improve at all. So this extra cost of learning is always worth paying. And that's why we always encourage members to learn.
  • Before I go: I actually follow the scheduling method from "Learning How to Learn", i.e. I take frequent breaks after 45-60 minutes of intense work. And my view of productivity is to learn continuously, because new skills usually improve your workflow. Some of my past employers had huge issues with my approach, so you should understand that my view is biased.
  • I would also add that there are individuals who can really work 80 hours and actually code. Usually they are either obliged by culture, influenced by drugs, or shaped by their very special genes.

Hope this helps,

Arthur

My Third Quick Impression on HODL - Interviews with Pieter Abbeel and Yuanqing Lin

My third Quick Impression on Heroes of Deep Learning (HODL), from the deeplearning.ai course. This time on the interviews with Pieter Abbeel and Yuanqing Lin.
 
* This is my 3rd write-up on HODL. Unlike the previous two (Hinton and Bengio), I will summarize two interviews, with Pieter Abbeel and Yuanqing Lin, in one post, because both interviews are short (<15 mins).
 
* Both researchers are comparatively less well known than stars such as Hinton, Bengio, LeCun, and Ng. But everyone knows Pieter Abbeel as an important RL researcher and lecturer, and Yuanqing Lin is the head of Baidu's Institute of Deep Learning.
 
* Gems from Pieter Abbeel:
- Is there any way to learn RL from another algorithm?
- Is there any way we can learn one game and use that knowledge to learn another game faster?
- He used to want to be a basketball player. (More like a fun fact.)
- On learning: Having a mentor is good.
 
* Gems from Yuanqing Lin
- Lin is the head of Baidu's Institute of Deep Learning; when he was at NEC, he won the first ImageNet competition.
- Lin describes a fairly impressive experimental framework based on PaddlePaddle. Based on what he describes, Lin is building a framework which allows researchers to rerun an experiment using just an ID (see the hypothetical sketch after this list). I wonder how scalable such a framework is.
- Lin was a physics student who specialized in optics.
- On learning: use an open-source framework first, but also learn the basic algorithms.
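As an aside, here is a minimal, entirely hypothetical sketch of what "rerun an experiment by ID" could mean in practice: a registry that freezes the full configuration of a run under an ID so the same experiment can be launched again later. None of the names below come from PaddlePaddle or Baidu's actual system.

```python
# Hypothetical toy sketch of an experiment registry keyed by ID;
# it does not reflect PaddlePaddle's or Baidu's actual implementation.
import json
import uuid

REGISTRY = {}  # experiment_id -> frozen configuration dict

def register_experiment(config: dict) -> str:
    """Freeze a full config under a fresh ID so the run can be reproduced later."""
    experiment_id = uuid.uuid4().hex[:8]
    REGISTRY[experiment_id] = json.loads(json.dumps(config))  # cheap deep copy
    return experiment_id

def rerun(experiment_id: str) -> None:
    """Look up the stored config and launch the same run again."""
    config = REGISTRY[experiment_id]
    print(f"Re-running {experiment_id} with config: {config}")
    # train_model(**config)  # a hypothetical training entry point would go here

exp_id = register_experiment({"model": "resnet50", "lr": 0.1, "epochs": 90})
rerun(exp_id)
```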
 
That's what I have. Enjoy!
Arthur Chan

Some Useful Links on Neural Machine Translation

Some good resources for NMT:

Tutorial:

A bit special: Tensor2Tensor uses a novel, attention-based architecture instead of a pure RNN/CNN encoder/decoder.   It gives a surprisingly large gain, so it's likely to become a trend in NMT in the future.
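For context, the novel architecture in question is built around attention. Below is a minimal numpy sketch of scaled dot-product attention, the core operation; the shapes and names are illustrative and not taken from the Tensor2Tensor code base.

```python
# Minimal illustration of scaled dot-product attention in plain numpy;
# not the Tensor2Tensor implementation.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # each output is a weighted mix of values

# Tiny usage example: 5 tokens with 8-dimensional keys/values.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)
```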

Important papers:

  • Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation by Cho et al. (link) - A very innovative and smart paper by Kyunghyun Cho and colleagues.  It also introduces the GRU (a minimal GRU-cell sketch appears after this list).
  • Sequence to Sequence Learning with Neural Networks by Sutskever et al. (link) - By Google researchers; perhaps the first time an NMT system was shown to be comparable to the traditional pipeline.
  • Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (link)
  • Neural Machine Translation by Jointly Learning to Align and Translate by Bahdanau et al. (link) - The paper which introduced attention.
  • Neural Machine Translation by Minh-Thang Luong (link)
  • Effective Approaches to Attention-based Neural Machine Translation by Minh-Thang Luong et al. (link) - On how to improve the attention approach using local attention.
  • Massive Exploration of Neural Machine Translation Architectures by Britz et al (link)
  • Recurrent Convolutional Neural Networks for Discourse Compositionality by Kalchbrenner and Blunsom (link)
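Since the Cho et al. paper above is also where the GRU comes from, here is a minimal numpy sketch of a single GRU step, using the gate convention from that paper. The random weights are only there to make the snippet runnable; this is an illustration, not a reference implementation.

```python
# Minimal sketch of one GRU step (gate convention as in Cho et al., 2014);
# random weights, illustration only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """x: input vector; h_prev: previous hidden state; params: six weight matrices."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h_prev)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate state
    return z * h_prev + (1.0 - z) * h_tilde         # mix old state and candidate

input_dim, hidden_dim = 4, 3
rng = np.random.default_rng(0)
params = [rng.standard_normal((hidden_dim, d)) for d in (input_dim, hidden_dim) * 3]
h = np.zeros(hidden_dim)
for x in rng.standard_normal((5, input_dim)):       # run 5 time steps
    h = gru_step(x, h, params)
print(h)
```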

Important Blog Posts/Web page:

Others: (Unsorted, and seems less important)

Usage in Chatbots and Summarization (again unsorted, and again perhaps less important...)

Why AIDL doesn't talk about "Consciousness" more?

Here is an answer to the question (rephrased from Xyed Abz): "Isn't consciousness the only algorithm we need to build to create an artificial general intelligence like humans or animals?"

My thought:

Xyed Abz: I like your question because it's not exactly one of those "How do you build an AGI, muahaha?" types of fluffy topic. At least you thought about why "consciousness" is important in building an intelligent machine.

 
But then why doesn't AIDL talk about consciousness more? Part of the reason is that the English term "consciousness" is fairly ambiguous. There are at least three definitions. First, "wakefulness": the state in which humans are awake - a bit like when you have just woken up but are not yet very aware of your surroundings. Then there is "attention": certain groups of stimuli from the world arriving at your perception. And finally there is a kind of "cognitive access": out of all the things you attend to - I am typing with my fingers, I feel the keyboard, I hear the fan noise, I hear cars running outside - I decide to allow "writing" to occupy my mind.
 
Just a side note: these categorizations are not arbitrary, nor did I come up with them. This thinking can be traced to Christof Koch and his long-time collaborator Francis Crick (the Nobel laureate of DNA fame). Stanislas Dehaene is another representative of this school of thought. I often use this school of thought in explanations because it has more backing from experiments.
 
So, to your question: we should first ask what you actually mean by consciousness. If you mean a kind of "cognitive access", then yes, I do think it is one of the keys to building an intelligent machine, because you may think of all the deep learning machines we build as just one type of "attention" we have created, with no central binding mechanism to control them. That's what Bengio called "cognition" in his HODL interview.
 
Will that be enough? Of course not. As I said, if you do build a binding mechanism, you are also supposed to build the perception mechanisms around it as well. At least that's what's going on in humans.
 
Now, all of this sounds very nice, so don't we have a theory already? Nope; even Koch's and Dehaene's ideas are more hypotheses about the brain. How does this "cognitive access" mechanism actually work? No one knows. Koch believes a brain region called the claustrum carries out this mechanism, yet many disagree with him. And of course, even if you find such a region, it will take humans a while to reverse engineer it. So you might also have heard of "cognitive architectures", which suggest different mechanisms for how the brain works.
 
Does it sound complicated? Yes, it is, especially since we really don't know what we are talking about. People who are super assertive about the brain usually don't know what they are talking about. That's why I'd rather go party/dance/sing karaoke. But today is Saturday, so why not?
 
Hope it is helpful!

Arthur