(I am reorganizing my site, so I decided to move the book list to its own page.)

I am often asked what the best beginner books on machine learning are. Here I list several notable references, which are usually known as the “Bibles” of the field. Also read my comments on why they are useful and how you might read them.

Machine Learning:

Pattern Recognition and Machine Learning by Christopher Bishop

One of the most popular and useful references in general machine learning. It is also the toughest book to read on this list. Generally known as PRML, Pattern Recognition and Machine Learning is a comprehensive treatment of several important and relevant machine learning techniques, such as neural networks, graphical models and boosting. There is in-depth discussion of each technique, as well as supplementary exercises.

The book is very Bayesian, and rightly so, because Bayesian thinking is very useful in practice. For example, its treatment of bias-variance is to treat it as the “frequentist illusion”, which is a more advanced viewpoint than that of most beginner classes you would take. (I think only Hinton’s class fairly discusses the merits of the Bayesian approach.)

While it is a huge tome, I would still consider it a beginner book, because it doesn’t really touch on every important issue in every technique. For example, there is no in-depth discussion of sequential minimal optimization (SMO) for SVMs. It is also not a deep learning/deep neural network book. For that, Bengio/Goodfellow’s book seems to be a much better read.

If you want to get the most out of this book, consider doing the exercises. Sure, it will take you a while, but doing any one of them will give you incredible insight into how different machine learning techniques work.

Pattern Classification 3rd Edition by R. Duda, P.E. Hart and D.G. Stork

Commonly known as “Duda and Hart”, its 2nd edition, titled “Pattern Classification and Scene Analysis”, was better known as the bible of pattern classification. Of course, nowadays “machine learning” is the trendier term, and in my view the two topics are quite similar.

The book is a highly technical (and perhaps terse) description of machine learning, one that I found more senior scientists usually referred to back when I was working at Raytheon BBN.

Compared to PRML, I find “Duda and Hart” slightly outdated, but its treatment of linear classifiers is still very illuminating. The 3rd edition is updated to include computer exercises. Since I usually learn an algorithm by looking directly at either the original paper or the source code, I did not find these exercises as useful. But some of my first mathematical drilling (back in the 2000s) on pattern recognition did come from the guided exercises in this book, so I still recommend it to beginners.