Category Archives: Uncategorized

A Read on "A Neural Attention Model for Abstractive Sentence Summarization" by A.M. Rush, Sumit Chopra and Jason Weston.

(First appeared in AIDL-LD and AIDL Weekly.)

This is a read on the paper "A Neural Attention Model for Abstractive Sentence Summarization" by A.M. Rush, Sumit Chopra and Jason Weston.

* Video: . Github:

* The paper was written at 2015, and is more a classic paper on NN-based summarization. It is published slightly later than classic papers on NN-based translation such as those written by Cho or Badhanau. We assume you have some basic understanding on NN-based translation and attention.

* There is a github ( and a video ( for the paper.

* If you haven't worked on summarization, you can broadly think of techniques as extractive or abstractive. Given the text you want to summarize, "extractive" means you just usehe word from the input text, whereas "abstractive" means you can use any words you like, even the words which are in the input text.

* So this is why summarization is seen as similar problem as translation: you just think that there is a "translation" from the original text to the summary.

* Section 2 is a fairly nice mathematical background of summarization. One thing to note, the video also bring up noisy channel formulation. But as Rush said, their paper is to completely do away noisy-channel but do direct mapping.

* The next nuance you want to look at is the type of LM and the encoder used. That can all be found in Section 3. e.g. it uses the forward NNLM proposed by Bengio. Rush mentioned that he was trying RNNLM, but at that time, he get small gain. It feels like he can probably get better results if RNNLM is used.

* Then it's the type of encoder, there is a nice comparison between bag-of-words and attention models. Since there are words embeddings, the "bag-of-words" is actually all the input words embedded down to a certain size. Attention model, on the other hand, is what we know today, which contains a weight matrix P which map the context to input.

* Here is an insightful note from Rush: "Informally we can think of this model as simply replacing the uniform distribution in bag-of-words with a learned soft alignment, P, between the input and the summary."

* Section 4 is more on decoding, in Section 2, Markov assumption was made, this simplifies the decoding quite a lot. The authors were using beam search, so you can use trick such as path combination.

* Another cute thing is that the authors also comes up with method such that make the summarization more extractive. For that it uses a log-linear model to also weigh features such as unigram to trigram. See Section 5.

* Why would the author wants to make the summarization more extractive? That probably has to do with the metric. ROUGE usually favors words which are extracted from the input text.

* We will stop at this point. Here are several interesting commentaries about the paper.

Denny Britz':…/neural-attention-model-for-abstractive…

Quick Impression on Waikit Lau's Crypto and Blockchain forum and MIT

* Hosted by Coach Wei. Our own Waikit Lau is presenting the topic on blockchain, cryptocurrency and ICO. Me and Brian Subiranna were invited as guest panels at the Q&A forums. Brian is planning to create a class on blockchain at MIT.

* About the forum, Coach Wei has launched MIT-Tsinghua Summit since Dec 30 last year. So this is part of the talk in the series:…/mit-tsinghua-innovation-summit-…/

* And Waikit, as you might know, he has been successful serial entrepreneur, angel and involved in several ICOs. He also co-admin the AIDL forum.

* In my view, Waikit gave a great presentation on the excitement of blockchain and cryptos. His ~50 min presentation have couple gists
- The rise of protocol coins such as ethereum.
- The potential business opportunity is comparable to development of HTTP. Waikit use the metaphor that blockchain can be seen as TCP/IP. Whereas application build on top of blockchain can be thought as HTTP.
- The current ambiguity of how ICO should be regulated. Or generally: should cryptos be seen as a commodity or a security?

* Gems from the Q&A session. The crowd has many sharp questions to the panels. Here is a summary:

Q. Are there any values of blockchain without cryptocurrency?
A. Generally yes from the panels. e.g. most chains can exist without the idea of mining. Mining probably makes more sense when parties within the network don't trust each other.

Q. What is the current state of decentralized exchanges?
A. Panels: Still under development. There are a lot of things need to happen to motivate a large ones.

Q. Would quantum computing be a threat to blockchain?
A. Panels (Arthur): It could be, yet current quantum computing still several technical roadblocks to solve to make it usable for applications. e.g. create stabilized inputs for the QC. There are also counter technology such as quantum cryptography being developed. So we can't quite say QC would just kill blockchain even if it is developed.

Q. Should a chain be controlled by one or multiple parties?
A. Yet another issue which is hard to predict. Looking at the development of Ethereum and Bitcoin, having a benevolent dictator seems to make the Ethereum community more unified. But the fact that Bitcoin's community is segmented that now the buyers/users have more say, might motivate adoption of speed-up algorithm.

Q. Would speeding up mining speed up transaction?
A. Unlikely. What would likely to improve transaction speed are technology such as Lightning.

That's what I have.  You can find the original thread at


Quick Impression of Course 5 of

Hey Hey! As you know Course 5 is just out, as always, I would check out the class and give you a quick impression of what about. So far, these new 3-week class look very exciting. Here is my takeaway. Remember, I haven't started the class yet. But this is likely to give you a sense of the scope and extent of the class.

* Course 5 is mostly focused on sequence models. That include the more mysterious models such as RNN, GRU, LSTM. You will go through standard ideas such as vanishing gradients which actually first discovered in RNN. Then go through GRU and LSTM afterward.

* The sequence of coverage is nice, covering GRU first, then LSTM doesn't quite follow the historical order. (Hochreiter & Schmidhuber first discovered LSTM in 97, Cho had the idea about GRU in 2014). But such order makes more sense for pedagogical purpose. I don't want to spoil it, but this is also how Socher's approach of the subject in cs229n as well. That makes me believe this is likely a course which would teach you well on RNN.

* Week 1 will be all about RNN, then Week 2 and 3 would be about word vectors, and end-to-end structure. Would one week be enough for each topic? Not at all. But Andrew seems to give all the essential in each topic - word2vec/GloVec in word vectors. Standard dec-enc structure in end-to-end scheme. Most examples are based on SMT, which I think it's appropriate. Other applications such as image captioning or speech recognition are possible applications. But they usually have details which is tough to cover in the first class.

* Would this class be everything you need on NLP? Very unlikely. You still need take cs229n to get good. Or even the Oxford class. But just like Course 4 is a good intro to computer vision. This will be a good intro to NLP and in general any topics which require sequence modeling such as speech recognition, stock analysis or DNA sequence analysis.

The course link can be found at

Hope this "Quick Impression" helps you!

Comparing and Udacity's nanodegree

I would think it this way:

"For the most part MOOC certificates don't mean too much in real life. It is whether you can actually solve problem matters. So the meaning of MOOC is really there to stimulate you to learn. And certificate serves as a motivation tool.

As for OP's question. I never take the Udacity nanodegree. From what I heard though, I will say the nanodegree will require effort to take 1 to 2 Ng's specialization. It's also tougher if you need to take a course in a specified period of time. But the upside is there are human graders which give you feedbacks.

As for which path to go, I think it's solely depend on your finance. Let's push to an extreme: e.g. If you purely think of credential and opportunities May be an actual PhD/Master degree will give you the most, but then the downside is it can cost you multi-year of salaries. One tier down would be online ML degree from Georgia tech, but it will still cost you up to $5k. Then there is taking cs231n or cs224d from Stanford online, again that will cost you $4k/class. So that's why you would consider to take MOOC. And as I said which price tag you choose depends on how motivate you are and how much feedbacks you want to get."

Quick Impression of Course 4 of

* Fellows, I just got the certificate. As you know, I really love to review the whole specialization so I decide to take the class myself. Since I haven't watched all lectures yet, this is my usual "Quick Impression".

* I am a slow learner (getting stubborn?  ) , so I was more like the 100th guy who finish the class in our Coursera forum. Go check it out. There are many experienced DLers there:

* Many of you had noted that it is a very good class for computer vision. My assessment so far this is a very good *first* class if you want to get into DL-based computer vision. But then there are many great material you can still only learn from cs231n. So I would suggest you to go through it as well after you finish Course 4.

* So far as I see, it has perhaps one of the best explanations on basics of Convnet, as well as what YOLO is. Perhaps the pity is the short length of the course disallow going in-depth on topic such as RCNN, segmentation, GAN-based synthesis. All of these, cs231n has much better coverage.

* At this point, the course is still a bit unstable. There are assignment which requires an implementation which doesn't match the notebook. But the staff is working on it now.

That's what I have. Hope this is helpful. I am going to write up a full review for Course 3 and 4 soon too.

On Whether AI Are Stealing Our Jobs.

This is a question that come up from time to time. So I will just give you couple of perspectives.
First thing first: you should first ask what "A.I." are we talking about. A lot of times, when people are talking about machine are taking jobs away from people, they are really talking about "automation" is taking away the jobs. Now "automation" just means something can be done without human interference, it might or might not have to do with A.I. For example, I can write a for-loop to repeat something for 1 million times. But no one would call it "A.I.", "programming" may be. So in that sense automobile, large-scale industry equipment are all "automations".
This is an important note because to determine if jobs are really taken away from humans, we need to look at economic data, but so far, there is not many reports which are really on "A.I"'s impact. But there are many studies which look at the impact of "automation", then they will say "Oh, AI in recent years have been helping automation" but they usually don't quantify the impact.
Now then there is the idea of this supreme AI being (or AGI) will control everything in the world. Valeria Iegorova said it well, we are just far far away from that scenario. Just imagine you want to automate the manual work by a construction worker - creating a biepedal robot with the capability to walk around a construction site is a billion dollar project. Not to say to ask them actually do work.
So why would so many people fear about A.I. steal away their jobs then? Well, many of those who complains are really unemployed because of various reasons. May be their industry is just no longer appropriate for the time.
Now are there any cases where A.I. are replacing humans? Yes. But their economists will tell you a long-known fact - with proper re-training, humans can quite easily get back to workforce. So you might here a lot of more senior citizens these days are learning programming and get hired. These are usually incentivized by governments. In an other words - if you live in a place governments sense unemployment issues early, they will usually come up with reasonable solutions to resolve the problem.
Hope this is not too long. But I think it covers the topic. Other than that, good luck!

(Repost) Quick Impression on "Deep Learning" by Goodfellow et al.

(I wrote it back in Feb 14, 2017.)

I have some leisure lately to browse "Deep Learning" by Goodfellow for the first time. Since it is known as the bible of deep learning, I decide to write a short afterthought post, they are in point form and not too structured.

* If you want to learn the zen of deep learning, "Deep Learning" is the book. In a nutshell, "Deep Learning" is an introductory style text book on nearly every contemporary fields in deep learning. It has a thorough chapter covered Backprop, perhaps best introductory material on SGD, computational graph and Convnet. So the book is very suitable for those who want to further their knowledge after going through 4-5 introductory DL classes.

* Chapter 2 is supposed to go through the basic Math, but it's unlikely to cover everything the book requires. PRML Chapter 6 seems to be a good preliminary before you start reading the book. If you don't feel comfortable about matrix calculus, perhaps you want to read "Matrix Algebra" by Abadir as well.

* There are three parts of the book, Part 1 is all about the basics: math, basic ML, backprop, SGD and such. Part 2 is about how DL is used in real-life applications, Part 3 is about research topics such as E.M. and graphical model in deep learning, or generative models. All three parts deserve your time. The Math and general ML in Part 1 may be better replaced by more technical text such as PRML. But then the rest of the materials are deeper than the popular DL classes. You will also find relevant citations easily.

* I enjoyed Part 1 and 2 a lot, mostly because they are deeper and fill me with interesting details. What about Part 3? While I don't quite grok all the Math, Part 3 is strangely inspiring. For example, I notice a comparison of graphical models and NN. There is also how E.M. is used in latent model. Of course, there is an extensive survey on generative models. It covers difficult models such as deep Boltmann machine, spike-and-slab RBM and many variations. Reading Part 3 makes me want to learn classical machine learning techniques, such as mixture models and graphical models better.

* So I will say you will enjoy Part 3 if you are,
-a DL researcher in unsupervised learning and generative model or
-someone wants to squeeze out the last bit of performance through pre-training.
-someone who want to compare other deep methods such as mixture models or graphical model and NN.

Anyway, that's what I have now. May be I will summarize in a blog post later on, but enjoy these random thoughts for now.


(Repost) Recommended books in Machine Learning/Deep Learning.

(I am editing my site, so I decide to separate the book list into a separate page.)

I am often asked what the best beginner books on machine learning.  Here I list several notable references and they are usually known as "Bibles" in the field.   Also read the comments on why they are useful and how you may read them.

Machine Learning:


Pattern Recognition and Machine Learning by Christopher Bishop

One of the most popular and useful references in general machine learning.   It is also the tougher book to read among this list.   Generally known as PRML,  Pattern Recognition and Machine Learning is a comprehensive treatment on several important and relevant machine learning techniques such as neural networks, graphical models and boosting.   There are in-depth discussion as well as supplementary exercises on each techniques.

The book is very Bayesian, and rightly so because Bayesian thinking is very useful in practice.   e.g. It's treatment of bias-variance is to treat it as the "frequentist illusion", which is a more advanced view point compared to most beginner classes you would take. (I think only Hinton's class fairly discuss the merit of Bayesian approach.)

While it is a huge tomb, I would still consider the book as a beginner book, because it doesn't really touch all important issues in all techniques.  e.g.  there is no in-depth discussion in sequential minimal optimization (SMO) in SVM.   It is also not a deep learning /deep neural network book.  For that Bengio/GoodFellow's book seem to be a much better read.

If you want to reap benefit out of this book, consider to do exercise from the back of the books.  Sure it will take you a while, but doing any one of the exercises would give you incredible insight on how different machine techniques work.

Pattern Classification 3rd Edition by R. Duda, P.E. Hart and D.G Stork

Commonly known as "Duda and Hart",  its 2nd Edition titled "Pattern Classification and Scene Analysis" was more known to be bible of pattern classification.  Of course, nowadays "machine learning" is the more trendy term, and in my view the two topics are quite similar.

The book is highly technical (and perhaps terse) description of machine learning, which I found more senior scientists usually referred to back when I was working at Raytheon BBN.

Compare to PRML, I found that "Duda and Hart" is slightly outdated, but it's treatment on linear classifiers is still very illuminating.   The 3rd edition is updated so that there are computer exercises.   Since I usually learn an algorithm directly looking at either the original paper or source code, I found these exercises are not as useful.   But some of my first mathematical drilling (back in 2000s) on pattern recognition does come from the guided exercises of this book, so I still recommend this book to beginners.

Machine Learning by Tom Mitchell

Compared to PRML and Duda & Hart,  Mitchell's book is much shorter and concise, thus more readable.  It is also more "rule-based" so there are discussion on concept learning, decision trees e.g.

If you want to read an entire book of machine learning, this could be your first choice.   Both PRML and Duda&Hart  are not for faint of heart.    While Mitchell's book is perhaps less relevant for today's purpose, I still found its discussion of decision tree and artificial neural network very illuminating.

The Master Algorithm : How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos

You can think of it as a popular sci non-fi.   It's also a great introduction on several schools of thoughts in machine learning.

Books I heard which are Good.

  1. Hasti/Tibshirani/Friedman's Elements of Statistical Learning
  2. Barber's Bayesian Reasoning and Machine Learning
  3. Murphy's Machine Learning: a Probabilistic Perspective
  4. MacKay's Information Theory, Inference and Learning Algorithms
  5. Goodfellow/Bengio/Courville's Deep Learning  - the only one on this list which is related to deep learning. (See my impression here.)

More Advanced Books (i.e. They are good but I don't fully Grok them.)

  1. Perceptrons: An Introduction to Computational Geometry, Expanded Edition by Marvin Minsky and Seymour Papert - an important book which change history of neural network development.
  2. Parallel Models of Associative Memory by Geoff Hinton - another book of historical interest.

A Quick Impression on the Course "Synapses, Neurons and Brains"

Hi Guys, I recently finished the coursework of Synapses, Neurons and Brains (SNB below). Since neuroscience is really not my expertise, I just want to write a "Quick Impression" post to summarize what I learned:

* Idan Segev is an inspiring professor and you can feel his passion of the topic of the brain throughout the class.

* Prof. Segev is fond of the use of computational neuroscience, and thus simulation approach of connectome. Perhaps thus the topics taught in the class, Hudgins-Huxley model, Rall's cable Model, dendritic computation, the Blue brain projects.

* Compare to Fairhall and Rao's computational neuroscience, which has a general sense of applying ML approach/thinking in Neuroscience. SNB has a stronger emphasis on discussing motivating neuroscientific experiments such as Hubel and Wiesel when discussing the neurocortex. And the "squid experiment" when developing the HH model. So I found it very educational.

* The course also touches on seemingly more philosophical issues such as "Can you download a mind?", "Is it possible to read mind?", "Is there such thing call free will?" Prof. Segev presented his point of view and supporting experiments. I don't want to spoil it out, check it out if you like.

* Finally, it's on coursework - ah, it's all multiple choices and you can try up to 10 times per 8 hours but this is a tough course to pass. The course feature many multi-multiple choices question and doesn't give you any feedback on your mistakes. And you need to understand the course material quite well to get them right correctly.

* Some students even complained that some of the questions don't make sense - I think it is going a bit too far. But it's fair to say that the course wasn't really well maintained in the last 2 years or so. And you don't really see any mentors chime in to help students. That could be a downside for all of us learners.

* But I would say I still learn a lot in the process. So I do recommend you to listen through the lecture if you are into neuroscience. May be what you should decide is if you want to finish all the coursework.

That's what I have. Enjoy!

Re AIDL Member: The Next AI Winter

Re Dorin Ioniţă (Also a longer write-up Sergey Zelvenskiy's post) Whenever people asked me about AI winter. I couldn't help but think of on-line poker in 2008 and web programming in 2000. But let me just focus on web-programming?

At around 1995-2001, there was the time people keep on telling you "web programming" is the future. Many young people were told that if you know html and CGI programming, you would have a bright future. That's not too untrue. In fact, if you get good at web programming at 2000, you probably started a company and made a decent living for .... 3-4 years. But then competition arises, many college starts to include web as a core curriculum - as a result, web programming is sort of a very common skills nowadays. I am not saying it is not useful - but you are usually competing with 100 programmers to get one job.

So back to AI. Since we start to realize AI/DL can be useful, now everyone is jumping onto the wagon. Of course, there are more senior people who has been 'there', joined couple of DARPA projects or worked in ML startup years before deep learning. But most of them are frankly young college kids, who try to have a future with AI/DL. (Check out our forum?) For them, I am afraid it's likely that they will encounter a future similar to web programmers in 2000. The supply of labor will one day surpass the demand. So it's very likely that data science/machine learning is not their final destination.

So am I arguing there is an AI winter coming? Not in the old classical sense of "AI Winter" when research funding dried up. But more on like AI as a product in a product cycle - just like every technology - it will go through a hype cycle. And one day when the reality of the product doesn't meet expectation, things would just bust. It's just the way it is. We can argue to the death on whether deep learning is different or not. But you should know every technology follow similar hype cycle. Some last longer, some don't. We will have to wait and see.

For OP: If you are asking of a career advice though, so here is something I learn from poker (tl;dr story) and many other things in life - if you are genuinely smart, you can always learn up a new topic quicker than other people. That's usually what determine if you can make a living. The rest are luck, karma and whether you buy beers for your friends.