The definitive weekly newsletter on A.I. and Deep Learning, published by Waikit Lau and Arthur Chan. Our background spans MIT, CMU, Bessemer Venture Partners, Nuance, BBN, etc. Every week, we curate and analyze the most relevant and impactful developments in A.I.
We also run Facebook’s most active A.I. group with 191,000+ members and host a weekly “office hour” on YouTube.
Editorial
Thoughts From Your Humble Curators
This week we cover F8, point you to various ICLR 2018 resources, and analyze a now-classic paper on text summarization.
As always, if you like our newsletter, feel free to subscribe and forward it to your colleagues!
This newsletter is published by Waikit Lau and Arthur Chan. We also run Facebook’s most active A.I. group with 136,000+ members and host an occasional “office hour” on YouTube. To help defray our publishing costs, you may donate via this link. Or you can donate by sending Eth to this address: 0xEB44F762c58Da2200957b5cc2C04473F609eAA65. Join our community for real-time discussions with this iOS app here: https://itunes.apple.com/us/app/expertify/id969850760
Sponsor

Comet.ml – Machine Learning Experimentation Management
Comet.ml allows tracking of machine learning experiments with an emphasis on collaboration and knowledge sharing.
It allows you to keep your current workflow and report your result from any machine – whether it’s your laptop or a GPU cluster.
It will automatically track your hyperparameters, metrics, code, model definition and allows you to easily compare and analyze models.
Just add a single line of code to your training script:
experiment = comet_ml.Experiment(api_key="YOUR_API_KEY")
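For a slightly fuller picture of the workflow described above, here is a hypothetical sketch of a training loop with tracking added. The loss below is just a stand-in for your real training code, and you should consult Comet.ml's documentation for the authoritative API.

import random
import comet_ml

experiment = comet_ml.Experiment(api_key="YOUR_API_KEY")
experiment.log_parameter("learning_rate", 0.001)      # track a hyperparameter
for step in range(100):
    loss = 1.0 / (step + 1) + 0.01 * random.random()  # stand-in for your real training loss
    experiment.log_metric("loss", loss, step=step)    # track a metric at each step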
Comet is completely free for open source projects. We also provide 100% free unlimited access for students and academics. Click here to sign up!
News
Academics vs Nature
Nature is a highly prestigious journal, but it has chosen to keep its new machine learning journal behind a paywall. No wonder academics are now rallying against it.
F8 2018 AI Summary
F8 2018 was held on May 1st and 2nd. Many commenters noted the more sobering tone of the conference because of the Cambridge Analytica data scandal.
Our focus, of course, is AI. Here are a couple of interesting pieces of news:
“We salute our friends at DeepMind for doing awesome work,” Facebook CTO Mike Schroepfer said in today’s keynote. “But we wondered: Are there some unanswered questions? What else can you apply these tools to?”
When you lose an ML competition, you open-source your code. It costs you nothing much: while your competitor wins, you become the one who nurtures the next generation of enthusiasts. Smart move, Facebook. Here’s the GitHub.
FB improves computer vision by using Instagram resources
How Age Affects Chinese AIDLers
This story is about tech jobs in general, and it starts with a 42-year-old research engineer who died by suicide after being deemed too old. You might hear a lot of stories about China’s recent stellar development in A.I. and technology, but what is often untold is the 996 schedule (9 a.m. to 9 p.m., 6 days a week, holidays included), which can literally cost human lives.
Maybe this sentence can be a solace for us AIDLers:
The competition for top tech talent has prompted higher salaries and relaxed age requirements for those skilled in complex fields such as AI and machine learning, which tend to require advanced degrees.
The Death and the Rebirth of Cambridge Analytica
General news usually doesn’t belong in AIDL Weekly, but CA allegedly affected elections in multiple countries and abused users’ data on Facebook. If there is any example of AI/ML abuse, CA is probably the most notorious one, so we include the piece here.
An astute reader from AIDL pointed out, though, that CA is not exactly dead: the Guardian found that it is just going through a rebranding, and its new name is Emerdata.
Blog Posts
Open Source
Open Images V4 and ECCV Challenge
Mostly an expansion of the old Open Images dataset in terms of annotations and number of images, but there is also a new ECCV challenge track: relationship detection, such as “woman playing guitar”.
Paper/Thesis Review
A Read on “A Neural Attention Model for Abstractive Sentence Summarization”
This is a read on the paper “A Neural Attention Model for Abstractive Sentence Summarization” by A.M. Rush, Sumit Chopra and Jason Weston.
- Here are the links: arXiv, video, GitHub.
- The paper was written in 2015 and is now more of a classic paper on NN-based summarization. It was published slightly later than the classic papers on NN-based translation, such as those by Cho or Bahdanau. We assume you have some basic understanding of NN-based translation and attention.
- If you haven’t worked on summarization, you can broadly think of techniques as extractive or abstractive. Given the text you want to summarize, “extractive” means you only use words from the input text, whereas “abstractive” means you can use any words you like, even words that do not appear in the input text.
- This is why summarization is seen as a problem similar to translation: you can think of it as a “translation” from the original text to the summary.
- Section 2 gives a fairly nice mathematical background on summarization. One thing to note: the video also brings up the noisy-channel formulation, but as Rush said, their paper does away with the noisy channel entirely and learns a direct mapping instead.
- The next nuance to look at is the type of LM and the encoder used; both are covered in Section 3. For example, the model uses the feed-forward NNLM proposed by Bengio. Rush mentioned that he tried an RNNLM, but at that time he got only a small gain. It feels like better results could probably be obtained with an RNNLM.
- Then there is the type of encoder: the paper gives a nice comparison between bag-of-words and attention models. Since word embeddings are used, the “bag-of-words” encoder is really all the input words embedded and averaged down to a fixed size. The attention model, on the other hand, is what we know today: it contains a weight matrix P that maps the context to the input (a toy sketch follows after this list).
- Here is an insightful note from Rush: “Informally we can think of this model as simply replacing the uniform distribution in bag-of-words with a learned soft alignment, P, between the input and the summary.”
- Section 4 is more on decoding. Since a Markov assumption was made in Section 2, decoding is simplified quite a lot. The authors use beam search, so tricks such as path combination can be applied (a toy beam-search sketch also follows after this list).
- Another cute thing is that the authors also come up with a method to make the summarization more extractive. For that, they use a log-linear model that also weighs features such as unigram-to-trigram matches with the input text. See Section 5.
- Why would the authors want to make the summarization more extractive? That probably has to do with the metric: ROUGE usually favors words extracted from the input text.
- Another note pointed out by a reader at AIDL-LD is that summaries usually contain proper nouns that can only be found in the input text. Once again, this makes an extractive bias more appropriate.
- Here are several interesting commentaries about the paper, from mathyouth and Denny Britz.
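To make the bag-of-words versus attention comparison above more concrete, here is a toy NumPy sketch of the two encoders. It follows the description in Section 3 of the paper, but the names and dimensions are our own illustrative choices, not the authors’ code.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bow_encoder(X):
    # Bag-of-words encoder: a uniform average of the input word embeddings.
    # X has shape (input_len, embed_dim).
    return X.mean(axis=0)

def attention_encoder(X, yc, P):
    # Attention encoder: the uniform average is replaced by a learned soft
    # alignment between the input and the current summary context.
    # X: (input_len, embed_dim), yc: (context_dim,), P: (embed_dim, context_dim).
    scores = X @ P @ yc        # one alignment score per input word
    weights = softmax(scores)  # soft alignment over the input words
    return weights @ X         # weighted average of the input embeddings

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))  # 30 input words, 50-dim embeddings
yc = rng.normal(size=20)       # 20-dim summary-context embedding
P = rng.normal(size=(50, 20))  # learned during training; random here
print(bow_encoder(X).shape, attention_encoder(X, yc, P).shape)

Note that if P produced equal scores for every input word, the soft alignment would become uniform and the attention encoder would reduce to the bag-of-words average, which is exactly the point made in Rush’s quote above.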
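For the decoding bullet, here is a toy sketch of beam search with path combination under the Markov assumption (the model only conditions on the last C output words). The scorer here is a made-up placeholder for illustration; the real system would score the next word with the NNLM plus the encoder.

import heapq
import math

def beam_search(score_next, beam_size=3, context_size=3, max_len=8):
    # score_next(context) -> {word: log_prob}, where context is the last C output words.
    beams = [(0.0, ("<s>",) * context_size, [])]  # (log-prob, last C words, full output)
    for _ in range(max_len):
        merged = {}
        for logp, ctx, words in beams:
            for w, lp in score_next(ctx).items():
                cand = (logp + lp, ctx[1:] + (w,), words + [w])
                key = cand[1]  # hypotheses that share the same last C words...
                if key not in merged or cand[0] > merged[key][0]:
                    merged[key] = cand  # ...are combined, keeping only the best one
        beams = heapq.nlargest(beam_size, merged.values(), key=lambda b: b[0])
    return max(beams, key=lambda b: b[0])[2]

vocab = ["the", "cat", "sat", "</s>"]
def dummy_scorer(ctx):
    # Placeholder scorer: a uniform distribution over a toy vocabulary.
    return {w: math.log(1.0 / len(vocab)) for w in vocab}

print(beam_search(dummy_scorer))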
Contemporary Classic
Word Embeddings in 2017
This is an old but highly recommended post from Sebastian Ruder on potential research directions for word embeddings. It’s certainly a repost, but you might want to read it if you are interested in working on any NN-based approach to NLP. The post is actually the “unofficial” fifth post in his series on embeddings: the first covers the basic setup of embeddings, the second various sampling methods, the third delves deeply into word2vec and its secret ingredients, and the fourth covers cross-lingual embeddings. We don’t pretend to understand all of them, but reading them saved us a lot of time on literature research and helped us clarify many concepts.
About Us