Issue 58 – F8 and ICLR 2018 – The Grand Janitor Blog V3

Editorial

Thoughts From Your Humble Curators

This week we cover F8, point you to various resources from ICLR 2018, and analyze a now-classic paper on text summarization.

As always, if you like our newsletter, feel free to subscribe and forward it to your colleagues!

This newsletter is published by Waikit Lau and Arthur Chan. We also run Facebook’s most active A.I. group with 136,000+ members and host an occasional “office hour” on YouTube. To help defray our publishing costs, you may donate via link. Or you can donate by sending Eth to this address: 0xEB44F762c58Da2200957b5cc2C04473F609eAA65. Join our community for real-time discussions with this iOS app here: https://itunes.apple.com/us/app/expertify/id969850760

Artificial Intelligence and Deep Learning Weekly

News

Academics vs Nature

Nature is a highly prestigious journal, but it has chosen to keep its new machine learning journal behind a paywall. No wonder academics are now rallying against it.

techcrunch.com

F8 2018 AI Summary

F8 2018 was just held on May 1st and 2nd. Many commenters noted the more sober tone of the conference because of the Cambridge Analytica data scandal.

Our focus, of course, is AI. Here are a couple of interesting pieces of news:

Production-Ready PyTorch 1.0

Open sourcing Go Bot

“We salute our friends at DeepMind for doing awesome work,” Facebook CTO Mike Schroepfer said in today’s keynote. “But we wondered: Are there some unanswered questions? What else can you apply these tools to.”

When you lose an ML competition, you open source your code. It costs you little: while your competitor wins, you become the one who nurtures the next generation of enthusiasts. Smart move, Facebook. Here’s the GitHub.

FB improves computer vision by using Instagram resources

How Age Affects Chinese AIDLers

This story is about tech jobs in general, and it starts with how a 42-year-old research engineer took his own life because he was considered too old. You may have heard a lot of stories about China’s recent stellar development in A.I. and technology, but what goes untold is the 996 schedule (9 to 9, 6 days a week, including holidays), which can literally cost human lives.

Maybe this sentence can be some solace for us AIDLers?

The competition for top tech talent has prompted higher salaries and relaxed age requirements for those skilled in complex fields such as AI and machine learning, which tend to require advanced degrees.

bloomberg.com

The Death and the Rebirth of Cambridge Analytica

General news usually doesn’t belong in AIDL Weekly, but CA allegedly affected elections in multiple countries and abused users’ data on Facebook. If there is any example of abuse of AI/ML, CA is probably the most notorious one. That’s why we include the piece here.

An astute reader from AIDL pointed out, though, that CA is not exactly dead: the Guardian found that it is just going through a rebranding, and its new name is Emerdata.

theverge.com

Deals

Cisco Acquired Accompany for $270M
From a lesser-known source: ServiceNow acquires Parlo

Blog Posts

Resources from ICLR 2018.

googleblog.com

Open Source

Open Images V4 and ECCV Challenge

Mostly an expansion of the old Open Images in terms of annotations and number of images. There is also a new ECCV track: relationship detection, such as “woman playing guitar”.

googleblog.com

Paper/Thesis Review

A read on “A Neural Attention Model for Abstractive Sentence Summarization”

This is a read on the paper “A Neural Attention Model for Abstractive Sentence Summarization” by A.M. Rush, Sumit Chopra and Jason Weston.

Here are the arXiv paper, the video, and the GitHub repository.
The paper was written in 2015 and is a classic paper on NN-based summarization. It was published slightly later than the classic papers on NN-based translation, such as those written by Cho or Bahdanau. We assume you have some basic understanding of NN-based translation and attention.
If you haven’t worked on summarization, you can broadly think of the techniques as either extractive or abstractive. Given the text you want to summarize, “extractive” means you use only words from the input text, whereas “abstractive” means you can use any words you like, including words which are not in the input text.
This is why summarization is seen as a problem similar to translation: you can think of it as a “translation” from the original text to the summary.
Section 2 gives a fairly nice mathematical background on summarization. One thing to note: the video also brings up the noisy-channel formulation. But as Rush said, their paper does away with the noisy channel completely and learns a direct mapping instead.
The next nuance to look at is the type of LM and encoder used, both of which can be found in Section 3. For example, it uses the feed-forward NNLM proposed by Bengio. Rush mentioned that he tried an RNNLM, but at that time he got only a small gain. It feels like he could probably get better results with an RNNLM.
Then there is the type of encoder, with a nice comparison between the bag-of-words and attention models. Since there are word embeddings, the “bag-of-words” is actually all the input words embedded down to a certain size and weighted uniformly. The attention model, on the other hand, is what we know today: it contains a weight matrix P which maps the context to the input.
Here is an insightful note from Rush: “Informally we can think of this model as simply replacing the uniform distribution in bag-of-words with a learned soft alignment, P, between the input and the summary.”
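To make the contrast concrete, here is a minimal sketch of the two encoders in plain Python. It is a simplification under our own assumptions, not the authors' code: it skips the smoothing window the paper applies to the input embeddings, treats the summary context as a single vector, and the function names and toy numbers are made up for illustration.

```python
import math


def weighted_sum(weights, vectors):
    """Return sum_i weights[i] * vectors[i] as a plain list."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def bow_encode(x_emb):
    """Bag-of-words encoder: every input word gets the same weight 1/M."""
    weights = [1.0 / len(x_emb)] * len(x_emb)
    return weighted_sum(weights, x_emb), weights


def attention_encode(x_emb, ctx_emb, P):
    """Attention-style encoder: weights are softmax(x_i^T P ctx) over input words."""
    # Project the summary context into the input-embedding space: P @ ctx.
    p_ctx = [sum(p * c for p, c in zip(row, ctx_emb)) for row in P]
    # One alignment score per input word.
    scores = [sum(a * b for a, b in zip(x_i, p_ctx)) for x_i in x_emb]
    # Numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return weighted_sum(weights, x_emb), weights


# Toy data (made up): three input words with 2-d embeddings, a 3-d context.
x_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx_emb = [0.5, -0.5, 1.0]

# With P = 0 every score is 0, so attention falls back to the uniform weights.
zero_P = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
bow_vec, bow_w = bow_encode(x_emb)
att_vec, att_w = attention_encode(x_emb, ctx_emb, zero_P)

# A non-zero P shifts weight toward input words that align with the context.
some_P = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
_, shifted_w = attention_encode(x_emb, ctx_emb, some_P)
```

With an all-zero P the attention weights collapse to the uniform bag-of-words weights, which is one way to read Rush's remark: the only difference between the two encoders is whether the distribution over input words is fixed or learned.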
Section 4 is more about decoding. A Markov assumption was made in Section 2, and this simplifies the decoding quite a lot. The authors use beam search, so you can use tricks such as path combination.
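As a rough illustration of why the Markov assumption helps, here is a small beam search sketch with path combination. It is our own toy version, not the paper's decoder: the fixed output length, the `log_prob` callback, and the bigram table are all assumptions made up for the example.

```python
import math


def beam_search(log_prob, vocab, length, beam_size, context_size):
    """Fixed-length beam search for a model that only sees the last
    `context_size` words (the Markov assumption).

    log_prob(context, word) is a hypothetical scoring callback; context is a
    tuple holding at most `context_size` previous words.
    """
    def tail(words):
        # words[-0:] would return the whole tuple, so treat 0 separately.
        return words[-context_size:] if context_size > 0 else ()

    beam = [(0.0, ())]  # each hypothesis is (score, words)
    for _ in range(length):
        best_by_tail = {}
        for score, words in beam:
            ctx = tail(words)
            for word in vocab:
                cand = (score + log_prob(ctx, word), words + (word,))
                key = tail(cand[1])
                # Path combination: hypotheses with the same tail share the
                # same future scores, so only the best one needs to survive.
                if key not in best_by_tail or cand[0] > best_by_tail[key][0]:
                    best_by_tail[key] = cand
        beam = sorted(best_by_tail.values(), key=lambda h: h[0], reverse=True)
        beam = beam[:beam_size]
    return beam[0]


# Toy bigram model (probabilities made up for illustration).
TABLE = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.1, "b": 0.9},
    ("b",): {"a": 0.5, "b": 0.5},
}


def toy_log_prob(context, word):
    return math.log(TABLE[context][word])


best_score, best_words = beam_search(toy_log_prob, ["a", "b"], 2, 2, 1)
```

Because the model only looks at a fixed window of previous words, two partial summaries that end in the same window can never be told apart by later steps, so the weaker one can be dropped without changing the final result.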
Another cute thing is that the authors also come up with a method to make the summarization more extractive. For that, they use a log-linear model that also weighs features such as unigram to trigram matches. See Section 5.
Why would the authors want to make the summarization more extractive? That probably has to do with the metric: ROUGE usually favors words which are extracted from the input text.
Another point, raised by a reader at AIDL-LD, is that a summary usually contains proper nouns, which can only be found in the input text. Once again, making the summarizer more extractive is appropriate.
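To see why a ROUGE-style metric rewards copying, here is a simplified unigram-recall sketch. It is not the official ROUGE toolkit (no stemming, no stopword handling, single reference), and the sentences are made up for illustration.

```python
from collections import Counter


def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams that also appear in the candidate,
    with counts clipped so a repeated word is not rewarded twice."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


reference = "russia calls for joint front against terrorism"
# Reuses the surface words, as an extractive system tends to do.
extractive = "russia calls for front against terrorism"
# Similar meaning, different words.
paraphrase = "moscow urges united stand versus terror"

extractive_score = rouge1_recall(extractive, reference)
paraphrase_score = rouge1_recall(paraphrase, reference)
```

Here the copied wording recovers 6 of the 7 reference words, while the paraphrase recovers none, even though a human reader might accept both. Proper nouns make the gap worse, since there is rarely a paraphrase for them at all.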
Here are several interesting commentaries about the paper: mathyouth, Denny Britz.

Contemporary Classic

Word Embeddings in 2017

This is an old but highly recommended post from Sebastian Ruder on potential research directions for word embeddings. It’s certainly a repost, but you might want to read it if you are interested in working on any NN-based approach to NLP. The post is actually the “unofficial” fifth post in his series about embeddings. The first is about the basic setup of embeddings, the second covers various sampling methods, the third delves deeply into word2vec to understand its secret ingredients, and the fourth is on cross-lingual embeddings. We don’t pretend to understand all of them, but reading them saves us much time on literature research and helps us clarify a lot of concepts.

ruder.io

About Us

This newsletter is published by Waikit Lau and Arthur Chan. We also run Facebook’s most active A.I. group with 138,000+ members and host an occasional “office hour” on YouTube. To help defray our publishing costs, you may donate via link. Or you can donate by sending Eth to this address: 0xEB44F762c58Da2200957b5cc2C04473F609eAA65. Join our community for real-time discussions with this iOS app here: https://itunes.apple.com/us/app/expertify/id969850760
