Some good resources for NMT
Tutorials:
- The NMT tutorial written by Thang Luong – my impression is that it is a shorter, step-by-step tutorial. The slightly disappointing part is that it doesn’t record exactly how the benchmarking experiments were run and evaluated. It’s fairly trivial to fill that in yourself, but it did take me a bit of time.
- The original TensorFlow seq2seq tutorial – more of a big gun, and the first experiment I played with; this one works on the WMT’15 dataset.
- tf-seq2seq (blog post: here)
- Graham Neubig’s tutorial
- Nematus
- OpenNMT – OpenNMT-py is the PyTorch-based version; if you are using OpenNMT-py with Python 2.7, consider using my patched version.
- Neural Monkey (TensorFlow-based)
- Prof. Philipp Koehn’s new chapter on NMT. Also check out his “Six Challenges for Neural Machine Translation”.
A bit special: Tensor2Tensor uses a novel self-attention architecture (the Transformer) instead of a pure RNN/CNN encoder/decoder. It gives a surprisingly large gain, so it is likely to become a trend in NMT (see the sketch just below).
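To make the “novel architecture” remark concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer model that Tensor2Tensor implements. This is a toy illustration with made-up shapes, not Tensor2Tensor’s actual code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted sum of the values

# Toy shapes: 2 target positions attending over 3 source positions, dim 4.
Q = np.random.randn(2, 4)   # queries (e.g. decoder states)
K = np.random.randn(3, 4)   # keys    (e.g. encoder states)
V = np.random.randn(3, 4)   # values  (e.g. encoder states)
print(scaled_dot_product_attention(Q, K, V).shape)   # -> (2, 4)
```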
Important papers:
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation by Cho et al. (link) – a very innovative and smart paper by Kyunghyun Cho; it also introduces the GRU.
- Sequence to Sequence Learning with Neural Networks by Ilya Sutskever et al. (link) – from Google researchers, and perhaps the first time an NMT system was shown to be comparable to the traditional pipeline.
- Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (link)
- Neural Machine Translation by Jointly Learning to Align and Translate by Dzmitry Bahdanau et al. (link) – the paper that introduced attention (a toy sketch of the score functions follows after this list).
- Neural Machine Translation by Minh-Thang Luong (link)
- Effective Approaches to Attention-based Neural Machine Translation by Minh-Thang Luong et al. (link) – on improving the attention approach, e.g. with local attention.
- Massive Exploration of Neural Machine Translation Architectures by Britz et al. (link)
- Recurrent Convolutional Neural Networks for Discourse Compositionality by Kalchbrenner and Blunsom (link)
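As referenced in the Bahdanau and Luong entries above, here is a small NumPy sketch contrasting the two attention score functions: Bahdanau’s additive score v_a^T tanh(W_a s + U_a h_j) and the simplest Luong variant, the dot product s^T h_j. The weights and dimensions below are random stand-ins; this is a reading aid, not code from either paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bahdanau_scores(s, H, W_a, U_a, v_a):
    """Additive attention: score(s, h_j) = v_a^T tanh(W_a s + U_a h_j)."""
    return np.array([v_a @ np.tanh(W_a @ s + U_a @ h) for h in H])

def luong_dot_scores(s, H):
    """Dot attention: score(s, h_j) = s^T h_j (needs matching dimensions)."""
    return H @ s

# Toy dimensions: decoder state d_s, encoder states d_h, attention size d_a.
d_s, d_h, d_a, T = 4, 4, 5, 3
s   = np.random.randn(d_s)       # current decoder hidden state
H   = np.random.randn(T, d_h)    # encoder hidden states, one per source word
W_a = np.random.randn(d_a, d_s)  # made-up projection of the decoder state
U_a = np.random.randn(d_a, d_h)  # made-up projection of the encoder states
v_a = np.random.randn(d_a)       # made-up scoring vector

alpha = softmax(bahdanau_scores(s, H, W_a, U_a, v_a))  # attention weights over source
context = alpha @ H                                    # context vector fed to the decoder
print(alpha.round(2), context.shape)                   # e.g. [0.2 0.5 0.3] (4,)
print(softmax(luong_dot_scores(s, H)).round(2))        # the dot-product alternative
```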
Important blog posts/web pages:
- Attention and Augmented Recurrent Neural Networks: only partially relevant to attention-based RNNs, but Olah’s writing is always worth reading.
- Stanford NMT research page: related to Luong, See and Manning’s work on NMT. Very entertaining for looking at recent techniques; tutorials, code and models are available.
- If you are just curious about how the old IBM Models work, check out the old bible of the field: Prof. Philipp Koehn’s Statistical Machine Translation (a toy EM sketch of IBM Model 1 follows below).
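For the curious, the core of the oldest of those IBM Models (Model 1) fits in a few lines. The following is a compressed EM training sketch in the spirit of the pseudocode in Koehn’s book, run on a made-up three-sentence corpus with no NULL word and no smoothing; an illustration only, not production alignment code.

```python
from collections import defaultdict

# Toy parallel corpus (German -> English), for illustration only.
corpus = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"]),
          (["ein", "buch"], ["a", "book"])]

# Initialise the translation table t(e|f) uniformly.
e_vocab = {e for _, es in corpus for e in es}
t = defaultdict(lambda: 1.0 / len(e_vocab))

for _ in range(20):                            # EM iterations
    count = defaultdict(float)                 # expected counts c(e, f)
    total = defaultdict(float)                 # expected counts c(f)
    for fs, es in corpus:
        for e in es:                           # E-step: spread e over all source words
            z = sum(t[(e, f)] for f in fs)
            for f in fs:
                count[(e, f)] += t[(e, f)] / z
                total[f] += t[(e, f)] / z
    for (e, f) in count:                       # M-step: re-normalise per source word
        t[(e, f)] = count[(e, f)] / total[f]

# Probability mass concentrates on the intuitive pairs, e.g. t(house|haus) -> ~1.
print(round(t[("house", "haus")], 3))
```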
Summarization:
- For the older style of summarization, you may take a look at this survey.
- An easy-to-read blog post on NN-based summarization: A Gentle Introduction to Text Summarization.
- Also this one from Pavel Surmenok, which is fairly well written.
- Papers:
- Resources I can’t beat:
- Awesome Text Summarization by icoxfog417
- Awesome Text Summarization by mathsyouth, especially the “Abstractive Text Summarization” section.
Usage in Dialogue Systems:
- An earlier paper (2015) that discusses how to improve the diversity of a generator.
Others (unsorted, and seemingly less important):
- JayParks’ GitHub repo: https://github.com/JayParks/tf-seq2seq
- https://github.com/ematvey/tensorflow-seq2seq-tutorials
- https://indico.io/blog/the-good-bad-ugly-of-tensorflow/
- https://www.reddit.com/r/MachineLearning/comments/43fw8s/simple_seq2seq_example_in_tensorflow/
- https://r2rt.com/recurrent-neural-networks-in-tensorflow-iii-variable-length-sequences.html
- http://forums.fast.ai/t/the-wonders-of-the-new-version-of-tensorflow-1-2rc0/3491
- http://monik.in/a-noobs-guide-to-implementing-rnn-lstm-using-tensorflow/
- http://suriyadeepan.github.io/2016-12-31-practical-seq2seq/