Important Papers:
- Connectionist Temporal Classification (the book)
- But I found Graves's thesis easier to follow; e.g., the definitions of alpha and beta in the book didn't make sense to me.
- A few of Alex Graves's papers (here, here, here)
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin (Baidu's CTC-based production system)
- Flat Start Training of CD-CTC-SMBR LSTM RNN Acoustic Models
- A very good explanation of the math by Andrew Gibiansky: http://andrew.gibiansky.com/blog/machine-learning/speech-recognition-neural-networks/
- A comprehensive explanation of CTC on Distill.
- Attention-based seq2seq models:
- End-to-End Attention-Based Large Vocabulary Speech Recognition
- Work from Bengio’s group
- Listen, Attend and Spell by William Chan (his thesis)
- A very good presentation by Markus Nussbaum-Thom.
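Since the definitions of alpha and beta tripped me up, here is a minimal sketch of the CTC forward (alpha) recursion from Graves et al. (2006), written in plain numpy with raw probabilities rather than log-space for readability. The blank index and variable names are my own choices, not from any of the papers above:

```python
import numpy as np

BLANK = 0  # index of the blank symbol (assumption: class 0 is blank)

def ctc_forward(y, labels):
    """Forward (alpha) recursion of CTC.

    y      : (T, K) array of frame-level softmax outputs, y[t, k] = p(k | frame t)
    labels : target label sequence without blanks, e.g. [1, 2]
    Returns p(labels | y), summing over all alignments that collapse to labels.
    """
    # Extended sequence l' = blank, l1, blank, l2, ..., blank  (length 2L+1)
    ext = [BLANK]
    for l in labels:
        ext += [l, BLANK]
    T, S = y.shape[0], len(ext)

    alpha = np.zeros((T, S))
    alpha[0, 0] = y[0, BLANK]   # paths may start with a blank...
    alpha[0, 1] = y[0, ext[1]]  # ...or directly with the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                      # stay on the same symbol
            if s > 0:
                a += alpha[t - 1, s - 1]             # advance one position
            if s > 1 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]             # skip the blank between
                                                     # two distinct labels
            alpha[t, s] = a * y[t, ext[s]]

    # Valid paths end in the last label or the trailing blank.
    return alpha[T - 1, S - 1] + alpha[T - 1, S - 2]
```

The beta recursion is the mirror image (run backwards from the end of the sequence), and the product alpha * beta at each (t, s) gives the posterior used in the gradient. Real implementations (e.g. warp-ctc) work in log-space to avoid underflow.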
Unsorted:
- http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0233.PDF
- https://arxiv.org/pdf/1707.07167.pdf
- Wav2letter: https://www.openreview.net/pdf?id=BkUDvt5gg
- http://publications.idiap.ch/downloads/papers/2017/Palaz_THESIS_2016.pdf
- http://ttic.uchicago.edu/~llu/pdf/liang_ttic_slides.pdf
- https://arxiv.org/pdf/1709.07814.pdf
Important Implementations:
- EESEN: https://github.com/srvk/eesen
- Stanford-CTC: https://github.com/amaas/stanford-ctc
- Warp-CTC from Baidu: https://github.com/baidu-research/warp-ctc
- Mozilla’s implementation
- Neon’s implementation
- Kaldi-ctc
- https://github.com/zzw922cn/Automatic_Speech_Recognition
For reference, here are some papers on the hybrid approach:
- Acoustic Modeling using Deep Belief Networks
- http://www.cs.toronto.edu/~asamir/papers/SPM_DNN_12.pdf
- A good Kaldi tutorial.