Resources on Kaldi

Introduction:

Kaldi is one of the three active open-source ASR projects based on the hybrid approach. It has perhaps the best feature set of the three, but it is also seen as the more advanced toolkit to work with.

I like the toolkit because it works. Also, ASR developers are colorful people, and I enjoy reading their source code.

(Yes, you need to read the source code to understand Kaldi.)

General Resources:

  • awesome-kaldi - It deserves to be called "awesome". Tons of useful links.
  • this page.

Basic Tutorials - the structure of Kaldi, running from egs/, etc.

  • HTK Book - We are talking about Kaldi, so why bring up HTK? Well, Kaldi was a response to HTK. Both were written as Unix command-line tools. Unlike Kaldi, HTK was developed as a company codebase (Entropic), so its code is thought of as more refined but harder to change. Looking at both toolkits now (2020), I still find the HTK tutorial easier to follow.
  • The original Kaldi tutorial - it uses RM, so if you don't have RM, nah: it is not going to help you run anything end-to-end. But it will still teach you the basics.
  • The original ICASSP 2011 lecture.
  • Eleanor Chodroff's tutorial - A rare wordy explanation of the toolkit, with some decent notes on what #senones really means.
  • Qianhui Wan's run-through of the stages in a Kaldi training - A good high-level run-through of Kaldi's scripts.

More advanced topics:

  • First, a survival note. For the most part, working with Kaldi means you work with Unix and sometimes dive deep into C/C++ level code. You will get crushed if you expect a "TensorFlow-style" of problem solving.
  • HBKA - WFST is one of the cores of a Kaldi-based ASR system, but it's also rather hard to grok. These days, HBKA is seen as the Bible for learning WFST. The key algorithms in WFST are determinization and minimization. Well, they are actually variants of the classic unweighted FST algorithms. (In the case of minimization, you essentially push the weights first and then use the classic FST version to minimize.) So to understand what you are doing, you also want to have the basics of some classic FST algorithms, and a textbook on automata theory is very useful too. (I use Hopcroft and Ullman.)
  • If you want to dig deeper, several papers contain the detailed algorithms (and proofs) of determinization and minimization: here and here. If HBKA is the Bible, these papers might be the Words. 🙂
  • Other, more wordy tutorials on WFST: Vassil Panayotov's and Josh Meyer's.
  • Btw, talking about the internals of WFST is seen as an "advanced" topic these days. Most people are using TF/PyTorch, so revolutionary technologies such as WFST have been forgotten.
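
To make the "classic FST algorithms" point a bit more concrete, here is a minimal sketch of determinization by subset construction, in plain Python. It assumes an unweighted automaton with no epsilon arcs, so it is only the textbook ancestor of what HBKA describes: weighted determinization also has to carry leftover weights inside each subset, and this is not how Kaldi or OpenFst implement it. The function names, the dictionary layout and the toy automaton are all made up for illustration.

```python
from collections import deque


def determinize(start, finals, arcs):
    """Subset construction for an unweighted, epsilon-free automaton.

    arcs maps (state, label) -> set of next states (a hypothetical layout).
    Returns (dfa_start, dfa_finals, dfa_arcs). Each DFA state is a frozenset
    of original states; dfa_arcs maps (subset, label) -> subset.
    """
    dfa_start = frozenset([start])
    dfa_finals = set()
    dfa_arcs = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        subset = queue.popleft()
        # A subset is final if it contains at least one original final state.
        if subset & finals:
            dfa_finals.add(subset)
        # Group the destinations of all member states by label.
        by_label = {}
        for (state, label), dests in arcs.items():
            if state in subset:
                by_label.setdefault(label, set()).update(dests)
        # Each label now leads to exactly one subset: that is determinism.
        for label, dests in by_label.items():
            target = frozenset(dests)
            dfa_arcs[(subset, label)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_start, dfa_finals, dfa_arcs


def accepts(dfa, labels):
    """Run the deterministic automaton over a sequence of labels."""
    state, finals, arcs = dfa
    for label in labels:
        state = arcs.get((state, label))
        if state is None:
            return False
    return state in finals


# Toy automaton: state 0 has two arcs labelled "a", so it is non-deterministic.
# It accepts "a b" (via state 1) and "a c" (via state 2).
toy_arcs = {
    (0, "a"): {1, 2},
    (1, "b"): {3},
    (2, "c"): {3},
}
dfa = determinize(0, {3}, toy_arcs)
print(accepts(dfa, ["a", "b"]))  # True
print(accepts(dfa, ["a", "c"]))  # True
print(accepts(dfa, ["a"]))       # False
```

In the toy example the two "a" arcs out of state 0 are merged into a single arc to the subset {1, 2}, which is the whole trick. In the weighted case each subset element also has to remember the weight that has not been emitted yet, and that bookkeeping is where the papers above earn their keep.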

When you need to hack Kaldi...

  • Changing the source code of Kaldi, or of open-source speech recognizers in general, is not the worst thing that can happen to a hacker. For the most part, you can derive the information you need by reading the source code. Some modules are terse, e.g. nnet3. Say you want to add a new computation command: you will need to go through several classes to make it work. In the same vein, you won't really find any description of how an individual command works. Think of it as what assembly code is to C: you will need to work through it yourself.
  • The good news is... it's possible. As always, you just need some coffee and a comfortable chair.
  • What if you want to read some documentation? Then go to https://kaldi-asr.org/doc/index.html. It will give you a high-level understanding of some of the algorithms.

(to be continued.)

Acknowledgement

You always want to thank Dan Povey and the Kaldi team for their great work. The hybrid approach is not going away soon.
