Category Archives: pocketsphinx

A look at Sphinx 3's initialization

I worked on Sphinx 3 a lot. These days, it is generally regarded as an "old-style" recognizer compared to Sphinx 4 and PocketSphinx. It is also no longer officially supported by the SF guys.

Speech recognition coders think a little differently. They usually stick to a particular codebase they feel comfortable with. For me, it is not just a personal preference; it also reflects how much I know about a given recognizer. For example, I know quite a bit about how Sphinx 3 performs. These days, I am trying to learn how Sphinx 4 fares as well. So far, if you asked me to choose an accurate recognizer, I would probably still choose Sphinx 3, not because its search technology is better (Sphinx 4 is way superior there), but because it can easily be made to support several advanced modeling types. This also seems to be the conclusion of the 2010 developer meeting.

But that's just me. In fact, I am bullish on all Sphinx recognizers. One thing I want to note is how much momentum Sphinx 4 has in development: many projects are based on it. These days, if you want a job working on speech recognizers, knowing Sphinx 4 is probably a good ticket. That's why I am keen to learn more about it, so that hopefully I can write more about both recognizers.

In any case, this is a Sphinx 3 article. I will probably write more on each component. Feel free to comment.

How Sphinx 3 is initialized:

Here is a listing of the functions used to initialize Sphinx 3, taken from Sphinx 3.0.8. Essentially, there are three layers of initialization: kb_init, kbcore_init and s3_am_init. The separation of kb_init and kbcore_init probably dates from very early in Sphinx 3, whereas separating s3_am_init from kbcore_init was probably my doing (so all blame goes to me). That was done to support -hmmdir.

kb_init
  -> kbcore_init (*)
  -> beam_init
  -> pl_init
  -> fe_init
  -> feat_array
  -> stat_init
  -> adapt_am_init
  -> set operation mode
  -> srch_init

kbcore_init
  -> Look for feat.params very early on.
  -> logmath_init
  -> feat_init
  -> s3_am_init (*)
  -> cmn_init
  -> dict_init
  -> misc. models init, such as
       -> mgau_init
       -> subvq_init
       -> gs_read
  -> lmset_init
  -> fillpen_init
  -> dict2pid_build  <- should be moved into search

s3_am_init
  -> read_lda
  -> read in mdef
  -> depends on -senmgau type:
       .cont.              -> mgau_init
       .s2semi.            -> s2_semi_mgau_init
                              if (-kdtree) s2_semi_mgau_load_kdtree
       .semi. or .s3cont.  -> ms_mgau_init
  -> tmat_init
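The three layers in the listing can be mimicked with a minimal Python sketch. It is not Sphinx 3 code (Sphinx 3 is written in C): every function here is a hypothetical stub that only appends a step name to a trace list, so the call order and the -senmgau dispatch become visible. The -senmgau values ".cont.", ".s2semi.", ".semi." and ".s3cont." and the -kdtree check come from the listing; representing the command line as a Python dictionary is an assumption made for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for Sphinx 3's parsed command line.
Config = Dict[str, object]


def s3_am_init(config: Config, trace: List[str]) -> None:
    """Acoustic-model layer: LDA, mdef, Gaussians picked by -senmgau, tmat."""
    trace.extend(["s3_am_init", "read_lda", "read_mdef"])
    senmgau = config["-senmgau"]
    if senmgau == ".cont.":
        trace.append("mgau_init")
    elif senmgau == ".s2semi.":
        trace.append("s2_semi_mgau_init")
        if config.get("-kdtree"):  # only when -kdtree is set
            trace.append("s2_semi_mgau_load_kdtree")
    elif senmgau in (".semi.", ".s3cont."):
        trace.append("ms_mgau_init")
    else:
        raise ValueError(f"unknown -senmgau type: {senmgau!r}")
    trace.append("tmat_init")


def kbcore_init(config: Config, trace: List[str]) -> None:
    """Core layer: feature/log-math setup, acoustic model, dictionary, LMs."""
    trace.extend(["kbcore_init", "look_for_feat_params",  # done very early on
                  "logmath_init", "feat_init"])
    s3_am_init(config, trace)
    # "misc_models_init" stands for optional models such as subvq_init, gs_read.
    trace.extend(["cmn_init", "dict_init", "misc_models_init",
                  "lmset_init", "fillpen_init", "dict2pid_build"])


def kb_init(config: Config) -> List[str]:
    """Top layer: builds the core first, then beams, front end and search."""
    trace = ["kb_init"]
    kbcore_init(config, trace)
    trace.extend(["beam_init", "pl_init", "fe_init", "feat_array", "stat_init",
                  "adapt_am_init", "set_operation_mode", "srch_init"])
    return trace


if __name__ == "__main__":
    # "kdtrees" is a hypothetical -kdtree value; any non-empty string works.
    for step in kb_init({"-senmgau": ".s2semi.", "-kdtree": "kdtrees"}):
        print(step)
```

For instance, with `{"-senmgau": ".cont."}` the trace shows `mgau_init` right after `read_mdef`, and all acoustic-model steps sit between `feat_init` and `cmn_init`, which is the slot s3_am_init occupies inside kbcore_init.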
Note:
  • -hmmdir overrides all other sub-parameters.
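The -hmmdir note can be read as a simple path-resolution rule. The sketch below is a hypothetical illustration, not Sphinx 3's actual argument handling: the per-file flags (-mdef, -mean, -var, -mixw, -tmat) and the file names inside the model directory (mdef, means, variances, mixture_weights, transition_matrices, feat.params) are assumptions based on the usual SphinxTrain model layout. When -hmmdir is present, it wins over every individual sub-parameter, as the note says.

```python
import os
from typing import Dict, Optional

# Assumed mapping: per-file flag -> file name inside an -hmmdir directory.
HMMDIR_FILES: Dict[str, str] = {
    "-mdef": "mdef",
    "-mean": "means",
    "-var": "variances",
    "-mixw": "mixture_weights",
    "-tmat": "transition_matrices",
}


def resolve_am_paths(args: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Resolve acoustic-model file paths from command-line style arguments.

    If -hmmdir is given, every path is derived from it and any explicit
    sub-parameter is ignored; otherwise each per-file flag is used as-is
    (None when it is absent).
    """
    hmmdir = args.get("-hmmdir")
    resolved: Dict[str, Optional[str]] = {}
    for flag, filename in HMMDIR_FILES.items():
        if hmmdir is not None:
            resolved[flag] = os.path.join(hmmdir, filename)  # -hmmdir overrides
        else:
            resolved[flag] = args.get(flag)
    # In this sketch, feat.params is only discoverable through a model directory.
    resolved["feat.params"] = (
        os.path.join(hmmdir, "feat.params") if hmmdir is not None else None
    )
    return resolved


if __name__ == "__main__":
    # The explicit -mdef is ignored because -hmmdir is present.
    paths = resolve_am_paths({"-hmmdir": "model", "-mdef": "other/mdef"})
    print(paths["-mdef"])  # model/mdef on POSIX systems
```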

Arthur

Me and CMU Sphinx

As I have been updating this blog more frequently, I've noticed more and more people being directed here. Naturally, there are many questions about some of my past work. For example, "Are you still answering questions in the CMUSphinx forum?", along with general requests for certain tutorials. So I guess it is time to clarify my current position and what I plan to do in the future.

Yes, I am planning to work on Sphinx again, but no, I probably won't try to be a maintainer-at-large anymore. Nick has proven himself to be the most awesome maintainer in our history. Under his stewardship, Sphinx has prospered over the last couple of years. That's what I hoped for, and that's what we all hope for.
For that reason, you probably won't see me much in the forum answering questions. Rather, I will spend most of my time implementing, experimenting and getting some work done.
There are many things that ought to be done in Sphinx. Here is my top-5 list:
  1. Sphinx 4 maintenance and refactoring
  2. PocketSphinx maintenance
  3. HTK Book-like documentation, i.e. Hieroglyphs.
  4. Regression tests on all tools in SphinxTrain.
  5. In general, modernization of the Sphinx software, such as using a WFST-based approach.
This is not a small undertaking, so I am planning to spend a lot of time relearning the software. Yes, you heard that right: learning the software. In general, I find myself ignorant of many details of Sphinx as of 2012; there have been many changes. The parts I have really caught up on are probably sphinxbase, sphinx3 and SphinxTrain. On PocketSphinx and Sphinx 4, I need to learn a lot.
That is why you will see a lot of posts on this blog about my progress in learning particular pieces of speech recognition software. Some will cover minute details. I share them because people can figure out a lot by following my progress. From time to time, I will also pull these posts together into a tutorial post.
Before I leave, let me digress and talk about this blog a little bit: besides posts on speech recognition, I will also post a lot about programming, languages and other technology-related stuff. Partly, that is because I am interested in many things. The other reason is that I feel working on speech recognition actually requires one to understand a lot about programming and languages. This might also attract a wider audience in the future.
In any case, I hope I can keep this up, and I hope you enjoy my articles!
Arthur

Starting to look at the repository tree

Programming as a profession is a strange one. If you are a doctor, you can usually carry your knowledge and skills from one place to another, provided that you have the same tools. If you are a programmer, your speed and skill are partially determined by the tools you build in-house at a particular place. For example, I am not supposed to use any tool I built when I worked at the small video-advertising start-up. Even if I could do something in 1 second back then, when I change jobs, I need to start over and rebuild the tool. We are probably talking about days to rebuild the tool and weeks to refine it again.

There is one exception: if you work in open source, much of your code is stored in a public place. Even long after you have left a job, it is legitimate for you to use it again. You don't have to solve the same problem again and again. This is the beauty of open source, and I have personally benefited greatly from it.
As I start to regain my Sphinx muscles, I notice that there have been many changes in the last 6 years. Just look at the top level of the Subversion repository:
File                      Rev.   Age        Author         Last log entry
CLP/                      10079  23 months  dhdfu          Finally add an -F argument to use the full path in the control file as the label…
PocketSphinxAndroidDemo/  11117  9 months   nshmyrev       Wrapper for nbest
SimpleLM/                 22     12 years   rickyhoughton  Initial revision
Speech-Recognizer-SPX/    8933   3 years    nshmyrev       Update module to recent pocketsphinx API
SphinxTrain/              11350  9 days     nshmyrev       Extract warped features during 000 stage if VTLN is enabled. See for details ht…
archive_s3/               7289   4 years    egouvea        Fixed error message in decoder script reporting failure in bw, and made result d…
cmuclmtk/                 11035  10 months  nshmyrev       Fixes bug in wngram2idngram and adds a test for it
cmudict/                  11348  3 weeks    air            cleaned up documentation and code (a bit) recompiled the dict
gst-sphinx/               7848   4 years    dhdfu          Support changing language models at runtime (maybe)
htk2s3conv/               11336  6 weeks    nshmyrev       Adds warning about different number of mixtures
jsgfparser/               7230   4 years    dhdfu          Fix the main program to output the only public rule if no rule is specified, and…
logios/                   11339  4 weeks    tkharris       remove duplicated code
misc_scripts/             10147  22 months  dhdfu          handle zero references
multisphinx/              10945  12 months  dhdfu          clean up better and introduce vocabulary maps
pocketsphinx/             11351  8 days     nshmyrev       Updated lat2dot script. I need to move it to the other location though
pocketsphinx-extra/       9972   2 years    dhdfu          add sc models with mixture_weights and mdef.txt files
scons/                    5868   5 years    egouvea        updated the scons support to reflect that plugin.jar is now part of the package
share/                    5532   6 years    egouvea        Setting dsp and dsw files to have have windows EOL regardless where it's downloa…
sphinx2/                  8767   3 years    egouvea        Updated the sphinx-2 MS files to MS .NET, consistent with the other packages, an…
sphinx3/                  11329  2 months   nshmyrev       Patch to solve memory issues in python module. See for details https://bugzilla…
sphinx4/                  11344  3 weeks    nshmyrev       Properly sets logger for AudioFileDataSource. Thanks to Bandele Ola.
sphinx_fsttools/          10791  14 months  nshmyrev       Some bit in AM to FST conversion
sphinxbase/               11346  3 weeks    nshmyrev       Properly select buffer size when using audioresample. Thanks to balkce See fo…
tools/                    9009   3 years    nshmyrev       Updated to the latest release of sphinx4
web/                      10249  21 months  nshmyrev       There is no sphinx3 development anymore
How exciting is that? There were only 6 or 7 top-level directories 7 years ago!
From now on, I will post more notes on the different tools in the repository.
The Grand Janitor