Category: pocketsphinx

Love the current trend that Sphinx is everywhere.

A look at Sphinx3’s initialization

Post author By grandjanitor
Post date February 18, 2013

I worked on Sphinx 3 a lot. These days, it is generally regarded as an “old-style” recognizer compared to Sphinx 4 and PocketSphinx. It is also not officially supported by the SF guys.

Coders of speech recognition think a little bit differently. They usually stick to a certain codebase they feel comfortable with. For me, it is not just a personal preference; it also reflects how much I know about a certain recognizer. For example, I know quite a bit about how Sphinx 3 performs. These days, I am trying to learn how Sphinx 4 fares as well. So far, if you ask me to choose an accurate recognizer, I will still probably choose Sphinx 3, not because the search technology is better (Sphinx 4 is way superior), but because it can easily be made to support several advanced modeling types. This seems to be how the 2010 developer meeting concluded as well.

But that is just me. In fact, I am bullish on all Sphinx recognizers. One thing I want to note is the power of Sphinx 4 in development. Many projects are based on Sphinx 4. These days, if you want to get a job in speech recognition, knowing Sphinx 4 is probably a good ticket. That’s why I am quite keen on learning more about it, so hopefully I can write more on both recognizers.

In any case, this is a Sphinx 3 article. I will probably write more on each component. Feel free to comment.

How Sphinx3 is initialized:

Here is a listing of the functions called when Sphinx 3 is initialized, which I got from Sphinx 3.0.8. Essentially, there are 3 layers of initialization: kb_init, kbcore_init and s3_am_init. Entries marked (*) are expanded further down the listing. The separation of kb_init and kbcore_init probably started very early in Sphinx 3, whereas separating s3_am_init from kbcore_init was probably my doing. (So all the blame is on me.) That was to support -hmmdir.

 kb_init  
     -> kbcore_init (*)  
     -> beam_init  
     -> pl_init  
     -> fe_init  
     -> feat_array  
     -> stat_init  
     -> adapt_am_init  
     -> set operation mode  
     -> srch_init  
 kbcore_init  
     -> Look for feat.params very early on.   
     -> logmath_init  
     -> feat_init  
     -> s3_am_init (*)  
     -> cmn_init  
     -> dict_init  
     -> misc. models init  
       mgau_init such as  
       -> subvq_init  
       -> gs_read  
     -> lmset_init  
     -> fillpen_init  
     -> dict2pid_build <- Should put into search  
 s3_am_init   
     -> read_lda  
     -> read in mdef.   
     -> depends on -senmgau type  
       .cont. mgau_init  
       .s2semi. s2_semi_mgau_init  
           if (-kdtree)  
           s2_semi_mgau_load_kdtree  
       .semi or .s3cont.   
           ms_mgau_init  
     -> tmat_init
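
The three layers above can be modelled as a small tree and walked depth-first to see the overall order in which things get initialized. This is only a sketch: the step names are copied from the listing, the helper names (INIT_TREE, flatten, render) are made up for illustration, and nothing here calls into Sphinx 3 itself.

```python
# Hypothetical model of the call tree in the listing above.
# Step names come from the listing; the structure is an illustration only.
INIT_TREE = {
    "kb_init": [
        "kbcore_init", "beam_init", "pl_init", "fe_init", "feat_array",
        "stat_init", "adapt_am_init", "set operation mode", "srch_init",
    ],
    "kbcore_init": [
        "look for feat.params", "logmath_init", "feat_init", "s3_am_init",
        "cmn_init", "dict_init", "misc. models init", "lmset_init",
        "fillpen_init", "dict2pid_build",
    ],
    "s3_am_init": [
        "read_lda", "read in mdef", "mgau init (depends on -senmgau)",
        "tmat_init",
    ],
}


def flatten(name, tree, depth=0):
    """Yield (depth, step) pairs depth-first, expanding any step that is a layer."""
    yield depth, name
    for child in tree.get(name, []):
        yield from flatten(child, tree, depth + 1)


def render(root="kb_init"):
    """Return the expanded tree as indented text, one step per line."""
    return "\n".join("    " * d + step for d, step in flatten(root, INIT_TREE))


if __name__ == "__main__":
    print(render())
```

Read this way, the acoustic model (s3_am_init and everything under it) is fully set up before cmn_init, dict_init and the language model are touched.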

Note:

-hmmdir overrides all other sub-parameters.
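
To make the override rule concrete, here is a minimal sketch of how such a resolution could work. It is not the Sphinx 3 code: the function name resolve_am_paths is hypothetical, and the mapping from flags to file names inside the model directory is my assumption, so check it against the actual source before relying on it.

```python
import posixpath

# ASSUMPTION: flag names and the file names inside the model directory are
# illustrative guesses, not taken from the Sphinx 3 source.
HMMDIR_FILES = {
    "-mdef": "mdef",
    "-mean": "means",
    "-var": "variances",
    "-mixw": "mixture_weights",
    "-tmat": "transition_matrices",
}


def resolve_am_paths(args):
    """Return a copy of args in which -hmmdir, if given, overrides sub-parameters."""
    resolved = dict(args)
    hmmdir = args.get("-hmmdir")
    if hmmdir:
        for flag, filename in HMMDIR_FILES.items():
            # Any explicitly given sub-parameter is replaced unconditionally.
            resolved[flag] = posixpath.join(hmmdir, filename)
    return resolved


if __name__ == "__main__":
    print(resolve_am_paths({"-hmmdir": "model", "-mean": "other/means"}))
```

In this sketch, an explicit -mean is silently replaced once -hmmdir is present, which is the behaviour the note describes. Without -hmmdir the arguments pass through untouched.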

Arthur


Me and CMU Sphinx

As I update this blog more frequently, I have noticed more and more people being directed here. Naturally, there are many questions about some of my past work, for example, “Are you still answering questions in the CMUSphinx forum?”, as well as general requests for certain tutorials. So I guess it is time to clarify my current position and what I plan to do in the future.

Yes, I am planning to work on Sphinx again, but no, I probably don’t hope to be a maintainer-at-large any more. Nick has proved himself to be the most awesome maintainer in our history. Under his stewardship, Sphinx has prospered in the last couple of years. That’s what I hoped for and that’s what we all hoped for.

So for that reason, you probably won’t see me much in the forum answering questions. Rather, I will spend most of my time implementing, experimenting and getting some work done.

There are many things that ought to be done in Sphinx. Here is my top 5 list:

Sphinx 4 maintenance and refactoring
PocketSphinx’s maintenance
HTKbook-like documentation, i.e. Hieroglyphs.
Regression tests on all tools in SphinxTrain.
In general, modernization of Sphinx software, such as using a WFST-based approach.

This is not a small undertaking, so I am planning to spend a lot of time relearning the software. Yes, you heard it right. Learning the software. In general, I found myself very ignorant of a lot of the software details of Sphinx in 2012. There have been many changes. The parts I have really caught up on are probably sphinxbase, sphinx3 and SphinxTrain. On PocketSphinx and Sphinx4, I need to learn a lot.

That is why in this blog, you will see a lot of posts about my progress in learning a certain piece of speech recognition software. Some could be minute details. I share them because people can figure out a lot by going through my notes. From time to time, I will also pull these posts together and form a tutorial post.

Before I leave, let me digress and talk about this blog a little bit: other than posts on speech recognition, I will also post a lot about programming, languages and other technology-related stuff. Part of it is that I am interested in many things. The other part is that I feel working on speech recognition actually requires one to understand a lot about programming and languages. This might also attract a wider audience in the future.

In any case, I hope I can keep on. And hope you enjoy my articles!

Arthur


Start to look at the repository tree

Post author By grandjanitor
Post date May 2, 2012

Programming as a profession is a strange one. If you are a doctor, you can usually carry your knowledge and skills from one place to another, provided that you have exactly the same tools. If you are a programmer, your speed and skill are partially determined by the tools you built in-house for a particular place. So for example, I am not supposed to use any tool I built when I worked at the small video-advertising start-up. Even if I could do something in 1 second back then, if I change jobs, I need to start over and rebuild the tool. We are probably talking about days to rebuild the tool and weeks to refine it again.

There is one exception: if you work in open source, much of your code is stored in a public place. Even when you have left your job for a long time, it is legit for you to use it again. You don’t have to solve the same problem again and again. This is the beauty of open source, and I have greatly benefited from it personally.

As I start to regain my muscles in Sphinx, I have started to notice that there have been many changes in the last 6 years. Just look at the top level of the Subversion repository:

File	Rev.	Age	Author	Last log entry
Parent Directory
CLP/	10079	23 months	dhdfu	Finally add an -F argument to use the full path in the control file as the label…
PocketSphinxAndroidDemo/	11117	9 months	nshmyrev	Wrapper for nbest
SimpleLM/	22	12 years	rickyhoughton	Initial revision
Speech-Recognizer-SPX/	8933	3 years	nshmyrev	Update module to recent pocketsphinx API
SphinxTrain/	11350	9 days	nshmyrev	Extract warped features during 000 stage if VTLN is enabled. See for detailsht…
archive_s3/	7289	4 years	egouvea	Fixed error message in decoder script reporting failure in bw, and made result d…
cmuclmtk/	11035	10 months	nshmyrev	Fixes bug in wngram2idngram and adds a test for it
cmudict/	11348	3 weeks	air	cleaned up documentation and code (a bit) recompiled the dict
gst-sphinx/	7848	4 years	dhdfu	Support changing language models at runtime (maybe)
htk2s3conv/	11336	6 weeks	nshmyrev	Adds warning about different number of mixtures
jsgfparser/	7230	4 years	dhdfu	Fix the main program to output the only public rule if no rule is specified, and…
logios/	11339	4 weeks	tkharris	remove duplicated code
misc_scripts/	10147	22 months	dhdfu	handle zero references
multisphinx/	10945	12 months	dhdfu	clean up better and introduce vocabulary maps
pocketsphinx/	11351	8 days	nshmyrev	Updated lat2dot script. I need to move it to the other location though
pocketsphinx-extra/	9972	2 years	dhdfu	add sc models with mixture_weights and mdef.txt files
scons/	5868	5 years	egouvea	updated the scons support to reflect that plugin.jar is now part of the package
share/	5532	6 years	egouvea	Setting dsp and dsw files to have have windows EOL regardless where it’s downloa…
sphinx2/	8767	3 years	egouvea	Updated the sphinx-2 MS files to MS .NET, consistent with the other packages, an…
sphinx3/	11329	2 months	nshmyrev	Patch to solve memory issues in python module. See for detailshttps://bugzilla…
sphinx4/	11344	3 weeks	nshmyrev	Properly sets logger for AudioFileDataSource. Thanks to Bandele Ola.
sphinx_fsttools/	10791	14 months	nshmyrev	Some bit in AM to FST conversion
sphinxbase/	11346	3 weeks	nshmyrev	Properly select buffer size when using audioresample. Thanks to balkce See fo…
tools/	9009	3 years	nshmyrev	Updated to the latest release of sphinx4
web/	10249	21 months	nshmyrev	There is no sphinx3 development anymore

How exciting is that? There were only 6 to 7 top-level directories 7 years ago!

From now on, I will start to put more notes on different tools in the repository.

The Grand Janitor