Visual Bookshelves

I love to read, and I like to write a review of every book I read. None of the reviews will change the world, but I still love doing it. That's why, by definition, I'm a bookworm. I'm not even shy about it. 😉

I went quite far: I tried to record every book I read and started putting them in a blog called "ContentGeek". Luckily, I hadn't gotten very far, because once I discovered Visual Bookshelves, there was no need for me to do it all myself.

Visual Bookshelves lets users look up a book from Amazon, add comments, and store it in a database. It also shows the book covers. What more could I want?

So anyway, here is the link to my visual bookshelves:



David's plan on Sphinx 3.7

A great read: it touches the heart of the implementation issues in all the Sphinxen, and its criticism of my implementation is right to the point.

I actually felt relieved to see the current maintainer attack what I did in the past. (Some features I implemented were rather stupid.) It shows that Sphinx is still alive and will stay alive.


Life in Scanscout

Hi Guys,
Scanscout is a rather interesting company. If you have been reading this blog, you probably know that I have been there for a while.

My direct supervisor doesn't like to give away too much. I think he has a point (he is a *very* smart guy). That contradicts my philosophy of information sharing, so as a compromise, here are a couple of things I can share. (Of course, my estimate of the probability of anyone reading this blog is about 1/10^9, so I guess it doesn't matter that much......)

1, We have a massage chair and it is awesome.
2, We have a foosball table and hold a tournament every Friday. Beware, there are several good players. (I always get the lowest score.)
3, It is on the forefront of video advertising. I am glad that I've joined. 🙂

Arthur Chan



Ah. This is not exactly news. It has been around since the 2006 Johns Hopkins workshop.

mosesdecoder is probably the first open-source statistical machine translation implementation in the world. For quite a while, only the IBM-model training portion of the pipeline was available as open source, in GIZA++. So people interested in SMT would usually turn to Pharaoh, a closed-source implementation available on the web.

I could have some fun. 😉


Third Draft of Hieroglyphs

Hi all,

It has been a while since I worked on Hieroglyphs (the fancy name I made up for the Sphinx documentation). This is perhaps the only thing I haven't wrapped up at CMU, so I decided to release a draft. You can find it


It still looks pretty messy, but it is starting to look like a book now.

Several chapters and sections were trimmed in this draft. You will still see a lot of "?" marks; those are signs of insufficient proof-reading. Forgive me; when I have more time, I will try to fix some of them in the near future.

Grand Janitor

Left CMU

Hi Guys,
It was a sad decision. After long soul-searching, I decided to leave CMU and join a startup company called Scanscout. I must be out of my mind!!

Anyway, my new job requires knowledge of speech recognition, information retrieval and video processing. These are all a good fit for me. I can tell you I am having a lot of fun!

Sphinx, in particular the trio of Sphinx 3.X, SphinxTrain and CMULMTKV3, is now maintained by David Huggins-Daines and Evandro Gouvea. I still keep a nominal maintainership, but these two are the true heroes of the story now.

In any case, feel free to chat with me about anything related to language processing. I am more than happy to help.

Arthur Chan

Sphinx 3.6 is officially released

Sphinx 3.6 Official Release
The Sphinx 3.6 official release includes all the changes found in Sphinx 3.6 RC I.
From 3.6 RC I to 3.6 official:
New Features:
-Added support for Sphinx 2-style semi-continuous HMMs in Sphinx 3.
-Added sphinx3_continuous, which performs on-line decoding on both Windows and Linux platforms.
-Synchronized the front end with Sphinx 2, adding an implementation of VTLN (i.e. -warp_type = inverse_linear, piecewise_linear, affine).
-The prefix "sphinx3_" has been added to the programs align, allphone, astar, dag, decode, decode_anytopo and ep, to avoid confusion on some Unix systems.
For Developers:  
For Developers:
-All public headers (*.h) are now placed under $root/include instead of in the same directories as their source .c files.
-The directory libutil has been renamed libs3util.
-Sphinx 3, as well as all other modules in the CMU Sphinx project, is now versioned with Subversion.
Bug Fixes:
-[1459402] A serious memory-reallocation bug is fixed.
-In RC I, -dither was not properly implemented; this has been fixed.
Known Problems:
-When the model contains NaN values, the output will be abnormal. At this point, this issue is resolved on the SphinxTrain side.
Sphinx 3.6 Release Candidate I 
The corresponding SphinxTrain's tag is SPHINX3_6_CMU_INTERNAL_RELEASE 
One can check out the matching SphinxTrain for the Sphinx 3.6 release with the command:
svn co 
A Summary of Sphinx 3.6 RC I
Sphinx 3.6 is a gently refactored version of Sphinx 3.5. Our programming was defensive; we only aimed at further consolidating and unifying our code bases in Sphinx 3.
Despite this defensive approach, several interesting new features can still be found in this release. Their details can be found in the "New Features" section below. Here is a brief summary:
1, Further speed-up of CIGMMS in the 4-level GMM Computation Scheme (4LGC).
2, Multiple regression classes and MAP adaptation in SphinxTrain.
3, Better support for using LMs in Sphinx 3.X.
4, FSG search is now supported. This is adapted from Sphinx 2.
5, Support for full triphone search in the flat-lexicon search.
6, Some support for character sets other than the default in Sphinx 3.X. Models in multiple languages have now been tested in Sphinx 3.X.
We hope you enjoy this release candidate. In the future, we will continue to improve the quality of CMU Sphinx and its related software.
New Features
-Speaker Adaptation:
a, Multiple regression classes (phoneme-based) are now supported.
-GMM Computation:
a, Improvements to CIGMMS are now incorporated.
i, One can specify an upper limit on the number of CD senones computed in each frame with -maxcdsenpf.
ii, The best Gaussian index (BGI) is now stored and can be used as a mechanism to speed up GMM computation.
iii, A tightening factor (-tighten_factor) is introduced to smooth between the naive down-sampling technique and CI-GMMS.
b, Support for SCHMM and FCHMM:
i, decode now fully supports computation of SCHMMs.
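To make the CI-GMMS idea above concrete, here is a hypothetical sketch (the function and variable names are mine, not Sphinx's actual code): cheap context-independent (CI) senone scores gate which expensive context-dependent (CD) senones get evaluated in a frame, and the tightening factor shrinks the gating beam toward pure down-sampling.

```python
# Hypothetical sketch of CI-based GMM selection; names and the exact
# beam formula are assumptions for illustration, not Sphinx's code.

def select_cd_senones(ci_scores, cd_to_ci, beam, tighten_factor):
    """Return indices of CD senones worth computing this frame.

    ci_scores      : log score of each CI senone (higher is better)
    cd_to_ci       : map CD senone index -> its base CI senone index
    beam           : base log-beam width (a positive number)
    tighten_factor : 0.0..1.0; larger values shrink the beam
    """
    best = max(ci_scores)
    effective_beam = beam * (1.0 - tighten_factor)
    threshold = best - effective_beam
    # Keep only CD senones whose base CI senone scored near the best.
    return [cd for cd, ci in cd_to_ci.items() if ci_scores[ci] >= threshold]

ci_scores = [-10.0, -3.0, -25.0]           # 3 CI senones
cd_to_ci = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}  # 5 CD senones
print(select_cd_senones(ci_scores, cd_to_ci, beam=10.0, tighten_factor=0.5))
# -> [2, 3]: with effective beam 5.0 and best score -3.0, only CD
#    senones whose CI score >= -8.0 survive
```

With tighten_factor at 0.0 the full beam applies and more CD senones are computed; at 1.0 only senones tied to the single best CI phone survive, which is the aggressive down-sampling end of the trade-off.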
-Language Model:
a, Reading an LM in ARPA text format is now supported. Users now have an option to bypass the use of lm3g2dmp.
b, The live-decoding API now supports switching of language models.
c, Full support for class-based LMs. See also the Bug Fixes section.
d, lm_convert is introduced; it supersedes the functionality of lm3g2dmp. Not only can lm_convert convert an LM from TXT format to DMP format, it can also do the reverse.
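For readers unfamiliar with the ARPA text format mentioned above, here is a minimal sketch of reading its \data\ header, which declares how many n-grams of each order the model contains. (Illustrative only; the real lm_convert and lm3g2dmp handle the full format, including the probability and back-off tables.)

```python
# Minimal sketch: extract {order: count} from the \data\ section of an
# ARPA-format LM file. This covers only the header, not the n-gram tables.

def arpa_ngram_counts(lines):
    counts = {}
    in_data = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_data = True
        elif in_data:
            if line.startswith("ngram "):
                # e.g. "ngram 2=7" -> order 2, count 7
                order, count = line[len("ngram "):].split("=")
                counts[int(order)] = int(count)
            elif line.startswith("\\"):  # next section, e.g. \1-grams:
                break
    return counts

example = """\\data\\
ngram 1=4
ngram 2=7
ngram 3=3

\\1-grams:
-1.204 <s> -0.301
\\end\\
""".splitlines()

print(arpa_ngram_counts(example))  # {1: 4, 2: 7, 3: 3}
```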
This part details the changes we made in the different searches.
In 3.6, this collection of algorithms can all be used under a single executable, decode. decode_anytopo is still kept for backward-compatibility purposes.
decode now supports three modes of search:
Mode 2 (FSG): FSG search (adapted from Sphinx 2).
Mode 3 (FLAT): Flat-lexicon search (the original search of decode_anytopo in 3.X (X < 6)).
Mode 4 (TREE): Tree-lexicon search (the original search of decode in 3.X (X < 6)).
Some of these functionalities are only applicable in one particular search; we mark them with FSG, FLAT and TREE.
a, One can now turn off -bt_wsil to control whether silence should be used as the ending word. (FLAT, TREE)
b, In FLAT, full triphones can be used instead of multiplexed triphones.
c, FSG is a newly added search in 3.6, adapted from Sphinx 2.5.
a, -dither is now supported in live_pretend and live_decode; the initial seed can always be set with -seed. (Jerry Wolf will be very happy about this feature.)
a, One can turn on the built-in letter-to-sound rules in dict.c by using -lts_mismatch.
b, The current Sphinx 3.6 is tested to work on setups in English, Mandarin Chinese and French.
c, Changes in allphone: allphone can now generate a match file and a matchseg file, just like the decode* recognizers.
Bug Fixes
-Miscellaneous memory leaks fixed in the tree search (mode 4).
-The class-based LM initialization routine used to swap the order of the word insertion penalty and the language model weight. This is now fixed.
-The assertion generated in vithist.c is now turned into an error message. Instead of the whole program stopping, decoding will just fail for that sentence. We suspect this is the problem that caused the possible wipe-out of memory in Sphinx 3.4 & 3.5.
-The number of CI phones can now be at most 32767 (instead of 127).
-[1236322]: libutil str2words special-character bugs.
Behavior Changes
-The endpointer (ep) now uses the s3 log computation.
-Multi-stream GMM computation will no longer truncate the pdf to 8 bits. This avoids programmer confusion.
-Except in allphone and align, when .cont. is used in -senmgau, the code automatically switches to the fast GMM computation routine. To make sure multiple-stream GMM computation takes effect, one needs to specify .s3cont.
-The executable dag had not accounted for the language weight. This issue is now fixed.
-(See Bug Fixes also.) decode now returns an error message when vithist is fed a history of -1. Instead of asserting on the problem, the recognizer dumps a warning message. Usually that means the beam widths need to be increased.
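The "s3 log" computation mentioned in the list above refers to Sphinx 3's integer log domain: scores are stored as integer logarithms in a base very close to 1, so multiplying probabilities becomes cheap integer addition. A small sketch of the idea (the base 1.0003 here is an assumption for illustration; the actual value is a configuration detail of the toolkit):

```python
import math

# Sketch of a Sphinx-3-style integer log domain. BASE is an assumed
# illustrative value, not necessarily the toolkit's actual constant.
BASE = 1.0003

def s3_log(p):
    """Linear probability -> integer log-domain score."""
    return int(round(math.log(p) / math.log(BASE)))

def s3_exp(score):
    """Integer log-domain score -> linear probability."""
    return BASE ** score

# Multiplying probabilities is just adding integer scores:
joint = s3_exp(s3_log(0.25) + s3_log(0.5))
print(round(joint, 4))  # ~0.125, i.e. 0.25 * 0.5
```

Because the base is so close to 1, the integer scores retain fine-grained resolution while fitting in machine integers, which is why the decoder can work entirely in this domain.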
Functions Still Under Test
-Encoding conversion in lm_convert.
-LIUM contribution: an LM can now be represented in AT&T FSM format.
Known Bugs
-In confidence estimation, the computations of the forward and backward posterior probabilities have a mismatch.
-In allphone, the scores generated in the matchseg file are sometimes very low.
-Regression tests on the second-stage search still have bugs.
Corresponding Changes in SphinxTrain
Please note that SphinxTrain is distributed as a separate package, and you can get it with:
svn co
-Support for generation of MAP and multiple-class MLLR.
-Support for BBI tree generation.

What are we up to in these days?

Well, a couple of nasty issues in legacy Sphinx 3:
1, Getting Sphinx 3 used by some applications: first making it work with the Galaxy/Communicator framework (David Huggins-Daines is behind that), then making it work with the speech component we will contribute to the CALO Project (Yitao Sun is behind that). A lot of check-ins are based on this.

2, For me, I was trying to make Sphinx 3 and the CMU-Cambridge LM Toolkit work with vocabularies of more than 65536 words. (The new limit is around 4 billion.) The Sphinx 3 work is complete; the CMU-Cambridge LM Toolkit essentially requires an upgrade.
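The two limits quoted above fall straight out of the word-ID integer width: a 16-bit word ID allows 2^16 distinct words, while a 32-bit ID allows about 4.29 billion, hence "around 4 billion".

```python
# Why the vocabulary caps are 65536 and ~4 billion: they are the
# number of distinct values a 16-bit and a 32-bit word ID can hold.
print(2 ** 16)  # 65536
print(2 ** 32)  # 4294967296
```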

You may ask: is there still any work happening on CMU-Cambridge LM Toolkit V2?
Here is my answer: if you want something to happen, you just need to go ahead and change it. This is true for Sphinx 3 and it is also true for CMU-Cambridge LM Toolkit V2. I have gathered code from Dave, Prof. Yannick Esteve and a couple of other contributors. I definitely think some kind of alpha release will be there in the May to June time frame.

Let us see how it goes. 🙂