Category Archives: wfst

The 100th Post: Why The Grand Janitor's Blog?

Since I decided to revamp The Grand Janitor's Blog last December, it has been 100 posts. (I cheat a bit, so "not since then".)

It's funny to describe time with the number of articles you write.   In blogging though, that makes complete sense.

I have started several blogs in the past.  Only 2 of them survive (, Cumulomanic and "Start-Up Employees 333 weeks", both in Chinese) .  When you cannot maintain your blog for more than 50 posts, you blog just dies, or simply to disappear into oblivion.

Yet I make it.  So here's an important question to ask: what makes me keep on?

I believe the answer is very simple.  There is no bloggers so far who work on the niche of speech recognition: None on automatic speech recognition (ASR) systems, even though there was much progress.  None on engines, even much work has been done in open source.   None on applications, even great projects such as Simon was there.

Nor there were discussion on how open source speech recognition can be applied to the commercial world, even when there are dozens of companies are now based on Sphinx (e.g. my employer Voci,  EnglishCentral and Nexiwave ), and they are filling the startup space.

How about how the latest technology such as deep neural network (DNN) and weighted finite state transducers (WFST) would affect us?  I can see them in academic conferences, journals or sometimes tradeshows...... but not in a blog.

But blogging, which we all know, is probably the most prominent form of how people are getting news these days.

And news about speech recognition, once you understand them, is fascinating. 

The only blog which comes close is Nicholay's blog : nsh.   When I try to recover as a speech recognition programmer, nsh was a great help.  So thank you, Nick, thank you.

But there is only one nsh.  There are still have a lot of fascinating to talk about...... Right?

So probably the reason why I keep on working:  I want to invent something I want: a kind of information hub on speech recognition technology, commercial/open source, applications/engines, theory/implementations, the ideals/the realities.

I want to bring my unique perspective: I was in academia, in industrial research and now in the startup world so I know quite well people's mindsets in each group.

I also want to connect with all of you.  We are working on one of the most exciting technology in the world.   Not everyone understands that.  It will take time for all of us, to explain to our friends and families what speech recognition can really do and why it matters.

In any case, I hope you enjoy this blog.  Feel free to connect with me on Plus, LinkedIn and Twitter.


Learning vs Writing

I haven't done any serious writings for a week.  Mostly post interesting readings just to keep up the momentum.   Work is busy so I slowed down.  Another concern is what to write.   Some of the topics I have been writing such as Sphinx4 and SphinxTrain take a little bit of research to get them right.

So far I think I am on the right track.  There are not many bloggers on  speech recognition.  (Nick is an exception.)   To really increase awareness of how ASR is done in practice, blogging is a good way to go.

I also describe myself as "recovering" because there are couple of years I hadn't seriously thought about open source Sphinx.  In fact though I was working on speech related stuffs, I didn't spend too much time on mainstream ASR neither because my topic is too esoteric.

Not to say, there are many new technologies emerged in the last few years.   The major one I would say is the use of neural network in speech recognition.  It probably won't replace HMM soon but it is a mainstay for many sites already.   WFST, with more tutorial type of literature available, has become more and more popular.    In programming, Python now is a mainstay plus job-proof type of language.  The many useful toolkit such as scipy, nltk by themselves deserves book-length treatment.  Java starts to be like C++, a kind of necessary evil you need to learn.  C++ has a new standard.   Ruby is huge in the Web world and by itself is fun to learn.

All of these new technologies took me back to a kind of learning mode.   So some of my articles become longer and in more detail.   For now, they probably cater to only a small group of people in the world.   But it's okay, when you blog, you just want to build landmarks on the blogosphere.   Let people come to search for them and get benefit from it.   That's my philosophy of going on with this site.


Me and CMU Sphinx

As I update this blog more frequently, I noticed more and more people are directed to here.   Naturally,  there are many questions about some work in my past.   For example, "Are you still answering questions in CMUSphinx forum?"  and generally requests to have certain tutorial.  So I guess it is time to clarify my current position and what I plan to do in future.

Yes, I am planning to work on Sphinx again but no, I probably don't hope to be a maintainer-at-large any more.   Nick proves himself to be the most awesome maintainer in our history.   Through his stewardship, Sphinx prospered in the last couple of years.  That's what I hope and that's what we all hope.    
So for that reason, you probably won't see me much in the forum, answering questions.  Rather I will spend most of my time to implement, to experiment and to get some work done. 
There are many things ought to be done in Sphinx.  Here are my top 5 list:
  1. Sphinx 4 maintenance and refactoring
  2. PocketSphinx's maintenance
  3. An HTKbook-like documentation : i.e. Hieroglyphs. 
  4. Regression tests on all tools in SphinxTrain.
  5. In general, modernization of Sphinx software, such as using WFST-based approach.
This is not a small undertaking so I am planning to spend a lot of time to relearn the software.  Yes, you hear it right.  Learning the software.  In general, I found myself very ignorant in a lot of software details of Sphinx at 2012.   There are many changes.  The parts I really catch up are probably sphinxbase, sphinx3 and SphinxTrain.   One PocketSphinx and Sphinx4, I need to learn a lot. 
That is why in this blog, you will see a lot of posts about my status of learning a certain speech recognition software.   Some could be minute details.   I share them because people can figure out a lot by going through my status.   From time to time, I will also pull these posts together and form a tutorial post. 
Before I leave, let me digress and talk about this blog a little bit: other than posts on speech recognition, I will also post a lot of things about programming, languages and other technology-related stuffs.  Part of it is that I am interested in many things.  The other part is I feel working on speech recognition actually requires one to understand a lot of programming and languages.   This might also attract a wider audience in future. 
In any case,  I hope I can keep on.  And hope you enjoy my articles!