Category Archives: Dragon

Apology, Updates and Misc.

There are some questions on LinkedIn about the whereabouts of this blog.   As you may notice, I haven't done any updates for a while.   I was crazy busy by work in Voci (Good!) and many life challenges, just like everyone.    Having a lot of fun with programming, as I am working with two of my most favorite languages - C and Python.  Life is not bad at all.

My apology to all readers though, it could be tough to blog sometimes.  Hopefully, this situation will change later this year.....

Couple of worthwhile news in ASR,  Goldman-Sach won the trial in the Dragon law suit.  There is also the VB's piece of MS doubling up speed in their recognizer.

I don't know how to make out of the lawsuit but only feel a bit sad.  Dragon has been the homes of many elite speech programmers/developers/researchers.  Many old-timers of speech were there.   Most of them sigh about the whole L&H fiasco.   If I were them, I would feel the same too.   In fact, once you know a bit of ASR history, you would notice that the fall of L&H gave rise to one you-know-its-name player nowadays.  So in a way, the fate of two generations of ASR guys are altered.

As for the MS piece, we are following another trend these days, which is the emergence of DBN.  Is it surprising?  Probably not, it's rather easy to speed up neural network calculation.  (Training is harder, but that's what DBN is strong compared to previous NN approach.)

On Sphinx, I will point out one recent bug contributed by Ricky Chan, which exposed a problem in bw's MMIE training.   I am yet to try it but I believe Nick has already incorporated into the open-source code base.

Another items which Nick has been stressing lately is to use python, instead of perl, as the scripting language of SphinxTrain.   I think that's a good trend.  I like perl and use one-liner, map/grep type of program a lot.  Generally though, it's hard to find a concrete coding standard for perl.   Whereas python seems to be cleaner and naturally lead to OOP.  This is an important issue - perl programmers and perl programming style seems to be spawned from many different type of languages.   The original (bad) C programmer would fondly use globals and write functions with 10 arguments.  The original C++ programmer might expect language support on OOP but find that "it is just a hash".   These style difference could make perl training script hard to maintain.

That's why I like python more.  Even very bad script seems to convert itself to more maintainable script.   There is also a good pathway for python/C connect.  (Cython is probably the best.)

In any case, that's what I have this time.  I owe all of you many articles.  Let's see if I can write some in the near future.


GJB Wednesday Speech-related Links/Commentaries (DragonTV, Siri vs Xiao i Robot, Coding with Voice)

ZhiZhen company (智臻網絡科技) from Shanghai is suing Apple for infringing their patents.  (The original Shanghai Daily article) From the news, back in 2006, ZhiZhen has already developed the engine for Xiao i Robot (小i機械人).  A video 8 months ago (as below). 
Technically, it is quite possible that a Siri-like system can be built at 2006.  (Take a Look at Olympus/Ravenclaw.)  Of course, the Siri-like interface you see here is certainly built in the advent of smartphone (, which by my definition, after iPhone is released).   So overall speaking, it's a bit hard to say who is right.  
Of course, when interpreting news from China, it's tempting to use slightly different logic. In the TC article, OP (Etherington) suggested that the whole lawsuit could be state-orchestrated. It could be related to recent Beijing's attack of Apple. 
I don't really buy the OP's argument, Apples is constantly sued in China (or over the world).  It is hard to link the two events together.  
This is definitely not the Siri for TV.

Oh well, Siri is not just speech recognition, there is also the smart interpretation in the sentence level: scheduling, making appointments, do the right search.   Those by themselves are challenges.    In fact, I believe Nuance only provides the ASR engine for Apple. (Can't find the link, I read it from Matthew Siegler.)

In the scenario of TV,  what annoys users most are probably switching channels and  searching programs.  If I built a TV, I would also eliminate the any set-top boxes. (So cable companies will hate me a lot). 
With the technology profile of all big companies, Apple seems to own all technologies need.  It also takes quite a lot of design (with taste) to realize such a device. 

Using Python to code by Voice

Here is an interesting look of how ASR can be used in coding.   Some notes/highlights:

  • The speaker, Travis Rudd, had RSI 2 years ago.  After a climbing accident, He decided to code using voice instead.  Now his RSI is recovered, he claims he is still using it for 40-60%. 
  • 2000 voice commands, which are not necessarily English words.   The author used Dragonfly to control emacs in windows.
  • How does variables work?  Turns out most variables are actually English phrases. There are specific commands to get these phrases delimited by different characters. 
  • The speaker said "it's not very hard" for others to repeat.  I believe there will be some amount of customizations.  It takes him around 3 months.  That's pretty much how much time a solution engineer needs to take to tune an ASR system. 
  • The best language to program in voice : Lisp. 
One more thing.   Rudd also believe it will be very tough to do the same thing with CMUSphinx.  
Ah...... models, models, models. 

Earlier on Grand Janitor's Blog

Some quick notes on what a "Good training system" should look like: (link).
GJB reaches the 100th post! (link)

January 2013 Write-up

Miraculously, I still have some momentum for this blog and I have kept on the daily posting schedule.

Here is a write up for this month:  Feel free to look at this post on how I plan to write this blog:

Some Vision of the Grand Janitor's Blog

Sphinx' Tutorials and Commentaries

SphinxTrain1.07's bw:

Commentary on SphinxTrain1.07's bw (Part I)
Commentary on SphinxTrain1.07's bw (Part II)

Part I describes the high-level layout, Part II and describe half the state network was built.

Acoustic Score and Its Sign
Subword Units and their Occasionally Non-Trivial Meanings

Sphinx 4 from a C background : Material for Learning


Goldman Sachs not Liable
Aaron Swartz......

Other writings:

On Kurzweil : a perspective of an ASR practitioner



Speech-related Readings at Jan 30, 2013

Amazon acquired Ivona:

I am aware of Amazon's involvement in ASR.   Though it's a question on the domain.

Goldman-Dragon Trial:

I simply hope Dr. Baker has a closure on the whole thing.   In fact, when you think about it,  the whole L&H fallout is the reason why the ASR industry has a virtual monopoly now.  So if you are interested in ASR, you should be concerned.