All posts by grandjanitor

Sphinx4 from a C background : Installation of Eclipse

That's another baby step but I guess Eclipse installation is much less painful these days.

When I used Eclipse back in 2008, it was rather difficult to download and install.   Part of the reason was that the software house I worked at didn't have a strong culture of documentation.

Downloading Eclipse Juno for Java Developers was pretty easy.  My next step is to bring in the Sphinx 4 directory and compile it.

Arthur

Sphinx4 from a C background : first few steps

As I set out earlier, one of my goals is to grok all of the components.  I challenged myself to work with Java, in which I feel less proficient than in C/C++/Python/Perl.

What should you think when you go from one language to another?  One and only one answer : don't make a judgement too early.  
For example, compilation of Sphinx4 takes 3 steps:
  1. Download and install the JDK.
  2. Download and install ant.
  3. Run ant.
If you haven't used the JDK or ant, or have never looked at a build.xml, you would feel a bit overwhelmed.    But be patient, there are a lot of goodies in Java.  Most of them are very well thought out in terms of software engineering.
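Before running ant for the first time, it can help to confirm the pieces from the steps above are actually in place.  Here is a minimal sketch in Python, not part of Sphinx4 itself: the function name, the default tool list and the "sphinx4" path are all my own assumptions for illustration.

```python
import shutil
from pathlib import Path


def missing_build_prerequisites(project_dir, tools=("java", "javac", "ant")):
    """Return a list of problems that would likely stop `ant` from building."""
    problems = []
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    # By default ant looks for build.xml in the directory it is run from.
    if not (Path(project_dir) / "build.xml").is_file():
        problems.append(f"no build.xml in {project_dir}")
    return problems


if __name__ == "__main__":
    # "sphinx4" is a hypothetical path to the unpacked source tree.
    for problem in missing_build_prerequisites("sphinx4"):
        print(problem)
```

An empty list means the three steps have at least a chance of working; it says nothing about whether the build itself succeeds.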
I followed the process.  Whoa, Sphinx 4 is now at beta 6 and has grown to 366 files.   Sounds like grokking it will take some time then.
So what would be your strategy if you want to go forward to understand a Java project such as Sphinx4?   My suggestion: download a good IDE such as Eclipse or NetBeans.
If you are like me, coming from an Emacs background, learning Eclipse will take you some time as well.   But again: don't make a judgement too early.  Eclipse is nice in its own way.  (At least it's not Visual X.....)
Practically, using Eclipse to understand the code also has its advantages.  Unlike the organization of a C package, Java software usually has a deep directory hierarchy.  Using Emacs would definitely cost you more keystrokes.  The only exception I know of is JDEE.  That again will take you some setup time.
In any case, I got it started.  So, my next goal is to go through all the materials of Sphinx 4 again.  This time I demand that I grok them.   I will start from the Sphinx 4 documentation page, then expand to a source-code-level understanding.
Arthur

Favorite words of the day (Dec 25, 2012)

English: avidity, glissade
Spanish: llevarse

From time to time, I will post my favorite words of the day.  Part of it is my personal record, part of it is my view on programming.  Most capable programmers I know actually know multiple languages and can discern the differences between them.

More importantly, you will find that the same word can mean different things in two languages.  Think false cognates such as "actualmente" (currently) and "actually" (really).

So if you have trouble differentiating the usage of keywords in different programming languages (think "static"), then learning a different real language is one way to help you.

Arthur

Installation of Python and Pygame

I was teaching my little brother how to make a game.  Pygame naturally came to my mind as it is pretty easy to understand and program.

I have tried to use Pygame on Ubuntu and Windows.  Both are fine.  On Windows though, I found that using the installers for both Python and Pygame is the simplest.  I was using Python 2.7.  If you had installed Pygame 1.7 or earlier, make sure you remove the pygame directory under the existing installation before you install.

Arthur

Some Reflections on Programming Languages

This is actually a self-criticizing piece.  Oh well, calling it a reflection doesn't hurt.

When I first started out in speech recognition, I had the notion that C++ was the best language in the world.  For daily work? "Unix commands such as cut and split work well."  To take care of most of my processing issues, I used some badly written bash scripts.  Around the middle of grad school, I started to learn that Perl is lovely for string processing.   Then I thought Perl was the best language in the world, except that it is a bit slow.

After C++ and Perl, I then learned C, Java and Python.  I also learned a little bit of Objective-C and sampled many other languages.   For now, I will settle on saying that C and Perl are probably the two languages I am most proficient in.  I also tend to like them the most.   There is one difference between me and the twenty-something me though - instead of arguing which language is the best, I will simply go and learn more about any programming language in the world.

Take C as an example.  Many would praise it as the procedural language which is closest to the machine.  I love to use C and write a lot of my algorithms in C.  But when you need to maintain and extend a C codebase, it is a source of pain, because there is no built-in inheritance mechanism to work with.  A programmer needs to implement their own class mechanism, which means many function pointers.  There is also no built-in memory checking, so an extra step of memory checking is necessary.  Debugging is also a special skill.

Take Perl.  It is very useful in text processing and has a very flexible syntax.   But this flexibility also makes Perl scripts hard to read sometimes.    For example, for a loop, do you want to implement it as a foreach loop or with a map?   Choices like these confuse lesser programmers.  Also, when you try to maintain a large-scale project in Perl, many programmers have remarked to me that OOP in Perl seems to "just organize the code better".

How about C++?  We love the templates, we love the structure.   In practice though, the standard changes all the time.  Most houses fix the compiler version to make sure their C++ source code compiles.

How about Java?  There is memory boundary checking.  After a year or two at a dot-com, I also learned that Tomcat servlets are a thing in web development.   Java is also easy to learn and is one of the mainstream programming languages taught in school these days.  Those I dig.  What's the problem? You may say speed is an issue.  Wrong.  Much Java code can be optimized to be as fast as its C or C++ counterpart.   The issue in practice is that the process of bytecode conversion is non-trivial to many.  That is why it raises doubts in a software team on whether the language is the cause of speed issues.

For me, I also care about the fate of Java as an open language after Oracle bought Sun Microsystems.

How about Python?  I guess this is the language I know least about.  So far, it seems to take care of a lot of the problems in Perl.  I found that its regular expressions take some time to learn.  Other than that, the language is quite easy to learn and quite nice to maintain.  I guess the only thing I would say is that the slight differences between the Python 2.X versions are starting to annoy me.

I guess the more important point here is that every language has its strengths and weaknesses.  In real life, you probably need to be prepared to write the same algorithm in all the languages you know.   So there is no room for you to say "Hey! Programming language A is better than programming language B. Wahahaha.  Since I am using A rather than B, I rock, you suck!"  No, rather you have got to accept that writing in an unfamiliar language is an essential part of a tech person's life.

I learned this from my spiritual predecessor, Eric Thayer, who organized the source code of SphinxTrain.  He once said to me (I rephrase here), "Arguing about programming languages is one of the stupidest things in the world."

Those words enlightened me.

Perhaps that is why I have been reading "C Programming: A Modern Approach", "The C++ Programming Language", "Java in a Nutshell", "Programming Perl" and "Programming Python" from time to time, because I never feel satisfied with my skills in any of them.  I hope to learn D and Go soon and to become proficient in Objective-C.  It will take me a lifetime to learn them, but on something as deep as programming, learning, rather than arguing, seems to be the better strategy.

Arthur

Passion

For a period of time, getting up was a daunting thing to me.   You see...... computers used to be a tool that let me realize myself.  I liked to work and play with one.  It was not a job.

When did that change for me?  It was when I started to think of a computer solely as a tool for making money.   That's how many people in the field think.  Programming is no longer a pursuit of skill.   It is a way to get a higher salary, win programming competitions and have bragging rights at the lunch table.  Knowledge in speech recognition?  It is not for solving one of the biggest problems in human history.  It is for winning contracts from defense, beating other sites and again bragging to your esteemed colleagues.   These sicken me.
In my view, it is fine to think about money.  In fact, everyone should take care of their own personal finances and have a basic understanding of economics...... BUT......  It doesn't mean everything has to be driven solely by money.
Rather, everyone should have a passion, which allows them to wake up every day, not daunted by the workload of the day, but thinking "Whoa, there are 10 cool things I want to do.  What should I work on today?" and feeling excited about life.
Arthur

Readings at Dec 18, 2012

From time to time, I will put interesting technology readings on my blog.   Enjoy.

  1. The value of typing code : by John Cook.  After all these years, I have to concur that code I didn't type is not code that I grok.
  2. The Founder's Dilemmas : Recommended by Joel Spolsky.  It sounds like an interesting book to check out as I am sick of overly qualitative statements in the startup world.
  3. Tutorial on Python NLTK : by Sujit Pal.  Python NLTK is something I have wanted to check out for a long time.
  4. Pure Virtual Destructor in C++ : by Eli Bendersky.
  5. Dumping A C++ Object Memory Layout With Clang : by Eli Bendersky
Arthur

How to Ask Questions in the Sphinx Forum?

Many go to different open source toolkits to look for a ready-to-use speech recognizer, and seldom get what they want.   Many feel disappointed and curse that the developers of open source speech recognizers just couldn't catch up with commercial products.   Few know why and few decide to write about the reason.

People in the field blame Hollywood for the lion's share of the problem.  Indeed, many people believe ASR should work the way it does in scenes from 2001: A Space Odyssey or Star Trek.   We are far, far away from there.   You may say SIRI is getting close.  True.   But when you look closer, SIRI doesn't always get what you say right; her strength lies in the very intelligent response system.

Unlike compilers such as GCC, speech recognition packages such as the CMU Sphinx project or HTK are toolkits.   The mathematical models these toolkits provide were trained and fit to a certain group of samples, whereas applications such as Google Voice or SIRI gather 100 or even 1000 times more data when they train a model.   This is the fundamental reason why you don't get the premium recognition rate you think you are entitled to.

Many people (me included) see that as a problem.  Unfortunately, collecting clean transcribed data has always been difficult.   Voxforge is the only attempt I am aware of to resolve the issue.    They are still growing, but it will be a while before they can collect enough data to rival commercial applications.

* * *
Now what does that tell you when you ask questions in the CMU Sphinx forum or other speech recognition forums?   For users who expect out-of-the-box super performance, I would say "Sorry, we are not there yet."  In fact, speech recognition, in general, is probably not yet at the performance shown in the original Star Trek (that would require accent adaptation and very good noise cancellation, since the characters seem to be able to use the recognizer any time they like).

How about the many users who have a little bit of (or much) programming background? I would say one important thing.  As a programmer, you are probably used to looking at the code, understanding what it does, doing something cute and feeling awesome from time to time.  You can't just do that if you seriously want to develop a speech recognition system.

Rather, you should think like a data analyst.  For example, when you feel the recognition rate is bad, what is your evidence?  What is your data set?  What is the size of your data set? If you have a set, can you share the set?   If you don't have a numerical measure, have you at least used pencil and paper to mark down some results and some mistakes? Report them when you ask questions, then you will get useful answers back.

If you look at programming forums, many people ask questions with the source attached so that others can reproduce the problem easily.    Some even go further and pinpoint the location of the problem.    This is probably what you want to do if you get stuck.
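The usual numerical measure in speech recognition is the word error rate: substitutions, deletions and insertions counted against the number of reference words.  Here is a minimal sketch in Python, assuming you have (reference, hypothesis) transcript pairs as plain strings.  The function names are mine, and real scoring tools also normalize case and punctuation, which this does not.

```python
def word_errors(reference, hypothesis):
    """Return (edit_distance_in_words, number_of_reference_words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the ref words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def word_error_rate(pairs):
    """Word error rate over an iterable of (reference, hypothesis) pairs."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    if words == 0:
        raise ValueError("no reference words to score against")
    return errors / words
```

Reporting this one number together with the size of the test set already answers most of the questions above.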

* * *

Before I end this post, let's also bring up the issue of how ASR problems are usually solved.  Say you see performance is bad, what should you do?

Some speech recognition problems can be solved readily.  For example, if you try to recognize digit strings but only get one digit at a time, chances are your grammar was written incorrectly.  If you see completely crappy speech recognition performance, then I would first check whether the front-end of the decoder matches exactly the front-end used to train the models.
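The front-end check can be made mechanical.  Here is a rough sketch in Python, assuming both the training-time and the decoding-time settings are available as text with one "-name value" pair per line.  That layout, the function names and the values in the usage note are assumptions for illustration, so adapt the parsing to however your toolkit stores its settings.

```python
def parse_params(text):
    """Parse lines of the form '-name value' into a dict of strings."""
    params = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a '-name value' pair.
        if len(fields) >= 2 and fields[0].startswith("-"):
            params[fields[0]] = fields[1]
    return params


def frontend_mismatches(train_text, decode_text):
    """Map each differing parameter to (training_value, decoding_value).

    A parameter that is missing on one side shows up with None there.
    """
    train = parse_params(train_text)
    decode = parse_params(decode_text)
    mismatches = {}
    for name in sorted(set(train) | set(decode)):
        t, d = train.get(name), decode.get(name)
        if t != d:
            mismatches[name] = (t, d)
    return mismatches
```

For example, comparing "-lowerf 130\n-upperf 6800" against "-lowerf 130\n-upperf 3500" reports only "-upperf".  The comparison is on raw strings, so "130" and "130.0" count as different; that is deliberate, since a false alarm is cheap to dismiss.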

For the rest, the strength of the model is really the issue.   So most of your time should be spent on learning and understanding techniques of model improvement.    For example, do you want to collect data and boost your acoustic model?  Or if you know more about the domain, can you crawl some text on the web to help your language model?   Those are the first ideas you should think about.

There is also an esoteric group of people in the world who ask a different question, "Can we use a different estimation algorithm to make the model better?"  That is the basis of MMIE, MPE and MFE.   If you find yourself mathematically proficient (perhaps you need to be very proficient......), then learning those techniques and implementing some of them would help boost performance as well.   What I mentioned, such as MMIE, are just the basics.  Each site has its own specialized techniques that you might want to know.

Of course, you normally don't have to think so deep.   Adding more data is usually the first step of ASR improvement.    If you start to think about something advanced, and if you can, please try to put your implementation somewhere public so that everyone in the world can try it out.   These are small things to do, but I believe if we keep on doing small things right, there will be a day we can make open source speech recognizers as good as the commercial ones.

Arthur