Category Archives: java

Learning vs Writing

I haven't done any serious writings for a week.  Mostly post interesting readings just to keep up the momentum.   Work is busy so I slowed down.  Another concern is what to write.   Some of the topics I have been writing such as Sphinx4 and SphinxTrain take a little bit of research to get them right.

So far I think I am on the right track.  There are not many bloggers on  speech recognition.  (Nick is an exception.)   To really increase awareness of how ASR is done in practice, blogging is a good way to go.

I also describe myself as "recovering" because there are couple of years I hadn't seriously thought about open source Sphinx.  In fact though I was working on speech related stuffs, I didn't spend too much time on mainstream ASR neither because my topic is too esoteric.

Not to say, there are many new technologies emerged in the last few years.   The major one I would say is the use of neural network in speech recognition.  It probably won't replace HMM soon but it is a mainstay for many sites already.   WFST, with more tutorial type of literature available, has become more and more popular.    In programming, Python now is a mainstay plus job-proof type of language.  The many useful toolkit such as scipy, nltk by themselves deserves book-length treatment.  Java starts to be like C++, a kind of necessary evil you need to learn.  C++ has a new standard.   Ruby is huge in the Web world and by itself is fun to learn.

All of these new technologies took me back to a kind of learning mode.   So some of my articles become longer and in more detail.   For now, they probably cater to only a small group of people in the world.   But it's okay, when you blog, you just want to build landmarks on the blogosphere.   Let people come to search for them and get benefit from it.   That's my philosophy of going on with this site.

Arthur

Sphinx 4 from a C background : Material for Learning Sphinx 4

I have been quite focused on SphinxTrain lately.   So I haven't touched Sphinx 4 for a while.   As I have one afternoon which I can use with leisure (not really), so I decide to take a look of some basic material again.

Sphinx-4, as a recognizer, is interesting piece software to me, a recovering recognizer programmer.  It seems remote but oddly familiar.   It is sort of a dream-land for experimenting different decoding strategies.   During Sphinx 3.5 to 3.7, I tried to make Sphinx 3.X to be more generalized in terms of search.  Those effort was tough mainly because the programs were in C.  As you might guess, those modification requires much reinvention of a lot of good software engineering mechanisms (such as class).

Sphinx-4 is now widely studied.  There are many projects using Sphinx-4 and its architecture is analyzed in many sites.   That's why I have abundant amount of material to learn the recognizer.  (Yay! πŸ™‚ )

Here are the top 5 pages in my radar now and I am going to study them in detail:

  1. Introduction :  What Sphinx-4 is? And how to use it. 
  2. Sphinx 4 Application Programmer Guide : What excites me is model switching capability.  I also love the way the current recognizer can be linked to multiple languages. 
  3. Configuration Manager :  That's an interesting part as well.   That is a recognizer which is configurable for every components.   Is it a good thing?  There are pros and cons about a hierarchical configuration system.  But for most of the time, I think that's a better way than flat command-line structure. 
  4. Instrumentation : How to test the decoder with examples on TIDIGITS and many more database. 
  5. FAQ: Here is a list of questions which make me curious. 
  6. The White Paper : Extremely illuminating,  I also appreciate the scholarship when they compare different versions of Sphinxes. 
  7. The 2003 paper: I haven't gone through this one yet but it's certainly something I want to check out. 

Arthur

Previous related articles:
Sphinx4 from a C background : first few steps
Sphinx4 from a C background : Installation of Eclipse
Sphinx4 from a C Background : Setting up Eclipse 

Where to start when tracing source code of a speech recognition toolkit?

Modern speech recognition software are complicated piece of software.  To understand it, you need to have some basic understanding of the principle of speech recognition, as well as some ideas on the programming language being used.

By now, you may hear a lot of people say they know about a speech recognizer.   And by now, you probably realize that most of these people have absolutely no ideas what's going on inside a recognizer.   So if you are reading this blog message, you are probably telling yourself, "I might want to trace the codebase of some recognizers' code." Be it Sphinx, HTK, Julius, Kaldi or whatever codebase you are looking at.

For the above toolkits, I will say I only know in detail about Sphinx,  probably a little bit about HTK's HVite.  But I won't say the same for others.  In fact, even in Sphinx, I only know intimately about Sphinx 3/SphinxTrain/sphinxbase triplet.   So just like you, I hope to learn more.

So here it begs the question: how would you trace a speech recognition toolkit codebase? If you think it is easy, probably because you worked in speech recognition for a while and you probably shouldn't read this post.

Let's just use sphinx as an example, there are hundreds of files in each component of Sphinx.   So where should you start?    A blunt approach would be reading each of the file one by one.   That's not a smart the way.   So here is a suggestion for you : focus on the following four things,

  1. Viterbi algorithm
  2. Workflow of training
  3. Baum-Welch algorithm. 
  4. Estimation algorithms of language models. 
When you know where the Viterbi algorithm is, you will soon figure out how the feature vector is generated.  On the same vein: if you know where the Baum-Welch algorithm, you will probably know how the statistics are generated.   If you know the workflow of the training, then you will understand the how the model is "evolved".   If you know how the language model is estimated, then you would have understanding of one of the most important heuristic of the search. 
Some of you may protest, how about the front-end? Isn't that important too?  True, but not when you try to understand a codebase.  For all practical purpose, a feature vector is just an N-dimensional vector.  The waveform is just an NxT matrix.   You can certainly do a lot of fancy things on this NxT matrix.   But when you think of Viterbi and Baum-Welch, they probably just read the frames and then calculate Gaussian distribution.  That's pretty much it's how much you want to know a front-end. 
How about adaptation algorithms?  That I think it's important.   But it should probably go after understanding of the major things in the code.   Because no matter whether you are doing adaptation online or doing this in speaker adaptive training.  It is something on top of the Baum-Welch algorithm.   Some implementation stick adaptation within the Baum-Welch executable.  There is certainly nothing wrong about it.   But it is still a kind of add-on. 
How about decoding API?  Those are useful things to know but it is more important when you just need to write an application.  For example, in Sphinx4, you just need to know how to call the Recognizer class.  In sphinx3, live_decode is what you need to know.   But only understanding those won't give you too much insights of how the decoder really works. 
How about the data structure?  Those are sort of important and should be understood when you try to understand a certain algorithm.   In the case of languages such as Java and C++, you should probably take notes on a custom-made data structure.  Or whether the designer call a specific data structure libraries.  Like Boost in C++. 
I guess this pretty much sums it all.  Now let me get back to one non-trivial item on the list, which is the workflow of training.   Many of you might think that recognition systems differ from each other because they have different decoders.  Dead wrong!  As I stressed from time to time, they differ because they have different acoustic models and language models.  So that's why in many research labs, much effort was put on preserving the parameters and procedures of how models is trained.  Much effort was also put to fine tuned this procedure.  
On this part,  I got to say open source speech recognition still has long long way to go.  For starter, there is no much sharing of recipes among speech hobbyists.   What many try to do is to search for a good model.   If you don't know how to train a model, you probably don't even know how to improve it for you own project.   
Arthur

Sphinx4 from a C background : first few steps

As I set out earlier,  one of my goals is to grok all of the components.  I challenged myself to work with Java, which I feel less proficient than my C/C++/Python/Perl.

What should you think when you go from one language to another?  One and only one answer : don't make a judgement too early.  
For example, compilation of Sphinx4 takes 4 steps:
  1. Download and install JDE. 
  2. Download and install ant. 
  3. run ant
If you haven't used JDE, ant or never look at a build.xml, you would feel a bit overwhelmed.    But be patient, there are a lot of goodies of Java.  Most of them are very well thought in terms of software engineering. 
I followed the process.  Woa,  Sphinx 4 is now at beta 6 and it grows to 366 files.   Sounds like groking it will take some time then. 
So what would be your strategy if you want to go forward to understand a Java project such as Sphinx4?   My suggestion: download a good IDE such as Eclipse or NetBeans.
If you are like me, coming from a emacs background, learning Eclipse would take you sometime as well.   But again: don't make a judgement too early.  Eclipse is nice in its own way.  (At least it's not Visual X.....)    
Practically, using Eclipse to understand the code also has its advantage.  Unlike C-package organization, Java software usually has deep directory hierarchy.  Using emacs would definitely cause you more keystrokes.  The only exception I know of is JDEE.  That again will take you some setup time.
In any case, I got it started.  So, my next goal is to go through all materials of Sphinx 4 again.  This time I demand myself to grok.   I will start from the Sphinx 4 documentation page.  Then expand to source code-level of undersand. 
Arthur

Some Reflections on Programming Languages

This is actually a self-criticizing piece.  Oh well, but call it reflection doesn't hurt.

When I first started out in speech recognition, I have a notion that C++ is the best language in the world.  For daily work? "Unix commands such as cut, split work well. "  To take care of most of my processing issues, I used some badly written bash shell.  Around the middle of the grad school, I started to learn that perl is lovely for string processing.   Then I thought perl is the best language in the world, except it is a bit slow.

After C++ and perl, I then learned C, Java, Python.  A little bit of objective-C and sampled many other languages.   For now, I will settle on C and Perl are probably the two languages I am most proficient.  I also tend to like them the most.   There is one difference between me and the twenty-something me though - instead of arguing which language is the best, I will simply go to learn more about any programming language in the world.

Take C as an example, many would praise it to be the procedure language which is closest to the machine.  I love to use C and write a lot of my algorithms in C.  But when you need to maintain and extend a C codebase, it is a source of a pain because, there is no inherent inheritance mechanism to work with, so a programmer needs to implement their own class-implementation.  Many function pointers.  There is also no memory-checking, so an extra step of memory checking is necessary.  Debugging is also a special skill.

Take perl.  It is very useful in text processing and has very flexible syntax.   But this flexibility also makes perl script hard to read sometimes.    For example, for a loop, do you want to implement it as a foreach-loop or by a map?   Those confuse lesser programmers.  Also, when you try to maintain large scale project with perl, many programmers remark to me OOP in perl seems to "just organize the code better".

How about C++?  We love the templates, we love the structure.   In practice though, the standard changes all the time.  Most house fixes the compiler version to make sure their C++ source code compiled.

How about Java?  There is memory boundary checking.  After a year or two on a dot-com, I also learned that Tomcat servlet is a thing in web development.   It is also easy to learn and one mainstream programming language taught in school these days.  Those I dig.  What's the problem? You may say speed is an issue.  Wrong.  Many Java code can be optimized such that it is as fast as its C or C++ codebase.   The issue in practice is that the process of bytecode conversion is non-trivial to many.  That is why it raises doubts in a software team on whether the language is the cause of speed issues.  

For me, I also care about the fate of Java as an open language after Oracle bought Sun Microsystem.

How about Python?  I guess this is a language I know least about.  So far, it seems to take care of a lot of problems in perl. I found the regular expression takes some time to learn.  Though other than that, the language is quite easy to learn and quite nice to maintain.  I guess the only thing I would say it is the slight difference between different Python 2.X starts to annoy me.

I guess a more important point here:  every language has its strength and weakness.  In real life, you probably need to prepare to write the same algorithm in all languages you know.   So there is no room for you to say "Hey! Programming language A is better than programming language B. Wahahaha.  Since I am using A rather than B, I rock, you suck!"  No, rather you got to accept that writing in unfamiliar language is essential for tech person's life.

I learned this through my spiritual predecessor, Eric Thayer, who organized the source code of SphinxTrain.  He once said to me, (I rephrase here,) "Arguing about programming languages is one of the most stupidest thing in the world."

Those words enlightened me.

Perhaps that is why I have been reading "C Programming a Modern Approach", "The C++ Programming Language",  "Java in a Nutshell", "Programming Perl" and "Programming Python" from time to time because I never feel satisfy with my skills on any of them.  I hope to learn D and Go soon and make sure I am proficient in Objective-C soon.  It will take me a lifetime to learn them, but on something deep like programming, learning, other than arguing, seems to be a better strategy to go.

Arthur