The Grand Janitor Blog V3 – Page 28 – Speech Recognition, Artificial Intelligence, and Random Musing of Arthur Chan

Developer’s meeting note at 2010

Post author By grandjanitor
Post date December 29, 2012

This caught my eye when I browsed through CMUSphinx’s blog. Meetings like this generally decide how the project will go.

http://cmusphinx.sourceforge.net/2010/03/development-meeting-notes/#more-157

Looks like resources are still an issue...

Where to start when tracing source code of a speech recognition toolkit?

Post author By grandjanitor
Post date December 29, 2012

Modern speech recognizers are complicated pieces of software. To understand one, you need some basic understanding of the principles of speech recognition, as well as some idea of the programming language being used.

By now, you may have heard a lot of people say they know about speech recognizers. And by now, you have probably realized that most of these people have absolutely no idea what’s going on inside a recognizer. So if you are reading this blog post, you are probably telling yourself, “I might want to trace the codebase of some recognizer,” be it Sphinx, HTK, Julius, Kaldi or whatever codebase you are looking at.

Of the above toolkits, I would say I only know Sphinx in detail, and probably a little bit about HTK’s HVite. I can’t say the same for the others. In fact, even within Sphinx, I only know the Sphinx 3/SphinxTrain/sphinxbase triplet intimately. So just like you, I hope to learn more.

So this raises the question: how would you trace a speech recognition toolkit codebase? If you think it is easy, it is probably because you have worked in speech recognition for a while, and you probably don’t need to read this post.

Let’s just use Sphinx as an example. There are hundreds of files in each component of Sphinx. So where should you start? A blunt approach would be to read the files one by one. That’s not a smart way. So here is a suggestion for you: focus on the following four things:

Viterbi algorithm
Workflow of training
Baum-Welch algorithm
Estimation algorithms of language models

When you know where the Viterbi algorithm is, you will soon figure out how the feature vector is generated. In the same vein, if you know where the Baum-Welch algorithm is, you will probably know how the statistics are generated. If you know the workflow of the training, then you will understand how the model “evolves”. If you know how the language model is estimated, then you will understand one of the most important heuristics of the search.
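To make the last item a bit more concrete, here is a minimal sketch of what "estimating a language model" can mean: counting bigrams and turning the counts into probabilities. It uses add-one smoothing purely for brevity; that choice is my assumption, and real toolkits use more careful discounting and backoff schemes. The function names are made up for illustration and do not come from any Sphinx tool.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over sentences padded with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        histories.update(padded[:-1])                  # every word that has a successor
        bigrams.update(zip(padded[:-1], padded[1:]))   # (history, word) pairs
    return histories, bigrams


def bigram_prob(histories, bigrams, history, word, vocab_size):
    """P(word | history) with add-one smoothing (an illustrative choice)."""
    return (bigrams[(history, word)] + 1) / (histories[history] + vocab_size)


# Toy corpus: two sentences, vocabulary {go, home, away, </s>}.
histories, bigrams = train_bigram([["go", "home"], ["go", "away"]])
p_home = bigram_prob(histories, bigrams, "go", "home", vocab_size=4)
```

When you trace a real toolkit, the counting step is usually easy to find; the interesting part is how the smoothing and backoff are done, because those scores are what the decoder adds to the acoustic scores during search.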

Some of you may protest: how about the front-end? Isn’t that important too? True, but not when you are trying to understand a codebase. For all practical purposes, a feature vector is just an N-dimensional vector. An utterance, once it has gone through the front-end, is just an NxT matrix: T frames of N dimensions each. You can certainly do a lot of fancy things with this NxT matrix. But when you think of Viterbi and Baum-Welch, they probably just read the frames and then evaluate Gaussian distributions on them. That’s pretty much all you need to know about a front-end.
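Here is a minimal sketch of that view: a Viterbi pass that reads frames one by one and scores each against one diagonal-covariance Gaussian per state. This is a toy under my own assumptions (a single Gaussian per state, a fully enumerated state space, no pruning), not how Sphinx or any other toolkit actually implements its search, and all the names are hypothetical.

```python
import math


def log_gaussian(frame, mean, var):
    """Log density of a diagonal-covariance Gaussian at one feature frame."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def viterbi(frames, log_init, log_trans, means, variances):
    """Return (best_log_score, best_state_path) for a sequence of frames."""
    n_states = len(log_init)
    # score[s]: best log score of any path ending in state s at the current frame
    score = [log_init[s] + log_gaussian(frames[0], means[s], variances[s])
             for s in range(n_states)]
    backptr = []
    for frame in frames[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            # Best predecessor of state s at this frame.
            best = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + log_gaussian(frame, means[s], variances[s]))
        backptr.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return best_score, path


# Toy example: 1-dimensional features, a two-state left-to-right model.
NEG_INF = float("-inf")
frames = [[0.1], [0.2], [4.9], [5.1]]
log_init = [0.0, NEG_INF]                        # must start in state 0
log_trans = [[math.log(0.5), math.log(0.5)],     # state 0: stay or move on
             [NEG_INF, 0.0]]                     # state 1: stay only
means = [[0.0], [5.0]]
variances = [[1.0], [1.0]]
best_score, best_path = viterbi(frames, log_init, log_trans, means, variances)
print(best_path)  # [0, 0, 1, 1]
```

Notice that nothing here cares how the frames were computed. That is why, when tracing a real decoder, you can treat the front-end as a black box at first.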

How about adaptation algorithms? Those, I think, are important. But they should probably come after understanding the major things in the code. No matter whether you are doing adaptation online or in speaker adaptive training, it is something built on top of the Baum-Welch algorithm. Some implementations put adaptation inside the Baum-Welch executable. There is certainly nothing wrong with that. But it is still a kind of add-on.

How about the decoding API? Those are useful things to know, but they matter more when you just need to write an application. For example, in Sphinx4, you just need to know how to call the Recognizer class. In sphinx3, live_decode is what you need to know. But understanding only those won’t give you much insight into how the decoder really works.

How about the data structures? Those are sort of important and should be understood when you try to understand a certain algorithm. In the case of languages such as Java and C++, you should probably take note of whether a data structure is custom-made or whether the designer calls a specific data structure library, like Boost in C++.

I guess this pretty much sums it all up. Now let me get back to one non-trivial item on the list, which is the workflow of training. Many of you might think that recognition systems differ from each other because they have different decoders. Dead wrong! As I stress from time to time, they differ because they have different acoustic models and language models. That’s why in many research labs, much effort is put into preserving the parameters and procedures of how models are trained. Much effort is also put into fine-tuning this procedure.

On this part, I have to say open source speech recognition still has a long, long way to go. For starters, there is not much sharing of recipes among speech hobbyists. What many try to do is to search for a good model. If you don’t know how to train a model, you probably don’t even know how to improve it for your own project.

Arthur

beginner Eclipse installation jsapi package explorer problems windows Sphinx4 subclipse svn tutorial

Sphinx 4 from a C Background : Setting up Eclipse as the IDE

Post author By grandjanitor
Post date December 28, 2012

This is another baby step in learning Sphinx 4. As I mentioned in the previous post, it is nicer to use an IDE when you work with Java code. Since I have some exposure to Eclipse, I chose it as an example of how to set up a Sphinx 4 build.

Before I go on: there are many posts, written by others, that discuss the procedure. You may want to take a look at them as well.

Getting Start with Sphinx4 from “Speech Recognition Woe” written by “Amit S”
Setting up Development Environment from CMUSphinx Blog written by GSOCer. (If you know who wrote it, let me know so I can credit them properly.)

You will also need to know how to install JSAPI (link). It is crucial to get the compilation right.

Eclipse as a Development Environment

If you have never used Eclipse before, it is a little bit like a more versatile version of Emacs. Its major use is for Java, but lately more and more people use it as an IDE for C/C++ as well. Not to mention there are more and more development packages for other programming languages.

If you come from an emacs/vi background, one thing you need to know is that the shortcuts are quite different from what you are used to. That takes some time to adapt to, but generally I think the advantages are worth the cost.

Another thing you might want to be mentally prepared for: Eclipse’s Java compilation doesn’t generate a build log. Instead it generates a list of compilation errors. They are basically equivalent. Still, if you are used to a Visual C++ type of IDE with an error log, you won’t get quite what you want.

To me, those are minor nuisances. Using Eclipse to browse code has the extra advantages of ready-made documentation as well as a flattened structure. Those features will save you many keystrokes compared to using vanilla emacs.

In my description, I am using Eclipse Juno. Hopefully it won’t have changed too much by the time you compile the code. Of course, if there is popular demand, I might write another post which describes later versions of Eclipse as well.

The Compilation at a High Level

Building Sphinx 4 essentially means the following four tasks:

Download the Sphinx4 source code
Install JSAPI
Incorporate the proper libraries
Do the build

In my case, I stumbled slightly on the JSAPI installation. Naturally, just like you, I was thinking, “Well, why is JSAPI something separate from the codebase?” Of course, if you have worked in Java before, you know that many projects require you to build with an external codebase. So I don’t think it is too bad.

So let me go through the procedure of the build.

Downloading Sphinx4 source code from Subclipse

A plain svn command is fine, and downloading the tarball will give you a more stable version. I would suggest a more attractive option: use the SVN module of Eclipse, Subclipse. To do that, you may want to follow “Downloading Subclipse” from Setting up Development Environment. (Notice that there is a typo in that post: it should be “tigris” instead of “trigris” in the location field.)
Once you have finished installing Subclipse, start a new project:

New -> Project -> SVN -> Checkout Projects from SVN

Choose “Create a New Repository Location”

Type https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx

Remember to only download trunk/sphinx4. (Note: there are many branches and locations; for starters, you will be interested in what the trunk looks like.)

Once you check out the code, your Package Explorer (Alt-Shift-Q -> P) will look like this.

Package Explorer view after the code is checked out from SVN

Now you might notice that there is a red question mark beside the sphinx4 project (I named it “sphinx4_grandjanitor” but you can name it whatever you want). You might also notice that in your Problems view, there are 2 errors:

This is really because lib/jsapi.jar hasn’t been installed correctly. So the next step is to install jsapi.jar.

Install JSAPI

I tried the install on both Windows Vista and Linux. On Windows, go to sphinx4\lib and type

> jsapi.exe

Then accept the license.

On Linux, in the same directory, do

> sh jsapi.sh

One common problem on Linux here: you need uudecode to install JSAPI. If it is missing, install the sharutils package. On Ubuntu, it worked for me when I did

> apt-get install sharutils

At this point you should see a file named jsapi.jar in your directory.

Incorporate the proper libraries

This is another part which took me a while. Before you go on to configure your build path, you need to do one more step to make the library visible. In Eclipse, right click your Sphinx4/lib directory and choose Refresh first. This will make jsapi.jar appear in your Package Explorer. It should look like this:

When JSAPI.jar is properly installed

Then you can change the build path. Go to your project again, right click and choose Build Path -> Configure Build Path, go to Libraries, choose Add JARs, then add the libraries you need.

Now…. wait, what are the jar files we need again?

Yeah, so this is another place which can cause confusion. In fact, because Sphinx has expanded its code from time to time, the answer to which jar files to add depends on when you ask. As of Dec 28, 2012, you should add

junit
jsapi
js
fst

This list is likely to grow in the future. I am also pretty sure you might need to do different things if you want to compile in a different setting or write your own code.
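If you want to double check the lib directory before touching the build path, a small script can do it. The sketch below is only a convenience under my own assumptions: the function name is made up, and I am guessing that a jar is named either exactly after the library (like jsapi.jar) or with a version suffix after a dash. Adjust the matching rule to whatever you actually see in your checkout.

```python
from pathlib import Path

# Library names from the list above (as of Dec 28, 2012).
REQUIRED_LIBS = ("junit", "jsapi", "js", "fst")


def missing_jars(lib_dir, required=REQUIRED_LIBS):
    """Return the required library names that have no matching .jar in lib_dir."""
    jar_stems = [p.stem.lower() for p in Path(lib_dir).glob("*.jar")]
    missing = []
    for name in required:
        # Assumed naming rule: "name.jar" or a versioned "name-<version>.jar".
        found = any(stem == name or stem.startswith(name + "-") for stem in jar_stems)
        if not found:
            missing.append(name)
    return missing
```

For example, calling missing_jars on your sphinx4 lib directory right after the checkout should report jsapi as missing until you have run the JSAPI installer.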

Do the build

In modern Eclipse, building should be automatic. What you should see is 0 errors but many warnings. I generally don’t approve of warnings, but as a developer, I know it’s pretty tough to eliminate them all.

Conclusion

There you have it, a little guide on Sphinx 4 compilation with Eclipse. Notice that this guide may or may not fit your purpose because I focus on downloading the code through Subclipse. Doing a Link Source should do the trick if you want to incorporate the code yourself. I might do another post later, but the web has many articles describing this already, so you should be able to find a good set of instructions.

Arthur


cvs git git-commit git-log git-push svn

A note on GIT author and committer

Post author By grandjanitor
Post date December 28, 2012

I consider GIT a great improvement over CVS and SVN. SVN is okay if the codebase is not too large, because an SVN server sometimes gets into lockup mode.

One thing I like about GIT is the differentiation between author and committer. The author is the original writer of the code. The committer would be the one who commits the code. This makes ownership and responsibility clearer.

(Some houses discourage looking at who committed a change and demand that programmers take care of the problems themselves. My one comment: mental constipation.)

So if you want to set the author of a commit, do

> git commit -m "Your message" --author "Firstname Lastname <email@example.com>" yourfile.java

In the commit, you will see “Firstname Lastname” become the author. To look at the original committer as well, use

> git log --pretty=full ./yourfile.java

And obviously, this information can be pushed to a remote repository. A simple

> git push

should work.
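If you want to check many commits at once, you can parse the output of git log --pretty=full, which prints a "commit" line followed by an "Author:" line and a "Commit:" line for each commit. The sketch below is a small parser for that text; the function name and the sample names in the log are made up for illustration.

```python
import re

# Header lines printed by `git log --pretty=full`; message lines are indented,
# so they never match at the start of a line.
HEADER = re.compile(r"^(commit|Author:|Commit:)\s+(.*)$")


def parse_full_log(text):
    """Return a list of {hash, author, committer} dicts, one per commit."""
    entries = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "commit":
            entries.append({"hash": value.split()[0], "author": None, "committer": None})
        elif entries:
            field = "author" if key == "Author:" else "committer"
            entries[-1][field] = value
    return entries


def commits_with_different_author(text):
    """Keep only the commits where the author and the committer differ."""
    return [e for e in parse_full_log(text) if e["author"] != e["committer"]]


# Hypothetical sample in the shape of `git log --pretty=full` output.
sample = """commit 1111111
Author: Firstname Lastname <email@example.com>
Commit: Some Committer <committer@example.com>

    Your message

commit 2222222
Author: Some Committer <committer@example.com>
Commit: Some Committer <committer@example.com>

    Another message
"""
different = commits_with_different_author(sample)
```

In the sample, only the first commit comes back, since that is the one committed on behalf of somebody else.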

Arthur

sound speech text processing Thought Unix visual programming

Pondering Unix Philosophy

Post author By grandjanitor
Post date December 28, 2012

These are two great articles by James Hague on text processing vs visual programming.

The Unix Philosophy and a Fear of Pixels
Living inside your own Black Box

His main point is that visual programming is often dismissed because it is way more difficult than text processing. It is a little bit like a lot of “stupid” things in the world, such as Windows programming: they are actually quite tough to do well.

On speech processing, I guess it is fair to think sound programming is tougher than text processing as well. You may even notice that in speech processing, no one has come up with a generic “Sound User Interface” IDE yet.

Arthur

cmusphinx HTK Julius Speech Recognition voxforge

Speech Recognition vs SETI

Post author By grandjanitor
Post date December 27, 2012

If you track the news of CMUSphinx, you may notice that the Sourceforge guys have started to distribute data through BitTorrent (link).

That’s a great move. One of the issues in ASR is the lack of machine power for training. To give a blunt example, it’s possible to squeeze out extra performance by searching for the best training parameters. Not to mention that a lot of modern training techniques take some time to run.

I do recommend all of you help the effort. Again, I am not involved at all; I just feel that it is a great cause.

Of course, things in ASR are never easy so I want to give two subtle points about the whole distributed approach of training.

Improvement over the years?

The first question you may ask: does that mean ASR can be like a project such as SETI, which would automatically improve over the years? Not yet. ASR still has its unique challenges.

The major issue I see is how we can incrementally increase the amount of phonetically-balanced transcribed audio. Note that it is not just audio, but transcribed audio. Meaning: someone needs to listen to the audio, spending 5-10 times real time to write down what the audio really says, word by word. All these transcriptions need to be cleaned up and put in a certain format.

This is what Voxforge tries to achieve and it’s not a small undertaking. Of course, compared to the speed of industry development, the progress is still too slow. The last time I heard, Google was training their acoustic model with 38000 hours of data. The WSJ corpus is a toy task compared to it.
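To get a feel for the scale, here is a back-of-the-envelope calculation that combines the 5-10 times real-time figure with the 38000 hours mentioned above. The 2000 working hours per year is my own rough assumption for one full-time transcriber, just to convert hours into person-years.

```python
def transcription_hours(audio_hours, low_factor=5, high_factor=10):
    """Range of human hours needed to transcribe audio at 5-10x real time."""
    return audio_hours * low_factor, audio_hours * high_factor


HOURS_PER_WORK_YEAR = 2000  # assumption: roughly one full-time person-year

low, high = transcription_hours(38000)
low_years = low / HOURS_PER_WORK_YEAR
high_years = high / HOURS_PER_WORK_YEAR
print(low, high)              # 190000 380000
print(low_years, high_years)  # 95.0 190.0
```

So transcribing a corpus of that size from scratch would take something like 95 to 190 person-years under these assumptions, which is why transcription is such a heavy cost for a volunteer project.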

Now, thinking in this way, let’s say we want to build the best recognizer through open source. What is the bottleneck? I bet the answer doesn’t lie in machine power; whether we have enough transcribed data would be the key. So that’s something to ponder.

(Added Dec 27, 2012: on the initial amount of data, Nickolay corrected me, saying that the amount of data available to Sphinx is already on the order of 10000 hours. That includes “librivox recordings, transcribed podcasts, subtitled videos, real-life call recordings, voicemail messages”.

So it does sound like Sphinx has an amount of data which rivals that of commercial companies. I am very interested to see how we can train an acoustic model with that amount of data.)

We build it, they will come?

ASR is always shrouded in misunderstanding. Many believe it is a solved problem; many believe it is an unsolvable problem. 99.99% of the world population is uninformed about the problem.

I bet a lot of people would be fascinated by SETI, which …. Whoa …. allows you to communicate with unknown intelligent sentients in the universe, rather than by ASR, which ….. Em ….. many basically regard as a source of satire and parody these days.

So here comes another problem: the public doesn’t understand ASR enough to see it as an important problem. When you think about this more, this is a dangerous situation. Right now, a couple of big companies control the resources for training cutting-edge speech recognizers. So let’s say in the future everyone needs to talk with a machine on a daily basis. These big companies would be so powerful that they could control our daily life. To be honest with you, this thought haunts me from time to time.

I believe we should continue to spread information on how to properly use an ASR system. At the same time, we should continue to build applications to showcase ASR and let the public understand its inner workings. Unlike subatomic particle physics, HMM-based ASR is not that difficult to understand. On this part, I appreciate all the effort put in by the developers of CMUSphinx, HTK, Julius and all other open source speech recognition projects.

Conclusion

I love the recent move of Sphinx spreading acoustic data using BitTorrent. It is another step towards a self-improving speech recognition system. There are still things we need to ponder in the open source speech community. I mentioned a couple; feel free to bring up more in the comment section.

Arthur

word of the day

Words of the Day (Dec 27, 2012)

Post author By grandjanitor
Post date December 27, 2012

decathect, stridulant

You can probably say, “He decathected from Mary by making a stridulant voice all the time.”

Arthur

C++ pycparser

Readings at Dec 27, 2012

I have been thinking of playing with C code analysis for a while. Then I stumbled on Eli Bendersky’s pycparser. I guess I will have some fun playing with it.

I also strongly recommend everyone read his stuff, which I find highly informative.

Arthur

cmu sphinx grandjanitor hieroglyph HTK language pocketsphinx Programming Sphinx sphinx3 Sphinx4 sphinxbase sphinxtrain Thought wfst

Me and CMU Sphinx

As I update this blog more frequently, I have noticed more and more people are directed here. Naturally, there are many questions about some of my past work. For example, “Are you still answering questions in the CMUSphinx forum?” and general requests for certain tutorials. So I guess it is time to clarify my current position and what I plan to do in the future.

Yes, I am planning to work on Sphinx again, but no, I probably don’t hope to be a maintainer-at-large any more. Nick has proven himself to be the most awesome maintainer in our history. Under his stewardship, Sphinx has prospered in the last couple of years. That’s what I hoped for and that’s what we all hoped for.

So for that reason, you probably won’t see me much in the forum answering questions. Rather, I will spend most of my time implementing, experimenting and getting some work done.

There are many things that ought to be done in Sphinx. Here is my top 5 list:

Sphinx 4 maintenance and refactoring
PocketSphinx’s maintenance
An HTKbook-like documentation : i.e. Hieroglyphs.
Regression tests on all tools in SphinxTrain.
In general, modernization of Sphinx software, such as using WFST-based approach.

This is not a small undertaking, so I am planning to spend a lot of time relearning the software. Yes, you heard it right. Learning the software. In general, I found myself very ignorant of a lot of the software details of Sphinx as of 2012. There have been many changes. The parts I have really kept up with are probably sphinxbase, sphinx3 and SphinxTrain. On PocketSphinx and Sphinx4, I need to learn a lot.

That is why in this blog, you will see a lot of posts about my progress in learning a certain piece of speech recognition software. Some could be minute details. I share them because people can figure out a lot by following my progress. From time to time, I will also pull these posts together to form a tutorial post.

Before I leave, let me digress and talk about this blog a little bit: other than posts on speech recognition, I will also post a lot about programming, languages and other technology-related stuff. Part of it is that I am interested in many things. The other part is that I feel working on speech recognition actually requires one to understand a lot about programming and languages. This might also attract a wider audience in the future.

In any case, I hope I can keep it up. And I hope you enjoy my articles!

Arthur

beginner Eclipse Sphinx4 tutorial

Sphinx4 from a C background : Installation of Eclipse

Post author By grandjanitor
Post date December 27, 2012

That’s another baby step but I guess Eclipse installation is much less painful these days.

When I used Eclipse back in 2008, it was rather difficult to download and install. Part of the reason is that the software house I worked with didn’t have a strong culture of documentation.

Downloading Eclipse Juno for Java Developers was pretty easy. My next step is to bring in the Sphinx 4 directory and do a compilation.

Arthur