Category Archives: tutorial

Sphinx 4 from a C background : Material for Learning Sphinx 4

I have been quite focused on SphinxTrain lately, so I haven't touched Sphinx 4 for a while. Since I have an afternoon to spend at leisure (not really), I decided to take another look at some basic material.

Sphinx-4, as a recognizer, is an interesting piece of software to me, a recovering recognizer programmer. It seems remote but oddly familiar. It is a sort of dreamland for experimenting with different decoding strategies. From Sphinx 3.5 to 3.7, I tried to make Sphinx 3.X more general in terms of search. That effort was tough, mainly because the programs were written in C. As you might guess, those modifications required reinventing a lot of good software engineering mechanisms (such as classes).

Sphinx-4 is now widely studied. Many projects use Sphinx-4, and its architecture has been analyzed on many sites. That's why there is an abundance of material for learning the recognizer. (Yay! 🙂)

Here are the seven pages on my radar now; I am going to study them in detail:

  1. Introduction: what Sphinx-4 is and how to use it.
  2. Sphinx 4 Application Programmer Guide: what excites me is the model-switching capability. I also love the way the current recognizer can be linked to multiple languages.
  3. Configuration Manager: that's an interesting part as well. Every component of the recognizer is configurable. Is that a good thing? A hierarchical configuration system has pros and cons, but most of the time I think it is a better way than a flat command-line structure.
  4. Instrumentation: how to test the decoder, with examples on TIDIGITS and many other databases.
  5. FAQ: a list of questions that make me curious.
  6. The White Paper: extremely illuminating. I also appreciate the scholarship in its comparison of the different versions of Sphinx.
  7. The 2003 paper: I haven't gone through this one yet, but it's certainly something I want to check out.
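To make the Configuration Manager trade-off concrete, here is a minimal Java sketch of what "hierarchical" buys you. It is not Sphinx-4's ConfigurationManager API; the class HierarchicalConfig, the component names and the property name beamWidth are hypothetical. Each component looks up its own value first and falls back to a global default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of hierarchical configuration; NOT Sphinx-4's
// ConfigurationManager API. Component names and "beamWidth" are illustrative.
final class HierarchicalConfig {
    private final Map<String, String> globals = new HashMap<>();
    private final Map<String, Map<String, String>> components = new HashMap<>();

    // A default that applies to every component unless overridden.
    void setGlobal(String name, String value) {
        globals.put(name, value);
    }

    // A value scoped to one named component.
    void set(String component, String name, String value) {
        components.computeIfAbsent(component, k -> new HashMap<>()).put(name, value);
    }

    // Component-level value first, then the global default; null if neither exists.
    String get(String component, String name) {
        Map<String, String> own = components.get(component);
        if (own != null && own.containsKey(name)) {
            return own.get(name);
        }
        return globals.get(name);
    }

    public static void main(String[] args) {
        HierarchicalConfig cfg = new HierarchicalConfig();
        cfg.setGlobal("beamWidth", "1E-80");
        cfg.set("wordSearch", "beamWidth", "1E-60");
        System.out.println(cfg.get("wordSearch", "beamWidth"));  // 1E-60 (own value)
        System.out.println(cfg.get("phoneSearch", "beamWidth")); // 1E-80 (global default)
    }
}
```

With this scheme, two search components can carry different beam widths under the same property name, whereas a flat command line typically needs a distinct flag for every such variant. The cost is that you have to know which component a value belongs to before you can tell what is actually in effect.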

Arthur

Previous related articles:
Sphinx4 from a C background : first few steps
Sphinx4 from a C background : Installation of Eclipse
Sphinx4 from a C Background : Setting up Eclipse 

Sphinx 4 from a C Background : Setting up Eclipse as the IDE

This is another baby step in learning about Sphinx 4. As I mentioned in the previous post, it is nicer to use an IDE when you work with Java code. Since I have some exposure to Eclipse, I chose it as the example for setting up a Sphinx 4 build.

Before I go on: many posts written by others discuss the procedure, and you may want to take a look at them as well.

You will also need to know how to install JSAPI. Getting it right is crucial for the compilation to succeed.

Eclipse as a Development Environment

If you have never used Eclipse before, it is a little like a more versatile version of Emacs. Its major use is for Java, but more and more people now use it as an IDE for C/C++ as well, not to mention the many development packages available for other programming languages.
If you come from an Emacs/vi background, one thing you need to know is that the shortcuts are quite different from what you are used to. They take some time to adapt to, but generally I think the advantages are worth the cost.
Another thing to be mentally prepared for: Eclipse's Java compilation doesn't generate a build log. Instead, it produces a list of compilation errors. The two are basically equivalent, but if you are used to a Visual C++-type IDE with a build log, you won't get quite what you want.
To me, those are minor nuisances. Using Eclipse to browse code has the extra advantages of ready-made documentation and a flattened view of the package structure. Those features will save you many keystrokes compared to vanilla Emacs.
In my description, I am using Eclipse Juno. Hopefully it won't change too much by the time you compile the code. Of course, if there is popular demand, I might write another post describing a later version of Eclipse as well.

The compilation at a high level

Building Sphinx 4 essentially means the following four tasks:
  1. Download the Sphinx4 source code.
  2. Install JSAPI.
  3. Incorporate the proper libraries. 
  4. Do the build. 
In my case, I stumbled a little on the JSAPI step. Naturally, just like you, I was thinking, "Well, why is JSAPI something separate from the codebase?" Of course, if you have worked in Java before, you know that many projects require you to build against external code, so I didn't think it was too bad.
So let me go through the build procedure.
Downloading the Sphinx4 source code with Subclipse
  • A plain svn command is fine, and downloading the tarball will give you a more stable version. A more attractive option, I'd suggest, is to use Subclipse, the SVN plugin for Eclipse. To do that, you may want to follow "Downloading Subclipse" in Setting up Development Environment. (Note that there is a typo in that post: the location field should read "tigris" instead of "trigris".)
  • Once you have installed Subclipse, start a new project:
    • New -> Project -> SVN -> Checkout Projects from SVN
  • Choose "Create a New Repository Location"
  • Remember to download only trunk/sphinx4. (There are many branches and locations; as a starter, you will be interested in what the trunk looks like.)
Once you check out the code, your Package Explorer (Alt-Shift-Q -> P) will look like this.

Package Explorer view after the code is checked out from SVN
Now you might notice that there is a red question mark beside the sphinx4 project. (I named it "sphinx4_grandjanitor", but you can name it whatever you want.) You might also notice that there are two errors in your Problems view.
This is because lib/jsapi.jar wasn't installed correctly, so the next step is to install jsapi.jar.

Install JSAPI

I tried the installation on both Windows Vista and Linux. On Windows, go to sphinx4\lib and type

> jsapi.exe

Then accept the license.

On Linux, in the same directory, run

> sh jsapi.sh

One common problem on Linux: the jsapi installer needs uudecode. If it is missing, install the sharutils package, which provides it. On Ubuntu, this works for me:

> sudo apt-get install sharutils

At this point, your directory should contain a file named jsapi.jar.
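If the installer fails halfway (for example, because uudecode is missing), you can end up with no jsapi.jar or a broken one. Here is a minimal Java sketch for checking it. It assumes JSAPI's entry-point class is javax.speech.Central; the class name JsapiCheck and the default path lib/jsapi.jar are my own illustrative choices.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: does the jar exist, open as a valid jar, and contain the entry?
final class JsapiCheck {
    static boolean jarContains(File jar, String classEntry) {
        if (!jar.isFile()) {
            return false; // the installer has not produced the file (yet)
        }
        try (JarFile jf = new JarFile(jar)) {
            return jf.getEntry(classEntry) != null;
        } catch (IOException e) {
            return false; // unreadable or not a jar, e.g. a half-finished extraction
        }
    }

    public static void main(String[] args) {
        // Assumed default location; pass your own sphinx4/lib/jsapi.jar as args[0].
        File jar = new File(args.length > 0 ? args[0] : "lib/jsapi.jar");
        // Assumption: JSAPI's entry point is javax.speech.Central.
        boolean ok = jarContains(jar, "javax/speech/Central.class");
        System.out.println(ok ? "jsapi.jar looks usable" : "jsapi.jar missing or broken");
    }
}
```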

Incorporate the proper libraries

This is another part that took me a while. Before you configure your build path, you need one more step: in Eclipse, right-click your sphinx4/lib directory and choose Refresh. This makes jsapi.jar appear in your Package Explorer. It should look like this:

When jsapi.jar is properly installed

Then change the build path: right-click your project again, choose Build Path -> Configure Build Path, open the Libraries tab, click Add JARs, and add the libraries you need.

Now.... wait, what are the jar files we need again?

Yeah, this is another place that can cause confusion. Because the Sphinx code expands from time to time, which jar files to add depends on when you check it out. As of Dec 28, 2012, you should add:

  • junit
  • jsapi
  • js
  • fst

This list will likely grow in the future. I am also pretty sure you will need to do different things if you compile in a different setting or write your own code.
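Since the list changes over time, a small check of the lib directory can save a round trip through the Problems view. This is a minimal sketch, not part of Sphinx 4: LibCheck is a hypothetical helper, and the matching rule (an exact name such as fst.jar, or a versioned name such as junit-4.10.jar) is an assumption about how the jar files are named.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Sphinx 4: reports which required jars
// are missing from a lib directory before you configure the build path.
final class LibCheck {
    // The list as of Dec 28, 2012; extend it as Sphinx 4 grows.
    static final String[] REQUIRED = {"junit", "jsapi", "js", "fst"};

    // Assumed naming rule: "<prefix>.jar" or a versioned "<prefix>-<version>.jar".
    // The rule keeps "js" from being satisfied by "jsapi.jar".
    static boolean matches(String fileName, String prefix) {
        return fileName.endsWith(".jar")
            && (fileName.equals(prefix + ".jar") || fileName.startsWith(prefix + "-"));
    }

    static List<String> missingJars(File libDir, String[] required) {
        File[] files = libDir.listFiles(); // null if libDir is not a directory
        List<String> missing = new ArrayList<>();
        for (String prefix : required) {
            boolean found = false;
            if (files != null) {
                for (File f : files) {
                    if (f.isFile() && matches(f.getName(), prefix)) {
                        found = true;
                        break;
                    }
                }
            }
            if (!found) {
                missing.add(prefix);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        File lib = new File(args.length > 0 ? args[0] : "lib");
        List<String> missing = missingJars(lib, REQUIRED);
        System.out.println(missing.isEmpty() ? "all required jars present" : "missing: " + missing);
    }
}
```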

Do the build

In modern Eclipse, building is automatic. What you should see is 0 errors but many warnings. I generally don't approve of warnings, but as a developer, I know it's pretty tough to eliminate them all.
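To confirm that the build path actually resolves the libraries, you could run a tiny smoke test from inside the project. This is a minimal sketch; the class names passed in main (edu.cmu.sphinx.recognizer.Recognizer for Sphinx 4, javax.speech.Central for JSAPI) are assumptions on my part, so substitute classes you know are in your checkout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal classpath smoke test: which of the given classes cannot be found?
final class ClasspathSmokeTest {
    static List<String> missingClasses(List<String> classNames) {
        ClassLoader loader = ClasspathSmokeTest.class.getClassLoader();
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize = false: only locate the class, don't run static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed class names; replace them with classes you know are in your checkout.
        List<String> missing = missingClasses(Arrays.asList(
                "edu.cmu.sphinx.recognizer.Recognizer", // Sphinx 4 (assumption)
                "javax.speech.Central"));               // JSAPI (assumption)
        System.out.println(missing.isEmpty() ? "classpath OK" : "missing: " + missing);
    }
}
```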

Conclusion

There you have it: a little guide to compiling Sphinx 4 with Eclipse. Note that this guide may or may not fit your purpose, because I focus on checking out the code with Subclipse. If you want to incorporate code you already have, using Link Source should do the trick. I might write another post about that later, but the web already has many articles describing it, so you should be able to find a good set of instructions.
Arthur

Sphinx4 from a C background : Installation of Eclipse

That's another baby step, but I guess Eclipse installation is much less painful these days.

When I used Eclipse back in 2008, it was rather difficult to download and install. Part of the reason was that the software house I worked for didn't have a strong culture of documentation.

Downloading Eclipse Juno for Java Developers was pretty easy. My next step is to incorporate the Sphinx 4 directory and do a compilation.

Arthur

Sphinx4 from a C background : first few steps

As I set out earlier, one of my goals is to grok all of the components. I challenged myself to work with Java, in which I feel less proficient than in C/C++/Python/Perl.

What should you think when you go from one language to another?  One and only one answer : don't make a judgement too early.  
For example, compiling Sphinx4 takes three steps:
  1. Download and install the JDK.
  2. Download and install ant.
  3. Run ant.
If you have never used the JDK or ant, or never looked at a build.xml, you may feel a bit overwhelmed. But be patient: Java has a lot of goodies, and most of them are very well thought out in terms of software engineering.
I followed the process. Whoa, Sphinx 4 is now at beta 6, and it has grown to 366 files. Sounds like grokking it will take some time.
So what would your strategy be if you want to understand a Java project such as Sphinx4? My suggestion: download a good IDE such as Eclipse or NetBeans.
If you are like me, coming from an Emacs background, learning Eclipse will take you some time as well. But again: don't make a judgement too early. Eclipse is nice in its own way. (At least it's not Visual X...)
In practice, using Eclipse to understand the code also has its advantages. Unlike a typical C package, Java software usually has a deep directory hierarchy, so navigating it in Emacs will definitely cost you more keystrokes. The only exception I know of is JDEE, which again takes some setup time.
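For a rough feel of that hierarchy (and of the 366-file count mentioned above), here is a short sketch that walks a source tree and reports how many .java files it contains and how deep the deepest one sits below the root. SourceTreeStats is a hypothetical name, and which root you pass in (src, say) is up to you.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.stream.Stream;

// Hypothetical helper: count .java files and measure package depth under a root.
final class SourceTreeStats {
    // Returns {number of .java files, deepest directory depth below root}.
    static int[] stats(Path root) throws IOException {
        int count = 0;
        int maxDepth = 0;
        try (Stream<Path> paths = Files.walk(root)) {
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                if (Files.isRegularFile(p) && p.getFileName().toString().endsWith(".java")) {
                    count++;
                    // Directories between root and the file, e.g. a/b/C.java -> 2.
                    int depth = root.relativize(p).getNameCount() - 1;
                    maxDepth = Math.max(maxDepth, depth);
                }
            }
        }
        return new int[] {count, maxDepth};
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "src");
        int[] s = stats(root);
        System.out.println(s[0] + " .java files, deepest at depth " + s[1]);
    }
}
```

In a C package the equivalent numbers tend to be a few flat directories; a depth of five or six is where an IDE's flattened package view starts paying for itself.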
In any case, I got started. My next goal is to go through all the Sphinx 4 materials again, and this time I demand of myself that I grok them. I will start from the Sphinx 4 documentation page, then expand to a source-code-level understanding.
Arthur