HMM – The Grand Janitor Blog V3

I moved “The Grand Janitor Blog” to WordPress. There is not much to it: Blogger is simply too constraining. I don’t like the theme. I can’t really customize a thing. I can’t put an ad there if I want to sell something. It was really annoying, and it’s time for a change.

But then what’s new with V2? First of all, I might blog more about how machine learning influences speech recognition. It’s not new that machine learning is the source of how speech recognition works. It has always been like that. Many experts who work in speech recognition have deep knowledge of pattern recognition. When you look at their papers, you can sense that they have studied a certain machine learning method in great depth. That is how they come up with creative ideas to improve the bottom line, which is the only thing I care about. I don’t really care about the thousand APIs wrapped around a certain recognizer. I only care about the guts inside the decoder and the trainer. Those components are what really matter, but they are also the components that are most misunderstood.

So why now? It’s obvious that the latest development of DBN-DNN (the “next big thing”) is one factor. I was told in school (10+ years ago) that GMM was the state of the art. But things are rapidly changing: the work of Prof. Hinton has given a theoretical basis for making DBN-DNN training practically feasible. Enthusiasts, some rather sophisticated, are gathering around the Kaldi forum.

As for me, I will describe myself as a recovering ASR programmer. What does it mean? It means I need to grok ASR from theory to implementation. That’s tough. I found myself studying again, dusting off my “Advanced Calculus” and trying to read and think creatively about texts such as “Connectionist Speech Recognition: A Hybrid Approach” by Bourlard and Morgan. (It’s a highly entertaining technical text!) Perhaps there will be more in the future. But when you try to drill a certain skill in your life, there has got to be a point where you need to go back to the basics. Re-think all the things you thought you knew. Re-prove all the proofs you thought you understood. That takes time and patience, but in the end it is also how you come up with new ideas.

As for the readers, sorry for never getting back to your suggested blog messages. You might be interested in a code trace of a certain part of Sphinx. You might be interested in how certain parts of the program work. I have kept a list of them and will probably write something up when I have time. No promises, though; I have been very busy. And to be frank, everyone who works in ASR is busy. That perhaps explains why there are not many actively maintained blogs on speech recognition.

Of course, I will keep on posting on other diverse topics such as programming and technology. I am still a geek. I don’t think anyone can change that. 🙂

In any case, feel free to connect with me and have fun with speech recognition!

Cheers

Arthur Chan, “The Grand Janitor”