I have been crazily busy, so blogging has been rather slow. Still, I have a stronger and stronger feeling that my understanding is getting closer to the state of the art of speech recognition. And to talk about the state of the art right now, we have to talk about the whole deep neural network trend.
There is nothing conceptually new in the hybrid HMM-DBN-DNN approach. It was proposed in the past under the name HMM-ANN. What is new is an algorithm that allows fast training of multi-layered neural networks. This is mainly due to Hinton's breakthrough in 2006, which suggests that a DBN-DNN can first be initialized from pretrained RBMs before training.
I am naturally very interested in this new trend. IBM's, Microsoft's and Google's results show that the DBN-DNN is not the toy model we saw over the last two decades.
Well, that's all for my excitement about DBNs; I still have tons of things to learn. Back to the "Grand Janitor Blog": I tried to improve the blog layout 4 months ago, and I have to say I feel very frustrated by Blogger, so I have finally decided to move to WordPress.
I hope to move within the next month or so. I will write a more proper announcement later on.
I left the development of CMU Sphinx around 6 years ago. Geez. Talk about changes. During that time, I went to work for one startup and one defense contractor, and started numerous non-speech-related blogs.
I certainly had fun but felt adrift at the same time. Both companies I worked for are extraordinary, but their causes are not mine. As you know, life without a cause is a tough life.
And now I am looking at Sphinx and open source speech recognition again. Wow, there are tons of changes. The awareness of the need for open source speech recognition has never been so acute. The performance of open source speech recognition still needs a lot of work, but it is no longer unthinkable to deploy an open source speech recognizer in a real application.
There are more resources for learning how to use a speech recognizer. Thanks to dedicated Sphinx developers such as David Huggins-Daines and Nickolay Shmyrev, many more people have learned how to use Sphinx properly, and there is more documentation around.
There are also more resources for building a speech recognizer. One notable effort is VoxForge, led by Ken MacLean, which is dedicated to accumulating clean, transcribed data over time. Though I don't know how large it is, I admire Ken's dedication. Someone should have started such a project a long, long time ago. Now that it has started, there is a chance that open source data will be an important source of speech data in the future.
For the last 6 years, I could only be a bystander to Sphinx development. I changed jobs again recently and will work with a company which is close to Sphinx. I don't know how much *real* work I will do. But I am glad that Sphinx and I have crossed paths again. At the very least, I hope to contribute ideas to the community and help this great project grow.
The Grand Janitor