It has been a while since I last blogged. I was mostly busy with work, so blogging was on the sidelines. This time I spent a bit of time updating the publication list and getting my feet wet again.
What have I learned in the last 1.5 years? "ASR is solved." That is the one comment I heard once DNN+GMM had been out for around 3 years, i.e. when DNN+GMM was widely adopted by around 40-50 sites around the world.
Of course, this was said before - it happened when people started to use adaptive techniques and saw significant gains. Perhaps it was also said when people first discovered using GMMs as the state distributions, or first used HMMs instead of DTW.
So while I understand why people are getting more elated, we probably still have problems to solve. For example, looking at the latest IBM research on Switchboard, there is probably still quite a lot of room to improve the current state-of-the-art NN-based systems. Not to mention, transcription of online videos still seems to be a hard problem. That is perhaps why many people are still working on ASR.
Hopefully I can come back more often - ideally one post per week. We will see how it goes.