Author: grandjanitor

Friday Speech-related Links

Post author By grandjanitor
Post date March 22, 2013

Future Windows Phone speech recognition revealed in leaked video

Whether or not you like Softie (Microsoft), they have been innovative in speech recognition over the past few years. I am looking forward to their integration of DBNs into many of their products.

German Language Learning Startup Babbel Buys Disrupt Finalist PlaySay To Target The U.S. Market

Not exactly ASR, but language learning has been a mainstay. Look at EnglishCentral: they have been around for a while and are still alive and kicking.

HMM with scikit-learn

When I first learned HMMs, I always hoped to use a scripting language to train the simplest HMM. scikit-learn is one such piece of software.
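For a sense of how little code the simplest case needs, here is a hand-rolled sketch of one Baum-Welch (EM) re-estimation step for a discrete HMM in Python with NumPy. It is not scikit-learn’s API; the function names and the toy matrices `A` (transitions), `B` (emissions) and `pi` (initial probabilities) are illustrative assumptions, and it skips the scaling that longer sequences need to avoid underflow.

```python
import numpy as np


def forward(A, B, pi, obs):
    """alpha[t, j] = P(o_1..o_t, state_t = j). Unscaled, so keep sequences short."""
    alpha = np.zeros((len(obs), A.shape[0]))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha


def backward(A, B, obs):
    """beta[t, i] = P(o_{t+1}..o_T | state_t = i)."""
    beta = np.ones((len(obs), A.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta


def baum_welch_step(A, B, pi, obs):
    """One EM re-estimation; returns new (A, B, pi) and the likelihood under the OLD parameters."""
    obs = np.asarray(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    likelihood = alpha[-1].sum()
    gamma = alpha * beta / likelihood                              # state posteriors, (T, N)
    emit_next = (B[:, obs[1:]].T * beta[1:])[:, None, :]           # (T-1, 1, N)
    xi = alpha[:-1, :, None] * A[None] * emit_next / likelihood    # transition posteriors, (T-1, N, N)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, gamma[0], likelihood


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    A = rng.random((2, 2))
    A /= A.sum(axis=1, keepdims=True)
    B = rng.random((2, 3))
    B /= B.sum(axis=1, keepdims=True)
    pi = np.full(2, 0.5)
    obs = [0, 0, 1, 2, 2, 2, 1, 0, 0, 1]
    for it in range(5):
        A, B, pi, ll = baum_welch_step(A, B, pi, obs)
        print(it, ll)  # EM guarantees this never decreases
```

Each call returns the likelihood of the parameters it was given, so printing it across iterations is a quick sanity check that training is working.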

Google Keep

Voice memos are a huge market, but continuous speech recognition on mobile is a very challenging task. Still, with Google’s technology behind it, I think Keep should do better than its competitor, Evernote.

Arthur

Amazon Programming Samsung smartwatch Square Starbucks Statistics

Thursday Links (FizzBuzz programming, Samsung, Amazon and more)

Post author By grandjanitor
Post date March 21, 2013

Geeky:

Placebo Surgery: Still think acupuncture is a thing?

Expertise, the Death of Fun, and What to Do About It by James Hague

Indeed, it gets hard to learn. My two cents: always keep notes on your work, see every mistake as an opportunity to learn, and always learn new things. Never stop.

FizzBuzz programming (2007)

It’s sad that it is true.
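For readers who haven’t seen it: the classic FizzBuzz task asks a candidate to print the numbers 1 to 100, but print "Fizz" for multiples of three, "Buzz" for multiples of five, and "FizzBuzz" for multiples of both. A straightforward Python version looks like this (returning a list rather than printing, so it is easy to check):

```python
def fizzbuzz(n):
    """Return the FizzBuzz sequence for 1..n as a list of strings."""
    out = []
    for i in range(1, n + 1):
        if i % 15 == 0:          # multiple of both 3 and 5
            out.append('FizzBuzz')
        elif i % 3 == 0:
            out.append('Fizz')
        elif i % 5 == 0:
            out.append('Buzz')
        else:
            out.append(str(i))
    return out


if __name__ == '__main__':
    print('\n'.join(fizzbuzz(100)))
```

Checking 15 first is the whole trick; candidates who test 3 and 5 first never reach the "FizzBuzz" branch.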

Technology in general:

Samsung smartwatch product

I am still more interested in Apple’s product. I guess because I was there when the iPhone came out, it’s rather hard not to say Samsung plagiarized…….

The Economics of Amazon Prime (link)

When I shop on Amazon, using Prime has indeed become an option, especially for the thousands of ebooks that cost less than $2.99. Buying ten of them is very close to the monthly subscription fee of Amazon Prime.

Starbucks and Square don’t seem to “mix” well (link)

Other newsworthy:

As Crop Prices Surge, Investment Firms and Farmers Vie for Land

Crop prices have reversed course. If you are interested in the restaurant business (like me), this has a huge impact on the whole food chain.

The many failures of the personal finance industry

Many of my geeky friends don’t have good sense about personal finance. This is a good link for understanding the industry.

Arthur

DNN Geoffrey Hinton Duolingo News Scribe

Thursday Speech-related Readings

Post author By grandjanitor
Post date March 21, 2013

Speech Recognition Stumbles at Leeds Hospital

I wonder who the vendor is.

Google Peanut Gallery (Slate)

Another interesting showcase. Google always has pretty impressive speech technology.

Where Siri Has Trouble Hearing, a Crowd of Humans Could Help

Combining fragments of recognition results is a rather interesting idea, though it’s probably not new. I am glad it is taking off.
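The article doesn’t spell out the combination algorithm, so here is only a simple sketch of the general idea, not that system’s method. It assumes each worker (human or machine) returns a fragment of `(start_time, word)` pairs; words from different workers that start within a hypothetical `tolerance` are treated as the same slot, and each slot is decided by majority vote, in the spirit of ROVER-style voting.

```python
from collections import Counter


def _vote(group):
    """Majority word among (start_time, word) pairs; ties go to the earliest-seen word."""
    return Counter(word for _, word in group).most_common(1)[0][0]


def merge_fragments(fragments, tolerance=0.3):
    """Merge timed word fragments from several workers into one transcript.

    fragments: one list per worker of (start_time_seconds, word) pairs.
    Assumes words from a single worker are more than `tolerance` seconds apart.
    """
    timed_words = sorted(pair for worker in fragments for pair in worker)
    transcript, group = [], []
    for start, word in timed_words:
        # A word too far from the first word of the current slot opens a new slot.
        if group and start - group[0][0] > tolerance:
            transcript.append(_vote(group))
            group = []
        group.append((start, word))
    if group:
        transcript.append(_vote(group))
    return transcript


if __name__ == '__main__':
    workers = [
        [(0.00, 'the'), (0.50, 'cat')],
        [(0.45, 'cat'), (1.00, 'sat')],
        [(0.05, 'the'), (0.55, 'hat'), (1.02, 'sat')],
    ]
    print(merge_fragments(workers))
```

Real systems align full word sequences rather than clustering by timestamp, but the voting step is the part that lets several noisy partial transcripts beat any single one.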

Google Buys Neural Net Startup, Boosting Its Speech Recognition, Computer Vision Chops

This is huge. Once again, it says something about the power of the DNN approach, which will probably be the real focus over the next five years.

Duolingo Adds Offline Mode And Speech Recognition To Its Mobile App

I always wonder how the algorithm works. Confidence-based verification has always been tough to get working. But then again, the whole point of reCAPTCHA is really to differentiate between humans and machines, so it’s probably not as complicated as I thought.
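How Duolingo actually scores answers isn’t public as far as this post goes, so the following is only a sketch of the generic confidence-based verification idea. It assumes a recognizer has already aligned the learner’s audio to the expected prompt and produced a per-word confidence in [0, 1]; the `threshold` and `min_word` values are hypothetical, and tuning them is exactly the hard part.

```python
def verify_utterance(word_confidences, threshold=0.6, min_word=0.3):
    """Accept a spoken answer if the recognizer is confident enough about it.

    word_confidences: per-word confidence scores in [0, 1] for the expected
    prompt (hypothetical input from some recognizer).
    threshold: minimum average confidence to accept (hypothetical value).
    min_word: minimum confidence any single word may have (hypothetical value).
    """
    if not word_confidences:
        return False
    mean = sum(word_confidences) / len(word_confidences)
    # Require a good average AND no single word that was clearly missed.
    return mean >= threshold and min(word_confidences) >= min_word


if __name__ == '__main__':
    print(verify_utterance([0.9, 0.8, 0.7]))   # confident on every word
    print(verify_utterance([0.9, 0.9, 0.1]))   # one word clearly wrong
```

The per-word floor matters: an average alone would accept an answer with one badly mispronounced word hidden among confident ones.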

Some notes on DNS 12: link

The whole sentence mode is the more interesting part. Does it make users more frustrated though? I am curious.

Arthur

Mark Suster Martin Fowler STEM Jobs

Wednesday Links (STEM Jobs)

Post author By grandjanitor
Post date March 20, 2013

Martin Fowler on Homonyms in Design
Peter Bell on Innovation Debt
Mark Suster’s “Is it Time to Earn or to Learn?”

STEM Jobs Series by Daniel Lemire (via Vivek Haldar’s blog)

What is really hot in STEM jobs?
The Catch-22 of STEM Job Market
What do STEM job employers want?

Also the NYT’s comment from Prof. Peter Cappelli:
If There’s a Gap, Blame It on the Employer

Arthur

open source speech recognition simon

Landscape of Open Source Speech Recognition Software (II : Simon)

Post author By grandjanitor
Post date March 20, 2013

Around December last year, I wrote an article on open source speech recognizers, covering HTK, Kaldi and Julius. One thing you should know is that, just like CMUSphinx, each of these packages contains its own implementation of the Viterbi algorithm. So if you ask someone in the field of speech recognition, they will usually name Sphinx, HTK, Kaldi and Julius as the open source speech recognizers.
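The core that each of these engines implements can be illustrated with textbook Viterbi decoding over a small discrete HMM. This is a minimal sketch in Python with NumPy; the argument names `log_A`, `log_B` and `log_pi` are illustrative, and real engines add beam pruning, lexical trees and continuous-density acoustic models on top of this recursion.

```python
import numpy as np


def viterbi(log_A, log_B, log_pi, obs):
    """Most likely state sequence for a discrete HMM, computed in the log domain.

    log_A[i, j]: log P(state j | state i); log_B[j, k]: log P(symbol k | state j);
    log_pi[j]: log P(first state = j); obs: sequence of symbol indices.
    """
    T, N = len(obs), log_A.shape[0]
    delta = np.empty((T, N))               # best log score of any path ending in state j at time t
    backptr = np.zeros((T, N), dtype=int)  # predecessor state on that best path
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: come from i, move to j
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):          # trace back through the best predecessors
        path.append(int(backptr[t, path[-1]]))
    path.reverse()
    return path, float(delta[-1].max())


if __name__ == '__main__':
    A = np.array([[0.9, 0.1], [0.1, 0.9]])   # "sticky" states
    B = np.array([[0.9, 0.1], [0.1, 0.9]])   # state 0 mostly emits symbol 0, state 1 symbol 1
    pi = np.array([0.5, 0.5])
    print(viterbi(np.log(A), np.log(B), np.log(pi), [0, 0, 1, 1]))
```

Working with log probabilities turns products into sums and avoids underflow, which is why decoders almost always score in the log domain.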

That’s how I usually view speech recognition too. After years of working in the industry, though, I have started to realize that this definition, speech recognizer = Viterbi algorithm, can be constraining. In fact, from the user’s point of view, a good speech application should be a combination of

a recognizer + good models + good GUI.

I like to call the former type of “speech recognizer” a “speech recognition engine” and the latter type a “speech recognition application”. Both types are worthwhile to build. From the users’ point of view, the distinction might just be a technicality.

As a recovering speech recognition programmer (another label to throw around 🙂), one thing I notice is that much effort now goes into writing “speech recognition applications”. It is a good trend, because most people from academia don’t spend much time writing good speech applications. And in open source, we badly need good applications such as dictation, IVR and command-and-control (C&C).

One effort that really impressed me is Simon. That is unusual for me, because most of the time I only care about engine-level software. But in the case of Simon, you can see that a couple of its features really solve real-life problems and tie into the bigger theme of open source speech recognition.

Starting with 0.4.0, Simon integrates with Sphinx, so anyone who wants to develop it commercially can do so (CMUSphinx’s BSD-style license permits commercial use).
The Simon team also deliberately built context switching into the application, which is good work as well. In general, if you always use a huge dictionary, you over-recognize: the recognizer keeps considering words that make no sense in the current context.
Last but not least, I like that it integrates with VoxForge. VoxForge is the open source answer to the large speech databases owned by commercial speech companies, so integration with VoxForge should ensure an increasing amount of data for your application.
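To make the context-switching point concrete, here is a small sketch of restricting recognition to the vocabulary of the active context. It is not Simon’s API: the `CONTEXTS` table and the n-best input format are hypothetical. A real system would switch the grammar or dictionary the decoder searches; filtering an n-best list afterwards is a simplification that shows the same effect.

```python
# Hypothetical contexts, each with its own small command vocabulary.
CONTEXTS = {
    'desktop': {'open', 'close', 'browser', 'editor', 'terminal'},
    'browser': {'back', 'forward', 'reload', 'close', 'tab'},
}


def best_in_context(nbest, context):
    """Pick the best-scoring hypothesis whose words all belong to the active context.

    nbest: list of (score, text) pairs, higher score = better (hypothetical recognizer output).
    Returns None when nothing fits, rather than forcing an out-of-context result.
    """
    vocab = CONTEXTS[context]
    allowed = [(score, text) for score, text in nbest if set(text.split()) <= vocab]
    return max(allowed)[1] if allowed else None


if __name__ == '__main__':
    nbest = [(-10.0, 'open brows her'), (-12.0, 'open browser'), (-15.0, 'reload')]
    print(best_in_context(nbest, 'desktop'))
    print(best_in_context(nbest, 'browser'))
```

The same acoustic evidence yields different answers depending on the context, which is exactly why a smaller, context-specific vocabulary reduces over-recognition.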

So kudos to the Simon team! I believe this is the right kind of thinking for building a good speech application.

Arthur

C++ g2p sphinxbase sphinxtrain Thought

sphinxbase 0.8 and SphinxTrain 1.0.8

Post author By grandjanitor
Post date March 20, 2013

I have done some analysis of sphinxbase 0.8 and SphinxTrain 1.0.8 to understand whether they are very different from sphinxbase 0.7 and SphinxTrain 1.0.7. I don’t see big differences, but it is still a good idea to upgrade.

(sphinxbase) The bug fix in cmd_ln.c is a must. Basically, freeing was wrong for every ARG_STRING_LIST argument. So chances are you will get a crash when someone specifies a wrong argument name and cmd_ln.c forces an exit, because that exit path eventually leads to a cmd_ln_val_free.
(sphinxbase) There were also a couple of changes in the FSG tools. Mostly, those feel like rewrites to me.
(SphinxTrain) SphinxTrain, on the other hand, has new tools such as a g2p framework. These are mostly OpenFst-based tools, and it’s worthwhile to have them in SphinxTrain.

One final note: CMUSphinx in general is starting to turn to C++. C++ is something I love and hate. It can be nasty, especially when dealing with compilation. At the same time, using C to emulate OOP features is quite painful. So my hope is that we use a subset of C++ that is robust across different compiler versions.

Arthur

cmu sphinx multiprocessing Python

Python multiprocessing

As my readers may have noticed, I haven’t updated this blog much because I have had a pretty heavy workload. It didn’t help that I was sick in the middle of March as well. Excuses aside, I am happy to be back. Even when I can’t write much about Sphinx and programming, I think it’s still worth it to keep posting links.

I have also received requests to write in more detail about individual parts of Sphinx. I love these requests, so feel free to send me more. Of course, it usually takes me some time to fully grok a part of Sphinx before I can describe it in an approachable way, so until then, I can only ask for your patience.

Recently I have come across parallel processing a lot and was intrigued by how it works in practice. In Python, a natural choice is the multiprocessing library. So here is a simple example of how you can run multiple processes in Python. This is very useful on modern multi-core CPUs.

Here is an example program showing how that could be done:

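import multiprocessing

N = 4  # number of tasks; a placeholder value


def process(i):
    # Placeholder task: replace with your real work.
    print('Task %d running in %s' % (i, multiprocessing.current_process().name))


if __name__ == '__main__':  # required where workers are spawned rather than forked
    jobs = []
    for i in range(N):
        p = multiprocessing.Process(target=process,
                                    name='TASK' + str(i),
                                    args=(i,))  # plus any other arguments your task needs
        jobs.append(p)
        p.start()
    for j in jobs:
        if j.is_alive():
            print('Waiting for job %s' % j.name)
        j.join()  # join every job, so finished ones are reaped too

The program is fairly trivial. Interestingly enough, it is also quite similar to the multithreading version in Python. The first loop starts your tasks, and the second loop just waits for them to finish.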

It feels a little less elegant than using Pool, which provides a waiting mechanism for the entire pool of tasks. Right now, I am essentially waiting, one by one, for whichever jobs are still running by the time job 1 has finished.
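For comparison, here is a sketch of the Pool version. `Pool.map` blocks until every task has returned and hands back the results in input order, so the manual join loop disappears. The `process` function here is a hypothetical stand-in that just squares its input.

```python
import multiprocessing


def process(i):
    # Hypothetical task standing in for real per-task work.
    return i * i


def run_all(n, workers=None):
    """Run process(0..n-1) on a pool; workers=None means one per CPU core."""
    with multiprocessing.Pool(processes=workers) as pool:
        # map() waits for the whole batch, so no explicit join loop is needed.
        return pool.map(process, range(n))


if __name__ == '__main__':
    print(run_all(8))
```

Leaving the `with` block terminates the pool, which is safe here because `map` has already collected every result.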

Is it worthwhile to take the other path, thread-based programming? One thing I learned in this exercise is that in older versions of Python, a CPU-bound multi-threaded program can be paradoxically slower than the single-threaded one, because CPython’s Global Interpreter Lock (GIL) lets only one thread run Python bytecode at a time. (See this link from Eli Bendersky.) This may be less of a problem in more recent versions of Python, though.
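A quick way to see the effect on your own machine is to time the same CPU-bound function serially, with threads, and with processes. This sketch uses the standard `concurrent.futures` executors; the `count_down` workload and the sizes are arbitrary choices, and timings will vary with your Python version and core count, so no numbers are claimed here.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_down(n):
    # Pure-Python CPU-bound loop: in CPython it holds the GIL the whole time.
    while n > 0:
        n -= 1
    return n


def timed_run(executor_cls, tasks, n):
    """Run `tasks` copies of count_down(n) on the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=tasks) as ex:
        results = list(ex.map(count_down, [n] * tasks))
    return results, time.perf_counter() - start


if __name__ == '__main__':
    n, tasks = 5_000_000, 4
    start = time.perf_counter()
    serial = [count_down(n) for _ in range(tasks)]
    print('serial              %.2fs' % (time.perf_counter() - start))
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        results, elapsed = timed_run(cls, tasks, n)
        assert results == serial
        print('%-19s %.2fs' % (cls.__name__, elapsed))
```

On a multi-core machine you would typically expect the thread version to be no faster than serial for this kind of loop, while the process version can use several cores.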

Arthur

Uncategorized

Readings at Feb 28, 2013

Taeuber’s Paradox and the Life Expectancy Brick Wall by Kas Thomas

Simplicity is Wonderful, But Not a Requirement by James Hague

Yeah. I knew a professor who always wanted to rewrite speech recognition systems to make them easier for research. Ahh…… modern speech recognition systems are complex anyway. Not making mistakes is already very hard, let alone building a good research system that is easy for everyone to use. (Remember, everyone has different research goals.)

Arthur

Thought

On sh*tty jobs.

I read “Why Hating Your Shitty Job Only Makes It Worse”. There is something positive in the article, but I can’t completely agree with the authors.

Part of the dilemma of working in a traditional office is that, inevitably, some a*holes and bad systems will appear in your life. The question is whether you want to ignore them or not. You should be keenly aware of your working conditions and make a rational decision about staying or leaving.

Arthur

infosec

61398

Chinese Army is Seen as Tied to Hacking Against U.S.