Categories
Aaron Swartz acoustic score Dragon goldman sach Kurzweil list Sphinx4 sphinxtrain subword units

January 2013 Write-up

Miraculously, I still have some momentum for this blog and I have kept up the daily posting schedule.

Here is the write-up for this month. Feel free to look at this post on how I plan to write this blog:

Some Vision of the Grand Janitor’s Blog

Sphinx’ Tutorials and Commentaries

SphinxTrain1.07’s bw:

Commentary on SphinxTrain1.07’s bw (Part I)
Commentary on SphinxTrain1.07’s bw (Part II)

Part I describes the high-level layout; Part II describes how half of the state network is built.

Others:
Acoustic Score and Its Sign
Subword Units and their Occasionally Non-Trivial Meanings

Sphinx4:
Sphinx 4 from a C background : Material for Learning

News

Goldman Sachs not Liable
Aaron Swartz……

Other writings:

On Kurzweil : a perspective of an ASR practitioner

Enjoy!

Arthur


Acoustic Score and Its Signness

Over the years, I have been asked again and again why the acoustic score can be a positive number. This occasionally leads to big confusion among beginners, so I am writing this article as a kind of road sign.

The acoustic score per frame is essentially the log of a probability density function (pdf) evaluated at that frame’s feature vector. In Sphinx’s case, the pdf is a multi-dimensional Gaussian distribution (in practice, a mixture of Gaussians). The acoustic score of a phone is then the log-likelihood of the phone HMM, accumulated over the frames the phone spans. You can extend this definition to word HMMs.
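To make the definition concrete, here is a minimal Python sketch. It assumes diagonal covariances and uses natural logs rather than Sphinx’s own log base. The function names, the `(weights, means, variances)` tuple layout and the fixed state `alignment` are hypothetical; a real decoder searches for the alignment itself and also adds HMM transition scores, which this sketch leaves out.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Natural-log density of a diagonal-covariance Gaussian at vector x."""
    total = 0.0
    for xd, md, vd in zip(x, mean, var):
        # One term per dimension: -1/2 * [log(2*pi*var) + squared distance / var]
        total += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
    return total

def log_gmm(x, weights, means, variances):
    """Per-frame acoustic score: log-likelihood of x under a Gaussian mixture."""
    comps = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(comps)
    # log-sum-exp keeps the sum numerically stable
    return top + math.log(sum(math.exp(c - top) for c in comps))

def phone_score(frames, state_models, alignment):
    """Sum of per-frame scores along a given state alignment (hypothetical;
    transition scores are ignored in this sketch)."""
    return sum(log_gmm(f, *state_models[s]) for f, s in zip(frames, alignment))

if __name__ == "__main__":
    # Two hypothetical 2-D states, one Gaussian each: (weights, means, variances)
    models = {0: ([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
              1: ([1.0], [[1.0, 1.0]], [[0.5, 0.5]])}
    frames = [[0.1, -0.2], [0.9, 1.1]]
    print(phone_score(frames, models, [0, 1]))
```

Summing per-frame log scores corresponds to multiplying per-frame likelihoods, which is why the phone (and word) score is just the accumulated frame scores along the path.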

Now for the sign. If you think of a discrete probability distribution, the acoustic score should never be positive, because a probability is at most 1 and the log of a number below 1 is negative. A Gaussian distribution, however, gives a density rather than a probability, and when the standard deviation is small, the density can be larger than 1. (Also see this link.) Those are the times you will see a positive value.
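The sign flip follows from the peak value of a 1-D Gaussian density, 1/(σ√(2π)), which exceeds 1 whenever σ < 1/√(2π) ≈ 0.399. This small illustrative Python snippet contrasts the log of a few discrete probabilities with the log of that peak for a few values of σ; the helper name `gaussian_peak` is made up for the example.

```python
import math

def gaussian_peak(sigma):
    """Value of a 1-D Gaussian density at its mean: 1 / (sigma * sqrt(2*pi))."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))

# A discrete probability is at most 1, so its log is never positive.
for p in (0.9, 0.5, 0.01):
    print(f"P = {p:<5} log P = {math.log(p):+.3f}")

# A density is not bounded by 1: once sigma < 1/sqrt(2*pi),
# the peak exceeds 1 and its log (the score) becomes positive.
for sigma in (1.0, 0.5, 0.1):
    peak = gaussian_peak(sigma)
    print(f"sigma = {sigma:<4} peak density = {peak:.3f}  log = {math.log(peak):+.3f}")
```

Note that this is the best case, at the mean; a frame far from the mean can still get a negative score even with a small σ.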

Another thing you might find odd is the magnitude of the likelihoods you see. Bear in mind that Sphinx2 and Sphinx3 use a very small log base, just slightly above 1, so converting to that base inflates every score. We are also talking about a multi-dimensional Gaussian distribution, whose log density adds up one term per dimension. Both effects make the numerical values bigger.
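A rough illustration of both effects in plain Python. The base 1.0001 is an assumption chosen only to show what a log base just above 1 does, and the dimensions 1, 13 and 39 with a shared σ = 0.1 are illustrative values, not taken from any particular Sphinx model. A score in base b is ln(x)/ln(b), and ln(1.0001) is about 0.0001, so every natural-log score gets multiplied by roughly 10,000.

```python
import math

def to_logbase(ln_value, base):
    """Convert a natural-log score to log base `base`: log_b(x) = ln(x) / ln(b)."""
    return ln_value / math.log(base)

def log_gaussian_at_mean(dim, sigma):
    """Natural-log density at the mean of a `dim`-D Gaussian whose
    dimensions are independent and share the same sigma."""
    return -0.5 * dim * math.log(2.0 * math.pi * sigma ** 2)

BASE = 1.0001  # assumption: an illustrative log base just above 1

for dim in (1, 13, 39):
    ln_score = log_gaussian_at_mean(dim, sigma=0.1)
    print(f"dim={dim:>2}  ln score={ln_score:+8.2f}  "
          f"base-{BASE} score={to_logbase(ln_score, BASE):+12.0f}")
```

The per-dimension terms add linearly, so going from 1 to 39 dimensions multiplies the log density by 39, and the small base multiplies it again.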

Arthur

Also see:
My answer on the Sphinx Forum