AIDL Pinned Post V2

(Just want to keep a record for myself.)

Welcome! Welcome! We are the most active FB group for Artificial Intelligence/Deep Learning, or AIDL. Many of our members are knowledgeable so feel free to ask questions.

We have a tied-in newsletter and a YouTube channel with a (kinda) weekly show, "AIDL Office Hour".

Posting rules are strict at AIDL: your post has to be relevant, accurate and non-commercial (see FAQ Q12). Commercial posts are only allowed on Saturdays. If you don't follow these rules, you might be banned.


Q1: How do I start AI/ML/DL?
A: Step 1: Learn some Math and Programming,
Step 2: Take some beginner classes. e.g. Try out Ng's Machine Learning.
Step 3: Find some problem to play with. Kaggle has tons of such tasks.
Iterate the above 3 steps until you become bored. From time to time you can share what you learn.

Q2: What is your recommended first class for ML?
A: Ng's Coursera class and the CalTech edX class. The UW Coursera class is also pretty good.

Q3: What are your recommended classes for DL?
A: Go through at least 1 or 2 ML classes, then go for Hinton's, Karpathy's, Socher's, Larochelle's and de Freitas's. For deep reinforcement learning, go with Silver's and Schulman's lectures. Also see Q4.

Q4: How do you compare different resources on machine learning/deep learning?
A: (Shameless self-promoting plug) Here is an article, "Learning Deep Learning - Top-5 Resources", written by me (Arthur) on different resources and their prerequisites. I have referred to it a couple of times at AIDL, and you might find it useful:…/learning-deep-learning-my-top…/ . Reddit's machine learning FAQ has another list of great resources as well.

Q5: How do I use machine learning technique X with language L?
A: Google is your friend. You might also see a lot of us referring you to Google from time to time. That's because your question is best answered by Google.

Q6: Explain concept Y. List 3 properties of concept Y.
A: Google. Also we don't do your homework. If you couldn't Google the term though, it's fair to ask questions.

Q7: What are the most recommended resources on deep learning for computer vision?
A: cs231n. The 2016 offering is the one I recommend. Most other resources you will find are derivative in nature or have glaring problems.

Q8: What are the prerequisites of Machine Learning/Deep Learning?
A: Mostly Linear Algebra and Calculus I-III. In Linear Algebra, you should be good at eigenvectors and matrix operations. In Calculus, you should be quite comfortable with differentiation. You might also want a primer on matrix differentiation before you start, because it's a topic which is seldom touched in an undergraduate curriculum.
Some people also argue that Topology is important, and that a Physics or Biology background could help. But they are not crucial to get started.

Q9: What are the cool research papers to read in Deep Learning?
A: We think songrotek's list is pretty good:…/Deep-Learning-Papers-Reading-Roadmap. Another classic is's reading list:

Q10: What is the best/most recommended language in Deep Learning/AI?
A: Python is usually cited as a good language because it has the best library support. Most Python ML libraries link with C/C++, so you get the best of both flexibility and speed.
Others also cite Java (deeplearning4j), Lua (Torch), Lisp, Golang and R. It really depends on your purpose. Practical concerns such as code integration and your familiarity with a language usually dictate your choice. R deserves special mention because it is widely used in sister fields such as data science and it is gaining popularity.

Q11: I am bad at Math/Programming. Can I still learn A.I/D.L?
A: Mostly you can tag along, but at a certain point, if you don't understand the underlying Math, you won't be able to understand what you are doing. The same goes for programming: if you never implement an algorithm, or trace one yourself, you will never truly understand why it behaves a certain way.
So what if you feel you are bad at Math? Don't beat yourself up too much. Take Barbara Oakley's class "Learning How to Learn"; it will teach you more about how to approach tough subjects such as Mathematics, Physics and Programming.

Q12: Would you explain more about AIDL's posting requirement?
A: This is a frustrating topic for many posters, despite their good intentions. I suggest you read through this blog post before you start posting.

If you like this message, subscribe to the Grand Janitor Blog's RSS feed. You can also find me (Arthur) on Twitter, LinkedIn and Plus. Together with Waikit Lau, I maintain the Deep Learning Facebook forum. Also check out my awesome employer: Voci.

Thoughts From Your Humble Administrators - Feb 5, 2017

Last week:

Libratus is the biggest news item this week. In retrospect, it's probably as huge as AlphaGo. The surprising part is that it has nothing to do with deep learning. So it is worth our time to look at it closely.

  • We learned that Libratus crushed human professional players in heads-up no-limit holdem (NLH). How does it work? Perhaps the Wired and the Spectrum articles tell us the most.
    • First of all, NLH is not as commonly played as Go, but it is interesting because people play it for real money. And we are talking about big money. The World Series of Poker holds a yearly poker tournament, and all top-10 players become instant millionaires. Among pros, holdem is known as the "Cadillac of Poker", a phrase coined by Doyle Brunson. That implies mastering holdem is the key skill in poker.
    • Limit holdem is generally thought of by pros as a "chess"-like game. Polaris from the University of Alberta bested humans with three wins back in 2008.
    • NLH had not been beaten until now, so let's think about how you would model NLH in general. In NLH, the number of game states is around 10^165, close to Go. Since the game only has four streets (pre-flop, flop, turn and river), you easily get into what other game players call the end-game. It's just that, given the large number of possible bet sizes, the number of game states blows up very easily.
    • So at run-time you can only evaluate a portion of the game tree. Since bet sizes are continuous, they are usually discretized so that the evaluation is tractable with your compute; this is known as "action abstraction". An actual bet size that falls outside the abstraction is usually called an "off-tree" bet. These off-tree bets are then mapped to in-tree actions at run-time, which is known as "action translation". Of course, there are different types of tree evaluation.
    • Now, what is the merit of Libratus? Why does it win? There seem to be three distinct factors, and the first two are about the end-game.
      1. There is a new end-game solver, which features a new criterion for evaluating the game tree, called Reach-MaxMargin.
      2. Also in the paper, the authors suggest a way to solve an end-game given the player's actual bet size. So they no longer use action translation to translate an off-tree bet into the game abstraction. This considerably reduces "regret".
    • What is the third factor? As it turns out, in past human-computer games, humans were able to easily exploit the machine by noticing its betting patterns. So the CMU team used an interesting strategy: every night, the team manually tuned the system so that repeated betting patterns were removed. That confused the human pros. Dong Kim, the best player against the machine, felt like he was dealing with a different machine every day.
    • These seem to be the reasons why the pros were crushed. Notice that this was a rematch: the pros won by a small margin back in 2015, but the result this time shows that there is a 99.8% chance the machine is beating humans. (I am handwaving here because you need to talk about the big blind size to talk about winnings. Unfortunately I couldn't look it up.)
    • To me, this Libratus win is very close to saying that a computer is able to beat the best tournament heads-up players. But poker players will tell you the best players are cash-game players. And heads-up play would not be representative, because bread-and-butter games are usually 6 to 10 player games. So we will probably hear more about pokerbots in the future.
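
To make the terms "action abstraction" and "action translation" above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names, the pot-relative fractions and the nearest-size mapping are all hypothetical choices made for this example, and they are not how Libratus or any particular poker bot actually does it.

```python
def build_action_abstraction(pot, fractions=(0.5, 1.0, 2.0)):
    """Action abstraction: keep only a few pot-relative bet sizes in the tree.

    The fractions are made-up values for illustration.
    """
    return sorted(pot * f for f in fractions)


def translate_bet(bet, in_tree_bets):
    """Action translation: map an off-tree bet to the closest in-tree bet.

    Nearest-size mapping is an assumption here; real systems may use
    more careful mappings.
    """
    return min(in_tree_bets, key=lambda b: abs(b - bet))


in_tree = build_action_abstraction(pot=100)  # a few bet sizes instead of a continuum
mapped = translate_bet(130, in_tree)         # the off-tree bet of 130 is treated as 100
```

The gap between the real bet (130) and the bet the tree reasons about (100) is exactly the kind of error that solving the end-game for the actual bet size is meant to avoid.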

Anyway, that's what I have this week.  We will resume our office hour next week.  Waikit will tell you more in the next couple of days.

Thoughts From Your Humble Administrators - Jan 29, 2017

This week at AIDL:

Must-read: I would read Stanford's article and the Deep Patient paper in tandem.

How To Get Better At X (X = Programming, Math, etc ) ......

Here are some of my reflections on how to improve at work.

So how would you get better at X?

X = Programming

  • Trace the code of smart programmers, learn their tricks,
  • Learn how to navigate a codebase using your favorite editor,
  • Learn algorithms better, learn math better,
  • Join an open source project, first contribute, then see if you can maintain,
  • Always be open to learning a new language.

X = Machine Learning

X = Reading Literature

  • Read every day, make it a thing.
  • Browse arXiv's summaries as if they were your daily news.
  • Ask questions on social networks, Plus or Twitter, and listen to other people,
  • Teach people a concept. It makes you consolidate your thoughts and helps you realize what you don't really know.

X = Unix Administration

  • Google is your friend.
  • Listen to experienced administrators. Their perspective can be very different, e.g. admins usually care about security more than you do. Listen to them and think about whether your solution incorporates their concerns.
  • Every time you solve a problem, put it in a notebook.  (Something which Tadashi Yonezaki at Scanscout taught me.)

X = Code Maintenance

  • Understand the code building process, and see it as part of your job to learn it intimately,
  • Learn multiple types of build systems: learn autoconf, cmake, bazel. Learn them, because by knowing them you can start to compile and eventually really hack a codebase.
  • Learn version control, learn Git. Don't say you don't need one; that would only slow you down.
  • Learn multiple types of version control systems: CVS, SVN, Mercurial and Git. Learn why some of them are bad (CVS), and why some of them are better but still bad (SVN).
  • Send out a mail whenever you are making a release, make sure you communicate clearly what you plan to do.

X = Math/Theory

  • Focus on one topic. For example, I am very interested in machine learning these days, so I am reading Bishop.
  • Don't be cheap, buy the bibles in the field. Get Thomas Cover if you are studying information theory. Read Serge Lang on linear algebra.
  • Solve one problem a day, maybe more if you are bored and sick of raising dumbbells.
  • Re-read a formulation of a certain method. Re-read a proof. Look up the different ways people formulate and prove something.
  • Paraphrasing Ian Stewart: you will always look silly in front of your supervisor. But always remember that once you have studied to the graduate level, you cannot be too stupid. So what learning math/theory takes is gumption and perseverance.

X = Business

  • Business has its own mechanisms, so don't dismiss it as fluffy before you learn the details,
  • Listen to your BD, listen to your sales, listen to your marketing friends. They are your important colleagues and friends.

X = Communication

  • Stand in other people's shoes, that is to say: be empathetic,
  • I think it was Atwood who said (paraphrased): it's easy to be empathetic toward people in need, but it's difficult to be empathetic toward annoying and difficult people. Ask yourself these questions,
    • Why would a person become difficult and annoying in the first place? Do they have a reason?
    • Are you big enough to help these difficult and annoying people? Even if they could be toxic?
  • That said, communication is a two-way street, and there are indeed hopeless situations. Take them in stride, and spend your time helping friends/colleagues who are in need.

X = Anything

Learning is a life-long process, so be humble and ready to be humbled.





Good ASR Training System

The term "speech recognition" is a misnomer.

Why do I say that? I explained this point in an old article, "Do We Have True Open Source Dictation?", which I wrote back in 2005. To recap, a speech recognition system consists of a Viterbi decoder, an acoustic model and a language model. You could have a great recognizer but poor accuracy if the models are bad.

So how does that relate to you, a developer/researcher of ASR? The answer is that ASR training tools and processes usually become a core asset in your inventory. In fact, I can tell you that when I need to work on acoustic model training, I need to work on it full time, and it's one of the most absorbing things I have done.

Why is that? When you look at the development cycles of all the tasks in making an ASR system, training is the longest. With the wrong tool, it is also the most error-prone. As an example, just take a look at the Sphinx forum: you will find that the majority of non-Sphinx4 questions are related to training. Like, "I can't find the path of a certain file" or "the whole thing just got stuck in the middle".

Many first-time users complain with frustration (and occasionally disgust) about why it is so difficult to train a model. The frustration probably stems from the perception that "Shouldn't it be well-defined?" The answer is again no. In fact, how a model should be built (or even which model should be built) is always subject to change. It's also one of the two subfields in ASR, at least IMO, which are still creative and exciting in research. (The other one: noisy speech recognition.) What an open source software suite like Sphinx provides is a standard recipe for everyone.

That said, is there something we can do better in an ASR training system? There is a lot, I would say. Here are some suggestions:

  1. A training experiment should be created, moved and copied with ease,
  2. A training experiment should be exactly repeatable given the input is exactly the same,
  3. The experimenter should be able to verify the correctness of an experiment before an experiment starts.

Ease of Creation of an Experiment

You can think of a training experiment as a recipe ...... not exactly. When we read a recipe and implement it again, we humans make mistakes.

But hey! We are working with computers.   Why do we need to fix small things in the recipe at all? So in a computer experiment, what we are shooting for is an experiment which can be easily created and moved around.

What does that mean? It basically means there should be no executables which are hardwired to one particular environment. There should also be no hardware/architecture assumptions in the training implementations. If there are, they should be hidden.
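
As a rough illustration of "no hardwired executables", a training script can look tools up at run time instead of embedding an absolute path. The sketch below is an assumption-laden example, not part of any Sphinx tooling: `resolve_tool` and the environment variable `HYPOTHETICAL_BW_PATH` are names invented for this post.

```python
import os
import shutil


def resolve_tool(name, env_var=None):
    """Find an executable without hardwiring its location.

    An explicit override in the (hypothetical) environment variable wins;
    otherwise the tool is searched for on PATH. Returns None if not found.
    """
    override = os.environ.get(env_var) if env_var else None
    return override or shutil.which(name)


# The same experiment script then works on any machine, as long as the
# tool is on PATH or the override variable is set.
bw_path = resolve_tool("bw", env_var="HYPOTHETICAL_BW_PATH")
```

Moving the experiment to another machine then only requires changing the environment, not the recipe itself.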

Repeatability of an Experiment

Similar to the previous point, should we allow differences when re-running a training experiment? The answer should be no. So one trick you hear from experienced experimenters is that you should keep the seed of the random generators. This avoids minute differences between different runs of an experiment.

Here someone would ask: shouldn't we allow a small difference between experiments? We are essentially running a physical experiment.

I think that's a valid approach. But to be conscientious, you would then want to run a certain experiment many times to calculate an average, and that is my problem with this thinking: it is slower to repeat an experiment. For example, what if you see your experiment has a 1% absolute drop? Do you let it go? Or do you just chalk it up as noise? Once you allow yourself to not repeat an experiment exactly, there will be tons of questions you should ask.
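
The "keep the seed" trick can be sketched in a few lines of Python. This is only a toy example: `shuffled_utterances` and the utterance IDs are made up for illustration, and a real trainer would have to pass the same seed to every random component it uses.

```python
import random


def shuffled_utterances(utterance_ids, seed):
    """Return the training order for a run, fully determined by the seed."""
    rng = random.Random(seed)   # private generator, no hidden global state
    order = list(utterance_ids)
    rng.shuffle(order)
    return order


utts = ["utt%03d" % i for i in range(10)]
run_a = shuffled_utterances(utts, seed=1234)
run_b = shuffled_utterances(utts, seed=1234)  # same seed, same order as run_a
```

Recording the seed together with the rest of the experiment settings is what makes the run repeatable later.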

Verifiability of an Experiment

Running an experiment sometimes takes days, so how do you make sure it will run correctly? I would say you should first make sure trivial issues such as missing paths, missing models, or incorrect settings are screened out and corrected.

One of my bosses used to make a strong point of this and asked me to verify input paths every single time. This is a good habit and it pays dividends. Can we do similar things in our training systems?
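
A pre-flight check of this kind is easy to sketch. The example below is a minimal illustration under my own assumptions: `find_missing_paths` and the file names are hypothetical, and a real training system would also want to check settings, not just paths.

```python
import os


def find_missing_paths(paths):
    """Return the input paths that do not exist, in the order given."""
    return [p for p in paths if not os.path.exists(p)]


# Hypothetical inputs of a training run; only the first one is sure to exist.
inputs = [os.getcwd(), os.path.join(os.getcwd(), "no_such_model_dir_12345")]
missing = find_missing_paths(inputs)
if missing:
    print("Refusing to start, missing inputs:", missing)
```

Failing in the first second is much cheaper than finding out about a missing file after a day of training.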

Apply it on Open Source

What I mentioned above is highly influenced by my experience in the field. I personally found that sites which have great infrastructure for transferring experiments between developers are the strongest and fastest growing.

Putting all these ideas into open source would mean a very different development paradigm. For example, do we want to have a centralized experiment database which everyone shares? Do we want to put common resources such as existing parameterized inputs (such as MFCC) somewhere in common for everyone? Should we integrate the retrieval of these inputs into our experiment recipe?

Those are important questions. In a way, I think they are the most important type of questions we should ask in open source, because despite a great deal of volunteer effort, the performance of open source models is still lagging behind that of commercial models. I believe it is an issue of methodology.

sphinxbase 0.8 and SphinxTrain 1.08

I have done some analysis on sphinxbase 0.8 and SphinxTrain 1.08 and tried to understand whether they are very different from sphinxbase 0.7 and SphinxTrain 1.0.7. I don't see big differences, but it is still a good idea to upgrade.

  • (sphinxbase) The bug fix in cmd_ln.c is a must-have. Basically, the freeing was wrong for all ARG_STRING_LIST arguments. So chances are you will get a crash when someone specifies a wrong argument name and cmd_ln.c forces an exit, because the exit eventually leads to a cmd_ln_val_free.
  • (sphinxbase) There were also a couple of changes in the fsg tools. Mostly I feel those are rewrites.
  • (SphinxTrain) SphinxTrain, on the other hand, has new tools such as the g2p framework. Those are mostly openfst-based tools, and it's worthwhile to put them into SphinxTrain.

One final note here: CMUSphinx, in general, is starting to turn to C++. C++ is something I love and hate. It can sometimes be nasty, especially when dealing with compilation. At the same time, using C to emulate OOP features is quite painful. So my hope is that we use a subset of C++ which is robust across different compiler versions.

On sh*tty job.

I read "Why Hating Your Shitty Job Only Makes It Worse". There is something positive about the article, but I can't completely agree with the authors.

Part of the dilemma of working in a traditional office space is that inevitably some kind of a*holes and bad systems will appear in your life. The question is whether you want to ignore them or not. You should be keenly aware of your work conditions and make a rational decision about staying or leaving.


Two Views of Time-Signal : Global vs Local

As I have been working on Sphinx at work and have started to chat with Nicholay more, one thing I realize is that several frequently used components of Sphinx need rethinking. Here is one example related to my recent work.

A speech signal or ...... in general a time signal can be processed in two ways: you either process it as a whole, or you process it in blocks. The former you can call a global view, the latter a local view. Of course, there are many other names: block/utterance, block/whole, but essentially the terminology means the same thing.

Most of the time, global and local processing give the same result. So you can simply say: the two types of processing are equivalent.

Of course, that is no longer true once you start to do an operation which uses information from the whole signal. For a very simple example, look at cepstral mean normalization (CMN). Implementing CMN in block mode is certainly an interesting problem. For example, how do you estimate the mean if you only have a running window? When you think about it a little bit, you will realize it is not a trivial problem. That's probably why there are still papers on cepstral mean normalization.
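
To see why the two views differ for CMN, here is a small Python sketch. It is only an illustration: the function names are mine, the "local" version uses a simple cumulative mean over the frames seen so far, and this is an assumption for the example rather than how sphinxbase or any other toolkit implements live CMN.

```python
def global_cmn(frames):
    """Global view: subtract the mean over the whole utterance."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


def running_cmn(frames):
    """Local view: subtract the mean of the frames seen so far."""
    total = [0.0] * len(frames[0])
    out = []
    for i, frame in enumerate(frames, start=1):
        total = [t + x for t, x in zip(total, frame)]
        out.append([x - t / i for x, t in zip(frame, total)])
    return out


frames = [[1.0], [2.0], [3.0]]          # toy one-dimensional "cepstra"
whole = global_cmn(frames)              # [[-1.0], [0.0], [1.0]]
blocks = running_cmn(frames)            # [[0.0], [0.5], [1.0]]
```

The two outputs only agree on the last frame, when the running estimate has finally seen the whole utterance. Everything before that depends on how the running mean is estimated, which is the non-trivial part.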

Translating this to Sphinx: if you look at sphinxbase's sphinx_fe, you will realize that the implementation is based on the local mode, i.e. every once in a while, samples are consumed, processed and written to disk. There is no easy way to implement CMN in sphinx_fe because it is assumed that the consumers (such as decode and bw) will do this on their own.

It's all good, though there is an interesting consequence: what the SF guys call a "feature" is really all the processing that can be done in the local sense, rather than the "feature" you see in either the decoders or bw.

This special point of view is ingrained in sphinxbase/sphinxX/sphinxtrain (Sphinx4? Not sure yet). This is quite different from what you will find in HTK, which sees the feature vector as the vector used in Viterbi decoding.

That brings me to another point. If you look deeper, HTK tools such as HVite/HCopy are highly abstract. Each tool was designed to take care of its own problem well. HCopy is really meant to provide just the features, whereas HVite just runs the Viterbi algorithm on a bunch of features. It's nothing complicated. On the other hand, Sphinx is more speech-oriented. In that world, life is more intertwined. That's perhaps why you seldom hear of people using Sphinx for research other than speech recognition. You can, on the other hand, do other machine learning tasks in HTK.

Which view is better? If you ask me, I wish both HTK and Sphinx were released under the Berkeley license. Tons of real-life work could be saved because each covers some useful functionality.

Given that only one of them is released under a liberal license (Sphinx), maybe what we need is to absorb some design paradigms from HTK. For example, HTK has a sense of organizing data as pipes. That is something SphinxTrain can use. This will enhance the work of Unix users, who usually contribute the most in the community.

I also hope that eventually there will be good clones of the HTK tools made available under a Berkeley/GNU license. Not that I don't like the status quo: I am happy to read the code of HTK (unlike the time before 2.2......). But as you work in the industry for a while, you see that many people are actually using both Sphinx and HTK to solve their speech research-related problems. Of course, many of these guys (if they are honest) need to come up with extra development time to port some HTK functions into their own production systems. Not tough, but you will wonder whether the time could be better spent ......


Readings at Jan 8, 2012

Testing Redux  by Vivek Halder

Comment: I second Vivek: in a large-scale project, having no testing is a project-killing act. One factor to consider: in real life, the mythical 100X productive programmers are rarely seen. Even then, these programmers can make a mistake or two. Therefore, not having any automatic testing for a group is a very bad thing.

On the other hand, should an individual programmer always follow automatic testing? Yes and no. Yes in the sense that you should always write a test for your program. No in the sense that you shouldn't believe testing will make your program automatically correct.

Comment: very well said. I hope this "more hours = more work done" crap can end soon.

Todon't by Jeff Atwood

Comment: I like todo lists a lot, but sometimes it takes me a while to organize them. They also grow very quickly. A good suggestion here is that not only should you add to your list, you should also keep changing priorities. One of the purposes of a todo list is to record your life, but it has nothing to do with how you move forward with your life.

You got to have thick skin......

Linus Chews Up Kernel Maintainer for Introducing Userspace Bug.

That's just the way it is.  Remember, there are always better, stronger, more famous, more powerful programmers working with you.  So criticisms will come one day.

My take: as long as you don't work with them in person, just treat them as an icon on the screen.   😀