Category Archives: Programming

How To Get Better At X (X = Programming, Math, etc.)

Here are some of my reflections on how to improve at work.

So how would you get better at X?

X = Programming

  • Trace the code of smart programmers and learn their tricks,
  • Learn how to navigate a codebase using your favorite editor,
  • Learn algorithms better, learn math better,
  • Join an open source project: first contribute, then see if you can maintain,
  • Always be open to learning a new language.

X = Machine Learning

X = Reading Literature

  • Read every day, make it a thing.
  • Browse arXiv's summaries as if they mattered more than the daily news.
  • Ask questions on social networks, Plus or Twitter, and listen to other people,
  • Teach people a concept; it makes you consolidate your thoughts and helps you realize what you don't really know.

X = Unix Administration

  • Google is your friend.
  • Listen to experienced administrators; their perspective can be very different, e.g. admins usually care about security more than you do. Listen to them and think about whether your solution incorporates their thinking.
  • Every time you solve a problem, put it in a notebook.  (Something which Tadashi Yonezaki at Scanscout taught me.)

X = Code Maintenance

  • Understand the code building process; see it as part of your job to learn it intimately,
  • Learn multiple types of build systems: autoconf, cmake, bazel. By knowing them you can start to compile and eventually really hack a codebase.
  • Learn version control, learn Git. Don't say you don't need one; going without it will only slow you down.
  • Learn multiple types of version control systems: CVS, SVN, Mercurial and Git. Learn why some of them are bad (CVS) and why some are better but still bad (SVN).
  • Send out a mail whenever you are making a release; make sure you communicate clearly what you plan to do.

X = Math/Theory

  • Focus on one topic. For example, I am very interested in machine learning these days, so I am reading Bishop.
  • Don't be cheap, buy the bibles in the field. Get Thomas Cover if you are studying information theory. Read Serge Lang on linear algebra.
  • Solve one problem a day, maybe more if you are bored and sick of raising dumbbells.
  • Re-read a formulation of a certain method. Re-read a proof. Look up the different ways people formulate and prove something.
  • Paraphrasing Ian Stewart: you will always look silly in front of your supervisor. But always remember that once you have studied to the graduate level, you cannot be too stupid. So what learning math/theory takes is gumption and perseverance.

X = Business

  • Business has mechanisms, so don't dismiss it as fluffy before you learn the details,
  • Listen to your BD, listen to your sales, listen to your marketing friends. They are your important colleagues and friends.

X = Communication

  • Stand in other people's shoes, that is to say: be empathetic,
  • I think it was Atwood who said (paraphrasing): it's easy to be empathetic toward people in need, but it's difficult to be empathetic toward annoying and difficult people. Ask yourself these questions,
    • Why would a person become difficult and annoying in the first place? Do they have a reason?
    • Are you big enough to help these difficult and annoying people? Even if they could be toxic?
  • That said, communication is a two-way street, and there are indeed hopeless situations. Take them in stride, and spend your time helping friends/colleagues who are in need.

X = Anything

Learning is a life-long process, so be humble and ready to be humbled.





Radev's Coursera Introduction to Natural Language Processing - A Review

As I promised earlier, here is my review of Prof. Dragomir Radev's introductory class on natural language processing. A few words about Prof. Radev: according to his Wikipedia entry, he is an award-winning professor who co-founded the North American Computational Linguistics Olympiad (NACLO), which is the equivalent of USAMO in computational linguistics. He also coached the U.S. team at the International Linguistics Olympiad 2011 and helped the team win several medals [1]. I think these are great contributions to the speech and language community. In the late 90s, when I was still an undergraduate, computational language processing was not widely recognized as an important computational skill. With competitions at the high-school or college level, there will be a generation of young minds who aspire to build intelligent conversational agents, robust speech recognizers and versatile question-answering machines. (Or else everyone would think the Linux kernel is the only cool hack in town. 🙂 )

The Class

So how about the class? I have to say I am more than surprised and happy with it. I was searching for an intro NLP class, and the two natural choices were Prof. Jurafsky and Manning's and Prof. Collins's Natural Language Processing. Both classes have received great praise, and a few of my friends recommended taking both. Unfortunately, neither class had been offered recently, so I could only watch the material off-line.

Then along came Radev's class. As Prof. Radev explains, it is "more introductory" than Collins's class and "more focused on Linguistics and resources" than Jurafsky and Manning's. So it is good for two types of learners:

  1. Those who just started out in NLP.
  2. Those who want to gather useful resources and start projects on NLP.

I belong to both types.   My job requires me to have more comprehensive knowledge of language and speech processing.

The Syllabus and The Lectures

The class itself is a brief survey of many important topics in NLP. There are the basics: parsing, tagging, language modeling. There are advanced topics such as summarization, statistical machine translation (SMT), semantic analysis and dialogue modeling. The lectures, apart from occasional mistakes, are quite well done and filled with interesting examples.

My only criticism is perhaps the length of the videos: I would prefer most videos I watch to be under 10 minutes. That makes it easier to fit them around my other daily tasks.

The material is not too difficult for newcomers to absorb. For starters, advanced topics such as SMT are not covered in too much mathematical detail. (So no need to derive EM on the IBM models.) I think that's quite appropriate for first-time learners like me.

One more unique feature of the lectures: they are filled with interesting NACLO problems. While NACLO is more of a high-school level competition, most of the problems are challenging even for experienced practitioners. And I found them quite stimulating.

The Prerequisites and The Homework Assignments

To me, the fun part is the homework. There were 3 assignments, focusing on,

  1. Nivre's Dependency Parser,
  2. Language Modeling and POS Tagging,
  3. Word Sense Disambiguation

All the homework is in Python. If you know what you are doing, the assignments are not that difficult. I spent around 12-14 hours on each. (Those were usually weekends.) Just like in Ng's Machine Learning class, you need to match your numbers against the golden reference. I think that's the right approach to learning any machine learning task the first time. Blindly coming up with a system and hoping it works never gets you anywhere.

The homework does point to an issue with the class, i.e. you do need to know the basics of machine learning. Also, if you have never had any programming experience, you will find the homework very difficult. This probably describes many linguistics students who have never taken any computer science classes. [3] You can still "power through" and pass. But it can be unnecessarily hard.

So I recommend you first take Ng's class or perhaps the latest Machine Learning specialization from Guestrin and Fox. Those are the classes which will give you some basics of programming as well as the basic concepts of machine learning.

If you haven't taken any machine learning class, one way to get through a more difficult class like this is to read the forum messages. There were many nice people in the course answering various questions. To be frank, if the forum hadn't existed, it would have taken me around 3 times longer to finish all the assignments.

Final Word

All in all, I highly recommend Prof. Radev's class to anyone who is interested in NLP. As I mentioned though, the class does require prerequisites such as the basics of programming and machine learning. So I would recommend that any learner first take Ng's class before taking this one.

In any case, I want to thank Prof. Radev and all the teaching staff who prepared this wonderful course. I also thank the many classmates who helped me through the homework.


Postscript, April 2017

After I wrote this review, Coursera upgraded to its new format. It's a pity that none of the NLP classes, including Prof. Radev's, survived. Too bad for NLP lovers!

There is also a seismic shift in the field of NLP toward deep learning. While deep learning does not dominate evaluations the way it does in computer vision or speech recognition, it is perhaps the most actively researched direction right now. So if you are curious about what's new, consider taking the latest Stanford cs224n 2017 or Oxford's Deep Learning for NLP.


[2] Week 1 Lecture 1 Introduction

[3] One anecdote: In the forum, a student was asking why you can't just sum all the data points of a class together and pour them into scikit-learn's fit(). I don't blame the student, because she started late and lacked the prerequisites. She later finished all the assignments and I really admire her determination.

Experience in Real-Life Machine Learning

I have been refreshing myself on the general topic of machine learning, mostly motivated by job requirements as well as my own curiosity. That's why you saw my review post on Andrew Ng's famed class. And I have been taking Dragomir Radev's NLP class, as well as the Machine Learning Specialization by Emily Fox and Carlos Guestrin [1]. When you are at work, it's tough to learn. But so far, I have managed to learn something from each class and have been able to apply it in my job.

So, one question you might ask is how applicable on-line or even university machine learning courses are in real life. Short answer: real life is quite different. Let me try to answer this question with an example that came up for me recently.

It is a gender detection task based on voice. This came up at work and I was tasked with improving the company's existing detector. For the majority of my time, I tried to divide the data set, which has around 1 million data points, into train/validation/test sets. Furthermore, from the beginning of the task I decided to create a series of data sets of increasing size, for example 2k, 5k, 10k, and so on up to 1 million. This simple exercise, done mostly in Python, took me close to a week.
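To make the idea concrete, here is a minimal sketch of a split followed by nested, progressively sized training sets. It is not the code I used at work; the function names, the 10% validation/test fractions and the utterance IDs are all made up for illustration.

```python
import random

def split_dataset(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train/validation/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def progressive_subsets(train, sizes):
    """Nested prefixes of the training set: each smaller set is contained in the larger ones."""
    return {n: train[:n] for n in sizes if n <= len(train)}

# Hypothetical utterance IDs standing in for the real data.
utterance_ids = [f"utt{i:07d}" for i in range(100_000)]
train, val, test = split_dataset(utterance_ids)
subsets = progressive_subsets(train, sizes=[2_000, 5_000, 10_000, 50_000])
```

Because the subsets are nested, a result on the 2k set and a result on the 10k set differ only by the added data, which makes it easier to compare methods across sizes.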

Training, aka the fun part, was comparatively short and anticlimactic. I just chose a couple of well-known methods in the field and tested them on the progressively sized data sets. Since prototyping a system this way is so easy, I was able to weed out weaker methods very early and come up with a better system. I was able to get a high relative performance gain. Before I submitted the system to my boss, I also worked out an analysis of why the system doesn't give 100%. No surprise: it turns out the volume of the speech matters, and some individuals in the world don't fit the stereotypes of their sex. But so far the task has gone quite well, because we got better performance and we also know why certain things don't work well. That is good knowledge to have in practice.

One twist here: after finishing the system, I found that the method which gives the best classification performance doesn't give the best speed. So I decided to choose a cheaper but still rather effective method. It hurts my heart to see the best method go unused, but that's the way it is sometimes.

Eventually, as one of the architects of the system, I also spent time making sure the integration was correct. That took coding, much of it done in C/C++/Python. Since there were a couple of bugs in some existing code, I spent about a week tracing code with gdb.

The whole thing took me about three months. Around 80% of my time was spent on data preparation and coding. The machine learning you do in class does happen, but it only took me around 2 weeks to determine the best model. I could have made these 2 weeks shorter by using more cores. But compared to other tasks, the machine learning you do in class, which usually comes in the very nice form "Here is a training set, go train and evaluate it with the evaluation set.", seldom appears in real life. Most of the time, you are the one who prepares the training and evaluation sets.

So if you happen to work on machine learning, do expect to work on tasks such as web crawling and scraping if you work on text processing, listening to thousands of waveforms if you work on speech or music processing, and watching videos that you might not like to watch if you try to classify videos. That's machine learning in real life. If you also happen to be the one who decides which algorithm to use, yes, you will have some fun. If you happen to design a new algorithm, then you will have a lot of fun. But most of the time, practitioners need to handle issues which can just be .... mundane. Tasks such as web crawling are certainly not as enjoyable as applying advanced mathematics to a problem. But they are incredibly important and they will take up most of your time, or that of your organization as a whole.

Perhaps that's why you hear the term "data munging", or in Bill Howe's class, "data jujitsu". This is a well-known skill, but it is not much advertised and is unlikely to be seen as important. But in real life, such data processing skill is crucial. For example, in my case, if I hadn't had progressively sized data sets, prototyping could have taken a long time. And I might have needed to spend 4 to 5 times more experimental time to determine what the best method was. Of course, debugging will also be slower if you only have a huge data set.

In short, data scientists and machine learning practitioners spend the majority of their time as data janitors. I think this has been a well-known phenomenon for a long time. But now that machine learning has become a thing, there is more awareness [2]. I think this is a good thing, because it helps with scheduling and division of labor if you want to manage a group of engineers on a machine learning task.

[1] I might do a review at a certain point.
[2] e.g. This NYT article.

The Search Programmer

In every team of any serious ASR or NLP company, there has to be one person who is the "search guy". Not search as in search engine, but search as in searching in AI. The equivalent of a chess engine programmer for a chess program, or perhaps an engine specialist for race cars. Usually this person has three important roles:

  1. Program the engine,
  2. Add new features to the engine ,
  3. Maintain the engine through its life time.

This job is usually taken by someone with a title such as "Speech Scientist" or "Speech Engineer". They usually have blended skills in both programming and statistics. It's a tough job, but it's also a highly satisfying one, because the success of a company usually depends on whether features can be integrated quickly. That gives the "search guy" a mythical status even among data scientists - a search engineer needs to work effectively with two teams: one with a mostly research background in statistics and machine learning, the other with a mostly programming background, whose job is to churn out pseudocode, implementations and architecture diagrams daily.

I tend to think the power of "search guy" is both understated and overstated.

It's understated because there are many companies which only use other people's engines. So they can't quite get the edge that comes from customizing an engine. Those which use an open source implementation are better off, because they preserve the right to change the engine, which gives them leverage on intellectual property and trade secrets. Those who buy a commercial engine from a large company enjoy good performance for a few years, but then get squeezed by the huge price of upgrading and constrained by overly restrictive licenses.

(Shameless promotion here: Voci is an exception. We are very nice to our clients. Check us out here. 🙂 )

It's overstated because the skill of programming a search is nothing but a series of logical exercises. The pity is that programming a search algorithm, or a dynamic program (DP) in general, takes many kinds of expertise. The knowledge can only be found sporadically across different subjects. Some might learn the basics of DP from an algorithms book such as CLRS, but mere knowledge of programming doesn't give you insight into how to debug an issue in the search. You do need a solid understanding of the domain (such as POS tagging and speech recognition) and the theory (such as machine learning) to get the job done correctly.
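For readers who have not seen such a search, here is a minimal sketch of the textbook Viterbi algorithm for POS tagging with a toy HMM. It is nowhere near a production decoder (no beam, no lattice, no pruning), and the three-tag model and all probabilities below are invented for illustration. It assumes a non-empty sentence.

```python
import math

def viterbi(words, tags, log_start, log_trans, log_emit):
    """Return the most likely tag sequence for a non-empty `words` under a toy HMM.

    log_start[t], log_trans[(prev, t)] and log_emit[(t, word)] are
    log-probabilities; missing entries are treated as impossible (-inf).
    """
    neg_inf = float("-inf")
    # best[i][t]: best log-score of any tag path ending in tag t at position i
    best = [{t: log_start.get(t, neg_inf) + log_emit.get((t, words[0]), neg_inf)
             for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_trans.get((p, t), neg_inf)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_emit.get((t, words[i]), neg_inf)
            back[i][t] = prev
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

def logs(table):
    """Convert a table of probabilities into log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Invented toy model, for illustration only.
tags = ["DET", "NOUN", "VERB"]
log_start = logs({"DET": 0.8, "NOUN": 0.1, "VERB": 0.1})
log_trans = logs({("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
                  ("NOUN", "VERB"): 0.8, ("NOUN", "NOUN"): 0.2,
                  ("VERB", "DET"): 0.6, ("VERB", "NOUN"): 0.4})
log_emit = logs({("DET", "the"): 1.0,
                 ("NOUN", "dog"): 0.6, ("NOUN", "barks"): 0.4,
                 ("VERB", "barks"): 0.7, ("VERB", "dog"): 0.3})

best_path = viterbi(["the", "dog", "barks"], tags, log_start, log_trans, log_emit)
```

The logic itself is short. The hard part in practice is knowing what the scores should look like when something goes wrong, and that is where the domain knowledge comes in.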


Patterns in ASR Coding

Many toolkits in ASR appear in the form of Unix executables. But the nature of ASR tools is quite a bit different from general Unix tools. I will name 3 differences here:

  1. Complexity: A flexible toolkit also demands that developers have an external scripting framework. In SphinxTrain, the glue used to be Perl and is now Python. Kaldi, on the other hand, is mainly glued together by shell scripts. I heard Cambridge has its own tools to run experiments correctly.
  2. Running Time: The thing about coding ASR is that it takes a long time to verify whether something is correct. So there are things you can't do: a very agile, code-and-test type of development doesn't work well. I have seen people do it, but it leaves so many bugs in the codebase.
  3. Numerical Issues: Much of the coding in numerical algorithms can cause subtle changes in the results, and it is tricky to code these changes well. When these changes make their way into production, they are usually very hard to debug. When such changes affect performance, the result could be disastrous for you and your clients.

In a nutshell, we are dealing with a piece of software which is complex and mission-critical. The issue is how you continue to develop and maintain such software.

In this article, I will talk about how this kind of coding can be done right. You should notice that I don't favor a monolithic design of experimental tools, e.g. "why don't we just write one single tool that does everything (to train/to decode)?" There is a place for that mindset in software engineering, e.g. Mercurial is designed that way and I heard it is very competitive with Git. But I prefer a Unix-tool type of design which is close to HTK, Sphinx and Kaldi, i.e. you write many tools, each with a different purpose. You then simply glue them together for your own purpose. I will call all the code changes in these little Unix tools code-level changes, and changes at the scripting level script-level changes.

Many of these thoughts were taught to me by experienced people in the field. Some are applicable in other fields, such as Think Before Code and Conclude from your Tests. Others apply to machine-learning-specific problems: Match Results Numerically, Always Record Results.

Think Before Code

In our time, the agile development paradigm is very popular. Maybe too popular, in my view. Agile development is being deployed in too many places where I think it is inappropriate. ASR is one of them.

As a coder in ASR, you usually do two things: make code-level changes (in C/C++/Java) or script-level changes (in Perl/Python). In a nutshell, you are programming in a complex piece of software. Since testing can take a long time, a code-and-test paradigm won't work too well for you.

On the other hand, deliberate and slow thinking is your first line of defense against any potential issues. You should ask yourself a couple of questions before any change:

  1. Do you understand the purpose of each of the tools in your script?
  2. Do you understand the underlying principle of each tool?
  3. Do you understand the I/O?
  4. Would you expect your change to alter the I/O at all?
  5. For each tool, do you understand the code?
  6. What is your change?
  7. Where are your changes? How many things do you need to change? (10 files, 100 files? List them out.)
  8. In your head, after you make the change, do you expect your change to work? Why? Convince yourself.

These are some of the questions you should ask yourself. Granted, you don't have to have all the answers, but the more you know, the more you reduce potential future issues.

Conclude from your Tests, not from your Head

After all the thinking, are we done? No, you should still test your code; in fact you should test your code like a professional tester. Bombard your well-thought-out program with tests. Fix all compiler warnings, and run it under valgrind to fix leaks. If you don't fix a certain thing, make sure you have a very, very good reason, because any change in your decoder or trainer could have many ramifications for the upper layers of software, for you and for your colleagues.

The worst way to think about ASR coding is to say "it should work!". No. Sometimes it doesn't. You are too naive if you don't test the code.

Who makes such mistakes? It is hard to nail down. My observation is that it is those who always try to think through any problem in their head and have a strong conviction that they are right. They are usually fresh grads (all kinds, Bachelors? Masters? PhDs? They are everywhere.) or people who only work on research and haven't done much real-life coding. In a nutshell, it is a "philosophy" thing. Some people tend to think their a priori thoughts will work as is. This is 8th-century thinking. Always verify your changes with experiments.

Also, no one says testing always eliminates all problems. But if you think and test, the chances of making mistakes will be tremendously reduced.

Scale It Down

The issue with a large amount of testing in ASR is that it takes a long time. So what should you do?

Scale it down.

e.g. Suppose you have a 1000-utterance test and you want to reduce the testing time. Make it a 100-utterance test, or even 10. That allows you to verify your change quickly.

e.g. If you have an issue that appears in a 1-minute utterance, try to see if you can reproduce the same issue on a 6-second one.

e.g. If you are trying a procedure on 1000 hours of data, try testing it with 100 hours first.

These are just some examples.  This is a very important paradigm because it allows you to move on with your work faster.
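As a rough sketch of the first example, a scaled-down test list can be drawn reproducibly from the full one. The function name and the utterance IDs here are hypothetical; the only point is that a fixed seed gives you the same small test every time, so results stay comparable between runs.

```python
import random

def scale_down(utterances, n, seed=0):
    """Pick a reproducible random subset of n utterances, preserving the original order."""
    if n >= len(utterances):
        return list(utterances)
    chosen = set(random.Random(seed).sample(range(len(utterances)), n))
    return [u for i, u in enumerate(utterances) if i in chosen]

full_test = [f"utt{i:04d}" for i in range(1000)]  # hypothetical utterance IDs
quick_test = scale_down(full_test, 100)
smoke_test = scale_down(full_test, 10)
```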

Match Results Numerically

If you make an innocuous change but the results are slightly different, you should be very worried.

The first question you should ask is "How can this happen at all?" For example, if you add a command-line option, your decoding results shouldn't change.

Are there any implicit or explicit random number generators in the code? Or have you accidentally taken in user input? Otherwise, how could your innocuous change cause changes in the results?

Be wary of anyone who says "It is just a small change. Who cares? The results won't change." No, always question the size of the changes. If there is any difference, ask how many significant digits match. If you can, try to learn more about the intrinsic error introduced by floating-point calculation. (e.g. "What Every Computer Scientist Should Know About Floating-Point Arithmetic" is a good start.)
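One simple way to put a number on "how many significant digits match" is to look at the relative difference between the old and new scores. The sketch below is only one possible definition, and the function names and the 6-digit threshold are my own choices for illustration.

```python
import math

def matching_digits(a, b):
    """Approximate number of leading significant decimal digits on which a and b agree."""
    if a == b:
        return float("inf")
    rel_err = abs(a - b) / max(abs(a), abs(b))
    return max(0.0, -math.log10(rel_err))

def compare_scores(old, new, min_digits=6):
    """Return (index, old, new, digits) for every pair agreeing on fewer than min_digits digits."""
    return [(i, x, y, matching_digits(x, y))
            for i, (x, y) in enumerate(zip(old, new))
            if matching_digits(x, y) < min_digits]

# Hypothetical per-utterance scores before and after a supposedly innocuous change.
old_scores = [-1234.5678, -987.6543]
new_scores = [-1234.5678, -987.6600]
suspicious = compare_scores(old_scores, new_scores)
```

Here the second utterance is flagged because its scores agree to only about 5 digits, which is exactly the kind of difference you should be able to explain.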

There is an opposing thought, i.e. it should be okay to have some numerical changes. I don't really buy it, because once you allow yourself to drift 0.1% 10 times, you will have a 1% drift which can't be explained. The only time you should let it go is when you encounter randomness you can't control. Even in those cases, you should still explain why your performance would change.
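The arithmetic is easy to check. Assuming each change shifts the result by the same relative amount in the same direction (a worst case, and my assumption for this sketch), ten 0.1% drifts compound to slightly more than 1%:

```python
def compounded_drift(per_change, n_changes):
    """Total relative drift after n_changes changes that each shift the result by per_change."""
    return (1.0 + per_change) ** n_changes - 1.0

total = compounded_drift(0.001, 10)
print(f"{total:.4%}")  # prints 1.0045%
```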

Predict before Change

Do you expect your changes would give better results?  Or worse results?  Can you explain to yourself why your change could be good/bad?

In terms of results, we are talking about mainly 3 things: word error rate, speed and memory usage.
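For reference, word error rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. Below is a minimal sketch of that standard definition; it splits on whitespace and does no text normalization, which real scoring tools usually do.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)

# One substitution (sat -> sit) and one deletion (the) against 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```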

Setup an Experimental Framework

If you are at all serious about ML or ASR, you will have tested your code many times. If you have tested your code many times, you will realize you can't keep track of all your experiments in your head. You need a system.

I have written an article in V1 about this subject. In a nutshell, make sure you can repeat/copy/record all your experimental details, including binary versions and parameters.
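As a minimal sketch of what "record all your experimental details" could look like, the snippet below writes one JSON file per experiment with a hash of each binary, the parameters and the results. The function names and the record layout are hypothetical, not the framework from the V1 article.

```python
import hashlib
import json
import time
from pathlib import Path

def file_sha256(path):
    """Hash a binary so the exact build used in an experiment can be identified later."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def record_experiment(out_dir, name, binaries, params, results):
    """Write one JSON record per experiment: binary hashes, parameters, results."""
    record = {
        "name": name,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "binaries": {str(p): file_sha256(p) for p in binaries},
        "params": params,
        "results": results,
    }
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{name}.json"
    out_path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return out_path
```

You would call `record_experiment` at the end of each run with the paths of the decoder and trainer binaries you actually used. Hashing the binaries is a cheap substitute for a version string when the build does not embed one.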

Record your Work

Given the complexity of your work, you should make sure you keep enough documentation. Here are some ideas:

  1. Version Control System: for your code
  2. Bug tracking: for your bugs and feature requests
  3. Planning document: for what you need to do in a certain task
  4. Progress Note: record on a daily basis what you have done/learned experimentally.

Yes, you should have many records by now. If you don't have any, I feel worried about you. Chances are some important experimental details have been forgotten. Or if you don't see what you are doing as an experiment...... Whoa. I wonder how you explain what you do to other people.


That's what I have today. This article summarizes many important concepts on how to maximize your chances of success when making any coding change. Some of these are habits which take time to set up and get used to. But from my experience, these habits are invaluable. I found myself writing features which have fewer problems. Or at least when there are problems, they are problems I hadn't anticipated and couldn't have.







Thursday Links (FizzBuzz programming, Samsung, Amazon and more)


Placebo Surgery : Still think acupuncture is a thing?

Expertise, the Death of Fun, and What to Do About It by James Hague

Indeed, it gets hard to learn. My two cents: always keep notes on your work. See every mistake as an opportunity to learn. And always learn new things, never stop.

FizzBuzz programming (2007)

It's sad that it is true.

Technology in general:

Samsung smartwatch product

I still look forward to Apple's product more. I guess because I was there when the iPhone came out, it's rather hard not to say Samsung plagiarizes...

The Economics of Amazon Prime (link)

When I go to Amazon, using Prime has indeed become an option, especially for the thousands of ebooks which cost less than $2.99. Buying ten of them comes very close to the monthly subscription fee of Amazon Prime.

Starbucks and Square don't seem to "mix" well (link)

Other newsworthy:

As Crop Prices Surge, Investment Firms and Farmers Vie for Land

Crop prices have reversed course. If you are interested in the restaurant business (like me), this has a huge impact on the whole food chain.

The many failures of the personal finance industry

Many geeky friends of mine don't have good sense when it comes to personal finance. This is a good link for understanding the industry.


Me and CMU Sphinx

As I update this blog more frequently, I have noticed more and more people being directed here. Naturally, there are many questions about some of my past work. For example, "Are you still answering questions in the CMUSphinx forum?" and general requests for certain tutorials. So I guess it is time to clarify my current position and what I plan to do in the future.

Yes, I am planning to work on Sphinx again, but no, I probably don't want to be a maintainer-at-large any more. Nick has proven himself to be the most awesome maintainer in our history. Under his stewardship, Sphinx has prospered in the last couple of years. That's what I hoped for and that's what we all hoped for.

So for that reason, you probably won't see me much in the forum answering questions. Rather, I will spend most of my time implementing, experimenting and getting some work done.

There are many things that ought to be done in Sphinx. Here is my top 5 list:

  1. Sphinx 4 maintenance and refactoring
  2. PocketSphinx maintenance
  3. HTKBook-like documentation, i.e. Hieroglyphs.
  4. Regression tests on all tools in SphinxTrain.
  5. In general, modernization of the Sphinx software, such as using a WFST-based approach.

This is not a small undertaking, so I am planning to spend a lot of time relearning the software. Yes, you heard it right. Learning the software. In general, I found myself very ignorant of a lot of the software details of Sphinx as of 2012. There have been many changes. The parts I have really caught up on are probably sphinxbase, sphinx3 and SphinxTrain. On PocketSphinx and Sphinx4, I need to learn a lot.

That is why in this blog you will see a lot of posts about my progress in learning a certain piece of speech recognition software. Some could be minute details. I share them because people can figure out a lot by following my progress. From time to time, I will also pull these posts together to form a tutorial post.

Before I leave, let me digress and talk about this blog a little bit: other than posts on speech recognition, I will also post a lot about programming, languages and other technology-related stuff. Part of it is that I am interested in many things. The other part is that I feel working on speech recognition actually requires one to understand a lot about programming and languages. This might also attract a wider audience in the future.

In any case, I hope I can keep it up. And I hope you enjoy my articles!