
How To Get Better At X (X = Programming, Math, etc.)

Here are some of my reflections on how to improve at work.

So how would you get better at X?

X = Programming

  • Trace the code of smart programmers and learn their tricks,
  • Learn how to navigate a codebase using your favorite editor,
  • Learn algorithms better, learn math better,
  • Join an open source project: first contribute, then see if you can maintain it,
  • Always be open to learning a new language.

X = Machine Learning

X = Reading Literature

  • Read every day; make it a habit.
  • Browse the arXiv summaries as if they mattered more than the daily news.
  • Ask questions on social networks such as Google Plus or Twitter, and listen to other people,
  • Teach people a concept; it forces you to consolidate your thoughts and helps you realize what you don't really know.

X = Unix Administration

  • Google is your friend.
  • Listen to experienced administrators; their perspective can be very different, e.g. admins usually care about security more than you do.  Listen to them and think about whether your solution incorporates their concerns.
  • Every time you solve a problem, put it in a notebook.  (Something which Tadashi Yonezaki at Scanscout taught me.)

X = Code Maintenance

  • Understand the build process, and see learning it intimately as part of your job,
  • Learn multiple build systems: autoconf, CMake, Bazel.  Knowing them lets you compile a codebase, and eventually really hack it.
  • Learn version control; learn Git.  Don't say you don't need it; going without it only slows you down.
  • Learn multiple version control systems: CVS, SVN, Mercurial and Git.  Learn why some of them are bad (CVS) and some are better but still bad (SVN).
  • Send out an email whenever you make a release, and make sure you communicate clearly what you plan to do.

X = Math/Theory

  • Focus on one topic.  For example, I am very interested in machine learning these days, so I am reading Bishop.
  • Don't be cheap; buy the bibles of the field.  Get Thomas Cover if you are studying information theory.  Read Serge Lang on linear algebra.
  • Solve one problem a day, maybe more if you are bored and sick of lifting dumbbells.
  • Re-read the formulation of a method.  Re-read a proof.  Look up the different ways people formulate and prove the same thing.
  • To paraphrase Ian Stewart: you always look silly in front of your supervisor.  But remember that once you have made it to graduate level, you cannot be that stupid.  So what learning math/theory takes is gumption and perseverance.

X = Business

  • Business has its own mechanisms, so don't dismiss it as fluffy before you learn the details,
  • Listen to your BD, sales and marketing friends.  They are important colleagues and friends.

X = Communication

  • Stand in other people's shoes, that is to say: be empathetic,
  • I think it was Atwood who said (paraphrasing): it's easy to be empathetic toward people in need, but it's difficult to be empathetic toward annoying and difficult people.  Ask yourself these questions:
    • Why would a person become difficult and annoying in the first place?  Do they have a reason?
    • Are you big enough to help these difficult and annoying people, even if they could be toxic?
  • That said, communication is a two-way street, and there are indeed hopeless situations.  Take them in stride, and spend your time helping friends/colleagues who are in need.

X = Anything

Learning is a life-long process, so be humble and ready to be humbled.





Learning Machine Learning - Some Personal Experience


Some context: a good friend of mine, Waikit Lau, started a Facebook group called "Deep Learning".  It is a gathering place for deep learning enthusiasts around the globe, and so far it is almost 400 members strong.  Waikit kindly gave me admin rights to the group; since then I have been able to interact with all the members, and have had a lot of fun.

When asked "Which topics would you like to see in 'Deep Learning'?", surprisingly enough, most members wanted to see more on "Learning Deep Learning".  So I decided to write a post summarizing my own experience of learning deep learning, and machine learning in general.

My Background

Not everyone could predict the advent of deep learning, and neither could I.  I was trained as a specialist in automatic speech recognition (ASR), with half of my time focused on research (at HKUST, CMU, BBN) and the other half on implementation (Speechworks, CMUSphinx).  That is reflected in my current role, Principal Speech Architect, where my research-to-implementation split is around 50-50.  If you are being nice to me, you can say I was quite familiar with standard modeling in speech recognition, with passable programming skills.  Perhaps what I gained from ASR is more an understanding of languages and linguistics, which I would describe as cool party tricks.  But real-life speech recognition uses very little linguistics [1].

To be frank though, while ASR used a lot of machine learning techniques such as GMMs, HMMs and n-grams, my skills in general machine learning were clearly lacking.  For a while, I didn't have an acute sense of dangerous issues such as over- and under-fitting, nor was I able to foresee the rise of deep neural networks in so many different fields.  So when my colleagues started to tell me, "Arthur, you've got to check out this Microsoft work using deep neural networks!", I was mostly suspicious and couldn't really fathom its importance.  Obviously I was too specialized in ASR: if I had ever given deeper thought to the "universal approximation theorem", the rise of DNNs would have made a lot of sense to me.  I can only blame myself for my ignorance.

That is a long digression.  Long story short: about 4 years ago I woke up and said "screw it!"  I decided to "empty my cup" and learn again: to learn everything I could about neural networks, and about machine learning in general.  This article is about some of the lessons I learned.

Learning The Jargons

If you are an absolute beginner, the best way to start is to take a good online class.  For example, Andrew Ng's machine learning class (my review) would be a very good place to start, because Ng's class is generally known to be gentle to beginners.

Ideally you want to finish the whole course; from there you will have some basic understanding of what you are doing.  For example, you want to know that "Oh, if I want to make a classifier, I need a training set and a test set, and it's absolutely wrong for them to be the same."  Now this is a rather deep thought, and there are actually people I know who take a shortcut and use the training set as the test set.  (Bear in mind, they or their loved ones suffer eventually. 🙂 )  If you don't know anything about machine learning, learning how to set up data sets is the absolute minimum you want to learn.
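To make the training/test distinction concrete, here is a minimal Python sketch of holding out a test set. The helper name `train_test_split`, the 20% test fraction and the toy data are illustrative assumptions of ours; toolkits such as scikit-learn ship their own version of this utility.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out a disjoint test set.

    Hypothetical helper for illustration; the test set is never used for training.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(examples)               # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train_set, test_set)

# Hypothetical toy data: (feature, label) pairs.
data = [(x, int(x > 50)) for x in range(100)]
train, test = train_test_split(data)
print(len(train), len(test))          # 80 20
assert not set(train) & set(test)     # no example is in both sets
```

You fit on `train` and report numbers on `test`; reusing the training set as the test set only rewards memorization.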

You would also want to know some basic machine learning methods such as linear regression, logistic regression and decision trees.  Most methods you will use in practice use these techniques as building blocks.  E.g. if you don't really know logistic regression, understanding neural networks would be much tougher.  If you don't understand linear classifiers, understanding support vector machines would be tough too.  If you have no idea what a decision tree is, no doubt you will be confused by random forests.

Learning basic classifiers also equips you with an intuitive understanding of core algorithms, e.g. you will need to know stochastic gradient descent (SGD) for many things you do with DNNs.
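As an illustration of how SGD and logistic regression fit together, here is a small pure-Python sketch that trains a logistic regression classifier with per-example SGD on the log loss. The learning rate, epoch count and toy data are arbitrary assumptions for the example, not tuned values. A single logistic unit like this is also the basic building block of a neural network layer.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sgd_logistic_regression(samples, lr=0.1, epochs=500, seed=0):
    """Fit weights w and bias b by SGD on the log loss.

    samples: list of (feature_vector, label) pairs with label in {0, 1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)                      # visit examples in random order
        for i in order:
            x, y = samples[i]
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            error = p - y                       # d(log loss)/d(logit) for one example
            w = [wj - lr * error * xj for wj, xj in zip(w, x)]
            b -= lr * error
    return w, b

def predict(w, b, x):
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Hypothetical, linearly separable toy data.
samples = [((0.0, 0.0), 0), ((0.5, 0.0), 0), ((0.0, 0.5), 0),
           ((2.0, 2.0), 1), ((2.5, 2.0), 1), ((2.0, 2.5), 1)]
w, b = sgd_logistic_regression(samples)
print([predict(w, b, x) for x, _ in samples])
```

The update `w -= lr * (p - y) * x` is the same rule you will meet again when back-propagating through the output layer of a DNN.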

Once you have gone through your first class, there are two things you want to do: one is to actually work on a machine learning problem, the other is to learn more about specific techniques.  So let me split them into two sections:

How To Work On Actual Machine Learning Problems

Where Are The Problems?

If you are still in school and specialize in machine learning, chances are you are funded by an agency, so more than likely you already have a task.  My suggestion is to learn your own problem as deeply as you can, and make sure you master all the latest techniques first, because that will help your daily job and your career.

On the other hand, what if you did not major in machine learning?  For example, what if you were an experienced programmer in the first place, and are now shifting your attention to machine learning?  The simple answer is Kaggle.  Kaggle is a multi-purpose venue where you can learn and compete in machine learning.  You can also start with basic tasks such as MNIST or CIFAR-10 to hone your skills first.

Another good source of basic machine learning tasks is the tutorials of machine learning toolkits.  For example, Theano's tutorial was my first taste of MNIST; from there I also followed the tutorials to train an IMDB sentiment classifier as well as a polyphonic music generator.

My only criticism of Kaggle is that it lacks the most challenging problems you can find in the field.  E.g. at the time when ImageNet was not yet solved, I hoped a large-scale computer vision competition would be held at Kaggle.  And now that machine reading is the most acute problem, I hope there will be tasks which everyone in the world would try to tackle.

If you share my concerns, then consider other evaluation sources.  In your field, there has got to be a competition or two held every year.  Join them, and make sure you gain experience from these competitions.  I think it is by far the fastest way to learn.

Practical Matter 1 - Linux Skills

For the most part, what I have found tripping up many beginners is Linux skills, especially software installation.  For that I would recommend using Ubuntu: a lot of machine learning software can be installed with a simple apt-get.  If you are into Python, try out Anaconda Python, because it will save you a lot of time in software installation.

Also remember that Google is your friend.  Before you get frustrated by a glitch and give up, always turn to Google and paste your error message to see if you can find an answer.  Ask on forums if you still can't resolve your issue.  Remember, working on machine learning requires certain problem-solving skills, so don't be deterred by small things.

Oh, you ask what if you are using Windows?  Nah, switch to Linux; the majority of machine learning tools run on Linux anyway.  Many people would also recommend Docker.  So far I have heard both good and bad things about it, so I can't say whether I like it or not.

Practical Matter 2 - Machines

Another showstopper for many people is compute.  I will say, though, that if you are a learner, the computational requirement can be just a simple dual-core desktop with no GPU card.  Remember, a lot of powerful machine learning tools were developed before GPU cards became trendy.  E.g. libsvm is mostly CPU-based software, and all of Theano's tutorials can be completed within a week on a decent CPU-only machine.  (I know because I did that before.)

On the other hand, if you have to do a moderate-size task, then you should buy a decent GPU card.  A GTX 980 would be a choice consumer card; for a better-supported workstation-grade card, the Quadro series would be nice.  Of course, if you can come up with 5k, then go for a Tesla K40 or K80.  The GPU card you use directly affects your productivity.  If you know how to build a computer, consider building one yourself.  Tim Dettmers has a couple of articles (e.g. here) on how to build a decent machine for deep learning.  Though you might never reach the performance of an 8-GPU monster, you will be able to test all standard techniques, including DNN, CNN and LSTM, with pleasure.

Once You Have a Taste

For the most part, your first few tasks will teach you quite a lot about machine learning.  The next problem you will encounter is how to progressively improve your classifier's performance.  I will address that next.

How To Learn Different Machine Learning Methods

As you might already know, there are many ways to learn machine learning.  Some people approach it mathematically and try to come up with an analysis of how a machine learning technique works.  That's what you will learn in school, say in a 2-3 year master's program or the first 3-4 years of a PhD program.

I don't think there is anything wrong with that type of learning.  But machine learning is also a discipline which requires real-life experimental data to confirm your theoretical knowledge, so an overly theoretical approach can sometimes hurt your learning.  That said, you will need both practical and theoretical understanding to work well in practice.

So what should you do?  I would say machine learning should be learned through 3 aspects:

  1. Running the Program,
  2. Hacking the Source Code,
  3. Learning the Math (i.e. Theory).

Running the Program - A Thinking Man's Guide

In my view, by far the most important skill in machine learning is being able to run a given technique.  Why?  Isn't the theory important too?  Why don't we first derive an algorithm from first principles, and then write our own program?

In practice, I found that a top-down approach, i.e. going from theory to implementation, can work.  But most of the time, you will easily pigeonhole yourself into a certain technique and fail to see the big picture of the field.

Another flaw of the top-down approach is that it assumes you can understand more from the principles alone.  In practice, you might need to deal with multiple types of classifiers at work, and it's hard to understand all their principles in a timely manner.  Besides, practical experience of running a technique teaches you aspects of it that the theory doesn't.  For example, have you run libsvm on a million data points, each a vector of dimension one thousand?  Then you will notice that the choice of algorithm for finding the support vectors makes a huge difference.  You will also appreciate why many practitioners from big companies suggest that beginners learn random forests early: in practice, a random forest is the faster and more scalable solution.

Let me sort of bite my tongue: while this stage is meant to be practice, you should try very hard to feel and understand each technique.  If you are new, this is also the stage where you should ask whether general principles such as bias vs. variance hold in your domain.
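One cheap way to check bias vs. variance on your own problem is to vary model capacity and compare training and test error.  The sketch below uses NumPy's `polyfit`; the sine-plus-noise data, the sample sizes and the degrees 1, 3 and 9 are illustrative assumptions.  Typically the low degree underfits (high bias: both errors high) and the high degree overfits (high variance: low training error, higher test error), but the exact numbers depend on the random draw.

```python
import numpy as np

def make_data(rng, n):
    """Hypothetical target: a sine wave plus Gaussian noise."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

def train_and_test_mse(degree, x_train, y_train, x_test, y_test):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

rng = np.random.default_rng(0)
x_train, y_train = make_data(rng, 15)   # a small training set makes overfitting visible
x_test, y_test = make_data(rng, 500)

results = {}
for degree in (1, 3, 9):
    results[degree] = train_and_test_mse(degree, x_train, y_train, x_test, y_test)
    train_mse, test_mse = results[degree]
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Training error can only go down as the degree grows, because each model contains the smaller ones; the test column is the one that tells you where your domain sits on the bias/variance curve.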

What mistakes can beginners make while using a technique?  I think the biggest is running certain things without thinking about why, which is detrimental to your career.  For example, many people read a paper, pick up all the techniques the author used, then rush to rerun all those experiments themselves.  While this is usually what people do in evaluations/competitions, it is a big mistake in a real industrial scenario.  You should always think about whether a technique would work for you: "Is it accurate but too slow?", "Is its performance good but does it take up too much memory?", "Is there a good integration route which fits our existing codebase?"  Those are all tough questions you should answer in practice.

I hope you get the impression from me that being practical in machine learning requires a lot of thinking too.  Only when you master this aspect are you ready to take on the more difficult parts of the work, i.e. changing the code, the algorithm and even the theory itself.

Hacking the Source Code

I believe the more difficult task, after you successfully run an experiment, is to change the algorithm itself.  Mastery of using a program ties mostly to your general Linux skills, whereas mastery of the source code ties to your coding skills in lower-level languages such as C/C++/Java.

Making changes to the source code work requires the ability to read and understand a code base, a valuable skill in practice.  Reading a code base requires a more specialized type of reading: you want to keep notes on each source file, and make sure you understand each of the function calls, which could go many levels deep.  gdb is your friend, and your reading sessions should be based on both gdb and eyeballing the source code: set conditional breakpoints and display important variables.  These are the tricks.  At the end, make sure you can spell out the big picture of the program: What does it do?  What algorithm does it implement?  Where are the important source files?  And more importantly, if I were the one who wrote the program, how would I write it?

What I have said so far applies to all types of programs.  For machine learning, this is the stage where you should focus on just the algorithm, e.g. you can easily implement SGD for linear regression without understanding the math.  So why would you want to decouple the math from the process?  The reason is that there are always multiple implementations of the same technique, and each implementation can be based on slightly different theories.  Once again, chasing down the theory would take you too much time.
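For instance, the SGD update for linear regression needs nothing beyond "nudge the weights against the prediction error".  The NumPy sketch below assumes a squared-error loss, a fixed learning rate of 0.01 and synthetic data generated from known weights (all our own choices), so you can check that SGD recovers them.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=200, seed=0):
    """Minimize squared error with plain per-example SGD.

    X: (n, d) feature matrix; y: (n,) targets.  Returns weights w and bias b.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):          # one pass over the data in random order
            residual = X[i] @ w + b - y[i]    # prediction error on one example
            w -= lr * residual * X[i]         # gradient of 0.5 * residual**2 w.r.t. w
            b -= lr * residual                # ... and w.r.t. b
    return w, b

# Hypothetical data from y = 3*x0 - 2*x1 + 1 plus a little noise.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + data_rng.normal(scale=0.1, size=200)
w, b = sgd_linear_regression(X, y)
print(w, b)  # should land close to [3, -2] and 1
```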

And do not underestimate the work required to learn the Math behind even the simplest technique in the field.  Consider just linear regression, and how people have thought about it 1) as optimizing the squared loss and 2) as a maximum likelihood problem [2]; then you will notice it is not as simple a topic as the one you learned in Ng's class.  While I love the Math, would not knowing the Math affect your daily work?  Not in most circumstances.  On the other hand, there will be situations where you want to focus just on implementation.  That's why decoupling theory and practice is good thinking.
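To see why those two views of linear regression coincide, assume (as the maximum-likelihood treatment in textbooks like PRML does) that each target is a linear function of the input plus Gaussian noise with a fixed variance σ².  Under that assumption, the log-likelihood is, up to constants, the negative squared loss:

```latex
y_n = \mathbf{w}^\top \mathbf{x}_n + \epsilon_n, \qquad \epsilon_n \sim \mathcal{N}(0, \sigma^2)

\ln p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \sigma^2)
  = -\frac{N}{2}\ln(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{n=1}^{N}\left(y_n - \mathbf{w}^\top \mathbf{x}_n\right)^2

\hat{\mathbf{w}}_{\mathrm{ML}}
  = \arg\max_{\mathbf{w}} \ln p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \sigma^2)
  = \arg\min_{\mathbf{w}} \sum_{n=1}^{N}\left(y_n - \mathbf{w}^\top \mathbf{x}_n\right)^2
```

The first term does not depend on w, and 1/(2σ²) is a positive constant, so maximizing the likelihood and minimizing the squared loss pick the same weights.  Change the noise assumption and the two views part ways, which is part of why the topic runs deeper than it first looks.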

Learning The Math and The Theory

That brings us to our final stage of learning: the theory of machine learning.  Man, this is such a tough thing to learn, and I don't really do it well myself.  But I can share some of my experience with you.

First things first: as I am an advocate of bottom-up learning in machine learning, why would we want to learn any theory at all?

In my view, there are several uses of theory:

  1. Simplify your practice: e.g. knowing the direct (closed-form) solution of linear regression would save you a lot of typing compared with implementing it using SGD.
  2. Identify BS: e.g. you have a data set with two classes with priors 0.999:0.001, and your colleague has created a classifier with 99.8% accuracy and decides he has done his job, even though always predicting the majority class would already score 99.9%.
  3. Identify redundant ideas: someone in marketing and sales asks why we can't create more data by squaring every element of each data point.  You should know how to answer: "That is just polynomial regression."
  4. Have fun with theory and the underlying mathematics,
  5. Think of new ideas,
  6. Brag in front of your colleagues and show how smart you are.

(There is no 6.  Don't try to understand theory because you want to brag.  And for that matter, stop bragging.)
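The first three uses in the list can each be checked in a few lines.  The NumPy sketch below is illustrative only: the helper names `fit_linear_direct` and `majority_baseline_accuracy` and the synthetic data are our own assumptions.  It solves linear regression directly with `np.linalg.lstsq` (no learning rate or epochs to tune), computes the majority-class baseline for the 0.999:0.001 example, and shows that squaring the features and fitting a linear model gives the same fit as a degree-2 `np.polyfit`.

```python
import numpy as np

# Use 1: the direct least-squares solution, no learning rate or epochs needed.
def fit_linear_direct(X, y):
    X1 = np.column_stack([X, np.ones(len(X))])      # append a bias column
    coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coeffs[:-1], coeffs[-1]                  # (weights, bias)

# Use 2: always predicting the majority class is the baseline to beat.
def majority_baseline_accuracy(labels):
    labels = np.asarray(labels)
    majority = np.bincount(labels).argmax()
    return float(np.mean(labels == majority))

labels = np.array([0] * 999 + [1])                  # priors 0.999 : 0.001
baseline = majority_baseline_accuracy(labels)
print(baseline)  # 0.999, already above the colleague's 99.8%

# Use 3: adding squared features to a linear model is polynomial regression.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 50)
y = 0.5 * x**2 - x + 2.0 + rng.normal(scale=0.05, size=50)  # hypothetical data
w, b = fit_linear_direct(np.column_stack([x, x**2]), y)
poly = np.polyfit(x, y, 2)                          # highest power first: [a2, a1, a0]
print(np.allclose([w[1], w[0], b], poly))  # True: same model, same fit
```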

So now we have established that theory can be useful.  How do you learn it?  By far, I think the most important means are listening to good lectures, reading papers, and actually doing the math.

With lectures, your goal is to gather insights from experienced people.  So I would recommend Ng's class as the first class, then Hinton's Neural Networks for Machine Learning.  I have also heard that Koller's class on Graphical Models is good.  If you understand Mandarin, H. T. Lin's classes on support vector machines are perhaps the best.

On papers, subscribe today: get an RSS feed for yourself, and read at least the headlines daily to learn what's new.  That's where I first learned many of the important concepts of the last few years: LSTM, LSTM with attention, highway networks, etc.

If you are new, check out the "Awesome" resources, like Awesome Deep Learning; that's where you will find all the basic papers to read.

Eventually you will find that just listening to lectures and reading papers doesn't explain enough; this is the moment you should go to the "Bible".  When I say Bible, I am really talking about the 7-8 textbooks which are known to be good in the field.

If you have to start with one book, consider either Pattern Classification by Duda and Hart or Pattern Recognition and Machine Learning (PRML) by C. M. Bishop.  (Those are also the only ones I have read deeply.)  In my view, the former is suitable for 3rd-year undergraduates or graduate students.  It has many computer exercises, so you will enjoy both the math problem solving and the programming.  PRML is more for advanced graduate students, like PhD students.  PRML is known to be more Bayesian; in a way, it's more modern.

And do the Math, especially in the first few chapters, where you may be frustrated by the more advanced calculus problems.  Note, though, that the exercises in both Duda and Hart and PRML are guided.  Try to spread this kind of Math exercise out over time; for example, I try to spend 20-30 minutes a day tackling one problem in PRML.  Write down all of your solutions and attempts in a notebook.  You will benefit greatly from this effort: you will gain valuable insights into different techniques, their theory, their motivations, their implementations as well as their notable variants.

Finally, if you have a tough time with the Math, don't stay stuck on the same problem.  If you can't solve a problem after a week, look it up on Google, or go to a standard text such as Solved Problems in Analysis.  There is no shame in looking up the answers once you have tried.


No one can hit the ground running and train Google's "convolutional LSTM" on 80,000 hours of data in one day.  Nor can anyone easily come up with very smart ideas such as using multiplicative gates in an RNN (i.e. LSTM), using attention to do sequence-to-sequence learning, or reformulating neural networks so that very deep ones are trainable.  It is hard to understand the fundamentals of concepts such as LSTM or CNN, let alone innovate on them.

But you've got to start somewhere.  In this article I have told you my story of how I started and restarted this learning process.  I hope you can join me in learning.  Just like all of you, I am looking forward to seeing what deep learning will bring to humanity.  And rest assured, you and I will enjoy the future more because we understand more of what goes on behind the scenes.

You might also like Learning Deep Learning - My Top Five List.



[1] As Fred Jelinek said, "Every time I fire a linguist, the performance of our speech recognition system goes up."

Some Speculations On Why Microsoft Tay Collapsed

Microsoft's Tay, following Google's AlphaGo, was meant to be yet another highly intelligent A.I. program fulfilling humanity's long-standing dream: a machine which can truly converse.  But as you know, Tay failed spectacularly.  To me, this is a highly unusual event.  Part of the reason is that another Microsoft conversational agent, Xiaoice, was extremely successful in China.  The other part is that MSR is one of the leading sites using deep learning on various machine learning problems.  You would think that a major P.R. problem such as Tay endorsing "Donald Trump is the hope", and purportedly supporting genocide, would have been weeded out before launch.

I read many posts in the past week attempting to explain why Tay failed; sadly, they offered me no insights.  Some were even written for respected magazines, e.g. in The New Yorker's "I’ve Seen the Greatest A.I. Minds of My Generation Destroyed by Twitter", the author concluded at the end:

"If there is a lesson to be learned, it is that consciousness wants conscience. Most consumer-tech companies have, at one time or another, launched a product before it was ready, or thought that it was equipped to do something that it ended up failing at dismally. "

While I always love The New Yorker's prose, there is really no machine which can mimic or model human consciousness (yet).  In fact, no one really knows how "consciousness" works; it's even tough to define what "consciousness" is.  It's also worth mentioning that chatbot technology is not new.  Google had released similar technology and got great press.  (See here.)  So The New Yorker's piece reflects how little the public understands the technology.

As a result, I decided to write a postmortem of Tay myself, and offer some thoughts on why this problem could occur and how one could actively avoid such problems.

Since I am writing this piece for a general audience (say, my Facebook friends), it contains only a small amount of technicality.  If you are interested, I also list several more technical articles in the reference section.

How does a Chatbot work?  The Pre-Deep Learning Version

By now, all of us have used a chatbot or two.  There is obviously Siri, which is perhaps the first program that put speech recognition and dialogue systems in the national spotlight.  If you are familiar with the history of computing, you probably know ELIZA [1], one of the first examples of using a rule-based approach to respond to users.

What does that mean?  In such a system, a natural language parser is usually used to parse the human's input, and then an answer is produced by pre-defined, mostly hand-written rules.  It's a simple approach, but when it's done correctly, it creates an illusion of intelligence.
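To make "pre-defined rules" concrete, here is a toy responder in the spirit of ELIZA, sketched in Python with the standard `re` module.  The rules, the pronoun-reflection table and the default reply are made-up examples, not ELIZA's actual script; real rule-based bots need hundreds of such rules.

```python
import re

# Hypothetical hand-written rules: (pattern, response template), tried in order.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\b(mother|father)\b", re.IGNORECASE), "Tell me more about your family."),
]
DEFAULT_RESPONSE = "Please go on."

# Swap first and second person so echoed fragments read naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}

def reflect(fragment):
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())

def respond(user_input):
    """Return the response of the first rule whose pattern matches the input."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*(reflect(group) for group in match.groups()))
    return DEFAULT_RESPONSE

print(respond("I am tired of my job."))  # Why do you say you are tired of your job?
print(respond("Nice weather today."))    # Please go on.
```

Every new topic needs another hand-written rule, and anything outside the rules falls through to the same canned reply.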

The rule-based approach can go quite far.  E.g. ALICE, with its AIML language, is a pretty popular tool for creating intelligent-sounding bots.  (See its history here.)  There are many existing tools which help programmers create dialogues, and programmers can also import existing dialogues into their own systems.

The problem with the rule-based approach is obvious: the responses are rigid.  So if someone uses the system for a while, they will easily notice they are talking with a machine.  In a way, you can say the illusion is easily dispelled by human observation.

Another issue with the rule-based approach is that producing a large-scale chatbot taxes programmers.  Even with convenient languages such as AIML (Artificial Intelligence Markup Language, the language behind ALICE), it would take a programmer a long, long time to come up with a chatbot, let alone one which can answer a wide variety of questions.

Converser as a Translator

Before we go on to look at chatbots in the time of deep learning, it is important to ask how we can model conversation.  Of course, you can think of it as ... well ... we first parse the sentence and generate entities and their grammatical relationships; then, based on those relationships, we come up with an answer.

This approach of decomposing a sentence into its elements is very natural to human beings.  In a way, this is also how the rule-based approach arose in the first place.  But we just discussed the weakness of the rule-based approach, namely, it is hard to program and to generalize.

So here is a more convenient way to think about it.  You could simply ask, "Hey, now I have an input sentence, what is the best response?"  It turns out this is very similar to the formulation of statistical machine translation: "If I have an English sentence, what would be the best French translation?"  As it turns out, a converser can be built with the same principles and technology as a translator, so all the powerful technology developed for statistical machine translation (SMT) can be used to make a conversation bot.  This technology includes the IBM models, phrase-based models and syntax-based models [2], and the training is very similar.
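In symbols, borrowing the noisy-channel notation common in SMT (our notation: s is the input sentence and r a candidate response), the bot looks for the most probable response.  Bayes' rule splits that into a "translation" model and a language model over responses, and P(s) can be dropped because it is the same for every candidate:

```latex
\hat{r} = \arg\max_{r} P(r \mid s)
        = \arg\max_{r} \frac{P(s \mid r)\,P(r)}{P(s)}
        = \arg\max_{r} P(s \mid r)\,P(r)
```

Read s as an English sentence and r as a French one, and this is the classic translation objective; read them as a message and its reply, and it is a chatbot.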

In fact, this is how many chatbots were made just before deep learning arrived.  Some methods simply train an existing translation system on input-response pairs, so that it "translates" an input into a response, e.g. [3].

The good thing about using a statistical approach, in particular, is that it generalizes much better than the rule-based approach.  Also, as the program is based on machine learning, all you have to do is prepare (carefully) a bunch of training data; existing machine learning programs will then help you come up with a system automatically.  It spares the programmer long and tedious tweaking of the bot.

How does a Chatbot work?  The Deep Learning Version

Now, given what we have discussed, how does Microsoft's chatbot Tay work?  Since we don't know Tay's implementation, we can only speculate:

  1. Tay is smart, so it doesn't sound like a purely rule-based system.  So let's assume it is based on the aforementioned "converser-as-translator" paradigm.
  2. It's Microsoft, so there's got to be some deep neural network.  (Microsoft was one of the first sites to pick up the modern "deep neural network" paradigm.)
  3. What's the data?  Well, given that Tay was built for millennials, whoever trained Tay was probably using dialogue between teenagers.  If I did research for Microsoft [4], maybe I would use data collected from Microsoft Messenger or Skype.  Since Microsoft has age data for its users, the data can easily be segmented and bundled into training.

So let's piece everything together.  Very likely, Tay is a neural network (NN)-based program which can intelligently translate a user's natural language input into a response, and its training is based on chat data.  So my speculation is that the data is exactly where things went wrong.  Before I conclude: the neural network in question is likely to be a Long Short-Term Memory (LSTM) network.  I believe Google's researchers were the first to advocate such an approach [5] (it made headlines last year, and the bot is known for its philosophical undertone).  Microsoft did a couple of papers on how LSTMs can be used to model conversation [6].  There is also existing bot-building software online, e.g. Andrej Karpathy's char-RNN.  So it's likely that Tay is based on such an approach [7].


What goes wrong then?

Oh well, given that Tay is just a machine learning program, her behavior is really governed by the training material.  Since the training data is likely to be chat data, we can only conclude that it contained some offensive speech, given the political landscape of the world.  So one reasonable hypothesis is that the researcher who prepared the training material hadn't really filtered out hate speech and sensitive topics.  I guess one potential explanation for not doing that is that filtering would reduce the amount of training data.  But given the data Microsoft owns, that doesn't make sense: even if only 20% of 1 billion conversations survived filtering, that is still 200 million, which is more than enough to train a good chatbot.  So I tend to think the issue is a human oversight.

And then, as a simple fix, you can also give the bot a list of keywords, e.g. you can just program a simple regular expression match on "Hitler", then make sure there is a special rule that responds to the user with "No comment".  At least the consequence wouldn't be as huge as a takedown.  Then again, it's another indication that there were oversights in development.  Had more time been spent testing the program, this kind of issue would have been noticed and rooted out.
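The keyword fix can be sketched in a few lines of Python.  This is a hypothetical guard, not anything Microsoft is known to have used: the blocklist contents and the `generate_reply` callback (standing in for the learned model) are assumptions.  As a design choice, it also screens the model's own output, since a generative model can produce a sensitive word even when the user never typed one.

```python
import re

# Hypothetical blocklist; a production list would be much longer and curated by people.
BLOCKED_TERMS = ["hitler", "genocide"]
BLOCKED_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in BLOCKED_TERMS) + r")\b",
    re.IGNORECASE,
)
CANNED_REPLY = "No comment"

def safe_reply(user_input, generate_reply):
    """Return a canned reply whenever the input or the model's reply hits the blocklist.

    generate_reply: hypothetical callable standing in for the learned chatbot model.
    """
    if BLOCKED_PATTERN.search(user_input):
        return CANNED_REPLY                  # never let the model see the sensitive input
    reply = generate_reply(user_input)
    if BLOCKED_PATTERN.search(reply):
        return CANNED_REPLY                  # the model's output is screened too
    return reply

def echo_bot(text):
    """Toy stand-in for the model."""
    return "You said: " + text

print(safe_reply("What do you think of Hitler?", echo_bot))  # No comment
print(safe_reply("Hello there", echo_bot))                   # You said: Hello there
```

A keyword list like this is crude (it misses misspellings and paraphrases), but it is far cheaper than a takedown.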


In this piece, I have come up with a couple of hypotheses for why Microsoft's Tay failed.  In the end, I echo the title of The New Yorker's piece, "I’ve Seen the Greatest A.I. Minds of My Generation Destroyed by Twitter" ... at least partially.  Tay is perhaps one of the smartest chatbots, backed by one of the strongest research organizations in the world and trained on tons of data.  But it was not destroyed by Twitter or trolls.  More likely, it was destroyed by human oversight and a lack of testing.  In this sense, its failure is not too different from why much software fails.


[1] Joseph Weizenbaum, "ELIZA—A Computer Program For the Study of Natural Language Communication Between Man And Machine", Communications of the ACM 9(1): 36–45, 1966.

[2] Philipp Koehn, Statistical Machine Translation.

[3] Alan Ritter, Colin Cherry, and William Dolan. 2011. Data-driven response generation in social media. In Proc. of EMNLP, pages 583–593. Association for Computational Linguistics.

[4] Whoa!  I could only dream!  But I prefer to work on speech recognition instead of chatbots.

[5] Oriol Vinyals and Quoc Le. 2015. A Neural Conversational Model. arXiv preprint.

[6] Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A Diversity-Promoting Objective Function for Neural Conversation Models. In Proc. of NAACL-HLT.

[7] A more technical point here: using an LSTM, a type of recurrent neural network (RNN), also addresses a limitation of classical models such as the IBM models, whose language model is usually an n-gram model with limited long-range prediction capability.

Me and My Machines

I wrote a page a long time ago about the machines I used. When you work with computing for a while, every one of your computers means something to you. That's why I tried not to throw them away easily. Occasionally I also bought scrap computers, fixed them up and felt like I was doing a good thing for the planet.

Anyway, here is a list of the machines I have used. Some come with more stories than others:

  1. A 286 (1991-1992?): The first computer I ever touched, back in junior high school. A geeky senior dude tried to teach us the basics of databases, and none of us really understood him. He wasn't nice to us, who were only 12 or 13 years old. I disliked his attitude and called him out. He was so unhappy that he stormed out of the computer room. We eventually learned stuff like LOGO and basic DOS commands on these very slow 286s. (Well, you could optimize the hell out of them, though.)
  2. A 486-66DX (1994-1996?): My first computer; I had it since high school. I wasn't very into computers at that time. I used it to play TIE Fighter and wrote documents in Word. I also did several assignments on microprocessor programming (i.e., basic assembly stuff). It was incredibly slow, and it took a long time to compile even a bare-bones Visual C++ Windows program. Later, I gave it to a girl and she just threw the whole thing away. (Shame on me. I threw away a relic of computer history.)
  3. A P166 "Mars" (1996-2000): I bought this in my second year of college. Since I spent most of my money on this machine, I worked part-time during my degree. I was finally able to do some interesting stuff on a computer, such as GUI programming. My GUI programming landed me a good contract with a librarian who was trying to develop cataloging software. I also wrote my first isolated-word speech recognizer on it. Later I ran a speech recognizer written by a guy named Ricky Chan, which I then used in my final year project. Unfortunately, both the cataloging software and my final year project were disasters: I didn't know how to fix memory leaks in C/C++ at that point, and all my programs died horribly. Good Ricky Chan had nothing to do with it. It was all my fault. The horror of Windows 95's blue screen still haunts me these days. Of course, both the librarian and my then-boss saw me in a very dim light. (They probably still do.) I was cleaning my basement this year and Mars was getting too dirty, so I painfully threw it away with tears in my eyes.
  4. A P500 "Jupiter" (2000-): I bought this in my first year of graduate school, half a year after I started to receive a stipend. This was when I was very into HTK (the Hidden Markov Model Toolkit). I still kept Mars, but training HMMs for connected digit recognition on TIDIGITS took my P166 with 16 MB close to a week. My P500, though, allowed me to run TIMIT, and I was even able to train triphones (Woo!). I also gleefully ran every step of the HTK manual V2.2, even though I had no idea what I was doing. Jupiter was also the machine on which I wrote the modified Viterbi algorithm in my thesis (formally, the Frame-Skipping Viterbi Algorithm (FSVA)). I still keep the mid-tower case of "Jupiter", though I think it hasn't worked well for around six years.
  5. A Book Mini-PC (2000): In between Mars and Jupiter, I bought a small-form-factor PC. I tried to install Red Hat Linux on it, but I was very bad at Linux installation back then. Eventually the motherboard burned out, and I gave it to a friend who claimed to know how to fix motherboards. (He never got back to me.)
  6. "eea045" (2000-2003): A lab machine I used back at HKUST. It was first a 500 MHz Pentium, but my boss soon upgraded it to 1.7 GHz. I was jubilant to use it for acoustic model training, and I also ran most of my thesis experiments on it.
  7. A Toshiba laptop (2002): My mom gave it to me because she said it wasn't running too well. It died on me on the very day I was going to present my Master's thesis. Luckily, someone helped me borrow a machine from the EEE department, so now I am a happy Master.
  8. "Snoopy" (2001-2003): I was then a Junior Speech Scientist at Speechworks, and this Pentium 500 was assigned to me. It was also the first of four machines I used with funny names.
  9. "Grandpa" (2001-2003): The laptop assigned to me at Speechworks. It solved a lot of funny crises for me. I really missed "Grandpa" when I was laid off from Speechworks.
  10. iBuddie 4 A928 (2002-2003): What was called a "desknote" at the time: it's like a laptop, but you always have to keep it plugged in. Again, its motherboard burned out, and again, I didn't quite know how to fix it.
  11. "Lumpy" (2003-2006): This is the machine CMU SCS assigned to me. I asked the admins many times whether the name was some kind of very profound joke. "No" was their answer. But I always knew it was a setup. 😐 Always knew.
  12. "Grumpy"/"Big Baby" (2003-): This is a Dell Inspiron 9100 I bought at the hefty price of $3000. Even in 2004, it was a heavy laptop. I used it for most of my CMU work, including hacking Sphinx and writing papers. Prof. Alex Rudnicky, my then-boss at CMU, always jokingly asked me if Big Baby was a docking station. (Seriously, no.) I also used it as my temporary laptop at Scanscout. The laptop was so large and heavy that I used it as a dumbbell there.
  13. "The Cartoon Network" (2003-2006): This is the name of the cluster in the CMU Sphinx Group, used by many students from the Robust Group; by me and David Huggins-Daines, Alex's student; and by Evandro, who was then working for Prof. Jack Mostow. The machines were all named after Cartoon Network characters: for example, Blossoms, Bubbles and Buttercups were three 2 GHz machines that were not too reliable. I kept asking Alex to name one of the machines Mojo Jojo, but he kept refusing. (Why? Why, Alex?)
  14. A G4 (2004-2006): This is the first Mac I ever used in my life, and it's one of the most important. I used it to develop for a project called CALO (Cognitive Assistant that Learns and Organizes), now venerable because several SRI participants went on to create what is now called Siri. But what I learned was simpler: Apple would grow big. Since then I have invested in Apple regularly, with reasonable profit.
  15. A Lenovo laptop (2007-2008): During my short stay at Scanscout, I used this machine exclusively to compile and develop what was then called the SSFramework ("ScanScout Framework"), a Java/Tomcat stack that Scanscout used to serve video ads. I ghosted it into two partitions: Windows and Linux. I mostly worked on Windows, and back then I always had small issues here and there when switching back to Linux. Usually, the very versatile tech guru Dr. Tadashi Yonezaki would help me. Dr. Yonezaki later became the Chief Scientist of Scanscout.
  16. "Scanscout's Machines" (2007-2008): I can't quite remember the setup, but all the machines at early Scanscout were shared by core technology scientists, like Tadashi and me, and by several developers and QA engineers. I wasn't too into "The Scout" (as a couple of early alumni called it), so I left the company after only 1.5 years. A good ending, though: Scanscout was later acquired by Tremor Video, which went public.
  17. Inspiron 530 "Inspirie" (2008-): After I resigned from Scanscout, I was unemployed for around half a year. I stayed home most of the time, read a lot and played tons of poker and backgammon online. That was also when I bought Inspirie. For a long time, it didn't do much other than serve as a home media center. In the last few years, though, Inspirie played an important role as I tried to learn deep learning. I ran all of Theano's tutorials on it (despite it being very, very slow).
  18. Machines I used at an S&P 500 company (2009-2011): Between "The Scout" and Voci, I was hired by a mid-size research institute as a Staff Scientist and took care of much of the experimental work within the group. It was a tough job with long hours, and my mind usually got very numb. I can only vaguely remember around 3 incidents in which my terminal broke. That was also the time I was routinely using around 200 to 300 cores, which I guess was around 10-15% of all cores available in the department. I was always told to tone down my usage. Since a couple of guys in the department were exactly like me, recklessly sending jobs to the queue, the admins decided on a scheme that limited the number of cores we could use.
  19. A 2011 17-inch MacBook Pro "Macky" (2011-): After several years of saving, I finally bought my first MacBook. I LOVE IT SO MUCH! It was also the first time in many years that I felt computing was fun. I wrote several blog posts and several little games with Macky, but mostly it was the machine I carried around. Unfortunately, a horrible person poured tea on top of it. Its display was permanently broken, so I have to connect it to an LCD all the time. But it is still the machine I love most, because it made me love computing again.
  20. "IBM P2.8 4 cores" (2011-): A machine assigned to me by Voci. Most of my recent work on Voci's speech recognition framework was done on it.
  21. "Machines from Voci" (2011-): They are fun machines, partly due to the rise of GPUs. Unfortunately, I can't talk about their setup too much. Let's just say Voci has been doing great work with them.
  22. "A 13-inch MacBook" (2014-): This is my current laptop. I took most of my Coursera classes with it. I love its size and how easy-going it is.
  23. "An HP Stream" (2015-): My current Windows machine. I hate Windows, but you've got to use it sometimes. A $200 price tag seems about right.
  24. "Dell M70" and "HP Pavilion dv2000" (2015-): The machines you saw in the image at the top of this post. I bought each of them for less than $10 at Goodwill. Both work without problems but have small physical issues such as dents and broken hinges. A screwdriver and some electrical tape would fix them easily.

There you have it: the 24 sets of machines I have touched. Mostly it's a history of some unknown silicon, but it's also my personal perspective on computing.


(Edit on Dec 24: Fixed some typos.)