Categories
ASR

ASR Software from Academic Research

Before Voci, I worked in three types of environment: academic institutes, industrial research labs, and startups (such as Speechworks and Scanscout).   There is one common thread: all of them require a strong development background.   My role has always been that of a craftsman, working under the supervision of scientists, researchers, principal investigators or company owners to produce software and achieve a certain goal.   Of course, my specialty is ASR, which has always been my dearest topic.

There are many things you can say about software engineering in each environment, but in my view, producing quality software in academia is probably the toughest situation.  I am not alone: see the post “Producing Good Software From Academia” by Prof. John Regehr.  My observation is very similar to what Regehr suggests: career professors are very unlikely to have time to maintain a good software package.   Most professors either hire research programmers (guys like me) or assign the coding tasks to graduate students.

What I want to add here is that both paths are difficult.   Research staff, for example, have high mobility.  The story usually goes like this: whoever maintains and develops a certain project’s source code decides to join Google/Amazon/IBM or a startup.  That obviously makes sense.  Commercial companies pay way more than academic institutions.  Research staff, just like other human beings, are driven by economic laws and seek better employment.  (Or, in my case, more fun.)

If you assign the tasks to students, on the other hand, you face the problem of how to balance the load among them.  From watching a couple of my bosses, I can say it is a very hard problem.  It usually results in one of two outcomes:

  1. a certain privileged student becomes the “golden boy/girl” of the group who doesn’t need to do any grunt work, or
  2. the group becomes more a company than a research group: most of the time, research is sidelined so that the whole group can survive.

One version I heard is that if you work in academic research, only around one-fourth of your time remains “research time”.   It is sad, but it’s brutally true.

Another, deeper issue is the merit system: maintaining a codebase for the community is not rewarded and is sometimes simply unappreciated, while writing papers earns you accolades.   This is a misfortune of our era: software maintenance is a very important discipline, and people who are willing to spend the effort should be rewarded as fairly as researchers are.

Arthur

Categories
Debugging

How to Compile a Debugged Version of Python

Most of the time, you shouldn’t need to care about the internals of Python.   It is usually thought of as a tool and assumed to be bug-free.

Of course, there are moments when you should question these assumptions.  Sometimes the interpreter itself fails: it could segfault, or it could be too slow.

A more common scenario is that you write a C extension for Python and things are not working.  So what do you do?  You can stare at the source code and hope you find the issue, or you can just debug Python itself.   That is, treat python as an ordinary program and your script as its argument.  Most of the time, when the interpreter loads your custom extension, it simply links in the library you wrote, so your code is visible to the debugger.
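
For example, a minimal gdb session could look like the following (the script and function names are placeholders for your own):

gdb --args python my_script.py
(gdb) break my_extension_function
(gdb) run

Since your extension is a shared library that has not been loaded yet, gdb will offer to make the breakpoint pending; answer yes and it will trigger once the interpreter loads your module.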

If you want to do it right, you also want a debug build of Python.   So the question is how to compile it.  Is it difficult?  The answer is that compiling Python is surprisingly simple.   In fact, I find it a very joyful activity.  (That’s just me……)

Anyway, the following is the procedure I used, with Python 2.7.5 as the example.  It is the old standard, so to speak; the tip is now Python 2.7.6, but the same procedure should work.

1. Download the Python Source Code

For Python 2.7.5, you can do it with:

wget http://www.python.org/ftp/python/2.7.5/Python-2.7.5.tgz

2. Configure Python in Debug Mode

./configure --with-pydebug --prefix=$HOME/my_installation_path

Note that --prefix must be an absolute path.

3. Compile and Install the Debug Version

make OPT='-g'

make install

Notes:

  • Remember, valgrinding the debug version of Python will generate tons of messages (because valgrind thinks much memory was never freed), so remember to use the right suppression file; see http://svn.python.org/projects/python/trunk/Misc/README.valgrind and the example command after these notes.
  • You also need to reinstall all the libraries your application depends on.
  • And remember, .pyc files from different Python versions may not be compatible with each other, so make sure you rerun your applications with the correct version of Python.
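
As an example of the suppression file in use (this assumes you run from the top of the Python source tree; my_script.py is a placeholder):

valgrind --tool=memcheck --suppressions=Misc/valgrind-python.supp ./my_installation_path/bin/python my_script.py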

Arthur

 

Categories
Version Control

A View on Version Control Systems

Ask 10 people what the best version control system is and they will come up with 10 different answers.   You might hate it, but some people still think rcs and cvs are the thing.   Some believe that since Subversion is a drop-in replacement for cvs, it must be the best.   Some have told me that Visual SourceSafe is still their tool of choice.  “Because we are using Windows.”

No matter what people choose, a version control system sticks with an organization for a long, long time.   What should you learn as a programmer?  My take: all of them, and then use whichever is appropriate.

In other words, I am an agnostic user.  So why is my view useful?

Reason Number 1: If you work in software for a while, you will hear people here and there proclaim the superiority of their favorite programming language, IDE or platform.  They give you the feeling that they feel “damn good” about it.  I honestly don’t.  Programming is fun and all, but for the most part I do it for a living, so my criterion is that a tool has to be practical and efficient.   Programmers who are very evangelical about their tools are a great turn-off for me.   Not to mention that when someone pushes a tool from an institutional or political angle, it only suffocates everyone.

Reason Number 2: I am agnostic, but I did make a choice in the end.   Hopefully this article is more objective than the words of those who feel “damn good” about their choice of tools.

So this is my take, and I hope it is helpful to you.  Before I say anything about any system, let me bring up the trivial choice about version control:

Do No Version Control

“You should grow up.”

That’s what I have said to many seemingly intelligent people who write smart algorithms but never version-control their source code.   They may be very experienced researchers who are in a position where they never need to write a program again in their lives.   They may be very talented programmers who can write 1000 lines of code without a mistake.   They may actually be smart enough to discern small issues in 10000 lines of code at a glance.    They might simply dislike the idea of checking in.  They might feel their code is not perfect enough, or they don’t want to expose themselves to any responsibility.

But what they should do is to grow up.

Why?  Let me first give you the exceptions, the cases where you can skip version control:

  1. As a programmer, you will work alone for your whole life.
  2. You can, by some method, record all 1000 changes you made in previous years in some form.  I think that’s okay, if a bit tedious: you are a very organized person, and when there are issues you can look up your notes.
  3. You can remember all your changes.   People with eidetic memory can do such things.
  4. You have only been programming for a year or so.  You simply don’t know better.

If any of these 4 exceptions applies to you, I have nothing to say.  Do whatever you like in programming.  Version control is not important for you at all.

But once you need to work with other people and find it necessary to record your changes, once you have some experience and realize you can’t remember every single thing like Sheldon Cooper (breaking exceptions 1, 2, 3 and 4 already), version control becomes a necessity.

For me, it was when I started to work on a decoder in Hong Kong and found I needed to share the code.  My paper records, which already filled 6 volumes of notebooks, could no longer keep up with the complexity of my work, and I certainly don’t have a photographic memory.   That was when I realized version control was necessary.   Enter my first version control system: CVS.

CVS

I used CVS at Speechworks.  Discovering CVS helped my programming and paper writing a lot; many of my early papers were checked into a self-made CVS repository.

If your project is very simple and you don’t expect too many changes in the directory structure, CVS is fine.   “cvs co” checks out the code, “cvs update” updates it, and “cvs commit” checks in your changes.  Nice and easy.

The simplicity stops there.  Once you need to check in a binary file, your cvs add needs a -kb flag.  Weird, isn’t it?  But that’s what many people did for decades.  How about directories?  Once you add a directory to your file structure, you can’t version-control it.  If you want to change a directory’s location, you have to manually mv it inside the repository.  Scary?   If there is a deadlock, you need to ask your admin to remove a special lock file residing in the directory.  And, of course, you need to be lucky enough to connect to the cvs server in the first place.  (See the example commands below.)
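
To make this concrete, here are the commands in question (file names and repository paths are only illustrative):

cvs add -kb logo.png
cvs commit -m "add binary logo"

The -kb flag tells CVS to treat the file as binary, with no keyword expansion or line-ending conversion.  And since cvs has no rename command, moving a directory really is a raw mv inside the repository on the server:

mv $CVSROOT/myproject/old_dir $CVSROOT/myproject/new_dir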

The reason is that CVS is essentially a hack on top of an older version control system, RCS, which doesn’t even allow concurrent access.  (Hence the ‘C’ in CVS, which stands for “Concurrent”.)  RCS stores everything in plain text files, and there are many issues with that approach; permissions, for one, can be a problem.   But then, CVS was thought of as a drop-in replacement for RCS, so no big deal, and a lot of people used it.

When I first started using version control, there were not many free choices, so I went with CVS for 3.5 years, almost to the end of my employment at CMU.   That was when a couple of us realized CVS had too many issues to move forward with.  Any of the small issues I mentioned can eat up a day, and that creates a scary dynamic for developers: they either check in too quickly, for fear that anything they’ve done will be lost, or they refrain from checking in until things are very stable.  Both are bad.

That was when I started to use SVN.

Subversion

Subversion was meant to be an improvement over CVS, and in a way it is.  It solves many of the problems I mentioned: files can be checked in regardless of whether they are text or binary, you can version directories, and you can version file removals.    The thing I like most is that versioning becomes a repository-wide rather than a file-based business, which removes much confusion when you work with a huge package.
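
For example, a reorganization that CVS cannot express cleanly is just a few versioned operations in Subversion (the paths are illustrative):

svn mv src/old_dir src/new_dir
svn rm src/obsolete.c
svn commit -m "reorganize the source tree"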

There are still issues, and the major one is speed.  I remember that checking out an SVN project with about 1.5 years of history took 30 minutes.   I remember my boss yelling at me because he couldn’t check things into Subversion.   I remember people ending up writing tooling around SVN just to make it usable in practice.

But the major issue is still speed.  Here is why: if you are a working programmer, most of the time your life really comes down to some suited dudes giving you 1) random tasks 2) of random length 3) at random times.   So chances are that while you are working on a feature that takes 2 weeks, the suited dudes will come by and say we need to fix a bug in the GUI today! (“And you can’t have lunch!”)    So no matter what you do, being able to manipulate the source tree as quickly as you can is extremely crucial.   Many of my late nights were caused by a slow or broken connection to the SVN server.  It also makes certain daily routines impossible, e.g. doing a clean check-out and test; it got stuck from time to time.

Branching is also discouraged when the system is slow.  SVN is better in this regard (CVS is the worst), but it is still not a perfect tool for version control: when you branch in SVN, unless you are careful, check-out takes even longer.   So of course we still end up with people who refrain from checking out and checking in.  Their bosses yell at them; I can only offer them (and myself) sympathy.

My Denial Period

After I used Subversion, destiny treated me badly: I had to return to a CVS-based environment, and I cursed every moment of it.  But that was when I first heard about GIT.   I went to presentations by many intelligent people who tried to convince me with a thousand different reasons.  They will tell you that GIT, as a distributed system, can mimic any centralized version control system.  They will tell you that branching is a great thing in GIT.   They will tell you that all of GIT’s tools are refined and much better than CVS’s and SVN’s.

Strangely, there was a period of time when I didn’t give much thought to GIT at all.   The reason is a little subtle: once you have been through a couple of version control systems, you realize that version control is an imperfect business.  True, you can’t version-control a directory in CVS.  But oh well, you don’t start a new directory all that often…….

And a new version control system means more changes to your environment.   Quite frankly, every couple of years a new system comes up.  Are we sure we really have something better this time?

That’s what I thought.  So, shame on me, for a couple of years I was unconsciously against adopting GIT.   But just like many prudent programmers, my reasoning was well calculated.  Some people will say, “You just need to read a little about GIT and you will learn that it’s good.”   Well, when you are prudent, you probably have plenty to keep you busy!  When is there time to read up?

 

GIT

My bad, but my fellow GIT evangelists did a bad job too.   The truth is that many programmers calculate like I did before they pick up a new tool.

So let me tell you the one and only one important reason why GIT is a good choice for version control.   It is a keyword I have mentioned more than once in this article already:

SPEED!!!

Yes, speed.  Speed is the ultimate reason why any CVS or SVN user would want to switch to GIT.   True, GIT can simulate a centralized system.  But who cares?  If system B can only simulate system A, why the heck would I care about system B at all?   The ultimate answer is speed.  GIT was designed from day one with performance in mind, and because your repository is local, a check-out is usually much faster than in SVN or CVS.

How about check-in?  Again, GIT is faster.  Because GIT is distributed, a check-in is a purely local operation; you can keep committing on your own machine until you are ready to push to a central server.   Disaster recovery is better too, since every clone is a full copy of the history.

How about branching?  Again, GIT is faster.  GIT branches by creating a pointer to a commit, so making a branch is no big deal at all.  You can branch frequently.  It is enjoyable to branch.  Branching eventually becomes part of your routine!

That’s not because GIT is good at branching; it’s because GIT is fast at branching.
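
This is what the speed buys you in practice.  Here is a sketch of the interruption-driven workflow I described above; the branch and commit are purely local operations, and the names are illustrative:

git checkout -b gui-bugfix        # create and switch to a branch: git just writes a tiny ref
# ... hack on the urgent fix ...
git commit -a -m "fix the GUI bug"
git checkout master               # back to the 2-week feature
git merge gui-bugfix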

So the ultimate reason to use GIT is speed.  You can hold other points of view, but those views will make it very hard to convince your colleagues.  You will also make the same mistake I made: getting too used to your own version control system.

Some Final Notes

I am not the first person to advocate GIT, nor will I be the last.  So the point of this article is really not whether you should use GIT.

I believe the point of this kind of story is that it teaches you why technical change in an organization is so tough.   Many people with years of experience under their belts end up resisting new and good changes.   The worst thing is that, just like me, they are stubborn with good intentions.

Another point is that technical ideas are not necessarily spread even when they are good.   In my case, logic and experience shielded me from using GIT.  Chances are that 10 years from now, when somebody tells me about a new super-duper version control system, I will resist it too!

How do I see my own mistakes?  My take is that before you make a decision on using or learning a certain tool, try the system at least once.  Reading at least one book is also useful.  For example, I read the following books on CVS, SVN and GIT:

  1. CVS Book by Karl Fogel
  2. Version Control with Subversion by C. Michael Pilato, Ben Collins-Sussman, Brian W. Fitzpatrick
  3. Pro Git by Scott Chacon

In any case, thanks for reading this far.

Arthur


Categories
ASR

Learning ASR Through Coding

In a way, speech recognition is not that different from other skills: you need a lot of practice to really grasp how certain things are done.  For example, if you have never written a Viterbi algorithm, it’s probably hard to convince anybody that you know the search aspect of ASR.   And if you have never written an estimation algorithm, your knowledge of training will be shaky.
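
To make this concrete, here is a minimal, textbook-style Viterbi decoder for a discrete HMM in Python.  It is only a toy sketch (dense matrices, log probabilities supplied by the caller); a real ASR decoder adds beams, lexical trees and much more:

def viterbi(obs, log_pi, log_A, log_B):
    # log_pi[s]: initial log-prob of state s
    # log_A[s][t]: log-prob of transition s -> t
    # log_B[s][o]: log-prob of state s emitting symbol o
    n = len(log_pi)
    delta = [log_pi[s] + log_B[s][obs[0]] for s in range(n)]
    backptr = []
    for o in obs[1:]:
        prev, step, delta = delta, [], []
        for t in range(n):
            best = max(range(n), key=lambda s: prev[s] + log_A[s][t])
            step.append(best)
            delta.append(prev[best] + log_A[best][t] + log_B[t][o])
        backptr.append(step)
    # backtrace from the best final state
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for step in reversed(backptr):
        state = step[state]
        path.append(state)
    return list(reversed(path))

# toy usage: 2 states, 2 output symbols, all parameters in the log domain
from math import log
print(viterbi([0, 0, 1],
              [log(0.6), log(0.4)],
              [[log(0.7), log(0.3)], [log(0.4), log(0.6)]],
              [[log(0.9), log(0.1)], [log(0.2), log(0.8)]]))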

Of course, this might be too hard for many people; who has time to write a decoder or a trainer?  Fair enough.  The next best choice is to study the implementations of open source speech recognizers and try to modify them to fit your goals.   In the process, you will build up understanding.

Which recognizers?

Let me say one thing to learners these days: you guys are lucky.  When I tried to learn ASR coding back in 2000, you had to join a speech lab and get an HTK license before you could do any tracing and modification.    Now you have many choices: HTK, Sphinx, Kaldi, Julius, the RWTH recognizer, etc.  So which recognizers should you learn?

I will name three: HTK, Sphinx and Kaldi.  Why?

Why HTK?

You want to learn HTK because it has a well-designed and coherent interface.  It also has some of the best training technology: its ML training makes few assumptions and takes care of small issues such as silence/short pauses and multiple pronunciations, and it has one of the few large-vocabulary MMIE training implementations of its sort.  All of this work is very nice.

HTK also has a well-written tutorial.   If you own either the TIMIT or RM corpus, you can usually train a whole system by following the instructions.  While going through the tutorial, you gain a valuable understanding of the data structures commonly used in speech recognition.

Though I mainly worked on Sphinx, there were about 2-3 years of my life when I used HTK on a day-to-day basis.   The manual itself is good literature that can teach you a lot.   I believe many designers of speech recognizers actually learned from the HTK source code as well.

Why Sphinx?

“Because you work on Sphinx!”  True, I am biased in this case.   But I do have legitimate reasons to like Sphinx and to claim that knowledge of Sphinx is more useful.

If you compare the development history of HTK and Sphinx, you will notice that HTK’s very nice interface stemmed from the design effort during its Entropic stage, whereas Sphinx as a whole is more the work of PhD students, faculty and staff.   In other words, the Sphinx tools are more “hacky” than HTK’s.  So as a project, Sphinx seems more incoherent: there are multiple recognizers, written in C or Java, and the system has quite a learning curve.

Very true, those are weaknesses.  But one thing I like about Sphinx is that it is fertile ground for enthusiasts to play with.   The free BSD license gives people the chance to incorporate any part of the code into their projects.  As a result, historically, many companies have used Sphinx in their own code.

Before we go on, you may ask, “Which Sphinx?”  Ask 5 guys from the CMU Sphinx project and they will give you 5 different answers.  But let me offer my point of view, which I think is the one most relevant to learning.  Nick, the current maintainer-at-large, and I once chatted; he believed the Sphinx project should only support the triple Sphinx4/pocketsphinx/SphinxTrain.     I support that view.  As a project, we should support and maintain a focused number of components.

Though if you are an enthusiast, I highly recommend studying more.  Beyond the triple, you will find that Sphinx2 and Sphinx3 have their own interesting parts.  Not all of them were carried over to Sphinx4 or pocketsphinx, but they are nonetheless fun code to read.   For example, how were triphones implemented in the different Sphinxes?  Even with all the computation we have these days, I don’t think full triphone expansion works for a real-time system.   I believe in that respect, 2 and 3 are very interesting.

Why Kaldi?

I am very excited about Kaldi.  You can think of it as “the new HTK with the Sphinx license”.   The technology is strong and new: there is, for example, a branch with fully deep-neural-network-based training, and the recognizer is based on WFSTs.    Best of all, every component is under a very liberal license, so you can surely do many amazing things with it.

The only reason I don’t recommend it more strongly is that it is still relatively new.   Open source toolkits have strange lives: if they are supported by funding, they can live forever; if not, their fate is quite unpredictable.    Take the MITLM toolkit: for a year or so after the maintainer left, there was no new maintainer, and I am sure users needed to patch a thing or two in the meantime.   It is certainly a very interesting toolkit (because of its automatic optimization of modified Kneser-Ney smoothing weights), but sometimes it’s hard to predict what will happen.

In a way, a development like Kaldi is rare: someone decided to share the best technology of our time with everybody (WFSTs, SGMMs and DNNs are all examples).   I can only wish the project well.  If I can, maybe I will contribute a thing or two.

Arthur

 

Categories
ANN ASR

“The Grand Janitor Blog V2” Started

I have moved “The Grand Janitor Blog” to WordPress.   Nothing much to it: Blogger is simply too constraining.  I didn’t like the themes, I couldn’t really customize a thing, and I couldn’t put an ad there if I wanted to sell something.   It was really annoying, and it was time for a change.

But then, what’s new in V2?   First of all, I might blog more about how machine learning influences speech recognition.  That machine learning drives speech recognition is nothing new; it has always been like that.  Many experts who work in speech recognition have deep knowledge of pattern recognition.  When you look at their papers, you can sense that they have studied certain machine learning methods in great depth, and that is how they come up with creative ideas to improve the bottom line, which is the only thing I care about.  I don’t really care about the thousand APIs wrapped around a certain recognizer.  I only care about the guts: the decoder and the trainer.  Those components are what really matter, but they are also the most misunderstood.

So why now?  Obviously, the latest development of DBN-DNN (the “next big thing”) is one factor.   I was told in school (10+ years ago) that GMMs were the state of the art.  But things are changing rapidly: Prof. Hinton’s work has given a theoretical basis for making DBN-DNN training practically feasible.   Enthusiasts, some rather sophisticated, are gathering around the Kaldi forum.

As for me, I would describe myself as a recovering ASR programmer.   What does that mean?  It means I need to grok ASR from theory to implementation.  That’s tough.  I find myself studying again: dusting off my “Advanced Calculus”, trying to read and think creatively about texts such as “Connectionist Speech Recognition: A Hybrid Approach” by Bourlard and Morgan (a highly entertaining technical text!), and perhaps more in the future.   When you try to drill a certain skill in your life, there comes a point when you need to go back to basics.   Re-think everything you thought you knew.  Re-prove all the proofs you thought you understood.    That takes time and patience, but in the end it is also how you come up with new ideas.

As for you readers, sorry for never getting back to your suggested blog topics.  You might be interested in a code trace of a certain part of Sphinx, or in how certain parts of the program work.  I keep a list of these and will probably write something up when I have time.   No promises though; I have been very busy.   And to be frank, everyone who works in ASR is busy.  That perhaps explains why there are not many actively maintained blogs in speech recognition.

Of course, I will keep posting on other diverse topics such as programming and technology.   I am still a geek.  I don’t think anyone can change that. 🙂

In any case, feel free to connect with me and have fun with speech recognition!

Cheers

Arthur Chan, “The Grand Janitor”

Categories
deep neural network Speech Recognition the grand janitor

Future Plan for “The Grand Janitor Blog”

I have been crazily busy, so blogging has been rather slow for me.   Still, I have a stronger and stronger feeling that my understanding is getting closer to the state of the art of speech recognition.   And speaking of the state of the art right now, we have to talk about the whole deep neural network trend.

There is nothing conceptually new in the hybrid HMM-DBN-DNN approach; it was proposed in the past under the name HMM-ANN.   What is new is an algorithm that allows fast training of multi-layered neural networks.   It is mainly due to Hinton’s breakthrough in 2006, which suggests that DBN-DNN training can first be initialized with pretrained RBMs, layer by layer.
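
To sketch the idea (and only sketch it): pretraining fits one RBM per layer with contrastive divergence, then feeds each layer’s hidden activations to the next RBM.  Below is a toy CD-1 implementation in Python/numpy with binary units; the parameter choices are arbitrary, and it is nothing like a production trainer:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.1, seed=0):
    # one RBM trained with CD-1: a single step of Gibbs sampling
    rng = np.random.RandomState(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.randn(n_visible, n_hidden)
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        v0 = data
        p_h0 = sigmoid(v0.dot(W) + b_h)                # positive phase
        h0 = (rng.rand(*p_h0.shape) < p_h0) * 1.0      # sample hidden units
        p_v1 = sigmoid(h0.dot(W.T) + b_v)              # reconstruction
        p_h1 = sigmoid(p_v1.dot(W) + b_h)              # negative phase
        W += lr * (v0.T.dot(p_h0) - p_v1.T.dot(p_h1)) / len(data)
        b_v += lr * (v0 - p_v1).mean(axis=0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_h

def pretrain(data, layer_sizes):
    # greedy layer-wise pretraining: stack RBMs bottom-up
    weights, x = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(x, n_hidden)
        weights.append((W, b_h))
        x = sigmoid(x.dot(W) + b_h)  # propagate activations upward
    return weights  # used to initialize a DNN before supervised fine-tuning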

I am naturally very interested in this new trend.   IBM’s, Microsoft’s and Google’s results show that the DBN-DNN is not the toy model we saw in the last two decades.

Well, that’s all for my excitement about DBNs; I still have tons of things to learn.    Back to the “Grand Janitor Blog”: having tried to improve the blog layout 4 months ago, I have to say I am very frustrated by Blogger and have finally decided to move to WordPress.

I hope to move within the next month or so.  I will write a more proper announcement later on.

Arthur

Categories
Dragon Goldman history Microsoft MMIE oop perl Python Sphinx

Apology, Updates and Misc.

There have been some questions on LinkedIn about the whereabouts of this blog.   As you may have noticed, I haven’t posted any updates for a while.   I have been crazy busy with work at Voci (good!) and many life challenges, just like everyone else.    I am having a lot of fun with programming, as I am working with two of my favorite languages, C and Python.  Life is not bad at all.

My apologies to all readers; it can be tough to blog sometimes.  Hopefully, this situation will change later this year…..

A couple of worthwhile pieces of news in ASR: Goldman Sachs won the trial in the Dragon lawsuit, and there is also VB’s piece on MS doubling the speed of their recognizer.

I don’t know what to make of the lawsuit; I only feel a bit sad.  Dragon has been home to many elite speech programmers, developers and researchers.  Many old-timers of speech were there, and most of them sigh about the whole L&H fiasco.   If I were them, I would feel the same.   In fact, once you know a bit of ASR history, you notice that the fall of L&H gave rise to one you-know-its-name player of today.  So, in a way, the fates of two generations of ASR guys were altered.

As for the MS piece, it follows another trend these days, which is the emergence of the DBN.  Is it surprising?  Probably not; it’s rather easy to speed up neural network computation.  (Training is harder, but that is where the DBN is strong compared to previous NN approaches.)

On Sphinx, I will point out one recent bug report contributed by Ricky Chan, which exposed a problem in bw’s MMIE training.   I have yet to try it, but I believe Nick has already incorporated the fix into the open-source codebase.

Another item Nick has been stressing lately is using Python, instead of Perl, as the scripting language of SphinxTrain.   I think that’s a good trend.  I like Perl and use one-liners and map/grep-style programs a lot.  Generally, though, it’s hard to find a concrete coding standard for Perl, whereas Python seems cleaner and leads naturally to OOP.  This is an important issue: Perl programmers and Perl programming styles seem to be spawned from many different languages.   The former (bad) C programmer will fondly use globals and write functions with 10 arguments.   The former C++ programmer might expect language support for OOP but find that “it is just a hash”.   These style differences can make the Perl training scripts hard to maintain.

That’s why I like Python more: even a very bad script seems to convert itself into a more maintainable one.   There is also a good pathway for connecting Python and C.  (Cython is probably the best; a sketch follows.)
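
A minimal sketch of that pathway, assuming Cython is installed (the module name, function and file names are all made up for illustration):

# fastdot.pyx : a toy Cython module that compiles down to a C extension
def dot(xs, ys):
    cdef double s = 0.0
    cdef double x, y
    for x, y in zip(xs, ys):
        s += x * y
    return s

# setup.py : build with "python setup.py build_ext --inplace"
from distutils.core import setup
from Cython.Build import cythonize
setup(ext_modules=cythonize("fastdot.pyx"))

After building, "import fastdot" works like any other module, but the loop runs as compiled C.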

In any case, that’s all I have this time.  I owe all of you many articles.  Let’s see if I can write some in the near future.

Arthur

Categories
333weeks translation

Translation of “Looking forward (only 263 weeks left)”

As requested by Pranav, a good friend of Sphinx, I have translated the article “Looking forward (only 263 weeks left)” from my Chinese blog “333 weeks” (original).  So here it is; enjoy!

“April was a long, long month.

I spent most of my time solving technical problems.  With great help from colleagues, we finally got all the issues resolved.  I have also started to put some time into new tasks.  The Boston Marathon explosion was tough for everyone, but we have some closure now.  As for investments, mine are keeping pace with the S&P.  The weather is also getting better.  Do we finally feel spring again?

I think the interesting part of April is that I spent more time writing, be it blog posts or articles.  I wrote quite a bit even while I was busy.  I mentioned the Selection of Cumulomaniac; at this stage, I am copyediting and proofreading the drafts.  It’s a good thing to write and blog, as I love to connect with like minds.”

Arthur

Categories
333weeks blogging cumulomaniac

My Chinese Blogs : Cumulomaniac and 333 Weeks

Foreword

I hadn’t updated this blog for a while.   April was a long month, and the whole Boston Marathon explosion was difficult for me.   I ended up spending quite a bit of time working on my other blogs, Cumulomaniac and 333 Weeks.   If you click through, they are all in Chinese.    In the past that was okay, but recently more and more of my friends have been asking me what they are about, so they deserve some explanation.

Cumulomaniac

Cumulomaniac is my personal photography and writing blog.  From time to time, I take pictures of clouds around Boston and share them with my friends in Hong Kong.  You know Hong Kong?  It’s probably hard for my American friends to even begin to imagine if they have only watched Jackie Chan movies:   I used to live in a 500-square-foot flat with a pitifully sized bathroom and kitchen.  It’s called 500 square feet, but it feels like 300 because over the years my family has piled tons of stuff in there.
 
The place I lived in Hong Kong, called Sham Shui Po, is close to a flea market and a computer shopping mall.  That’s perhaps why I fell in love with gadgets in the first place.

For this context, the most important thing you should know is that there is no open sky in Hong Kong.   So it was a big change when I first came to the States.  I guess that is the reason I share “my sky” with my friends.

Startup Employee 333 Weeks 

As you may know, I am working at yet another startup, Voci, with some great minds who graduated from Carnegie Mellon.   When I took the job, I decided to stay with the company for a while, and I set the time at 333 weeks.   So the blog Startup Employee 333 Weeks chronicles my story at the company.

I chose to write it in Chinese because startup life is yet another topic that has been discussed to death by American bloggers.   In Hong Kong and China, though, there are still many people living in a bureaucratic system as big-company employees; they may not be very familiar with how “startupers” work and live, and there is much misunderstanding of startups among people in traditional jobs.

My focus in 333 Weeks is usually project management, communication and the issues you face when working at a startup.   Those are what we programmers call “soft stuff”, so I seldom bring them up on The Grand Janitor’s Blog.

Why Didn’t you Write Them In English?

I gave partial answers in the paragraphs above.  In general, my rule for blog writing is to make sure my message targets a well-defined niche.    The Grand Janitor is really for speech professionals, while 333 Weeks is written for aspiring startup guys.    That pretty much sums up why you haven’t seen my other writing in the past.

Another (rather obvious) reason is my English.   My English writing has never quite caught up with my Chinese writing.   Don’t get me wrong: I write English way faster than Chinese, and I write a lot.   The issue is that I never feel I can embellish my articles with English phrases the way I do with Chinese ones.

That has changed quite a bit recently, as I feel my English writing has improved.   (Maybe because I have hung out with a bunch of comedians lately. 🙂 )  But I still feel some topics are better written in a certain language.

Hopefully this will change in the near future.  In fact, Pranav Jawale, a good friend of Sphinx, recently became interested in one article I wrote on 333 Weeks, and I am going to translate it soon.

If you are interested in any article I wrote in Chinese, feel free to tell me.  I can always translate it and put it on GJB.

Arthur
Categories
Boston

The Boston Marathon Explosion : Afterthought

It has been a crazy week.  Life was crazy for Bostonians …… and perhaps for all Americans.   From the explosion to the capture of the suspect was only 5 days.   I still feel disoriented by the whole event.

I felt the warmth of friends and family: more than 20 messages came in from Facebook, LinkedIn and Twitter from all over the world asking about my situation in Boston.   They are all friends who have never been to Boston, so they don’t know that Copley Square is a well-known shopping area and that only the few who are affluent enough live there.   That said, I was lucky to have decided not to return books to the BPL central library that day.   But I was shocked by the whole thing.  Some describe it as the most devastating terrorist attack since 9/11, and I have to concur.   Even though no direct link has been established between the two suspects and any terrorist organization yet, the event was at least inspired by online instructions on how to make an improvised pressure cooker bomb.
Even now, no one can clearly explain the suspects’ motives.  Family members are giving confusing answers about the suspects’ psychological profiles.   It’s hard to judge at this point, and maybe we should wait to hear more from the authorities.

My condolences to the families of all the victims, to the transit police officer who died on the front line, and to all who were injured.   I sincerely hope the Boston authorities can soon help us understand why this tragedy happened.

Arthur