Full Bio

 





My personal mission statement: apply human language technology and machine learning to improve everyone’s life.

Some highlights of my career:
• Architect of Voci’s speech recognition engine since 2012.
• Former maintainer of the open-source CMU Sphinx project, from 2004 to 2006.
• Early employee of two startups, Scanscout (#4) and Voci (#9). Scanscout was acquired by Tremor Video in 2010.
• Research staff at Raytheon BBN, Speechworks (now part of Nuance), Scanscout (now part of Tremor Video) and Voci.
• Coauthor of one of the best papers at a prestigious international conference.

Skills:

Programming: C, Perl, Python, C++, Java, with working knowledge of web and iOS programming.

Speech Recognition-Related:

  • Architecture of Speech Recognition Systems (Decoder+Trainer), Very Fast Speech Recognition, Keyword Spotting, Speech-based Topic/Language/Emotion/Gender Classification, Robust Speech Recognition.
  • Sphinx (2, 3, 4 and pocketsphinx), Kaldi, HTK, Julius, Speechworks (<OSR 2.0), Byblos, CMULM, SRILM, MITLM.

Deep Learning-Related:

  • Speech recognition: source-code level handling of two major deep learning toolkits in speech recognition and language modeling.
  • Image classification: DNN, CNN, RNN and variants, trained systems on MNIST, CIFAR-10 and CIFAR-100. Also transfer learning.
  • Art and deep learning: style transfer.
  • Language generation: char-rnn style training.
  • Administration: knowledgeable in setting up and installing multiple deep learning tools.
  • Tools: Theano, TensorFlow (Keras), Torch, neon, scikit-learn, libsvm, pandas, Dato/Turi’s GraphLab.

General Machine Learning-Related: application of machine learning algorithms (regression, SVM, GMM), sentiment classification, information retrieval, deep learning.

Soft Skills: Startups, MVP building, Speech Applications, Speech Analytics, Business Use of Speech Recognition and Machine Learning (in particular, in startup scenarios), Open Source Speech Recognition.