The definitive weekly newsletter on A.I. and Deep Learning, published by Waikit Lau and Arthur Chan. Our background spans MIT, CMU, Bessemer Venture Partners, Nuance, BBN, etc. Every week, we curate and analyze the most relevant and impactful developments in A.I.
We are back! The big news this week is perhaps that Prof. LeCun is stepping down as the chief of Facebook A.I. Research (FAIR). More in the news section.
We also have a bunch of interesting content in our blog section, e.g. Arthur's review of Course 4 of deeplearning.ai. Course 4 focuses on image classification as an application of deep learning. Arthur walks through how it compares with an existing class such as cs231n.
Then, in our paper section, we present a read of the classic paper, "Deep Neural Networks for Acoustic Modeling in Speech Recognition".
As always, if you like our newsletter, feel free to forward it to your friends/colleagues!
This newsletter is a labor of love from us. All publishing costs and operating expenses are paid out of our pockets. If you like what we do, you can help defray our costs by sending a donation via link. For crypto enthusiasts, you can donate by sending Eth to this address: 0xEB44F762c58Da2200957b5cc2C04473F609eAA65.
We have seen a lot of demand from AIDL members for a new specialized group just on speech recognition. ASR is a really hot space lately. Our new group, ASRDL, will be a great place for you to learn about the latest and greatest and to join the discussions.
Prof. LeCun is stepping down as the chief of FAIR. His replacement is Jérôme Pesenti, former CEO of the AI startup BenevolentTech. Pesenti will report directly to the CTO of Facebook, Mike Schroepfer. According to Quartz's article, LeCun will still decide the research directions of FAIR, but day-to-day operations will report up to Pesenti.
What do we make of the event? We think this might be a necessary change as A.I. goes from research to being applied across all Facebook products. Facebook has been relying on A.I. in various aspects of the company's operation, e.g. news ranking, face tagging and translation. Yet a large A.I. machinery also requires a large software team to maintain. We speculate that Mr. Pesenti might be a better choice for this next stage of evolution.
From the respected Ross Girshick: Facebook has now open-sourced Detectron, which includes two techniques - the Mask R-CNN algorithm for segmentation, and the technique from "Focal Loss for Dense Object Detection" by T.-Y. Lin et al. for object detection.
This is the now-classic paper in deep learning, in which, for the first time, people confirmed that deep learning can improve ASR significantly. It is important in the fields of both deep learning and ASR. It's also one of the first papers I read on deep learning, back in 2012-13.
Many people trace the origin of deep learning to image recognition, e.g. many beginners would tell you stories about ImageNet, AlexNet, and so on. But the first important application of deep learning is perhaps speech recognition.
So what was going on with ASR before deep learning, then? For the most part, if you could come up with a technique that cut a state-of-the-art system's WER by 10% relative, your PhD thesis was good. If your technique could consistently beat previous techniques across multiple systems, you usually got a fairly good job at a research institute or one of the Big 4.
The only technique which I recall being better than 10% relative improvement is discriminative training. It got ~15% in many domains. That happened back in 2003-2004. In ASR, the term "discriminative training" has a very complicated connotation, so I am not going to explain it much. This just gives you the context of how powerful deep learning is.
You might be curious what "relative improvement" is. E.g. suppose your original WER is 18% and you improve it to 17%; then your relative improvement is 1%/18% = 5.56%. So a 10% relative improvement really means you go down to 16.2%. (Yes, ASR is that tough.)
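To make the arithmetic concrete, here is a tiny helper of our own (not something from the paper) that computes relative WER improvement:

```python
def relative_improvement(old_wer, new_wer):
    """Relative WER improvement: the absolute drop divided by the old WER."""
    return (old_wer - new_wer) / old_wer

# Going from 18% WER to 17% is only a ~5.6% relative improvement...
print(round(relative_improvement(0.18, 0.17) * 100, 2))  # 5.56

# ...while a genuine 10% relative improvement from 18% lands you at 16.2%.
print(round(0.18 * (1 - 0.10) * 100, 2))  # 16.2
```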
So here comes replacing the GMM with a DNN. These days, it sounds like a no-brainer, but back then it was a huge deal. Many people had tried to stuff various ML techniques in to replace the GMM, but no one had successfully beaten the GMM-HMM system. So this was innovative.
Then there is how the system is set up - the ancestor of this work traces back to Bourlard and Morgan's "Connectionist Speech Recognition", in which the authors built a context-independent HMM system by replacing VQ scores with a shallow neural network. At that time, the units were chosen to be CI states.
Hinton's, and perhaps Deng's, thinking here is interesting: the units were chosen to be context-dependent states. That was a new change, and it reflects how modern HMM systems are trained.
Then there is how the network is actually trained. You can see the early DLers' stress on using pre-training, because training was very expensive at that time. (I suspect it wasn't done on GPUs.)
Then there is the use of frame-level cross-entropy to train the model. Later on, in other systems, many people switched to a sentence-based objective for training. So in this sense, the paper is dated.
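As a sketch of what frame-level training looks like (my own minimal illustration, not the paper's recipe, with made-up dimensions and a random layer standing in for the DNN): each acoustic frame gets a CD-state label from a forced alignment, and the objective is just the average per-frame cross-entropy, with no sequence-level term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 5 frames of 13-dim features, 4 hypothetical CD-state classes.
frames = rng.normal(size=(5, 13))
labels = np.array([0, 0, 2, 3, 3])  # per-frame targets from a forced alignment

# A single random softmax layer stands in for the DNN's output layer.
W = rng.normal(size=(13, 4))
logits = frames @ W
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

# Frame-level cross-entropy: the negative log-probability of each frame's
# own state label, averaged over frames, each frame scored independently.
loss = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
```

Sequence-level objectives, by contrast, score whole utterances against competing hypotheses rather than treating each frame independently.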
None of this is trivial work, but the results are stellar: we are talking about an 18%-33% relative gain (p. 14). To ASR people, that's unreal.
The paper also foresees some future uses of DNNs, such as bottleneck features and articulatory features. You probably know the former already. The latter is more esoteric in ASR, so I am not going to talk about it much.