Issue 45 January 27th 2018

Editorial

Thoughts From Your Humble Curators

We are back! Perhaps the biggest news this week is that Prof. LeCun is stepping down as the chief of Facebook A.I. Research (FAIR). More in the news section.

We also have a bunch of interesting content in our blog section, e.g. Arthur’s review of Course 4 of deeplearning.ai. Course 4 focuses on image classification as an application of deep learning, and Arthur walks through how it compares with an existing class such as cs231n.

Then, in our paper section, we present a read on the classic paper, “Deep Neural Networks for Acoustic Modeling in Speech Recognition”.

As always, if you like our newsletter, feel free to forward it to your friends/colleagues!

This newsletter is a labor of love from us. All publishing costs and operating expenses are paid out of our pockets. If you like what we do, you can help defray our costs by sending a donation via link. For crypto enthusiasts, you can donate by sending Eth to this address: 0xEB44F762c58Da2200957b5cc2C04473F609eAA65.

Artificial Intelligence and Deep Learning Weekly

A New Satellite of AIDL : Automatic Speech Recognition & Deep Learning (ASRDL)

We have had a lot of demand within AIDL for a new specialized group just on speech recognition. ASR is a really hot space lately. Our new group, ASRDL, will be a great place for you to learn about the latest and greatest and to join the discussions.

facebook.com

News

Prof. LeCun Stepping Down as FB Head of A.I. Research

Prof. LeCun is stepping down as the chief of FAIR. His replacement will be Jérôme Pesenti, former CEO of AI startup BenevolentTech. Pesenti will report directly to the CTO of Facebook, Mike Schroepfer. According to the Quartz article, LeCun will still decide the research directions of FAIR, but day-to-day operations will report up to Pesenti.

What do we make of the event? We think this might be a necessary change as A.I. goes from research to being applied across all Facebook products. Facebook has been relying on A.I. in various aspects of the company’s operation, e.g. news ranking, face tagging and translation. Yet a large A.I. machinery also requires a large software team to maintain it. We speculate that Mr. Pesenti might be a better choice for this next stage of evolution.

qz.com

Blog Posts

Review of deeplearning.ai Course 4

This is Arthur’s review of Course 4 of deeplearning.ai, which is about ConvNets.

thegrandjanitor.com

PyTorch is now One Year Old

Ever since PyTorch was released, it has been a favorite of researchers. Let’s all say “Happy Birthday” to the framework!

pytorch.org

Normalizing Flows

If you are interested in building complex distributions from a simple Gaussian distribution using normalizing flows, this is a great tutorial for you.

evjang.com

Open Source

Facebook’s Detectron

From the respected Ross Girshick: Facebook has open-sourced Detectron, which includes two techniques: first the Mask R-CNN algorithm for segmentation, then the technique from “Focal Loss for Dense Object Detection” by T.Y. Lin et al. for object detection.

facebook.com

DeepLeague

Forget about big words such as “leveraging” in the article. DeepLeague is software that predicts bounding boxes from the mini-map of LoL. It also comes with a release of 100k labeled images.

medium.com

Video

Amazon Go

Here is a video by CNBCTech on the new Amazon Go store.

facebook.com

Paper/Thesis Review

“Deep Neural Networks for Acoustic Modeling in Speech Recognition” by Hinton et al

This is the now-classic paper in deep learning, the first in which people confirmed that deep learning can improve ASR significantly. It is important in the fields of both deep learning and ASR. It’s also one of the first papers I read on deep learning, back in 2012-13.
Many people know the origin of deep learning from image recognition; many beginners will tell you stories about ImageNet, AlexNet and so on. But the first important application of deep learning is perhaps speech recognition.
So what was going on in ASR before deep learning? For the most part, if you could come up with a technique that cut a state-of-the-art system’s word error rate (WER) by 10% relative, your PhD thesis was good. If your technique could consistently beat previous techniques across multiple systems, you would usually get a fairly good job at a research institute in the Big 4.
The only technique I recall that gave better than 10% relative improvement is discriminative training. It got ~15% in many domains. That happened back in 2003-2004. In ASR, the term “discriminative training” has very complicated connotations, so I am not going to explain it much. This just gives you the context of how powerful deep learning is.
You might be curious what “relative improvement” is. For example, suppose your original WER is 18% and you improve it to 17%; then your relative improvement is 1%/18% = 5.56%. So a 10% relative improvement really means you go down to 16.2%. (Yes, ASR is that tough.)
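To make the arithmetic concrete, here is a minimal Python sketch of the same calculation. The function names are ours, made up purely for illustration, and WER values are given in percentage points.

```python
def relative_improvement(baseline_wer, new_wer):
    """Relative WER reduction, as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def wer_after_relative_gain(baseline_wer, relative_gain):
    """WER you end up with after a given relative reduction (0.10 means 10%)."""
    return baseline_wer * (1.0 - relative_gain)


# Going from 18% WER to 17% WER is a 1-point absolute gain...
print(f"{relative_improvement(18.0, 17.0):.2%}")     # 5.56%
# ...while a 10% relative gain on 18% WER lands at 16.2% WER.
print(f"{wer_after_relative_gain(18.0, 0.10):.1f}")  # 16.2
```

Read this way, the 18%-33% relative gains reported in the paper are several times the size of what used to count as thesis-worthy.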
So here comes replacing the GMM with a DNN. These days, it sounds like a no-brainer. But back then, it was a huge deal. Many people in the past tried to stuff in various ML techniques to replace the GMM, but no one could successfully beat the GMM-HMM. So this is innovative.
Then there is how the system is set up. The ancestor of this work traces back to Bourlard and Morgan’s “Connectionist Speech Recognition”, in which the authors tried to come up with a context-independent (CI) HMM system by replacing VQ scores with a shallow neural network. At that time, the units were chosen to be CI states.
Hinton’s and perhaps Deng’s thinking is interesting: the units were chosen to be context-dependent states. Now that’s a new change, and it reflects how modern HMM systems are trained.
Then there is how the network is really trained. Here you can see the early DLers’ stress on pre-training, because training was very expensive at that point. (I suspect it wasn’t using GPUs.)
Then there is the use of entropy to train the model. Later on, in other systems, many people just used a sentence-based entropy to do training. So in this sense, the paper is dated.
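For readers who want to see what a frame-based criterion looks like, here is a rough sketch. It assumes that “entropy” above means per-frame cross-entropy between the network’s softmax output and an aligned HMM-state label for each frame. The function names and the toy numbers are ours, for illustration only; this is not the paper’s code.

```python
import math


def softmax(logits):
    """Turn raw network outputs for one frame into state probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_cross_entropy(frame_logits, target_states):
    """Average negative log-probability of the aligned state, over frames."""
    if not target_states or len(frame_logits) != len(target_states):
        raise ValueError("need one target state per frame, and at least one frame")
    loss = 0.0
    for logits, target in zip(frame_logits, target_states):
        probs = softmax(logits)
        loss -= math.log(probs[target])
    return loss / len(target_states)


# Toy example: 2 frames, 3 hypothetical HMM states.
logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
targets = [0, 2]
print(frame_cross_entropy(logits, targets))
```

Each frame is scored on its own here, which is what makes the criterion frame-based. A sentence-based criterion would score the whole utterance at once instead.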
None of this is trivial work. But the result is stellar: we are talking about 18%-33% relative gain (p.14). To ASR people, that’s unreal.
The paper also foresees some future uses of DNNs, such as bottleneck features and articulatory features. You probably know the former already. The latter is more esoteric in ASR, so I am not going to talk about it much.

Anyway, that’s what I have. Enjoy the reading!

googleusercontent.com

About Us

This newsletter is published by Waikit Lau and Arthur Chan. We also run Facebook’s most active A.I. group with 93,000+ members and host an occasional “office hour” on YouTube.

To help defray our publishing costs, you may donate via link. Or you can donate by sending Eth to this address: 0xEB44F762c58Da2200957b5cc2C04473F609eAA65.

Join our community for real-time discussions with this iOS app here: https://itunes.apple.com/us/app/expertify/id969850760
