The definitive weekly newsletter on A.I. and Deep Learning, published by Waikit Lau and Arthur Chan. Our background spans MIT, CMU, Bessemer Venture Partners, Nuance, BBN, etc. Every week, we curate and analyze the most relevant and impactful developments in A.I.
We also run Facebook’s most active A.I. group with 191,000+ members and host a weekly “office hour” on YouTube.
Editorial
Thoughts From Your Humble Curators
We have a profile of SenseTime in the News section this week, and we also analyze Apple’s personalized “Hey Siri” feature in the Blog section.
This newsletter is published by Waikit Lau and Arthur Chan. We also run Facebook’s most active A.I. group with 132,000+ members and host an occasional “office hour” on YouTube. To help defray our publishing costs, you may donate via link. Or you can donate by sending Eth to this address: 0xEB44F762c58Da2200957b5cc2C04473F609eAA65. Join our community for real-time discussions with this iOS app here: https://itunes.apple.com/us/app/expertify/id969850760
Sponsor
Comet.ml – Machine Learning Experimentation Management
Comet.ml allows tracking of machine learning experiments with an emphasis on collaboration and knowledge sharing.
It allows you to keep your current workflow and report your results from any machine – whether it’s your laptop or a GPU cluster.
It automatically tracks your hyperparameters, metrics, code, and model definition, and allows you to easily compare and analyze models.
Just add a single line of code to your training script:
experiment = comet_ml.Experiment(api_key="YOUR_API_KEY")
Comet is completely free for open source projects. We also provide 100% free unlimited access for students and academics. Click here to sign up!
News
A Profile on SenseTime
In Issue #55, we reported in our Deal section that SenseTime had become the biggest AI startup on the planet. In this article from Quartz, we learn how the company was started by a professor from the Chinese University of Hong Kong, and how its business boomed thanks to China’s lending industry.
One sentence should wow you.
SenseTime claims to have a training database of over 2 billion images
and, as you know, most public databases top out at around 10 million images (e.g. ImageNet has ~1.2M images). So we are talking about a difference of at least a factor of 100.
Where does SenseTime get all the data? The Chinese government. SenseTime’s story should make us realize that China’s advantage in AI is real. Unlike European countries and the U.S., China has much laxer rules on data privacy. A recent survey shows 76.3% of citizens feel that A.I. would invade their privacy. But this lack of privacy allows data to be shared more easily, which is good for ML/AI development.
Waymo applies for SDC testing in California
According to the SF Chronicle, Waymo will start testing around its own campus first.
Blog Posts
How to Build an Image Database Quickly
We’ve always been big fans of Joey Adrian Rosebrock. Here is another one of his practical and useful articles, which concerns how to create an image database.
Updated Google DIY Toolkits
The very popular Google Vision and Voice Toolkit is now getting an update. It’s mostly an update on hardware, and it’s now available at Target.
How does “Hey Siri” work?
The Apple ML Blog doesn’t disappoint. It gives an exposé on how personalized “Hey Siri” works, including the architecture of the system (LSTM) and the use of curriculum learning.
Just to be clear, “personalized” here means the system only adapts to the user’s speech. Some may interpret it to mean you can use any keyword, but that’s normally not the case for keyword wakeup systems. If it were up to users to choose, they might come up with words that are overly short or easy to confuse.
In fact, here is one sentence which caught our eye:
The phrase “Hey Siri” was originally chosen to be as natural as possible;
Why? Often, when designers build a keyword wakeup system, they want a keyword that is complicated and hard to confuse with other words. Yet Apple’s engineers chose to go against this rule and follow what users like instead. This gives them a good product feature, but it keeps them busy resolving bigger issues such as false alarms: as the article argues, when one user says a natural phrase like “Hey Siri”, other users are likely to say the same phrase as well.
Respect to the Apple team. Notice that a personalized wakeup keyword is quite unique to Siri; we don’t see similar features in products such as Alexa. Also, the whole speaker ID and keyword wakeup process is done on device, which makes it extra difficult to create a good system.
Member’s Question
Maximum Number of Classes
Question: What’s the maximum number of classes a classification algorithm can support?
Answer: (By Arthur) If the objects themselves are not confusing, the answer is theoretically infinite, provided that you also have the right amount of training data for each class.
But suppose you want to differentiate between two objects that are both apples, and there is no feature that lets you tell them apart. Then your algorithm will only have a 50% chance of coming up with the right answer. This sounds funny, but it happens a lot in speech recognition, where two words can have the same pronunciation.
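To make the 50% figure concrete, here is a minimal sketch in Python. Everything in it is a made-up toy: the words, the "pronunciation" strings, and the majority-vote classifier are our own assumptions for illustration, not a real speech recognizer. It shows that when two classes share exactly the same feature, even the best possible classifier gets only half of them right, while distinguishable classes are unaffected.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (pronunciation, word) pairs.
# "two" and "too" share the same pronunciation, so the feature
# carries no information that could separate them.
samples = [("tu", "two"), ("tu", "too"), ("kat", "cat"), ("dog", "dog")] * 50

def train(pairs):
    """Majority-vote classifier: for each feature, remember its most common label."""
    counts = defaultdict(Counter)
    for feature, label in pairs:
        counts[feature][label] += 1
    return {feature: c.most_common(1)[0][0] for feature, c in counts.items()}

def accuracy(model, pairs):
    """Fraction of pairs whose predicted label matches the true label."""
    correct = sum(model[feature] == label for feature, label in pairs)
    return correct / len(pairs)

model = train(samples)

# Split the data into the confusable classes and the distinguishable ones.
confusable = [(f, y) for f, y in samples if f == "tu"]
distinct = [(f, y) for f, y in samples if f != "tu"]

acc_confusable = accuracy(model, confusable)
acc_distinct = accuracy(model, distinct)

print(acc_confusable)  # 0.5
print(acc_distinct)    # 1.0
```

In practice, speech recognizers get around this by bringing in extra information, such as the surrounding words, which amounts to adding a feature that does separate the two classes.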
About Us