Category: Uncategorized

AIDL Weekly #41 – How Everyone See Deep Learning in 2017?

Post author By grandjanitor
Post date July 8, 2019

Issue 40 December 9th 2017

Editorial

Thoughts From Your Humble Curators

Last week was the week of NIPS 2017. We chose five links from the conference for this issue.

The news this week is all about hardware. We point you to the new Titan V, the latest card in the Titan series, and Elon Musk is teasing what he calls the best AI hardware in the world. Let’s take a closer look.

As you might have read elsewhere: is Google building an AI which can build another AI? Our fact-checking section will tell you more.

Finally, we cover two papers this week:

The new DeepMind paper which describes how AlphaZero became a master of chess and shogi as well,
Fixing weight decay regularization in Adam.

Join our community for real-time discussions with this iOS app here: https://itunes.apple.com/us/app/expertify/id969850760

As always, if you like our newsletter, feel free to subscribe/forward it to your colleagues.

Artificial Intelligence and Deep Learning Weekly

News

New AI Hardware from Tesla

Elon Musk is teasing new AI hardware from Tesla. In a conversation that was not streamed, he said, “Jim is developing specialized AI hardware that we think will be the best in the world,” according to one person at the event. This CNBC report actually quotes Stephen Merity, who we know has a lot of credibility.

cnbc.com

Titan V

One thing we learned from NIPS 2017: Titan X has a successor, Titan V. It offers 110 teraflops of raw computing capability, 9x that of its predecessor. But it costs $3000, which is close to the price of a low-end Tesla card such as K2000 or K4000. Jensen Huang told us this is the best card for the desktop, and we have no doubt.

venturebeat.com

Factchecking

On Google’s “AI Built an AI That Outperforms Any Made by Humans”

For those who are new to AIDL: AIDL has what we call “The Three Pillars of Posting”, i.e. we require members to post articles which are relevant, non-commercial and non-sensational. When a piece of sensationalized news starts to spread, an admin of AIDL (in this case, Arthur) fact-checks the relevant literature and source material and decides whether certain pieces should be rejected. This time we are going to fact-check a popular yet misleading piece, “AI Built an AI That Outperforms Any Made by Humans”.

The first thing to notice: “AI Built an AI That Outperforms Any Made by Humans” was published by a site which has historically sensationalized news. The same site was involved in sensationalizing the early version of AutoML, as well as the notorious “AI learn language” fake news wave.
So what is it this time? Well, it all starts with Google’s AutoML, published in May 2017. If you look at the page carefully, you will notice that it is basically just a tuning technique using reinforcement learning. At the time, the research only worked on CIFAR-10 and Penn Treebank.
But then Google released another version of AutoML in November. The gist is that Google beat SOTA results on COCO and ImageNet. Of course, if you are a researcher, you will simply interpret it as “Oh, automatic tuning has now become a thing; it could be a staple of the latest evaluations!” The model is now distributed as NASNet.
Unfortunately, this is not how the popular outlets interpreted it. Some sites were claiming “AI Built an AI That Outperforms Any Made by Humans”. Even more outrageous, some sites claimed “AI is creating its own ‘AI child’”. Both claims are false. Why?
As we just said, Google’s program is an RL-based program which proposes the child architecture. Isn’t this parent program still built by humans? So the first statement is refuted. Someone simply wrote a tuning program; a more sophisticated one, but still a tuning program.
And if you are imagining “Oh, AI is building itself!!” and picture AI as now self-replicating, you could not be more wrong. Again, remember that the child architecture is used for other tasks such as image classification. These “children” don’t create yet another group of descendants.
A much less confusing way to put it is: “Google’s RL-based AI is now able to tune results better than humans in some tasks.” Don’t get us wrong, this is still an exciting result, but it doesn’t suggest in any sense that a “machine is procreating itself”.

We hope this article clears up the matter. We rate the claim “AI Built an AI That Outperforms Any Made by Humans” false.

Here is Google’s original post.


NIPS 2017

Is Machine Learning Alchemy?

Perhaps the highlight of this NIPS is the debate between NIPS Test of Time Award winner Ali Rahimi and deep learning demi-god Prof. Yann LeCun. What happened?

In the award presentation, Rahimi said, “Machine learning has become alchemy.” This was meant as a counter to another well-known saying from Andrew Ng: “Artificial intelligence is the new electricity.” He cut deep into a current problem of machine learning: the lack of a theoretical framework.

Prof. LeCun seemed to be very upset by Rahimi’s comment. In his long Facebook post, he voiced his disagreement. His main point is that, historically, engineering effort has always preceded theoretical results. As he said,

the lens and the telescope preceded optics theory, the steam engine preceded thermodynamics, the airplane preceded flight aerodynamics, radio and data communication preceded information theory, the computer preceded computer science.

This exchange spawned a great debate within the community. For example, in Alchemy, Rigour and Engineering, Ferenc Huszár raised a good point: while it is acceptable to have an incomplete or non-rigorous theoretical understanding, having non-rigorous testing methods, as many papers do, is bad.

We really don’t want to take sides in the debate. But let’s wrap it up with the rhyme battle between “Bored Yann LeCun” (the parody account of Prof. LeCun) and Ali Rahimi.

Rocking that alchemy // from my penthouse balcony // my empirical, lyrical modality // it’s like fine sashimi, Ali Rahimi // I choose elbow grease // over rigor police #feelthelearn

to which Ali Rahimi replied,

phat beats to the dome // like weights dropped at random // my training methodology // exposes yours’ pathology // i’m getting warmed up // take your ball and go home

youtube.com

DeepMind at NIPS 2017

Here’s a list of papers published by DeepMind at NIPS 2017.

deepmind.com

Papers for NIPS Machine Learning for Creativity and Design 2017

Shared by David Ha. Here are all the papers accepted at the NIPS Creativity workshop.

github.io

NIPS live video

Here is a feed for NIPS 2017 live video.

facebook.com

Cake We All Love

Enough said

twitter.com

Blog Posts

Optimization for Deep Learning Highlights in 2017

Another great article by Sebastian Ruder, which you can see as a sequel to his An overview of gradient descent optimization algorithms. It includes a short but concrete explanation of the latest Adam with a proper implementation of weight decay, warm restarts, the latest studies of generalization, and more.

ruder.io

The Last 5 Years In Deep Learning

Adit Deshpande wrote a great article on the development of deep learning over the last 5 years. We enjoyed it a lot: not only does it summarize what happened, it also gives a set of great pointers to different papers and resources.

github.io

Intelligence Augmentation

This is a brilliant article written by Shan Carter and Michael Nielsen. It explains how generative AI technology helps humans create. Long-time readers of AIDL should be familiar with Nielsen. He is the author of the very educational Neural Networks and Deep Learning. Of course, he is better known for co-writing the popular textbook Quantum Computation and Quantum Information.

distill.pub

Open Source

VGG Face2 Database

Here is an update of the very popular VGG Face database. A pretrained model is included.

ox.ac.uk

PyTorch v0.3.0

PyTorch v0.3.0 has just been released, featuring new layers and ONNX support.

github.com

Member’s Question

How do you read Duda and Hart’s “Pattern Classification”?

Question (rephrased): I was reading the book “Pattern Classification” by Duda and Hart, but I found it difficult to follow the mathematics. What should I do?

Answer: (by Arthur) You are reading a good book. Duda and Hart is known to be one of the Bibles in the field, but perhaps it is slightly beyond your skill at this point.

My suggestion is to make sure you understand basic derivations such as linear regression and the perceptron. Also, if you get stuck with the book for a long time, try going through Andrew Ng’s Machine Learning. Granted, the course is much easier than Duda and Hart, but you will also get an outline of what you are trying to prove.

One specific piece of advice on the derivation of NNs: I recommend you read Chapter 2 of Michael Nielsen’s book first, because he is very good at defining clear notation. For example, the meaning of the letter z changes between textbooks, but it is crucial to know exactly what it means in order to follow a derivation.


Paper/Thesis Review

Fixing Weight Decay Regularization in Adam

Here is a read on the paper “Fixing Weight Decay Regularization in Adam”, a major correction to how weight decay should be used together with Adam.

The key to understanding this paper is that weight decay is often implemented as L2 regularization in the objective function, and we often think that the two concepts, weight decay and L2 regularization, are the same.
But then the authors, Loshchilov and Hutter, observed a very simple fact: implementing weight decay through L2 regularization often reduces the effect of weight decay. That may explain why Adam has poorer generalization power.
We will just refer you to Algorithms 1 and 2 on p.3. If you follow the text, you will quickly realize the past implementation was just wrong. The authors also propose how you can fix the update to get weight decay right (the green highlights).
Reading the paper also requires you to understand the idea of warm restarts. You can read about it in the paper SGDR: Stochastic Gradient Descent with Warm Restarts.
The authors went on to show that the idea works well on Top-1 error of CIFAR-10 and Top-5 error of ImageNet32x32, a downsampled ImageNet. Looks good.
This sounds like a great idea. In fact, one of the authors of Adam, DP Kingma, thought so too. So did Jeremy Howard; the technique is already implemented in fast.ai’s code.
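
To make the difference concrete, here is a minimal sketch of one Adam update on a single scalar weight. We wrote it for illustration; it is not taken from the paper's code. The function name `adam_step`, its parameters `l2` and `decoupled_wd`, and the default hyperparameters are our own assumptions, and scaling the decoupled decay by the learning rate is a simplification of the paper's schedule multiplier.

```python
import math


def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              l2=0.0, decoupled_wd=0.0):
    """One Adam update on a scalar weight w (illustrative sketch only).

    l2:           coefficient of an L2 penalty folded into the gradient,
                  i.e. the older way of getting "weight decay".
    decoupled_wd: decay applied directly to w, outside the adaptive step.
    """
    # The L2 term is added to the gradient, so it flows through the
    # moment estimates and gets rescaled by sqrt(v_hat) below.
    g = grad + l2 * w
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)  # bias-corrected second moment
    adaptive_step = m_hat / (math.sqrt(v_hat) + eps)
    # The decoupled decay uses the pre-update weight and is NOT divided
    # by sqrt(v_hat).
    w = w - lr * (adaptive_step + decoupled_wd * w)
    return w, m, v


# Toy comparison with a zero loss gradient, starting from w = 1.0.
w_l2_small, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, l2=0.1)
w_l2_large, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, l2=0.5)
w_decoupled, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, decoupled_wd=0.1)
print(w_l2_small, w_l2_large, w_decoupled)
```

In this toy case the two L2 runs move the weight by almost exactly `lr` on the first step whatever the coefficient is, because the penalty is divided by its own running magnitude. The decoupled run shrinks the weight by `lr * decoupled_wd * w`, which is what one would expect from plain weight decay.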

The original link can be found here.

arxiv.org

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

As you know the story, AlphaZero is no longer just playing Go; it is now playing chess and shogi as well. By itself this is a significant event, because most SOTA board game engines are specific to one game. General game-playing engines are seen as novelties rather than the norm.
Another note: most chess and shogi engines are based on alpha-beta search. AlphaZero instead uses Monte-Carlo Tree Search, which simulates board positions. Moves are ordered by scores from a neural network (NN) that reads the board, and the search decides which state to enter next using the visit counts and the value of the board according to the NN. So this is not just AlphaZero beating more games; it could be a paradigm shift for both the computer chess and shogi communities.
As you know, AlphaZero beat Stockfish, the strongest program of 2016. But one analysis caught my eye: in chess, DeepMind researchers also fixed the first few moves of AlphaZero so that it followed the 12 most-played openings for black and white. If you are into chess, these include the Queen’s Gambit, several Sicilian Defences, the French, and the KID. They show that AlphaZero can beat Stockfish in multiple types of situations, and that the opening doesn’t matter too much.
But then, would AlphaZero beat all computer players, such as Shredder or Komodo? No one knows the answer yet.
One more thing: AlphaZero doesn’t start from zero knowledge either. As Denny Britz points out in his tweet, AlphaZero was provided with perfect knowledge of the rules. So intriguing rules such as castling, threefold repetition and the 50-move draw rule are all provided to the machine. As Britz suggests, maybe in the future we want to focus on how to let the machine figure out the rules by itself.
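
The selection step of the tree search described above can be sketched as follows. This is our own minimal illustration of the PUCT-style rule used in the AlphaGo Zero line of work, where each candidate move is scored by its mean value Q plus an exploration bonus U that grows with the network's prior and shrinks with the visit count. The function name `puct_select`, the dictionary layout, the move names and the constant `c_puct=1.5` are hypothetical choices for this sketch, not values from the paper.

```python
import math


def puct_select(children, c_puct=1.5):
    """Return the move with the highest Q + U score.

    children maps a move to a dict with:
      'N': visit count, 'W': total value from simulations,
      'P': prior probability from the policy network.
    """
    total_visits = sum(child["N"] for child in children.values())

    def score(child):
        # Mean value of the move so far (0 if it was never visited).
        q = child["W"] / child["N"] if child["N"] > 0 else 0.0
        # Exploration bonus: large for high-prior, rarely visited moves.
        u = c_puct * child["P"] * math.sqrt(total_visits) / (1 + child["N"])
        return q + u

    return max(children, key=lambda move: score(children[move]))


# Hypothetical statistics at one node of the search tree.
node = {
    "e4": {"N": 10, "W": 6.0, "P": 0.5},
    "d4": {"N": 0, "W": 0.0, "P": 0.3},
    "c4": {"N": 2, "W": 0.2, "P": 0.2},
}
print(puct_select(node))
```

With these toy numbers the unvisited move "d4" is selected, since its exploration bonus outweighs the higher mean value of "e4". As visit counts grow, the bonus fades and the mean value dominates.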

arxiv.org

Uncategorized

AIDL Weekly #40 – Special Issue on NIPS 2017

Post author By grandjanitor
Post date July 8, 2019

Uncategorized

AIDL Weekly #39 – Amazon The AI Powerhouse

Post author By grandjanitor
Post date July 8, 2019

Issue 39 December 1st 2017

Editorial

Thoughts From Your Humble Curators

It’s AWS re:Invent week, the big annual AWS conference. Amazon made several announcements on its AI offerings, so we take a closer look in this issue.

In our Blog Posts section, we have a line-up of many interesting blog posts and paper reviews this issue, including:

Google Vision AIY Kit,
Stephen Merity on Understanding the Mixture of Softmaxes (MoS),
One LEGO at a time: Explaining the Math of How Neural Networks Learn,
Arthur’s Review of Course 3 of deeplearning.ai

We also present our read on the CheXNet paper, which allegedly beat human radiologists. We take a closer look at the results.

Join our community for real-time discussions with this iOS app here: https://itunes.apple.com/us/app/expertify/id969850760

As always, if you like our newsletter, feel free to subscribe and forward it to your colleagues!


News

AI Powerhouse: Amazon

The Verge ran a piece on Amazon DeepLens, a $250 AI-enabled camera, as well as SageMaker and automatic transcription and translation tools. Of course, you might also have heard of Rekognition, which now gives developers access to powerful ready-made features such as real-time video analysis. In a nutshell, you don’t have to implement YOLO yourself.

theverge.com

Blog Posts

Google AIY Vision Kit

After the Voice AIY toolkit, Google will release a new Vision AIY toolkit. You will need your own Raspberry Pi Zero and Raspberry Pi camera. But the toolkit will come with the VisionBonnet, which has an Intel Movidius MA2450 vision processing unit.

Price: only $45. It sounds like we can all have some fun at Christmas time then.

blog.google

Understanding the Mixture of Softmaxes (MoS)

In this piece, our favorite writer Stephen Merity explains the latest ideas on mixture of softmaxes in language modeling by Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen. As always, Merity is a quality writer, and he also provides source code for you to play with.

smerity.com

One LEGO at a time: Explaining the Math of How Neural Networks Learn

A great beginner’s article on back-propagation. One less-discussed aspect it covers is the modularity of the algorithm.

github.io

A Year After Pledging Openness, Apple Still Falls Behind On AI

This is a more critical piece on Apple’s current deep learning research. At least in AIDL, members are divided on the merit of Apple’s research. Check it out in this thread.

buzzfeed.com

How Do Boston Dynamics’ Robots Work?

In this interesting conversation started by Denny Britz, he quoted Eric Jang’s speculation on Quora about how BD’s backflipping robot actually works, in which Jang suggests that BD is not using ML in the process.

That led to a conversation on how and when ML would really be able to learn this process automatically, and Prof. Pieter Abbeel chimed in.

twitter.com

Arthur’s Full Review of deeplearning.ai Course 3: Structuring Machine Learning Projects

This is Arthur’s review on Course 3 of deeplearning.ai. He argues that this is perhaps the most important course within the specialization. See more on why in the text.

thegrandjanitor.com

Open Source

Nilearn

Nilearn is a scikit-learn-based toolkit for fast processing of neuroimaging data.

github.io

Open Images V3

OpenImages is now at V3, with 9 million URLs and 4.5 anchor boxes.

github.com

Member’s Question

Are MOOC Certificates Important?

Our thought (from Arthur): For the most part, MOOC certificates don’t mean too much in real life. What matters is whether you can actually solve problems. So the purpose of a MOOC is really to stimulate you to learn, and the certificate serves as a motivation tool.

As for OP’s question: I never got the Udacity nanodegree. From what I have heard, though, I would say the nanodegree requires effort comparable to 1 to 2 times that of Ng’s deeplearning.ai specialization. It’s also tougher if you need to take a course in a specified period of time. But the upside is that there are human graders who give you feedback.

As for which path to take, I think it solely depends on your finances. Let’s push it to an extreme: if you think purely of credentials and opportunities, perhaps an actual PhD/Master’s degree will give you the most, but the downside is the opportunity cost of several years of salary. One tier down would be the online ML degree from Georgia Tech, but it will still cost you up to $5k. Then there is taking cs231n or cs224d from Stanford online; again, that will cost you $4k/class. That’s why you would consider taking a MOOC. And as I said, which price tag you choose depends on how motivated you are and how much feedback you want to get.


Paper/Thesis Review

Nature ML Journal Launching in 2017

It’s online, but will it be free? We will see. Regardless, it says a lot about the importance of ML in scientific research. Some in the community do raise concerns about whether Nature is the best host for the new journal. Some use JMLR as an example showing that an academic journal can be organized and edited by academics themselves.

nature.com

CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning

This is a note on the CheXNet paper. As you know, it is the widely circulated paper from Stanford which purportedly outperforms humans on chest X-ray diagnosis.

BUT, after we read it in detail, our impression is slightly different from what you get by just reading the popular news, including the description on GitHub.
Since the ML part is not very interesting, we will just briefly go through it: it’s a 121-layer DenseNet, which basically means each layer receives feed-forward connections from all previous layers. Given the data size, it’s likely a full training.
There was not much justification of the architecture. Our guess: the team first tried transfer learning, but decided to move on to full training to get better performance. DenseNet would be a manageable setup.
Then there was a fairly standard experimental comparison using AUC. In a nutshell, CheXNet did perform better than the radiologists on pneumonia detection, and better than previously published results on every one of the 14 classes of ChestX-ray14, which is known to be the largest of the similar databases.
Now here are the caveats the popular news didn’t mention:
1. First of all, the humans weren’t allowed to access the previous medical records of a patient.
2. Only frontal images were shown to the human doctors, but prior work shows that diagnosis improves when the lateral view is also available.
That’s why, on p.3 of the article, the authors note:
“We thus expect that this setup provides a conservative estimate of human radiologist performance.”
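
For readers unfamiliar with AUC, the metric mentioned above, here is a small sketch of how it can be computed from model scores and binary labels. It uses the pairwise definition: the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one. The function name `auc` and the toy numbers are ours and have nothing to do with the paper's data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive example outscores a negative one;
    ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four cases, two with the disease (label 1).
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why it is a convenient single number for comparing a model against earlier systems on each disease class.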

Reading this far should make you realize that it may still take a while for deep learning to “replace radiologists”.

See the original discussion at AIDL-LD.

arxiv.org

Uncategorized

AIDL Weekly #38 – The FaceID Hack

Post author By grandjanitor
Post date July 8, 2019

Issue 38 November 17th 2017

Editorial

Thoughts From Your Humble Curators

Our main story this week is the FaceID hack. Is it a valid hack? How much should we care? We will take a closer look.

In other sections, we cover Karpathy’s “Software 2.0”, CheXnet and other topics.

Join our community for real-time discussions with this iOS app here: https://itunes.apple.com/us/app/expertify/id969850760

As always, if you like our newsletter, feel free to subscribe and forward it to friends!


News

The FaceID Hack

As we all learned last week, the iPhone X’s FaceID was purportedly hacked by a Vietnamese security company, Bkav. But the more mainstream reports all suggest that we shouldn’t be too worried about the hack. So let’s take a closer look at the matter.

This YouTube video shows a mask spoofing FaceID. The mask is a combination of a 3D mask with 2D printing of the eyes and the mouth. Later, Bkav also did a live demo with the BBC.

However, once you ask how one might use this as an exploit, it’s much tougher than it looks. First of all, Bkav refused to create a new mask at the request of BBC reporters. And according to the engineers who worked on it, the whole process requires 9 hours, and the user has to be present so that the 2D+3D mask can be adjusted.

Another part which doesn’t add up is that Apple seems to have tested FaceID extensively with masks as well. So, if Bkav refuses to replicate the process, it’s hard to say definitively whether FaceID is hackable.

techcrunch.com

Blog Posts

Jensen Huang is Fortune 2017 Businessperson of the Year

For his role as the leader of Nvidia and its impact on artificial intelligence and deep learning. Also see fortune.com’s write-up.

nvidia.com

Software 2.0 By Andrej Karpathy

Andrej Karpathy wrote a new piece on why neural networks are actually the new software. Some question whether we have really reached the point where DNNs can just replace programming. That’s a valid question, yet if you read the article closely, Karpathy was really arguing for using neural networks to build ML components such as ASR, CV and translation, which traditionally required a huge amount of programming effort that deep learning can now significantly reduce.

We think that Karpathy here is playing the role of futurist, much like in some of his past articles, e.g. Short Story on AI, in which he speculates about what AI will look like when we scale up supervised learning.

medium.com

How (and why) to create a good validation set by Rachel Thomas

This is a new article by Prof. Rachel Thomas, who discusses how to create a good validation set. She delves deeper than the usual “train-validation-test” type of discussion and asks when a random validation set might not work. We found it a thought-provoking piece.
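
One commonly cited case where a random validation set can mislead is time-ordered data, where the model will be used on dates later than anything it was trained on. The sketch below is our own illustration of that idea, not code from the article. The function name `time_based_split`, the `date` field and the 20% fraction are hypothetical choices.

```python
def time_based_split(records, valid_fraction=0.2, key=lambda r: r["date"]):
    """Hold out the most recent records as the validation set.

    Records are sorted by `key` (ISO date strings sort chronologically),
    and the last `valid_fraction` of them become the validation set.
    """
    ordered = sorted(records, key=key)
    n_valid = int(len(ordered) * valid_fraction)
    cut = len(ordered) - n_valid
    return ordered[:cut], ordered[cut:]


# Ten hypothetical daily records, deliberately given in reverse order.
records = [{"date": "2017-11-%02d" % day, "value": day} for day in range(10, 0, -1)]
train, valid = time_based_split(records)
print([r["date"] for r in valid])
```

Here the two latest days end up in the validation set, so the validation score reflects how the model does on dates it has not seen, which a random shuffle would not guarantee.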

fast.ai

CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning

This is a project from Stanford which shows that pneumonia detection can be done by deep learning at the level of radiologists.

The model is trained on the recently released ChestX-ray14, which has 14 types of diseases annotated for each of 110k images. The architecture is a 121-layer DenseNet. The authors show that CheXNet exceeds the ability of human radiologists in both specificity and sensitivity. The original paper can be found here.

github.io

Open Source

Cornell NLVR dataset

Here is a dataset opened by Cornell on using natural language to reason visually. There is a leaderboard as well.

cornell.edu

numpy is planning to drop python 2.7 support

Many deep learning scripts use numpy, and as many of you know, Python library compatibility issues are really difficult to solve. That’s why numpy dropping 2.7 support is potentially a big issue for many projects, and you deserve to know.

github.com

Video

Robot Learning 2017

Here is the 8-hour feed of Robot Learning 2017 (RL 2017).

youtube.com

What’s New, Atlas?

Speaking of robotics, check out this super cool video from Boston Dynamics on the latest development of Atlas.

youtube.com


AIDL Weekly Issue 37 – First Level 4 SDC, Pieter Abbeel and Raquel Urtasun

Post author By grandjanitor
Post date July 5, 2019

Issue 37 November 12th 2017

Editorial

Thoughts From Your Humble Curators

The biggest news last week: Waymo put the first Level 4 SDC on the road. We also learned that Pieter Abbeel has left OpenAI and started his own robotics startup, Embodied Intelligence. Wired profiled Raquel Urtasun, the new head of Uber’s Toronto team. We cover all these pieces in our News section.

The rest of this issue should be very interesting as well. First is the new Distill article by (Chris) Olah, Mordvintsev and Schubert, which is an excellent review of feature visualization. Then Google PhD Fellow Anirban Santara gives us his take on how to build a career in ML. AIDL-LD member Ben Davis gives us a nice summary of a paper on image fusion schemes. And you may be interested in the two papers published by Salesforce’s Einstein lab last week, both on NNMT, which Arthur reviews this week.

Join our community for real-time discussions with this iOS app here: https://itunes.apple.com/us/app/expertify/id969850760

As always, if you like our newsletter, feel free to subscribe/share the letter with your colleagues.

We will host our own show at the AI World events: “Attack of the AI Startups”. If you live in the New England area, feel free to join!

Artificial Intelligence and Deep Learning Weekly

News

First Level 4 SDC in US

This is huge: the first Level 4 SDC is on the road. Level 4 is commonly known as “mind off”, which means human supervision is not necessary. Deploying a Level 4 vehicle on the road shows that Waymo is confident in its technology.

Phoenix was chosen because it places practically no restrictions on SDCs at this point. But cities around the States are frantically changing their laws so that they can be the next SDC hub. At this rate, we can expect SDC deployment to spread across the United States soon.

theverge.com

Pieter Abbeel left OpenAI, started Embodied Intelligence

We are surprised by another superstar researcher leaving OpenAI. Back in June, we learned that Andrej Karpathy joined Tesla as Director of AI. This time, RL expert Pieter Abbeel has left OpenAI and started his own company, Embodied Intelligence.

How should we see this? There are at least three perspectives. The first is that OpenAI hasn’t been able to keep many of its star researchers. That’s perhaps because it is a non-profit institution, and its funding seems to be scarce. Remember the web version of OpenAI Gym? Researchers at OpenAI seem to have given up its maintenance because of a lack of resources. (Also see Weekly Issue 30 (http://aidl.io/issues/36#start) for more coverage.)

Then there is the question of whether one should develop AI via research or via deploying real-life production systems. Both Karpathy and Abbeel chose to work on commercial solutions. Perhaps the difference is that one decided to work for a large public company, while the other decided to start his own.

As for Abbeel’s decision itself, he is certainly entering an interesting space. Embodied Intelligence’s business model is to create solutions which allow robotic arms to adapt more quickly. Their solution, as far as we know, uses VR to accelerate learning. The space is rather hot and crowded, though. As Spectrum reported, existing companies include Kindred.ai, Kinema Systems, and RightHand Robotics. Arguably, they lack the brilliance of Abbeel and his OpenAI colleagues, but then you have to ask whether $7 million (as reported by NYT) is enough for an investment-intensive application such as robotics.

wired.com

Meet Raquel Urtasun

Who is Raquel Urtasun? You might wonder. As it turns out, Raquel Urtasun was an Associate Professor at the University of Toronto before she joined Uber as the head of its Toronto SDC group. While her PhD was more on human motion models, her tenure at U of T has focused more on how to use deep learning in computer vision and SDC.

Urtasun’s philosophy of SDC is very different from mainstream approaches such as Waymo’s. She advocates the use of cameras instead of the more expensive LIDAR as the main sensors of an SDC. For example, she and her students have many papers on using deep learning to create coherent images, as well as on utilizing different types of mapping images in SDC. (See her publication page.)

Coincidentally, she joined just as Waymo started its high-profile lawsuit against Uber and Levandowski, which as you know is all about whether Levandowski stole Waymo’s trade secrets on LIDAR. Uber’s SDC was also known to be lagging in performance, with a high disengagement rate.

Will Urtasun succeed? It’s hard to say. LIDAR and cameras are generally complementary technologies, and you can imagine situations where you need information from one that the other cannot provide. So it’s very hard to say which technology will prevail. Of course, those who support using cameras as the only sensor face another problem: LIDAR’s cost of production is falling. Will there be a day when LIDAR is as commonplace as cameras?

For Uber, though, hiring Urtasun is a kind of change of heart. With camera-based technology, it need not compete head-on with giants such as Waymo, who have invested in LIDAR-based technology for more than 10 years. Camera-based technology could give Uber an “out”, and give its SDC research a chance to grow again.

wired.com

Blog Posts

Feature Visualization – A Distill Article.

If not the best, this is one of the best reviews of feature visualization in convolutional neural networks. For years, our go-to tutorial on visualization was Johnson’s lecture in cs231n. But we had never seen visualization achieve the stunning quality that the Distill authors (Olah, Mordvintsev and Schubert) achieved.

Notice that it is not just a review article: the authors also introduce a new criterion to improve the diversity of visualizations. We are also happy that Distill continues to maintain high quality in its publications.

distill.pub

Andrew Ng: “Enough Papers!”

Prof. Ng reportedly said,

We have enough papers. Stop publishing, and start transforming people’s lives with technology.

Sounds like many young researchers already agree with him. Just look at Andrej Karpathy and Pieter Abbeel, whom we cover in this issue.

medium.com

MS or Startup in Deep Learning by Anirban Santara

At AIDL, we frequently get questions on how to start a DL and data science career. Here is the take of Google PhD Fellow Anirban Santara on whether you should take a Master’s degree or join a startup to build such a career.

medium.com

Open Source

Models from NASNet

This is based on the paper “Learning Transferable Architectures for Scalable Image Recognition”. The variants are trained on CIFAR-10 and ImageNet and can be used in a variety of applications.

github.com

Paper/Thesis Review

Weighted Transformer

This is one of the two papers the Salesforce Einstein lab published last week. Both of them require an understanding of MT, NNMT and purely attention-based NNMT. Since this first one is not too difficult to understand, I will just give you some background on NNMT first.

When NNMT was first conceived, the original form started with an Encoder which converts the source text into what is usually known as a “thought vector”. The thought vector is then decoded by the Decoder. In the original setting, both Encoder and Decoder are usually LSTMs.

Then there is the idea of attention. You can think of it as an extra layer on the decoder side which sits on top of the encoder’s output. Its goal is to decide how much attention to pay to each part of the encoded source at every decoding step.

Now of course, people have played with various architectures for this Enc-Dec structure. The first thing to notice is that such a structure usually contains a giant LSTM or CNN. But no one really likes them! LSTMs are hard to parallelize, and CNNs can consume a lot of memory.

That makes Google’s work from the middle of this year, “Attention is all you need”, a stunning and useful result. What the authors propose is to build a system from attention alone, which they call the Transformer. There are multiple tricks to get it to work, but perhaps the most important one is “multi-head attention”. In a way this is like the concept of channels in a Convnet: instead of computing one single attention, we now attend to multiple places at once, and each head learns to attend differently.

Naturally the method is fast because you can parallelize it, but Google’s researchers also found it to be better in BLEU score. That’s why top shops are switching to purely attention-based methods these days.

Now finally I can talk about what the Salesforce paper is about. In the original Google paper, the representations learned by the multiple attention heads are simply concatenated with each other to form one “supervector”. The authors of this paper decided to apply another set of learned weights when combining them. This further improves performance on WMT14 by 0.4 BLEU, which is quite significant.
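To make the difference concrete, here is a rough numpy sketch of the two ways of combining heads. It is a simplification under our own assumptions and not the paper's exact formulation: the projection matrices and the per-head weights `kappa` are random stand-ins for learned parameters, and all sizes are made up.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


rng = np.random.default_rng(0)
T, d_model, n_heads = 5, 16, 4          # toy sizes, chosen arbitrarily
d_head = d_model // n_heads
x = rng.normal(size=(T, d_model))       # stand-in for encoder inputs

# Random stand-ins for the learned per-head projections.
Wq = rng.normal(size=(n_heads, d_model, d_head))
Wk = rng.normal(size=(n_heads, d_model, d_head))
Wv = rng.normal(size=(n_heads, d_model, d_head))

heads = [attention(x @ Wq[h], x @ Wk[h], x @ Wv[h]) for h in range(n_heads)]

# "Attention is all you need": plain concatenation of the heads.
concat = np.concatenate(heads, axis=-1)

# Weighted variant: scale each head by a weight before combining.
# kappa is a stand-in for learned weights, normalized to sum to 1.
kappa = softmax(rng.normal(size=n_heads))
weighted = np.concatenate(
    [kappa[h] * heads[h] for h in range(n_heads)], axis=-1
)

print(concat.shape, weighted.shape)
```

The extra weights let the model learn that some heads matter more than others, instead of treating every slice of the "supervector" equally.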

einstein.ai

Non-Autoregressive Neural Machine Translation

This is the second of the two papers from Salesforce, “Non-Autoregressive Neural Machine Translation”. Unlike the “Weighted Transformer”, I don’t believe it improves SOTA results. But it introduces a cute idea into purely attention-based NNMT. I would suggest you read my previous post before you read on.

Okay. The key idea introduced in the paper is fertility. It addresses one of the issues raised by the purely attention-based model from “Attention is all you need”. When you translate, a source word can 1) be expanded into multiple words, or 2) move to a totally different position in the sentence.

In the older world of statistical machine translation, we have what are called the IBM models. The latter issue is handled by “Model 2”, which decides the “absolute alignment” of a source/target language pair. The former is handled by the fertility model, or “Model 3”. Of course, in the world of NNMT, these two models were thought to be obsolete. Why not just use an RNN in the Encoder/Decoder structure to solve the problem?
(Btw, there are 5 IBM Models in total. If you are into SMT, you should probably learn them.)

But in the world of purely attention-based NNMT, ideas such as absolute alignment and fertility become important again, because you don’t have memory within your model. So the original “Attention is all you need” paper already has the idea of “positional encoding”, which is there to model absolute alignment.

So the new Salesforce paper introduces another layer which brings back fertility. Instead of feeding the output of the encoder directly into the decoder, it first feeds it into a fertility layer which decides the fertility of each source word. E.g. a fertility of 2 means that the word should be copied twice, and 0 means the word shouldn’t be copied at all.
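The copying step is easy to picture in code. This is only a toy sketch: the function name `apply_fertility`, the sentence and the fertility values are all made up for illustration, and in the paper the fertilities are predicted by the model rather than given by hand.

```python
def apply_fertility(tokens, fertilities):
    """Copy each source token as many times as its fertility says."""
    if len(tokens) != len(fertilities):
        raise ValueError("need exactly one fertility per source token")
    copied = []
    for token, fertility in zip(tokens, fertilities):
        copied.extend([token] * fertility)
    return copied


source = ["we", "totally", "accept", "it"]
fertilities = [1, 0, 2, 1]  # made-up values: drop "totally", double "accept"
decoder_input = apply_fertility(source, fertilities)
print(decoder_input)  # ['we', 'accept', 'accept', 'it']
```

Notice that the sum of the fertilities also fixes the length of the output, which is what lets the decoder produce all target words in parallel.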

I think the cute thing about the paper is two-fold. One is that it is a natural extension of the whole idea of attention-based NNMT. The other is that Socher’s group is reintroducing a classical SMT idea into NNMT.

The results, though, are not as good as standard NNMT. As you can see in Table 1, there is still some degradation when using the non-autoregressive approach. That’s perhaps why, when the Google Research Blog mentioned the Salesforce results, it said “towards non-autoregressive translation”. It implies that the results are not yet satisfying.

einstein.ai

Medical Image Segmentation Based on Multi-Modal Convolutional Neural Network: Study on Image Fusion Schemes

This is a well-written summary by one of our members. We decided to feature it here without further comment. So check it out! Only in our Literature Discussion group.

facebook.com

Contemporary Classic

Deep Learning Architecture Diagrams

This is a now-classic article on architecture diagrams of deep learning. The section which criticizes the Asimov Institute’s infographic of deep learning should be a must-read for everyone.

fastml.com


AIDL Weekly Issue 36 – Capsules, Capsules and Capsules

Post author By grandjanitor
Post date July 5, 2019

Issue 36 November 5th 2017

Editorial

Thoughts From Your Humble Curators

We were out last week. The hottest news this week is all about Prof. Hinton’s capsule models!

Prof. Hinton and his students just released an arxiv paper on how the idea of capsules can be used, and specifically how it performs on MNIST as well as its more difficult cousins, affNIST and MultiMNIST, which distort MNIST with affine transforms and heavy overlapping respectively. So we dedicate this issue to capsules. We provide our own analysis in the thesis/paper review section, highlight a popular link from Wired, and cover the latest developments. So far, we already know of two implementations which attempt to reproduce the results.

Other than that, check out other interesting links such as ex-Google Brain Resident David Ha’s work on evolution strategy, and our piece on “Unsupervised Machine Translation Using Monolingual Corpora”.

Join our community for real-time discussions with this iOS app here: https://itunes.apple.com/us/app/expertify/id969850760

As always, if you like our newsletter, feel free to subscribe/share the letter with your colleagues.

We will host our own show at the AI World events: “Attack of the AI Startups”. If you live in the New England area, feel free to join!

Artificial Intelligence and Deep Learning Weekly

News

Sony AIBO is back! And Powered by Deep Learning!

Deep learning is revitalizing one of the most iconic products: Sony AIBO. The new AIBO, priced at ~$1730, will have sensors and run deep learning algorithms to recognize images and sounds.

theverge.com

Course 4 of deeplearning.ai is here!

Alright, ML nerds, Course 4 of deeplearning.ai is finally here! Course 4 looks good. It’s all about image classification with Convnet, object detection and fun exercises such as transfer learning and face verification.

coursera.org

Wired’s Coverage of Hinton’s Capsules Theory

Here is a more popular account of Prof. Hinton’s capsule theory. Notice that Wired assumes the OpenReview paper is also from Hinton and his students. But at this point we really don’t have any confirmation yet.

wired.com

Blog Posts

Nvidia’s Progressive Growing of GAN

Here is one very impressive result you probably saw last week: GANs are now able to generate very realistic celebrity images (trained using the CelebA-HQ database). So how does it work?

It turns out it has a lot to do with how the training is done. The authors start with a generator and a discriminator which both work at a tiny 4×4 resolution. Training then progressively adds layers that double the resolution, eventually reaching 1024×1024. Doing so, the authors claim, helps the model generate fine details.

What else is in the recipe? There are two we spotted:

The authors used facial landmarks to crop an image in the CelebA-HQ dataset.
Then there are the various normalization schemes.

nvidia.com

The State of ML and Data Science Report from Kaggle

This is a great survey by Kaggle of data scientists and ML experts. What are their demographics? What tools do they use? And what is their favorite machine learning algorithm? These are all answered in the report.

kaggle.com

A Visual Guide to Evolution Strategies

We often learn a lot just by reading what Google Resident David Ha shares in his Facebook and LinkedIn feeds. He has impressive experience in different sub-fields of machine learning. His seemingly casual experiments on reinforcement learning are among the most interesting to read and understand.

This time, David himself wrote an extensive guide on evolution strategies, which compares various methods such as genetic algorithms (GA), covariance matrix adaptation evolution strategy (CMA-ES), REINFORCE and even OpenAI’s latest strategy. It’s certainly eye-opening for us. As with much of David’s work, code is released. So enjoy!

otoro.net

Arthur’s Full Review of deeplearning.ai Course 2

Here is Arthur’s review of deeplearning.ai Course 2. This time he focuses on why learning the details of deep learning could be a good thing for beginners of DL.

thegrandjanitor.com

Open Source

Implementation of CapsNet

Here is one of the first attempts to reimplement CapsNet.

github.com

A PyTorch Implementation

And here is a PyTorch implementation.

github.com

Paper/Thesis Review

What We Know About Capsules So Far

We wrote the previous piece on Monday in our Literature Discussion group. It triggered a very interesting discussion, from which we learnt a couple of things:

Hinton and students might have submitted another paper to ICLR 2018. It’s very likely to involve EM as the routing mechanism.
There are already two available implementations of CapsNet. (See the Implementation Section.)

facebook.com

Capsules

This is Hinton’s new invention, the capsule algorithm. Here is a write-up. It’s long, and we doubt we completely grok the idea anyway.

The first mention of “capsule” is perhaps in the paper “Transforming Auto-encoders” which Hinton and students coauthored.
It’s important to understand what capsules try to solve before you delve into the details. If you look at Hinton’s papers and talks, the capsule is really an idea which improves upon Convnets. Hinton has two major complaints.
First, the general setting of a Convnet assumes that one filter is used across different locations. This is also known as “location invariance”. In this setting, the exact location of a feature doesn’t matter. That has a lot to do with robust estimation of the feature parameters. Weight sharing also drastically simplifies backprop.
But location invariance also removes one important piece of information about an image: the apparent location of a feature.
The second complaint is about max pooling. As you know, pooling usually removes a high percentage of the information from the previous layer. In early architectures, pooling was usually the key to shrinking the size of a representation. Of course, later architectures have changed, but pooling is still an important component.
So the design of capsules has a lot to do with tackling the problems of max pooling: instead of losing information, can we “route” values from the previous layer correctly so that they are put to optimal use?
Generally a “capsule” represents certain properties of an entity in an image, “such as pose (position, size, orientation), deformation, velocity, albedo, hue, texture etc”. Notice that these are not hard-wired but automatically discovered.
Then there is the question of how low-level information is “routed” to the higher level. The mechanism in the current implementation is intriguing:
First, your goal is to calculate a softmax of the form
c_{ij} = exp(b_{ij}) / Sum_k exp(b_{ik}), where b_{ij} is the routing logit (a log prior) from lower-level capsule i to higher-level capsule j. The prior is something you can train.
Then you iteratively estimate b_{ij}. This appears in Procedure 1. The 4 steps are:
a, calculate the softmax weights c_{ij} from b_{ij},
b, compute the prediction vectors from each capsule i, then form their weighted sum s_j,
c, squash the weighted sum to get v_j,
d, update b_{ij} based on the agreement (dot product) between the prediction vector and the squashed value v_j.
So why the squash function? Our guess is that it normalizes the value computed in b. According to Hinton, a good function is
v_j = |s_j|^2 / (1 + |s_j|^2) * s_j / |s_j|
The rest of the architecture actually looks very much like a Convnet. The first layer is a convolutional layer with ReLU activation.
Would this work? The authors say yes. Not only does it reach state-of-the-art results on MNIST, it can also tackle more difficult tasks such as CIFAR-10 and SVHN. In fact, the authors report results on par with what Convnets achieved when they were first used on these tasks.
It also works well on two tasks called affNIST and MultiMNIST. The first is MNIST put through affine transforms, the second is MNIST regenerated with heavily overlapping digits. This is quite impressive, because you would normally need a lot of data augmentation and object detection effort to get these cases working.
The part we have some doubts about: is this model more complex than a Convnet? It’s possible that we are just fitting a more complex model to get better results.
A nice thing about the implementation: it’s in Tensorflow, so we can play with it in the near future.
Have fun!
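If you prefer code to prose, the routing loop described above can be sketched in a few lines of numpy. Treat it as our reading of Procedure 1, not the authors' implementation: the prediction vectors `u_hat` are random stand-ins (in the real model they come from learned transforms of the lower capsules' outputs), the toy sizes are arbitrary, and the small `eps` is our own addition to avoid dividing by zero.

```python
import numpy as np


def squash(s, eps=1e-9):
    """v = |s|^2 / (1 + |s|^2) * s / |s|, applied along the last axis."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def route(u_hat, n_iters=3):
    """Routing by agreement.

    u_hat has shape (n_lower, n_upper, dim): the prediction vector that
    lower capsule i makes for upper capsule j.
    """
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))           # routing logits b_ij
    for _ in range(n_iters):
        # a) softmax over the upper capsules j, for every lower capsule i
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)
        # b) weighted sum of the prediction vectors: s_j = sum_i c_ij u_hat_ij
        s = np.einsum("ij,ijd->jd", c, u_hat)
        # c) squash so that every output vector has length below 1
        v = squash(s)
        # d) raise b_ij where the prediction agrees with the output v_j
        b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v, c


rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 4))  # 6 lower capsules, 3 upper, 4-dim vectors
v, c = route(u_hat)
print(v.shape, np.linalg.norm(v, axis=-1))
```

The squash keeps the direction of s_j and only shrinks its length into [0, 1), so the length of a capsule's output can be read as a probability that the entity is present.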

arxiv.org

Unsupervised Machine Translation Using Monolingual Corpora

This is an impressive paper by FAIR authors which claims that one only needs monolingual corpora to train a usable translation model. So how does it work? Here are some notes.

For starters, you indeed don’t need a parallel corpus, but you still need a bidirectional dictionary to generate translations. You also need monolingual corpora in both languages. That’s why the title is about monolingual corpora (plural) but not a monolingual corpus (singular).
Then there is the issue of how you actually create translations. It’s actually much simpler than you might think. First imagine there is a latent language which both your source and target languages map to.
How do you train? Let’s just use the source language as an example first. You can create an encoder-decoder architecture which translates your source into the latent space, then translates it back. Using BLEU score, you can then set up an optimization criterion.
Now this doesn’t quite do translation yet. But if you apply the same procedure to both the source and target languages, don’t you now have a common latent space? Once you have trained such a common latent space, what you need to do in actual translation is to first map the source language into the common latent space, then map it out to the target language.
Many of you might recognize that such an encoder-decoder scheme, which maps a language to itself, is very similar to an autoencoder. Indeed, the authors actually use a version of the autoencoder, the denoising autoencoder (dA), to train the model.
The final interesting idea I spotted is iterative training. You can first train an initial translator, then use its output as the truth and retrain another one. The authors found tremendous gains in BLEU score in the process.
The results are stunning if you consider that no parallel corpus is involved. The BLEU score is around 10 points lower, but do remember: deep learning has improved BLEU scores by pretty much 7-8 points absolute over the classical phrase-based translation models anyway.
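A denoising autoencoder is trained to reconstruct a clean sentence from a corrupted copy of it. Below is a minimal sketch of one plausible corruption function, with word dropping and a light local shuffle. It is our own illustration: the function name `add_noise` and the default values of `drop_prob` and `shuffle_window` are assumptions, not settings we verified against the paper.

```python
import random


def add_noise(tokens, drop_prob=0.1, shuffle_window=3, seed=None):
    """Corrupt a sentence by dropping words and lightly shuffling the rest.

    drop_prob and shuffle_window are illustrative defaults only.
    """
    rng = random.Random(seed)
    # Drop each word independently with probability drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept:
        kept = list(tokens[:1])  # never return an empty sentence
    # Local shuffle: jitter each position by a random offset, then re-sort.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


sentence = "the cat sat on the mat near the door".split()
noisy = add_noise(sentence, seed=0)
print(noisy)
```

The autoencoder sees `noisy` as input and is asked to output `sentence`, so it cannot get away with copying word by word and has to learn something about the language.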

Member Ben Davis also wrote a fairly good summary of the paper. Check it out in our thread.

arxiv.org


AIDL Weekly Issue 35 –

Issue 35 October 21st 2017

Editorial

Thoughts From Your Humble Curators

Big announcement – last week, we launched our own topic-based messaging app called Expertify, to help you connect with other AI and DL professionals in our 45,000-member community. More details below on why we rolled our own and specific AI / DL features we want to add to it over time…

Download Expertify iOS app

We’d love for you to try it and give us some feedback if you are on iOS. We’re working on a web app and, a bit further down the road, Android.

In other news, we heard the stunning news that AlphaGo has beaten itself again, creating the first Go player with an Elo rating over 5000.

In technical news, Google created a new activation function which works better than even ReLU. And we wrote a full review of Coursera deeplearning.ai Course 1, which was quite well-received in different networks.

As always, if you like our newsletter, feel free to subscribe/forward the letter to your colleagues.

Artificial Intelligence and Deep Learning Weekly

News

We launched a messaging app to help AI / DL practitioners connect with one another. Here’s why we rolled our own

Many in our 45,000-person AIDL community have asked us if there’s a way for them to interact with one another (advice, recruiting, where to get training data, keeping up with new research, etc.) in a more real-time fashion. Messaging apps are a dime a dozen (we looked at Slack, Telegram, etc.) but we haven’t found one that is topic-based (that’s not Reddit) and makes it easy for professionals to have high-quality group or 1-on-1 discussions in a simple format.

The other big reason we rolled our own is that we want it to serve as a laboratory for practitioners to test various DL ideas and get feedback. There are many ways to customize the app to enable some DL-specific features that no other platforms have. For example, we may want to enable users to build or connect their chatbots, classifiers, anything else you can think of to our platform, test it and receive feedback from other DL practitioners. We are also exploring ways to use it as a way to crowdsource training data. The possibilities are endless.

We’d love for you to use it and help us define our roadmap, so we can build features that are useful to you and other folks in DL.

Download Expertify iOS app

apple.com

Facebook/Intel Collaboration

We heard last week that Facebook is collaborating with Intel on its latest Nervana chip. We saw the quote

We are thrilled to have Facebook in close collaboration sharing their technical insights as we bring this new generation of AI hardware to market,

from Intel CEO Brian Krzanich. We don’t know many details yet. We will report more as we hear more.

alphr.com

Apple’s SDC spotting.

The Verge is running a piece on Apple’s SDC. There are certainly some big guns here, such as 6 Velodyne-made LIDARs.

theverge.com

Blog Posts

AlphaGo Zero Now Learns Go From Scratch

We heard from DeepMind again on a new development of AlphaGo. Once again, the team created an even stronger Go player. From an Elo rating standpoint, Master, the version we saw beat Ke Jie, has a rating of ~4900, but Zero’s rating is above 5100. And it beat AlphaGo Lee with a record of 100 to 0.

What’s even more amazing is that Zero learns entirely by self-play, whereas previous versions of AlphaGo had at least some human-engineered features. One more technical detail we like: instead of doing rollouts to predict who would win, this time a neural network is used. So it is a rather drastic change from the system perspective.
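To get a feel for what a 200-point Elo gap means, here is the standard Elo expected-score formula applied to the rough ratings quoted above. The ratings are the approximate figures from this piece, not exact values, and the function name is our own.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


zero, master = 5100, 4900  # approximate ratings quoted above
p_zero = elo_expected_score(zero, master)
print(round(p_zero, 2))  # about 0.76
```

So even against Master, the version that beat Ke Jie, the model would expect Zero to score roughly three points out of every four games.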

deepmind.com

Review of deeplearning.ai Course 1: Neural Networks and Deep Learning

This review is written by Arthur. What is Course 1 about? Is it a difficult class? And should you take the class if you already have some experience? We address those questions in the article.

thegrandjanitor.com

Mixed Precision Training

A nice discussion on mixed precision training. It’s a good complement if you’d like to read the recent paper from Baidu.

reddit.com

A Rare Glimpse of How “Hey Siri” Works.

This is a bit old, but Apple’s engineers wrote a piece on how “Hey Siri” works, or as we call it in the industry, keyword wakeup. The post has a fairly detailed explanation of what models are used for acoustic modeling, as well as experimental details. It’s interesting to note that Apple’s engineers decided not to use the best model (LSTM) but a simpler model (DNN) in order to run everything on the device.

apple.com

Member’s Question

Should I be a Software Engineer or an ML Engineer?

Answer from Arthur:

The most important factor is where your passion is. Do you like to be an ML guy? Do you want to be a software engineer? Notice that there is a wide spectrum of jobs in between ML engineer and software engineer. There are researchers/scientists who are purely about ML. There are software engineers who purely play with code. Then there are architects, who need to know a bit of everything. But it is what you like to do that should decide your future.

The second most important factor I would say is reality. If you are starving, you can’t fulfill your passion. So there’s also no shame in simply choosing a practical career and working hard at it.

facebook.com

Paper/Thesis Review

Swish: a Self-Gated Activation Function

Perhaps the most interesting paper last week is the Swish function. Here are some notes:

Swish is extraordinarily simple. It’s just
swish(x) = x * sigmoid(x).
Derivative? swish'(x) = swish(x) + sigmoid(x) * (1 - swish(x)). Simple calculus.
Can you tune it? Yes, there is a version in which the parameter is trainable. It’s called Swish-Beta, which is x * sigmoid(Beta * x).
So here’s the interesting part: why is it a “self-gating function”? If you understand LSTMs, they essentially introduce multiplicative gates. E.g. the input gate and forget gate give you weights for “how much you want to consider the input” and “how much you want to forget”. (http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
So Swish is not too different: there is the activation function, but it is weighted by the input itself. Thus the term self-gating. In a nutshell, in plain English, “because we multiply”.
It’s all good, but does it work? The experimental results look promising. It works on CIFAR-10 and CIFAR-100. On ImageNet, it improves Inception-v2 and v3 when Swish replaces ReLU.
It’s worthwhile to point out that the latest Inception is v4. So the ImageNet number does not beat SOTA even within Google, not to mention the best number in ImageNet 2016. But that shouldn’t matter: if something consistently improves several models on ImageNet, it is a very good sign that it is working.
Of course, looking at the activation function, it introduces a multiplication. So it does increase computation when compared with a simple ReLU. And that seems to be the complaint I have heard.
That’s what I have. Enjoy!
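If you want to convince yourself of the derivative, here is a small numpy sketch that checks the closed form against a finite difference. The function names are our own, and this is just a sanity check rather than anything from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def swish(x, beta=1.0):
    """Swish-Beta: x * sigmoid(beta * x). beta = 1 gives plain Swish."""
    return x * sigmoid(beta * x)


def swish_grad(x):
    """Closed-form derivative for beta = 1, as in the notes above."""
    return swish(x) + sigmoid(x) * (1.0 - swish(x))


x = np.linspace(-5.0, 5.0, 11)
h = 1e-5
numeric = (swish(x + h) - swish(x - h)) / (2 * h)  # central difference
print(np.max(np.abs(swish_grad(x) - numeric)))     # should be tiny
```

Two limiting cases are also fun to check: with Beta = 0 the function is just x / 2, and as Beta grows it looks more and more like ReLU.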

arxiv.org


AIDL Weekly Issue 34 –

Issue 34 October 14th 2017

Editorial

Thoughts From Your Humble Curators

Perhaps the biggest news to us last week is that California will allow car companies to test their fully autonomous vehicles on the street as soon as 2018.

Deal of the week last week is the Amazon/MS joint release of Gluon.

Then there are interviews. You may be interested to hear Google CEO Sundar Pichai’s view on what being AI-first means.

As always, if you like our newsletter, feel free to subscribe/forward it to your colleagues.

Artificial Intelligence and Deep Learning Weekly

News

Interview with Fei-Fei Li

This is an interview with Fei-Fei Li on her view that AI is not human-centric enough. And indeed, all we are building is A(S)I, S for special. Everyone is looking for the next paradigm change that takes us beyond today’s restrictive pattern recognition approach.

technologyreview.com

Sundar Pichai and The AI-First Google

This is the story of Google’s CEO Sundar Pichai, a calm presence at Google and in the Valley, talking about his view on an AI-first Google. His point: more isn’t always better.

theguardian.com

SDC in California in 2018

California is going to let 42 companies test up to 285 autonomous vehicles. And it will happen as soon as 2018.

theverge.com

Amazon and Microsoft release Gluon

We covered Gluon in Issue 23. Technically, we found it an interesting hybrid between the imperative (like PyTorch) and declarative (like Tensorflow) styles of programming. We learned from the announcement that it is now available from both AWS and MS.

For us the most interesting part is perhaps the partnership between MS and Amazon. It is the second time this year we have heard of them working together. In fact, back in September, we learned that they are partnering to integrate their voice assistants.

geekwire.com

Blog Posts

LSTM by numpy

This is a rare from-scratch implementation (with derivation) of LSTM using pure numpy.

varunajayasiri.com

Open Source

ONNX AI Format Gains Traction

The ONNX format introduced by FB and MS is starting to get more partners. This sounds good. For the most part, having an open format will allow results to spread more quickly across different sites/frameworks.

fb.com

TensorFlow Lattice

When we first looked at this piece, we thought that Lattice was yet another cool brand name. As it turns out, Lattice is a release of an interesting mathematical model called the deep lattice model (DLM).

So what is DLM then? It’s all about lattice analysis: ordinary regression analysis usually doesn’t impose any order relationship between your inputs and outputs. So what if you want your output to increase monotonically with an input? That’s the point of lattice.

Of course, things get more complicated when the output should increase monotonically with multiple inputs. Then stacking multiple lattice layers makes sense. That is what Google’s original paper is about. Perhaps more importantly, they showed great results in several ML tasks.
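
To make the monotonicity idea concrete, here is a minimal sketch in plain Python. It is not the TensorFlow Lattice API and not the method from Google’s paper. It only assumes that a 1-D lattice is a look-up table with linear interpolation between keypoints, and it uses a running maximum as a crude, hypothetical stand-in for a proper monotonicity constraint. The names `lattice_1d`, `project_monotonic` and the toy values are ours.

```python
from bisect import bisect_right


def lattice_1d(x, keypoints, values):
    """Piecewise-linear interpolation over a 1-D look-up table.

    keypoints must be sorted; inputs outside the range are clamped.
    """
    if x <= keypoints[0]:
        return values[0]
    if x >= keypoints[-1]:
        return values[-1]
    i = bisect_right(keypoints, x) - 1          # keypoints[i] <= x < keypoints[i + 1]
    t = (x - keypoints[i]) / (keypoints[i + 1] - keypoints[i])
    return (1 - t) * values[i] + t * values[i + 1]


def project_monotonic(values):
    """Make the table non-decreasing with a running maximum.

    This is an assumption made for illustration; real implementations
    enforce the constraint during training instead.
    """
    out, running = [], float("-inf")
    for v in values:
        running = max(running, v)
        out.append(running)
    return out


keypoints = [0.0, 1.0, 2.0, 3.0]
raw_values = [0.0, 0.8, 0.5, 1.2]               # hypothetical learned values, not monotonic
values = project_monotonic(raw_values)          # now non-decreasing

# Sample the function on a grid: the outputs never decrease as x grows.
samples = [lattice_1d(k / 10, keypoints, values) for k in range(0, 31)]
```

Because the table values are non-decreasing and interpolation between neighbours is linear, the resulting function is monotonic in its input, which is the property the newsletter item describes.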

techcrunch.com

Video

But what is a Neural Network? | Deep learning, Part 1 by 3Blue1Brown

Here is a very good introduction to what a neural network is, by the beloved math teacher 3Blue1Brown.

youtube.com

Paper/Thesis Review

Tutorial on Variational Autoencoders

We were trying to read DP Kingma’s thesis on VAE. As you might know, he wrote the paper that introduced the reparameterization trick for VAE. But that is not our topic this time. We just want to talk about a simple tutorial paper by Carl Doersch, which we found to be better for beginners. Here are some notes:

VAE is really quite different from sparse AE or denoising AE, except that you can think of all of them as having an encoder structure.
The tutorial guides you through the setup of VAE. For example, you probably know that VAE is based on a latent variable. What is special about the setup is that the latent variable is randomly sampled from a Gaussian.
Then there is a detailed, readable section on how the common optimization objective, the evidence lower bound (ELBO), is formulated. What bugs us a bit is that it doesn’t quite use the term ELBO.
Lastly, there is the reparameterization trick. We don’t fully grok it, but that’s why we still need to read Kingma.

So far, this seems to be a great tutorial on VAE and it’s a great first read on the topic.
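
For readers who want something concrete for two of the points in the notes above, here is a minimal sketch in plain Python. It is our illustration, not code from the tutorial. It assumes a diagonal Gaussian encoder that outputs a mean `mu` and a log-variance `log_var` per latent dimension (the numbers are made up). The reparameterization trick writes the sample as z = mu + sigma * eps with eps drawn from N(0, 1), so the randomness sits in eps rather than in the parameters. The KL function is the closed-form regularization term that appears in the ELBO, which is the expected reconstruction log-likelihood minus KL(q(z|x) || p(z)).

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps, with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)                       # seeded only to make runs repeatable
mu, log_var = [0.5, -1.0], [0.0, -2.0]       # hypothetical encoder outputs
z = reparameterize(mu, log_var, rng)         # one latent sample
kl = kl_to_standard_normal(mu, log_var)      # penalty for straying from the prior
```

When the encoder output already matches the prior (zero mean, unit variance), the KL term is zero; it grows as the encoder’s Gaussian moves away from N(0, I).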

arxiv.org

Uncategorized

AIDL Weekly Issue 34 – Interviews, Partnership, SDC Testing in California 2018

Post author By grandjanitor
Post date July 3, 2019
No Comments on AIDL Weekly Issue 34 – Interviews, Partnership, SDC Testing in California 2018

Issue 34 October 14th 2017

Uncategorized

AIDL Weekly Issue 33 – All About DeepMind – Its Cost, Its Ethic Society and Its Recent Wavenet Launch

Post author By grandjanitor
Post date July 3, 2019
No Comments on AIDL Weekly Issue 33 – All About DeepMind – Its Cost, Its Ethic Society and Its Recent Wavenet Launch

Editorial

Thoughts From Your Humble Curators

We learnt more about DeepMind last week: we learnt the hefty price of running it and its struggle to correct its image since the debacle of the DeepMind Health-Royal Free event. But we also learnt that our beloved WaveNet is now in production within Google Assistant, and it’s a whopping 1000 times faster. This week, we’ll cover DeepMind in more depth.

We also have an issue filled with content: “Confession as an AI researcher” is our favorite link. We answer a member’s question on how to come up with new AI ideas in “Member’s Question”. And we dive into an interesting paper: “Unsupervised Hypernym Detection by Distributional Inclusion Vector Embedding” by Haw-Shiuan Chang et al.

As always, if you like our newsletter, feel free to subscribe and forward it to friends!

Artificial Intelligence and Deep Learning Weekly

News

How Much Does DeepMind Cost?

As it turns out, it lost ~$162 million last year. It made around $40 million, but mostly from work for its parent company Alphabet.

qz.com

DeepMind Ethics & Society

This Guardian piece is perhaps the best at describing the new DeepMind Ethics & Society. At first, we thought that this was a repurposing of the older DeepMind Health advisory panel, but the piece clearly states that it is a separate society.

This move is in response to last year’s debacle over how DeepMind handled patient data. As we reported in Issue #20, an independent commissioner commented on how DeepMind inappropriately handled patient data: “Just that you can, doesn’t mean that you should.” The whole event also cost DeepMind 5 times the legal fees of the previous year (see the previous piece, “How Much Does DeepMind Cost?”).

theguardian.com

Blog Posts

DeepMind Launches WaveNet In Google Assistant

DeepMind launching WaveNet in a production system is perhaps one of the most exciting pieces of news last week. What is amazing is that it’s 1000 times faster than the one-year-old version, and it can run on a TPU. The speed-up doesn’t seem lossy: its mean opinion score (MOS) is around 4.35, not too far from the 4.67 of human speech.

deepmind.com

Confession as an AI researcher; seeking advice

This is an interesting thread on how one could overcome the issue of lacking enough Mathematics background when doing research or reading AI papers. Well, 1) there’s never enough Math and 2) everyone actually feels as inadequate as you do. Everyone has impostor syndrome except maybe for Geoffrey Hinton.

reddit.com

Why Continuous Learning is the key towards Machine Intelligence

Here is a good overview of continuous learning, its pros and cons, and some of its latest developments.

medium.com

Nonlinear Computation in Deep Linear Networks

Here is an interesting article from OpenAI’s Jakob Foerster, built on what we think is a very cute idea: by exploiting the non-linearity of IEEE floating-point arithmetic, Foerster was able to make a linear network behave non-linearly, training it with evolution strategies.
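
The underlying effect is easy to reproduce. The sketch below is our own illustration, not the article’s method: the article worked with low-precision floats and evolution strategies, while this uses Python’s standard 64-bit floats and only shows that a function which is mathematically the identity stops being linear once intermediate results underflow. The function name and constants are ours.

```python
SCALE_DOWN = 2.0 ** -600
SCALE_UP = 2.0 ** 600


def scaled_identity(x):
    """Mathematically f(x) = x: scale down by 2^-600, then back up by 2^600."""
    return (x * SCALE_DOWN) * SCALE_UP


big = 1.0
small = 2.0 ** -500                  # tiny, but still a normal float64

out_big = scaled_identity(big)       # survives the round trip
out_small = scaled_identity(small)   # intermediate 2^-1100 underflows to zero

# A truly linear map would satisfy f(small) == small * f(1.0).
is_linear_here = out_small == small * out_big
```

For the large input the round trip is exact, but for the small one the intermediate product falls below the smallest representable float and is flushed to zero, so the output is zero instead of the input. That kind of kink is what a network of purely "linear" layers can exploit.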

openai.com

Blizzard and DeepMind cohost StarCraft II AI Workshop

Apply by Oct 11. The workshop takes place at the same time as BlizzCon 2017.

battle.net

Open Source

NIH Releases 100k Anonymized Chest X-Ray Images

This is perhaps one of the largest datasets of its type.

nih.gov

Video

Cognitive Computational Neuroscience 2017

Here are all the videos of the first Cognitive Computational Neuroscience (CCN) conference. The ones that interest us most are perhaps the opening remarks from Yann LeCun and the closing remarks from Yoshua Bengio.

ccneuro.org

Member’s Question

How to Think of a New Idea in AI or Machine Learning.

Answer (from Arthur):

1. What are other people doing, and is it possible to put a twist on it?

2. What is a problem you want to solve in your life? Then think: is there any way AI/ML can help you? Everyone has some. For example, I would really like to make an old-style Nintendo game in the style of Final Fantasy. But drawing the bitmap character graphics takes an insane amount of time. So is there any way A.I. can help me? Yes, one potential idea is to create an image generator.

Would these ideas work? Who knows? But that’s how you come up with ideas. You ignore the feasibility part for the moment. If you feel it is really hard for you to come up with ideas, chances are you are too caught up with the technical field. Read some books, listen to music, make some art and daydream a bit. Then ideas will come.

See the full discussion at this thread

thegrandjanitor.com

Paper/Thesis Review

Unsupervised Hypernym Detection by Distributional Inclusion Vector Embedding

Modified from my note here

Some notes and thoughts:

This work models relationships such as “poodle is-a dog”, or more generally the word entailment problem in computational linguistics.
But what it follows is the more modern word-embedding-based approach. Hypernym problems are more often solved by Bag of Words (BOW) models. So this paper is perhaps one of the earlier papers to use a word-embedding type of method for hypernym detection.
It’s good to think about why this method works. In the case of synonyms, we all know about the distributional hypothesis, or from Firth: “a word is characterized by the company it keeps”.
What about hypernyms? There are perhaps two corresponding hypotheses: the “Distributional Informativeness Hypothesis” and the “Distributional Inclusion Hypothesis”. The first states that semantically more general words occur more often. The second states that the context set of a word tends to be a subset of the context set of its hypernym. So combining the two, “chihuahua” would occur less often than “dog”, and adjectives which can be applied to “chihuahua” can all be applied to “dog”, but not necessarily vice versa.
Then there is DIVE. In essence, it is a modification of Mikolov’s skip-gram with two major changes: a non-negativity constraint on the skip-gram embeddings, and weighting the negative samples by the inverse of how often a word appears. That’s pretty much what DIVE is.
Then there is another part about PMI (point-wise mutual information). In a nutshell, Levy and Goldberg showed that the skip-gram formulation with negative sampling is equivalent to factorizing a matrix whose elements are the (shifted) PMI, where PMI(w, c) = log(p(w,c) / (p(w) p(c))). The authors also give a similar formulation for DIVE. The nifty part is that they also apply a filter to the PMI matrix when the co-occurrence counts are too small.
Then there is a fairly extensive evaluation of the technique, which includes 11 datasets. The authors tested their technique with breakdowns such as with/without DIVE and with/without PMI filtering. There is a very good improvement over the SBOW approach.
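
As a small companion to the PMI note above, here is a sketch of how PMI can be computed from raw (word, context) co-occurrence pairs. It is our illustration in plain Python, not the authors’ code. The toy pairs are made up, and the `min_count` filter is a hypothetical stand-in for the paper’s filtering step, whose exact rule we do not reproduce here.

```python
import math
from collections import Counter


def pmi_table(pairs, min_count=1):
    """Return {(word, context): PMI} for pairs seen at least min_count times.

    PMI(w, c) = log( p(w, c) / (p(w) * p(c)) ), with probabilities
    estimated from relative frequencies in `pairs`.
    """
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    table = {}
    for (w, c), n in pair_counts.items():
        if n < min_count:            # drop unreliable, rarely seen pairs
            continue
        table[(w, c)] = math.log(n * total / (word_counts[w] * ctx_counts[c]))
    return table


# Hypothetical toy corpus of (word, context) pairs.
pairs = [
    ("dog", "furry"), ("dog", "furry"), ("dog", "loyal"),
    ("chihuahua", "furry"), ("cat", "furry"), ("cat", "aloof"),
]

pmi = pmi_table(pairs)
filtered = pmi_table(pairs, min_count=2)   # keeps only pairs seen at least twice
```

A positive PMI means a word and a context co-occur more often than chance would predict, zero means they are independent in this sample, and negative means they co-occur less often than chance. Note how the toy data also echoes the inclusion hypothesis: every context of “chihuahua” is also a context of “dog”.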

Interestingly enough, this doesn’t seem to have been picked up by arxiv-sanity. Perhaps hypernymy is more of a side topic in linguistics. But it’s certainly a good read. Let’s see what other members think about this.

arxiv.org