Category: Uncategorized

AIDL WEEKLY ISSUE 32 – The Consciousness Prior

Post author By grandjanitor
Post date July 3, 2019
No Comments on AIDL WEEKLY ISSUE 32 – The Consciousness Prior

Issue 32 September 30th 2017

Editorial

Thoughts From Your Humble Curators

Many interesting news last week including,

MS released new tools on Azure and Visual Studio to support deep learning,
Intel released Lolhi, a new neuromorphic chip,
Nvidia released its own deep learning accelerator chip design,
And we also read a Wire profile on the controversial figure – Anthony Lewandoski.

All of these pieces would normally be the highlights of the week. But then, Prof. Bengio come up with an intriguing note called “The Consciousness Prior”. Upon closer read, it gives an interesting take on a potential mechanism to mimic human consciousness, or a kind of access of attention as neuroscientists like Stanislas Dahaene would call. So his paper is this week’s theme.

As always, if you like our newsletter, feel free to subscribe and forward our letter to your colleagues!

Artificial Intelligence and Deep Learning Weekly

If you live in Boston area….

As Waikit and Arthur are preparing our own “Attack of the A.I. Startups” in December, we also start a new AIDL meetup group in the Boston area. Feel free to join us if you live in the New England or New York area!

meetup.com

Interview of Waikit and Arthur

Botsupply’s Grasia B. Hald and Shridhar Kumar were kind enough to invite us to their podcast. We had a fun time talking about how we started our AIDL community, our experience in commercial and technical side of machine learning. So check it out!

soundcloud.com

News

Yoshua Bengio Calls for Breaking Up Big Tech AI Research

At an AI conference in Montreal, Prof. Yoshua Bengio said:

“Concentration of wealth leads to concentration of power. That’s one reason why monopoly is dangerous. It’s dangerous for democracy.”

One consequence of deep learning becoming more mainstream is that the barriers of learning and practicing it are continually being lowered. In the past, working on state-of-the-art machine learning requires clusters of high-end machines. But now, it is possible to use a PC with a good GPU card, and do some experiments on your own.

At the same time, however, such democratization doesn’t extend upstream to the cutting-edge research side. Deep learning research has increasingly been concentrated in hands of a few large companies. Such drain of talents from academia is becoming a problem, and perhaps why Bengio is sounding the alarm.

axios.com

A Profile on Anthony Levandowski

If you are interested in SDC, you would notice that the Waymo-Uber lawsuit is still going on. In fact, unlike the seemingly more gentle stance earlier in Summer, Waymo in reinitiating charges against Uber after discovering new Uber technical documents.

The whole drama centers on one person, Mr. Anthony Levandowski. This Wired piece is a good profile on him.

While peripheral, it’s an interesting study into a key person (up till recently at least) whose interesting worldview include robots taking over the world. As quote in the Wired article:

“We’re going to take over the world. One robot at a time,” wrote Levandowski another time [to Kalanick]”

wired.com

The New Intel Neuromorphic Chip

Intel just released a chip but it is not a standard deep learning chip as Google’s Tensor processing unit (TPU) or Nvidia’s recently released NVLDA.

Intel’s line of work seems to base on spiking neural network (SNN) which was always described as closer to human brain than the usual deep neural network. What’s the difference? For starters, deep neural network doesn’t quite consider spike train as human brain neurons do. And timing of the spikes usually convey a lot of information within our biological neural networks.

Neuromorphic design also uses less energy. In the case of Lolhi, the new chip, it could use down to 2000 times less energy consumption.

Perhaps one thing we are not sure is how fast the new chip could be. Currently we only heard about test case in MNIST.

intel.com

Blog Posts

Sebastian Ruder on Multi-Task Learning

Ruder is doing it again, this time he wrote a very accessible text on multi-task learning. He talked about strategies of choosing auxilliary cost functions as well as how different tasks can be shared.

ruder.io

Review of EMNLP 2017 by Leon Derczynski

A great review of EMNLP 2017. The ones I like is Derczynski’s reviews on the workshops on both noisy text data as well as embedding evaluation.

approximatelycorrect.com

Open Source

Microsoft Releases New Tools for Machine Learning

MS has always been a major player of machine learning. But then its recent move to release several tools such as Microsoft Azure Experimental Service, as well as allow MS Visual Studio to work with toolkits such Caffe and MS’s own CNTK. That’s a very important step forward – remember that installing/compiling any of the DL toolkits are very painful process.

techcrunch.com

Theano at 1.0 and its final version.

Just yesterday, Yoshua Bengio announced that the grandfather of all deep learning toolkit, Theano, will be at 1.0, but it’s also the last version MILA would release. I think we should take a moment to remember how much Theano has help jumpstarted deep learning.

(In fact, Arthur’s first deep learning experiments are all ran in Theano on an old Dell Inspiron 530 – not using GPU Card btw.)

google.com

Nvidia open source Deep Learning Accelerator Chip Design

This is a rather big move from Nvidia, as it is seldom the case that chip company would open source their chip design to the public. Beside the fact that this could be a publicity stun, it’s possible that they were too behind of Google’s effort of Tensor processing unit (TPU) development, which as you know Google just release its v2(!) in this April. It’s reasonable to believe that Google may be 5 years ahead of the game.

Nvidia open sourcing their own accelerator make the community to work on the chip design together. In a way, it can attract contribution to the source code such that there is a chance to catch up.

Of course, the downside is that hardware development usually requires more investment. So unlike the software-counterpart, it’s harder for anyone who could come up with a good design to rival TPU v2 soon.

Regardless, we think it is a good development for the DL community. Let’s see what the community come up in the next few years.

nvdla.org

Paper/Thesis Review

Reading Prof. Yoshua Bengio’s “The Consciousness Prior”

Prof. Yoshua Bengio released an intriguing note last week on an idea called “The Consciousness Prior”. The framework is interesting and I would point out several interesting aspects of it.

The consciousness mentioned in the paper is much less of what would think as qualia but more about access of the different representations.
The terminology is not too difficult to understand, suppose there is a representation of the brain at a current time h_t, a representation RNN F is used to model such representation.
Whereas the protagonist here is the consciousness RNN, C, which is to used to model a consciousness state. What is *consciousness state& then? It is actually a low-dimension vector of the representation h_t.
Now one thing to notice is that Bengio believe that consciousness RNN, C should by itself include some kind of attention mechanism. What that means is that attention being used in NNMT these days should be involved. In a nutshell, C should “pay attention” to only important details within this consciousness vector when it updates itself.
I think so far the idea is already fairly interesting, in fact, just the idea one interesting thought : what if we just initialize the consciousness vector to be random instead, in that case, there will be a new representation of brain appears. As a result. this mechanism mimic human brains on exploring different scenario we conjured with imagination.
Bengio’s thought also encompass a training method which he called verifier network, V. The goal of the network to match the current representation h_t with previous consciousness state c_{t-k} (states?). The training as he envisioned can be a Variational autoencoder (VAE) or GAN.
So far the idea doesn’t quite echo with human’s way of thinking. Human seems to create high-level concepts, like symbols to simplify our thinking. So Bengio addresses these difficulty by suggesting we can just use another network to generate what we mean from the consciousness state, he called it U. Perhaps we can call it generation network. This network can well-be implemented by memory-augmented networks style of architecture which distinguish key/value pairs. In this case, we can map the consciousness to more concrete symbols which symbolic logic or knowledge representation framework can use. … Or we humans can also understand this consciousness representation.
This all sounds good, but as you may hear from many readers of the paper. There is no experimental results. So this is really a theoretical paper.
To be fair though, the good professor has outlined how each of the above 4 networks can be actually implemented. He also mentioned how such idea can be experimented in practice. E.g. he believe one good arena is reinforcement learning tasks.

All-in-all, this is an interesting paper, it’s a pity that the detail is scanty at this point. But it’s still quite worthwhile for your time to read.

arxiv.org

Uncategorized

AIDL Weekly Issue 31 – Get Rid of Backprop and Start Over

Post author By grandjanitor
Post date July 2, 2019
No Comments on AIDL Weekly Issue 31 – Get Rid of Backprop and Start Over

Issue 31 September 23rd 2017

Editorial

Thoughts From Your Humble Curators

Prof. Hinton said we should just get rid of backpropagation and start over. What does he mean? Let’s find out in this issue. We also include the link of all videos from deeplearning.ai here as well. Of course, also check out our blog and paper sections.

We will be hosting an AIDL Meetup at the AI World Conference in Boston on Dec 12 at 6:15pm where some cutting-edge AI companies will present. We got FREE tickets for you all! Come join us in person if you can!!

Attack of the AI Startups – https://aiworld.com/sessions/mlai/ at AI World – aiworld.com

All attendees need to register to attend. To register, please go to: https://aiworld.com/live-registration/
To receive your FREE expo pass (thru September 30), use priority code: AIWMLAIX

To receive a $200 discount off of your 1, 2 or 3 day VIP conference pass, use priority code: AIWMLAI200

AI World is the industry’s largest independent event focused on the state of the practice of enterprise AI and machine learning. AI World is designed to help business and technology executives cut through the hype, and learn how advanced intelligent technologies are being successfully deployed to build competitive advantage, drive new business opportunities, reduce costs and accelerate innovation efforts.

The 3–day conference and expo brings together the entire applied AI ecosystem, including innovative enterprises, industry thought leaders, startups, investors, developers, independent researchers and leading solution providers. Join 100+ speakers and 75+ sponsors and exhibitors and thousands of attendees.
Other than that, we also include some of our analyses on two paper as well as multiple interesting links for blogs and open source resources. So check it out!

As always, if you like our newsletter, subscribe/forward it to your colleagues!

Artificial Intelligence and Deep Learning Weekly

News

Hinton’s Suspicion of Back Propagation

When the father of deep learning is suspicious of something, you listen. Prof. Hinton believes we need to get rid of backpropagation. To quote Axios:

But Hinton suggested that, to get to where neural networks are able to become intelligent on their own, what is known as “unsupervised learning,” “I suspect that means getting rid of back-propagation.”

The first thing we should ask is : Is the quote real? When we search the quote online, it all links back to the Axios article. That means Axios is our only source. Notice this view is quite different from researchers such as Yoshua Bengio. It’s also surprising because he is one of the inventors of the algorithm (together with David Rumelhart and Ronald Williams).

What does the Professor really means by “unsupervised technique” then? No one knows. Our guess would be some types of unsupervised models such as Boltzmann machine which can automatically learn the distribution of the data.

axios.com

Blog Posts

Udacity added a New “Introduction to Self Driving Cars” Nanodegree

For us, any opportunity to learn is important. This new Udactiy class seems to be a preliminary class of its standard SDC nanodegree. There are also 400 scholarships provided by Lyft.

techcrunch.com

Sebastian Ruder’s review of EMNLP 2017

This is summary written by Sebastian Ruder which summarize trends and news of EMNLP 2017.

aylien.com

Open Source

Unity RL framework

This post from Arthur Juliani, on the Unity RL framework. It looks fairly impressive as it support multiple agent types as well as more advanced features such as curriculum learning. Can it be a replacement of an OpenAI gym? So far we know, OpenAI maintenance seems to be inconsistent. Perhaps another framework/toolkit is needed.

unity3d.com

Video

CMU Deep Learning on NLP Class

This is a still on-going but very promising sets of videos from CMU on deep learning on NLP. Graham Neubig has written some great tutorials on NNMT, so I think his teaching should be very valuable for learners.

youtube.com

deeplearning.ai official Youtube Channel

It sounds like a new release from the course staffs and you can find all course videos there.

Btw, we are often asked in one of our satellite groups, Coursera deeplearning.ai on when would Course 4 and 5 will be released. As far as we know, the staff has sent out a mail to the students, stating the Course 4 would start in early October, and Course 5 would start soon afterward.

youtube.com

Paper/Thesis Review

StarSpace: Embed All The Things!

This is StarSpace, a new embedding method from FB’s Weston’s group. So what’s so special about the technique? In our view, the generality of the technique is the first thing to note – it was able to unified supervised and unsupervised embedding, as well as collaborative filtering which is commonly used in recommendation system.

Here are some details of how the method works: First of all, the optimization is implemented as ranking loss, which one usual choice is just a max margin loss or hinge loss. The authors also tried out softmax, but they found that ranking loss is giving better results across the board. What we don’t know is perhaps whether more advanced losses such as weight ranking loss or WARP was being used.

Just a digression: Is supervised embedding a thing? Indeed it is, it was also one of the Weston’s research as well. Here is a fairly good review.

By comparing with many existing STOA results such as FastText in unsupervised embedding, and WSABE in supervised embedding, the authors show that the techniques show across the board improvement. Perhaps why the author name the techniques as “*-Space”.

arxiv.org

Imagenet Training in 24 mins – A New Record from Berkeley.

Yes, you read it right, Imagenet training in 24 mins. In particular, an Alexnet structure in 24 mins and Resnet-50 in 60 mins. In terms of Alexnet, in fact, You’s work break the previous Facebook’s record: 1 hour for Alexnet training. Last time we checked, our slightly-optimized training with one single GPU will take ~7 days. Of course, we’re curious how these ideas work. So this post is a summary:

This is not based on GPUs. This is mostly a CPU platform but accelerated by Intel Knight Landing (KNL) accelerator. Such accelerator is suitable in HPC platforms. And there are couple of supercomputers in the world which were built up to 2000 to 10000 such CPUS.
The gist of why KNL is good: it can divide processors on chip with the memory well. So unlike many clusters you might encounter with 8 to 16 processors, memory bandwidth is much wider. That’s usually is a huge bottleneck in training speed.
Another important line of thought here is “Can you load in more data per batch?” because that allows calculation to be parallelized much easier. The first author, You’s previous work already allow the Imagenet batch goes from the standard, 256-512 to something like 8192. This thought has been there for a while, perhaps since Alex Krishevzky. His previous idea is based on adaptive calculation of learning rate per layers. Or Layer-wise Adaptive Rate Scaling (LARS).
You then combined LARS with another insight from FB researchers: a slow warmup in learning rate. That results in his current work. And it is literally 60% faster than the previous work.

Given what we know, it’s possible that the training can be even faster in the future. What has been blocking people seem to be 1) No. of CPUs within a system 2) How large a batch size can be loaded in. And I bet after FB read You’s paper, there will be another batch of improvement as well. How about that? Don’t you love competition in deep learning?

You’s previous work: Large Batch Training of Convolutional Networks

FB 1 hour work: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

arxiv.org

Uncategorized

AIDL Weekly Issue 30 – The A.I. in iPhone X

Post author By grandjanitor
Post date July 2, 2019
No Comments on AIDL Weekly Issue 30 – The A.I. in iPhone X

Issue 30 September 15th 2017

Editorial

Thoughts From Your Humble Curators

Perhaps the biggest AI news last week is the release of iPhone X, which now includes FaceID and A11 Bionic. We will take a closer look at the two technologies this week.

Other than that, we will also take a look of the interesting winning entry of 2017 Imagenet Squeeze and Excitation Network and the MILABOT in our paper review section.

As always, if you like our newsletter, feel free to subscribe and forward the letter to your colleagues!

Artificial Intelligence and Deep Learning Weekly

News

iPhone X and Artificial Intelligence

Apple released iPhone X last week, which critics billed it as the future of smart phone. Indeed, it is a beautiful device, a 5.8 inches OLED screen with 1125 x 2436 resolution (,which Apple called it “super retina display). It will cost a hefty $999, but it does give you a feeling that Apple is innovative again.

Of course, at the Weekly, what we mostly care is AI and deep learning. So what’s new there? Perhaps the two features which are noteworthy are FaceID and A11 Bionic chips. Let’s take a closer look?

FaceID : This is the new function which allows users to authenticate with their faces. Of course, there is an obvious issue about using face to authenticate – e.g. What if you face changes? According to Apple, FaceID is supposed to adapt to your face over time.

Does it work? Oh well, it doesn’t seem to work well at demo time on Federighi’s face. Apple’s explanation is that because the phone was passed around so many times to different people off-stage, since FaceID tries to authenticate many people’s faces without success, it ends-up require Federighi’s passcode authentication. It does sound like a reasonable explanation.

A11 Bionic Chip : which also consists of new Neural Engine. We think the best piece is from Mashable which summarizes this “chip in the chip” well. The first thing to notice is that A11 Bionic was developed around 3 years ago. That was also the time we generally believe that Apple AI development was quiet. So such bet shows that Apple was serious about AI earlier than we thought.

Perhaps, as many critics said (as the Verge piece we quote), A11 Bionic suggests Apple is taking a very different approach to take on AI – rather than doing cloud-based processing just like Google and Amazon do, Apple’s approach is edge-based. Such approach require control of the hardware stack and it’s cost intensive. But you end-up have complete controls of your product and you don’t need to rely external technologies (such as Nvidia GPU).

Corrections at 20170916 at 12:35 p.m.: The article previously states that “FaceID adapted to many different staffs’ faces, but fail to respond to Federighi’s. “, but the problem is not due to adaptation, but that Face ID actually trying to authenticate the other persons’ faces, and after many failures, it requires passcode authentication. Thanks Jin Hon Tan to point it out.

theverge.com

Blog Posts

Open AI Gym Toolkit Website Closing Down

One incredible resource from Open AI Gym is its toolkit website. Unfortunately, it was closed down without notice. That caused grieves to many RLers – many found invaluable insights on the leader board as well as write-up from different participants. The only reason we know so far is from Greg Brockman on Twitter citing there is a lack of maintenance of the site.

Just like us, many RLer thinks it’s sad to see the website, so many come forward to help maintenance. We hope the site can be live again as it is an invaluable resource for the community.

reddit.com

Third Quick Impression on deeplearning.ai’s HODL

Written by Arthur. This time mostly on Pieter Abbeel and Yuanqing Lin.

thegrandjanitor.com

Udacity introduces AI Challenger, a $300k Global AI Competition for Students

Udacity introduces a global AI competition for a $300k prize tag. One thing to note: it involves an sub-imagenet-scaled database for human skeletal system keypoints, which features 300k images across over 700k persons.

There are 5 tracks of the competition, details is still unknown to us but we will keep you posted.

udacity.com

Open Source

Tensorboard API

Google just released the Tensorboard API which allows customization of its visualization capability. So far, it looks quite impressive. For example, the Beholder application showcased in the post can show live gradient information when the model is being trained.

googleblog.com

Jobs

Computer Vision Engineer at Dishcraft Robotics

Bay Area-based startup Dishcraft looking for a machine learning engineer. Well-funded by tier-1 brand-name investors (led by First Round Capital) and are doing extremely well. For the right candidate, willing to relocate the person.
Looking for basic traditional ML (SVM and boosting). Kaggle experience is a plus, Deep Learning for 2D images and 3D volumetric data (CNN focused), Tensorflow + Keras. Desirable computer vision skills: point cloud processing, signal and image processing, computational photography (familiarity with multi-view geometry and stereo vision, and color processing)

dishcraft.com

Paper/Thesis Review

Imagenet 2017 winner: Squeeze and Excitation Network

One work which caught our eyes is that the paper version of Imagenet 2017’s winner is finally released. And we are talking about squeeze and excitation network, which gives ~25% relative improvement from the 2016 participants, which translates to absolute improvement from 2.95% to 2.25%.

So the first question is should we care about Imagenet after 2015? That’s when all the big houses such as Google and Microsoft left the game. Our answer is yes. In fact, 25% of relative improvement is still an impressive improvement in machine learning. In speech recognition, for example, improvement of 15% relative of the state of the art is usually well sought by all sites.

Then the impressive part of the technique: it is more an add-on for a large class of transformation. So you can use it on many different architecture such as ResNet or Inception.

What is the essence of the technique? The authors use more obscure terms such as “squeeze” and “excitation” to describe their technique. Once you look closer though – it is actually a kind of attention models on channel information, similar to the spirit of the attention model found in seq2seq model’s decoder. And just like other attention models, it can be trained end-to-end. The author then extends many of the existing architectures such as Resnet and Inception modules using S&E techniques which results in SE-Resnet and SE-Inception, which is how they got the better results.

It’s a pity that the paper didn’t get more attention (pun intended). But then that’s the reality of large scale ML evaluation – once big houses left the game, the thrill is gone.

arxiv.org

A Deep Reinforcement Learning Chatbot

When thinking of building bot, you may think of building a pure rule-based system or a pure seq2seq system, both won’t work out well. You need to create some kind of glue code to combine several systems together. You can certainly code it manually. But more than often, a RL approach is used, and MILABOT is one such modern example. RL is used to select the right responses from an ensemble of models. The idea is not new, but the study’s scale and scholarship are what caught our attention.

arxiv.org

Uncategorized

AIDL Weekly Issue 29 – A Sobering Look at IBM Watson, Google in Beijing and Amazon in Barcelona,

Post author By grandjanitor
Post date July 2, 2019
No Comments on AIDL Weekly Issue 29 – A Sobering Look at IBM Watson, Google in Beijing and Amazon in Barcelona,

Issue 29 September 8th 2017

Editorial

Thoughts From Your Humble Curators

Summer is gone, and AI/DL development is heating up again. You heard that Google is building a new team in Beijing, and Amazon is having a new R&D center in Barcelona? The internationalization of A.I., partly due to talent scarcity, will continue to be a theme.

In other news, StatNews investigates whether IBM Watson is fulfilling is promise in the domain of cancer care. Watson’s challenges in healthcare do reflect in some ways the same challenges in the more general deep learning community. So this article should be interesting for you all.

We also cover interesting technical topics such as Uber’s Michaelangelo and PassGAN in our blog and paper sections. So check it out!

As always, if you like our newsletter, feel free to subscribe/forward it to your colleagues!

Artificial Intelligence and Deep Learning Weekly

Meme of The Week

Courtesy from Nikolay Pavlov from Ukrainian AI Community.

(Yeah…. We know that there should be activation functions. And all the weight should be indexed in terms of layers. But hey, it’s still funny.)

Artificial Intelligence and Deep Learning Weekly

News

A Sobering Look at IBM Watson on Cancer Care

Statnews’ Carey Ross and Ike Swetlit investigated how well IBM Watson actually works in cancer care. It is a sobering look at the challenges Watson encountered. For eample, how do you deal with frequent updates of medical literature and standard? How do you address the “do no harm” concerns of doctors? How do you balance privacy and data? This StatNews piece explore each of these issues.

The investigation is thorough. One of the key areas the authors peeled back is that currently Watson doesn’t seem to provide any scientific evaluation of how well Watson works:

The actual capabilities of Watson for Oncology are not well-understood by the public, and even by some of the hospitals that use it. It’s taken nearly six years of painstaking work by data engineers and doctors to train Watson in just seven types of cancer, and keep the system updated with the latest knowledge.

Another question posed by the authors is how well Watson’s advice would be valid outside America:

In Denmark, oncologists at one hospital said they have dropped the project altogether after finding that local doctors agreed with Watson in only about 33 percent of cases.

We think the conclusion of the authors were mild, but valid:

But the outlook for Watson for Oncology is challenging, say those who have worked closest with it. Kris, the lead trainer at Memorial Sloan Kettering, said the system has the potential to improve care and ensure more patients get expert treatment. But like a medical student, Watson is just learning to perform in the real world.

Whereas if you recall the Gizmodo’s piece back in Aug 11, 2017, a quote from Claudia Perlich, a ex-data scientist from IBM:

“The reality however is what Watson as a commercial product represents is very different, even from a technology perspective, from Watson who won Jeopardy!”

Our take: IBM’s problem could be another example where technology could not yet quite catch up with the moon-shoot promise from the marketing department. IBM still has many talented AI researchers. Also, perhaps more importantly, what IBM is encountering is what other deep learning researchers may increasingly encounter – beware of the hype.

statnews.com

Google Build an ML team in Beijing

South China Morning Post (SCMP) as well as Financial Times, reported that Google is looking to build a new ML team in Beijing. If you are familiar with the relationship between Google and China, this is interesting news. China’s censorship policy doesn’t go with Google well, and it was known that back in 2010, Google decided to quit the China market, and only at 2016, the company decides to reenter the market. Initially the deal was based on Android but now it seems like Google start to further developed in A.I. activity in China as well.

So what make Beijing an ideal site? The political seat of power, tech startup ecosystem and both Beijing University and Tsinghua are all located there.

Specific to A.I., perhaps the more important reason here is that several companies such as MS already heavily invest in A.I. talents in Beijing. e.g. As you know MSR China is the one that came up with the innovative techniques such as Resnet. Google cannot afford not to have a presence in the ML ecosystem in China.

scmp.com

Amazon New Research Center in Barcelona

Another expansion news from tech giant, Amazon is planning to hire more than 100 engineers and scientists in the Barcelona engineering center. The goal of the center is not surprising – part of it is to allow Amazon to keep tap of talents in local universities, part of it is to develop non-English version of products such as Alexa.

venturebeat.com

U.S. House passed a Bill to test 100k SDC

This is a quiet yet significant piece of news. U.S. House just passed a bi-partisan bill, or SELF-DRIVE Act, which allow exemptions of autos/tech companies to get exemptions from federal regulations. As such, close to 100k autonomous vehicles can be tested on the road. The bill is not yet a law, and it still has to go through U.S. Senate.

recode.net

Blog Posts

How I failed to replicate an $86 million project in 1 line of code

This is a response to the article How I replicated an $86 million project in 57 lines of code in which the author claims that he comes up with a 57-line python system which can replace an 86 million system.

While the intention was good, we found that “How I replicated an $86 million project in 57 lines of code” is an exaggerating title. That’s why Ryan Baumann’s response is called for. Also, many members at AIDL points out that the original article is meant to be a proof-of-concept (POC), while we agreed, but a more modest title could be “How I create a POC with 57 lines of code”.

medium.com

Uber’s ML Platform: Michelangelo

Despite the many negative news reporting of Uber of late, it still remains one of the most technically interesting companies in the planet. We were introduced to Michaelangelo, the company internal ML platform. The system sounds incredibly cool – it can control data and flow of training. More importantly it allows experimenters to share their workflow.

uber.com

Open Source

Tutorial on Deep Generative Models by Shakir Mohamed and Danilo Rezende

This tutorial published by DeepMind authors, Shakir Mohamed and Danilo Rezende, go through applications of generative models and its several important classes.

shakirm.com

ONNX

In the world filled with different deep learning toolkits, FB and MSFT announcement on Open Neural Network Exchange format is surprising. If you ever use multiple frameworks to train deep models, you know that one of the headaches is that models are not necessarily compatible with one another. And you can imagine different companies don’t always have the incentives to share the models with one another. Here comes ONNX, which can supposedly run in both Facebook’s Caffe2 and PyTorch, as well as MS’ cognitive toolkit. It is a strong plus for users to use PyTorch/Cognitive Toolkit.

It remains to see if Google/TF would follow the suit. Generally, we love this piece of news, because open format is the key for efficient development.

fb.com

Paper/Thesis Review

PassGAN

No, no, no. This is not exactly a password guesser. This is a password list generator. But the thinking is very smart – can we just train a ML model from a large open password lists and use it to generate passwords? That’s PassGAN, the training method is based on I. Gulrajani’s IWGAN.

arxiv.org

EuroSAT

This is a mid-size database for land use classification using satellite image. Just judging the number (98.45%), it seems to be slightly harder than MNIST, but a bit easier than CIFAR-10, which can make it a good data set to test for papers.

arxiv.org

Uncategorized

Weekly Issue 27 – Origin Story of Prof. Andrew Ng, Backdoored ANN and Superconvergence

Post author By grandjanitor
Post date July 2, 2019
No Comments on Weekly Issue 27 – Origin Story of Prof. Andrew Ng, Backdoored ANN and Superconvergence

Editorial

Thoughts From Your Humble Curators

Our main piece this week is about Prof. Andrew Ng, on his earlier work on machine learning and why he decided to teach ML to the public. We also take a closer look at the secret sauce of Waymo’s SDC development: their Texas campus and their simulation facilities.

Other than that, we have links on six blog posts and three papers. They are mostly technical topics – leisure reading for the last week in the Summer. Perhaps what caught our eyes is the idea of backdoored neural network and superconvergence. The former is a new form of attack on DNN from the a malicious model preparer. The latter promises 10x training speed for a class of loss functions.

Finally, if you want to take the deeplearning.ai class, AIDL now has a new satellite group just for you!

As always, if you like our newsletter, remember to subscribe and forward it your colleagues!

Artificial Intelligence and Deep Learning Weekly

News

Origin Story of Prof. Andrew Ng

A piece on Prof. Ng., his view on AI and his recent work on deeplearning.ai. What fascinates us? Do you know Andrew and his team once crash a $10000 helicopter drone? And he starts to program neural network when he was 16! So check it out.

apnews.com

Waymo Simulation Facilities

In this article, Atlantic’s Alexis C. Madrigal brought us to the secret sauce of Waymo’s SDC development. There are so many interesting part we can comment. But the whole idea that the Waymo team would simply set up physical prop and test SDC is fairly interesting. Another important idea there is simulation-based driving.

theatlantic.com

Blog Posts

A Write-Up from Brain Resident Colin Raffel

This is a great write-up by Colin Raffel who was a Brain Resident, and now a Google Brain Scientist. Raffel’s description on his technical work is certainly interesting, but more fascinating is how Google Brain works as a research institute.

colinraffel.com

OpenAI Release Two More Baseline Algorithms

That includes the algorithm Actor Critic using Kronecker-factored Trust Region (ACKTR) and Asynchronous Advantage Actor Critic (A3C).

openai.com

How does Physics Connect to Machine Learning?

This is a great note on Ising model, as well as how physics and ML are related in general. I think it is an interesting read for any one who is studying probabilistic graphical models.

jaan.io

Pytorch or Tensorflow?

Here is a point by point comparison between Pytorch and Tensorflow. It does confirm our feeling that while Tensorflow is still the most prominent toolkit, Pytorch is gaining ground and its debugging capability gain more love from researchers.

github.io

Backprop is Not Just The Chain Rule

Backpropagation is a deep concept. There are thousands of tutorial will teach you to “understand ANN” by using very small network, and doing differentiation explicitly. To Dr Karpathy’s, a much more sophisticated thinking is seeing the data as flowing backward in the computational graphs. Theory of back-propagation is fascinating. And as Tim Viera said, it is not just simply about chain rule.

In this post, the OP describes the significance of using backprop to do audo-differentiation, which is significantly different from what we learn in calculus or symbolic differentiation. He also poses differentiation as a Langrangian problem. The blog describes several important results such as calculation of gradient of function is provably be as fast as a function. We found it very important to grok, especially if you want to learn the detail of the modern deep learning framework.

github.io

Imposter Syndrome

An interesting blog post which discuss being a data scientist but not having formal credential. I bet it resonates with many of our readers.

github.io

Paper/Thesis Review

BadNets: A Backdoored Neural Network

NYU presents a new idea which suggest that a malicious human model preparer can create a backdoor weakness in a neural network. That results in what they called BadNets. In essence what you need to do is just to train two classifier, one normal and one trained with backdoor samples. Then you just create a network that run in parallel of the two. Indeed, in the sea of parameters within an NN, it would be a very difficult to detect such malicious behavior.

NYU researchers does argue that it is very dangerous to outsource model training to a 3rd party. It’s also very important to use trusted individual to create a benign network.

arxiv.org

A Brief Survey of Deep Reinforcement Learning

A great survey on deep reinforcement learning, which read like a condensed version of both David Silver’s class as well as the Berkeley RL’s class combined. The author starts from the basics such as DP-based, MCMC-based algorithms. Then move towards to motivate deep reinforcement learning which a deep neural network is used to represent the value function. Then touch modern topics such as trusted policy region optimization (TRPO) and the latest challenges in the field such as limitation of behavior cloning, exploration vs exploitation etc. It will be a great read for you if you already took at least one RL class.

arxiv.org

Super Convergence

This is interesting preprint published 2 days ago and it quickly become the top-hyped at arxiv-sanity. If what the author said is true, it would mean a class of loss function could be trained with 10x the speed. The method is also quite simple which all of you can implement. There are Math and code to support it.

It’s still a preprint. So be cautious to use it in practice!

arxiv.org

Uncategorized

AIDL Weekly Issue 28: MyriadX, ChainerCV and DLSS 2017 Videos

Post author By grandjanitor
Post date June 28, 2019
No Comments on AIDL Weekly Issue 28: MyriadX, ChainerCV and DLSS 2017 Videos

Issue 28 September 1st 2017

Editorial

Thoughts From Your Humble Curators

It’s the final week of August, and we are lighter on A.I. news. But if you pay attention, you may also hear the stories like Intel MyriadX, partnership between Microsoft and Amazon’s voice assistants. We also learn the exodus of Apple’s engineers to Zoox.

There are also many exciting developments in open source, such as ChainerCV, which re-implements several object detection algorithm and promises simpler training. Videos from DLSS 2017 are also released. Just from the titles, they look very entertaining. Check them out.

Finally, as pointed out by courtesy of David Ha, a Google Brain resident, in our FB group, you know A.I. is hot when even Yves Saint Laurent features a Stanford A.I. researcher in its perfume billboard ad (photo above).

Hey, A.I. guys need to smell good too.

As always, if you like our newsletter, remember to subscribe and forward it your colleagues!

Artificial Intelligence and Deep Learning Weekly

News

Intel Myriad-X

Intel comes with a new chip, Myriad-X. We are not too surprised because Movidius, which Intel acquired in 2016 has perfect capability to create such chip. But we are still happy to see the power of the Myriad-X which not only accelerates deep learning, but also has the capability to speed up vision-related functionality such as optical flows.

The original Intel’s coverage has more marketing terms such as visual processing unit (VPU) or visual intelligence. In terms of applications, we wonder if Myriad-X can compete in key market such as SDC, where Nvidia is dominant in its nascent stage and establish many partnership with autos.

intel.com

Amazon and Microsoft agree their voice assistants will talk

TC’s Natasha Lomas sum up this new deal between Amazon and Microsoft well:

Those betting big on AI making voice the dominant user interface of the future are not betting so big as to believe their respective artificially intelligent voice assistants will be the sole vocal oracle that Internet users want or need.

This is an interesting alliance between Microsoft and Amazon. We wonder how much of this is driven by competitive pressure from Google. We also wonder if this will last beyond what might be an experiment. Amazon and Microsoft compete heavily on the cloud side and Amazon is encroaching more on Microsoft’s application software stack on the enterprise collaboration side with recent product releases.

That said, the idea that one bot assistant platform could be the end-all-and-be-all for a consumer isn’t exactly a foregone conclusion. If anything, we believe in a more heterogenous vision where different bots could interact with each other and each platform and bot could specialize (each one becoming essentially loosely coupled micro-services). We hope this alliance could potentially form an archetype of how that would work.

techcrunch.com

Apple Auto Engineers Joining Startup Zoox

It’s not exactly news that Apple is scaling back its SDC development. We heard it from this NYT post also from Slate. But then a high-profile exodus of Apple engineers is still a rather unusual event.

What is clear to us : many people think that it is the difficulty of development SDC that trips Apple, the truth seems to be that AI development within Apple is just slower than other major competitors such as Google and Facebook.

bloomberg.com

Blog Posts

Quick Impression on Heroes of Deep Learning – Interview with Prof. Yoshua Bengio

Here is another impression by Arthur, on Heroes of Deep Learning. This time is Prof. Ng. Interview with Prof. Yoshua Bengio.

thegrandjanitor.com

DeepRLHacks

Written by William Falcon, a nice set of notes extracted from the talk by John Schulman titled “The Nuts and Bolts of Deep RL Research”.

github.com

Open Source

ChainerCV

What caught our eyes in ChainerCV is its reimplementation of certain object detection algorithms such as Faster R-CNN and SSD. As well as Segnet, this is incredibly useful because all these well-known algorithms are there, but coming up an implementation is tough. More importantly, ChainerCV provides “reference code and tools to train models, which is guaranteed to perform on par with the published results”. This part is also important because repeating published results is not easy.

arxiv.org

Fashion MNIST

Here is a new dataset Fashion-MNIST, which is meant to replace MNIST. In fact, it was a while people are voicing to move away from MNIST – indeed when a benchmark can reach the 99.7%. And Kagglers can game it so much that all of them seem to be able to get close to perfect results, it’s really hard to keep using it as a benchmark. Fashion-MNIST seems to have fairly good popularity, see here. You can also find the original paper at here.

github.com

Game Of Throne Book Generator

Enough said.

github.com

Video

DLSS 2017 Videos

Finally, video lectures of deep learning summer school 2017 are finally there. Several topics such as “Theoretical Neuroscience and Deep Learning Theory” and “Deep Learning in the Brain” look incredibly interesting.

videolectures.net

Paper/Thesis Review

Interpretation of Mammogram and Chest Radiograph Reports using Deep Neural Networks – Preliminary Results

We all heard by now deep learning is getting into healthcare. So here is a paper from U. of Toronto, which gives preliminary results on using deep learning to interpret mamograms and chess radiograph reports. The study, as titled, is preliminary, because it is still at the stage of comparing against good old AI techniques such as random forrest and SVM. But then there are several interesting architectural choices in the work which worth your time to take a look. e.g. the use of bi-CNN instead of just CNN.

arxiv.org

Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses

In this paper by the MILA group, with Yoshua Bengio as the last author, proposed a rather intriguing idea to evaluate dialogue system. First off, some context, usually a dialogue system, deep learning or not, is evaluated using evaluation techniques for statistical machine translation (SMT) such as BLEU or ROUGE. In a nutshell, both of these techniques require human references and correctness of the response will cross-check with the reference. So if there are more words from the reference in the response, generally you get a higher score.

But then we know that dialogue system is not exactly machine translation. Aren’t there many ways just to come up with the same response in a dialogue? Like “Great”, “Good”, “Fine” pretty much are responses for the question “How are you?” But what if the references only have just “Great” and “Good”? That’s the problem of what the authors called “word overlap” metric. Indeed if the word doesn’t appear in the reference, even if your response make sense, you can’t get high score.

So instead of doing a whole word comparison, the author think “can’t we just compare in the embedded word space”? That’s the idea of word embedding. And the intriguing part about the paper is that it posed dialogue evaluation as measuring distance in this embedded word space. The authors use HRED, which they found a good representation of the reference dialogue.

That results in a rather powerful method. Not only it shows a high correlation with human scores. Also, new response can be more easily evaluated because the comparison happens in the semantic space.

We think this is a highly interesting paper of the week. So check it out!

arxiv.org

Uncategorized

Issue 27 – Origin Story of Prof. Andrew Ng, Backdoored ANN and Superconvergence

Post author By grandjanitor
Post date June 28, 2019
No Comments on Issue 27 – Origin Story of Prof. Andrew Ng, Backdoored ANN and Superconvergence

Issue 27 August 25th 2017

Editorial

Thoughts From Your Humble Curators

Finally, if you want to take the deeplearning.ai class, AIDL now has a new satellite group just for you!

As always, if you like our newsletter, remember to subscribe and forward it your colleagues!

Artificial Intelligence and Deep Learning Weekly

News

Origin Story of Prof. Andrew Ng

apnews.com

Waymo Simulation Facilities

theatlantic.com

Blog Posts

A Write-Up from Brain Resident Colin Raffel

colinraffel.com

OpenAI Release Two More Baseline Algorithms

That includes the algorithm Actor Critic using Kronecker-factored Trust Region (ACKTR) and Asynchronous Advantage Actor Critic (A3C).

openai.com

How does Physics Connect to Machine Learning?

This is a great note on Ising model, as well as how physics and ML are related in general. I think it is an interesting read for any one who is studying probabilistic graphical models.

jaan.io

Pytorch or Tensorflow?

github.io

Backprop is Not Just The Chain Rule

github.io

Imposter Syndrome

An interesting blog post which discuss being a data scientist but not having formal credential. I bet it resonates with many of our readers.

github.io

Paper/Thesis Review

BadNets: A Backdoored Neural Network

NYU researchers does argue that it is very dangerous to outsource model training to a 3rd party. It’s also very important to use trusted individual to create a benign network.

arxiv.org

A Brief Survey of Deep Reinforcement Learning

arxiv.org

Super Convergence

It’s still a preprint. So be cautious to use it in practice!

arxiv.org

Uncategorized

Issue #26 – Reissue: OpenAI DotA-2 Bot, Early Reviews of deeplearning.ai and cs231n 2017

Post author By grandjanitor
Post date June 26, 2019
No Comments on Issue #26 – Reissue: OpenAI DotA-2 Bot, Early Reviews of deeplearning.ai and cs231n 2017

Issue 26 August 19th 2017

Editorial

Thoughts From Your Humble Curators

Woohoo! As deeplearning.ai launched last week, we started to see more reviews of the class. We will look at one by Arvind Naragaj. We will also zero in on one of the optional series within the class, called “Heroes of Deep Learning”. This week, we will look at the Prof. Hinton interview by Prof Ng.

Oh, how about the OpenAI DotA-2 bot? Has it conquered the world of DotA-2 yet? From what we gather so far it doesn’t seem to be the case….. So let’s take a look in our Fact-checking section.

Other than deeplearning.ai and DotA-2, Stanford also just released the latest videos from cs231n 2017. So check out our Open Source Section!

As always, if you like our newsletter, please subscribe/forward to your colleagues!

Artificial Intelligence and Deep Learning Weekly

Reissuing of Issue 25

Due to a technical problem we encountered yesterday, we decide to reissue Issue 25 today. We apologize for any inconvenience.

Artificial Intelligence and Deep Learning Weekly

News

Prof. Andrew Ng is Also Raising 150m AI Fund

It doesn’t come as a surprise to us as financing is a core component of building the AI ecosystem. We don’t have much detail yet. Other have questioned how the fund will differentiate itself from other funds such as Element.AI or Gradient as the proliferation of AI-focused funds continues.

We are of a different opinion – Prof. Andrew Ng is the secret sauce here. Having Prof. Ng as an investor is in itself a very positive signal that the AI is deep and real (as opposed to many startups that call themselves AI startups), and that would attract talent and follow-on investors.

techcrunch.com

Factchecking

OpenAI’s Dota 2 Bot In Perspective

One of the biggest news last week is perhaps an OpenAI bot was able to beat pro Dota 2 player Dendi) (See Footnote 1). Public outlets rush to report the news and many of them reminded us how dangerous AI can become. And as you might also know, Elon Musk who launch OpenAI, says,

OpenAI first ever to defeat world’s best players in competitive eSports. Vastly more complex than traditional board games like chess & Go.

And remember in Issue 24, we looked at DeepMind/Blizaard Starcraft II environment, and we said [on using reinforcement learning on SCII],

So far, DeepMind researchers are still perplexed by the problem – and all RL algorithms so far cannot beat the built in AI agents.

So had we perceived the status of technology incorrectly? We are certainly not the only group who felt surprised. AI researcher, Danny Britz also feels the same. So is SalesForce researcher, Stephen Merity

Since this issue is already discussed quite well by Britz’ blog post and Merity’s tweets’ discussion. The Verge piece is pretty good if you want a less technical piece. OpenAI researchers also wrote two messages on the task. (Part I and Part II)

So we will just extract several important take-aways here:

A multiplayer online battle arena (MOBA) is not an real-time strategy (RTS) Game. e.g. League of Legends or Dota 2 are MOBA, whereas Starcraft I,II are RTS. They look the same, but computationally they can be very different. A MOBA has significantly fewer actions to choose from because you only control one single character. Whereas RTS require you to control not only the Heroes, and it requires you to control all the buildings.
OpenAI bot is a 1v1 bot. And A MOBA 1v1 game is very different from a MOBA 5v5. Most tournament game in DotA-2 is actually 5v5, So the machine has to deal with a team of 5 coordinated players. So even OpenAI researchers opine in their post: “1v1 is complicated, but 5v5 is an ocean of complexity.” Also playing a MOBA 1v1 game usually mean you and your opponent will use the same lane, so reflex of the player will be the key. Of course, in this case machines have an huge edge.
Then you should observe that DotA-2 API actually provide a lot of vital information which give advantage to the bot. For example, as the Verge piece points out – distance information can be easily access and give advantages to machines, which human doesn’t have such advantage.
Consider all these, many also observe that Open-AI engine has many human element involved. The e.g. Heroes was chosen manually out of the 110+ choices. Shadow Fiend was chosen. So the character picking is not done by machine. Then there is a key technique of creep blocking, which allows creep to reinforce a defense. Turns out it is trained separately.

So all of these 4 points should make you convince that we seem to be quite far away from beating general RTS games, not to say making Skynet in general.

So how do we see it? Our opinion is very similar to Britz – while that we believe popular outlets and Musk’s comment are over the top. By its own, OpenAI DotA-2Bot is still an impressive engineering project. Their status is currently comparable to say DeepBlue before it met Kasparov. There are some known issues, such as many players rumored that they can beat the bot by a technique called creep-control. But it may be a bug in the engine, and expect OpenAI researchers would fix it one day.

On the other hand, you should still notice that beating DotA-2 in 5v5 and a general RTS game are still faraway from us. Hopefully saying so would stop your nightmares (and frankly, curb your enthusiasm) of an AI-powered doomsday machine.

Footnote

If this is the first time you heard of DotA-2, check out this video for the gameplay?

Artificial Intelligence and Deep Learning Weekly

Blog Posts

AI Heroes of Deep Learning – Geoffrey Hinton

This piece is Arthur’s impression of the first video (long, 40 mins) of AI Heroes of Deep Learning which interviews of Geoffrey Hinton. Unlike other parts of deeplearning.ai class, “AI Heroes of Deep Learning” series has more research discussion, which is more suitable for working practitioners of deep learning.

thegrandjanitor.com

Early Review of Deeplearning.ai Specialization

This is an early review of deeplearing.ai from Arvind Nagaraj. It’s also the first review of the class. The part we like is his comparison between fast.ai and the deeplearning.ai. In a nutshell, fast.ai is more top-down approach of teaching – it teaches you how to run the script first before describing the internals. Whereas Prof. Ng’s deeplearning.ai is more a bottom-up approach – it first teaches you the internals, then build up from there.

The specialization is still very new. So expect more reviews will come soon. Another good one we haven’t cover is Gautam Karmarker’s post, but check it out!

medium.com

Open Source

Stanford cs231n 2017 videos

cs231n 2017 is finally released to the public. Compare to the class at 2016, there are three new lectures sets: Lecture 13 on generative models, Lecture 14 on Deep Reinforcement learning and Lecture 16 on adversarial training. That makes it a must-watch sets of videos even if you have seen it once.

youtube.com

Tensorflow v1.3 Released

In our view, this is more a release with upgrades such as adding new estimators, and perhaps the last pre-build with cuDNN 6.0. (v1.4 will pre-build by cuDNN 7.0.) Dustin Tran on twitter also mentioned that this is the first time tf.distribution. According to Tran, so far it doesn’t break his very interesting toolkit, Edward.

github.com

All About NLP

Prof. Dragomir Radev shared us a search engine by Yale LILY group which has all useful information you need to learn NLP. You can also upvote/downvote a resource. Sounds quite nifty, so check it out!

yale.edu

Jobs

Computer Vision Engineer at Dishcraft Robotics

dishcraft.com

Paper/Thesis Review

Recent trend in NLP

In this survey paper from University of Singapore, the authors gave great explanation on several well-used techniques in current NLP application of deep learning. It also include several STOA results in several tasks such as POS tagging, translation etc.

arxiv.org

Paper Version of Tensorflow Playground

This is the paper version of the well-known Tensorflow Playground which provides great visualization on neural network training.

arxiv.org

Uncategorized

AIDL Weekly #25 – OpenAI DotA-2 Bot, Early Reviews of deeplearning.ai and cs231n 2017

Post author By grandjanitor
Post date June 22, 2019
No Comments on AIDL Weekly #25 – OpenAI DotA-2 Bot, Early Reviews of deeplearning.ai and cs231n 2017

Issue 25 August 18th 2017

Editorial

Thoughts From Your Humble Curators

Oh, how about the OpenAI DotA-2 bot? Has it conquered the world of DotA-2 yet? From what we gather so far it doesn’t seem to be the case….. So let’s take a look in our Fact-checking section.

Other than deeplearning.ai and DotA-2, Stanford also just released the latest videos from cs231n 2017. So check out our Open Source Section!

As always, if you like our newsletter, please subscribe/forward to your colleagues!

Artificial Intelligence and Deep Learning Weekly

News

Prof. Andrew Ng is Also Raising 150m AI Fund

techcrunch.com

Factchecking

OpenAI’s Dota 2 Bot In Perspective

OpenAI first ever to defeat world’s best players in competitive eSports. Vastly more complex than traditional board games like chess & Go.

And remember in Issue 24, we looked at DeepMind/Blizaard Starcraft II environment, and we said [on using reinforcement learning on SCII],

So far, DeepMind researchers are still perplexed by the problem – and all RL algorithms so far cannot beat the built in AI agents.

So we will just extract several important take-aways here:

A multiplayer online battle arena (MOBA) is not an real-time strategy (RTS) Game. e.g. League of Legends or Dota 2 are MOBA, whereas Starcraft I,II are RTS. They look the same, but computationally they can be very different. A MOBA has significantly fewer actions to choose from because you only control one single character. Whereas RTS require you to control not only the Heroes, and it requires you to control all the buildings.
OpenAI bot is a 1v1 bot. And A MOBA 1v1 game is very different from a MOBA 5v5. Most tournament game in DotA-2 is actually 5v5, So the machine has to deal with a team of 5 coordinated players. So even OpenAI researchers opine in their post: “1v1 is complicated, but 5v5 is an ocean of complexity.” Also playing a MOBA 1v1 game usually mean you and your opponent will use the same lane, so reflex of the player will be the key. Of course, in this case machines have an huge edge.
Then you should observe that DotA-2 API actually provide a lot of vital information which give advantage to the bot. For example, as the Verge piece points out – distance information can be easily access and give advantages to machines, which human doesn’t have such advantage.
Consider all these, many also observe that Open-AI engine has many human element involved. The e.g. Heroes was chosen manually out of the 110+ choices. Shadow Fiend was chosen. So the character picking is not done by machine. Then there is a key technique of creep blocking, which allows creep to reinforce a defense. Turns out it is trained separately.

So all of these 4 points should make you convince that we seem to be quite far away from beating general RTS games, not to say making Skynet in general.

Footnote

If this is the first time you heard of DotA-2, check out this video for the gameplay?

Artificial Intelligence and Deep Learning Weekly

Blog Posts

AI Heroes of Deep Learning – Geoffrey Hinton

thegrandjanitor.com

Early Review of Deeplearning.ai Specialization

The specialization is still very new. So expect more reviews will come soon. Another good one we haven’t cover is Gautam Karmarker’s post, but check it out!

medium.com

Open Source

Tensorflow v1.3 Released

github.com

Stanford cs231n 2017 videos

youtube.com

All About NLP

Prof. Dragomir Radev shared us a search engine by Yale LILY group which has all useful information you need to learn NLP. You can also upvote/downvote a resource. Sounds quite nifty, so check it out!

yale.edu

Jobs

Computer Vision Engineer at Dishcraft Robotics

dishcraft.com

Paper/Thesis Review

Recent trend in NLP

arxiv.org

Paper Version of Tensorflow Playground

This is the paper version of the well-known Tensorflow Playground which provides great visualization on neural network training.

arxiv.org

Uncategorized

AIDL Weekly #24 – A Deep Dive Into deeplearning.ai

Post author By grandjanitor
Post date June 22, 2019
No Comments on AIDL Weekly #24 – A Deep Dive Into deeplearning.ai

Issue 24 August 11th 2017

Editorial

Thoughts From Your Humble Curators

Happy Summer (for those in the Northern Hemi anyway)! We were off last week and the past few weeks have proved very eventful. So we have lots of material for you in this issue.

The biggest headline is we finally know what deeplearning.ai is. We took a quick look at the curriculum. For example, what does the course covers, is it worthwhile to take? How does it compare to other similar on-line classes? Such as Hinton NNML, cs231n and cs224n? We wrote a long and detail piece for you this issue.

Then perhaps now an old (fake) news, you might have heard the claim “Facebook kills AI agents which create its own language.” We looked into this in Issue 18 and 23. Since then, Gizmodo has debunked it, Snope has debunked it, and even Facebook researchers came out to clarify what it was all about. So it became a much bigger deal than your normal fake news. Since The Weekly is one of the earliest one to debunk the claim, we present our own take on the matter below.

Other than deeplearning.ai and our fact-checking, we have 9 more items including some cool topics like audio super-resolution, DeepMind/Blizzard Starcraft API+Database. Shallow network can work as well as a deep one, so we link to a paper on that too!

As always, if you like our newsletter, feel free to subscribe and forward it to your colleagues/friends!

Artificial Intelligence and Deep Learning Weekly

News

deeplearning.ai – A Closer Look At Prof. Andrew Ng’s Deep Learning Course

By now, we all know that deeplearning.ai is a new series of courses, or specialization, developed by Prof. Andrew Ng. First off, we really appreciate Prof. Ng to create a new deep learning class right after he left the industry. One of us (Arthur) has quickly browsed through the curriculum of Course 1 to 3, here are some notes:

Only Course 1 to 3 are published now, they are short classes, more like 2-4 weeks. It feels like the JHU Data Science Specialization and it feels good for beginners. Assume that Course 4 and 5 are long, say 4 weeks. So we are talking about 17 weeks of study.
Unlike the standard Ng’s ML class, python numpy is the default language. That’s good in our view because close to 80-90% of practitioners are using python-based frameworks and knowledge of numpy is always very useful.
Course 1-3 has around 3 weeks of curriculum overlapped with “Intro to Machine Learning” Lecture 2-3. But you should still check the course out even if you have some ML background. The course helps you to see other ML techniques from the eyes of a DL researcher. For example, Course 1 would guide you how to optimize a logistic regressor with back-prop like algorithm.
Course 2 is about optimization, there we’re introduced Tensorflow.
Course 3 is more about how to setup a deep learning system pipeline. While it is only two weeks long, we find this Course the most exciting because we can hear what Prof. Ng thinks about DL after his years in industry.
Course 4 and 5 are about CNN and RNN respectively, they are not yet published. From the outline so far, they are good preliminary classes before you take cs231n or cs224n.
So our general impression here is that the specialization is more a comprehensive class, comparable with Hugo Larochelle’s Lectures, as well as Hinton’s NNML. Yet the latter two classes are known to be more difficult. Hinton’s class in particular, are known to confuse even PhDs. So that shows one of the values of this new DL class: it is a great transition from “Intro to ML” to more difficult classes such as Hinton’s.
But how does it compared with other similar course such as Udacity’s DL nanodegree then? We are not sure yet, but the price seems to be more reasonable if you go through the Coursera route. Assume we are talking about 5 months of study, you are paying $245. Compare to Udacity’s price tag of $549. Ng’s specialization looks like a bargain.
Better than that: many of you Weekly readers are likely to take many other courses before considering Ng’s class. In that case, you would find finishing the class faster than you thought. That also mean, you can spend less than $245 on the class.
We also found that many existing beginner classes advocate too much on running scripts, but avoid linking more fundamental concepts such as bias/variance with DL. Or go deep to describe models such as Convnet and RNN. cs231n did a good job on Convnet, and cs224n teach you RNN. But they seem to be more difficult than Ng or Udacity’s class. So again, Ng’s class sounds like a great transition class.
Throughout the class, there are interviews with luminaries in DL community, including Prof. Hinton, Dr. Ian Goodfellow and Dr. Andrej Karpathy. Just listening to them may worth the $49 price tag.

Our current take: We are going to take the class ourselves. And we highly recommend this class to any aspiring students of deep learning.

deeplearning.ai

Factchecking

A Closer Look at The Claim “Facebook kills Agents which Create its Own Language”

As we fact-checked in Issue 18 and 23, we rated the claim

Facebook kills Agents which Create Its Own Language.

as false. And as you might know, the faked news has spread to 30+ outlets which stir the community.

Since the Weekly has been tracking this issue much earlier than other outlets (Gizmodo is the first popular outlet call the faked news out), we believe it’s a good idea to give you our take on the issue, especially given all information we know. You can think of this piece as more a fact-checking from a technical perspective. You can use as a supplement of the Snope’s piece.

Let’s separate the issue into few aspects:

1, Does Facebook kills an A.I. agent at all?

Of course, this is the most ridiculous part of the claim. For starter, most of these “agents” are really just Linux processes. So….. you can just stop them by using the Linux command kill. The worst case…. kill -9 or kill -9 -r? (See Footnote [1])

2, What Language does AI Agents Generated and The Source

All outlets, seem to point to couple of sources, or the original articles. As far as we know, none of these sources had quoted academic work which is directly the subject matter. For convenience, let’s call these source articles to be “The Source” (Also see Footnote [2]). The Source apparently has conducted original research and interview with Facebook researchers. Perhaps the more stunning part is there are printouts of how the machine dialogue look like. For example. Some of the machine dialogue looks like

Bob: “i can i i everything else”

Alice: “balls have zero to me to me to me to me …..”

That does explain why many outlets sensationalized this piece, because while the dialogue is still English (as we explained in Issue #18), it does look like codeword, rather than English.

Where does the printout comes from? It’s not difficult to guess – it comes from the open source code of “End-to-End Negotiator” But then the example from github we can find there looks much more benign:

Human : hi i want the hats and the balls

Alice : i will take the balls and book <eos>

Human : no i need the balls

Alice : i will take the balls and book <eos>

So one plausible explanation here is that someone has played with the open source code, and they happened to create a scary looking dialogue. Th question, of course, are these dialogue generated by FB researchers? or does FB researchers just provide The Source the dialogue? Here is the part we are not sure. Because the Source does quote words from Facebook’s researcher (see Footnote [3]), so it’s possible.

3, What is Facebook’s take?

Since the event, Prof. Dhruv Batras has post a status at July 31 in which he simply ask everyone to read the piece “Deal or No Deal” as the official reference of the research. He also called the faked news “clickbaity and irresponsible”. Prof. Yann Lecun also came out and slam at the faked newsmaker.

Both of them decline to comment on individual pieces, including The Source. We also tried to contact both Prof. Dhruv Batra and Dr. Mike Lewis on the validity of the Source. Unfortunately, they are both unavailable for comments.

4, Our Take

Since it’s an unknown for us whether any of The Source is real, we can only speculate what happened here. What we can do is make as technically plausible as possible.

The key question here: is it possible that FB researchers have really created some codeword-like dialogue and passed it off to the source? It’s certainly possible but unlikely. Popular outlets have general bad reputation of misinforming the public on A.I., it is hard to imagine that P.R. department of FB don’t stop this kind of potential bad press in the first place.

Rather, it’s more likely that FB researchers only publish paper, but somebody else is misusing the code the researchers open sourced (as we mentioned in Pt. 2). In fact, if you reexamine the dialogue released by The Source:

Bob: “i can i i everything else”

Alice: “balls have zero to me to me to me to me …..”

It looks like the dialogue was generated by models which are not well-trained, this is true especially if you compare the print out with the one published by Facebook’s github.

If our hypothesis is true, we side with FB researchers, and believe that someone just write a over-sensational post in the first place causing a stirs of the public. Generally, everyone who spreads the news should take responsibility to check the sources and ensure integrity of their piece. We certainly don’t see such responsible behavior in the 30+ outlets who report the faked news. It also doesn’t look likely that The Source is written in a way which is faithful of the original FB research. Kudos to Gizmodo and Snope’s authors who did the right thing. [4]

Given the agent is more likely to behave like what we found on Facebook’s github, we maintain our verdict as in Issue 18 and 23, it is still very unlikely that FB agents are creating any new language. But we add qualifier “very unlikely” because as you can see in Point 3, we still couldn’t get Facebook researchers’ verification as of this writing.

So let us reiterate our verdict:
We rate the “Facebook killing agents” false.
We rate “Agents that create its own language” very likely false.

AIDL Editorial Team

Footnote:

[1] So, immediately after the event, couple of members was joking about the public was being ignorant about what so-called AI agents are.

[2] We avoiding naming what The Source is. There seems to be multiple of them and we are not sure which one is the true origin.

[3] The author of The Source seems to have communication with Facebook researcher Prof. Dhruv Batra and quote the Professor’s word, e.g.

There was no reward to sticking to English language,

as well as talking with researcher Mike Lewis,

Agents will drift off understandable language and invent codewords for themselves,

[4] What if we are wrong? Suppose the Source is real, and the Agents does Generated Codeword-Like Dialogue, Are they new Language?

That’s more a debatable issue. Again, just like we said in Issue 18, suppose you start from training a model like from an English database, the language you got will still be English. But can you characterize a English-like language as a new language? That’s a harder question. e.g. a creole is usually seen as another language, but a pidgin is usually seen as just a grammatical simplification of a language. So how should we see the codeword generated by purported “rogue” agent? Only professional linguist should judge.

It is worthwhile to bring up one thing: while you can see the codeword language just like any machine protocol such as TCP/IP, the Source implies that Facebook researcher have consciously making sure the language adhere to English. Again, this involves if the Source is real, and whether the author has mixed in his/her own researches into the article.

gizmodo.com

Blog Posts

Google at ICML 2017

This is a set of papers written by Google researchers at ICML 2017.

googleblog.com

DeepMind’s paper at ICML 2017

You can also find Part Two and Part Three.

deepmind.com

Fitting to Noise or Nothing At All: Machine Learning in Markets by Zachary David

While mostly criticizing one single paper, David’s article should also alarm you how easy it is to do it wrong in DL+fintech world, and how easy to read a wrong paper.

zacharydavid.com

DeepMind and Blizzard Release Starcraft II Research Environment

In this very entertaining post from DeepMind, we learn about the new Starcraft II AI research environment. So that includes API, dataset and a sets of mini-games that agents can play.

Those are all good, but I guess the most interesting question we should ask when would computers can beat Starcraft at human level? That should make us all curious. In fact, DeepMind’s post give us clues – currently, the problem seems to be very daunting – there are close to 300 basic actions at one moment you can use. According to DeepMind’s estimation: even for a small screen size 84 x 84, we are talking about, there are around 1 million possible actions. That compare to Atari’s 10 actions or in Go which has around few hundreds, we are talking about another breed of problems.

That’s perhaps why DeepMind first encourage researcher to work on mini-games first – That makes a lot of sense – it’s like before you play Go on a full board, you first learn how to play on a 9×9 board.

So far, DeepMind researchers are still perplexed by the problem – and all RL algorithms so far cannot beat the built in AI agents. And their next step is to imitation learning, which would be enabled by Blizaard’s database on how the winners won on the game.

deepmind.com

Open Source

PyTorch 0.2 is out

The new toolkit in town, PyTorch, is releasing the second version of its toolkit. It does look cool, the two coolest feature in our view is higher order gradient calculation and distributed training. The former saves you time if you love to analyze NN by looking at Hessians. The latter is cool, of course, because it allows PyTorch to be used in more industrial scenarios which can speed up training if resources allowed.

github.com

Distributome

Have you ever feel confused by probability distributions? We know we do. So that’s why Distributome is such a cool project. Not only it comes up with the description of different distribution and their relationship. It also allow you to sample them. So in a sense, this is a cool mathematical and computational tool for researchers.

distributome.org

Paper/Thesis Review

Audio Super-Resolution with Neural Network

If you are into audio processing and super-resolution, this rare gem is for you. This paper brought the idea of image superresolution such as C Dong’s method to audio.

arxiv.org

Natural Language Processing with Small Feed-Forward Networks

When the abstract says:

We show that small and shallow feed-forward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models.

This should make your pay attention, especially in our world where everyone is talking about using deep models. This article is a great rebuttal that in NLP you don’t have to do that all the time. Shallower models can work as well.

arxiv.org