
AIDL Weekly #13 – Special Issue on GTC 2017

Editorial

The Moat of Nvidia – Thoughts From Your Humble Curators

There are many tech conferences each year, but none impressed us as much as GTC 2017. We curated four pieces about the conference, but in this Editorial, we'd like to explain Nvidia's incredible moat. And we think this moat is getting stronger.

First, by “moat”, we mean competitive advantage. So what's Nvidia's moat? Some of you might quickly point out its hardware platforms, such as the GTX, Quadro and Tesla (or Pascal and Volta) series of GPU cards, and its software platform, CUDA. Beyond the obvious IP and chip-design moat, there is also powerful software lock-in. Indeed, as developers, we compile code with CUDA daily. CUDA is an easy-to-learn extension of C and is quick to produce results. The surrounding rich software support makes it easy to get up and running, and creates high switching costs once enough effort has been spent building on top of it.

But increasingly, Nvidia is branching out into new areas of computing, creating new moats. It just tripled its data center business on a year-over-year basis. That has a lot to do with the fact that it owns both the hardware and the software platform. And deep learning is not going anywhere soon.

Now, this moat was strengthened further at GTC 2017. Why? First, Nvidia announced that it is going to train 100k developers this year alone, creating more potential customers steeped in its wares. This is a smart move – behaviors are hard to change. Secondly, it announced a new cloud platform initiative (curated under “Nvidia GPU Cloud”), which makes it easier for newcomers to start building on Nvidia's platform. It remains to be seen what the competitive dynamics will be with other large cloud platforms like Google, Amazon, and Microsoft, which are also Nvidia's customers. Nvidia might just see its own platform more as an educational platform and not necessarily a major revenue contributor like AWS long-term.

Currently, Nvidia has two potential sets of competitors. One is AMD, but AMD is still struggling to come up with a new GPU that can compete. Then there are the ASIC platforms, but most of them are still under development (Intel's Nervana) or proprietary (Google's TPU). So Nvidia is virtually monopolizing the deep learning computing platform.

In this issue, we further analyze Nvidia's training plan, the new V100, new partners for Drive PX, and its cloud move. We also cover the Medical ImageNet and other news.

As always, if you like our newsletter, please subscribe and forward it to your colleagues!

Edit on 2017-05-14: Peter Morgan was kind enough to correct us – both Nervana and the TPU are based on ASICs, rather than FPGAs. We have since corrected the web version.


AIDL Weekly Issue #12 – Lyrebird, Recursion Pharmaceuticals, and COMPAS

Editorial

Thoughts From Your Humble Curators

We are starting to see how machine intelligence can be applied in controversial ways; we have two related pieces this week:

  • Lyrebird – which astounded us not only by mimicking the voices of multiple politicians, but also by claiming that only one minute of training data is enough.
  • COMPAS – software used to inform criminal sentencing decisions.

We also discuss Recursion Pharmaceuticals and what makes deep learning particularly useful at the company.

As always, if you like our newsletter, remember to subscribe and forward to your colleagues!


Member’s Question

Difference Between ML Engineer and Data Scientist?

Q: (From Gautam Karmaker) Guys, what is the difference between an ML engineer and a data scientist? How do they work together? How do their work activities differ? Can you walk through a use-case example?”

A: (From Arthur, redacted)

“Generally, it is hard to decide what a title means unless you know the nature of the job, which is usually described in the job description. But you can ask what these terms usually imply. So here is my take:

ML vs. data: Usually there is a part that involves testing/integrating an algorithm and a part that involves analyzing the data. It's hard to say what the proportion between the two is for any particular job. But high-dimensional data is less amenable to simple exploratory analysis, so people tend to use the term “ML” there, which mostly means running and tuning an algorithm. If you are looking at table-based data, then it is more likely to be a “data” type of job. In my opinion, that means at least 40% of your job would be manually looking at trends yourself.

Engineer vs. scientist: In a larger organization, there is usually a difference between the person who comes up with the mathematical model (the scientist) and the person who controls the production platform (the engineer). For example, if you are solving a prediction problem, the scientist is usually the one who trains, say, the regression models, while the engineer is the one who turns the model into a production system. So you can think of them as the “R” and the “D” in the organization.

Both scientist and engineer are career tracks, and they are equally important. So you will find that many companies prefix both tracks of titles with “junior”, “senior”, “principal”, “director” or “VP”.

You will sometimes see terms such as programmer or architect replacing “engineer”/“scientist”. Programmer implies the job is more coding-related, i.e. the person who actually writes code. Architect is rarer; architects usually oversee big-picture issues among programmers, or act as a bridge between the R and D organizations.”
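To illustrate the “data” side of the job described above, here is a tiny, hypothetical pandas sketch of table-based exploratory analysis (the dataset and column names are made up):

```python
import pandas as pd

# Hypothetical table-based data; the "data" side of the job is largely
# slicing it by hand and eyeballing trends like the one printed below.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "month":  [1, 2, 1, 2],
    "sales":  [100, 120, 90, 140],
})
print(df.groupby("region")["sales"].mean())  # average sales per region
```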


About Us

This newsletter is published by Waikit Lau and Arthur Chan. We also run Facebook’s most active A.I. group with 19,000+ members and host a weekly “office hour” on YouTube.


AIDL Weekly Issue 11 – Groq: A Company No One is Talking About

Editorial

Thoughts From Your Humble Curators

Perhaps the biggest news last week is about Groq, a company started by Google ex-employees who worked on the Tensor Processing Unit (TPU). We talk about the company and its current principals.

Of course, ICLR 2017 was also held last week. We have two links in this issue focused on the conference.

Other than Groq and ICLR 2017, we also cover:

  1. Notes on Stanford CS228, a Bayesian network class,
  2. a note from Athelas’ Dhruv Parthasarathy on image segmentation,
  3. another criticism of Neuralink.

As always, if you like AIDL Weekly, don’t forget to subscribe and forward to your colleagues!


AIDL Weekly Issue 10 – F8, Brain-Computer Interface, Apple’s SDC Permit

Editorial

Thoughts From Your Humble Curators

The next big platform everyone will be fighting over is your mind. Check out Elon Musk’s Neuralink and Facebook’s brain-typing and skin-hearing.

Last week also featured F8, which happened on April 18th and 19th and gave us another week filled with far-out news: Augmented reality? Caffe2.ai? Brain-computer interfaces? Check, check and check. We have four items in this issue covering all this cool stuff.

We also had a very interesting live-streamed office hour with Sumit Gupta, VP of HPC, AI and Machine Learning at IBM. We went in depth into what Sumit thinks are the bottlenecks in deep learning today, among other topics. Check out the video below.

Other than F8 and the IBM interview, we also cover:

  • Self-driving car (SDC) developments from Apple and Baidu,
  • Neuralink and criticism of it.

As always, if you like our newsletter, subscribe and forward it to your colleagues!


AIDL Weekly Issue 9 – Titan Xp or Not? Federated Learning and Last Battle of AlphaGo(?)

Editorial

Thoughts From Your Humble Curators

This week we focus on several noteworthy developments:

  • Federated Learning from Google – what is its impact?
  • Titan Xp – should you buy it or not?
  • AlphaGo vs. Ke Jie – is this AlphaGo's final battle against humans?
  • Hinton's NNML class – is it still relevant? Should you take it?

As always, if you like our newsletter, subscribe and forward it to your colleagues!

About Us

This newsletter is published by Waikit Lau and Arthur Chan. We also run Facebook’s most active A.I. group with 16,000+ members and host a weekly “office hour” on YouTube.


AIDL Weekly Issue 8 – Google’s TPU, cs224n 2017, April Fools’ Jokes Roundups Apr 7th 2017

Thoughts From Your Humble Curators

A number of very interesting developments this past week:

  • Google’s TPU,
  • The Vector Institute,
  • Newly released cs224n 2017 videos,
  • CNTK 2.0,
  • BEGAN

Last Saturday was April Fools' Day, so we rounded up the best jokes and pranks about AI. Did you fall for any of them? Some of them, like OpenAI's spam detection, are fairly sophisticated.

As always, if you like our newsletter, share it with your friends and colleagues. If you haven't done so yet, don't forget to subscribe!



Corrections on Issue #7

In the email edition of Issue 7, we erroneously reported that an autonomous vehicle was involved in a fatal accident. It turns out that no serious injuries resulted. We promptly corrected the web version and posted a correction notice at AIDL. We apologize for any misunderstanding.




April Fools’ Jokes on AI and Deep Learning – 2017

It's almost a week after April Fools' Day. Did you fall for any of the following pranks? Here are some of the best April Fools' jokes we gathered this year:

The one that almost had us confused: OpenAI's result on spam detection, because the claim of using simulation to improve real-life training is plausible. But the “future plans” for “phishing” and “adversarial spam” give it away. 🙂

(Photo Credit: Open AI)

Member’s Question

Should You Learn Lisp?

Q: [As subject]

A: “Learning programming languages, like learning human languages or other skills in general, is a way to enlighten yourself. Lisp is a cool language because it does things differently. So sure, in that sense, Lisp may be worth your time.

On the other hand, if you do want to learn modern-day A.I., perhaps probability and statistics are the first “language” you want to learn well. As one member, Ernest Szeto, said, nowadays A.I. usually uses at least some probability-based logic. And if you think of probability and statistics as a language, it is a fairly difficult one to learn on its own.

And yes, at AIDL, we recommend Python as the first language, because it lets you use several deep learning stacks. You can also use R or Java, but be aware that there will be a gap between your work and what many other people are doing.”
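As a concrete illustration of “probability as a language”, here is a minimal Python sketch of Bayes' rule on a made-up diagnostic-test example (all numbers are hypothetical):

```python
# Bayes' rule on a made-up diagnostic-test example (numbers are hypothetical).
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) from P(condition), P(+|condition), P(+|no condition)."""
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

print(bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05))
# ~0.16: even an accurate test gives a modest posterior when the prior is low.
```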

First published as a blog post:

Should You Learn Lisp?


AIDL Weekly Issue 7 – The Last Imagenet, OpenAI’s Evolution Strategy and AI Misinformation Epidemic

Thoughts From Your Humble Curators

One of us (Waikit) is teaching a class for MIT in Brisbane, Australia. That’s why we have a lighter issue.

An interesting observation – In the MIT Entrepreneurship classes I’m teaching, there are 120 entrepreneurs from 34 countries spanning U.S to Vietnam to Kazakhstan. One of the top topics of interest and discussion was A.I. and Deep Learning. Surprising or not, some of the students were already implementing fairly advanced DL techniques in agriculture, etc. in emerging economies. It is clear that as A.I. democratizes from the ivory towers of Montreal, Stanford, CMU, FB, Google, Microsoft, etc., there will be some very long-tail positive implications in various economies over time. Is A.I. over-hyped? Sure. But people always over-estimate the short-term and under-estimate the long-term.

This week, we cover:

  • The last ImageNet
  • OpenAI’s new results on Evolution Strategy
  • A new and popular GitHub repository on photo style transfer

We also incorporate an article from Zachary Lipton, in which he calls out AI hype and the misinformation spread by popular outlets.

If you like our newsletter, remember to forward it to your friends and colleagues! Enjoy!

Blog Posts




The Bandwagon (after Claude Shannon, 1956)

This is Claude Shannon's 1956 essay “The Bandwagon”, modified to be about machine learning. I saw it shared by Cheng Soon Ong.

“Machine Learning has, in the last few years, become something of a scientific bandwagon. Starting as a technical tool for the computer scientist, it has received an extraordinary amount of publicity in the popular as well as the scientific press. In part, this has been due to connections with such fashionable fields as computing machines, cybernetics, and automation; and in part, to the novelty of the subject matter. As a consequence, it has perhaps been ballooned to an importance beyond its actual accomplishments. Our fellow scientists in many different fields, attracted by the fanfare and by the new avenues opened to scientific analysis, are using these ideas in their own problems. Applications are being made to biology, psychology, linguistics, fundamental physics, economics, the theory of organisation, and many others. In short, machine learning is currently partaking of a somewhat heady draught of general popularity.

Although this wave of popularity is certainly pleasant and exciting for those of us working in the field, it carries at the same time an element of danger. While we feel that machine learning is indeed a valuable tool in providing fundamental insights into the nature of computing problems and will continue to grow in importance, it is certainly no panacea for the computer scientist or, a fortiori, for anyone else. Seldom do more than a few of nature's secrets give way at one time. It will be all too easy for our somewhat artificial prosperity to collapse overnight when it is realised that the use of a few exciting words like deep learning, artificial intelligence, and data science does not solve all our problems.

What can be done to inject a note of moderation in this situation? In the first place, workers in other fields should realise that the basic results of the subject are aimed in a very specific direction, a direction that is not necessarily relevant to such fields as psychology, economics, and other social sciences. Indeed, the hard core of machine learning is, essentially, a branch of mathematics and statistics, a strictly deductive system. A thorough understanding of the mathematical foundation and its computing application is surely a prerequisite to other applications. I personally believe that many of the concepts of machine learning will prove useful in these other fields — and, indeed, some results are already quite promising — but the establishing of such applications is not a trivial matter of translating words to a new domain, but rather the slow tedious process of hypothesis and experimental verification. If, for example, the human being acts in some situations like an ideal predictor, this is an experimental and not a mathematical fact, and as such must be tested under a wide variety of experimental situations.

Secondly, we must keep our own house in first class order. The subject of machine learning has certainly been sold, if not oversold. We should now turn our attention to the business of research and development at the highest scientific plane we can maintain. Research rather than exposition is the keynote, and our critical thresholds should be raised. Authors should submit only their best efforts, and these only after careful criticism by themselves and their colleagues. A few first rate research papers are preferable to a large number that are poorly conceived or half-finished. The latter are no credit to their writers and a waste of time to their readers. Only by maintaining a thoroughly scientific attitude can we achieve real progress in machine learning and consolidate our present position.”

Shannon’s original can be found here.


Member’s Question

Some Tips on Reading “Deep Learning” by Goodfellow et al.

Q: How do you read the book “Deep Learning” by Ian Goodfellow?

A: It depends on which chapters you are in. The first two parts are better used as supplementary material to lectures/courses. For example, if you are reading “Deep Learning” while watching all the videos from Karpathy's and Socher's classes, you will learn much more than other students. We think the best lecture series to go with it is Hinton's “Neural Networks for Machine Learning”.

Part 1 tries to power you through the necessary math. If you have never taken at least one class in machine learning, that material is woefully inadequate on its own. Consider studying matrix algebra, and more importantly matrix differentiation, first. (Abadir's “Matrix Algebra” is perhaps the most relevant text.) Then you will make it through the math more easily. That said, Chapter 4's example on PCA is quite cute, so read it if you are comfortable with the math.
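As a companion to that PCA example, here is a minimal NumPy sketch of PCA via the SVD of centered data; it is our own illustration, not code from the book:

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal directions and the projected data."""
    X_centered = X - X.mean(axis=0)              # center each feature
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                          # top-k principal directions
    projected = X_centered @ components.T        # low-dimensional representation
    return components, projected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # toy data
components, Z = pca(X, k=2)
print(components.shape, Z.shape)                 # (2, 5) (200, 2)
```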

Part 3 is tough, and for the most part it is reading for researchers in unsupervised learning, which many people believe is the holy grail of the field. You will need to be comfortable with energy-based models. For that, we suggest you go through Lectures 11 to 15 of Hinton's class first. If you don't care for unsupervised learning, you could skip Part 3 for now. Reading Part 3 is mostly about knowing what other people are talking about in unsupervised learning.

While deep learning is a hot field, make sure you don't abandon other ideas in machine learning. For example, we find reinforcement learning and genetic algorithms very useful (and fun). Learning theory is deep and can explain certain things we experience in machine learning. In our opinion, those topics are at least as interesting as Part 3 of “Deep Learning”. (Thanks to Richard Green at AIDL for his opinion.)


AIDL Weekly Issue #6 – Ng’s Departure, Mask-RCNN and Intel’s AIPG

Thoughts From Your Humble Curators

The biggest AI/DL news last week was definitely Andrew Ng's departure from Baidu, so naturally it is the top item this issue.

Other than Ng's departure, last week was filled with news on interesting research and source code:

  • OpenAI's research on multiple agents, which led to the emergence of a simple language,
  • FAIR's Kaiming He proposed Mask-RCNN, which shatters previous records,
  • Distill, a new online journal for deep learning,
  • Google’s Syntaxnet upgrade,
  • Google’s new skip-thought model.

Enjoy!

Book Review

“Deep Learning” by Ian Goodfellow et al.

I (Arthur) have had some leisure lately to browse “Deep Learning” by Goodfellow for the first time. Since it is known as the bible of deep learning, I decided to write a short afterthought post; the notes are in point form and not too structured.

  • If you want to learn the zen of deep learning, “Deep Learning” is the book. In a nutshell, “Deep Learning” is an introductory-style textbook covering nearly every contemporary field in deep learning. It has a thorough chapter on backprop, and perhaps the best introductory material on SGD, computational graphs and convnets (a minimal SGD sketch follows this list). So the book is very suitable for those who want to further their knowledge after going through 4-5 introductory DL classes.
  • Chapter 2 is supposed to go through the basic math, but it's unlikely to cover everything the book requires. PRML Chapter 6 seems to be a good preliminary before you start reading the book. If you don't feel comfortable with matrix calculus, perhaps you want to read “Matrix Algebra” by Abadir as well.
  • There are three parts to the book. Part 1 is all about the basics: math, basic ML, backprop, SGD and such. Part 2 is about how DL is used in real-life applications, and Part 3 is about research topics such as E.M. and graphical models in deep learning, and generative models. All three parts deserve your time. The math and general ML in Part 1 may be better replaced by a more technical text such as PRML, but the rest of the material is deeper than the popular DL classes. You will also find relevant citations easily.
  • I enjoyed Parts 1 and 2 a lot, mostly because they are deeper and filled with interesting details. What about Part 3? While I don't quite grok all the math, Part 3 is strangely inspiring. For example, there is a comparison of graphical models and NNs, and a discussion of how E.M. is used in latent-variable models. Of course, there is an extensive survey of generative models, covering difficult models such as the deep Boltzmann machine, the spike-and-slab RBM and many variations. Reading Part 3 makes me want to learn classical machine learning techniques, such as mixture models and graphical models, better.
  • So I would say you will enjoy Part 3 if you are 1) a DL researcher in unsupervised learning and generative models, 2) someone who wants to squeeze out the last bit of performance through pre-training, or 3) someone who wants to compare other methods, such as mixture models or graphical models, with NNs.
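To make the remark about SGD in the first bullet concrete, here is a minimal NumPy sketch of stochastic gradient descent on least-squares linear regression; it is our own illustration, not code from the book:

```python
import numpy as np

# Minimal SGD on least-squares linear regression (toy data).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
lr = 0.05
for epoch in range(20):
    for i in rng.permutation(len(X)):        # visit samples in random order
        grad = (X[i] @ w - y[i]) * X[i]      # gradient of 0.5 * (x·w - y)^2
        w -= lr * grad                       # one stochastic update
print(w)                                     # close to [2.0, -1.0, 0.5]
```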

Anyway, that's what I have for now. Maybe I will summarize these in a blog post later on, but enjoy the random thoughts for now.

Original version from my (Arthur’s) blog post.


AIDL Weekly Issue #5 – Special Issue on Self-Driving Cars

Editorial

Intel/Mobileye's big deal, more Waymo/Uber drama, etc. – yet another big week for self-driving cars! It's not hyperbole to say that self-driving cars represent one of the largest market-size applications for A.I. The jockeying for position has been happening for a while and won't abate anytime soon. Intel largely missed the boat on mobile and is determined not to miss it on A.I. and autonomous vehicles. There's a subsystem race going on in the hardware and software space to solve all the myriad problems.

At the highest level, a successful architecture would need to at least understand:

  • Where am I (car) and where am I going? Need maps, GPS, odometry data.
  • What’s around me based on my sensors? Need car sensors – LIDAR, camera, ultrasound, audio, infrared, etc. Need low-level intelligence / classifiers on each of those signals to identify and make sense of road signs, humans, pets, random objects on the street
  • What’s around me based on external telemetry data? Need other car-related positioning and odometry data, weather data, traffic pattern data
  • How do I make sense of what’s around me, what other objects are doing and whether I’m doing the right actions? A brain that takes internal sensor data and external telemetry data, makes sense of them and outputs an action. This is an oversimplification and is inherently a really tough challenge. There are so many corner and non-corner cases to account for. No company wants to own the first self-driving car that kills a pedestrian. How does the algorithm weigh navigation decisions in an unavoidable accident scenario where you could hit one group of pedestrians or another?
  • How do I train the car to be smarter over time? Need a phone-home feature to a remote human operator if the car can't decide what to do, generating training data.
  • Etc.

This isn't meant to be exhaustive, but as you can see, the moment we start thinking about all the things a human driver does in navigation and in response to other moving blobs on the street, it becomes incredibly hard to create a machine replica of a driver. We suspect there will be multiple waves of innovation here over time, along the dimensions of better sensors, more types of telemetry data, a better cost curve, and a better brain.
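As a purely hypothetical sketch of the layering above – every class and function name here is our own invention, not any vendor's actual API – the sense-fuse-plan-act loop might be organized like this:

```python
from dataclasses import dataclass, field

# Hypothetical skeleton of the sense -> fuse -> plan -> act loop described above.

@dataclass
class WorldModel:
    pose: tuple = (0.0, 0.0)                          # where am I? (maps, GPS, odometry)
    obstacles: list = field(default_factory=list)     # what's around me? (sensor classifiers)
    telemetry: dict = field(default_factory=dict)     # external data: weather, traffic, other cars

def perceive(sensor_frames):
    """Run per-sensor classifiers and fuse them into one world model (stubbed)."""
    return WorldModel()

def plan(world):
    """Pick an action; defer to a remote operator when the scene is ambiguous (stubbed)."""
    return "continue" if not world.obstacles else "phone_home"

def control_loop(sensor_stream):
    for frames in sensor_stream:
        yield plan(perceive(frames))

print(list(control_loop([{"lidar": None, "camera": None}])))  # ['continue']
```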


AIDL Weekly Issue 4: K for Kaggle, Jetson TX2 and DeepStack

Thoughts From Your Humble Curators

Three big pieces of news last week:

  1. Google acquired Kaggle
  2. Jetson TX2 was out,
  3. Just like its rival Libratus, DeepStack made headlines for beating human poker pros.

In this Editorial though, what we want to bring to your attention is this little paper titled “Stopping GAN Violence: Generative Unadversarial Networks”. After a minute of reading, you would quickly notice that it is a parody paper. But to our dismay, some newsletters treated the paper as a serious one. It's obvious that those “editors” hadn't really read the original paper.

It is another proof point that the current deep learning space is over-hyped. (Something similar happened with Rocket AI.) You can get a chuckle out of it, but if overdone, the hype could also over-correct when expectations aren't met.

Perhaps more importantly, as a community we should make a more conscious effort to fact-check and research a source before we share it. We at AIDL Weekly follow this philosophy religiously, and all sources we include are carefully checked – that's why our newsletter stands out in the crowd of AI/ML/DL newsletters.

If you like what we are doing, check out our FB group and our YouTube channel.

And of course, please share this newsletter with friends so they can subscribe to this newsletter.


Member’s Question

Question from an AIDL Member

Q. (Rephrased from a question asked by Flávio Schuindt) I've been studying classification problems with deep learning and now I understand them quite well: activation functions, regularizers, cost functions, etc. Now, I think it's time to step forward. What I am really trying to do now is enter the world of deep learning image segmentation. It's a more complicated problem than classification (object occlusion, lighting variations, etc.). My first question is: how can I approach this kind of problem? […]

A. You have hit one of the toughest (but hottest) problems in deep-learning-based image processing. Many people confuse problems such as image detection/segmentation with image classification. Here are some useful notes.

  1. First of all, have you watched Lectures 8 and 13 of Karpathy's 2016 cs231n? Those lectures should be your starting point for working on segmentation. Notice that image localization, detection and segmentation are three different things. Localization and detection find bounding boxes, and their techniques/concepts can be helpful for “instance segmentation”. “Semantic segmentation” requires a downsampling/upsampling architecture (see point 2 below, and the sketch after this list).
  2. Is your problem more of a “semantic segmentation” problem or an “instance segmentation” problem? (See cs231n's Lecture 13.) The former comes up with regions of different meanings; the latter comes up with individual instances.
  3. Are you identifying something which always appears? If that's the case, you don't have to use a full-fledged detection technique: treat it as a localization problem, which you can solve by backprop with a simple loss function (as described in cs231n Lecture 8). If the object might or might not appear, then a detection-type pipeline might be necessary.
  4. If you do need a detection-type pipeline, do standard segment-proposal techniques work for your domain? This is crucial, because at least at the beginning of your segmentation research, you will have to find segment proposals.
  5. Lastly, if you decide this is really a semantic segmentation problem, then most likely your major task is to adapt an existing pre-trained network; very likely your goal is transfer learning. Of course, check my point 2 and see if this is really the case.
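As a minimal sketch of the downsampling/upsampling idea in point 1 – our own illustration in PyTorch with made-up layer sizes, not a production segmentation network:

```python
import torch
import torch.nn as nn

# Tiny downsample/upsample ("encoder-decoder") semantic segmentation sketch.
# Real systems usually start from a pre-trained backbone instead.
class TinySegNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(                  # downsampling path
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                  # upsampling path back to input size
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, num_classes, 2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))           # per-pixel class scores

x = torch.randn(1, 3, 64, 64)                          # dummy RGB image
print(TinySegNet()(x).shape)                           # torch.Size([1, 2, 64, 64])
```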
