
AIDL Weekly Issue 4: K for Kaggle, Jetson TX2 and DeepStack

Thoughts From Your Humble Curators

Three big news items from last week:

  1. Google acquired Kaggle.
  2. The Jetson TX2 was released.
  3. Just like its rival Libratus, DeepStack made headlines for beating human poker pros.

In this Editorial, though, we want to bring to your attention a little paper titled "Stopping GAN Violence: Generative Unadversarial Networks". After one minute of reading, you would quickly notice that it is a fake paper. But to our dismay, there are newsletters that simply treated the paper as a serious one. It's obvious that their "editors" hadn't really read the original paper.

It is another proof point that the current deep learning space is over-hyped (something similar happened with Rocket AI). You can get a chuckle out of it, but if it is over-done, the hype could also over-correct when expectations aren't met.

Perhaps more importantly, as a community we should spend more conscious effort on fact-checking and researching a source before we share it. We at AIDL Weekly follow this philosophy religiously, and all sources we include are carefully checked - that's why our newsletter stands out in the crowd of AI/ML/DL newsletters.

If you like what we are doing, check out our FB group and our YouTube channel.

And of course, please share this newsletter with friends so they can subscribe too.





Member's Question

Question from an AIDL Member

Q. (Rephrased from a question asked by Flávio Schuindt) I've been studying classification problems with deep learning, and now I can understand them quite well: activation functions, regularizers, cost functions, etc. Now I think it's time to step forward. What I am really trying to do now is enter the deep learning image segmentation world, which is a more complicated problem than classification (object occlusion, lighting variations, etc.). My first question is: how can I approach this kind of problem? [...]

A. You have hit on one of the toughest (but hottest) problems in deep-learning-based image processing. Many people confuse problems such as image detection/segmentation with image classification. Here are some useful notes.

  1. First of all, have you watched lectures 8 and 13 of Karpathy's 2016 cs231n? Those lectures should be your starting point for working on segmentation. Notice that image localization, detection and segmentation are 3 different things. Localization and detection find bounding boxes, and their techniques/concepts can be helpful for "instance segmentation". "Semantic segmentation" requires a downsampling/upsampling architecture (see point 2 below, and the sketch after this list).
  2. Is your problem more a "semantic segmentation" problem or an "instance segmentation" problem? (See cs231n's lecture 13.) The former comes up with regions of different meaning, the latter comes up with instances.
  3. Are you identifying something which always appears? If so, you don't have to use a full detection technique: treat it as a localization problem, which you can solve by backpropagating a simple loss function (as described in cs231n lecture 8). If the object might or might not appear, then a detection-type pipeline might be necessary.
  4. If you do need a detection-type pipeline, do standard segment-proposal techniques work for your domain? This is crucial, because at least at the beginning of your segmentation research, you will have to find segment proposals.
  5. Lastly, if you decide this is really a semantic segmentation problem, then most likely your major task is to adapt an existing pre-trained network, and very likely your approach will be transfer learning. Of course, check point 2 again and make sure this is really the case.
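
To make point 1 concrete, here is a minimal sketch of the downsampling/upsampling idea behind semantic segmentation. It is a toy PyTorch model of our own, not any published architecture; the layer sizes and the number of classes are made-up assumptions for illustration.

```python
# Toy semantic segmentation net: downsample with strided convolutions,
# then upsample back to full resolution with transposed convolutions.
# All sizes are illustrative -- tune for a real dataset.
import torch
import torch.nn as nn

class ToySegNet(nn.Module):
    def __init__(self, num_classes=21):  # e.g. 21 classes, as in PASCAL VOC
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1),   # H/2 x W/2
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),  # H/4 x W/4
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # H/2 x W/2
            nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),  # H x W
        )

    def forward(self, x):
        # Output: one score per class per pixel.
        return self.decoder(self.encoder(x))

net = ToySegNet()
images = torch.randn(2, 3, 128, 128)           # fake image batch
labels = torch.randint(0, 21, (2, 128, 128))   # fake per-pixel labels
logits = net(images)                           # shape (2, 21, 128, 128)
loss = nn.CrossEntropyLoss()(logits, labels)   # per-pixel classification loss
loss.backward()
```

Note the loss: semantic segmentation is essentially classification repeated at every pixel, which is why the classification background you already have carries over directly.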


AIDL Weekly Issue 3


Editorial

Thoughts From Your Humble Curators

What a week in Machine Learning! Last week we saw Waymo's high-profile lawsuit against Uber, as well as what is perhaps the first API against online trolling, from Jigsaw. Both events got a lot of media coverage, and both are featured in our News section with our analysis.

In exciting news: the GTX 1080 Ti arrived yesterday, and it is featured in this issue. Its spec is more impressive than the $1.2k Titan X's, and it costs only $699.

In other news, you might have heard of DeepCoder in the last few weeks, and how it purportedly steals and integrates code from other repos. Well, that claim is false. We feature a piece from Stephen Merity which debunks the hyped coverage.

Perhaps the must-see this week is Kleiner Perkins' Mike Abbott's interview with Prof. Fei-Fei Li from Stanford. The discussion on how A.I. startups can compete with the larger incumbents is definitely worth watching.

As always, check out our FB group and our YouTube channel.

And of course, please share this newsletter with friends so they can subscribe too.




AIDL Weekly Issue 2 - Gamalon/Batch Renormalization/TF 1.0/Oxford Deep NLP


Editorial

Thoughts From Your Humble Curators

How do you create a good A.I. newsletter? What first comes to everybody's mind is to simply aggregate a lot of links. This is very common with deep learning resource lists, say a "Cool List of XX in Deep Learning". Our experience is that you usually have to sift through 100-200 such links and decide which are useful.

We believe there is a better way: in AIDL Weekly, we choose only important news and always provide detailed analysis of each item. For example, in this issue we take a look at the newsworthy Gamalon, which is said to use a ground-breaking method that outperforms deep learning and which recently won a defense contract. What is the basis of its technology? We cover this in a deep dive in the "News" section.

Or you can take a look at the exciting development of batch renormalization, which tackles the current shortcomings of batch normalization. Anyone who uses normalization in training will likely benefit from the paper.

Last week, we also saw the official release of TensorFlow 1.0, as well as the 2017 official TensorFlow Summit. We prepared two good links so that you can follow along. If you love deep learning with NLP, you might also want to check out the new course from Oxford.

As always, check out our FB group and our YouTube channel, and of course subscribe to this newsletter.




Member's Question

Question from an AIDL Member

Q: (Rephrased) I am trying to learn the following languages (...) to an intermediate level, and the following languages (...) to a professional level. Would this be helpful for my career in Data Science/Machine Learning? I have a mind to work on deep learning.

This is a variation of a frequently asked question; in a nutshell: "How much programming should I learn if I want to work on deep learning?" The question itself shows misconceptions about programming and machine learning, so we include it in this issue. This is my (Arthur's) take:

  1. First things first: usually you decide which package to work on first; if the package uses language X, then you go learn language X. E.g. if I wanted to hack the Linux kernel, I would need to know C, learn the Linux system calls, and perhaps some assembly language. Learning a programming language is a means to achieve a goal. Echoing J.T. Bowlin's point, a programming language is like a natural language: you can always learn more, but past a certain point it becomes unnecessary.
  2. Then you ask what language should be used to work on deep learning. I would say mathematics, because once you understand the Greek symbols, you can translate all of them into code (approximately; see the sketch after this list). So if you ask me what you need to learn to hack TensorFlow, "mathematics" would be the first answer. Yes, the package is written in Python/C++/C, but those wouldn't even be close in my top-5 answers, because if you don't know what backprop is, knowing how a C++ destructor works can't make you an expert on TF.
  3. The final thing: you mentioned the term "level". What does this "level" mean? Is it like a chess or go rating, where someone with a higher rating will have a better career in deep learning? That might work for competitive programming... but real-life programming doesn't work that way. Real-life programming means you can read and write complex programs. E.g. in C++ you use a class, instead of repeating a function implementation many times, to reduce duplication; the same goes for templates. That's why classes and templates are important concepts whose usage people debate a lot. How can you assign "levels" to such skills?

Lastly, I would say that if you seriously want to focus on one language, consider Python, but also learn a new programming language yearly. And pick up some side-projects; both your job and your side-projects will usually give you ideas about which language you should learn next.
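
To illustrate point 2, here is a small sketch (our own, for illustration only) of how the math translates almost symbol-for-symbol into code. Take linear regression, with loss L(w) = (1/N)||Xw - y||^2 and gradient grad(w) = (2/N) X^T (Xw - y):

```python
# Gradient descent for linear regression: the two formulas above
# map directly onto two lines of NumPy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # N=100 samples, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)  # noisy targets

w = np.zeros(3)
lr = 0.1
for _ in range(200):
    residual = X @ w - y                      # (Xw - y)
    grad = (2.0 / len(X)) * X.T @ residual    # (2/N) X^T (Xw - y)
    w -= lr * grad                            # gradient descent step

print(w)  # ends up close to true_w
```

If you can read the two formulas, the code holds no surprises - which is exactly why mathematics, not any particular language, is the real prerequisite.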



AIDL Weekly Issue 1 - First AIDL Weekly


Editorial

Thoughts From Your Humble Curators

When Waikit Lau and I (Arthur Chan) started the Facebook group Artificial Intelligence and Deep Learning (AIDL) last April, we had no idea it would become a group with 9,000+ members, and still growing fast. (We added 1,000 members in the last 7 days alone.)

We suspect this is just the beginning of the long, curvy road of a new layer of intelligence that can be applied everywhere. The question is: how do we start? That was the first thing we realized back in late 2015: facing literally tens of thousands of links, tutorials, etc., it was like drinking from a firehose, and we had a hard time picking out the gems.

We decided to start our little AIDL group to see if we could get a community to help make sense of the velocity of information. In less than one year, AIDL became the most active A.I. and deep learning group on Facebook, with conversations flourishing in the group. We strive to have discussions one level deeper than others', and I think we have done a good job so far. For example, forum members, including us, have fact-checked several pieces of news related to deep learning. This gives us an edge in the rapidly changing field of A.I.

This newsletter follows exactly the same philosophy as our forum: we hope to summarize, analyze, educate and disseminate. We will keep an eye on the latest and most salient developments and present them in a coherent fashion in your mailbox.

We sincerely hope that AIDL Weekly will be helpful to your career or studies. Please share our newsletter here with your friends. Also check out our YouTube channel here.

Thanks,

Your Humble Curators, Arthur and Waikit



Resources on Speech Recognition

Unlike other deep learning topics, there are no ready-made video courses available on speech recognition. So here is a list of other resources that you may find useful.

Books:

If you want to learn from online resources:

Useful E2E Speech Recognition Lecture

Important papers:

  • "Deep Neural Networks for Acoustic Modeling in Speech Recognition" by G. Hinton et al.
  • Supervised Sequence Labelling with Recurrent Neural Networks by Alex Graves

Resources on Understanding Heaps

Some assorted links for understanding heaps in user-land:

Resources on CUDA programming

Here is a list of resources for CUDA programming, in particular, in C.

Basic

Perhaps the best beginner's guide is the series written by Mark Harris, which currently stands at 10 articles. They start from simple HelloWorld-type examples but go deeper and deeper into important topics such as data-transfer optimization and shared memory. The final 3 articles focus on optimizing real-life applications such as matrix transpose and the finite-difference method. (For a first taste of the pattern the series teaches, see the sketch after the list.)

  1. An Easy Introduction to CUDA C and C++
  2. How to Implement Performance Metrics in CUDA C/C++
  3. How to Query Device Properties and Handle Errors in CUDA C/C++
  4. How to Optimize Data Transfers in CUDA C/C++
  5. How to Overlap Data Transfers in CUDA C/C++
  6. An Even Easier Introduction to CUDA
  7. Unified Memory for CUDA Beginners
  8. An Efficient Matrix Transpose in CUDA C/C++
  9. Finite Difference Methods in CUDA C/C++, Part 1
  10. Finite Difference Methods in CUDA C/C++, Part 2
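
As a first taste of the pattern those articles teach (launching a grid of threads over an array), here is a minimal SAXPY sketch. It uses Python's Numba CUDA bindings rather than the CUDA C of the articles, purely for brevity; it assumes a CUDA-capable GPU and the numba package, and is our own sketch, not taken from the series.

```python
# SAXPY (out = a*x + y) as a CUDA kernel via Numba.
# Requires a CUDA-capable GPU and `pip install numba`.
import numpy as np
from numba import cuda

@cuda.jit
def saxpy(a, x, y, out):
    i = cuda.grid(1)          # absolute thread index in the 1-D grid
    if i < out.size:          # guard against the ragged last block
        out[i] = a * x[i] + y[i]

n = 1 << 20
x = np.ones(n, dtype=np.float32)
y = 2.0 * np.ones(n, dtype=np.float32)
out = np.zeros_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block  # ceil division
saxpy[blocks, threads_per_block](2.0, x, y, out)           # launch the grid

print(out[:4])  # [4. 4. 4. 4.]
```

The grid/block arithmetic and the bounds check are exactly the ideas the early articles in Harris's series walk through in CUDA C.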

Intermediate

A very important document on the internals of Nvidia chips, as well as the CUDA programming model, is the CUDA C Programming Guide.

In version 9, the document has around 90 pages of content, with the remaining ~210 pages being appendices. I found it very helpful to read through the content and look up the appendices from time to time.

The next useful document is the CUDA Best Practices Guide. You will find a lot of performance-tuning tips in it.

If you want to profile a CUDA application, you should use nvprof and the Visual Profiler; you can find their manuals here. Two other very good links to read are here and this one by Mark Harris.

If you want to read a very good textbook, consider "Professional CUDA C Programming", which I think is the best book on the topic. You will learn what the authors call "profile-based programming", which is perhaps the best way to proceed in CUDA programming.

Others

PTX ISA

Inline PTX Assembly

cuBLAS: indispensable for linear algebra. The original Nvidia documentation is good, but you may also find this little gem on "cuBLAS by example" useful.

Resources on ResNet

github: https://github.com/KaimingHe/deep-residual-networks

youtube video: https://www.youtube.com/watch?v=C6tLw-rPQ2o

slide: https://pdfs.semanticscholar.org/presentation/276e/8e23f8232b55193b4c1917150e77549a4675.pdf

Quite related:

  •  Convolutional Neural Networks at Constrained Time Cost (https://arxiv.org/pdf/1412.1710.pdf) Interesting predecessor of the paper.
  • Highway networks: (https://arxiv.org/pdf/1505.00387.pdf)

Unprocessed but Good:

  • multigrid tutorial (https://www.math.ust.hk/~mawang/teaching/math532/mgtut.pdf)
  • https://blog.waya.ai/deep-residual-learning-9610bb62c355 (Talk about Resnet, Wide Resnet and ResXnet)
  • Wide Residual Networks (https://arxiv.org/pdf/1605.07146.pdf)
  • Aggregated Residual Transformations for Deep Neural Networks (https://arxiv.org/pdf/1611.05431.pdf)
  • https://www.kdnuggets.com/2016/09/deep-learning-reading-group-deep-residual-learning-image-recognition.html
  • Deep Networks with Stochastic Depth https://arxiv.org/abs/1603.09382
  • http://www.deeplearningpatterns.com/doku.php?id=residual
  • Ablation study: http://torch.ch/blog/2016/02/04/resnets.html
  • It's implemented in TF: https://www.quora.com/Have-the-ideas-of-Deep-Residual-Learning-for-Image-Recognition-be-implemented-in-TensorFlow
  • Wider or Deeper: Revisiting the ResNet Model for Visual Recognition: https://arxiv.org/abs/1611.10080
  • Deep Residual Learning and PDEs on Manifold: http://ymsc.tsinghua.edu.cn/~shizqi/papers/ResNet_PDE.pdf
  • Is it really because of ensemble? https://ai.stackexchange.com/questions/1997/resnets-ensemble-or-depth-makes-residual-networks-strong
  • https://vision.cornell.edu/se3/wp-content/uploads/2017/04/ResNet_Ensemble_NIPS.pdf
  • Multi-level Residual Networks from Dynamical Systems View (https://openreview.net/pdf?id=SyJS-OgR-)
  • Exploring Normalization in Deep Residual Networks with Concatenated Rectified Linear Units (https://research.fb.com/wp-content/uploads/2017/01/paper_expl_norm_on_deep_res_networks.pdf?)
  • TinyImageNet (http://cs231n.stanford.edu/reports/2016/pdfs/411_Report.pdf)
  • Predict Cortical Representation (https://www.nature.com/articles/s41598-018-22160-9)

Another summary:

https://www.commonlounge.com/discussion/839d11b9a67d464796e5ba0309611e9b
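
As a quick refresher before diving into the links: the core trick of ResNet, an identity shortcut around a couple of convolution layers, fits in a few lines. Below is a minimal sketch of a basic residual block in PyTorch; it follows the conv-BN-ReLU structure of the basic block in the original paper, but the channel and spatial sizes here are illustrative.

```python
# Minimal basic residual block: output = relu(F(x) + x),
# where F is conv-bn-relu-conv-bn. Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # the identity shortcut ("residual") connection

x = torch.randn(1, 64, 32, 32)
print(BasicBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```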

A read on "ImageNet Training in Minutes"

Yes, you read it right: ImageNet training in 24 minutes. In particular, an AlexNet structure in 24 minutes and ResNet-50 in 60 minutes. For AlexNet, in fact, You's work breaks Facebook's previous record of 1 hour for AlexNet training. (Last time I checked, my slightly-optimized training with one single GPU took ~7 days.) Of course, I'm curious how these ideas work, so this post is a summary.

* For the most part, this is not GPU work. This is mostly a CPU platform, accelerated by Intel Knights Landing (KNL) chips. Such chips are very well suited to HPC platforms, and there are a couple of supercomputers in the world built with 2,000 to 10,000 such CPUs.

* The gist of why KNL is good: it divides the on-chip processors and the memory up well. So unlike many clusters you might encounter with 8 to 16 processors, the memory bandwidth is much wider, and memory bandwidth is usually a huge bottleneck in training speed.

* Another important line of thought here is "Can you load in more data per batch?", because a bigger batch allows the calculation to be parallelized much more easily. Previous work by You, the first author, already allows the ImageNet batch size to grow from the standard 256-512 to something like 8192. This thought has been around for a while, perhaps since Alex Krizhevsky. You's earlier idea is based on adaptively calculating the learning rate per layer, or Layer-wise Adaptive Rate Scaling (LARS); a sketch of the idea follows.
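
For the curious, here is a rough NumPy sketch of the LARS idea as we read it: each layer's step is scaled by the ratio of its weight norm to its gradient norm, so the effective learning rate adapts per layer. The coefficient values and the weight-decay handling below are illustrative guesses, not the paper's tuned settings.

```python
# Sketch of Layer-wise Adaptive Rate Scaling (LARS): each layer gets a
# local learning rate ~ ||w|| / (||grad|| + beta * ||w||).
import numpy as np

def lars_step(weights, grads, global_lr=1.0, trust=0.001, beta=0.0005):
    """One SGD step with a per-layer adaptive rate.
    weights, grads: lists of ndarrays, one entry per layer."""
    for w, g in zip(weights, grads):
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(g)
        if w_norm > 0 and g_norm > 0:
            local_lr = trust * w_norm / (g_norm + beta * w_norm)
        else:
            local_lr = 1.0  # fallback for freshly initialized / zero layers
        w -= global_lr * local_lr * (g + beta * w)  # gradient + weight decay

# Toy usage: two "layers" with very different scales still take
# comparably sized *relative* steps.
weights = [np.ones((4, 4)), 100.0 * np.ones((4, 4))]
grads = [0.1 * np.ones((4, 4)), 50.0 * np.ones((4, 4))]
lars_step(weights, grads)
```

The point of the layer-wise scaling is stability: with a huge batch (and hence a huge learning rate), a single global rate that is fine for one layer can blow up another.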

* You then combined LARS with another insight from FB researchers, a slow warm-up of the learning rate, and that results in the current work, which is literally 60% faster than the previous record.

Given what we know, it's conceivable that the training can get even faster in the future. What has been blocking people seems to be 1) the number of CPUs within a system and 2) how large a batch size can be loaded. And I bet that after FB reads You's paper, there will be another round of improvements as well. How about that? Don't you love competition in deep learning?

A Read on "The Consciousness Prior" By Prof. Yoshua Bengio

Here are some notes after reading Prof. Yoshua Bengio's "The Consciousness Prior". I know many of you, like Stuart Gray, were quite unhappy that there are no experimental results. Yet this is an interesting paper and good food for thought for all of us:

* The consciousness mentioned in the paper is much less about what we would think of as qualia, and more about access to different representations.

* The terminology is not too difficult to understand: suppose there is a representation of the brain at the current time, h_t; a representation RNN, F, is used to model that representation.

* The protagonist here, though, is the consciousness RNN, C, which is used to model a consciousness state. What is a *consciousness state* then? It is actually a low-dimensional vector derived from the representation h_t.

* Now, one thing to notice is that Bengio believes the consciousness RNN, C, should by itself include some kind of attention mechanism - that is, attention of the kind used in NMT these days should be involved. In a nutshell, C should "pay attention" to only the important details within the representation when it updates the consciousness vector. (A toy sketch follows.)
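
The paper stays at this level of description, so the following is only our toy NumPy sketch of that sentence, not Bengio's implementation: a soft-attention weighting over the elements of h_t, from which a low-dimensional consciousness vector c_t is formed. All names and sizes here are made up.

```python
# Toy sketch: derive a low-dimensional "consciousness" vector c_t by
# soft attention over the full representation h_t. Sizes are made up;
# the paper does not specify an implementation.
import numpy as np

rng = np.random.default_rng(0)
dim_h, dim_c = 64, 8                      # full vs. conscious dimensions

h_t = rng.normal(size=dim_h)              # state of the representation RNN F
W_attn = rng.normal(size=dim_h)           # attention scoring weights (learned)
W_proj = rng.normal(size=(dim_c, dim_h))  # projection down to dim_c

scores = W_attn * h_t                          # a relevance score per element
attn = np.exp(scores) / np.exp(scores).sum()   # softmax attention weights
attended = attn * h_t                          # keep mostly "important" parts

c_t = W_proj @ attended                   # low-dimensional consciousness state
print(c_t.shape)                          # (8,)
```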

* I think the idea so far is already fairly interesting. In fact, it suggests one interesting thought: what if we just initialize the consciousness vector randomly instead? In that case, a new representation of the brain appears. As a result, this mechanism mimics how human brains explore different scenarios conjured up with imagination.

* Bengio's proposal also encompasses a training method based on what he calls the verifier network, V. The goal of this network is to match the current representation h_t with a previous consciousness state c_{t-k}. The training, as he envisions it, can be done with a variational autoencoder (VAE) or a GAN.

* So far the idea doesn't quite echo the human way of thinking: humans seem to create high-level concepts, like symbols, to simplify our thinking. Bengio addresses this difficulty by suggesting we use yet another network to generate what we mean from the consciousness state. He calls it U; perhaps we can call it the generation network. This network could well be implemented with a memory-augmented-network style of architecture that distinguishes key/value pairs. In this way, we can map the consciousness state to more concrete symbols that symbolic logic or knowledge-representation frameworks can use... or that we humans can also understand.

* This all sounds good, but as you may have heard from many readers of the paper, there are no experimental results, so this is really a theoretical paper.

* To be fair, though, the good professor has outlined how each of the above 4 networks could actually be implemented. He has also mentioned how the idea could be tested in practice; e.g. he believes one good arena is reinforcement learning tasks.

All in all, this is an interesting paper. It's a pity that the details are scanty at this point, but it's still quite worth your time to read.