For years, if you wanted to build a DIY deep learning machine, you read Tim Dettmers' post "Which GPU(s) to Get for Deep Learning," which gives a final verdict on which card to buy.
So with the GTX 1080 Ti and Titan Xp just released last month, should you toss in $600 more to buy a Titan Xp? Let's look at a few categories from Dettmers' "tl;dr" recommendations:
- Best GPU overall (by a small margin): Titan Xp
- I have little money: GTX 1060 (6GB)
- I have almost no money: GTX 1050 Ti (4GB)
- I am a competitive computer vision researcher: NVIDIA Titan Xp; do not upgrade from existing Titan X (Pascal or Maxwell)
- I am a researcher: GTX 1080 Ti
We like Dettmers' suggestions. The GTX 1080 Ti is a fairly unusual card: it is the first GTX card with more than 8 GB of RAM, so in a way it makes it hard to justify buying the Titan series. But if you do want to run the most competitive computer vision experiments, a Titan X (Pascal) or better is still necessary.
Another note here: the new Titan Xp isn't as impressive as one might hope. Dettmers' guide is quite clear: don't replace your Titan X (Pascal) with the Xp yet. We also think that's solid advice from a cost-efficiency point of view.
In any case, Dettmers' guide will teach you how to build your dream DL machine; check out his post regardless of your goal.
What is federated learning? In essence, it is a distributed approach to model training: each device trains a model locally, then the devices upload their updates and the server averages them into a single model.
From a deep learning point of view, such training requires parallelizing SGD across many devices, which is quite hard. The merit of the Google Research paper is the observation that you can simply use a large batch size on each device. That way you avoid small-step SGD and reduce communication, and bandwidth is precious in a federated learning scenario. (There are also techniques to deal with the non-IID-ness of the data, but a key insight the researchers found is that averaging behaves surprisingly well.)
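The train-locally-then-average loop can be sketched in a few lines. This is a toy illustration, not the paper's method: the linear model, squared loss, data, and hyperparameters are all invented stand-ins, whereas the real FederatedAveraging algorithm runs SGD on neural networks across thousands of devices.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """One client's local training: gradient steps on a toy linear
    model with squared loss, standing in for local SGD."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """One server round: each client trains locally on its own shard,
    then the server takes a data-size-weighted average of the weights."""
    total = sum(len(y) for _, y in clients)
    return sum((len(y) / total) * local_update(global_w, X, y)
               for X, y in clients)

# Toy demo: three "devices", each holding a private shard of y = 2x
rng = np.random.default_rng(0)
clients = []
for _ in range(3):
    X = rng.normal(size=(20, 1))
    clients.append((X, 2.0 * X[:, 0]))

w = np.zeros(1)
for _ in range(10):
    w = federated_round(w, clients)
print(w)  # converges toward [2.]
```

Note that the raw data never leaves each client; only the locally trained weights are sent to the server for averaging.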
Perhaps the more important issue is privacy. An individual device's update is derived from that user's data, so it's plausible the update could be used to infer information about the user. That is protected by the Secure Aggregation protocol (http://eprint.iacr.org/2017/281).
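The core cancellation trick behind secure aggregation can be illustrated with a toy sketch: each pair of clients shares a random mask that one adds and the other subtracts, so the server's sum is exact even though no individual update is revealed. This is only the arithmetic idea; the numbers here are invented, and the real protocol uses cryptographic key agreement and handles devices that drop out mid-round.

```python
import numpy as np

rng = np.random.default_rng(42)

# Three clients' model updates (4-dimensional toy vectors)
updates = [rng.normal(size=4) for _ in range(3)]
n = len(updates)

# Each pair (i, j) agrees on a shared random mask; client i adds it
# and client j subtracts it, so all masks cancel in the sum.
pair_masks = {(i, j): rng.normal(size=4)
              for i in range(n) for j in range(i + 1, n)}

masked = []
for i in range(n):
    m = updates[i].copy()
    for (a, b), mask in pair_masks.items():
        if a == i:
            m += mask
        elif b == i:
            m -= mask
    masked.append(m)

# The server only sees masked updates, yet their sum is the true sum.
server_sum = sum(masked)
true_sum = sum(updates)
print(np.allclose(server_sum, true_sum))  # True
```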
All-in-all, this is interesting work from Google. Check out the original blog post for more detail.
From time to time, simple ideas trump complicated, over-engineered ones. OpenAI's unsupervised sentiment neuron is one of those cases. The idea is very simple: first you train a character-level language model on a large corpus. In OpenAI's case, it is a multiplicative LSTM. But no matter how complicated the model is, you are essentially just modeling the underlying distribution. Notice that, at this point, all the data is still unlabeled.
Now comes the interesting part: with labeled data, you can take the model's hidden units (now dubbed "unsupervised sentiment neurons") and train a linear model on top of them. When OpenAI did this, it turned out to be surprisingly effective and beat the best technique on the Stanford Sentiment Treebank task. More importantly, even using 30-100x less labeled data, they could still match the results of other methods. It took a month to train the model, but the result is very impressive.
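The second step, fitting a linear probe on the hidden units, can be sketched as follows. The features here are random stand-ins with one synthetic unit carrying the label signal; in the real work the features are the hidden states of the pretrained mLSTM, and scikit-learn is assumed available for the probe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in features: pretend these are the character LM's final
# hidden states; in the real setup they come from the pretrained mLSTM.
rng = np.random.default_rng(0)
n_docs, n_units = 200, 64
features = rng.normal(size=(n_docs, n_units))
# Synthetic "sentiment neuron": unit 7 carries the label signal
labels = (features[:, 7] > 0).astype(int)

# L1-regularized linear probe: the sparsity penalty tends to
# concentrate weight on the few informative units
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
probe.fit(features, labels)

acc = probe.score(features, labels)
top_unit = int(np.abs(probe.coef_[0]).argmax())
print(acc, top_unit)  # the probe should pick out unit 7
```

Inspecting which units the sparse probe weights most heavily is exactly how one would spot a "sentiment neuron" in the real model.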
This result is reminiscent of pre-training in DNNs. It also makes you wonder: can we use methods other than a linear model on the unsupervised neurons and get even better results? In any case, this piece is thought-provoking. Yet another great piece of work from OpenAI.
Prof. Hinton's "Neural Networks for Machine Learning" (NNML) is perhaps the first MOOC on deep learning. In this review, Arthur discusses whether you should take this course and when you should take it. And, more relevant to our audience: given the many courses, classes, and tutorials you can find, is NNML still relevant? He offers some answers in his post.
This is a post by Andrej Karpathy, in which he looks at various trends in machine learning, including frameworks, models, optimization algorithms, etc. It sounds like "Fully Convolutional Encoder Decoder BatchNorm ResNet GAN applied to Style Transfer, optimized with Adam" is not that far off. 🙂