Elon Musk is teasing new AI hardware from Tesla. In a non-streamed conversation, he said, "Jim is developing specialized AI hardware that we think will be the best in the world," according to one person at the event. The CNBC report actually quotes Stephen Merity, who we know has a lot of credibility.
One thing we learned from NIPS 2017: Titan X has a successor, the Titan V. It has 110 teraflops of raw computing capability, 9x that of its predecessor. But it costs $3,000, which is close to a low-end Tesla card such as the K2000 or K4000. Jensen Huang told us this is the best card for the desktop, and we have no doubt.
On Google's "AI Built an AI That Outperforms Any Made by Humans"
For those who are new to AIDL: AIDL has what we call "The Three Pillars of Posting", i.e. we require members to post articles which are relevant, non-commercial and non-sensational. When a sensationalized piece of news starts to spread, an admin of AIDL (in this case, Arthur) fact-checks the relevant literature and source material and decides whether certain pieces should be rejected. This time we are going to fact-check a popular yet misleading piece, "AI Built an AI That Outperforms Any Made by Humans".
The first thing to notice: "AI Built an AI That Outperforms Any Made by Humans" comes from a site that has historically sensationalized news. The same site was involved in sensationalizing the early version of AutoML, as well as the notorious "AI learns language" fake-news wave.
So what is it this time? Well, it all started with Google's AutoML, published in May 2017. If you look at the page carefully, you will notice that it is basically a tuning technique using reinforcement learning. At the time, the research only worked on CIFAR-10 and Penn Treebank.
But then Google released another version of AutoML in November. The gist is that Google beat SOTA results on COCO and ImageNet. Of course, if you are a researcher, you will simply interpret it as "Oh, automatic tuning is now a thing; it could become a staple of future evaluations!" The model is now distributed as NASNet.
Unfortunately, this is not how the popular outlets interpreted it. Sites were claiming "AI Built an AI That Outperforms Any Made by Humans". Even more outrageously, some sites claimed "AI is creating its own 'AI child'". Both claims are false. Why?
As we just said, Google's program is an RL-based program which proposes the child architecture. Isn't this parent program still built by humans? So the first statement is refuted. Someone wrote a tuning program; a sophisticated one, to be sure, but still a tuning program.
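To make the "tuning program" point concrete, here is a minimal sketch of what such a parent loop looks like. The controller below uses random search instead of the RL policy-gradient controller in Google's actual work, and the architecture space and scoring function are made-up placeholders, not anything from the paper:

```python
import random

def evaluate(architecture):
    """Stand-in for training a child network and returning its
    validation accuracy. Here we just score a made-up objective
    that prefers ~4 layers and ~64 units (purely illustrative)."""
    layers, units = architecture
    return -((layers - 4) ** 2 + ((units - 64) / 16) ** 2)

def search(num_trials=50, seed=0):
    """The 'parent' program: propose a child architecture, score it,
    keep the best. Google's controller uses reinforcement learning
    instead of random sampling, but the loop has the same shape."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = (rng.randint(1, 8), rng.choice([16, 32, 64, 128]))
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch

best = search()  # the (layers, units) pair the parent loop selected
```

The "child" here is just a candidate configuration that gets evaluated; it never searches for further children of its own, which is exactly why the "self-replicating AI" framing is wrong.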
And if you are imagining "Oh, AI is building itself!" and picturing AI as now self-replicating, you could not be more wrong. Again, remember that the child architectures are used for other tasks such as image classification. These "children" don't create yet another generation of descendants.
A much less confusing way to put it: "Google's RL-based AI is now able to tune results better than humans on some tasks." Don't get us wrong, this is still an exciting result, but it gives no sense of "machines procreating".
We hope this article clears up the matter. We rate the claim "AI Built an AI That Outperforms Any Made by Humans" false.
Perhaps the highlight of this NIPS was the debate between NIPS Test of Time Award winner Ali Rahimi and deep learning demi-god Prof. Yann LeCun. What happened?
In his award presentation, Rahimi said "Machine learning has become alchemy." This is meant as a counter to another well-known saying, from Andrew Ng: "Artificial intelligence is the new electricity." He cut deep into the current problem of machine learning: its lack of a theoretical framework.
Prof. LeCun seemed to be very upset by Rahimi's comment. In a long Facebook post, he raised his disagreement. His main point is that, historically, engineering efforts have always preceded theoretical results. As he said,
the lens and the telescope preceded optics theory, the steam engine preceded thermodynamics, the airplane preceded flight aerodynamics, radio and data communication preceded information theory, the computer preceded computer science.
This exchange spawned a great debate within the community. For example, Ferenc Huszár, in Alchemy, Rigour and Engineering, raised a good point: while it is fine to have an incomplete or non-rigorous theoretical understanding, having non-rigorous testing methods, as many papers do, is bad.
We really don't want to take sides in the debate. So let's wrap up with the rhyme battle between "Bored Yann LeCun" (the parody account of Prof. LeCun) and Ali Rahimi.
Rocking that alchemy // from my penthouse balcony // my empirical, lyrical modality // it's like fine sashimi, Ali Rahimi // I choose elbow grease // over rigor police #feelthelearn
to which Ali Rahimi replied,
phat beats to the dome // like weights dropped at random // my training methodology // exposes yours' pathology // i'm getting warmed up // take your ball and go home
Another great article by Sebastian Ruder, which you can see as a sequel to his An overview of gradient descent optimization algorithms. It includes a short but concrete explanation of the latest Adam variant with a proper implementation of weight decay, warm restarts, the latest studies of generalization, and more.
Adit Deshpande wrote a great article on the development of deep learning over the last 5 years. We enjoyed it a lot: not only does it summarize what happened, it also gives a set of great pointers to different papers and resources.
This is a brilliant article written by Shan Carter and Michael Nielsen. It explains how generative AI technology helps humans create. Long-time readers of AIDL shouldn't find Nielsen unfamiliar: he is the author of the very educational Neural Networks and Deep Learning. Of course, he is better known for co-writing the popular textbook Quantum Computation and Quantum Information.
How do you read Duda and Hart's "Pattern Classification"?
Question (rephrased): I was reading the book "Pattern Classification" by Duda and Hart, but I found the mathematics difficult to follow. What should I do?
Answer (by Arthur): You are reading a good book; Duda and Hart is known as one of the Bibles of the field. But perhaps it is slightly beyond your skill at this point.
My suggestion is to first make sure you understand basic derivations such as linear regression and the perceptron. Also, if you get stuck with the book for a long time, try going through Andrew Ng's Machine Learning. Granted, the course is much easier than Duda and Hart, but you will also gain an outline of what you are trying to prove.
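As an illustration of the kind of basic derivation worth mastering first, here is the classic perceptron learning rule on a toy linearly separable problem. The data and learning rate below are illustrative, not from any particular textbook exercise:

```python
def perceptron_train(samples, labels, lr=1.0, epochs=20):
    """Classic perceptron update: w <- w + lr * y * x whenever a
    sample (x, y) with y in {-1, +1} is misclassified."""
    dim = len(samples[0])
    w = [0.0] * (dim + 1)  # last entry acts as the bias
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            xb = list(x) + [1.0]  # append 1 for the bias term
            pred = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1
            if pred != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, xb)]
    return w

# Toy problem: label is +1 only when both inputs are 1 (AND-like,
# linearly separable, so the perceptron is guaranteed to converge)
data = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
w = perceptron_train(data, labels)
```

If you can derive by hand why this update rule converges on separable data, Duda and Hart's early chapters become much more approachable.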
One specific piece of advice on the derivation of neural networks: I recommend you read Chapter 2 of Michael Nielsen's book first, because he is very good at defining clear notation. For example, the meaning of the letter z changes between textbooks, but it is crucial to know exactly what it means to follow a derivation.
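For reference, in Nielsen's notation (as best we recall it; check his Chapter 2 for the authoritative definitions) z is the weighted input to a layer and a is its activation:

```latex
z^{l} = w^{l} a^{l-1} + b^{l}, \qquad a^{l} = \sigma(z^{l})
```

The backpropagation derivation then defines the layer's error in terms of that same z, so if your other textbook uses z for something else, the equations will not line up.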
Here is a read on the paper "Fixing Weight Decay Regularization in Adam", a major correction of how weight decay and Adam should be used together.
The key to understanding this paper is that weight decay is often implemented as L2 regularization in the loss function, and we often think the two concepts, weight decay and L2 regularization, are the same.
But the authors, Loshchilov and Hutter, observed a very simple fact: implementing weight decay through L2 regularization often reduces the effect of the weight decay. That explains why Adam has poorer generalization power.
We will just refer you to Algorithms 1 and 2 on p.3. If you follow the text, you will quickly realize the past implementation was simply wrong. The authors also propose how to fix the update so that weight decay is applied correctly (the green highlights).
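To see the difference concretely, here is our own NumPy sketch of one Adam step under the two schemes. This is an illustration of the idea, not the authors' code; the hyperparameters are the usual Adam defaults, and `wd` is an assumed decay coefficient:

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, wd=1e-2, decoupled=True):
    """One Adam update on weights w with gradient g and decay wd.
    decoupled=False: the old way -- fold the decay into the gradient
    as an L2 term, so it gets rescaled by Adam's adaptive denominator.
    decoupled=True: the Loshchilov & Hutter fix -- apply the decay
    directly to the weights, outside the adaptive update."""
    if not decoupled:
        g = g + wd * w  # L2 term flows into the moment estimates
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        w = w - lr * wd * w  # decay applied directly to the weights
    return w, m, v
```

In the coupled version, weights with large gradient history get a large `sqrt(v_hat)` denominator, which shrinks their effective decay; the decoupled version decays every weight at the same rate regardless of its gradient statistics.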
As you know the story, AlphaZero is no longer just playing Go; it now plays Chess and Shogi as well. By itself this is a significant event, because most state-of-the-art board game engines are specific to one game. General game-playing engines are seen as novelties, not the norm.
Another note: most Chess and Shogi engines are based on alpha-beta search. But AlphaZero uses Monte Carlo Tree Search, which simulates board positions. Moves are explored according to their visit counts and the value of the resulting position as estimated by a neural network. So this is not just AlphaZero beating more games; it may well be a paradigm shift for both the computer Chess and Shogi communities.
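For the curious, the rule AlphaZero uses to pick which move to explore inside the tree is a PUCT-style score: the mean value Q of a move plus an exploration bonus weighted by the network's prior. Below is a minimal sketch; the constant `c_puct` and the toy numbers are our own assumptions, not values from the paper:

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child index maximizing Q + U, where
    U = c_puct * P * sqrt(total visits) / (1 + N).
    Each child is a (P, N, W) tuple: prior P from the network,
    visit count N, and total accumulated value W; Q = W / N."""
    total_visits = sum(n for _, n, _ in children)
    best_i, best_score = 0, float("-inf")
    for i, (p, n, w) in enumerate(children):
        q = w / n if n > 0 else 0.0
        u = c_puct * p * math.sqrt(total_visits) / (1 + n)
        if q + u > best_score:
            best_i, best_score = i, q + u
    return best_i

moves = [(0.6, 2, 0.8), (0.2, 50, 24.0)]  # (P, N, W) per child move
idx = puct_select(moves)  # the high-prior, lightly visited move wins
```

Note how this differs from alpha-beta: instead of exhaustively refuting lines, the search is steered toward moves the network already believes in, and the most-visited move at the root is ultimately played.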
As you know, AlphaZero beat the strongest program of 2016, Stockfish. But one analysis caught my eye: in chess, the DeepMind researchers also fixed the first few moves of AlphaZero so that it followed the top 12 most-played openings for black and white. If you are into chess: the Queen's Gambit, several Sicilian Defences, the French, the KID. They show that AlphaZero can beat Stockfish from multiple types of positions, and that the opening doesn't matter too much.
But then, would AlphaZero beat all computer players, such as Shredder or Komodo? No one knows the answer yet.
One more thing: AlphaZero doesn't assume zero knowledge either. As Denny Britz points out in his tweet, AlphaZero was provided with perfect knowledge of the rules. So intricate rules such as castling, threefold repetition and the 50-move draw rule were all given to the machine. Perhaps, as Britz suggests, in the future we may want to focus on how to let machines figure out the rules themselves.