We learnt more about DeepMind last week - the hefty price of running it, and its struggle to repair its image since the DeepMind Health-Royal Free debacle. But we also learnt that our beloved WaveNet is now in production within Google Assistant, and it's a whopping 1000 times faster. This week, we cover DeepMind in more depth.
We also have an issue packed with content: "Confession of AI researchers" is our favorite link. We answer a member's question on how to come up with new AI ideas in "Questions from Members". And we dive into an interesting paper: "Unsupervised Hypernym Detection by Distributional Inclusion Vector Embedding" by Haw-Shiuan Chang et al.
As always, if you like our newsletter, feel free to subscribe and forward it to friends!
This Guardian piece is perhaps the best description of the new DeepMind Ethics & Society. At first, we thought it was a repurposing of the older DeepMind Health advisory panel, but the piece clearly states that it is a separate entity.
This move is a response to last year's debacle over how DeepMind handled patient data. As we reported in Issue #20, an independent commissioner commented on DeepMind's inappropriate handling of patient data: "Just that you can, doesn't mean that you should." The whole affair also pushed DeepMind's legal fees to 5 times those of the previous year (see the previous piece, "How Much Does DeepMind Cost?").
DeepMind launching WaveNet in a production system is perhaps the most exciting news of last week. What is amazing is that it's 1000 times faster than the version from a year ago, and it can run on a TPU. The speed-up doesn't seem to be lossy either: the mean opinion score (MOS) is around 4.35, not too far from the 4.67 of a human voice.
This is an interesting thread on how one could overcome the lack of a sufficient Mathematics background when doing research or reading AI papers. Well, 1) there's never enough Math and 2) everyone actually feels as inadequate as you do. Everyone has impostor syndrome, except maybe Geoffrey Hinton.
Here is an interesting article from OpenAI's Jakob Foerster, which we think is a very cute idea: by exploiting the non-linearity of IEEE floating-point arithmetic, Foerster was able to make a nominally linear network behave non-linearly, training it with evolution strategies.
Here are all the videos from the first Cognitive Computational Neuroscience (CCN) conference. The ones which interest us most are perhaps the opening remarks from Yann LeCun and the closing remarks from Yoshua Bengio.
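To get a feel for why floating-point arithmetic is not truly linear, here is a tiny sketch of our own. It is not Foerster's experiment: it uses Python's 64-bit floats and an arbitrary constant TINY that we picked, and it only shows underflow, one source of the non-linearity. A mathematically linear map (multiply by a constant, then divide by it again) stops being the identity once the intermediate result underflows to zero.

```python
# Assumption: Python floats are IEEE 754 doubles, as on all common platforms.
TINY = 1e-200  # arbitrary small constant chosen for this illustration


def scale_and_unscale(x):
    """Mathematically the identity: (x * TINY) / TINY == x for real numbers."""
    return (x * TINY) / TINY


big_input = scale_and_unscale(1.0)       # intermediate 1e-200 is representable
small_input = scale_and_unscale(1e-200)  # intermediate 1e-400 underflows to 0.0

print(big_input)    # 1.0
print(small_input)  # 0.0, although a linear map would return 1e-200
```

So the "identity" returns 1.0 for an input of 1.0 but 0.0 for an input of 1e-200, which no linear function can do.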
1, What are other people doing, and is it possible to put a twist on it?
2, What is a problem you want to solve in your own life? Then think: is there any way AI/ML can help? Everyone has some - e.g. I would really like to make an old-style Nintendo game in the vein of Final Fantasy. But drawing the bitmap character graphics takes an insane amount of time. So is there any way A.I. can help me? Yes, one potential idea is to create an image generator.
Would these ideas work? Who knows? But that's how you come up with ideas. You ignore the feasibility part for the moment. If you feel it is really hard for you to come up with ideas, chances are you are too caught up with the technical field. Read some books, listen to music, make some art and daydream a bit. Then ideas will come.
This work models relationships such as "poodle is-a dog", or more generally the word entailment problem in computational linguistics.
Hypernym problems are more often solved with Bag of Words (BOW) models, but this paper follows the more modern word-embedding-based approach. So it is perhaps one of the earlier papers to use a word-embedding type of method for hypernym detection.
It's good to think about why this method works. In the case of synonyms, we all know about the distributional hypothesis - or, from Firth: "a word is characterized by the company it keeps".
What about hypernyms? There are perhaps two corresponding hypotheses: the "Distributional Informativeness Hypothesis" and the "Distributional Inclusion Hypothesis". The first states that semantically general words occur more often. The second states that the context set of a word tends to be a subset of the context set of its hypernym. So combining the two, "chihuahua" would occur less often than "dog", and any adjective which can be applied to "chihuahua" can also be applied to "dog", but not necessarily vice versa.
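The inclusion hypothesis is easy to play with on toy data. The sketch below is ours, not the paper's scoring function: the context sets are made up, and inclusion_score is a hypothetical helper that simply measures which fraction of one word's contexts also appear with the other word.

```python
def inclusion_score(narrow_contexts, broad_contexts):
    """Fraction of narrow_contexts that also occur in broad_contexts."""
    if not narrow_contexts:
        return 0.0
    return len(narrow_contexts & broad_contexts) / len(narrow_contexts)


# Made-up context sets, for illustration only.
contexts = {
    "dog": {"small", "big", "loyal", "furry", "barking"},
    "chihuahua": {"small", "furry", "barking"},
}

# All contexts of "chihuahua" also occur with "dog" ...
forward = inclusion_score(contexts["chihuahua"], contexts["dog"])
# ... but only 3 of the 5 contexts of "dog" occur with "chihuahua".
backward = inclusion_score(contexts["dog"], contexts["chihuahua"])

print(forward, backward)
```

The asymmetry between the two directions is what suggests "dog" is the more general word.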
Then there is DIVE. In essence, it is Mikolov's skip-gram with two major modifications: the embeddings are constrained to be non-negative, and the negative samples are weighted by the inverse of how often a word appears. That's pretty much what DIVE is.
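Here is a rough sketch of how those two modifications could look on a single (word, context) pair. To be clear, this is our reading of the description above, not the authors' code: the function names, the clipping step, and the exact weight avg_count / word_count are assumptions, and the paper's precise objective may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def clip_nonneg(vec):
    """Modification 1 (assumed form): keep every embedding entry >= 0."""
    return [max(0.0, x) for x in vec]


def pair_loss(word_vec, ctx_vec, neg_vecs, word_count, avg_count):
    """Negative-sampling loss for one (word, context) pair.

    Modification 2 (assumed form): the negative-sample term is scaled by
    avg_count / word_count, so rarer words get a larger weight.
    """
    neg_weight = avg_count / word_count
    positive = -math.log(sigmoid(dot(word_vec, ctx_vec)))
    negative = -sum(math.log(sigmoid(-dot(word_vec, n))) for n in neg_vecs)
    return positive + neg_weight * negative


word_vec = clip_nonneg([0.5, -0.3, 0.2])  # the negative entry is clipped to 0.0
ctx_vec = [0.4, 0.1, 0.3]
neg_vecs = [[0.2, 0.0, 0.1], [0.1, 0.3, 0.0]]

rare = pair_loss(word_vec, ctx_vec, neg_vecs, word_count=10, avg_count=100)
frequent = pair_loss(word_vec, ctx_vec, neg_vecs, word_count=1000, avg_count=100)
print(rare > frequent)
```

With identical vectors, the rare word is penalized more by its negative samples than the frequent one, which is the effect of the inverse-frequency weighting.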
Then there is another part about PMI (point-wise mutual information). In a nutshell, Levy and Goldberg showed that skip-gram with negative sampling is implicitly a matrix factorization whose entries are the PMI, log(p(w,c) / (p(w) p(c))), shifted by a constant. The authors give a similar formulation for DIVE. The nifty part is that they also filter out entries of the PMI matrix when the counts behind them are too small.
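For readers who have not computed PMI by hand, here is a minimal sketch from raw co-occurrence counts. The counts are made up, and the min_count threshold is only our stand-in for the filtering idea; the paper's actual filtering rule may well be different.

```python
import math
from collections import Counter


def pmi_table(pair_counts, min_count=1):
    """PMI(w, c) = log(p(w, c) / (p(w) * p(c))) for each counted pair.

    Pairs seen fewer than min_count times are dropped (assumed filter).
    """
    total = sum(pair_counts.values())
    word_totals = Counter()
    ctx_totals = Counter()
    for (word, ctx), n in pair_counts.items():
        word_totals[word] += n
        ctx_totals[ctx] += n

    table = {}
    for (word, ctx), n in pair_counts.items():
        if n < min_count:
            continue  # too few observations to trust this estimate
        p_wc = n / total
        p_w = word_totals[word] / total
        p_c = ctx_totals[ctx] / total
        table[(word, ctx)] = math.log(p_wc / (p_w * p_c))
    return table


# Made-up co-occurrence counts, for illustration only.
counts = {
    ("dog", "bark"): 8,
    ("dog", "cute"): 4,
    ("chihuahua", "cute"): 3,
    ("chihuahua", "bark"): 1,
}

filtered = pmi_table(counts, min_count=2)
print(sorted(filtered))
```

In this toy run the pair ("chihuahua", "bark") is dropped because it was only seen once, while the other three pairs keep their PMI values.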
Then there is a fairly extensive evaluation of the technique, covering 11 datasets. The authors test their technique with breakdowns such as with/without DIVE and with/without PMI filtering. There is a very good improvement over the SBOW approach.
Interestingly enough, this paper doesn't seem to have been picked up by arxiv-sanity. Perhaps hypernymy is more of a side topic in linguistics. But it's certainly a good read. Let's see what other members think about it.