Last week:
Libratus is the biggest news item this week. In retrospect, it’s probably as huge as AlphaGo. The surprising part is that it has nothing to do with deep learning. So it is worth our time to look at it closely.
- We learned that Libratus crushed human professional players in heads-up no-limit holdem (NLH). How does it work? Perhaps the Wired and the Spectrum articles tell us the most.
- First of all, NLH is not as commonly played as Go, but it is interesting because people play it for real money. And we are talking about big money. The World Series of Poker holds a yearly poker tournament in which all top-10 players become instant millionaires. Among pros, holdem is known as the “Cadillac of Poker”, a phrase coined by Doyle Brunson. That implies mastering holdem is the key skill in poker.
- Pros generally think of Limit Holdem as a “chess”-like game. Polaris from the University of Alberta bested humans with three wins back in 2008.
- NLH did not fall until now, so let’s think about how you would model NLH in general. NLH has about 10^165 game states, close to Go. Since the game has only four betting rounds (preflop, flop, turn, river), you quickly get into what players of other games call the end-game. It’s just that, given the large number of possible bet sizes, the number of game states blows up very easily.
- So at run-time you can only evaluate a portion of the game tree. Since bet sizes are effectively continuous, they are usually discretized so that the evaluation is tractable with your compute; this is known as “action abstraction”. An actual bet whose size is not in the abstraction is usually called an “off-tree” bet. Off-tree bets are then mapped to in-tree actions at run-time, which is known as “action translation”. Of course, there are different types of tree evaluation.
- Now, what is the merit of Libratus? Why does it win? There seem to be three distinct factors; the first two are about the end-game.
- There is a new end-game solver (http://www.cs.cmu.edu/~noamb/papers/17-AAAI-Refinement.pdf) which features a new criterion for evaluating the game tree, called Reach-MaxMargin.
- Also in the paper, the authors suggest a way to solve an end-game given the player’s actual bet size. So they no longer use action translation to map an off-tree bet into the game abstraction. This considerably reduces “Regret”.
- What is the third factor? As it turns out, in past human-computer matches, humans were able to exploit the machine easily by noticing its betting patterns. So the CMU team used an interesting strategy: every night, the team would manually tune the system so that repeated betting patterns were removed. That confused the human pros. Dong Kim, the best player against the machine, felt like he was facing a different machine every day.
- These seem to be the reasons why the pros were crushed. Notice that this was a rematch: the pros won by a small margin back in 2015, but the result this time shows a 99.8% chance that the machine is beating the humans. (I am handwaving here because you need to know the big blind size to talk about winnings. Unfortunately I couldn’t look it up.)
- To me, this Libratus win comes very close to saying that a computer can beat the best heads-up tournament players. But poker players will tell you the best players are cash-game players. And heads-up play is not representative, because bread-and-butter games are usually 6 to 10 player games. So we will probably hear more about pokerbots in the future.
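To make the “action abstraction” and “action translation” ideas above a bit more concrete, here is a minimal Python sketch. It is not how Libratus or any real pokerbot does it: the pot fractions in `ABSTRACT_BETS`, the function names, and the nearest-size mapping rule are all my own assumptions for illustration.

```python
from bisect import bisect_left

# Hypothetical action abstraction: bets allowed in the tree,
# expressed as fractions of the pot (an assumption for illustration).
ABSTRACT_BETS = [0.5, 1.0, 2.0]


def abstract_bet_sizes(pot, stack, fractions=ABSTRACT_BETS):
    """Return the sorted in-tree bet sizes in chips.

    Each size is a pot fraction capped at the remaining stack;
    the all-in bet (the whole stack) is always included.
    """
    sizes = {min(round(f * pot), stack) for f in fractions}
    sizes.add(stack)
    return sorted(sizes)


def translate_bet(bet, in_tree_sizes):
    """Map an off-tree bet to the nearest in-tree size.

    This is a naive nearest-neighbour rule (ties go to the smaller
    size), used here only to show what "translation" means.
    """
    i = bisect_left(in_tree_sizes, bet)
    if i == 0:
        return in_tree_sizes[0]
    if i == len(in_tree_sizes):
        return in_tree_sizes[-1]
    lo, hi = in_tree_sizes[i - 1], in_tree_sizes[i]
    return lo if bet - lo <= hi - bet else hi


if __name__ == "__main__":
    sizes = abstract_bet_sizes(pot=100, stack=1000)
    print(sizes)
    # A human bets 130 chips, which is not in the tree.
    print(translate_bet(130, sizes))
```

The gap between the real bet (130) and the in-tree bet it is mapped to is exactly the kind of rounding a human can exploit, which is why solving the end-game for the actual bet size, as described above, matters.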
Anyway, that’s what I have this week. We will resume our office hour next week. Waikit will tell you more in the next couple of days.
If you like this message, subscribe to the Grand Janitor Blog’s RSS feed. You can also find me (Arthur) on Twitter, LinkedIn, Plus, and Clarity.fm. Together with Waikit Lau, I maintain the Deep Learning Facebook forum. Also check out my awesome employer: Voci.