In an interview that lasted 2.5 hours, Brown spoke in detail about how, a few years ago, he managed to create software that consistently beat the strongest regulars.
– You have led three amazing AI projects: Libratus in heads-up poker, Pluribus in 6-max, and, more recently, Cicero, a program that competes with humans on equal terms in the popular board game Diplomacy. Today I would like to discuss poker. Explain for ordinary listeners: what kind of game is no-limit Texas hold'em?
– It is the most popular form of poker, played in casinos everywhere and featured in many popular movies. Its main feature is that the players choose the size of their bets themselves. One of the key strategies in poker is to put your opponent in a difficult position. If you do that consistently, you are a good player.
– When you create your projects, what attracts you in the first place? The beauty of poker or the desire to solve global problems with the help of AI?
– The beauty of the game. I started playing poker myself when I was in school. I quickly realized that there is a theoretically correct strategy, and that by following it you can beat everyone. Already at 16 I was amazed by the diversity of poker, although I started working on AI much later.
– Did you already understand then that poker can be solved like chess or checkers? Are they solved?
– Yes, it is impossible to beat AI in these games. Poker can be solved too. The solution is based on the Nash equilibrium. In any finite two-player zero-sum game, there is an optimal strategy. If one of the players follows it, then in expectation he cannot lose, regardless of what the opponent does. All of this is also true for poker, but only for heads-up. In 6-max everything is more complicated.
– What do you mean when you say "in expectation"?
– There is a huge variance in poker. Even a perfect strategy does not guarantee that you will win every hand. But the optimal strategy guarantees that over a large enough sample of hands you will at least break even.
– How do you calculate this equilibrium?
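(Illustration, not from the interview.) The effect of variance can be sketched with a normal approximation. The function below estimates the chance of being behind after a given number of hands. The win rate of 5 big blinds per 100 hands and the standard deviation of 100 big blinds per 100 hands are hypothetical values chosen only for illustration, and the function name `prob_behind` is likewise an assumption.

```python
import math

def prob_behind(win_rate_bb100: float, std_bb100: float, hands: int) -> float:
    """Normal-approximation probability of a net loss after `hands` hands.

    win_rate_bb100: expected winnings in big blinds per 100 hands
    std_bb100: standard deviation in big blinds per 100 hands
    """
    blocks = hands / 100                      # number of 100-hand blocks
    mean = win_rate_bb100 * blocks            # expected total result
    sd = std_bb100 * math.sqrt(blocks)        # std dev of the total result
    z = -mean / sd                            # distance from zero in std devs
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

# Hypothetical player: +5 bb/100 edge, 100 bb/100 standard deviation.
for hands in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{hands:>9} hands: P(behind) = {prob_behind(5.0, 100.0, hands):.4f}")
```

Under these assumed numbers a winning player is still behind quite often after a thousand hands, while over a million hands a loss becomes extremely unlikely. That is the sense in which a guarantee "in expectation" only shows up over a long sample.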
– There are several ways. We use an algorithm called counterfactual regret minimization, which is based on self-play. Two copies of the AI start out playing each other completely at random, but they learn as they play. At the end of each game, they review their actions and work out how other decisions would have affected the result, for example, raising instead of calling. Next time they are more likely to choose the more profitable action. Over many iterations, this play converges to the Nash equilibrium. This works in both chess and poker.
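(Illustration, not from the interview.) The self-play loop described above can be sketched on Rock, Paper, Scissors using regret matching, the building block that counterfactual regret minimization applies at each decision point. This is a toy, not the Libratus code. It uses expected payoffs instead of sampled games, and it gives one player a small initial bias toward rock (an assumption made here so that the two copies have something to learn from). All names are hypothetical.

```python
# PAYOFF[a][b]: payoff to a player choosing a against an opponent choosing b.
# Index 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
N = 3

def current_strategy(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N  # no positive regret yet: play uniformly at random

def action_values(opponent_strategy):
    """Expected payoff of each pure action against the opponent's mix."""
    return [sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(N))
            for a in range(N)]

def self_play(iterations):
    """Return both players' average strategies after regret-matching self-play."""
    # Assumption: player 0 starts slightly biased toward rock.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        strategies = [current_strategy(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            expected = sum(p * v for p, v in zip(strategies[player], values))
            for a in range(N):
                # Regret: how much better action a would have done.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]

if __name__ == "__main__":
    for player, avg in enumerate(self_play(50_000)):
        print(player, [round(p, 3) for p in avg])
```

The strategy played on any single iteration keeps cycling. It is the average strategy over all iterations that approaches the equilibrium of one third each.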
– What is more difficult – chess, poker, or maybe something else?
– I'll say poker, first of all because it is a game of incomplete information. That means we have to think not only about how to play our particular cards, but also about how often to choose each action. The simplest example is Rock, Paper, Scissors. You can't throw "rock" all the time, because the opponent will notice immediately. In other words, the value of our actions depends directly on how often we use them. Balance is one of the most important elements of poker. In chess, it doesn't matter whether you play the Queen's Gambit in every game or in only 10% of them; the expectation does not change at all.
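(Illustration, not from the interview.) The Rock, Paper, Scissors point can be checked numerically: how much does an opponent who knows our frequencies win per game by choosing the best reply? The function name `exploitability` and the example frequencies are assumptions made for this sketch.

```python
# PAYOFF[a][b]: payoff to a player choosing a against an opponent choosing b.
# Index 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def exploitability(strategy):
    """Best expected payoff per game an opponent can get against `strategy`."""
    return max(
        sum(PAYOFF[reply][ours] * strategy[ours] for ours in range(3))
        for reply in range(3)
    )

print(exploitability([1.0, 0.0, 0.0]))    # always rock
print(exploitability([0.6, 0.2, 0.2]))    # rock 60% of the time
print(exploitability([1/3, 1/3, 1/3]))    # balanced
```

Always throwing rock loses a full unit per game to an opponent who answers with paper, the 60% rock mix still gives up about 0.4, and only the balanced mix cannot be beaten in expectation. The same frequency logic is what "balance" refers to in poker.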
– If we play with one opponent all the time, then in each hand we get new information. How significant is this for AI?
– Such an approach really does exist in poker, but for bots it doesn't matter. They play as if the opponent already knows their strategy. The essence of optimal play is that you can play tens of thousands of hands against it and analyze everything thoroughly, and you still cannot beat it. This is the ideal balance, or, in other words, the Nash equilibrium. The best players in the world also tend to play close to Nash, but they can deviate when they notice mistakes by their opponents.
– Who is the greatest player of all time and why is it Phil Hellmuth? His game is far from optimal, but he still beats everyone. So his chaotic play makes his strategy unpredictable?
– First of all, it is important to understand that playing the Nash equilibrium does not mean being predictable. Its essence is unpredictability. I fully admit that Phil Hellmuth is a very successful player, but his unpredictability has nothing to do with it. I suppose his strength is the ability to exploit the weaknesses of his opponents. The poker community argued for years about which is better: GTO (game-theory-optimal) play or exploitative play. Until 2017 the exploitative approach had more supporters; then our Libratus played against the strongest heads-up specialists. The bot did not try to adapt and did not play mind games; it simply tried to get as close to Nash as possible in every action. As a result, it tore its opponents apart: over 120,000 hands, the bot won about $2 million from the humans at blinds of $100/$200.
– Tell us more about this match.
– When I was in graduate school, several groups were working on poker AI at the same time, and at the end of each year we held a poker championship among the bots. Our bot became the champion in 2014 and 2016, and it later formed the basis of Libratus. In 2017, we challenged the best heads-up players in the world to play 120,000 hands. We put up $200,000 in prize money for the match, which the human players divided among themselves depending on their results.
– In 2014-16, did you even think that a computer could beat a person in poker?
– The first such match took place back in 2015, and the bot suffered a rather heavy defeat. But a lot changed in two years. The first bot played according to a precomputed strategy aimed at solving poker. During the match, it simply looked up the solution for each specific situation in its vast database. The 2017 bot, by contrast, tried to compute in real time a strategy that worked better than the one built into it.
However, the 2015 match gave me a lot of food for thought. I realized that people and bots take completely different approaches. Our bot had already played a huge number of hands against itself. In the match against humans, it found its decisions instantly, based on that previous experience. That's how it always worked. The professionals, on the other hand, could think for 5 minutes on the river in some situations, choosing between a fold and a call. It occurred to me that this is exactly what our bot lacked. We analyzed the first match and found that it was precisely these situations that had a huge impact on the final result.
– Are you talking about the duration of thinking?
– Yes, but it's not about the timing itself. The bot's problem was that it always acted instantly and did not try to find a more profitable decision than the one built into it before the game. People, in the middle of a hand, use their ability to adjust, rethink and plan. Often this helps them find a more profitable action than their intuition initially suggested. A neural network produces a result in milliseconds, but if you let it spend even a little extra time searching, the result improves many times over. If we think of the strategy built into the bot as analogous to a neural network, then even a small amount of search is equivalent to making that network thousands of times larger. This gave an incredible boost to our work.
– Can you explain in simple terms what exactly this search consists of?
– In hold'em, each player is dealt two hole cards, which gives 1,326 possible combinations. The bot goes through all the possible options and looks for a strategy that works better than the one originally built into it. It is important to note that it began this search only on the turn; it played the first two streets instantly, according to the precomputed strategy.
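(Illustration, not from the interview.) The figure of 1,326 is simply the number of ways to choose 2 cards from 52. The short sketch below enumerates them; the card encoding and the helper name `possible_holdings` are assumptions made for illustration.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]  # 52 cards

def possible_holdings(known_cards=()):
    """All two-card hands that can be built from cards not already visible."""
    remaining = [card for card in DECK if card not in known_cards]
    return list(combinations(remaining, 2))

print(len(possible_holdings()))  # 1326
```

Once some cards are visible, the set shrinks. For example, with our own two hole cards and four board cards known on the turn, an opponent has 46 unseen cards to draw from, so `possible_holdings` returns 1,035 candidate hands.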
– Were there any features in the Libratus strategy that immediately caught your eye?
– The bets a person makes usually depend on the size of the pot. For Libratus this was completely unimportant; it would use absolutely any bet size. At some point, it suddenly started making huge overbets of 10 times the pot. Before the match we hadn't considered this option at all, so we got a little worried. Nobody had used such a strategy in practice before, and we ourselves did not know what to expect. What if the regulars were able to see through it? But almost immediately it became clear that it works great, because it constantly puts the opponent in a difficult position. The bot did this solely because, in a particular situation, such a bet seemed the most profitable to it. The fact that people did not know how to counter it turned out to be a pleasant bonus.
– Have you ever discussed your developments, for example, with Daniel Negreanu?
– Yes, I was invited to the PokerStars office on the Isle of Man when he was still working with them. He joined the group dinner and said that all of this was very interesting and could be used to work on one's game.
– So he wasn't scared?
– On the contrary. He even showed interest in a match against the bot, claiming that he had a good chance of winning. That was a few years ago, when not everyone understood that a person has no chance against AI in heads-up. I think by now it has become obvious to everyone.
– And what is the situation in 6-max?
– Modern bots will beat people there too. The only open question is whether this is true for all varieties of poker. I am sure that, given the desire and sufficient resources, you can write a bot for any game that will beat a person without any problems. But we are focusing only on the most popular one, NLHE.
– Have you ever wondered what are the main differences in the way the human brain and AI work?
– Of course I have thought about it; this is a very important question. The calculations of AlphaGo and other well-known bots are based on Monte Carlo tree search. It excels in games with complete information, such as chess and go. But it is completely unsuitable for poker, because it has no concept of hidden information and does not know what balance is or with what frequencies certain hands should be folded or called. The human brain is able to make a rough plan for any game. This is what artificial intelligence sorely lacks: the ability to plan and reason in general.
– In the past, it seemed to many that the human factor is so important in poker that a computer would never win. What did you feel at the moment when Libratus finally beat people?
– The whole project was very stressful for me. For several years before the match, I worked on nothing but the bot, without days off. During preparation, we had no idea at all how strong it would need to be to beat a human. Libratus played against previous versions of itself, but that only gave us a general sense that we were moving in the right direction. We did not know what level we needed to reach, so we threw all our resources into development. We had the power of thousands of computers at our disposal. Today that surprises no one, but for a graduate student in 2016 it was all very impressive. On the first day of the match, I was extremely nervous. Before the start, I estimated the chances of winning as approximately equal. I understood that on paper the bot was stronger and should win, but I was afraid that the professionals would notice some weaknesses and be able to take advantage of them. That is exactly what happened in our first match in 2015. The first half passed without a clear advantage for either side, but then the players simply tore the bot apart, because they noticed its shortcomings and were able to exploit them effectively. The most problematic situations were those in which the players went all-in. For example, the bot saw no difference between K-high and A-high flushes and played them exactly the same. Sometimes this does not matter, but in some situations it can be very expensive, and the professionals easily identified such moments.
– How did the players behave during the second match?
– As I said, their prize money depended directly on the result. I was hoping they wouldn't join forces to look for the bot's flaws. But the regulars immediately made it clear that their main goal was to beat the bot. They analyzed hands together. At the end of each day, we sent them the full hand histories with the hole cards revealed. I don't know why I agreed to that; in poker this is invaluable information. But now I'm even glad I did, because in the end we still won. The match lasted 20 days. The bot won the first three sessions in a row, but I still put the odds at about 50/50. Then the humans won some of it back and believed that they had once again spotted flaws in the bot's game, flaws that were not really there. By the eighth day, it became clear that they had no chance.
– How did you take the victory?
– I devoted 5 years of my life to this project, so the first reaction was great satisfaction that my work was successful.
– Tell us about the bot for 6-max.
– As I said, if one of the participants in a zero-sum game plays "according to Nash", he will at least not lose in expectation. It does not matter at all what the opponent does. All of this is true for heads-up. There was a long and heated debate in the poker and scientific communities about whether it would work in 6-max. I was sure from the start that it would, because the strategy is too effective, and the number of players would not have much effect.
We made the transition to 6-max successfully once we limited the bot's search. Libratus calculated all possible moves on the following streets in advance, all the way to the end of the hand. In 6-max this is not possible, since the game is much more diverse. So we limited the bot to looking only a couple of moves ahead, and this turned out to be very effective. 6-max poker remains an individual game in which none of the players cooperate with each other; the rules generally forbid it. This allowed us to apply a simplified Nash equilibrium successfully in practice. That said, there is still no theoretical proof that it should work in 6-max. For some games, there is already scientific evidence that an approximate Nash equilibrium works great outside of a one-on-one setting. This does not apply to 6-max poker yet, but it is already obvious to me that it works reasonably well at the very least.
– Tell us about the main differences between Pluribus and Libratus.
– Pluribus was much cheaper. If we add up all the resources we needed to create each bot, Libratus cost about $100,000, and Pluribus cost less than $150. Of course, computers become cheaper every year, but not by that much. The main reason is a change in the algorithm: the very limit on the bot's search that I have already mentioned.
– Is such a restriction also possible for Libratus?
– Of course. We first tested the approach on the heads-up bot. Whereas Libratus previously needed the power of thousands of computers, the new algorithm made it possible to run it on any laptop.
– As someone who loves poker himself, who do you consider to be the greatest player of all time? By the way, is it possible to assess the level of a person's game with the help of AI? Is there anything like an Elo rating in poker at all?
– It is possible in theory, but unlikely in practice, all because of the huge variance. Even a bad player can finish the year with a profit, while a top reg can finish the same number of hands with a loss. But in modern poker, just as in chess, it is now impossible to imagine working on your game without software.
– I was impressed by how cleverly you ignored the question about the best player of all time.
– It's a difficult question. In chess, we cannot compare Magnus Carlsen and Garry Kasparov; the game has evolved too much. In poker, modern players are many times more skillful than even those who played only 5-10 years ago. To be frank, almost all of the ESPN stars of the poker boom are mediocre, at least from a technical standpoint.
I admit that they are still strong at reading opponents. With that in mind, I'll name Daniel Negreanu. He is one of the strongest players of the past, and he tries to keep up with the times, follows the development of AI, and works a lot on theory. Almost all the players of his generation gave up long ago and are not looking in this direction at all. For that, I have a lot of respect for Daniel.