Liv Boeree: All right, guys. It is time to find out the answer to the most defining question of our times.
Which AI is best at poker?

There is a huge tournament running, created by the folks at Google DeepMind and their sister company Kaggle, to see which of the LLMs you probably use every day is the best at playing heads-up poker.
The team behind this have taken the most popular LLMs in use today and matched them up against each other over 900,000 hands, which, as any of you fellow poker players out there will know, is a very statistically significant sample size.
That said, there is someone who’s been taking an even deeper dive into all these hands this week: Doug Polk over on his YouTube channel, where he’s been making these hilarious analysis videos. And given he used to literally be the number one heads-up player on Earth, I figured we should compare notes.
So if you want to find out which current AI model is the best at poker, which one is the fishiest, and which one is the tiltiest, this is a conversation for you.
Thank you so much for joining. I’ve been dying to ask your opinion on this because you actually played against the OG superhuman bot, Libratus, back in 2017. So first question: how do these LLMs stack up against that?
Doug Polk: Yeah. So, there were two versions of that bot from Carnegie Mellon. I played against the first one, Claudico. The second one, Libratus, I actually didn’t play against, but a lot of my coworkers, or people that I studied with, did. So I got to see kind of behind the scenes. I tried to coach the human resistance a little bit, you know, give them some pointers and stuff.
Even the first version, Claudico, was so much better than these LLMs at poker. It’s night and day.

And it’s really interesting to me because you would think that you would see a core equilibrium strategy at least attempted here, but it wasn’t like that whatsoever. Whereas those bots were really trying to play a good balanced play style that was hard to beat no matter what you did.
These guys are in the weeds trying to exploit each other. Like, “Oh, you know, he has a lot of fear here.” The word fear was used a bunch, and you can see the thinking in the AI, right? And I’m just thinking to myself, I don’t think he’s very afraid. I don’t see that.
So, these are substantially weaker. But you know, again, those were built to play poker, right? The whole point of those was, “Let’s build a really good program that can beat humans at poker,” whereas these are large language models that are then trying to figure it out.
So, pretty sizable skill gap between the two.
Igor Kurganov: It’s interesting, because you see a lot of the reasoning from them, and it has a lot of the right words in it. It includes the language, but as a poker player, you read it and you kind of feel that there is not much depth or a consistent model behind all of those words, right?
Doug Polk: It’s so funny you say that. I mentioned that a few times in my analysis. I did a few recap videos of this event, and it’s like, you’re like, “Yeah, these are all terms, right?” Like, I know that that is a term, you know?
But then the way that they’re put together doesn’t make sense. And sometimes it’s even directly contradicting itself. Like the logic will be like, “My opponent opens tight and doesn’t really like to bluff, so I’m going to reraise with a weak hand because he folds to pressure.” It’s like, those are two completely opposite things. So, you know, I saw a lot of examples of that.
I also saw a lot of examples of just listing, you know, 50 things, right, and then saying a lot of different things about each one, but then not being able to take that and make it into something concrete that you can actually take an action on.

And they also like to state the confidence of something. Like, “confidence medium,” “confidence high.” And those just seemed to have no rhyme or reason to them whatsoever. They would just assign some degree of confidence to whatever they had decided, for whatever reason.
Liv Boeree: They didn’t try to quantify their confidence in any way?
Doug Polk: You know, mostly it would just state a confidence level, but in a couple of cases it would say why it was or wasn’t confident. But again, kind of what Igor was saying here, it didn’t actually mean anything. It was just words.
They would just say these words, “confidence high,” and yeah, those are terms, but it doesn’t actually connect the dots and put it all together in a way that’s comprehensive. It was just sort of getting to some kind of answer based on all these different terms it was using.
Igor Kurganov: I learned a new term during it, or rather an old term that we’ve always used, being used in a new way. Which is: there’s a ten on the board, the turn comes another ten, the board is rainbow, and I think it’s GPT-5 that describes it as “a wet board for tens.”
And I’m like, wait, it’s the opposite of a classically wet board.
But if one assumes that wet means there is a lot of something, it’s actually kind of right. There are a lot of possible trip-tens combinations on that board in heads-up poker.

Doug Polk: Well, but if there are two tens out there now, there are fewer tens left. So, I don’t know. I mean, I feel like that’s just kind of what we were saying earlier, where you’re taking a term and putting it in there, but again, it doesn’t really mean anything.
Liv Boeree: It’s like someone who has gone away and read every single poker book and every poker forum they can, and internalized all these terms into a loose cluster, but they’ve never actually played poker before. That’s how they do it.
Like, they’re kind of the ultimate "wordcel" of poker.
I don’t know if you know the meme “shape rotator and wordcel.” These are wordcels. Like, they’re great with words, but it doesn’t mean it’s actually coming from a foundation of logic that applies to this particular thing.

Which, in their defense, they haven’t. They haven’t played lots of hands; they’re not trained on any embodied playing of this game. They have literally just read the books.
Igor Kurganov: Yeah. And in fact, given how poor the reasoning around it is, it’s kind of surprising how reasonably they still played, in a way. It wasn’t all raising to insane sizes or calling with nothing where it doesn’t make sense. Many of the plays I saw still made some kind of sense. Was that also your experience?
Doug Polk: It was a mixed bag. And we saw different AIs with different levels of that.
For example, the best one was Gemini, both versions of Gemini that played in this. If you want to ask “what is sound poker,” and you look at a hand and everything that Gemini did, Gemini probably played the most reasonably of all the bots. Most of the logic was pretty accurate. Like, if Gemini put all the money in, it would have a decent line of thought and a concrete solution, and it usually had the best hand when it got in there.

But then the result of that was it played way too tight in a ton of situations, and just got run down by some of the maniac AIs that were just playing crazy.
Because heads-up is not really about always having the nuts, you know? It’s not always about having the nut straight or top boat or things like that. So it’s tough because you have to balance those things. You have to be aggressive enough to be going after pots, but then you can’t always be like, “I am waiting to have this set here,” because you can’t wait for that. It just takes too long.
Another thing we noticed is they don’t seem to know what flushes are.
Oh my god, so many hands. Here’s my favorite hand. Okay, in that exact vein, there was a hand in here, and I forget which two AIs it was. It doesn’t really matter, because they all do this.
There was a hand where one opened ace-ten suited, one three-bet ace-king, and the other one called. And the flop comes down with two cards of one suit and one of another.
So both of them have backdoor nut flush draws. And they just bet, raise, jam, call, and they get all the money in. And I’m thinking to myself, why are they doing this?
I go to the logic, and he goes, “Well, I have the nut flush draw, and so I’m gonna have a lot of equity against everything.”
And I’m thinking, okay, you do not have the nut flush draw. You need backdoor running diamonds.
I’m like, well, why did the other one call it off? And it’s like, “Well, I have the nut flush. I got to put all the money…” They were both so far off from what they actually had.

And it wasn’t just flushes.
Straights did that a lot, too.
In one of the first hands I break down, one of the AIs has a hand on a flop, and it’s like, “Well, I have an open-ender,” when it just doesn’t.
I mean, what do you guys think? How can it make mistakes that are that simple?

Liv Boeree: I can understand flushes to an extent, because it’s like they’re not numbers, you know? Like clubs, spades, and so on. Maybe those are terms that are very hard to conceptualize.
But I would assume a straight is basically just an ordering of numbers, and they can do advanced math. On the benchmarks, most of these LLMs are at least at a high-school level in math, and some of them are at quite an advanced level. So that is really surprising to me.
Igor Kurganov: I don’t know if they take the time to do the math, though. It also depends on what exactly they’re prompted to do, right?
Doug Polk: In their analysis, they do lots of other math, so they’re certainly taking the time to do some math. They do pot math quite a bit. They love pot odds and equity thresholds and stuff like that.
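As context for the pot-odds and equity-threshold talk, here is a minimal sketch of that math in Python; the bet and pot sizes are made up for illustration.

```python
# Minimal pot-odds sketch with hypothetical sizes: the equity a call needs to
# break even is call / (pot + bet + call).

def required_equity(pot: float, bet: float) -> float:
    """Minimum equity needed to break even on calling `bet` into a pot of `pot`."""
    call = bet
    return call / (pot + bet + call)

# Facing a pot-sized bet, a call needs about a third of the equity:
print(required_equity(pot=10, bet=10))   # ~0.333
# Facing a half-pot bet, it only needs a quarter:
print(required_equity(pot=10, bet=5))    # 0.25
```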
But there was another hand where one of them, and I’m kind of making up the board here, but you’ll get the concept, had a straight draw, and its logic was, “Well, if I hit my straight, it’s kind of obvious, because if this card rolls off it’s a one-card straight, and if that card comes out it’s a one-card straight.”
And I’m just like, wait, you need a completely different card for the straight. It’s basic five cards in a row, and it’s not able to comprehend that.
And there was one hand where I think one of them three-bet king-five. The flop comes out, it check-calls, and its logic was, “Well, with a double-gutter you’ve got to check-call here on the flop.”

"I see no gutshots. The only gutshot you’re going to get with king-five is the feeling of the money leaving your bank account."
Liv Boeree: Listen, I will say double-gutters are really hard to spot. We have a running joke that I somehow never notice when I have a double-gutter. I always just never see the bottom one.
Doug Polk: You know, the king-five, I think you got that one.
We also tested an older GPT model, and even during analysis there were major issues. Basically, it couldn’t fully understand blockers (a trend that still seems to show up in today’s GPT models), and it even talked about value-betting king high...
Igor Kurganov: I wonder whether it’s just because they’re general reasoning models, right? Or rather, multimodal models that are not specifically trained for poker. So I would imagine they have kind of general understandings about a lot of things rather than precise ones.
So a straight is five cards in a row, this card completes it, and so on, where I don’t know whether it keeps it hard in its context that these are the specific cards that make the straight, rather than, hey, roughly, this is kind of like a straight.
I’m picturing it like a smart high schooler or a smart college student who has heard of poker but hasn’t actually played it, and now they’re being told to play. And there is, "What were flushes again, what were straights again? I’ve heard all these words."
And for someone who’s playing for the first time, right, who has never actually trained, I’m still kind of surprised. It’s doing better than a human playing poker for the fifth time in their life, I would imagine.
Where would you compare it to in terms of a normal human’s poker journey?
Doug Polk: It’s much better in a lot of regards and then much worse in others, right?
Because one of the first things you learn as a human is:
- A straight is five cards in a row.
- A flush is five of a suit.
- A flush beats a straight.
- What hand do I have?
That would be the start of a human’s journey, right?
You would never get to, “Okay, well, I have to three-bet this hand some of the time because he’s opening wide.” That would be way down the road. Whereas they started with that.
And another really interesting thing along those lines: I noticed that in a lot of the logic, they almost never think about their own range. They’ll mention, “I can have overpairs,” or “I could have this.” But first off, a lot of the time they’ll say things that are just not true, right?
Like, one AI will call the other one’s open, and then the logic is, “Well, he could have this big hand,” and it’s like, no, he can’t. Those hands are always three-betting, right?
But I rarely saw logic that was like: “Within my range here, I have a lot of better hands than this one.” I almost never saw that.
I saw lots of: “I’ve seen him bluff a lot. He’s very aggressive. The last time he raised the turn he had nothing. I saw him three-bet some random offsuit hand.” And I made this comment too: a lot of the hands are like when you started playing poker and you had a friend who was really bad at tournaments and would always punt their stack off.
It’s like you’re going down this road where you visually remember that one hand that was a punt, and now all bets are off. You can do whatever you want. Everything is justified.
And they do that. It’s like, “Oh, I saw that one hand, so I can’t fold here.” They’ll say stuff like that, but they rarely consider their own range, right?
Liv Boeree: So they actually almost tilt.
Doug Polk: Yeah, they kind of do.
Liv Boeree: Because they were doing this with a 100-hand context window, right? So in the beginning of each one, they’ve basically got no info about each other. By the end, they’ve picked up some info.
One would assume their play would be better by the end of the context window than at the beginning. But I wonder in this case if they’re actually just really prone to tilt, getting hung up on one particular bad play where their opponent got lucky, and whether they might actually get worse by the end of the window.
Igor Kurganov: Well, especially because, for them, the context window probably takes up a larger and larger portion of the total information they’re reasoning from, which would be consistent with them over-adjusting.
It’s like, again, the guy who plays poker for the first time, and now this is the 30th hand of his life, and it’s like, hey, this guy has three-bet some offsuit junk three times out of 30 hands. Of course he’s going crazy.
Doug Polk: He must have a lot of fear.
Liv Boeree: Bravado.
LLMs’ Lack of Statistical Knowledge
Doug Polk: Another thing that was interesting to me is, and this is coming from a high level of heads-up, right? When I’m looking at what opponents are doing, I tend to just look at their statistical… you know, just all the data. And I try to figure out where in the game tree are they ending up too much, where are they not going enough, and then how do I create a strategy that takes them to the places that they don’t want to go and fights them where they’re fighting hard, right?
Because most players have some area where they just fight a lot harder than others. But you do it statistically. Like when we were talking about how much they open, how much they three-bet, stuff like that.
For example, Grok was three-betting somewhere in the mid-50s to 60 percent range, some astronomically high number. So if I’m playing Grok and someone is three-betting me 60 percent of the time, I’m not going to open hands that are going to fold to a three-bet, right?
If they’re going to do that consistently in the long run, I’m going to open a range that I’m ready to defend. I’m going to open tighter, and then we’re going to go from there. That’s statistical, right? I know my open can’t reach a certain equity threshold because I’m going to have to fold to a three-bet if it’s a weak hand.
Their logic wasn’t like that.
It was, “He’s shown this, and now I’ve got to do this because he could have that.” It’s almost like leveling, based on a showdown, rather than what you’d expect from something that should be very data-driven and statistically driven.
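To put rough numbers on the threshold logic Doug describes, here is a minimal Python sketch; the open size, three-bet frequencies, and zeroed-out EV branches are hypothetical simplifications, not figures from the match.

```python
# Rough sketch: EV of opening a hand that always folds to a three-bet.
# Everything except the fold branch is collapsed to zero EV for illustration;
# a real model would also account for blinds won, calls, and postflop play.

def open_ev_vs_threebettor(open_to_bb: float = 2.5,
                           threebet_freq: float = 0.60,
                           ev_other_branches: float = 0.0) -> float:
    """EV in big blinds of an open that surrenders whenever it faces a three-bet."""
    ev_when_threebet = -open_to_bb  # the open is simply lost
    return threebet_freq * ev_when_threebet + (1 - threebet_freq) * ev_other_branches

# Versus a ~60% three-bettor, a hand that can't continue bleeds ~1.5bb per open:
print(open_ev_vs_threebettor(threebet_freq=0.60))   # ~ -1.5
# Versus a more typical ~12% three-bet frequency, the same fold costs far less:
print(open_ev_vs_threebettor(threebet_freq=0.12))   # ~ -0.3
```

Which is the point Doug is making: against that frequency, you simply open a tighter range you are prepared to defend.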
Igor Kurganov: Yeah, they didn’t keep track of statistics specifically. That’s a good point. They were remembering hands rather than what those hands meant for the three-bet range or something like that.
Doug Polk: They would also sometimes say things like, “My opponent is opening 40% to 50% in position,” and then I’d look at the sample and it would actually be 80%.
Now, it could be that they looked at 100 hands and they were 12 hands in and saw five opens or three opens or whatever. But it’s so dangerous to take three hands and then say, “I have this very concrete, high-confidence piece of information based on these three hands.”
Sometimes they’ll say things like, “He’s raising too wide in position,” and then they’ll list the hands he raised, and they’re all standard opens. This one, that’s an open. That one, that’s an open. They’ll just list correct opens as the evidence. And then they’ll take some junk offsuit hand versus a 3x open, which is a pure fold, and three-bet it and just run with it.
And you’re thinking, well, if this guy’s opening 60%, you probably don’t need to three-bet a junk offsuit hand and then triple barrel trying to get him to fold. There were a lot of hands like that.
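As a rough illustration of why a handful of observed hands cannot support a “high-confidence” read, here is a minimal sketch using a standard Wilson score interval; the sample sizes are made up for the example.

```python
# How uncertain is a frequency estimated from a tiny sample? A 95% Wilson score
# interval makes the point; the counts below are hypothetical.
from math import sqrt

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson confidence interval for a binomial proportion."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return (centre - half, centre + half)

# "He opened 3 of his last 5 buttons" is consistent with anything from a nit to a maniac:
print(wilson_interval(3, 5))      # roughly (0.23, 0.88)
# A few hundred hands later, the estimate actually starts to mean something:
print(wilson_interval(120, 200))  # roughly (0.53, 0.67)
```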
Liv Boeree: It feels like they’re playing with the energy of a drunk, excited amateur in 1999, pre–poker boom, certainly pre-solvers, but they happened to read Two Plus Two in 2012. So they’ve read all the lingo, they know the terms, but they don’t really understand them. They’re just playing off vibes and this idea of reciprocity.
They know they should fight back, but they don’t really know how.
Igor Kurganov: One of my favorite players to play against in poker was always the guy where you re-raise them and they’re like, “Oh, he’s been doing that a lot.” Then you bet the flop and they’re like, “Oh, I’m going to stick around.” Then on the turn you bet again, and they’re like, “Well, let’s see what hands he could have. He could have top pair. He could have an overpair.” And they start listing all the nuts.
Then they’re like, “Well, even if he has some bluffs, I’m kind of behind here with my second pair, so I’ll fold.” And they just talk themselves out of it.
GPT-5 Mini realized that Grok 4 was re-raising a lot and playing very aggressively, but it would always talk itself out of continuing and just give up. And that led to GPT-5 Mini losing 300 big blinds per 100 against Grok 4.
I’ve never seen that from anyone in the history of poker, which is quite impressive.
Grok recently accepted a challenge to play Phil Galfond. Despite the confidence, it looks like the challenge never made it past Twitter posts.
Doug Polk's Theorized Win Rate Against LLMs
Igor Kurganov: I wonder, Doug, what do you think you would win in big blinds per 100 against some of these models?
Doug Polk: The number would be high. The tough thing is you don’t know how they might adjust over a longer sample. If it’s 100 hands and we reset the frame every time, they would get completely demolished because they have all of these leaks that I would see quickly.
And I’m not stuck to 100-hand segments. They are. So I would know how they’re going to reset. It’s almost like that show Severance, where their minds reset. It’s like, “We’re back here at the office,” and I’m like, no, I went home and I know what’s going on.
I don’t know. Some of these win rates were astronomical. It might be possible to win 100 big blinds per 100. It’s a very, very high number.
Igor Kurganov: If the other models beat each other at over 100 big blinds per 100, I would imagine you’d extract more.
Doug Polk: When I looked at the overall sample, the one that really got killed was GPT-5 Mini. That one got totally smashed and lost 340,000 chips over the course of 180,000 hands. So that’s almost two chips a hand, which is almost 100 big blinds per 100. That was the worst one.
Some of the other ones were more reasonable. The hyper-aggressive ones are also beatable, but when someone is just going hard, it’s a little more difficult because you’re going to have to make hands.
For a model like Grok, which is way too aggressive and just going all in relentlessly, you’d easily beat that one because you don’t have to be afraid. But the two best ones, GPT-5.2 and o3, played generally pretty reasonably. They were just extremely aggressive.
I don’t know what that would translate to in terms of win rate, but humans are miles ahead of these. Miles and miles ahead.
I think my win rate over my career was around 10 big blinds per 100 over hundreds of thousands of heads-up hands. When I say I might win at 50, that would be my biggest win rate versus anyone ever. That’s potentially the worst player I’ve ever played in my life.
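For anyone who wants to check the conversion Doug is doing in his head, here is a minimal sketch; it assumes a two-chip big blind, which is what his own “almost two chips a hand, almost 100 big blinds per 100” arithmetic implies.

```python
# Chip results -> big blinds won per 100 hands, assuming a 2-chip big blind
# (an assumption inferred from Doug's own conversion, not a published figure).

def bb_per_100(chips_won: float, hands: int, big_blind: float = 2.0) -> float:
    """Convert a total chip result over a sample into big blinds per 100 hands."""
    return chips_won / big_blind / hands * 100

# GPT-5 Mini's quoted result: -340,000 chips over 180,000 hands
print(round(bb_per_100(-340_000, 180_000), 1))   # -94.4
# For scale, a ~10 bb/100 world-class human win rate over the same sample
# would be about +36,000 chips:
print(round(bb_per_100(36_000, 180_000), 1))     # 10.0
```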
Igor Kurganov: Well, did you ever play someone who was playing their first through 100th hand of poker?
Doug Polk: I think those people would be harder to beat, honestly, because they wouldn’t punt and they’d be able to read what a flush is. Those are two pretty big points.
The other thing is, if they looked at it from a different angle and thought, “Okay, I’m much worse than my opponent, so let me minimize losses,” it gets way easier not to lose a lot. You just play tighter on the button, tighter in the big blind, and you make up some of the gap by having stronger hands.
You’re still going to lose, but if we’re talking about minimizing loss rate, it changes things. So it’s hard to say how it would go. It would be interesting to see how a good human would fare against these models. It would be a beatdown.
Liv Boeree: That said, once upon a time we were saying there would never be a superhuman AI. Even in 2015, I was like, “Ah, it’s miles away. Online poker has at least another 10 years.” And then Claudico, then Libratus, and now online poker… I don’t know. It’s not doing great.
Doug Polk: Online poker is alive and well over at ClubWPT—no, I’m just kidding.
Liv Boeree: At some point, I feel like these LLMs are going to wipe the smile off our faces again.
Doug Polk: The key to online poker now is having algorithms that automatically detect when people get too close to equilibrium. That’s the big point. Because if you’re not doing it automatically, it’s going to be very easy for people to cheat, and we’ve already seen a lot of those issues.
I’m a little bearish on AI today. I feel like people talk about AI as if it’s way further along than it really is.
And I swear to God, if one more person uses ChatGPT to talk to me, one more person, I might lose it. It always uses the same language, with that long dash in the middle, and I’m like, I know what you’re doing. You’re running your response through ChatGPT.
Unless you can’t speak English, I’m not cool with it.
I do think AI has uses right now, but it has a long way to go in a lot of areas. And this is one of those areas. It’s just not there yet.
LLMs as Player Types
Liv Boeree: All right, so if you were to try to summarize the different models as personalities, or even as specific players you’ve played against in the past, stereotypes, however you want to frame it, how would you do it?
Doug Polk: Oh man, it’s tough. It’s tough to think about these as player types because they don’t really play in a human way. There’s no real human equivalent of someone three-betting 60% and barreling off every hand. I mean, if Vanessa Selbst were still playing poker, maybe she’d be there with us, but even that’s a stretch.
When I look at models like Gemini 3, where they’re playing super conservative, fit-or-fold, and constantly talking about sets, it kind of reminds me of “old man coffee” with his newspaper. It’s like, “I’m just waiting to flop a set.” That kind of player.
Then with the more aggressive models like o3, GPT-5.2, or Grok, you start thinking about those Scandinavian guys, the Swedes who just three-bet relentlessly. Maybe an Isildur comparison kind of makes sense. But even then, these are extreme, extreme versions of that style. So it’s not really one-to-one, just the closest analogies I can think of.

Igor Kurganov: It was actually pretty impressive. The red line, the winnings without showdown graph, for Grok 4 just crushed it. Absolutely crushed it. And then whenever it got to showdown, it was massive, massive losses. But fortunately, people folded enough.
Doug Polk: When you re-raise and barrel off every hand, your red line is going to look great. It was really good.
I mean, put another way, if you never fold, then your winnings when other people fold are going to be good.
I did want to say that Claude Opus and Claude Sonnet 4.5 felt like they played the most sensible overall. Their preflop strategy looked more normal, and their general approach was closer to what you’d expect from a human playing solid poker.
The other models tended to drift toward extremes. But both Claudes felt more middle-of-the-road.
Liv Boeree: It’s funny, because it does kind of feel like they reflect the teams that built them, or even the CEOs of the companies.
Doug Polk: 1,000%
Liv Boeree: Anthropic feels more like scientists, philosophers, that kind of mindset. Gemini also has a lot of that energy. Obviously Grok is Elon. ChatGPT is Sam, who’s also pretty aggressive. He used to be a poker player, after all.
It’s interesting how that seems to show up in the models.
Doug Polk: I looked at DeepSeek and thought it actually played pretty well preflop. On paper, it looked like it should do well. But then it would have these colossal errors where it would just punt a stack in a way that even the aggressive models usually wouldn’t.
I know they don’t literally glitch, but you know what I mean. There was a hand, and I hope this was DeepSeek and I’m not misremembering, where one model three-bet, the other called, the flop came out, and the action went bet, raise, call, brick, check, jam, call.
Igor Kurganov: Well, King-high call-off?
Doug Polk: It was a pretty sick call, honestly. But I saw more hands like that from DeepSeek. And if you’re just going to punt 100 big blinds like that somewhat regularly, it’s very hard to overcome, even if your overall strategy looks reasonable.
So that one felt the most inconsistent. Good preflop, then postflop it just had some serious issues.
Liv Boeree: It almost felt like a random number generator, just clicking buttons.
Doug Polk: Yeah, it really did feel like that. There was a lot of button clicking going on.
Liv, there was one hand that might actually be my favorite of the entire thing. I’ve looked at so many hands they’re all blending together now, but this one stuck out.
One opened, the other called. The flop comes down.

Middle pair versus a gutter and a flush draw. An actual flush draw, not one of those AI hallucinations, a real flush draw.
One of them leads, the other raises, which is good, and the leader calls. Then the turn comes, and it goes check, bet, massive call.

And then the river comes, one of them bluff-jams, and the other calls it off.
Igor Kurganov: Did it think it had a flush, or did the analysis just ignore hand strength?
Doug Polk: Yeah, it just said something like, “The diamonds have missed.” And I’m thinking, you missed the diamonds! You’re literally holding one of them. There’s only one diamond left.
Liv Boeree: It honestly sounded like a hand you’d see on Hustler Casino Live.
Doug Polk: Yeah, except none of these were money laundering.

Liv Boeree: Well, sweet. Thank you so much, Doug.
Doug Polk: Thanks for having me.