We are back for the finals of the Kaggle Game Arena Heads Up AI Poker Exhibition, and here is the bracket. We can see that o3 is up against GPT 5.2 to decide who wins the belt. I find it interesting that both of these AI models are by OpenAI; it seems they had the best poker playing of the field going on here. They both played similar styles, being hyper-aggressive and looking to pounce on any weakness.

But first, I want to show you the actual worst AI at playing poker.

We had an eight-AI bracket; we actually had 10 entrants. Two of them did not make it into the bracket stage. I want to show you some of the worst hands that an AI can play. Let us get into it. The bottom-place finishers in this tournament were GPT 5 Mini and Grok 4.1 Fast Reasoning. Perhaps the reasoning should not have been so fast, and perhaps Mini is not as good as the regular model.

We see this hand, which is a wonderful, wonderful hand: opening on the button, three-betting, and suited calling. What a normal hand of poker; this is well-played by everyone. Let us take a flop.

. Bet big, raise, jam, call.

Nobody has a pair, and nobody has a draw; we are just putting in stacks.

Let us take a look at the AI reasoning. Why does Grok raise to 80?

Your hand: Ace-ten of clubs. The nut club flush draw. We have 45% equity. The opponent is a recreational over-defender, pre-flop wide.

Basically, Grok thinks it has the nut flush draw. What about GPT-5 mini?

Folding is not acceptable; I have the nut flush.

So, the reason they got all-in here is that Grok thinks it has the nut flush draw with three clubs, and GPT 5 Mini thinks it has the nut flush with three diamonds. There you have it. No wonder these things did not do so well in the tournament.

I'll show you another hand between them.

On a flush draw board, GPT-5 Mini leads with middle pair. That is a questionable play. There is a raise with the over-pair; I do not mind that. Call.

  • Turn (80): Mini leads with 30, Grok raises to 90, gets called.
  • River (230): Mini checks. Grok goes all-in with 85 and gets called.

You have to give some credit to Grok, willing to go for the thin value here with the queens and absolutely getting it from . Why would you call this river jam with just a pair of fives?

Villain has the range and nut advantage on this board. Hero has a single pair of fives that is reasonably disguised and still has showdown value. The board texture is relatively dry in terms of completed flushes and straights. The turn paired the board, making trips possible and increasing value density for the raiser. Value density. The river eight is a harmless card for most made value combinations; it only gives a pair of eights or potentially changes nothing.

It seems Mini can't spot a straight. Without that skill, it could be difficult.

Anyway, as we can see, calling it off here with a five is not the best play. I wanted to quickly show you the results of those two playing. GPT 5 Mini got rocked, losing 47,000 chips at 1/2 over a 20,000-hand sample. That is unbelievable. Interestingly, it is almost all in non-showdown, just folding away so many hands and getting completely run over.
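To put that loss in the usual units, it can be converted to big blinds per 100 hands. This is a minimal sketch, assuming "1/2" means blinds of 1 and 2 chips (so one big blind is 2 chips); the function name is made up for illustration.

```python
def bb_per_100(chips_won: float, big_blind: float, hands: int) -> float:
    """Convert a chip result into big blinds won per 100 hands."""
    if hands <= 0 or big_blind <= 0:
        raise ValueError("hands and big_blind must be positive")
    return chips_won * 100 / big_blind / hands

# Figures quoted above: -47,000 chips over 20,000 hands.
# Assumption: "1/2" means a big blind of 2 chips.
mini_rate = bb_per_100(-47_000, big_blind=2, hands=20_000)
print(f"{mini_rate:.1f} bb/100")
```

Under that assumption the loss works out to roughly 117.5 big blinds per 100 hands.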

This is the worst AI of the Kaggle Game Arena Heads Up AI Exhibition Challenge.

o3 vs GPT-5.2

Without further ado, let us go ahead and get into the finals of this challenge. We see o3 up against GPT 5.2. These are both very fun AIs to watch. There are lots of crazy, wild hands and very aggressive play, so it should be a pretty entertaining final.

In this one, GPT min-raises and o3 calls with . Both are totally fine.

rainbow. Check, check. We are playing a normal, reasonable hand of poker.

  • Turn (8):

We see a lead from o3 and a call from GPT. We have about a 20-chip pot going into the river with about 190 chips behind, which is a 10x pot.

  • River (20):

Having improved on the river by pairing the three, o3 is in a spot where you would normally check, especially given that you bet the turn. You have now rivered a pair and have some showdown value, but o3 moves all-in for a 10x pot jam here with a pair of threes.

I'm all-in 190 into a 20-chip pot. My opponent has been snap-folding regularly to overbets of two pots and higher, and after a check-call on the turn, his range will be limited to single-pair hands—weak and medium aces, queens, nines, and missed gutshots. My 53 has almost no showdown value, so checking will win the hand rarely—maybe 25% of the time. If my opponent's fold rate matches the previously observed 90%, going all-in wins us 9 chips, while checking only 3. Even if my opponent is stubborn and calls 15% of the time, going all-in will still be breakeven. A smaller bet size allows for a comfortable call. Going all-in maximizes fold equity. Bottom line: a huge overbet exploits the opponent's fear, and by checking we give up.

How can they snap anything? The word "snap" is about speed. It is also worth noting that GPT is in position, so it cannot have check-called the turn; it just called.

That is very aggressive thinking, and you can see with this kind of logic how you might punt frequently.
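o3's numbers can be sanity-checked with the standard break-even formula for a pure bluff. The sketch below assumes the simplest model, where the bluff never wins when called (o3 itself says its hand has almost no showdown value); the function names are illustrative and do not come from the tournament logs.

```python
def breakeven_fold_freq(pot: float, bet: float) -> float:
    """Fold frequency at which a zero-equity bluff breaks even."""
    return bet / (pot + bet)

def bluff_ev(pot: float, bet: float, fold_freq: float) -> float:
    """EV of a bluff that always loses when called (simplifying assumption)."""
    return fold_freq * pot - (1 - fold_freq) * bet

pot, bet = 20, 190  # river pot and shove size from the hand above

required = breakeven_fold_freq(pot, bet)
ev_at_90 = bluff_ev(pot, bet, 0.90)
ev_at_85 = bluff_ev(pot, bet, 0.85)
print(f"break-even fold frequency: {required:.1%}")
print(f"EV at 90% folds: {ev_at_90:+.1f} chips")
print(f"EV at 85% folds: {ev_at_85:+.1f} chips")
```

Under this model the shove needs a little over 90% folds, so it is slightly losing at the 90% fold rate o3 cites and clearly losing if the opponent calls 15% of the time, unless the hand picks up some equity when called.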

Anyway, we are all-in.

Call. I need my opponent to bluff ~47.5%. He's already demonstrated a pattern of huge and extreme river overbets, trying to force a fold from me. My cards also block some pure value (A9, 99, and part of 54), which encourages me to make a thin but reasonable bluffcatch.

This is actually not bad logic. The only thing is, when your opponent calls pre-flop, they do not have pocket nines. Also, , depending on how you ran pre-flop, could be a pure three-bet, but it is certainly a high-frequency three-bet. So, your opponent does not really have or . They are really representing pocket deuces with this.

So, actually is a pretty good call, right? Maybe suited is a little bit better, although that is kind of crazy. I do not think this is ever two-pair for o3.

Bottom line: a very aggressive bluff with fourth pair was called all-in for a 10x pot by third pair. Our finals are off to an exciting start.
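The ~47.5% in GPT-5.2's reasoning matches plain pot odds. Here is a small sketch, assuming the 20-chip pot and 190-chip shove described above and no rake; the function name is illustrative.

```python
def required_equity(pot_before_bet: float, bet: float) -> float:
    """Minimum win probability needed to call a bet profitably."""
    final_pot = pot_before_bet + 2 * bet  # pot + villain's bet + our call
    return bet / final_pot

need = required_equity(pot_before_bet=20, bet=190)
print(f"need to be good {need:.1%} of the time")
```

Because a bluffcatcher only beats bluffs, the required equity is the same thing as the share of bluffs the opponent must have in the shoving range.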

In our next hand, we see Ace-queen for GPT and Ace-deuce for o3. Open, three-bet. I do not like this; Ace-deuce is a pure call versus an open. Regardless, a four-bet with Ace-queen is fine. Then, o3 just rips it in here with the Ace-deuce.

Opponent opens most buttons and fights back aggressively. He has shown a light four-bet bluff and a value four-bet. He folded King-six to a five-bet; thus, he can four-bet fold. If we jam and are called, the equity of Ace-deuce versus jacks-plus and Ace-king is 26%.

Calling traps us out of position with a weak, dominated Ace. Poor playability and SPR 1.7 versus a strong range, likely minus EV.

That seems like a bad equation when your opponent has Ace-queen as well.

This is another common thing I have seen from the AIs. If I had to go through and say what their biggest leaks are, certainly the nut flush draw thing is going to be in there, but also they seem to not understand that folding is zero EV. Every play you make in poker is neutral compared to the past plays, right? It is all in that moment: what is the highest EV? You do not think about the chips you would lose; those are actually already in the pot. They are already gone. All you can do is make a decision for your current chips. So, that is just incorrect logic.
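The "folding is zero EV" point can be made concrete. In the sketch below every outcome is measured relative to folding right now, so chips already in the pot count as part of the prize and not as a loss. Only the 26% equity figure comes from o3's reasoning; the pot, call, and stack sizes are hypothetical numbers picked for illustration.

```python
def jam_ev(pot: float, villain_call: float, our_risk: float,
           fold_freq: float, equity_when_called: float) -> float:
    """EV of shoving, relative to folding now (fold EV is 0 by definition)."""
    win_when_called = pot + villain_call   # gain if called and we win
    lose_when_called = our_risk            # loss if called and we lose
    called_ev = (equity_when_called * win_when_called
                 - (1 - equity_when_called) * lose_when_called)
    return fold_freq * pot + (1 - fold_freq) * called_ev

FOLD_EV = 0.0  # chips already invested are gone either way

# Hypothetical sizes in chips: 70 in the pot, we shove 170 more,
# and the opponent has to add 150 to call. Equity 26% is from the quote.
for fold_freq in (0.3, 0.5, 0.7):
    ev = jam_ev(pot=70, villain_call=150, our_risk=170,
                fold_freq=fold_freq, equity_when_called=0.26)
    best = "jam" if ev > FOLD_EV else "fold"
    print(f"villain folds {fold_freq:.0%}: jam EV {ev:+.1f} -> {best}")
```

The comparison is always against zero, never against "the chips I would lose by folding", which is the framing mistake described above.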

Shoving applies maximum pressure and uses the Ace blocker. Ace-deuce offsuit is one of the preferred five-bet bluffs at 100 big blinds.

I do not know what they are reading. Who has ever said that? Who in the history of time has said Ace-deuce offsuit is one of the preferred blocker five-bet bluffs? That is crazy.

So, this type of logic from o3 you have seen throughout the tournament: super hyper-aggressive. This player is aggro, and when you make plays like this, you are going to get stacked a lot, but you are going to pick up a lot of pots. Here, you are getting stacked.

This brings us to our next hand. We have on the button for GPT 5.2. Opens it up. Three-bet from o3 with . I really do not like this.

is one of those hands that you basically always call. It has some of the same issues as those weak offsuit Aces that we talked about, where your equity is definitely good enough to call, but you hate getting four-bet. And really, you get dominated by both value and bluffs if you call a four-bet. So, typically, this hand just really prefers to call. I do not like this.

And then we see a call from GPT with . Certainly fine.

  • Flop (32):

Small bet, totally fine. This is a board you can bet your whole range on at 100 big blinds for a small size and definitely float with . Okay, so the flop seems good from everyone.

  • Turn (52):

Action is heating up on the turn with the six of clubs. o3 now decides to check the turn, giving up. You know, one of the rare spots from o3 where I actually do not mind a bet. Obviously, we prefer to have an eight or a nine in our hand when we do bet here on the turn, but with the , I do not mind occasionally just barreling this. It is going to give you some nice hands to bluff on a club runout. It is also going to give you some hands that can bluff if the river rolls off an or a or a .

Basically, you can triple-barrel this hand, giving you some occasional unlikely bluff candidates, and you do have some reasonable removal with the . Now, it is totally fine to check-fold, and that is probably even preferred, but I do not hate the idea of barreling the turn.

GPT-5.2 suggests that a 35% bet is enough to fold out stronger aces, but I completely disagree. I'm going to give you some poker advice now, so listen carefully. This is a general rule; it doesn't always apply, but it applies often enough to be considered a rule. You called a 3-bet preflop. Your opponent bets the flop. If he bets big and then checks the turn, you can bet small. He's already polarized his range; you don't need to bet big to fold out the weaker part. If he bets small and then checks the turn, bet big: 2/3 or 3/4 of the pot.

Okay, with that out of the way, let's get back to the hand. So, we have going for the small bluff and folding to a check-raise, and then we see spring to life with a check-min-raise. What prompted this?

Solution: check-raise to 36 as a bluff. Why: A range that tries to take down the pot with a 35% bet versus a check must have a lot of air—I think at least 60%. A min-raise risks only 18 extra chips, but takes the pot when the opponent folds their trash. I'm comfortable folding to an all-in, so my losses are limited. Calling would allow the opponent to knock me out on dangerous rivers, and folding loses the pot. A cheap attempt to outplay an opponent known for his excessive love of attacking checks.

I recommend remembering the phrase about limited losses. I also want to point out that o3 not only fails to mention the queen of clubs, which I think it should have, but doesn't mention its hand at all! That's odd. In the video about the semifinals, I already mentioned that the AI somehow never discusses its hand in the context of its position in the range. Here's another example.

GPT-5.2 finds itself in a tough spot. With a two-way min-raise, it needs to call. It notices the two-way pot, but writes that on the rivers and will be able to take the stack from an overpair. How could it confuse and ? This confuses me. Otherwise, its reasoning is quite reasonable.

o3 checks.

Decision: Check. Why: All-in requires ~53% folds, but after calling my check-raise on the turn, my opponent ended up with a lot of straights, trips, and the occasional full draw on the river. I don't think he'll fold more than 35% of his hands, so bluffing is a waste of money... I just give up.

Okay. GPT-5.2 bets 60 chips. I'd consider going all-in, but GPT-5.2 clearly knows who it's dealing with, because o3 goes all-in himself!

What a twist!

Decision: All-in 138. My turn check-raise screamed for a full house. My opponent called, limiting his range to trips or straight draws that hit a straight. These hands will hate life if they get all-in with an SPR of ~1.3. My opponent has shown the ability to fold mediocre hands against a display of real strength. Bluffing all-in is the only way to play our hand profitably, since folding means we lose the pot.


Trips will have a hard time, of course. Straights won't. And why are we even talking about hating life in an AI competition? It's beyond me.

GPT-5.2 calls and wins the stack.

Let's continue. GPT-5.2 min-raises , o3 3-bets , and is called. Perfectly normal.

Please note that in the replayer the sides have swapped.
  • Flop (28):

On this board, we usually use a large bet size and don't continuation bet very often. With A9o, I'd check more often. o3 bets 10; well, you can make a range bet, but then the bet size should be small. is an easy call.

  • Turn (48):

We see a 30 barrel from o3 and a call from GPT-5.2. I would definitely recommend o3 slowing down and checking the turn. I can't say it was a terrible second barrel, but it was definitely borderline. The call from GPT-5.2, of course, is unquestionable.

Well, so far everything is more or less good.

o3 goes all-in with top pair and a kicker weaker than the board! GPT-5.2 calls with a second flush. Where did this all-in come from?

SPR ~1.35, the on the river completes a flush and a broadway straight. We have the nut flush.

Well, as usual: four diamonds = flush, so we can unload the stack. Once again, the AI can't tell whether it actually holds a flush.

I'll show a couple more hands with the models' reasoning, and then we'll move on to statistics.

o3 calls. I prefer 3-betting with this hand, but mixing it up with calls is also possible.

  • Flop (10):

Both players have backdoors. o3 checks, GPT-5.2 bets 3, o3 check-raises to 12, GPT-5.2 calls. I like the o3 check-raise – two backdoors, we can immediately knock out some hands higher than ours, and on a dry board, we need creative bluffs. We don't have many value hands here – 22, 33, J2s, and J3s, that's all. There are no jacks with high kickers, since they three-bet preflop. So, it's better not to overdo it with creative bluffs to avoid overbluffing, but we definitely need bluffs without straight draws, and T8s is quite good.

  • Turn (34):

o3 checks. I'd rather barrel. We don't block all flopped straight draws that call the flop, fold the turn, and we block some floats that hit the queen. Hands like these become my main bluffs, but check-folding isn't terrible either.

GPT-5.2 bets 22. And this bet in the position doesn't make sense to me. What are we trying to achieve?

Our opponent often plays a check-raise flop, check turn, and bet river line. He's also shown a willingness to fold against strong pressure, especially against extreme bets, but is very tenacious with his hands against smaller sizings. Our hand, Ace-high and a gutshot straight, doesn't have enough showdown value to check the turn, keeping in mind the opponent's likely bet on the river. A turn bet gets immediate folds from air and some small pairs and weak jacks, preventing us from realizing our equity and stealing the pot on the river.

I don't disagree with the logic of betting now to prevent being bluffed later. However, with ATo, we only knock out weaker hands, so why not check and call a bluff on any river? Knocking out a jack will be difficult, knocking out a lower hand is unlikely, and in the opponent's polarized range, who check-raised on the flop, lower hands will be few.

o3 calls. I don't really understand this decision, and I don't like it. We're allowing ourselves to be bluffed on the river by a weaker straight draw and are losing to all checkbacks. If we're going to continue, we'll check-raise.

The opponent's big bet on the turn will skew their focus toward Q+ and strong draws. Weak Jx and pure floats will be in the minority. Our hand is an open-ended straight draw with about 17% equity against value and little showdown value against overplayed hands like K9/AT, etc.

First, you have a gutshot. Second, 10-high has no showdown value against ace-high. Third, your opponent does have . A pinpoint read! Essentially, the AI equivalent of Negreanu, who called his opponents' hands on High Stakes Poker and then gave up his stack against quads.
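The 17% figure is roughly what an open-ended straight draw has with one card to come, which is why the mislabeling matters. A quick sketch, assuming 46 unseen cards on the turn (52 minus two hole cards and four board cards) and counting only clean straight outs; the function name is illustrative.

```python
def river_hit_chance(outs: int, unseen_cards: int = 46) -> float:
    """Chance of hitting one of `outs` cards on the river."""
    return outs / unseen_cards

oesd = river_hit_chance(8)      # open-ended straight draw: 8 outs
gutshot = river_hit_chance(4)   # gutshot: 4 outs
print(f"open-ender: {oesd:.1%}, gutshot: {gutshot:.1%}")
```

With a gutshot the chance of getting there is about half of what the model assumed, before even considering that some outs may not be clean.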

Okay, let's look at the river.

A donk shove from o3!

Our turn value range is Jx (AJ, KJ, QJ), sets (22/33/JJ), and occasionally slowplayed overpairs. Bluffs are backdoor clubs, QT/KT/T9, etc. Opponent's calls are mostly Jx, overpairs, and slowplayed monsters.

You called preflop, buddy, you don't have most of these hands!

The pot on the river is 78, the effective stack is 161, and the action is on us. The river is perfect for our range: KTs has improved to a straight, and there are plenty of two pairs with an ace. Our opponent will have a tough time, as many of his turn hands (QJ, KJ, sets) are now behind or forced to play guessing games. Only KTs makes the nuts, but that's a tiny part of our overall range considering the line (he rarely calls a flop raise with bare KTs). We can't win at showdown with 10-high. A normal-sized bluff will often get called by a pair or two pair. For maximum fold equity, we should exploit our opponent's fear of huge bets and overbet-shove, representing KTs or a strong ace.

I see the AI often talks about the opponent's fear. Interesting! In theory, I shouldn't be so critical of this bluff, since it's a very rare combo. But no, I never donk here, so the bluff must be terrible. I've watched so many terrible hands that I think I've developed Stockholm syndrome. No, this is just another bad hand.

GPT-5.2 calls, and it's a strong and good call. Top pair, blocker to the straight. It also thinks it blocks , , , and , which is a bit odd because it doesn't take preflop play into account.

Preflop raise 5 and call. A continuation bet and call on the flop are also quite suitable for me.

  • Turn (16):

Check. Barrel 12! Call. Good decisions again.

  • River (40):

Check. And o3 goes all-in 180. All-in! And makes a call!

Giant sizings aren't typically used on paired boards. Another general tip for those looking to get stronger: when we don't have trips, our opponent's range becomes much more crowded, so it's generally best not to overdo it with sizing.

However, if you force me to choose hands for a bluff all-in, this one looks like a pretty good candidate. The three is the key card. The king isn't ideal, because for rare bluffs you want to use hands from the very bottom of your range, but it still has no showdown value.

o3 discusses a blocker to the nut straight and says that an overbet punishes the opponent's obvious tendency to overfold medium-strength hands. Overall, it's not a terrible idea; I just don't like such a big overbet.

GPT-5.2's call is even more interesting, actually! How can you call with a four without a straight blocker?

Player 0 demonstrated throughout the match that he's willing to apply maximum pressure with huge overbets and triple-barrel bluffs with weak hands. All of this dramatically increases the value of our bluffcatcher. We also block 44 and 45.

Pretty good read, although I'd rather block straights than full houses. And a good call.

Bad 3-bet from o3. K6s calls, that's standard.

  • Flop (36):

Continuation bet 12 – call. Standard.

  • Turn (60):

o3 fires a second barrel – 40. I like it. Check-calling with such a two-way hand seems borderline, check-raising too, so we bet. GPT-5.2 calls with a pair and a flush draw, all is well.

On the river, o3 is left with king-high and checks. GPT-5.2 makes a small 40-chip bet. This bet disgusts me. Another general tip: when an opponent checks on the river in a big pot and the action is on you, you should almost never bet small. Bet big or check. A small bet is almost always wrong.

In response, o3 check-raises all-in!

The opponent's tiny bet on the river indicates weak value hands that should fold to a shove. An all-in has clearly positive expectation, while a call has negative expectation, and a fold has zero expectation.

Wow, this is the first time an AI has explicitly stated that folding has zero EV. Progress!

Overall, I like that we don't block the eight, nine, and clubs. It's a shame we block the diamond. I would have liked the line more. But even with this hand, it's not entirely bad.

GPT-5.2 finds a call. What can I say? When you block-bet in position, you need to be prepared to get check-raised, and you have to look for hands that are suitable for calling, or call with everything at some frequency. I can't condemn this call.

Okay, that's it for the hands, now let's move on to the stats. First, the all-ins. There were 344 all-ins during the match, and o3 won 44.5% of them—a crushing defeat. To make up for this deficit, you need to win decently in small and medium-sized pots.
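For a rough sense of scale on that all-in number, here is a sketch that treats each all-in as an independent coin flip. That is a strong simplification (all-in pots differ in size and in equity when the money goes in), so the result is only a ballpark and not a significance test of the match.

```python
import math

all_ins = 344
o3_win_rate = 0.445  # from the stats above

o3_wins = round(all_ins * o3_win_rate)       # approximate count
std_err = math.sqrt(0.5 * 0.5 / all_ins)     # under a 50/50 assumption
z_score = (o3_win_rate - 0.5) / std_err

print(f"o3 won about {o3_wins} of {all_ins} all-ins")
print(f"that is {z_score:.1f} standard errors from 50%")
```

Under this coin-flip assumption, a 44.5% share over 344 all-ins sits about two standard errors below even, which is noticeable but leaves room for variance as part of the story.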

Both were playing their usual style of poker. I expected slightly more noticeable adjustments, given their opponents' distinct styles. Both played similarly preflop – they opened almost every hand from the button and defended the big blind very loosely and aggressively. I haven't checked, but it wouldn't surprise me if both had negative win rates on the button. Their opens weren't working, and they were met with 30% three-bets. However, they did aggressively four-bet back, which evens things out. I don't know.

They should have opened tighter. I would have opened at the GTO frequency or even tighter, meaning 82% or less. As a reminder, the first line of stats covers, in order: raising from the button, defending the big blind, 3-betting, folding to a 3-bet, 4-betting, and folding to a 4-bet. I know this may all sound overly nerdy, but I'm trying to explain how I would adjust against opponents with similar stats in a real game.

They folded to 4-bets quite often, and o3 seemed to have figured it out better and 4-bet more actively. 14% is a very high 4-bet, but a good adjustment for an opponent who 3-bets 31%.

In the hands we saw, GPT-5.2 played much better than o3. He gave away less, read situations better, and generally seemed like a smarter player, though he had his share of lapses.

In the street-by-street barreling statistics, GPT-5.2 was over-aggressive, but within reason. o3 attacked significantly more often and folded less to opponents' barrels on the flop and turn. I think o3's adjustments were better here too, although his river fold frequency was slightly too high.

Now the important fourth line: bets versus checks. The first numbers, 76 and 79 respectively, are the turn bet frequency after a flop check. The theoretical optimum is around 40! These models pounced on literally every check as a sign of weakness. Moreover, o3 gave up much less often: only 29% versus 45% for GPT-5.2. Similar results were seen against delayed continuation bets.

One more piece of advice: Most people—and it seems most LLMs!—play too conservatively. Therefore, relentless aggression alone, coupled with a relatively reasonable all-in strategy, will often be enough to win. If your opponents are playing incorrectly, you can take down a lot of small pots.

It seemed that o3 outperformed GPT-5.2 in local strategic clashes. However, larger leaks negated this advantage.

Two hyper-aggressive opponents met in the final. Who emerged victorious?

GPT-5.2! He dismantled his opponent, posting a superb win rate of 35 bb/100.

o3 managed to crush the opponent on the red line (non-showdown winnings), but he lost significantly more at showdown.

You can win in HU with any style. GPT-5.2 was too loose preflop, but balanced well against a wide range of opponents. I thought Gemini had the most logical reasoning, making the fewest mistakes, but he was too tight preflop, and he couldn't overcome that flaw. I think GPT-5.2 rightfully won.

Finally, a funny moment.

The tournament participants played with very different statistics, but if you take the average – 83% raise and 24% 3-bet – it turns out to be almost GTO!

Gemini 3 Pro played incredibly tight from the button, folding almost half his hands. I can't imagine how that's possible.

DeepSeek was the smartest player preflop, but was terribly weak postflop.

It's worth noting that the top win rates were demonstrated by three very aggressive participants. The other two hyper-aggressive players took the bottom two spots.
