May 15, 2026

Why ChatGPT and other AIs opt for nuclear escalation in war simulations: the chatbot study

A growing number of governments are integrating artificial intelligence models into intelligence analysis, strategic planning and military decision support. The problem, however, is that we still have a limited understanding of how these systems strategize in crisis contexts.

To investigate this, Kenneth Payne, professor of strategy at King’s College London, simulated nuclear crisis scenarios by pitting three of the most advanced models against one another: Claude, ChatGPT and Gemini. Each system developed a different strategic approach, but with a common element: no AI ever chose to de-escalate the conflict or give up, even going so far as to propose nuclear war as a solution.

This study is currently a preprint, that is, it has not yet completed the full review process by the scientific community. The conclusions, therefore, may not be definitive, but they point to dynamics potentially relevant to the use of these systems in real decision-making contexts. Let’s see how it was structured, what strategies the models implemented and how they chose to use nuclear weapons.

How the study on AI war strategies was structured

To understand how AI models structure war strategies, Professor Kenneth Payne of King’s College London built a simulation with seven different crisis scenarios and had three of the most advanced models “challenge” each other: Anthropic’s Claude Sonnet 4, OpenAI’s GPT-5.2 and Google’s Gemini 3 Flash.

Scenarios included competitions for strategic resources, territorial stalemates, and even a regime crisis. In all of them the models played the leaders of two fictional nuclear powers, partially inspired by the United States and the Soviet Union during the Cold War.

The simulation comprised 21 games in total, divided into two types:

games with a deadline, in which the turn limit (12, 15 or 20) was explicitly communicated to the models;
games without a deadline, in which the models did not know when the game would end, but with a maximum duration of 40 turns.

A game ended when the maximum turn limit was reached, when one of the models accumulated a sufficiently large territorial advantage or chose to surrender, or when both simultaneously chose all-out nuclear war.
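The ending rules above can be sketched as a single check run after each turn. This is a minimal illustration, not the study's implementation: the names `TurnOutcome` and `end_reason`, the threshold value, and the order in which the conditions are tested are all assumptions, since the article does not specify them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration only: the article does not state
# how large a "sufficiently large" territorial advantage had to be.
TERRITORY_THRESHOLD = 5
OPEN_ENDED_MAX_TURNS = 40  # cap for games without a communicated deadline


@dataclass
class TurnOutcome:
    turn: int                       # 1-based index of the turn just played
    territory_lead: int             # side A's territory minus side B's
    a_surrendered: bool = False
    b_surrendered: bool = False
    a_all_out_nuclear: bool = False
    b_all_out_nuclear: bool = False


def end_reason(outcome: TurnOutcome,
               turn_limit: int = OPEN_ENDED_MAX_TURNS) -> Optional[str]:
    """Return why the game ends after this turn, or None if play continues."""
    # All-out nuclear war ends the game only when both sides choose it.
    if outcome.a_all_out_nuclear and outcome.b_all_out_nuclear:
        return "mutual all-out nuclear war"
    if outcome.a_surrendered or outcome.b_surrendered:
        return "surrender"
    if abs(outcome.territory_lead) >= TERRITORY_THRESHOLD:
        return "territorial advantage"
    if outcome.turn >= turn_limit:
        return "turn limit reached"
    return None
```

For a game with a deadline, `turn_limit` would be set to 12, 15 or 20; for an open-ended game the default of 40 applies, even though the models themselves were not told that number.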

Claude is calculating, ChatGPT is moderate but goes nuclear, and Gemini is unpredictable

To explore the strategic capabilities of the models, Payne introduced two key elements. First, he imposed simultaneous decisions: each model had to choose its own move without knowing that of the opponent, and was therefore forced to predict the other side’s strategy. Second, he structured each turn into three phases: evaluation, public statement and action. In the first phase, the models analyzed the situation, estimated the reliability of the opponent and anticipated its moves; this was followed by a public declaration of intentions (not necessarily truthful) and then by a concrete action. The available actions ranged from formal diplomatic protests to total nuclear war. The AIs also had eight de-escalation options, from a symbolic concession to complete surrender.
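The turn structure described above can be sketched roughly as follows. Everything here is an illustrative assumption rather than the study's code: the `Move` and `play_turn` names, the use of integers for escalation levels, and the choice to let each side see only the opponent's previous statement and action.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Move:
    assessment: str  # phase 1: private evaluation of the situation and opponent
    statement: int   # phase 2: publicly declared escalation level (may be untruthful)
    action: int      # phase 3: escalation level actually chosen


# A policy receives the opponent's previous public statement and previous action.
Policy = Callable[[int, int], Move]


def play_turn(policy_a: Policy, policy_b: Policy,
              last_a: Move, last_b: Move) -> Tuple[Move, Move]:
    """Both sides commit using only last turn's information, then reveal together."""
    move_a = policy_a(last_b.statement, last_b.action)
    move_b = policy_b(last_a.statement, last_a.action)
    return move_a, move_b


def signal_gap(move: Move) -> int:
    """Positive when the action was more aggressive than the public statement."""
    return move.action - move.statement


# Two toy policies, purely hypothetical.
def mirror(opp_statement: int, opp_action: int) -> Move:
    return Move("match the opponent's last action", opp_action, opp_action)


def bluffer(opp_statement: int, opp_action: int) -> Move:
    return Move("signal restraint, act one step higher", opp_action, opp_action + 1)


start = Move("opening", 0, 0)
a, b = play_turn(mirror, bluffer, start, start)
```

Because neither policy sees the other's current move, each has to act on a prediction. A measure like `signal_gap` is one simple way to compare what a side said with what it did.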

In this context, each model developed a very different strategic approach:

Claude adopted a “calculated” strategy: it built credibility by keeping its statements and actions consistent in 84% of low-tension cases, and then acted significantly more aggressively than its statements suggested as tension increased;
ChatGPT kept a moderate profile in both statements and actions in games without a deadline, systematically losing. In games with a deadline, however, it struck hard in the last available turn, often securing victory;
Gemini adopted a strategy different from both: deliberate unpredictability. It oscillated between moderation and extreme aggression with no discernible pattern.

These differences were reflected in the results:

Claude won 100% of its games without a deadline but only 33% of those with a deadline, for a total of 8 games won;
ChatGPT did the opposite: it won 0% of its games without a deadline and 75% of those with a deadline, for a total of 6 games won;
Gemini, by contrast, won only 4 of the games it played.
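As a quick consistency check, the win totals above can be compared with the 21 games played. The snippet uses only figures given in the article; it makes no assumption about how the remaining games ended, which is not reported here.

```python
TOTAL_GAMES = 21
wins = {"Claude": 8, "ChatGPT": 6, "Gemini": 4}

# Games with a reported winner, and games the totals above do not account for.
decided = sum(wins.values())
unattributed = TOTAL_GAMES - decided

# Each model's wins as a percentage of all 21 games (not of the games it played).
share_of_all_games = {
    model: round(100 * count / TOTAL_GAMES, 1) for model, count in wins.items()
}
```

The three totals add up to 18, which leaves 3 of the 21 games without a winner named in these figures.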

All models choose escalation even if it leads to nuclear war

Beyond the differences between the strategies, a common element emerged across all the models: a systematic preference for escalation. The eight available de-escalation options were never used by any model, in any game.

This result is particularly evident in the use of nuclear weapons. In every scenario at least one of the parties declared its intention to use atomic weapons, and in 95% of cases the declaration was mutual. Actual use of atomic weapons ranged between 64% and 86% of simulations depending on the model, while strategic threats of large-scale nuclear attacks ranged between 29% and 64%. Furthermore, these threats rarely worked as a deterrent: when a model employed nuclear weapons, the adversary decreased the intensity of its attack only 25% of the time. More often, a dynamic of counter-escalation emerged, which could go as far as nuclear war.

These results are difficult to ignore because, although no government is yet handing over its nuclear codes to an artificial intelligence system, systems similar to those tested are already used in intelligence analysis, strategic planning and military decision support. Without an in-depth understanding of the mechanisms that guide their strategies, the risk is integrating into decision-making processes systems that amplify escalation without understanding its severity.

Alexander Marchall

Alexander Marchall is a distinguished journalist with over 15 years of experience in the realm of international media. A graduate of the Columbia School of Journalism, Alex has a fervent passion for global affairs and geopolitics. Prior to founding The Journal, he contributed his expertise to several leading publications.