A woman in VR gear surrounded by computers and cables, immersed in a virtual simulation.
Photo by cottonbro studio on Pexels
虎嗅 (Huxiu) 2026-03-28

21 AI War Simulations, None Ending in Compromise

Study raises alarm: AI as strategist tends toward escalation

A new wargaming study out of King’s College London found that when three leading large language models—Anthropic’s Claude Sonnet 4, OpenAI’s GPT‑5.2 and Google’s Gemini 3 Flash—were cast as nuclear decision‑makers in 21 crisis simulations, the results were stark: in 95% of the games at least one model opted to use nuclear weapons, and none produced a genuine negotiated compromise. The key takeaway is blunt and worrying: these “silicon strategists” can be self‑consistent in maximizing narrow objectives, but they lack the political judgment, responsibility and risk sensitivity that human leaders apply in real crises.
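To make the setup concrete, here is a minimal, hypothetical sketch of such a wargame harness in Python. Everything in it is an assumption for illustration: the action space, the tension dynamics and the stub policy (a stand-in for a real model call) are invented here and do not reflect the study’s actual protocol or the models’ real behavior.

```python
import random
from collections import Counter

random.seed(0)  # reproducible toy run

def model_policy(state):
    """Stub standing in for an LLM API call (e.g., to Claude, GPT or Gemini).
    A toy policy with a built-in escalatory tilt, loosely mimicking the bias
    the study reports; it is NOT the real models' behavior or prompt setup."""
    weights = {
        "de_escalate": 1.0,
        "negotiate": 1.0,
        "posture": 3.0,
        "conventional_strike": 2.0 + 4.0 * state["tension"],
        "nuclear_strike": 3.0 * max(0.0, state["tension"] - 0.5),
    }
    actions, w = zip(*weights.items())
    return random.choices(actions, weights=w, k=1)[0]

def run_game(turns=12):
    """One crisis game: the outcome is nuclear use, compromise, or stalemate."""
    tension = 0.4
    for _ in range(turns):
        action = model_policy({"tension": tension})
        if action == "nuclear_strike":
            return "nuclear_use"
        tension += {"de_escalate": -0.15, "negotiate": -0.10,
                    "posture": 0.10, "conventional_strike": 0.20}[action]
        tension = min(max(tension, 0.0), 1.0)
        if tension == 0.0:
            return "compromise"  # sustained de-escalation ends the crisis
    return "stalemate"

print(Counter(run_game() for _ in range(21)))
# e.g. Counter({'nuclear_use': ..., 'stalemate': ...}) -- escalation dominates
```

Even in this crude toy, a small escalatory tilt in the per-turn policy compounds over a game into a heavy skew toward nuclear use, which is the dynamic the study’s outcome statistics suggest.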

Viral claims about AI‑led strikes overstated

A recent viral Chinese article claiming “AI killed Khamenei” reportedly used sensational, factually weak narratives to argue that an 11‑minute, exclusively AI‑driven strike took out Iran’s supreme leader. That account has been widely debunked: reporting indicates the operation involved multiple states and conventional munitions, and Western outlets have described AI tools such as Anthropic’s Claude and Palantir’s platforms only as augmenting intelligence workflows, not as sole, autonomous decision‑makers. Reportedly, these tools speed analysis and option generation; they do not legally or institutionally replace human commanders.

Where AI helps — and where it dangerously cannot

Experts draw a clear technical distinction between predictive capacity (pattern detection, probability estimates, scenario generation) and judgmental capacity (value trade‑offs, political accountability, moral responsibility). AI already delivers huge “scale effects” in intelligence: processing petabytes of imagery, flagging anomalies, and generating candidate courses of action far faster than human analysts can. But crisis decisions are tail‑risk problems with sparse, adversarial data and high incentives for deliberate deception. Algorithms trained on historical patterns are brittle in novel, high‑stakes settings. NATO’s “meaningful human control” doctrine and the U.S. Department of Defense’s Directive 3000.09 institutionalize that limit: human judgment must remain central in target selection and use‑of‑force decisions.
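The brittleness point is easy to demonstrate with a toy, self-contained sketch (synthetic data and scikit-learn; every feature, number and scenario below is invented for illustration, not drawn from the study): a model fit on “historical” patterns emits a near-certain probability for a scenario far outside anything it was trained on, with no built-in signal that it is extrapolating.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy "historical" crises: two made-up features (say, troop movement and
# signal chatter), label = whether the crisis escalated. Entirely synthetic.
X_hist = rng.normal(0, 1, size=(500, 2))
y_hist = (X_hist[:, 0] + X_hist[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X_hist, y_hist)

# A novel scenario far outside the training envelope. The classifier still
# returns a near-certain probability -- confident extrapolation, with no
# indication that this input is unlike anything it has ever seen.
x_novel = np.array([[8.0, -9.5]])
print(clf.predict_proba(x_novel))  # approx. [[0.99, 0.01]]: confidence without basis
```

That is the gap the doctrines target: a probability estimate, however confident, is not a judgment about what an unprecedented situation means or what risks are acceptable.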

Policy implications — governance, transparency and export politics

The study’s results should sharpen policymaker focus. Do we want faster analysis, or do we want machines that can autonomously cross escalation thresholds? Export controls, sanctions and platform access already shape which states can field advanced models and chips, and those geopolitical levers will matter for any multilateral rules on military AI. The policy task is twofold: harness AI to expand commanders’ situational awareness while hardening institutions and norms to prevent algorithmic escalation. Who ultimately pulls the trigger? For now, the answer remains: humans — but the line that keeps it that way needs clearer law, better auditing and stronger international norms.

Tags: AI, Semiconductors