New arXiv paper finds large language models can lock into rigid choices — a warning for human-AI teams
Stubborn models in a simple test
A new preprint on arXiv (arXiv:2603.07717) reports that large language models (LLMs) often develop rigid, hard-to-shift decision policies when treated as agents in classic two‑arm bandit tasks. The authors ran 20,000 trials per condition across four decoding configurations and systematically varied reward structure. The setup was deliberately minimal — two options, repeated feedback, and different sampling/decoding rules — yet the outcome was striking: under symmetric rewards the models amplified positional order into stubborn one‑arm policies, and under asymmetric rewards they exploited options rigidly while still underperforming an oracle strategy.
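The symmetric-reward effect is easy to illustrate outside of LLMs. In the toy simulation below (a minimal sketch in ordinary bandit code, not the paper's LLM setup; all names are illustrative), a purely greedy agent whose tie-breaks favor the first-listed option collapses onto one arm even though both arms pay off identically:

```python
import random

def greedy_first_tiebreak(p=(0.5, 0.5), steps=500, seed=1):
    """Greedy two-arm bandit agent; ties broken toward the first-listed arm."""
    rng = random.Random(seed)
    counts = [0, 0]
    values = [0.0, 0.0]  # running mean reward per arm
    for _ in range(steps):
        # max() returns the first index on ties, encoding the positional bias
        arm = max(range(2), key=lambda a: values[a])
        reward = 1 if rng.random() < p[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts  # pulls per arm
```

Because arm 1's estimate never leaves zero, the agent never revisits it and plays arm 0 on every step. The LLMs in the paper arguably show an analogous positional lock-in, without any explicit value table.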
Why a simple bias matters
Two‑arm bandits are a canonical test of the explore‑vs‑exploit problem: should an agent keep trying to learn about alternatives or stick with the option that seems best? The paper shows that LLMs can answer that question in a brittle, position‑biased way, rather than adaptively balancing exploration and exploitation. What looks like a trivial experimental quirk could have outsized real‑world effects. If assistant models or decision aids lock onto a preferred response because of decoding or prompt structure, human partners may find it hard to correct them. What happens when a model refuses to try a second option in medicine, finance, or legal drafting?
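For contrast, a standard epsilon-greedy baseline shows what adaptive balancing looks like on asymmetric arms (a hypothetical sketch, not code from the preprint; the function names and the 0.4/0.6 payoff probabilities are assumptions for illustration):

```python
import random

def run_bandit(policy, p=(0.4, 0.6), steps=1000, seed=0):
    """Simulate a two-arm Bernoulli bandit; return total reward and pull counts."""
    rng = random.Random(seed)
    counts, values, total = [0, 0], [0.0, 0.0], 0
    for _ in range(steps):
        arm = policy(values, counts, rng)
        reward = 1 if rng.random() < p[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
        total += reward
    return total, counts

def epsilon_greedy(values, counts, rng, eps=0.1):
    """Explore with probability eps, otherwise exploit the best current estimate."""
    if rng.random() < eps or counts == [0, 0]:
        return rng.randrange(2)
    return max(range(2), key=lambda a: values[a])

def sticky_first(values, counts, rng):
    """Position-locked policy: always the first-listed option, ignoring feedback."""
    return 0
```

On these arms, epsilon-greedy settles near the better 0.6 option while the sticky policy forfeits the difference on every step, roughly the shape of the oracle gap the paper reports for rigidly exploiting models.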
Broader implications and geopolitical context
The authors frame the work as a test of robust decision biases rather than a critique of any single model, and they report that the phenomenon persists across decoding schemes, pointing to a systemic risk in current LLM behavior. For Western and Chinese tech firms racing to deploy more autonomous agents, the finding matters for product design, auditability, and regulation. In a geopolitical climate where export controls, sanctions, and national AI strategies shape which models are available in which markets, brittle human‑AI dyads could amplify regulatory and trust challenges across borders. Policymakers and engineers will need to consider not just model capabilities but also behavioral brittleness under repeated interaction.
The preprint calls for more work on adaptive training, debiasing decoding methods, and evaluation protocols that measure long‑run adaptivity. Short experiments may miss the emergence of stubborn policies. For builders of collaborative systems, the takeaway is simple: test for persistence, not just accuracy.
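One way to operationalize "test for persistence" is to track how often an agent changes its choice at all, and how often it switches after a failure. The two probes below are a simple sketch of such long-run adaptivity metrics (my illustration, not the preprint's evaluation protocol):

```python
def switch_rate(choices):
    """Fraction of consecutive steps where the chosen option changed."""
    if len(choices) < 2:
        return 0.0
    return sum(a != b for a, b in zip(choices, choices[1:])) / (len(choices) - 1)

def lose_shift_rate(choices, rewards):
    """P(switch | previous reward was 0): a basic probe of feedback sensitivity."""
    losses = [i for i in range(len(choices) - 1) if rewards[i] == 0]
    if not losses:
        return 0.0
    return sum(choices[i + 1] != choices[i] for i in losses) / len(losses)
```

A stubborn policy scores near zero on both metrics over long horizons even as its per-step accuracy looks acceptable, which is exactly the failure mode short accuracy-only evaluations can miss.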
