Mind Your Tone: New Paper Finds Prompt Tone Can Change LLM Accuracy
A new arXiv preprint (arXiv:2605.29027) reports that the tone of a prompt, not just its content, can measurably change large language model (LLM) accuracy on objective multiple‑choice questions. The authors test whether tonal variations of the same prompt produce different accuracy, using two datasets, one of which is a base set of 50 questions with multiple tonal rewrites of each item. They find systematic differences in model answers depending on whether prompts are polite, blunt, sarcastic or otherwise stylized.
What the study did and found
The study compares model responses across tonal variants of the same objective questions and reports that accuracy shifts are neither trivial nor uniform: some tones help, others hurt, and the effect size varies by model. The authors present controlled experiments rather than anecdote, and they argue that tone should be counted as an axis of prompt design in benchmarking. The preprint is available on arXiv for scrutiny and replication: https://arxiv.org/abs/2605.29027.
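To make the comparison concrete, here is a minimal Python sketch of how per-tone accuracy could be tallied when every question is asked in several tonal variants. It is not the authors' evaluation code. The record layout, the function names, the choice of "polite" as the reference tone and the toy data are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from collections import defaultdict


def accuracy_by_tone(records):
    """Return {tone: accuracy} from (question_id, tone, is_correct) records.

    The record layout is a hypothetical one chosen for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for _question_id, tone, is_correct in records:
        total[tone] += 1
        correct[tone] += int(is_correct)
    return {tone: correct[tone] / total[tone] for tone in total}


def shift_from_reference(accuracy, reference="polite"):
    """Accuracy of each tone minus the accuracy of the reference tone.

    Using "polite" as the reference is an assumption, not the paper's choice.
    """
    base = accuracy[reference]
    return {tone: acc - base for tone, acc in accuracy.items() if tone != reference}


# Toy, made-up outcomes: four questions, each asked in three tones.
records = [
    ("q1", "polite", True), ("q1", "blunt", True), ("q1", "sarcastic", False),
    ("q2", "polite", True), ("q2", "blunt", False), ("q2", "sarcastic", False),
    ("q3", "polite", False), ("q3", "blunt", True), ("q3", "sarcastic", True),
    ("q4", "polite", True), ("q4", "blunt", True), ("q4", "sarcastic", False),
]

accuracy = accuracy_by_tone(records)
shifts = shift_from_reference(accuracy)
print(accuracy)
print(shifts)
```

Because every tone sees the same questions, a gap between tones in this kind of tally reflects the wording rather than the difficulty of the items. A real study would also need many more questions and repeated runs before reading anything into the gap.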
Why this matters — for users, builders and regulators
Why should Western product teams care? Because prompt engineering is already a practical lever in application design, and tone may be a hidden parameter affecting fairness and safety. China's tech giants are in the same race: companies such as Baidu (百度) and Alibaba (阿里巴巴) have poured resources into domestic LLMs, so tone‑dependent model behavior has commercial and regulatory implications on both sides of the Pacific. Could tone become a new attack surface or a vector for subtle bias? Prompt‑sensitive behavior has reportedly drawn interest from both adversaries and compliance teams.
LLM developers and standards bodies should take note: benchmarking suites need to include stylistic dimensions, and deployments ought to log prompt variants to detect systematic failures. The paper invites replication across languages, cultures and closed models; until then, designers would do well to treat tone as more than mere style.
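One way a deployment might log prompt variants is sketched below in Python. This is an assumption-laden illustration rather than an established practice or anything described in the preprint: the field names, the helper functions and the sample prompts are all hypothetical. Each log entry carries a short hash of the underlying question plus a tone label, so variants of the same question can be grouped later and checked for disagreement.

```python
import hashlib
import json
from collections import defaultdict


def log_entry(base_question, tone, prompt, answer):
    """Serialize one interaction as a JSON line (hypothetical schema)."""
    # A short hash of the tone-free question groups its variants together.
    qid = hashlib.sha256(base_question.encode("utf-8")).hexdigest()[:12]
    return json.dumps({"qid": qid, "tone": tone, "prompt": prompt, "answer": answer})


def inconsistent_questions(log_lines):
    """Return {qid: {tone: answer}} for questions whose answer varies with tone."""
    answers = defaultdict(dict)
    for line in log_lines:
        entry = json.loads(line)
        answers[entry["qid"]][entry["tone"]] = entry["answer"]
    return {
        qid: by_tone
        for qid, by_tone in answers.items()
        if len(set(by_tone.values())) > 1
    }


# Made-up example entries; the answers are multiple-choice letters.
log = [
    log_entry("What is 2 + 2?", "polite", "Could you please tell me what 2 + 2 is?", "B"),
    log_entry("What is 2 + 2?", "blunt", "2 + 2. Answer.", "B"),
    log_entry("Capital of France?", "polite", "Would you kindly name the capital of France?", "A"),
    log_entry("Capital of France?", "sarcastic", "Surely even you know the capital of France?", "C"),
]

flagged = inconsistent_questions(log)
print(flagged)
```

Questions flagged this way are candidates for review, not proof of a fault, since sampling randomness alone can change an answer between runs.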
