The AI-Fiction Paradox: New arXiv paper says models need more fiction — but can't write it
The claim
A new arXiv preprint (arXiv:2603.13545) coins the term "AI‑Fiction Paradox" to describe a surprising tension in large language models: they are trained on massive corpora that include modern fiction, and the paper argues they need far more of that material to produce high‑quality narrative output, yet they struggle to generate convincing new fiction themselves. The submission is a working paper on arXiv and has not been peer‑reviewed. The author reportedly offers a theoretical analysis of why abundance of training data does not straightforwardly translate into generative competence for literary forms.
What the paradox means
At first glance the claim is counterintuitive. In machine learning, more and better training data usually yields better outputs. So why does fiction behave differently? The paper points to factors such as the structural complexity of narrative (long‑range coherence, character arcs, implicit world‑knowledge), licensing and sampling biases in corpora, and the divergent incentives that shape what text gets published and archived. The result: models can imitate surface patterns of prose but fail at the deeper, multi‑chapter craftsmanship that readers expect.
Implications for industry and policy
If the paradox holds up empirically, it matters for model builders, publishers, and regulators alike. Tech firms from Western labs to China's Baidu (百度), ByteDance (字节跳动) and Tencent (腾讯) draw on fiction for fine‑tuning and evaluation, so access to diverse, high‑quality narrative data could become a bottleneck. Copyright, platform moderation, and national content controls will shape which fiction is available for training, and geopolitical frictions over data flows and compute (export controls, sanctions) could further skew which languages and styles get represented. It has been reported that dataset curation choices already influence model behaviour in subtle ways; this paper adds narrative form to that list of fragile dimensions.
What comes next
The paper is a prompt rather than a conclusion: it calls for systematic empirical tests, cross‑linguistic datasets, and new evaluation metrics for long‑form generation. arXiv's platform, including arXivLabs collaborations, makes such early ideas visible to the community, but caution is warranted. Theory can point to a paradox; experiments must show whether it truly limits practical progress, or simply reflects current training and evaluation practices.
