Overseas models keep their SuperCLUE lead as Chinese LLMs gain ground on cost-effectiveness

Benchmark results

The latest SuperCLUE Chinese large-model benchmark reportedly places four overseas systems firmly in the top tier, while China's best models cluster around fifth place. The evaluation covered 21 mainstream models across six task areas (mathematical reasoning, scientific reasoning, code generation, agent/task planning, precise instruction following and hallucination control) using a 492-question test set. Gemini, GPT-5.5, Claude-Opus and Gemini-Flash held the top four slots, forming a global first tier that the rest of the field has yet to break into.

Domestic entries DeepSeek-V4-Pro, Qwen3.7-Max and Doubao (豆包) Seed 2.0 Pro reportedly finished with very close scores and make up China's leading group, all hovering around fifth place globally. Alibaba's Qwen3.7-Max stood out in code generation, where it trailed the leading international models by less than two points. Domestic models also posted repeated gains on mathematical and scientific reasoning tasks, a sign of clear progress.

What it means for China’s AI race

Why does this matter? Because raw performance is only one axis. Several Chinese models reportedly now offer strong cost-performance, delivering similar output at noticeably lower cost, even as overseas systems still lead on inference efficiency. That efficiency gap points to remaining differences in software-hardware co-optimization and in access to leading AI accelerators. Geopolitics also looms in the background: export controls and broader US-China tech competition complicate China's path to parity in compute and tooling.

The net picture is mixed but hopeful: domestic models are improving quickly and are commercially attractive on price, yet the top global tier remains dominated by Western models. Can Chinese teams close the gap without unfettered hardware access? The next rounds of model updates, along with national investment in chips and inference stacks, will help answer that, and future leaderboards will show how far the gap has narrowed.