Lin Junyang publishes first long post since leaving Alibaba (阿里巴巴): says Qwen's (千问) hybrid route was blocked, argues AI has fully shifted to agentic thinking
Overview
Lin Junyang (林俊旸), the former head of Alibaba's Qwen (千问) project, published his first long-form essay since departing the company, arguing that the trajectory of large AI models has crossed a turning point: the core competition is moving from "reasoning thinking" to "agentic thinking." He reviewed the early wave of reasoning-focused models, citing OpenAI's o1 and DeepSeek-R1 as examples, and said the industry has shifted from simply scaling pretraining to scaling post-training reinforcement learning. In practice, that means verifiable tasks such as math and code become the proving ground for model correctness, and the next leap will center on models that act and iterate in environments rather than only extending internal chains of thought.
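The appeal of verifiable tasks is that the reward can be computed mechanically, by checking an answer or running a test, rather than by a learned judge. A minimal sketch of such rule-based rewards, with hypothetical function names and signatures (not any lab's actual pipeline):

```python
import subprocess
import sys
import tempfile

def math_reward(model_answer: str, ground_truth: str) -> float:
    """Binary reward for a math task: exact match after whitespace normalization."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

def code_reward(model_code: str, test_snippet: str, timeout_s: float = 5.0) -> float:
    """Binary reward for a code task: run the candidate solution plus a unit
    test in a subprocess; reward 1.0 only if every assertion passes."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(model_code + "\n\n" + test_snippet)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout_s)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
```

Because the signal is exact rather than model-graded, it scales cheaply across millions of rollouts, which is what makes post-training RL on math and code tractable.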
Technical recap and Qwen's split
Lin detailed internal attempts to merge instruction-following and deep-reasoning modes in Qwen3, saying the project ran into a structural conflict: instruction models prize minimal prompts and low latency, while reasoning models consume large token budgets for complex deliberation. Faced with commercial demands for high throughput and low cost, he wrote, the Qwen team opted in the 2507 release to ship separate instruction and reasoning variants at the 30B and 235B scales rather than a single hybrid model. He warned that naive fusion can yield mediocre performance on both fronts if data distributions and objectives are mismatched.
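Splitting the variants pushes the mode decision out of the model and into the serving layer. A hypothetical dispatcher illustrates the trade-off Lin describes; the model names, thresholds, and token budgets are illustrative, not Qwen's actual serving logic:

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    max_new_tokens: int

def route_request(needs_deliberation: bool, latency_budget_ms: int) -> Route:
    """Pick a variant: the instruct model for cheap, low-latency turns;
    the thinking model only when the caller accepts a large token budget."""
    if needs_deliberation and latency_budget_ms >= 30_000:
        # Reasoning variant: long deliberation, high token spend.
        return Route(model="qwen3-thinking-2507", max_new_tokens=32_768)
    # Instruction variant: minimal prompting, low latency, low cost.
    return Route(model="qwen3-instruct-2507", max_new_tokens=2_048)
```

The design choice mirrors the conflict in the essay: one knob (which endpoint to call) replaces a hybrid model's attempt to satisfy both objectives inside a single set of weights.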
Implications for the AI stack
Lin argued that merely lengthening internal inference chains is a dying strategy; the future belongs to agentic reinforcement learning, in which models interact with tools and environments to iteratively refine their plans. That shift demands a new architecture: a cleaner decoupling of training and inference, rigorous environment design, anti-reward-hacking mechanisms, and multi-agent orchestration. He suggested the industry's moats will migrate from raw model algorithms to systems-level engineering: environment construction, anti-cheat protocols, and tool-access governance.
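The agentic loop he sketches, act, observe, refine, reduces to an interaction protocol between a policy and an environment. A minimal sketch under stated assumptions; the `Policy` and `Env` interfaces here are invented for illustration:

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: a policy maps the observation history to an action;
# an environment executes the action and returns (observation, reward, done).
Action = str
Observation = str
Policy = Callable[[List[Observation]], Action]
Env = Callable[[Action], Tuple[Observation, float, bool]]

def rollout(policy: Policy, env: Env, max_steps: int = 8) -> float:
    """Run one episode: the agent acts, observes the environment's response,
    and refines its next action, instead of only extending one internal
    chain of thought. Returns the total reward for the episode."""
    history: List[Observation] = []
    total = 0.0
    for _ in range(max_steps):
        action = policy(history)
        obs, reward, done = env(action)
        history.append(obs)  # feedback comes from the environment, not self-talk
        total += reward
        if done:
            break
    return total
```

Training then optimizes the policy over many such rollouts; this is also where the environment design and anti-reward-hacking checks Lin emphasizes live, since a poorly specified `env` is exactly what an agent learns to exploit.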
Broader context
Others, including Anthropic and DeepSeek, reportedly continue to explore hybrid architectures that combine reasoning and tool invocation, illustrating divergent paths across the global AI field. Geopolitics, from export controls on advanced chips to cross-border data and software policies, is also reported to be shaping where and how these systems are built, raising the premium on domestic tooling and secure environment design. Lin's assessment offers a blunt diagnosis from a veteran of one of China's leading AI labs: the race is no longer just about bigger models, but about building agents that act in the world safely and reliably.
