What Did Lin Junyang (林俊旸) See?

The argument

Lin Junyang (林俊旸), who reportedly left Alibaba (阿里巴巴) after helping run training for its Qwen models, has published a wide-ranging English-language essay, From "Reasoning" Thinking to "Agentic" Thinking, first posted on X. It is his first systematic write-up since leaving the Qwen effort, and it makes a clear, provocative claim: the core object of AI progress is shifting from "a single model" to "an agent plus its environment." Why does that matter? Because once thinking is trained in service of action, improving the model alone is no longer enough to win.

Technical prescriptions

Lin does more than rename the problem. He argues that agentic thinking must be trained as a system: environment design, high-throughput verification, robust evaluators, multi-agent coordination and infrastructure matter as much as architecture and data. He calls for clearer decoupling of training and inference, careful multi-agent division of labor, and explicit attention to reward-hacking pitfalls. These concrete suggestions reportedly draw on hard lessons from Qwen3 and other experiments. He compares approaches at OpenAI, Anthropic and elsewhere, praising Anthropic’s task-driven selection of thinking modes and warning that earlier “long, slow” reasoning models are brittle for many real workloads.
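To make "verification" and "reward hacking" concrete, here is a minimal Python sketch of a verifiable reward for math answers. It is an illustration only and is not taken from Lin's essay or from Qwen's training code: the `ANSWER:` output convention, the `verify` function and the all-or-nothing scoring are assumptions chosen for the example.

```python
import re
from fractions import Fraction

# Hypothetical convention: the model must end its output with "ANSWER: <number>".
ANSWER_RE = re.compile(r"ANSWER:\s*(-?\d+(?:/\d+)?)\s*$")


def verify(completion: str, expected: Fraction) -> float:
    """Return 1.0 only if the completion ends with exactly one correct answer."""
    # Guard against a simple reward hack: listing several candidate answers
    # in the hope that one of them matches.
    if completion.count("ANSWER:") != 1:
        return 0.0
    match = ANSWER_RE.search(completion.strip())
    if match is None:
        # No final answer, or extra text hedging after it.
        return 0.0
    try:
        value = Fraction(match.group(1))
    except ZeroDivisionError:
        # Malformed answers such as "1/0" earn nothing.
        return 0.0
    # Exact rational comparison, so "6/8" and "3/4" count as the same answer.
    return 1.0 if value == expected else 0.0
```

A checker like this is cheap enough to run on every rollout, which is the sense in which verification has to be high-throughput. The strictness about format is the point: an evaluator that accepted "ANSWER: 4 or maybe 5" would reward hedging rather than solving.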

Why it matters

If Lin is right, competition will shift toward who can build better training environments and run reinforcement learning (RL) feedback loops at scale, not just who trains the biggest checkpoint. That reframes product and engineering strategy: environment construction becomes an independent capability and even an entrepreneurial lane, and reliable, verifiable feedback (math, code, logic) becomes the currency that lets RL scale. This argument lands against a backdrop of intense US–China competition over AI tooling and export controls, where Chinese labs must both replicate Western advances and build their own system stacks under geopolitical pressure. Who owns the environments, and the real-world feedback they provide, may determine the next wave of winners.