OpenAI’s GPT-5.4 Goes Agent-Native, Matches or Tops Human Pros on 83% of Work Tasks, and Sparks an $80 “Hello” Debate

The leap: from model to work system

OpenAI has unveiled GPT-5.4, framing it as its strongest, most efficient flagship built for professional work — and, crucially, built for agents. Beyond speed or scale, the shift is architectural: GPT-5.4 natively controls computers, integrates Codex-level programming, supports million-token context, and coordinates tool use without trading off individual strengths, according to the company. In ChatGPT, the model appears as “GPT-5.4 Thinking,” with a new up-front reasoning overview so users can steer mid-run. It is rolling out on web and Android, with iOS to follow; GPT-5.2 Thinking retires on June 5, 2026. Chinese outlet Huxiu (虎嗅) first spotlighted the launch details for mainland readers.

Performance and capabilities

On GDPval — a benchmark simulating complete workplace deliverables across nine major U.S. industries — GPT-5.4 reached or exceeded industry-professional level on 83% of tasks (vs. 70.9% for GPT-5.2), OpenAI said. It reportedly scored 87.3% in a junior investment banking modeling scenario and produced presentation decks preferred by human judges 68% of the time. The agent-native control matters: in OSWorld desktop operations, GPT-5.4 achieved a 75% success rate, edging past a 72.4% human baseline and far above GPT-5.2’s 47.3%. Web automation also improved, with 67.3% on WebArena-Verified and 92.8% on Online-Mind2Web tests that rely on screenshot-only structure understanding. Coding performance is roughly at GPT-5.3-Codex levels, with a 57.7% SWE-Bench Pro score and faster, more token-efficient throughput. Third-party accounts bolster the picture: recruiting platform Mercor’s APEX-Agents evaluations highlight stronger long-form outputs (financial models, legal analyses, slide decks), while real-estate data firm Mainstay reportedly saw 95% first-try success and up to 3x faster completion across tens of thousands of HOA and tax portals, with roughly 70% fewer tokens consumed.

Price, products — and sticker shock

OpenAI prices the GPT-5.4 API at $2.50 per million input tokens and $15 per million output tokens, which is half the input price and 60% of the output price of Anthropic’s Claude Opus 4.6 ($5/$25); usage can also draw on subscription-based quotas. The Pro tier, though, is raising eyebrows: a single “Hi” reportedly triggered deep reasoning and burned through about $80, a cautionary tale suggesting that standard tiers may be better suited to everyday tasks. OpenAI also introduced a ChatGPT Excel plug-in that links the model directly into spreadsheet workflows, underscoring a push to embed AI across presentations, documents, and analytics rather than isolate it in chat.
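To make the per-token prices concrete, here is a minimal Python sketch that estimates the cost of one API request at the rates quoted above. The prices come from this article and have not been checked against any official price sheet. The `PRICES` table, the `estimate_cost` helper, and the token counts are hypothetical names and values chosen for illustration, and the sketch does not attempt to model Pro-tier billing.

```python
# Per-million-token prices in USD, as quoted in the article (assumed, not verified).
PRICES = {
    "gpt-5.4": {"input": 2.50, "output": 15.00},
    "claude-opus-4.6": {"input": 5.00, "output": 25.00},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the quoted rates."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a long document in, a short report out.
    tokens_in, tokens_out = 200_000, 20_000
    gpt = estimate_cost("gpt-5.4", tokens_in, tokens_out)
    opus = estimate_cost("claude-opus-4.6", tokens_in, tokens_out)
    print(f"GPT-5.4: ${gpt:.2f}")          # GPT-5.4: $0.80
    print(f"Claude Opus 4.6: ${opus:.2f}")  # Claude Opus 4.6: $1.50
    print(f"Ratio: {gpt / opus:.0%}")       # Ratio: 53%
```

Because output tokens cost six times as much as input tokens at these rates, the gap between the two models depends on the mix: an input-heavy job lands near half the competing price, while an output-heavy job drifts toward 60%.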

The China angle and geopolitics

Why does this matter for China’s tech scene? Agent-native reliability is a linchpin for enterprise automation and RPA — categories where Chinese vendors and integrators are active — yet OpenAI’s services are not officially available in mainland China. That leaves domestic champions such as Baidu (百度), Alibaba (阿里巴巴), and ByteDance (字节跳动) to race on home turf, while U.S. export controls on advanced chips continue to constrain training hardware supply for China-based labs. For global developers, GPT-5.4’s blend of computer use, code generation, browsing, and document intelligence suggests a shift from “smart tool” to “work system.” Is this the moment agents go mainstream? The benchmarks say yes; the bills — and access limits — will decide how fast.