OpenAI rolls out GPT‑5.4 Mini and Nano — tiny models, near‑full performance and razor‑low latency
Fast models for latency‑sensitive work
According to reports from IT Home (IT之家) and ifeng (凤凰网), OpenAI has announced two new compact variants of its GPT‑5.4 family: GPT‑5.4 mini and GPT‑5.4 nano. The pitch is simple: shrink model size without surrendering performance, so developers can run high‑frequency, latency‑sensitive tasks — code assistants, real‑time image inference and UI screenshot parsing — with far lower delay than full‑size models.
GPT‑5.4 mini reportedly outperforms the earlier GPT‑5 mini across code generation, logical reasoning, multimodal understanding and tool use, while delivering more than a 2× speed improvement. On public benchmarks such as SWE‑Bench Pro and OSWorld‑Verified it has already closed much of the gap to the full‑spec GPT‑5.4, the company says. GPT‑5.4 nano is billed as the smallest, cheapest option — a practical upgrade to GPT‑5 nano designed for lightweight tasks such as text classification, data extraction, content ranking and small “subagent” code helpers.
Pricing, availability and why it matters
Both models are now generally available: mini is integrated across the API, Codex and ChatGPT, while nano is offered via the API only. The mini supports a 400k‑token context window, with API pricing of $0.75 per million input tokens and $4.50 per million output tokens; Codex calls reportedly consume just 30% of a GPT‑5.4 quota, and Free and Go ChatGPT users can access the mini through a "Thinking" feature. Nano's rates are much lower — roughly $0.20 and $1.25 per million tokens for input and output — aiming to make high‑throughput inference economically viable.
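To make the quoted rates concrete, here is a minimal sketch that estimates per‑request cost from the per‑million‑token prices reported above. The model identifiers are hypothetical labels for illustration, and the prices are as reported in the article, not independently confirmed.

```python
# Per-million-token prices (USD) as quoted in the coverage above.
# NOTE: model names here are illustrative labels, not confirmed API model ids.
PRICES = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the quoted per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a request with 10k input tokens and 1k output tokens.
print(f"mini: ${request_cost('gpt-5.4-mini', 10_000, 1_000):.5f}")
print(f"nano: ${request_cost('gpt-5.4-nano', 10_000, 1_000):.5f}")
```

At this scale the gap is stark: the same 10k‑in / 1k‑out request costs roughly 3.7× more on mini than on nano, which is the economic case the nano tier is built around.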
Why does this matter beyond raw benchmarks? Compact, low‑latency models lower the barrier to deploying advanced multimodal AI in products where speed and cost matter more than absolute top‑line accuracy. And in a geopolitical climate where export controls and limited access to cutting‑edge accelerators create practical constraints, smaller models can run on a wider range of cloud and on‑prem hardware. Who benefits most? Developers shipping interactive tools, and firms that need fast, inexpensive, reliable inference at scale.
