虎嗅 (Huxiu) · 2026-03-17

Jensen Huang’s GTC Pushes NVIDIA (英伟达) from Chipmaker to AI‑Infrastructure King

What Huang announced

NVIDIA (英伟达) used its GTC keynote to redraw its own map. CEO Jensen Huang framed the company as an AI infrastructure integrator rather than “just” a chip vendor, unveiling the Vera Rubin platform, a Vera Rubin DSX factory reference design, an enterprise runtime called NemoClaw, a space‑computing initiative, and a Physical AI Data Factory blueprint for automating training data pipelines. Huang repeatedly stressed “token” economics—reportedly calling tokens “the new commodity” and claiming NVIDIA’s token cost is world‑class—underscoring his shift from training‑first messaging toward low‑latency, large‑scale inference and agentic AI.
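Huang's token-economics framing can be made concrete with a back-of-envelope sketch. All figures below (rack price, lifetime, power draw, throughput) are hypothetical placeholders, not numbers from the keynote: the point is only that cost per million tokens falls out of amortized hardware cost, energy, utilization and sustained throughput.

```python
# Back-of-envelope "token economics" for an AI-factory rack.
# All input figures are hypothetical placeholders for illustration.

def cost_per_million_tokens(rack_capex_usd: float,
                            amortization_years: float,
                            power_kw: float,
                            usd_per_kwh: float,
                            utilization: float,
                            tokens_per_second: float) -> float:
    """Blend amortized hardware cost and energy into $ per 1M tokens."""
    seconds_per_year = 365 * 24 * 3600
    capex_per_s = rack_capex_usd / (amortization_years * seconds_per_year)
    energy_per_s = power_kw * usd_per_kwh / 3600  # kW * $/kWh -> $/s
    effective_tps = tokens_per_second * utilization
    return (capex_per_s + energy_per_s) / effective_tps * 1e6

# Hypothetical rack: $4M capex, 4-year life, 120 kW at $0.08/kWh,
# 70% utilization, 2M sustained tokens/s across the rack.
print(round(cost_per_million_tokens(4e6, 4, 120, 0.08, 0.7, 2e6), 4))
```

The structure makes Huang's pitch legible: doubling rack-level throughput at similar capex and power roughly halves token cost, which is why the announcements emphasize rack-scale integration rather than per-chip speed.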

Hardware and system design

At the heart of the announcements is Vera Rubin, described as a system that integrates seven chip types across five rack designs to cover pretraining, post‑training, testing and real‑time agent inference. It has been reported that the new Vera CPU targets agentic AI and reinforcement learning with an 88‑core design, support for up to 1.5 TB of memory and high NVLink bandwidth to GPUs, and that the Vera CPU is in mass production with partner deliveries planned later this year. Huang positioned the CPU, GPU, DPU and network as a single, deeply coupled data plane for AI factories rather than discrete components.

Low‑latency inference and the Groq LPU

NVIDIA also signaled a heterogeneous inference strategy by folding in Groq technology: Groq 3 LPU chips are slated to handle the decode/token stage that GPUs struggle with. It has been reported that Groq 3 LPU uses large on‑chip SRAM and very high on‑chip bandwidth (claims include ~500 MB SRAM and up to 150 TB/s on‑chip bandwidth) to keep data resident and cut latency, with ~1.2 PFLOPS of FP8 compute. NVIDIA has reportedly licensed Groq’s tech and absorbed some team members; the result is an NVL72 rack unit that NVIDIA describes as an “AI factory core,” combining 72 Rubin GPUs and 36 Vera CPUs with NVLink 6 and DPU fabrics to deliver much higher rack‑level throughput and lower token cost.
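The claim that decode favors SRAM-heavy designs rests on a simple bound: at small batch sizes, generating each token streams roughly every model weight through memory once, so single-stream decode speed is capped by memory bandwidth, not FLOPS. A minimal sketch of that arithmetic, assuming a hypothetical 70B-parameter FP8 model and a placeholder 8 TB/s HBM figure (only the 150 TB/s number echoes the reported Groq 3 claim):

```python
# Rough upper bound on single-stream (batch-1) decode rate:
# tokens/s <= memory_bandwidth / bytes_read_per_token,
# where bytes_read_per_token ~ total model weight bytes.
# This ignores KV-cache traffic and the fact that a 500 MB SRAM
# holds a large model only when sharded across many chips.

def decode_tokens_per_s(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-bound ceiling on batch-1 decode throughput."""
    return bandwidth_bytes_per_s / model_bytes

MODEL_BYTES = 70e9   # hypothetical: 70B parameters at 1 byte/param (FP8)
HBM_BW = 8e12        # hypothetical GPU HBM bandwidth: 8 TB/s
SRAM_BW = 150e12     # reported Groq 3 on-chip bandwidth: 150 TB/s

print(decode_tokens_per_s(MODEL_BYTES, HBM_BW))   # HBM-bound ceiling
print(decode_tokens_per_s(MODEL_BYTES, SRAM_BW))  # SRAM-bound ceiling
```

Under these placeholder numbers the ceilings differ by the bandwidth ratio (150/8, about 19×), which is the latency argument for routing the decode stage to LPU-style hardware while GPUs handle the compute-dense prefill and training stages.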

Why this matters — and the geopolitical dimension

For Western and Chinese audiences alike, the announcements signal a race for end‑to‑end AI infrastructure: from data‑generation blueprints and enterprise runtimes (NemoClaw) to orbital compute. NVIDIA’s space computing plan — moving data‑center‑class compute into constrained spacecraft environments — illustrates ambitions beyond terrestrial cloud. It has been reported that many performance and production claims await independent verification, and geopolitical context matters: U.S. export controls, supply‑chain constraints and national AI strategies mean platform moves like these will be watched closely by regulators and cloud customers worldwide. Is NVIDIA simply scaling chips, or seeding the next global infrastructure sweep? Huang’s GTC answer was clear: the company intends to do both.
