No Longer Just a Chip Company: Why Is Nvidia (英伟达) Changing Tactics Now?
The pivot in public view
Jensen Huang (黄仁勋) has stopped talking like a chip-company CEO. After an unusually intense three-month public push, from a candid admission of constant anxiety on Joe Rogan's podcast to multiple long-form interviews and a high-profile GTC showcase, Huang has been systematically reframing Nvidia (英伟达) as more than a GPU vendor. It has been reported that Nvidia paid roughly $20 billion in cash for core assets of Groq; whether or not the exact figure holds, the signal is clear: this is a strategic bet on heterogeneous hardware plus orchestration, not just denser GPUs. Why the urgency? Because Huang is trying to change how Wall Street, hyperscalers and enterprise customers think about value in an era in which "tokens" (the output of continuous inference) may matter more than one-time training cycles.
Why chips alone are losing the story
The technical argument is simple and persuasive. Training a model is a large, episodic cost; inference is continuous, and its value compounds. GPUs excel at the massively parallel prefill stage of a request, but during token-by-token decoding their huge arrays of compute units sit largely idle: the bottleneck shifts from raw compute to memory bandwidth and deterministic latency. Groq's LPU architecture is reportedly optimized for exactly that decoding phase, with on-chip SRAM and very high memory bandwidth, and Nvidia showed a proposal pairing Groq 3 LPX racks with its Vera Rubin GPUs that it says can be many times more efficient for high-interactivity workloads. The reported numbers are striking; take them as directional evidence that hardware specialization matters, not as audited benchmarks.
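To see why decode saturates bandwidth rather than compute, a back-of-envelope model helps: each generated token must stream essentially every model weight from memory once, so per-sequence throughput is roughly memory bandwidth divided by model footprint. A minimal sketch in Python, using illustrative figures that are assumptions, not vendor specifications:

    # Back-of-envelope: why token-by-token decode is memory-bound.
    # All figures below are illustrative assumptions, not vendor specs.

    def decode_tokens_per_sec(params_billions: float,
                              bytes_per_param: float,
                              mem_bw_tb_s: float) -> float:
        """Each decoded token streams roughly all weights once, so
        throughput is capped by bandwidth / model footprint."""
        model_bytes = params_billions * 1e9 * bytes_per_param
        return (mem_bw_tb_s * 1e12) / model_bytes

    # A hypothetical 70B-parameter model at 2 bytes/param (fp16)
    # served from ~3 TB/s of HBM bandwidth:
    print(decode_tokens_per_sec(70, 2, 3))  # ~21 tokens/s per sequence
    # The chip's enormous compute sits idle during this loop; that gap
    # is what on-chip SRAM designs like the LPU aim to close.

Prefill, by contrast, processes the entire prompt in a few large matrix multiplications and genuinely saturates compute, which is precisely the asymmetry a GPU-plus-LPU pairing exploits.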
Software and economics matter as much as silicon
Hardware heterogeneity only works if something decides, in microseconds, which chip handles which piece of work. Enter Dynamo, Nvidia's scheduler and runtime released alongside the Groq announcement, together with a five-tier "token" pricing model Huang used to illustrate how different workloads could be mapped to different cost and performance profiles. This is a classic platform move: instead of selling parts, Nvidia wants to sell an operating model for inference factories, spanning chips, networking, caching and the software that stitches them together. Think Intel-as-component versus Microsoft-as-platform; Nvidia is clearly trying to migrate toward the latter.
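The article does not detail Dynamo's internals, but the phase-split idea it gestures at is easy to sketch: break each request into a compute-bound prefill job and a bandwidth-bound decode job, dispatch them to different hardware pools, and tag the work with a price tier. Everything in the sketch below, the names, pools, tiers and thresholds, is hypothetical illustration and not Nvidia's actual interface:

    # Hypothetical sketch of phase-split scheduling on heterogeneous
    # inference hardware. All names, pools, tiers and thresholds are
    # invented for illustration; this is not Dynamo's real API.
    from dataclasses import dataclass

    @dataclass
    class Request:
        prompt_tokens: int
        max_new_tokens: int
        interactive: bool  # live chat vs. offline batch job

    def schedule(req: Request) -> dict:
        """Split one request into a compute-bound prefill job and a
        bandwidth-bound decode job, and attach a price tier."""
        if req.interactive:
            tier = "realtime" if req.max_new_tokens < 500 else "interactive"
        else:
            tier = "batch"  # cheapest tokens, no latency guarantee
        return {
            "prefill": {"pool": "gpu", "tokens": req.prompt_tokens},
            "decode":  {"pool": "lpu", "tokens": req.max_new_tokens},
            "tier": tier,  # stand-in for the five-tier pricing idea
        }

    print(schedule(Request(prompt_tokens=2048, max_new_tokens=300,
                           interactive=True)))

The design point is that the scheduler, not any single chip, becomes the locus of differentiation: whoever owns the routing and pricing layer captures the platform economics, regardless of whose silicon sits in each pool.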
Geopolitics and the market test
There are complicating realities. Export controls, US-China technology tensions and supply-chain sensitivity mean that pitching a vertically integrated inference stack to Chinese customers is never merely a commercial story. Huang's recent visits to Shanghai and Shenzhen, and his outreach to supply-chain partners, can be read as reassurance tours, but deployment and sales across jurisdictions will still face policy constraints. For investors and rivals, the question is straightforward: will the market accept the recategorization of Nvidia from chip vendor to "AI factory" operator? If Huang succeeds, the economics of cloud providers, chip startups and even national industrial policy will have to be recalculated.
