A paper triggered a plunge in storage‑chip stocks — has Google's 'DeepSeek moment' arrived?
The trigger
Google Research reportedly released a new AI compression method called TurboQuant that went viral online and helped spark a sell-off in US storage-chip shares. Investors briefly dumped names tied to high-bandwidth memory and NVMe storage, Micron and SanDisk among them, after headlines suggested a software trick could sharply reduce GPUs' memory needs. Why the panic? Because TurboQuant, if it works at scale, threatens a core investment narrative: that ever-bigger models will keep driving relentless demand for expensive memory hardware.
The tech
TurboQuant reportedly tackles the KV cache problem, the memory footprint that grows with context length as a model caches keys and values during inference, with a two-stage mathematical design. The first stage, PolarQuant, converts vectors to polar coordinates so the directional components become highly predictable and can be quantized without per-block normalization metadata. The second stage, a Quantized Johnson-Lindenstrauss transform (QJL), projects residual errors into low dimensions and encodes them as compact sign bits for statistical correction. The method reportedly compresses the KV cache by roughly sixfold on benchmarks such as Llama-3.1-8B and Mistral-7B while preserving attention scores, and third-party ports on Apple Silicon and NVIDIA H100 are said to show large speed and memory gains — claims that still need broader independent verification.
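To make the second stage concrete, here is a minimal sketch of the sign-bit Johnson-Lindenstrauss idea behind a QJL-style estimator: project a key with a random Gaussian matrix, store only the sign of each projected coordinate plus the key's norm, and recover an unbiased estimate of the attention logit against a full-precision query. This is an illustration of the general technique, not TurboQuant's actual implementation; all names and the dimensions `d` and `m` are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 8192  # head dimension and projection dimension (m inflated for demo accuracy)

# Hypothetical key vector and a query correlated with it (illustrative data only).
k = rng.standard_normal(d)
q = k + 0.5 * rng.standard_normal(d)

# Random Gaussian JL projection; quantize the projected key to 1 bit per coordinate.
S = rng.standard_normal((m, d))
k_bits = np.sign(S @ k)        # the m sign bits are all that is stored for the key...
k_norm = np.linalg.norm(k)     # ...plus one scalar, the key's norm.

# The query stays full precision. For a Gaussian row s,
# E[sign(<s, k>) * <s, q>] = sqrt(2/pi) * <q, k/||k||>,
# so rescaling turns the sign bits into an unbiased estimate of <q, k>.
est = np.sqrt(np.pi / 2) / m * k_norm * (k_bits @ (S @ q))
true = q @ k
print(f"true logit = {true:.2f}, sign-bit estimate = {est:.2f}")
```

Note that `m` is deliberately oversized here so a single query-key pair gives a visibly accurate estimate; the compression argument only holds when `m` is chosen small and the estimator's variance is tolerated (or corrected) across many cached tokens, which is where a residual-correction stage earns its keep.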
Market and geopolitical ripple effects
The immediate market reaction exposed how much of the storage‑chip bull case rests on the assumption of ever‑rising GPU memory demand. But broader consequences could be geopolitical. Western export controls and sanctions have tightened access to top‑end GPUs and HBM for some countries; software that lowers memory requirements could ease deployment constraints for local or edge inference and reshape cloud capacity economics. Or will it just be Jevons' paradox in AI form — cheaper inference leading to vastly more usage and eventually higher total demand for compute? Investors, hardware vendors and policymakers will be watching closely.
Outlook
TurboQuant is a striking lab result with clear commercial potential, and figures such as Cloudflare CEO Matthew Prince have reportedly called it a "DeepSeek moment." Yet real-world adoption, enterprise integration and cross-vendor validation will determine whether this is a disruptive shift or a blip. For now, the episode is a reminder: in AI, software innovation can rewire hardware markets overnight.
