Xiaomi's MiMo API price cut driven by inference gains; "we can basically break even," says executive

What happened

Xiaomi's (小米) MiMo team reportedly announced yesterday a permanent price cut for its MiMo‑V2.5 series API. The new rates are as much as 99% below original pricing and no longer vary with context‑window length. Luo Fuli (罗福莉), head of MiMo, reportedly wrote on X that the move is rooted in structural inference and caching improvements rather than a simple commercial price war.

The technical angle

Luo reportedly said the biggest reductions, up to 99%, apply to inputs that hit the cache. She attributed this to a layered KV cache optimization for sliding‑window attention (SWA) in Xiaomi's inference framework, which in testing increased cached token capacity roughly fivefold, effectively cutting cache cost by about 80%. She also cited cache‑read overlap between Full Attention modules and an extreme 1:7 Full:SWA sparsity in the Hybrid model family (the 70‑layer MiMo‑V2.5‑Pro has prefill compute roughly comparable to that of a 10‑layer grouped‑query attention, or GQA, model), which together drove input and output price drops of roughly 60–80%. Luo reportedly added that the company's base inference costs are far below industry averages, allowing Xiaomi to pass savings to developers while saying it can "still essentially break even."
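The arithmetic behind two of these figures can be checked with a short sketch. It rests on assumptions that are not from Xiaomi: that cache cost per token falls in inverse proportion to how many tokens the same memory can hold, and that the 1:7 ratio is spread evenly across all 70 layers. The function names are hypothetical and chosen only for illustration.

```python
def cache_cost_reduction(capacity_multiplier: float) -> float:
    """Fractional drop in per-token cache cost when the same memory
    holds `capacity_multiplier` times as many cached tokens.

    Assumption: cost per cached token is inversely proportional to capacity.
    """
    if capacity_multiplier <= 0:
        raise ValueError("capacity_multiplier must be positive")
    return 1.0 - 1.0 / capacity_multiplier


def full_attention_layers(total_layers: int, full: int = 1, swa: int = 7) -> float:
    """Number of Full Attention layers implied by a full:swa ratio.

    Assumption: the ratio applies evenly across every layer of the model.
    """
    return total_layers * full / (full + swa)


# Fivefold capacity implies roughly an 80% lower cache cost per token.
print(f"{cache_cost_reduction(5):.0%}")

# A 1:7 split of 70 layers implies about 8.75 Full Attention layers.
print(full_attention_layers(70))
```

Under these assumptions, a fivefold capacity gain matches the reported 80% figure, and 70 layers at 1:7 yields 8.75 Full Attention layers, which is in the neighborhood of the 10‑layer comparison. The remaining gap could plausibly reflect the compute that SWA layers still require, though the report does not say so.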

Why it matters

This is more than a price move. Xiaomi framed the cut as a way to stimulate sustained, large‑scale inference demand that can pull the whole AI hardware stack (chips, servers, networking, cooling and data centers) along with it. Against the backdrop of U.S. export controls on advanced chips and China's push to build domestic compute capacity, cheaper and more efficient inference could be a strategic lever. Luo reportedly warned other LLM vendors against "blind price cuts," arguing that only architectures and inference infrastructures that actually reduce compute and KV cache loads can sustain such discounts without losses. Will competitors follow, or will this become a structural advantage for firms with comparable architectural and infrastructure strengths?