Xiaomi (小米) rolls out three in-house MiMo‑V2 models: Pro agent flagship, Omni multimodal base, and TTS voice model
What was announced
Xiaomi (小米) late last night unveiled three self‑developed large models in the MiMo‑V2 family: MiMo‑V2‑Pro, MiMo‑V2‑Omni and MiMo‑V2‑TTS. The company said the models are already available through Xiaomi miclaw, MiMo Studio, the WebOffice ecosystem of Kingsoft Office (金山办公), and the Xiaomi browser. They can also be reached through agent frameworks such as OpenClaw, OpenCode, KiloCode, Blackbox and Cline, free of charge for a limited one‑week trial. The significance lies in the positioning: Xiaomi is pitching these models not just as chat engines but as embedded agents and tooling for real workflows.
Capabilities and claimed performance
MiMo‑V2‑Pro is billed as an agent‑first flagship tuned for high‑intensity real‑world workflows, with “over 1T total parameters” (42B active parameters) and a 1M‑token context window. Xiaomi reportedly uses a hybrid attention architecture, and internal and third‑party rankings reportedly place MiMo‑V2‑Pro among the top models globally, near leading closed‑source systems on many agent benchmarks while being priced materially lower. MiMo‑V2‑Omni is presented as a true multimodal base for long audio, video and visual reasoning. Xiaomi claims strong audio understanding (multi‑speaker separation, comprehension of audio longer than 10 hours), advanced chart and visual reasoning, and native joint audio‑video input. MiMo‑V2‑TTS is a text‑to‑speech model trained with a proprietary audio tokenizer and a multi‑codebook architecture; Xiaomi claims fine‑grained style and dialect control, and even singing, across multiple Chinese dialects.
Pricing, integration and developer access
Xiaomi has opened APIs on platform.xiaomimimo.com with tiered pricing. MiMo‑V2‑Pro supports the 1M‑token window and is priced at $1/$3 per million tokens (input/output) for contexts within 256K tokens, and at $2/$6 per million tokens for contexts beyond that, up to the full 1M window. MiMo‑V2‑Omni supports a 256K context at $0.4/$2 per million tokens (input/output). Xiaomi has reportedly integrated MiMo into Kingsoft WebOffice to power document workflows across Word, Excel, PPT and PDF, and the models are reportedly pre‑wired into several agent frameworks to speed developer adoption.
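To make the tiered prices concrete, the arithmetic can be sketched in Python. This is an illustrative estimate only, not Xiaomi's billing logic: the article does not say how a request is assigned to a tier, so the sketch assumes the tier is chosen by the total size of the request (input plus output tokens) and that every token in the request is billed at that tier's rate. It also assumes "256K" and "1M" mean 256,000 and 1,000,000 tokens. The names `Tier`, `PRICING` and `estimate_cost_usd` are hypothetical and not part of any Xiaomi SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    max_context_tokens: int        # largest request this tier covers
    input_usd_per_million: float
    output_usd_per_million: float


# Prices as quoted in the article, in USD per million tokens.
# Assumption: "256K" and "1M" are treated as 256,000 and 1,000,000 tokens.
PRICING = {
    "MiMo-V2-Pro": [Tier(256_000, 1.0, 3.0), Tier(1_000_000, 2.0, 6.0)],
    "MiMo-V2-Omni": [Tier(256_000, 0.4, 2.0)],
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request under the assumed tier rule."""
    if model not in PRICING:
        raise ValueError(f"no pricing listed for {model!r}")
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens + output_tokens
    # Assumption: the first tier whose limit covers the whole request
    # (input + output) sets the rate for every token in it.
    for tier in PRICING[model]:
        if total <= tier.max_context_tokens:
            return (input_tokens * tier.input_usd_per_million
                    + output_tokens * tier.output_usd_per_million) / 1_000_000
    raise ValueError(f"{total} tokens exceeds the largest listed window for {model}")


# Pro, 100K input + 10K output: lower tier.
print(f"{estimate_cost_usd('MiMo-V2-Pro', 100_000, 10_000):.2f}")   # 0.13
# Pro, 500K input + 20K output: upper tier.
print(f"{estimate_cost_usd('MiMo-V2-Pro', 500_000, 20_000):.2f}")   # 1.12
# Omni, 200K input + 5K output.
print(f"{estimate_cost_usd('MiMo-V2-Omni', 200_000, 5_000):.2f}")   # 0.09
```

Under these assumptions, a long‑context request on MiMo‑V2‑Pro costs twice as much per token as a short one, so keeping a prompt under the 256K boundary roughly halves its cost.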
Industry and geopolitical context
The rollout highlights China’s push to build domestic, production‑grade AI stacks amid global tech tensions and export controls that have complicated access to advanced chips and Western AI services. Xiaomi’s emphasis on agent orchestration, long‑context understanding, and integration across device and cloud reflects a broader trend among Chinese tech firms to reduce reliance on foreign models and tooling while serving massive local enterprise and consumer ecosystems. As with any vendor claims, some of the performance and ranking statements rest on vendor or internal benchmarks; the results are reportedly strong but still await independent verification.
