凤凰科技 2026-03-18

Xiaomi (小米) debuts MiMo‑V2‑TTS, tying expressive speech to its MiMo‑V2‑Omni multimodal roadmap

The announcement

Xiaomi (小米) today unveiled MiMo‑V2‑TTS, a self‑developed large speech synthesis model that the company says can speak, act and even sing with fine‑grained control. The system is built around a proprietary Audio Tokenizer and a multi‑codebook speech–text joint modelling architecture. Xiaomi reportedly pretrained the model on hundreds of millions of hours of speech data and applied multi‑dimensional reinforcement learning to balance stability and expressiveness. Can one model convincingly handle both natural conversation and musical performance? Xiaomi claims MiMo‑V2‑TTS can.

Technical highlights

According to the announcement, the model supports multi‑granularity style control, from global speaking style down to local emotional shifts, allowing tone changes and emotional gradations within a single sentence. It reportedly maps text cues such as punctuation, interjections and emphasis markers into natural prosody without extra user annotation. It also supports a range of Chinese dialects and accents, including Northeastern Mandarin (东北话), Sichuanese (四川话), Henan dialect (河南话), Cantonese (粤语) and a Taiwanese accent (台湾腔), as well as actor‑style role play and high‑quality singing synthesis. Xiaomi frames the work as both a standalone TTS breakthrough and a component intended to be deeply fused with its MiMo‑V2‑Omni multimodal understanding capabilities.

Why it matters

For Western readers, the announcement reflects China’s rapid push to build full‑stack AI capabilities: speech, vision and language are being tied together into agentic systems that can “see, understand and speak” in an expressive human voice. Xiaomi reportedly plans broader multilingual coverage and closer integration between MiMo‑V2‑TTS and the MiMo‑V2‑Omni base model to enable tool invocation and multimodal perception. This drive comes amid geopolitical headwinds, including export controls on advanced chips and growing scrutiny of Chinese AI firms, which are pushing domestic players to vertically integrate software and hardware and accelerate model innovation.
