凤凰科技 2026-03-28

HiDream.ai (智象未来) bets on a full‑modal "world model" and agentic apps instead of chasing giant general models

A different path in China's cutthroat AI video race

HiDream.ai (智象未来) is deliberately avoiding the "bigger‑is‑better" race for general‑purpose large models and is instead building a unified full‑modal world model to power a suite of agentic applications. The company’s CTO, Yao Ting, a former Microsoft Research (微软研究院) and JD.com (京东) engineer, laid out the strategy at a Zhongguancun forum: one foundation (底座), meaning a single full‑modal base, plus three vertical agent outlets for video creation, AI film production and marketing tools. The three‑year‑old start‑up is reportedly raising a Series B round and claims ARR in the tens of millions of dollars, along with rapid user growth for its vivago product; those figures remain unverified.

From research roots to productized pipelines

Yao’s background, from early visual dialogue work on Xiaoice at Microsoft to scaling image search and 24/7 warehouse vision systems at JD.com, shapes HiDream.ai’s emphasis on practical, scene‑specific models. He frames "world models" in three tiers (high‑level LLM‑like knowledge, mid‑level predictive understanding à la JEPA, and low‑level pixel/video generation) and argues that the future lies in full‑modal tokenization rather than stitched‑together multi‑modal encoders. The firm is pushing end‑to‑end "agentic" apps built from a harness, a set of skills and an OS‑like layer he dubs "OpenClaw" that manages contextual orchestration. The goal is arbitrary inputs and arbitrary outputs: video, actions, or both.

Industrial playbook and geopolitical context

HiDream.ai is commercializing with a "1+3" product architecture: one world‑model base and three intelligent agents, namely vivago, an AI filmmaking tool, and a marketing suite. Vivago has reportedly reached tens of millions of overseas professional users, and the company is said to have delivered thousands of minutes of short‑form AI content; neither figure has been independently confirmed. To reduce dependence on foreign chips and align with domestic supply chains, HiDream.ai says it has adapted its models to run on Alibaba Cloud (阿里云), Huawei Cloud (华为云) and Cambricon (寒武纪) compute, a practical pivot that mirrors broader Chinese industry moves following US export controls on advanced AI hardware. The firm is also working with embodied‑AI players such as robotics vendor Noitom (诺亦腾) to combine synthetic and real data for high‑precision training.

Vertical focus versus platform ambition — will it scale?

HiDream.ai’s approach reflects a strategic debate playing out across China’s AI scene: double down on vertical, productized stacks tailored to specific scenes, or try to build a horizontal general model and compete for ecosystem lock‑in against platforms like ByteDance (字节跳动) and Kuaishou (快手)? Yao argues that Sora’s retreat shows vertical depth matters more than chasing generality. But challenges remain, including industrial content quality, distribution economics, and the limits of domestic compute. Can a nimble start‑up out‑iterate the oligopolies and capture end‑user demand at scale? The answer will say a lot about where China’s generative AI market heads next.
