Four Chinese embodied-intelligence firms back a startup that wants to "compile" robot data into training-ready assets
What they are building
A new Beijing startup, 智域基石 (ArcheBase), has emerged to tackle what its founders call the next bottleneck in robotics: turning messy, multimodal physical-world signals into directly trainable data. The company was formed within a month by CEO Yang Zhexuan, an early core member of PingCAP; CTO Xu Liangwei; and COO Zhang Jiye, a former Huawei (华为) city general manager and former head of ecosystem at Qiongche Smart. Together they cover distributed systems, robotics algorithms and industrial partnerships. Lingchu (灵初), Qiongche (穹彻), Zhi Pingfang (智平方) and Zhe Renxing (浙人形) have reportedly invested jointly in the venture, closing an angel round worth several tens of millions of yuan and becoming ArcheBase's first customers.
Why data, not more sensors? The founders argue that advances in hardware and foundation models have shifted the competitive edge to "data compilation," a new layer between raw physical signals and task-level models. ArcheBase plans nationwide "real-machine" data factories exceeding 10,000 square meters, with more than 400 robots and over 10 heterogeneous hardware forms. The aim is to automatically convert chaotic sensor streams (vision, IMU, joint state, force) into task-structured training inputs that raise success rates on real tasks.
How they do it — and why it matters
ArcheBase describes a five-step, cloud-native pipeline: quality control on incoming data; spatiotemporal alignment across sensors and robot bodies; "compilation" to infer task features (contact states, action phases, intent, spatial relations); smart retrieval and dataset composition; and standardized, traceable delivery for different training stacks. The founders emphasize that the real barrier is not data volume but a reproducible "data refinery," one that handles non-standardized streams from different robot bodies, sensors and scenarios and automates what many firms still do by hand.
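To make the first three steps concrete, here is a minimal Python sketch. It is an illustration only, not ArcheBase's implementation, which the company has not published. Every name in it (Sample, nearest, align, label_contact, segment_phases), the 20 ms matching window and the 1.0 N contact threshold are assumptions chosen for the example. It matches force and joint readings to camera frame timestamps, drops frames that lack a nearby reading (a crude quality check), then derives a contact flag and groups frames into phases.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Sample:
    t: float      # timestamp in seconds
    value: float  # scalar reading, e.g. force in newtons or a joint angle


def nearest(samples, t, max_gap):
    """Return the sample closest in time to t, or None if it is farther than max_gap.

    Assumes `samples` is sorted by timestamp.
    """
    times = [s.t for s in samples]
    i = bisect_left(times, t)
    candidates = samples[max(i - 1, 0): i + 1]  # neighbours on either side of t
    if not candidates:
        return None
    best = min(candidates, key=lambda s: abs(s.t - t))
    return best if abs(best.t - t) <= max_gap else None


def align(frame_times, force, joint, max_gap=0.02):
    """Temporal alignment: attach the nearest force and joint reading to each frame."""
    rows = []
    for t in frame_times:
        f = nearest(force, t, max_gap)
        q = nearest(joint, t, max_gap)
        if f is None or q is None:
            continue  # quality control: skip frames without a nearby reading
        rows.append({"t": t, "force": f.value, "joint": q.value})
    return rows


def label_contact(rows, threshold=1.0):
    """Hypothetical 'compilation' step: flag contact when force reaches a threshold."""
    for r in rows:
        r["contact"] = r["force"] >= threshold
    return rows


def segment_phases(rows):
    """Group consecutive rows with the same contact flag into phases."""
    phases = []
    for r in rows:
        if phases and phases[-1]["contact"] == r["contact"]:
            phases[-1]["end"] = r["t"]
        else:
            phases.append({"contact": r["contact"], "start": r["t"], "end": r["t"]})
    return phases


# Made-up readings: the last frame has no joint reading within 20 ms and is dropped.
frame_times = [0.00, 0.10, 0.20, 0.30]
force = [Sample(0.01, 0.2), Sample(0.11, 0.3), Sample(0.19, 2.5), Sample(0.31, 2.7)]
joint = [Sample(0.00, 0.10), Sample(0.10, 0.15), Sample(0.20, 0.20), Sample(0.45, 0.25)]

rows = label_contact(align(frame_times, force, joint))
phases = segment_phases(rows)
print(phases)
```

A production system would also need spatial calibration between sensors, clock-drift correction and learned rather than thresholded labels; the sketch only shows why alignment has to come before any task-level feature can be inferred.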
This plays into broader trends. Embodied AI, meaning robots that must "do" in addition to "see," generates far more complex, stateful data than image or driving datasets. Is this just autonomous-driving data work under another name? The founders say no: robotics needs action-aware, temporally continuous, multimodal compilation tied to specific embodiments. Against a backdrop of escalating US-China tech competition and export controls on advanced chips, Chinese startups are increasingly doubling down on software and data infrastructure to reduce their dependence on foreign hardware. Several domestic robot firms are reportedly both investors in ArcheBase and early users of its services, a sign that for many Chinese robotics teams the next wave of differentiation may come from data engineering rather than new sensors.
