Huxiu (虎嗅) · 2026-03-28

Behind USD 450 Million in Financing: Rhoda AI Uses DVA to Challenge Mainstream Robot Architectures

A bold bet against the VLA consensus

It has been reported that U.S. robotics startup Rhoda AI quietly re-emerged after an 18‑month silence to unveil a new “direct video action” (DVA) model and to close a $450 million Series A round, valuing the company at about $1.7 billion. The move is striking because DVA explicitly challenges the dominant vision‑language‑action (VLA) paradigm that many embodied‑AI teams have been pursuing: rather than learning actions from expensive teleoperated robot data, Rhoda AI’s core idea is to make robots predict how the visual world changes and then convert those imagined visual futures into motor commands.

How DVA works — imagine first, act later

According to Rhoda AI’s technical description, the system uses a causal video model to predict multiple future frames from the current observation, then an inverse-dynamics model to map the predicted visual changes to robot actions. Reportedly, this loop of observe, imagine, act, and reobserve can be trained with on the order of only ten hours of real robot data after large-scale internet video pretraining. In demo tasks highlighted by the company, an unpacking task required roughly 11 hours of robot data and a container-disassembly workflow about 17 hours; by contrast, teleoperation-heavy VLA approaches commonly cite hundreds of hours to reach similar robustness.
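The reported control loop can be sketched in a few lines. This is a toy illustration only: the two models below are linear stand-ins, and the names `video_model`, `inverse_dynamics`, and `dva_step` are assumptions for exposition, not Rhoda AI's actual components.

```python
import numpy as np

def video_model(frame, horizon=3):
    """Toy stand-in for a causal video model: predict future frames.
    Here a 'frame' is a 1-D feature vector, and the imagined futures
    simply drift toward a fixed goal state."""
    goal = np.ones_like(frame)
    futures, current = [], frame
    for _ in range(horizon):
        current = current + 0.5 * (goal - current)  # imagined next frame
        futures.append(current)
    return futures

def inverse_dynamics(frame, next_frame):
    """Toy inverse-dynamics model: map a predicted visual change to an
    action command. Here the action is just the feature delta."""
    return next_frame - frame

def dva_step(observation):
    """One observe -> imagine -> act iteration of the loop."""
    futures = video_model(observation)                      # imagine
    action = inverse_dynamics(observation, futures[0])      # act
    next_observation = observation + action                 # reobserve
    return action, next_observation

obs = np.zeros(4)
for _ in range(5):
    action, obs = dva_step(obs)
print(np.round(obs, 3))  # observation converges toward the imagined goal
```

The point of the sketch is the division of labor: the video model carries the world knowledge (learnable from internet video at scale), while the inverse-dynamics model is a comparatively small mapping that, per the company's claims, needs only hours of real robot data to calibrate.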

Why data and geopolitics matter

The battle over architectures is really a battle over data economics: teleoperation data is costly, narrow and hard to scale; internet video is cheap and vast. But scaling on web video raises its own challenges—domain shift, compute demands for long‑context video models, and questions about data access and localization. It has been reported that investors are placing a clear bet on DVA with this round, yet geopolitical constraints complicate the picture: export controls on advanced AI chips, U.S.–China tensions over technology supply chains, and differing data‑governance regimes could all shape who actually benefits from internet‑scale pretraining and where industrial deployments happen.

What to watch next

DVA is a provocative, data‑efficient alternative but not yet a proven replacement for VLA in safety‑critical, multi‑step industrial settings. The key tests will be independent benchmarks, third‑party replication of the low‑data claims, and real‑world deployments beyond curated lab tasks. Can imagining the future reliably substitute for millions of hours of teleoperated experience, or will the field end up with hybrid stacks that combine both approaches? The capital poured into Rhoda AI says the question is worth answering fast.

AI · Robotics