As AI labs rekindle robotics efforts, they confront a crucial shortage: high-quality training data for physical interaction. Startup XDOF is filling this void with specialized data collection and annotation, attracting top AI researchers and $70 million in venture funding.
- XDOF addresses a critical scarcity of robot training data
- Backed by $70 million from leading venture funds
- Partnered with UC Berkeley's AI research lab to release the ABC dataset, described as the largest of its kind
What happened
AI research labs are intensifying their focus on robotics, highlighted by OpenAI’s recent relaunch of its robotics program. Unlike language models, which draw on abundant text data, robots need complex datasets that capture detailed physical interactions, and such data is largely unavailable today. XDOF launched in October 2024 to close that gap by building the data infrastructure for robot training: teleoperation tools, data pipelines, and annotation systems.
The company, founded by robotics researchers including Philippe Wu and Fred Shentu, has secured $70 million in funding from Thrive Capital, a16z, Spark Capital, Lux, and WndrCo. XDOF currently supports around 20 customers, including frontier AI labs, offering solutions that range from hardware for human-controlled robot data capture to software for cleaning and labeling this data.
Why it matters
Robots need high-quality physical training data to perform tasks in real environments, but this data has been scarce and difficult to produce. Existing sources, such as YouTube videos or gig-worker footage, do not provide the high-fidelity, consistent datasets that machine learning requires. XDOF’s approach combines data collection, cleaning, and annotation into a single feedback loop intended to accelerate robot learning.
The bottleneck XDOF targets could determine which AI labs lead the next frontier of AI: embodied intelligence. XDOF partnered with the UC Berkeley AI Research lab to release the ABC dataset, described as the largest of its kind, with extensive robot manipulation trajectories and evaluations. The release is meant to help academic and industrial researchers train robots on real-world manipulation tasks more effectively.
What to watch next
XDOF plans to develop a tiered data strategy to further improve robot learning. It has three tiers: teleoperation data captured directly from deployed robots, general teleoperated robot data, and ‘egocentric’ data captured from humans performing everyday tasks while wearing custom sensors. The quality of the data capture hardware will be crucial, because it influences the performance of robotic perception and control algorithms.
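To make the three data sources concrete, here is a minimal Python sketch of how a dataset might tag each recorded episode with its source and report the mix. It is an illustration only: the names `DataTier`, `Episode`, and `hours_by_tier`, along with the sample durations, are hypothetical and do not reflect XDOF's actual schema or tooling.

```python
from dataclasses import dataclass
from enum import Enum


class DataTier(Enum):
    # Hypothetical labels for the three data sources named in the article.
    DEPLOYMENT_TELEOP = "teleoperation on deployed robots"
    GENERAL_TELEOP = "general teleoperated robot data"
    EGOCENTRIC_HUMAN = "egocentric human data from wearable sensors"


@dataclass(frozen=True)
class Episode:
    # One recorded demonstration; fields are illustrative assumptions.
    episode_id: str
    tier: DataTier
    duration_s: float


def hours_by_tier(episodes):
    """Sum recorded hours per tier so the dataset's mix can be inspected."""
    totals = {tier: 0.0 for tier in DataTier}
    for ep in episodes:
        totals[ep.tier] += ep.duration_s / 3600.0
    return totals


# Made-up sample data, not real measurements.
episodes = [
    Episode("ep-001", DataTier.DEPLOYMENT_TELEOP, 1800.0),
    Episode("ep-002", DataTier.GENERAL_TELEOP, 3600.0),
    Episode("ep-003", DataTier.EGOCENTRIC_HUMAN, 5400.0),
    Episode("ep-004", DataTier.EGOCENTRIC_HUMAN, 1800.0),
]
totals = hours_by_tier(episodes)
for tier, hours in totals.items():
    print(f"{tier.value}: {hours:.1f} h")
```

Tagging each episode at capture time would let a training pipeline weight or filter the tiers separately, though how the tiers are actually combined is not described in the source.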
As more labs invest in robotics, XDOF’s role in providing scalable data infrastructure and tools could help determine who advances first in physical AI. Keep an eye on additional dataset releases, hardware innovations in data collection, and the expansion of XDOF’s customer base within the AI robotics community.