What do they actually do
Sensei runs a beta service that collects human demonstration data for training robot manipulation models. They provide a low-cost, wearable controller (a sensorized exoskeleton arm) that captures natural human motion and visual/pose signals while a person performs tasks. The company says the device costs under $300 and pairs it with a web platform that routes data requests to paid human operators who record demonstrations in realistic settings (YC profile).
Robotics teams submit a task (e.g., cloth folding, bin sorting), Sensei assigns trained operators (“Senseis”) to perform multiple demos with the wearable, and the platform delivers curated recordings and labels back to the customer. The company is in beta and actively collecting signups from both customers and operators (beta page). Sensei is a YC Summer 2024 startup led by founders Anubhav Guha (CEO) and John Piotti (CTO) (YC profile).
Who are their target customer(s)
- Academic robotics labs building manipulation algorithms: They need many real human demonstrations across cluttered, real-world settings but lack the people and time to collect and label at scale. Outsourcing collection to trained operators with portable wearables reduces lab overhead (YC profile).
- Commercial robot product teams (warehousing, logistics, last‑mile, hospitality): On‑site teleoperation and bespoke data collection are slow and expensive. A routed operator pool promises faster turnaround and lower cost versus traditional teleop setups (YC profile).
- ML teams training generalizable manipulation models: Models overfit to controlled lab data and need diverse, labeled demos from many environments and body types. A service that returns curated demonstrations plus labels helps improve robustness (beta page).
- Companies needing bespoke task datasets (e.g., cloth folding, bin sort): Coordinating hardware, people, and quality control for one‑off datasets is operationally heavy and inconsistent. A managed request‑to‑delivery workflow reduces setup time and risk (YC profile).
- Robot OEMs and integrators doing validation and edge‑case testing: They require repeatable, realistic coverage across body sizes, environments, and failure modes, which is costly to reproduce internally. A distributed operator network can collect varied data at scale (YC profile).
How would they acquire their first 10, 50, and 100 customers
- First 10: Run high‑touch pilots with 8–10 labs and early robot teams via founder/Y Combinator intros; ship the device, have operators collect demos, and deliver a polished dataset in exchange for a short paid or discounted engagement and detailed feedback.
- First 50: Standardize a 2–4 week paid pilot package and run targeted outbound via conferences, lab lists, and LinkedIn/email; use early case studies and referrals to convert pilots into repeat buyers.
- First 100: Launch self‑serve pilot signup with templated intake, operator qualification tracks, and clear pricing; add OEM/university partnerships for distribution and use a small CS team to monitor quality and drive expansions.
What is the rough total addressable market
Top-down context:
The global data collection and labeling market is roughly $3.8B in 2024, with image/video work a large share; this implies roughly $1–2B aligned with the kind of vision/manipulation data Sensei delivers today (Grand View Research, Spherical Insights). Niche estimates for robotics‑specific labeling also appear sizable (e.g., North American robotics data labeling in the hundreds of millions) (DataIntelo).
Bottom-up calculation:
Illustratively, if ~5,000 academic labs and commercial robotics teams each commission two custom manipulation datasets per year at ~$100k average value, that yields a ~$1B annual opportunity. Higher‑spend enterprise buyers or ongoing retraining programs could expand this further.
Assumptions:
- Thousands of active buyers globally across academia and commercial robotics with recurring dataset needs.
- Average project value of ~$100k for multi‑demo, curated manipulation datasets; some buyers purchase multiple projects per year.
- A meaningful share of vision/robotics labeling requires human‑collected demonstrations rather than synthetic or simulation‑only data.
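The bottom-up estimate above is a single multiplication over the stated assumptions; a minimal sketch (all figures are the illustrative estimates from the text, not measured market data):

```python
# Bottom-up TAM sketch using the illustrative assumptions above
# (all figures are estimates from the text, not measured market data).
buyers = 5_000               # academic labs + commercial robotics teams
projects_per_year = 2        # custom manipulation datasets per buyer per year
avg_project_value = 100_000  # ~$100k per curated, multi-demo dataset

annual_tam = buyers * projects_per_year * avg_project_value
print(f"Bottom-up annual TAM: ${annual_tam / 1e9:.1f}B")  # → $1.0B
```

Scaling any single input (more buyers, higher project values, or ongoing retraining programs) moves the estimate roughly linearly, which is why the text notes enterprise buyers could expand it further.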
Who are some of their notable competitors
- Scale AI (Physical AI): Runs large, managed data operations (factories, at‑home collectors, annotations) and offers end‑to‑end robotics data services; an alternative for customers seeking big, turnkey contracts (Scale Physical AI).
- Micro1.ai: Positions itself as a “human data engine” for humanoid/robotics teams, recruiting and managing human collectors and delivering curated robotics datasets; overlaps with Sensei’s staffed marketplace model.
- PrismaX: A teleoperation platform for robotic arms that collects visual/teleop data for training; a direct alternative for teams preferring remote teleop over a wearable plus human‑marketplace approach.
- Trossen Robotics (Mass Data Collection): Sells hardware and end‑to‑end services for large, synchronized sensor/video capture; appeals to enterprises favoring centralized collection rigs over distributed human operators.
- Academic datasets (RoboTurk, DROID): Research groups provide teleop systems and large in‑the‑wild demo datasets (e.g., RoboTurk RealRobotDataset, DROID) that some teams can use instead of commissioning custom data (RoboTurk).