What do they actually do
Sureform collects and curates high-quality human audio+video datasets, starting with speech and first-person (head-mounted) household/work tasks, and sells them to AI and robotics teams that need real-world multimodal training data. They run a paid contributor network that records POV footage (optionally with a second, tripod-mounted camera), then manually review uploads for quality, format, and consent before packaging the data for customers [homepage; YC page; LinkedIn job]. Contributors receive clear recording specs (e.g., MP4, 1080p or higher, with audio; 60fps preferred) and are paid per validated hour of "high-activity" footage, publicly listed at ~$30/hr for single-camera POV and ~$50/hr for POV plus tripod [LinkedIn job].
The company is early-stage (YC S25) with a small team and a sales‑led, bespoke dataset delivery model rather than a self‑serve platform. In practice, they operate a logistics pipeline to recruit and pay contributors, validate footage, and deliver curated multimodal datasets to AI/robotics customers [YC page; homepage; LinkedIn company profile].
Sources: homepage, YC page, LinkedIn job, LinkedIn company profile
Who are their target customer(s)
- Consumer/home robotics perception & manipulation teams: They need large volumes of realistic, head‑mounted footage of hands and objects in real homes; collecting this is logistically hard, expensive, and compliance‑sensitive.
- Multimodal model teams fusing audio + video: They struggle to source synchronized, real‑world audio+video with consistent quality and clear consent/metadata for training and evaluation.
- Warehouse/industrial robotics and HRC (human‑robot collaboration) groups: They need domain‑specific clips (including edge cases) from operational sites with privacy and legal controls; capturing validated footage on‑site is time‑consuming and risky.
- AR/VR and wearable interaction/hand‑tracking teams: They need head‑mounted POV recordings with hands and workspace under varied motion/lighting; public datasets often miss realistic rigs and synchronized modalities.
- Academic/industry embodied AI research labs: They want reproducible, well‑documented datasets with clear consent/release and manual QC; scraped or ad‑hoc datasets often lack this documentation and reliability.
How would they acquire their first 10, 50, and 100 customers
- First 10: Run tightly scoped, paid pilots via YC intros and targeted outreach to robotics, AR/VR, and multimodal labs; handle ingestion and feedback hands‑on, then convert wins into references [homepage; YC page].
- First 50: Productize a repeatable pilot kit (sample clips, data spec, pricing tiers, legal templates) and scale targeted outbound with sales/CS support to run multiple concurrent pilots and close volume deals.
- First 100: Offer standardized dataset packages with simple procurement/docs/APIs and add partner channels (robotics OEMs, ML platforms, university consortia) while scaling contributor ops and semi‑automated QC to meet SLAs.
What is the rough total addressable market
Top-down context:
Analysts size the global AI training-dataset market at roughly USD 2.6B–3.2B in 2024/25, with image/video the largest segment (~41%), implying ~USD 1.0–1.3B for image/video today [Grand View Research; ResearchAndMarkets]. Adjacent robotics and XR markets (tens of billions) signal growing downstream demand [Mordor Intelligence; Statista].
Bottom-up calculation:
Starting from the ~USD 1.0–1.3B image/video slice, apply a 15–40% share for first-person, synchronized audio+video with consent/metadata (Sureform's niche) to get ~USD 150–520M; an upside case at 50% yields ~USD 500–650M [Grand View Research; ResearchAndMarkets].
Assumptions:
- 15–40% of image/video dataset spend (50% in the upside case) is on multimodal, head‑mounted/first‑person data relevant to embodied AI.
- Buyers pay a premium for consented, validated human POV data vs. scraped/open sources.
- Embodied AI and AR/VR adoption grows, while synthetic data does not fully replace real footage.
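The bottom-up arithmetic above can be sketched as a quick calculation. The market figures come from the cited GVR/R&M estimates; the share percentages are this memo's own assumptions, not sourced data:

```python
# Bottom-up TAM sketch (USD billions).
# Inputs: analysts' ~USD 1.0-1.3B image/video slice of the AI
# training-dataset market; share percentages are assumptions.
iv_low, iv_high = 1.0, 1.3        # image/video dataset market slice

# Base case: 15-40% of that slice is first-person, synchronized
# audio+video with consent/metadata (Sureform's niche).
base_low  = iv_low  * 0.15        # 0.15 -> ~USD 150M
base_high = iv_high * 0.40        # 0.52 -> ~USD 520M

# Upside case: a 50% share of the slice.
up_low  = iv_low  * 0.50          # 0.50 -> ~USD 500M
up_high = iv_high * 0.50          # 0.65 -> ~USD 650M

print(f"base niche TAM:   ${base_low*1000:.0f}M-${base_high*1000:.0f}M")
print(f"upside niche TAM: ${up_low*1000:.0f}M-${up_high*1000:.0f}M")
```

These ranges reproduce the ~USD 150–520M base case and ~USD 500–650M upside case stated above; tightening the share assumptions is the main lever on the estimate.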
Who are some of their notable competitors
- Appen: Large, global provider of data collection and annotation (including audio/video) with enterprise programs and compliance processes; overlaps on consented data capture for ML teams.
- Scale AI: Enterprise data generation and labeling platform with data collection services and a managed workforce; competes for multimodal dataset projects and custom pipelines.
- TELUS International AI Data Solutions: Global crowd + services for collecting and annotating speech, audio, and video; strong on compliance and scale for enterprise buyers.
- Defined.ai: Marketplace and services for high‑quality speech/audio and visual datasets with consent and metadata; relevant for multimodal and domain‑specific data needs.
- Synthesis AI: Synthetic human‑centric image/video data (faces, bodies, scenes) used for perception and interaction models; a substitution competitor where synthetic can replace real data.