
Sureform

High-quality human data for multimodal and physical AI

Spring 2025 · Active · 2025 · Website
Artificial Intelligence · B2B

Report from 14 days ago

What do they actually do

Sureform collects and curates high-quality human audio+video datasets—starting with speech and first‑person (head‑mounted) household/work tasks—and sells them to AI and robotics teams that need real‑world multimodal training data. They run a paid contributor network that records POV footage (optionally with a second, tripod camera), then manually review uploads for quality, format, and consent before the data is packaged for customers [homepage; YC page; LinkedIn job]. Contributors are given clear recording specs (e.g., MP4, 1080p+, audio; 60fps preferred) and are paid per validated hour of “high‑activity” footage (publicly listed at ~$30/hr for single‑camera POV and ~$50/hr for POV + tripod) [LinkedIn job].
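The per-hour pay structure above can be sketched as a small calculation. The rates come from the LinkedIn job post cited in the report; the function name and setup labels are illustrative, not Sureform's actual internals.

```python
# Sketch of the publicly listed contributor payout model.
# Rates (USD per validated hour of "high-activity" footage) are from
# the LinkedIn job post; everything else here is hypothetical.

RATES = {
    "pov": 30.0,              # single head-mounted camera
    "pov_plus_tripod": 50.0,  # POV plus a second, tripod camera
}

def payout(setup: str, validated_hours: float) -> float:
    """Pay per validated hour; hours only count after manual review."""
    if setup not in RATES:
        raise ValueError(f"unknown setup: {setup}")
    return RATES[setup] * validated_hours

print(payout("pov_plus_tripod", 4))  # 4 validated two-camera hours -> 200.0
```

Note that only validated hours are paid, so a contributor's raw recording time is an upper bound on the payable amount.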

The company is early-stage (YC S25) with a small team and a sales‑led, bespoke dataset delivery model rather than a self‑serve platform. In practice, they operate a logistics pipeline to recruit and pay contributors, validate footage, and deliver curated multimodal datasets to AI/robotics customers [YC page; homepage; LinkedIn company profile].

Sources: homepage, YC page, LinkedIn job, LinkedIn company profile

Who are their target customer(s)

  • Consumer/home robotics perception & manipulation teams: They need large volumes of realistic, head‑mounted footage of hands and objects in real homes; collecting this is logistically hard, expensive, and compliance‑sensitive.
  • Multimodal model teams fusing audio + video: They struggle to source synchronized, real‑world audio+video with consistent quality and clear consent/metadata for training and evaluation.
  • Warehouse/industrial robotics and HRC (human‑robot collaboration) groups: They need domain‑specific clips (including edge cases) from operational sites with privacy and legal controls; capturing validated footage on‑site is time‑consuming and risky.
  • AR/VR and wearable interaction/hand‑tracking teams: They need head‑mounted POV recordings with hands and workspace under varied motion/lighting; public datasets often miss realistic rigs and synchronized modalities.
  • Academic/industry embodied AI research labs: They want reproducible, well‑documented datasets with clear consent/release and manual QC; scraped or ad‑hoc datasets often lack this documentation and reliability.

How would they acquire their first 10, 50, and 100 customers

  • First 10: Run tightly scoped, paid pilots via YC intros and targeted outreach to robotics, AR/VR, and multimodal labs; handle ingestion and feedback hands‑on, then convert wins into references [homepage; YC page].
  • First 50: Productize a repeatable pilot kit (sample clips, data spec, pricing tiers, legal templates) and scale targeted outbound with sales/CS support to run multiple concurrent pilots and close volume deals.
  • First 100: Offer standardized dataset packages with simple procurement/docs/APIs and add partner channels (robotics OEMs, ML platforms, university consortia) while scaling contributor ops and semi‑automated QC to meet SLAs.

What is the rough total addressable market

Top-down context:

Analysts size the global AI training‑dataset market at roughly USD 2.6–3.2B in 2024/25, with image/video the largest segment (~41%), implying roughly USD 1.0–1.3B for image/video today [Grand View Research; ResearchAndMarkets]. Adjacent robotics and XR markets (tens of billions) signal growing downstream demand [Mordor Intelligence; Statista].

Bottom-up calculation:

Starting from the ~USD 1.0–1.3B image/video slice, apply a 15–40% share for first‑person, synchronized audio+video with consent/metadata (Sureform's niche) to get ~USD 150–520M; an upside case at 50% yields ~USD 500–650M [GVR; R&M].

Assumptions:

  • 15–50% of image/video dataset spend is on multimodal, head‑mounted/first‑person data relevant to embodied AI.
  • Buyers pay a premium for consented, validated human POV data vs. scraped/open sources.
  • Embodied AI and AR/VR adoption grows, while synthetic data does not fully replace real footage.
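The bottom-up arithmetic above can be reproduced with a short sketch. The market slice and share bands are the report's rough estimates, not authoritative figures.

```python
# Bottom-up TAM sketch: multiply the image/video dataset-market slice
# by an assumed share for first-person, consented audio+video data.
# All inputs are the report's rough estimates.

IMAGE_VIDEO_MARKET = (1.0e9, 1.3e9)  # ~USD 1.0-1.3B image/video slice

def tam_range(low_share: float, high_share: float) -> tuple[float, float]:
    """Apply a share band to the low/high ends of the market estimate."""
    low, high = IMAGE_VIDEO_MARKET
    return (low * low_share, high * high_share)

base = tam_range(0.15, 0.40)    # base case: 15-40% share
upside = tam_range(0.50, 0.50)  # upside case: 50% share

print(f"Base case: ~${base[0]/1e6:.0f}M-${base[1]/1e6:.0f}M")    # ~$150M-$520M
print(f"Upside:    ~${upside[0]/1e6:.0f}M-${upside[1]/1e6:.0f}M")  # ~$500M-$650M
```

Varying the share band is the main sensitivity: the 15–50% assumption spans a ~4x range in the resulting TAM, so it dominates any refinement of the underlying market estimate.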

Who are some of their notable competitors

  • Appen: Large, global provider of data collection and annotation (including audio/video) with enterprise programs and compliance processes; overlaps on consented data capture for ML teams.
  • Scale AI: Enterprise data generation and labeling platform with data collection services and a managed workforce; competes for multimodal dataset projects and custom pipelines.
  • TELUS International AI Data Solutions: Global crowd + services for collecting and annotating speech, audio, and video; strong on compliance and scale for enterprise buyers.
  • Defined.ai: Marketplace and services for high‑quality speech/audio and visual datasets with consent and metadata; relevant for multimodal and domain‑specific data needs.
  • Synthesis AI: Synthetic human‑centric image/video data (faces, bodies, scenes) used for perception and interaction models; a substitution competitor where synthetic can replace real data.