
Cortex AI

Large-scale real-world robot & egocentric data for embodied AI

Fall 2025 · Active · 2025
Reinforcement Learning · Robotics · AI

What do they actually do

Cortex AI collects and packages two kinds of real‑world data for robotics teams: (1) egocentric human video from workplaces with depth, hand/body pose, and subtask labels; and (2) robot trajectory data captured in real industry environments. They also provide human‑in‑the‑loop support during deployments, where remote operators oversee rollouts, perform recoveries, and log interventions, so each deployment yields new training and evaluation data (Cortex site; Egocentric data page).

Today Cortex is an early data pipeline and marketplace rather than a finished robot product or a large public dataset. They recruit workplaces to host capture sessions, run teleop/robot data collection in those environments, and return annotated datasets that can be used to pretrain or fine‑tune embodied AI models; public materials indicate an active pilot/early‑commercial phase (Cortex site; YC profile).
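
To make the two data products concrete, a minimal sketch of what per‑sample records could look like follows. The field names, shapes, and labels here are illustrative assumptions, not Cortex's published schema.

```python
from dataclasses import dataclass

# Hypothetical record layouts for the two data products described above.
# All field names, units, and shapes are illustrative assumptions.

@dataclass
class EgocentricFrame:
    """One annotated frame of egocentric workplace video."""
    timestamp_s: float                    # capture time in seconds
    rgb_path: str                         # path to the RGB frame
    depth_path: str                       # path to the aligned depth map
    hand_pose: list[list[float]]          # per-hand keypoints, e.g. 21 x (x, y, z)
    body_pose: list[list[float]]          # body keypoints, e.g. 17 x (x, y, z)
    subtask_label: str                    # e.g. "pick_part", "place_on_tray"

@dataclass
class RobotTrajectoryStep:
    """One step of a robot trajectory captured in a real work environment."""
    timestamp_s: float
    joint_positions: list[float]          # embodiment-specific joint state
    action: list[float]                   # commanded action at this step
    operator_intervention: bool = False   # True if a remote operator took over
    intervention_note: str = ""           # free-text log for recovery analysis
```

Records shaped like this would let customers filter for exactly the slices the report emphasizes, e.g. all steps where operator_intervention is True, which is the failure/recovery data that lab and sim pipelines rarely produce.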

Who are their target customer(s)

  • Robotics research labs building foundation models: Need large volumes of real‑world, annotated egocentric and robot‑trajectory data because lab/sim data under‑represent the variability and edge cases of actual workplaces. Existing public datasets rarely match their tasks or embodiments.
  • Early‑stage robotics startups (general‑purpose/manipulation): Struggle to capture failure and recovery examples during pilots and lack embodiment‑specific trajectory logs with labels to iteratively improve policies in production environments.
  • Industrial automation teams / system integrators: Require data from real factories/warehouses that reflects their equipment, layouts, and workflows; lab or synthetic trajectories often fail to transfer without costly re‑collection in situ.
  • Vision and embodied‑AI model teams focused on hand/object interactions: Need depth plus hand/body pose and subtask labels from real workers; most public egocentric datasets are too narrow or lack the annotation richness needed for manipulation and perception training (Egocentric data page).
  • Deployment and operations managers running pilot rollouts: Need reliable human‑in‑the‑loop oversight, recovery tooling, and a way to capture intervention data while maintaining consent and privacy in live workplaces (Cortex site; community discussion).

How would they acquire their first 10, 50, and 100 customers

  • First 10: Run hands‑on pilots with frontier labs and a few early startups, offering discounted data capture and human‑in‑the‑loop ops in exchange for technical feedback, case studies, and clear consent/data‑use agreements, sourced via founder networks and YC intros (Cortex site; YC profile).
  • First 50: Productize pilots into a repeatable “pilot‑to‑rollout” package (onboarding checklist, consent/legal templates, field SOPs) and use early case studies for targeted outreach to startups, university labs, and integrators; stand up regional capture teams to run deployments in parallel (Egocentric data page; YC profile).
  • First 100: Scale via a workplace marketplace and channels: self‑serve onboarding for host sites, a reseller program for integrators/OEMs, and a subscription‑like package for ongoing rollout support and continuous data streaming; market anonymized benchmarks and deployment wins while enforcing robust data‑governance workflows (Cortex site; YC profile; privacy/community).

What is the rough total addressable market

Top-down context:

The closest direct spend is the data collection and labeling market, about $3.8B in 2024, which covers annotated video/LiDAR and trajectory labeling like Cortex’s offering (Grand View Research). The broader buyer ecosystem sits within the ~$50B global robotics market, with embodied‑AI/software spend estimated in the low billions and growing (IMARC; Grand View, Industrial Robotics; Market.us, Embodied AI).

Bottom-up calculation:

A practical bottom‑up view multiplies the number of qualified buyers (frontier labs, robotics startups, and integrators willing to outsource capture/ops) by an average annual contract for real‑world egocentric/trajectory data plus rollout support. For example, standardizing on scoped pilots that expand into ongoing data streams yields contracts in the tens to hundreds of thousands of dollars per customer per year, scaling with capture days, annotation depth, and human‑in‑the‑loop coverage.
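
As a minimal sketch of that arithmetic, assuming hypothetical buyer counts and contract values (none of these numbers are sourced; they only show the shape of the calculation):

```python
# Bottom-up TAM sketch. Every number below is an illustrative assumption,
# not a sourced figure; real counts and contract values would need to be
# validated against actual pipeline and pricing data.

buyers = {
    # segment: (qualified buyers, avg annual contract in USD)
    "frontier_labs":     (20,  500_000),  # large capture + ops programs
    "robotics_startups": (300, 150_000),  # pilots expanding to data streams
    "integrators":       (500,  75_000),  # in-situ capture for deployments
}

tam = sum(count * acv for count, acv in buyers.values())
print(f"Illustrative bottom-up market size: ${tam:,.0f} per year")
# -> Illustrative bottom-up market size: $92,500,000 per year
```

Under these placeholder inputs the serviceable market comes out around $90M per year; the levers that move it are the ones named above (capture days, annotation depth, human‑in‑the‑loop coverage), each of which raises per‑customer contract value.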

Assumptions:

  • A meaningful share of robotics teams prefer outsourcing real‑world capture/annotation versus building in‑house pipelines.
  • Workplace consent, privacy, and data governance can be operationalized at scale to permit ongoing recording and reuse of data.
  • Customers value embodiment‑specific trajectories and recovery data enough to fund recurring contracts rather than one‑off projects.

Who are some of their notable competitors

  • Ego4D / Ego‑Exo4D (Meta): Large public egocentric datasets with extensive annotations widely used by researchers; substitutes for commissioning new workplace captures when scope aligns (Ego4D; Meta blog).
  • Datagen: Synthetic human/hand/body data with built‑in labels that can reduce the need for real‑world captures when synthetic realism is sufficient (Datagen; Wikipedia).
  • Appen: Established crowdsourcing and annotation marketplace; customers may choose generalized video collection/labeling pipelines over a robotics‑specific provider (Appen).
  • Scale AI (Physical AI / Data Engine): End‑to‑end data collection, labeling, and tooling for robotics/“physical AI,” a direct alternative for turnkey high‑quality annotation and data engineering (Scale AI).
  • Covariant (RFM‑1): Operations‑scale robotics vendor with production robot telemetry and a robotics foundation model; some teams may partner for models/data instead of buying from third‑party data providers (Covariant RFM‑1; Covariant blog).