What do they actually do
Idler builds reinforcement‑learning training environments from real‑world coding problems so model‑training teams can teach and evaluate code‑generation models on realistic engineering tasks. Labs run these scenario‑based environments during training and evaluation to give models practice and to measure progress against clear success criteria (YC page, LinkedIn).
Today the company operates as a very small, YC‑backed team selling bespoke environment and data work directly to model labs, not as a public self‑serve product. The founders say they’ve closed a multimillion‑dollar contract with a leading foundation lab and are scaling capacity to meet demand; specific delivery mechanics (hosted service vs. datasets vs. API) aren’t publicly detailed (Work at a Startup, YC page, PitchBook).
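To make "instrumented, scenario‑based environment" concrete, here is a minimal sketch of what such an RL coding environment's interface could look like: a task bundles a prompt with executable success checks, and the reward is the fraction of checks a submitted solution passes. The class and method names below are illustrative assumptions, not Idler's actual product or API.

```python
# Hypothetical sketch of a scenario-based RL coding environment (illustrative only;
# not Idler's actual interface). A task bundles a prompt with executable checks,
# and the reward is the fraction of checks the submitted code passes.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CodingTask:
    task_id: str
    prompt: str                            # natural-language task description
    checks: List[Callable[[dict], bool]]   # success criteria over the submission's namespace


def _safe(check: Callable[[dict], bool], namespace: dict) -> bool:
    """Run one check, treating any exception as a failure."""
    try:
        return bool(check(namespace))
    except Exception:
        return False


class CodingEnv:
    """Single-step environment: the agent submits code, the env grades it."""

    def __init__(self, tasks: List[CodingTask]):
        self.tasks = tasks
        self._current = None

    def reset(self, index: int = 0) -> str:
        """Select a task and return its prompt as the observation."""
        self._current = self.tasks[index]
        return self._current.prompt

    def step(self, submitted_code: str) -> Tuple[float, bool, Dict]:
        """Execute the submission and score it against the task's checks."""
        namespace: dict = {}
        try:
            exec(submitted_code, namespace)   # a real system would sandbox this
        except Exception as err:
            return 0.0, True, {"error": repr(err)}
        passed = sum(1 for check in self._current.checks if _safe(check, namespace))
        reward = passed / len(self._current.checks)
        return reward, True, {"passed": passed, "total": len(self._current.checks)}


if __name__ == "__main__":
    task = CodingTask(
        task_id="slugify-001",
        prompt="Write a function slugify(s) that lowercases s and replaces spaces with '-'.",
        checks=[
            lambda ns: ns["slugify"]("Hello World") == "hello-world",
            lambda ns: ns["slugify"]("AI Labs") == "ai-labs",
        ],
    )
    env = CodingEnv([task])
    print(env.reset())
    reward, done, info = env.step("def slugify(s):\n    return s.lower().replace(' ', '-')\n")
    print(reward, info)   # 1.0 {'passed': 2, 'total': 2}
```

Real environments would sandbox execution and span multi‑step, repo‑scale tasks; this single‑step toy only illustrates the "clear success criteria as reward" pattern described above.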
Who are their target customer(s)
- Foundation model labs training frontier code models: They need high‑quality, instrumented scenarios to teach models complex, end‑to‑end engineering tasks but struggle to build enough of them in‑house at the required quality and scale. Idler already reports at least one multimillion‑dollar lab contract (Work at a Startup, YC page).
- ML research and evaluation teams inside AI companies: They require repeatable, hard problem suites and clear success criteria to track model progress, but existing benchmarks are low‑signal for real engineering work and costly to design and maintain (YC page, LinkedIn).
- Small model startups and labs without large engineering ops: They lack the bandwidth and specialist tooling to create realistic RL environments, forcing trade‑offs between training‑data quality and core model work; a packaged environment layer reduces that burden (YC page, Work at a Startup).
- Product teams shipping code‑assistant features: They need confidence that models can complete end‑to‑end engineering tasks reliably, but unit tests and public datasets don’t capture complex real‑world workflows (YC page, LinkedIn).
- Academic groups and benchmarking organizations: They want reproducible, versioned environments that reflect engineering complexity, but building well‑instrumented scenarios is time‑consuming and inconsistently specified (YC page, Work at a Startup).
How would they acquire their first 10, 50, and 100 customers
- First 10: Run paid, bespoke pilots with foundation labs and top research teams, delivering one or two instrumented environments and reporting model gains against agreed metrics; leverage the existing multimillion‑dollar reference and the YC network for introductions and procurement (YC page, Work at a Startup).
- First 50: Productize repeatable environment bundles and a standard pilot playbook, then target mid‑sized labs, startups, and internal eval teams via outbound, conference presence, and a small, paired delivery‑and‑sales team that converts pilots into subscriptions (YC page, LinkedIn).
- First 100: Launch a hosted catalog/API with tiered pricing and partner distribution (compute providers, benchmarking orgs, academic groups) for self‑serve adoption, while an enterprise team pursues larger bespoke deals (Work at a Startup, YC page).
What is the rough total addressable market
Top-down context:
Idler sits across AI training datasets, data labeling/annotation, RL tooling, and AI infrastructure—categories collectively worth many billions today and growing quickly (Grand View Research, Fortune Business Insights, Grand View Research—Labeling, Precedence Research, ResearchAndMarkets, MarketsandMarkets).
Bottom-up calculation:
Near‑term buyers are limited to a few dozen to a few hundred high‑budget labs/teams globally. If ~75 buyers adopt at $1–2M annually for bespoke RL coding environments, the serviceable near‑term market is roughly $75–150M (a quick arithmetic check follows the assumptions below), with upside as the offer becomes a reusable platform/API.
Assumptions:
- Global pool of near‑term buyers: roughly 50–150 labs/teams running repeated RL training/eval cycles.
- Average annual contract value in the $1–2M range for bespoke, instrumented coding environments (multimillion‑dollar deals exist; Work at a Startup).
- Adoption initially concentrated among foundation labs and advanced ML orgs, expanding with productization.
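A quick arithmetic check of the bottom‑up range, using the document's own assumptions (the buyer count and contract values are estimates, not measured data):

```python
# Sanity-check the bottom-up serviceable-market estimate above.
buyers = 75                                  # midpoint of the assumed 50-150 buyer pool
acv_low, acv_high = 1_000_000, 2_000_000     # assumed annual contract value range ($)

sam_low = buyers * acv_low                   # 75 * $1M  = $75M
sam_high = buyers * acv_high                 # 75 * $2M  = $150M
print(f"Serviceable near-term market: ${sam_low / 1e6:.0f}M-${sam_high / 1e6:.0f}M")
```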
Who are some of their notable competitors
- OpenAI Evals: Open framework and registry for building and automating model evaluations; teams can create custom evals and run large test suites, reducing the need to buy bespoke environments for many coding/behavior tests.
- Prime Intellect (Environments Hub): Hosts and runs RL environments—including code/agent tasks—paired with an RL training stack; positions itself as an open alternative for teams wanting ready‑made, instrumented environments and infra instead of a vendor.
- Scale AI (Evaluation): Managed evaluation pipelines with custom eval sets, human‑in‑the‑loop grading, and reporting for frontier models; appeals to enterprises seeking audited, turnkey evals rather than building scenario suites in‑house.
- OpenCompass: Open‑source eval platform and benchmark registry covering many LLM tasks; low‑cost option for teams that can assemble and run public eval suites instead of purchasing bespoke, instrumented coding environments.
- CoderEval and code‑execution harnesses: Open projects that compile/run generated code against tests to measure correctness; they overlap with code‑generation evaluation needs for teams that only require execution/test harnesses rather than full scenario‑driven RL environments (a minimal harness sketch follows this list).
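For reference, here is a minimal, generic sketch of the execution‑harness pattern the last item describes: write a model‑generated solution and its unit tests to a temporary directory, run the tests in a fresh interpreter, and treat exit code 0 as success. It is an illustrative assumption of how such harnesses commonly work, not CoderEval's actual pipeline, and the function names are hypothetical.

```python
# Generic sketch of a code-execution harness: run a model-generated solution
# against unit tests in a subprocess and report pass/fail. Illustrative only.
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path


def run_against_tests(solution_code: str, test_code: str, timeout_s: int = 10) -> bool:
    """Write solution + tests to a temp dir and execute the tests in a fresh interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        (tmp_path / "solution.py").write_text(solution_code)
        (tmp_path / "test_solution.py").write_text(test_code)
        result = subprocess.run(
            [sys.executable, "-m", "unittest", "test_solution", "-v"],
            cwd=tmp_path,
            capture_output=True,
            timeout=timeout_s,
        )
        return result.returncode == 0   # exit code 0 means every test passed


if __name__ == "__main__":
    solution = "def add(a, b):\n    return a + b\n"
    tests = textwrap.dedent("""
        import unittest
        from solution import add

        class TestAdd(unittest.TestCase):
            def test_add(self):
                self.assertEqual(add(2, 3), 5)

        if __name__ == "__main__":
            unittest.main()
    """)
    print("passed" if run_against_tests(solution, tests) else "failed")
```

Scenario‑driven RL environments of the kind the document describes layer richer state, multi‑step interaction, and graded rewards on top of this basic pass/fail check.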