What do they actually do
MangoDesk is a hosted service that turns a plain‑English spec into a working human‑labeling and evaluation pipeline. From a written description, it generates the annotation interface, annotator instructions, routing rules, QA checks, and a clean handoff so teams can download the finished dataset or evals when complete (MangoDesk homepage; YC page).
They support common AI data types: alignment data, evaluation data, conversational and agent interactions, and image, video, and audio; public docs describe formats and workflow guidance (Human Data docs).
Customers can bring their own annotators or use MangoDesk’s vetted pool of domain experts. MangoDesk can run the operational side (sourcing/screening and executing the pipeline), perform QA, and deliver production‑grade outputs that plug into training and evaluation workflows (MangoDesk homepage; YC page).
Who are their target customer(s)
- Research labs fine‑tuning or training models: They need bespoke, repeatable human evaluations and alignment data without rebuilding interfaces, instructions, routing, and QA for every experiment (YC page; Human Data docs).
- Product teams shipping LLM features: They require targeted post‑training datasets and fast, consistent evals to catch real‑world failures; typical labeling cycles are slow and inconsistent (MangoDesk homepage; Human Data docs).
- ML engineers / evaluation owners: They repeatedly build annotation UIs, routing rules, and QA by hand, making benchmarks hard to compare and reproduce across runs (YC page; LinkedIn/YC descriptions).
- Domain‑specific teams (healthcare, finance, legal): They struggle to source screened expert annotators and to write instructions that yield reliable labels in regulated or specialized domains (MangoDesk homepage; YC page).
- Early‑stage AI startups with small ops teams: They lack bandwidth to hire/train annotators and want turnkey delivery of datasets/evals so they can iterate on models quickly (MangoDesk homepage; Human Data docs).
How would they acquire their first 10, 50, and 100 customers
- First 10: Founder‑led outbound to ML leads at labs and YC startups, offering short, supported pilots in which MangoDesk builds the first eval end‑to‑end and charges on delivery; use these pilots to convert buyers into repeat paid runs and to harden spec→pipeline quality (MangoDesk homepage; YC page).
- First 50: Publish ready‑to‑use templates and annotated examples (conversational, agent, image, audio) and run targeted outbound + technical demos; convert via low‑friction paid pilots, clear self‑serve pricing, and brief onboarding with a success rep (Human Data docs).
- First 100: Add integrations/partnerships (model hosting, MLOps, dataset marketplaces) and pursue vertical outreach (healthcare, finance, legal) with vetted expert pools and case studies; support with referrals/credits, APIs, and templated SLAs for repeat runs (MangoDesk homepage; Human Data docs).
What is the rough total addressable market
Top-down context:
Analysts size the broader data labeling solutions/services market at roughly $18.6B in 2024, projected to reach tens of billions by 2030; the AI training dataset market is cited near $2.6B in 2024 and data annotation tools near $1.03B in 2023, both with strong growth through 2030 (Grand View Research — data labeling services; Grand View Research — AI training dataset; Grand View Research — data annotation tools; Mordor Intelligence).
Bottom-up calculation:
If 8,000–12,000 active AI product teams and labs globally buy custom evals/post‑training datasets each year, and 20–30% of them need bespoke, expert‑run pipelines, that’s 1,600–3,600 target buyers. At an average $75k–$200k annual spend per buyer (multiple eval runs + datasets), the serviceable market is roughly $120M–$720M today, with headroom as more teams adopt rigorous evals.
Assumptions:
- Population of 8k–12k global teams with material evaluation needs; 20–30% require bespoke pipelines vs. commodity labeling.
- Average annual spend per buyer covers several eval runs/datasets plus QA and expert labor: $75k–$200k.
- Excludes very large, fully in‑house data ops and the lowest‑cost commodity labeling.
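A minimal sketch of the bottom‑up arithmetic, using the assumed ranges above (the buyer population, bespoke share, and annual spend are this memo's assumptions, not reported figures):

```python
# Bottom-up serviceable-market estimate from the assumptions above.
# All inputs are memo assumptions, not reported figures.

def serviceable_market(teams: int, bespoke_share: float, annual_spend: int):
    """Return (number of target buyers, combined annual spend in dollars)."""
    buyers = teams * bespoke_share
    return buyers, buyers * annual_spend

# Low end: fewer teams, smaller bespoke share, lower spend per buyer.
low_buyers, low_market = serviceable_market(8_000, 0.20, 75_000)
# High end: more teams, larger bespoke share, higher spend per buyer.
high_buyers, high_market = serviceable_market(12_000, 0.30, 200_000)

print(f"Target buyers: {low_buyers:,.0f}-{high_buyers:,.0f}")                    # 1,600-3,600
print(f"Serviceable market: ${low_market/1e6:,.0f}M-${high_market/1e6:,.0f}M")   # $120M-$720M
```

Varying any one input (for example, a larger buyer population or higher per‑buyer spend as eval adoption grows) shifts the range proportionally, which is where the stated headroom comes from.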
Who are some of their notable competitors
- Scale AI: Full‑stack data platform and services (including expert RLHF and evaluations) used by leading labs; overlaps on expert labeling and eval pipelines.
- Surge AI: Expert human labeling for LLMs with emphasis on quality evaluators and complex tasks; close competitor on evals and alignment data.
- Labelbox: Annotation platform and QA workflows with workforce integrations; competes on tooling for building and managing labeling pipelines.
- Sama: Managed data labeling services with QA and enterprise processes; overlaps on outsourced human data production for ML.
- Prolific: Participant recruiting platform for human studies and evaluations; used by AI teams to source vetted annotators/testers for evals.