What do they actually do
Panels sells custom conversational audio datasets to teams training or evaluating voice models. They recruit vetted contributors to record multi‑speaker conversations and deliver richly labeled outputs, including transcripts and speaker diarization, suitable for training and benchmarking voice/ASR systems (website, YC profile, LinkedIn launch).
Access is sales‑led: customers scope requirements on a call, Panels runs controlled recordings, and then delivers the dataset to spec. There is no self‑serve marketplace or dashboard (website, YC profile).
Who are their target customer(s)
- Foundation‑model and voice‑research labs: They need realistic, multi‑speaker conversational audio with reliable transcripts and diarization; public corpora often skew single‑speaker, are noisy, or lack consistent speaker segmentation (website, YC profile).
- Startups building voice assistants or conversational products: They must fine‑tune and test on languages, accents, and scenarios that match their users, but recruiting, recording, and labeling in‑house is slow and expensive; they prefer fast, scenario‑driven datasets delivered to spec (website).
- ASR, diarization, and benchmarking teams: They require ground‑truth, speaker‑segmented transcripts with timestamps and metadata to measure accuracy and debug failures; many public datasets lack sufficient scale, diversity, or labeling detail (LinkedIn launch, YC profile).
- Localization and multilingual product teams: They need conversational recordings across many languages/dialects and demographics; sourcing representative, natural dialogue at scale is difficult without a managed contributor pool (LinkedIn launch, website).
- Safety, fairness, and compliance teams: They must audit models for bias, accessibility, and consent across demographics; assembling diverse, consented, well‑labeled test sets is legally complex and resource‑intensive (YC profile, website).
How would they acquire their first 10, 50, and 100 customers
- First 10: Use YC/founder warm intros to hand‑sell short, paid pilots to foundation‑model and voice research teams; convert pilots into multi‑dataset agreements by proving labeling quality and diarization accuracy (YC profile, website).
- First 50: Publish anonymized case studies and leverage referrals; run targeted outbound to ASR/diarization leads, localization owners, and voice‑assistant startups with sample datasets and direct founder outreach (website, LinkedIn launch).
- First 100: Productize common SKUs and scenario templates with a simple order portal and limited self‑serve trials; support with content (evaluation guides, benchmarks) and partnerships to drive lower‑touch inbound (website – Simulations).
What is the rough total addressable market
Top-down context:
The audio and speech slice of the AI training‑dataset market is estimated at roughly USD 0.8–1.0B in 2024. The broader AI training‑dataset market is ~USD 2.6–3.2B, and the downstream speech/voice recognition market is ~USD 15B+ (Market.us summaries, Market.us report, Grand View Research, Fortune Business Insights).
Bottom-up calculation:
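Assume 600–1,000 active buyers globally (foundation‑model labs, ASR vendors, voice‑AI startups, and large enterprises), each purchasing USD 0.8–1.2M of conversational audio per year (new data + refreshes). That implies a market of roughly USD 0.5–1.2B (600 × USD 0.8M ≈ USD 0.48B at the low end; 1,000 × USD 1.2M = USD 1.2B at the high end), a range that brackets the top‑down estimate of USD 0.8–1.0B for audio and speech datasets.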
Assumptions:
- 600–1,000 organizations buy external conversational audio annually (labs, ASR providers, voice product companies, and select enterprises).
- Average external spend per buyer on managed conversational audio datasets is ~USD 0.8–1.2M/year (mix of new data and evaluation sets).
- Purchases recur annually due to model refreshes, new languages, and evaluation needs.
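As a rough check on the bottom-up arithmetic, the range can be reproduced by multiplying the low and high ends of the buyer-count and spend assumptions above. This is a minimal sketch: the function name is hypothetical, and pairing low with low and high with high is an illustrative simplification, not a method taken from any of the cited sources.

```python
def bottom_up_tam(buyers_low, buyers_high, spend_low_usd, spend_high_usd):
    """Return (low, high) annual market size in USD.

    Assumption: the low bound pairs the fewest buyers with the lowest
    spend, and the high bound pairs the most buyers with the highest spend.
    """
    return buyers_low * spend_low_usd, buyers_high * spend_high_usd


# Inputs are the stated assumptions: 600-1,000 buyers, USD 0.8-1.2M each per year.
low, high = bottom_up_tam(600, 1_000, 800_000, 1_200_000)
print(f"USD {low / 1e9:.2f}B to {high / 1e9:.2f}B")  # prints: USD 0.48B to 1.20B
```

The low end (USD 0.48B) rounds to the ~0.5B figure quoted above. Changing either assumption shifts the range proportionally, so the estimate is only as reliable as the buyer count and per-buyer spend.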
Who are some of their notable competitors
- David AI: Research‑driven audio‑data company building large, curated conversational/multi‑speaker datasets for labs and enterprises; overlaps with Panels on bespoke, diversity‑focused collections.
- Appen: Large data collection and annotation provider offering off‑the‑shelf and custom speech datasets plus transcription/labeling pipelines; competes on scale and enterprise reach.
- Shaip: Sells ready‑made and custom speech/audio datasets and transcription services across many languages; a common alternative for ASR/benchmarking data needs.
- Magic Data Tech: AI‑data marketplace and annotation platform listing multi‑speaker/conversational audio datasets with DataOps tooling; competes via prebuilt catalogs vs. bespoke recording.
- Twine: Global freelance/contract platform that provides audio dataset collection and transcription; competes by offering rapid access to large participant pools.