Sylvian

Data for LLMs through Competition

Fall 2025 · Active · Website
Artificial Intelligence · Reinforcement Learning · B2B

Report from 27 days ago

What do they actually do

Sylvian runs competitions to capture expert, step‑by‑step tool use in environments like VS Code and Excel, then packages those logs as training datasets and benchmarks for LLMs that need to operate real tools. Their contest tooling (including a VS Code extension) records contributors' actions and produces auditable traces that reflect how experts actually work (YC launch, VS Code extension).

The company recruits skilled contributors via leaderboards and prize pools, verifies submissions, and delivers reproducible datasets for teams training agentic or domain‑specific models where process fidelity matters (YC launch).
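
For illustration, one logged step in such a trace might look like the minimal sketch below. The schema and field names (contestant_id, tool, action, target, payload) are hypothetical assumptions made for this example, not Sylvian's published format.

    # Minimal sketch of one step in an expert tool-use trace.
    # Hypothetical schema for illustration only, not Sylvian's actual format.
    from dataclasses import dataclass, asdict
    import json, time

    @dataclass
    class TraceStep:
        contestant_id: str   # anonymized expert identifier
        tool: str            # e.g. "vscode" or "excel"
        action: str          # e.g. "edit", "run_command", "set_cell"
        target: str          # file path, spreadsheet cell, or terminal
        payload: str         # what was typed, pasted, or executed
        timestamp: float     # when the action happened

    step = TraceStep(
        contestant_id="anon-042",
        tool="vscode",
        action="edit",
        target="src/pricing.py",
        payload="def net_present_value(cashflows, rate): ...",
        timestamp=time.time(),
    )
    print(json.dumps(asdict(step), indent=2))  # one auditable log line

A chain of steps like this, tied to a verified contest submission, is what separates process‑level tool‑use data from plain text outputs.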

Who are their target customer(s)

  • AI model training teams at startups and labs: They need realistic examples of expert tool use (not just text outputs). Existing vendors struggle to attract and motivate top experts, and lack stepwise action data; Sylvian collects this via competitions and logging (YC launch).
  • Product teams building LLM agents that operate tools (e.g., Excel automation, code assistants): Models fail on multi‑step workflows and make costly mistakes because training data lacks fine‑grained actions; Sylvian focuses on tool environments like VS Code and Excel to capture those steps (YC launch).
  • Domain teams in finance, quant trading, and data science: Generic text datasets miss domain‑specific procedures, and auditing is hard where mistakes carry financial or compliance costs; Sylvian benchmarks models against expert tool‑use to show the gap (YC launch).
  • Data‑procurement and vendor management teams: Recruiting, verifying, and retaining top‑tier contributors is expensive. Pay‑per‑task models skew to lower quality; competitions and leaderboards attract higher‑skill contributors (YC launch).
  • ML research and benchmarking groups: They lack standardized, high‑skill, reproducible tool‑use datasets; Sylvian publishes benchmarks and contest tooling (e.g., a VS Code logging extension) to create auditable datasets (YC launch, VS Code extension).

How would they acquire their first 10, 50, and 100 customers

  • First 10: Run invite‑only paid pilots with YC startups, AI labs, and accessible finance/data science teams: host a short contest, capture tool‑use logs with the VS Code/Excel tooling, deliver a small labeled dataset and benchmark in exchange for feedback and shareable anonymized examples.
  • First 50: Launch public contests with modest prizes, publish clear docs and examples, list the VS Code extension and tooling where developers look, and recruit experts from niche communities; convert participants via a fixed‑scope, fixed‑price pilot package.
  • First 100: Offer a self‑serve “contest as a service” for small teams, formalize a sales playbook and pricing for mid‑sized customers, and sign channel partnerships with data vendors and ML consultancies to resell or run contests; use case studies and published benchmarks to shorten sales cycles and support enterprise SLAs.

What is the rough total addressable market

Top-down context:

Direct buyers sit within the data collection/labeling and AI training‑dataset markets, the former estimated at roughly $3.77B in 2024 and the latter projected to reach several billion dollars by 2028 (Grand View Research, BCC Research). Enterprise LLM adoption is already a multi‑billion‑dollar market and is projected to grow to tens of billions within five years, expanding demand for specialized tool‑use data (TBRC).

Bottom-up calculation:

Estimate 2,000–3,000 global teams (AI labs, LLM startups, and enterprise LLM groups) that need expert tool‑use datasets, each averaging $0.5–1.5M in annual spend on specialized data and benchmarks, implying a core TAM of roughly $1–4.5B today; growth in agentic LLM adoption could expand this materially toward the high single‑digit billions by late decade, consistent with the top‑down forecasts (Grand View, BCC, TBRC).
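
A quick arithmetic check of that range, sketched with only the figures stated above:

    # Bottom-up TAM check using the stated ranges (illustrative only).
    buyers_low, buyers_high = 2_000, 3_000   # teams buying expert tool-use data
    acv_low, acv_high = 0.5e6, 1.5e6         # annual spend per team, USD
    tam_low = buyers_low * acv_low           # 2,000 * $0.5M = $1.0B
    tam_high = buyers_high * acv_high        # 3,000 * $1.5M = $4.5B
    print(f"Core TAM today: ${tam_low / 1e9:.1f}B to ${tam_high / 1e9:.1f}B")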

Assumptions:

  • Number of active buyers for specialized tool‑use datasets is 2,000–3,000 globally across labs, startups, and enterprises.
  • Average annual contract value for high‑skill, audited tool‑use datasets and benchmarks is $0.5–1.5M.
  • Agentic LLM adoption increases both the number of buyers and average spend over the next 3–5 years.

Who are some of their notable competitors

  • Scale AI: Large data provider for AI training, including labeling, synthetic data, and evaluations; strong enterprise reach but not primarily focused on fine‑grained expert tool‑use logs.
  • Surge AI: High‑quality annotation with expert contributors and evaluation tooling; adjacent on expert data collection though not centered on competition‑driven tool‑use traces.
  • Appen: Established data‑labeling vendor with broad workforce; widely used for annotation but optimized for scale rather than expert, auditable tool‑use workflows.
  • Kaggle (Google): Competition platform with a large expert community that can be mobilized for tasks and benchmarks; competitions are core, but it does not natively capture stepwise tool‑use logs.
  • AIcrowd: Platform for hosting AI challenges across domains; can recruit skilled participants, though it is not focused on producing standardized tool‑use datasets for LLM training.