What do they actually do
Delineate runs a service-plus-software pipeline that ingests large volumes of biopharma documents (papers, patents, figures/plots) and converts them into cleaned, standardized datasets scientists can use for clinical development decisions. They combine custom LLMs and computer vision with human quality control to extract items like numeric values from figures, dosing schedules, endpoints, and patient characteristics, then normalize units and labels across studies for cross-study analysis (YC company page; YC/LinkedIn post).
Today they deliver project-specific datasets and analyses (demo referenced on their YC page) rather than a self-serve SaaS product. They report active engagements with top-10 pharma, increased processed-study throughput versus industry norms, and review timelines shortened from months to weeks, with human-in-the-loop QC for defensibility (YC company page).
Who are their target customer(s)
- Heads of Clinical Development at large pharma: They need fast, defensible syntheses of published evidence to pick doses, endpoints, and sample sizes, but manual reviews take months and miss numeric details buried in figures and tables (YC company page).
- Protocol authors and biostatisticians inside pharma/CROs: They must convert heterogeneous trial reports into consistent, analysis-ready datasets; extracting numbers from plots and dosing schedules is slow and error-prone (YC/LinkedIn post).
- Evidence-synthesis / systematic-review teams in R&D or medical affairs: Aggregating all relevant studies is costly and often out of date due to inconsistent reporting and dispersed literature. They need cleaned, normalized datasets to finish reviews faster (YC company page).
- Heads of Translational Medicine / small-biotech CMOs: They need rapid, high-confidence go/no-go answers but lack bandwidth for exhaustive, standardized evidence extractions. Faster, scalable reviews de-risk decisions (YC company page).
- CRO project managers and regulatory strategy teams: They require traceable, audit-quality datasets and clear rationales for protocol choices, but assembling reproducible evidence packages is labor intensive. They benefit from automated extraction plus human QC outputs (YC/LinkedIn post).
How would they acquire their first 10, 50, and 100 customers
- First 10: Run paid, time-boxed pilots with heads of clinical development at top pharma, delivering a vetted dataset and a short recommendation memo with explicit acceptance criteria (speed, QC, traceability) (YC company page).
- First 50: Package repeated pilot work into standard “protocol evidence” templates (dose selection, endpoint choice, sample-size inputs) and sell as repeatable projects to additional pharma teams and top CROs, using pilot case studies for proof (YC company page).
- First 100: Open channels into mid-sized biotechs and CROs via partnerships and lighter onboarding for standardized packages; publish validated case studies and emphasize human-in-the-loop QC to address procurement and regulatory needs (YC company page; YC/LinkedIn post).
What is the rough total addressable market
Top-down context:
The broader ecosystem Delineate touches combines CRO/clinical-development services and internal pharma R&D budgets. Recent estimates put CRO services at roughly $65–82B in 2024 and pharma R&D spend at roughly $120–145B annually (Precedence Research; Proclinical/MarketsandMarkets; OECD; Deloitte).
Bottom-up calculation:
Delineate’s immediate slice (evidence extraction, evidence synthesis, and protocol-design support) is a small portion of CRO spend. Assuming 1–5% of the CRO market maps to these tasks yields a SAM of roughly $0.65B–$4.1B (1% of $65B at the low end, 5% of $82B at the high end), with upside if they expand into continuous surveillance and protocol drafting (Precedence Research; Proclinical/MarketsandMarkets).
Assumptions:
- Most CRO revenue is execution-heavy; evidence/design tasks are a smaller but strategic share (estimated 1–5%).
- Enterprise adoption will remain human-in-the-loop near term, limiting short-run capture vs. total need.
- If agents mature to support protocol drafting and surveillance, the addressable share within CRO/R&D budgets increases.
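The bottom-up range can be reproduced with a few lines of arithmetic. This is a minimal sketch: the market sizes ($65B and $82B) and the 1–5% share come from the estimates above, while the function name `sam_range` is hypothetical and used only for illustration.

```python
def sam_range(market_low_b, market_high_b, share_low, share_high):
    """Return the (low, high) serviceable market in $B.

    The low bound pairs the smaller market estimate with the smaller share;
    the high bound pairs the larger market estimate with the larger share.
    """
    return market_low_b * share_low, market_high_b * share_high


# Inputs taken from the text: CRO market of $65B-$82B, 1%-5% share (an assumption).
low, high = sam_range(65.0, 82.0, 0.01, 0.05)
print(f"SAM: ${low:.2f}B to ${high:.2f}B")
```

Because the range multiplies two uncertain inputs, it is most sensitive to the share assumption: moving the upper share from 5% to 3% would cut the high end by 40%.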
Who are some of their notable competitors
- DistillerSR: Enterprise software for systematic reviews that automates screening, data extraction, normalization, and audit trails. Overlaps on turning literature into structured, audit-ready datasets but is primarily a configurable SaaS platform rather than a high-touch ML+human extraction service.
- Trials.ai (ZS): Mines past trials and documents to recommend protocol elements and optimize design—directly relevant to Delineate’s roadmap of moving from evidence extraction to automated protocol suggestions.
- Medidata (Dassault Systèmes): Incumbent clinical-trial platform with tools for protocol design/optimization, data standardization, and AI-driven planning—competes where buyers want integrated protocol generation linked to downstream EDC/analytics.
- Pentavere (DARWEN): AI engine that extracts structured clinical variables from unstructured medical records to produce RWE datasets. Overlaps on automated extraction for pharma, though it emphasizes EHR/chart abstraction rather than published literature and figures.
- Rayyan: Widely used tool for screening and managing systematic reviews; overlaps in early literature workflows but not in specialized numeric-from-figures extraction, normalization, or enterprise handoff service.