Baseline AI

AI document creation and data management for clinical trials

Summer 2024 • active • 2024 • Website

Artificial Intelligence • SaaS • Health Tech • B2B • Biotech

Disclaimer

FYI Combinator is not affiliated with Y Combinator. Reports are generated by AI Research Agents and may not be 100% accurate.

Report from 5 months ago

What do they actually do

Baseline AI builds an AI agent that automates much of the setup work for clinical trial data systems. It reads a study protocol and generates artifacts teams typically hand‑build: case report forms (CRFs), a clinical database model, edit checks, code to transform and validate source data, and starter analysis code/tables. The workflow is human‑in‑the‑loop: users review specifications and outputs before use, and the company emphasizes HIPAA‑aligned practices and auditability (how it works, homepage, YC profile).

The product is organized around three areas: Build (CRFs/database/edit checks), Harmonize (transform external data into the study data model), and Analyze (generate analysis tables/figures based on the protocol and model). Early usage appears to be pilot engagements with pharma/biotech trial teams; the site features a testimonial from Portal Instruments’ CEO and invites teams to book demos (homepage, YC profile).

Next, the team is working to extend from SDTM‑style dataset production into analysis‑ready datasets (ADaM) and analysis programming, add more data connectors/integrations, and strengthen validation and deployment options suited to regulated customers (how it works – evaluations/tools, Crunchbase – ADaM in development).

Who are their target customer(s)

Clinical data manager: Spends weeks designing CRFs, database schemas, and edit checks and must document everything for audits; manual work increases error risk and slows study startup (aligns with Baseline’s Build workflow and human‑in‑the‑loop review) (how it works).
Clinical programmer: Writes repeatable transform/validation code to convert source data into the study model and fix edge cases; this rework delays downstream analysis and submissions (how it works).
Biostatistician / lead statistician: Needs reliable analysis‑ready datasets and reproducible analysis code on a timeline; data issues and late changes force manual rework and slow report delivery (Baseline’s Analyze scope targets this) (how it works).
Clinical operations / project manager (sponsor or biotech): Trials slip and costs rise when data systems take too long to design/validate across teams and vendors; seeks predictable, auditable setup to keep timelines on track (homepage testimonial & demo invite).
Data integration or IT lead (sponsor/CRO): Must ingest labs/devices/EHR feeds securely and map them to the study model; building/maintaining many connectors is burdensome and must meet security/regulatory controls (how it works – tools, homepage HIPAA note).

How would they acquire their first 10, 50, and 100 customers

First 10: Run high‑touch paid pilots via existing relationships and YC introductions. For each 4–8 week pilot, process a real protocol and deliver CRFs, a database model, transform code, and an audit package, capturing written quotes and a technical success story (YC profile, how it works, homepage).
First 50: Productize pilots into a repeatable 6–8 week “starter pilot” for small biotechs and boutique CROs via targeted outbound and partner referrals, gated by a signed success‑criteria checklist (CRFs delivered, datasets produced, reproducible validation artifacts). Publish 2–3 public case studies to drive inbound (how it works – evaluations/tools, homepage).
First 100: Shift to channel and enterprise motions: partner with EDC vendors and larger CROs so Baseline is a standard option at study startup, offer customer‑hosted/managed deployments, and provide regulatory/evidence packages to shorten procurement. Use accumulated validation and ADaM/analysis capabilities to win multi‑study sponsor contracts (how it works, Crunchbase – ADaM).

What is the rough total addressable market

Top-down context:

Baseline sits inside the eClinical/CDMS stack (EDC/CDMS, data integration, analytics). The global eClinical solutions market is estimated at about $10.5B in 2024 and projected to reach ~$23.7B by 2030, implying sustained double‑digit growth of roughly 14–15% compounded annually (Yahoo Finance summary of TechSci Research).
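As a quick check on the growth implied by those two market figures, the compound annual growth rate can be computed directly. This is a minimal sketch: the function and variable names are illustrative, and the six-year span assumes the estimate runs from 2024 to 2030 as stated.

```python
# Market figures as quoted in the top-down context (billions of USD).
start_value_busd = 10.5   # estimated 2024 market size
end_value_busd = 23.7     # projected 2030 market size
years = 2030 - 2024       # assumption: six compounding periods


def cagr(start: float, end: float, periods: int) -> float:
    """Return the compound annual growth rate as a fraction (0.10 == 10%)."""
    return (end / start) ** (1 / periods) - 1


implied_cagr = cagr(start_value_busd, end_value_busd, years)
print(f"Implied CAGR: {implied_cagr:.1%}")
```

The result lands between 14% and 15% per year, which is what "double‑digit growth" refers to here.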

Bottom-up calculation:

Rough order‑of‑magnitude: ~72,000 studies were registered globally in 2023 across registries, and ~76% of ClinicalTrials.gov registered studies are interventional. Applying that share to the global count gives ~54,700 interventional trials; if ~40% of those are industry‑sponsored and relevant to Baseline, that’s ~22k trials/year. At an average $75k per study for Build+Harmonize automation, annual TAM ≈ 22,000 × $75k ≈ $1.65B (WHO/ICTRP reporting of ~72k studies registered in 2023, ClinicalTrials.gov study mix).

Assumptions:

Share of interventional trials (~76%) and industry‑sponsored share (~40%) applicable to Baseline’s scope.
Average price per study of ~$50k–$100k; midpoint $75k used for Build+Harmonize deliverables.
Focus on new annual starts (not backfiled or observational studies); excludes deeper enterprise‑wide licenses.
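The bottom-up estimate and its assumptions can be reproduced in a few lines, which also makes it easy to see how the result moves across the $50k–$100k price range. This is a rough sketch using only the figures stated above; the variable and function names are illustrative, not part of any Baseline tooling.

```python
# Inputs taken from the bottom-up calculation and its assumptions.
registered_studies_2023 = 72_000   # studies registered globally in 2023
interventional_share = 0.76        # ClinicalTrials.gov mix, applied to the global count
industry_relevant_share = 0.40     # industry-sponsored and relevant to Baseline
price_low, price_mid, price_high = 50_000, 75_000, 100_000  # USD per study


def addressable_trials(studies: int, interventional: float, industry: float) -> float:
    """Estimate the number of in-scope trials per year."""
    return studies * interventional * industry


def annual_tam(trials: float, price_per_study: float) -> float:
    """Annual total addressable market in USD."""
    return trials * price_per_study


trials = addressable_trials(
    registered_studies_2023, interventional_share, industry_relevant_share
)
tam_low = annual_tam(trials, price_low)
tam_mid = annual_tam(trials, price_mid)
tam_high = annual_tam(trials, price_high)

print(f"In-scope trials per year: {trials:,.0f}")
print(f"TAM at $50k/study:  ${tam_low / 1e9:.2f}B")
print(f"TAM at $75k/study:  ${tam_mid / 1e9:.2f}B")
print(f"TAM at $100k/study: ${tam_high / 1e9:.2f}B")
```

Without rounding the trial count to 22,000 first, the midpoint comes out slightly under the $1.65B quoted above (about $1.64B), and the price range spans roughly $1.1B to $2.2B.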

Who are some of their notable competitors

Medidata Rave EDC (Dassault Systèmes): Widely used EDC/CDMS for sponsors and CROs; a core system Baseline would integrate with or displace parts of in study startup and data standardization.
Veeva Vault CDMS: Cloud CDMS used by biopharma; strong footprint in study build and data management where Baseline’s automation could complement or accelerate workflows.
Oracle Clinical One/Oracle Health Sciences: Oracle’s clinical platform spanning EDC and data management; entrenched in large sponsors and CROs and a frequent integration point.
Certara Pinnacle 21: De facto standard for SDTM/ADaM compliance checks used for regulatory submissions; adjacent to Baseline’s SDTM/ADaM generation and validation workflows.
Formedix: Tools for CRF design/metadata and SDTM automation; overlaps with Baseline’s Build and standardization steps for faster study setup.