Aluna

Data for biomedical AI

Summer 2024 | Active | 2024
Artificial Intelligence

Report from about 2 months ago

What do they actually do

Aluna builds curated biomedical datasets and evaluation suites for AI teams in healthcare, with a current focus on oncology. Publicly listed products include PathFoundry (a large pathology image resource), Duplex (a multimodal dataset pairing whole‑genome sequencing with clinical data and digitized pathology slides), and OncoBench (an oncology benchmarking suite marked “coming soon”). The PathFoundry page highlights “2M+ whole slide images across 25+ tumor types” with H&E, IHC, and IF stains and diagnoses/annotations from board‑certified pathologists (Aluna site). YC’s profile confirms the company’s focus and that it works with health organizations to build curated datasets and evaluations (YC profile).

In practice, Aluna appears to partner with hospitals, labs, and similar organizations to collect slides, sequencing, and clinical records; standardize and de‑identify data; obtain expert pathology annotations; and package the result as model‑ready datasets and evaluation/benchmarking bundles (e.g., OncoBench) for training and testing. The website’s contact‑driven flow implies bespoke projects or licensing rather than a self‑serve product, and no customer names or pricing are publicly disclosed (Aluna site; YC profile).

Who are their target customer(s)

  • Pharma and biotech ML teams building oncology models: They need well‑curated, multimodal clinical datasets with expert pathology labels, but face fragmented data, inconsistent annotations, and long lead times to assemble training data. They also need standardized test sets to compare model performance reliably (Aluna site; YC profile).
  • Hospital pathology and research departments with slides and clinical records: They want a safe, governed way to turn their data into usable AI datasets but often lack time, annotation capacity, and legal/technical processes for de‑identification and standardization. Outsourcing data packaging and expert annotations is more practical than doing it in‑house (Aluna site).
  • Genomics and diagnostic labs building multimodal products: They need linked whole‑genome sequencing plus slide and clinical data for models, but pairing, harmonizing, and labeling those modalities is expensive and technically difficult. Validated, ready‑to‑use multimodal datasets remove a bottleneck for development and validation (Aluna site – Duplex).
  • AI startups and academic labs that must benchmark new algorithms: They lack widely accepted, high‑quality oncology benchmarks to prove methods on realistic clinical data, which slows publishing, fundraising, and partnerships. A common evaluation suite reduces ambiguity about claims and speeds adoption (Aluna site – OncoBench, coming soon).
  • Regulatory/validation teams and CROs running model evaluations: They need datasets with board‑certified annotations and clear provenance for validation and submission, but sourcing reproducible, governance‑ready data is time‑consuming and costly. Curated, audit‑friendly datasets cut time and legal complexity (Aluna site – PathFoundry).

How would they acquire their first 10, 50, and 100 customers

  • First 10: Offer discounted 6–8 week pilots to hospital pathology departments, genomics/diagnostic labs, and a small number of pharma ML teams. Each pilot delivers a governance‑ready dataset plus a pathologist‑annotated subset of PathFoundry or Duplex. Source the pilots through founder networks and YC intros, then convert them into public case studies/validation reports.
  • First 50: Productize the pilot into a fixed‑scope “dataset build” package with standard contracts/IRB checklists and use OncoBench early access as a sweetener. Add referral channels via CROs, digital‑pathology vendors, and sequencing labs; prioritize paid case studies and brief validation summaries for pharma buyers.
  • First 100: Hire a commercial lead and AEs to close larger licenses, standardize terms, and add a self‑serve option for non‑sensitive benchmark datasets. Publish OncoBench leaderboards/reports, attend oncology/pathology conferences, and integrate with model‑validation vendors to drive inbound demand.

What is the rough total addressable market

Top-down context:

Adjacent markets suggest meaningful spend on oncology data and digital pathology: digital pathology is estimated at about $1.15B in 2024 and projected to grow to ~$3.86B by 2032, while oncology real‑world evidence solutions are estimated at around $893M in 2025 with a double‑digit CAGR (Fortune Business Insights; Meticulous Research).
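
As a rough cross-check, the growth rate implied by the two digital pathology figures above can be worked out directly. The sketch below is illustrative only: the function name is hypothetical, and it assumes eight compounding years between the 2024 estimate and the 2032 projection. The cited reports may state their own CAGR on a different base year or forecast window.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value over `years`."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text, in billions of USD.
DIGITAL_PATHOLOGY_2024 = 1.15
DIGITAL_PATHOLOGY_2032 = 3.86

# Assumption: 2024 -> 2032 is treated as 8 compounding years.
cagr = implied_cagr(DIGITAL_PATHOLOGY_2024, DIGITAL_PATHOLOGY_2032, 2032 - 2024)
print(f"Implied digital pathology CAGR: {cagr:.1%}")
```

Under that assumption the two endpoints imply growth of about 16% a year.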

Bottom-up calculation:

Assume 250 likely buyers worldwide (top pharma/biotech oncology teams, diagnostics/genomics labs, major academic centers), each spending ~$150k annually on curated oncology datasets/benchmarks (250 × $150k = $37.5M), plus 50 buyers purchasing higher‑priced multimodal packages at ~$400k each (50 × $400k = $20M). The oncology‑only initial TAM is then about $57.5M, or roughly $55–60M; expanding beyond oncology and adding evaluation subscriptions could increase this materially.

Assumptions:

  • Oncology‑only scope; excludes other disease areas.
  • Average annual license of ~$150k for standard curated datasets/benchmarks; ~$400k for larger multimodal packages.
  • Buyer count approximates global pharma/biotech oncology programs, major labs, and leading medical centers.
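
The bottom-up estimate can be reproduced with a few lines of arithmetic, which also makes it easy to vary the assumptions. This is a minimal sketch: the `Segment` class and `bottom_up_tam` function are hypothetical names, the buyer counts and prices are the report's assumptions rather than measured data, and the two buyer groups are treated as additive, as in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    buyers: int            # assumed number of paying organizations
    annual_spend_usd: int  # assumed average annual license per buyer

    @property
    def revenue_usd(self) -> int:
        return self.buyers * self.annual_spend_usd


def bottom_up_tam(segments: list[Segment]) -> int:
    """Sum buyers x annual spend across all segments."""
    return sum(s.revenue_usd for s in segments)


# Assumptions taken from the text above (oncology only).
SEGMENTS = [
    Segment("Standard curated datasets/benchmarks", buyers=250, annual_spend_usd=150_000),
    Segment("Larger multimodal packages", buyers=50, annual_spend_usd=400_000),
]

tam = bottom_up_tam(SEGMENTS)
for s in SEGMENTS:
    print(f"{s.name}: ${s.revenue_usd / 1e6:.1f}M")
print(f"Oncology-only TAM: ${tam / 1e6:.1f}M")
```

Changing a buyer count or price in `SEGMENTS` shows how sensitive the total is to each assumption; for example, halving the number of standard-license buyers removes about $18.75M from the estimate.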

Who are some of their notable competitors

  • Tempus: Tempus assembles and licenses linked clinico‑genomic and digitized pathology data and offers services to life sciences; it also operates digital pathology offerings, competing on multimodal oncology datasets and real‑world data access (Tempus – Life Sciences; Tempus – Digital Pathology).
  • PathAI: PathAI provides AI‑derived, standardized pathology features and curated real‑world datasets, licensing structured pathology data integrated with clinical and molecular data across oncology indications (PathAI – RWD; PathExplore).
  • Paige: Paige maintains a very large digitized slide corpus and foundation models and licenses AI technology and data to partners, competing on slide datasets, multimodal linkages, and model‑ready oncology data (Paige – About).
  • Proscia: Proscia’s Concentriq platform and real‑world data offerings provide de‑identified WSIs matched with clinical and genomic data at scale, delivering standardized, research‑ready datasets to life sciences customers (Proscia – RWD).
  • Owkin: Owkin curates deep, multimodal datasets via academic partnerships and privacy‑preserving/federated approaches; its MOSAIC program builds a large spatial‑omics atlas, competing on governed, multi‑institution oncology data and multimodal depth (Owkin – Federated Network; MOSAIC).