What do they actually do
Unsiloed AI provides an API and dashboard that turn unstructured, multimodal documents (PDFs, Word, PowerPoint, images, tables, charts) into structured outputs suitable for downstream search or agents. Core endpoints cover parsing and extraction (text, tables, figures, images), classification, document splitting, and a PDF Editor for safe in‑place edits; both sync and async job flows are documented for production use (Unsiloed docs, extraction endpoints, PDF Editor).
They say the product is used in accuracy‑sensitive workflows by large banks and public companies and that they process “millions of pages each week” (YC profile). Access is self‑serve with a free tier and paid plans, plus SDKs, examples, and a parser library to speed integration (Pricing, FAQ, GitHub, PyPI).
Who are their target customer(s)
- Enterprise banking and finance teams: They process research, earnings reports, filings, and client docs, and need near‑perfect capture of numbers, tables, and clauses; in‑house parsers that must handle varied formats are brittle and slow to maintain (YC profile, API docs).
- Corporate legal teams and law firms: They require exact clause extraction, reliable redlining/edits, and preserved PDF layout; with mixed file types at scale, manual review and ad‑hoc scripts become error‑prone (API docs, YC profile).
- Healthcare/clinical data teams: They ingest reports, charts, and images and need accurate structured data from multimodal records for compliance and analytics; manual review is costly and inconsistent (YC profile, API docs).
- Enterprise data/ML engineering teams: They need a dependable ingestion layer (fetch → parse → embed → store) with throughput, monitoring, and SLAs; bespoke pipelines break under volume and slow down delivery (pipeline examples, FAQ).
- Product/AI teams at startups: They want a simple API and SDKs that convert PDFs, slides, and tables into consistent JSON/Markdown, so they can ship search/agent features quickly instead of building fragile parsers (Docs, GitHub).
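The fetch → parse → embed → store ingestion layer mentioned in the list above can be pictured as a chain of small stages. The sketch below is only an illustration and is not Unsiloed's API: every name in it (`fetch`, `parse`, `embed`, `store`, `ingest`) is hypothetical, the corpus and index are in-memory stand-ins, and the hash-based "embedding" is a toy placeholder for a real model.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    text: str
    vector: list[float]


def fetch(doc_id: str, corpus: dict[str, str]) -> str:
    """Hypothetical fetch stage: read the raw document from an in-memory corpus."""
    return corpus[doc_id]


def parse(raw: str) -> list[str]:
    """Hypothetical parse stage: split on blank lines and drop empty pieces."""
    return [part.strip() for part in raw.split("\n\n") if part.strip()]


def embed(text: str, dims: int = 4) -> list[float]:
    """Toy embedding (assumption): hash bytes scaled to [0, 1], not a real model."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dims]]


def store(chunks: list[Chunk], index: list[Chunk]) -> None:
    """Hypothetical store stage: a plain list stands in for a vector store."""
    index.extend(chunks)


def ingest(doc_id: str, corpus: dict[str, str], index: list[Chunk]) -> int:
    """Run one document through all four stages; return the number of chunks stored."""
    raw = fetch(doc_id, corpus)
    chunks = [Chunk(doc_id, text, embed(text)) for text in parse(raw)]
    store(chunks, index)
    return len(chunks)


corpus = {"report-1": "Revenue rose in Q3.\n\nCosts were flat.\n\n"}
index: list[Chunk] = []
stored = ingest("report-1", corpus, index)
print(f"stored {stored} chunks")
```

Keeping each stage a separate function is what makes throughput and failures observable per stage, which is the monitoring need the bullet describes.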
How would they acquire their first 10, 50, and 100 customers
- First 10: Run paid, time‑boxed pilots with banks, legal, and healthcare teams using their real documents; measure extraction error rates against agreed metrics and assign a dedicated engineer in exchange for references if targets are met (API docs, YC profile).
- First 50: Lean into self‑serve: promote the free tier and ship ready‑made templates (earnings reports, contracts, claims) plus pipeline examples to prove value in hours; convert trials via one‑click upgrades and short pilot contracts (Pricing, pipeline guidance).
- First 100: Add a small field sales team for regulated sectors, certify for required compliance, and sign systems‑integrator partners who bundle Unsiloed as the ingestion layer; standardize SLAs, pilot playbooks, and connectors to popular stores/embeddings (FAQ, GitHub).
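One way to make the "agreed metrics" of a pilot concrete is a field-level error rate: compare each extracted field with a hand-labeled ground truth and count the mismatches. The sketch below assumes flat dictionaries of field name to string value and an exact-match rule after whitespace normalization. The field names, sample values, and the 2% target are illustrative assumptions, not figures from any actual pilot.

```python
from __future__ import annotations


def field_error_rate(extracted: dict[str, str], truth: dict[str, str]) -> float:
    """Share of ground-truth fields that are missing or wrong in the extraction."""
    if not truth:
        raise ValueError("ground truth must contain at least one field")

    def norm(value: str) -> str:
        # Collapse whitespace so layout differences do not count as errors.
        return " ".join(value.split())

    errors = sum(
        1
        for name, expected in truth.items()
        if name not in extracted or norm(extracted[name]) != norm(expected)
    )
    return errors / len(truth)


# Illustrative data: one of four fields ("eps") has transposed digits.
truth = {"revenue": "1,204.5", "period": "Q3 2024", "currency": "USD", "eps": "0.42"}
extracted = {"revenue": "1,204.5", "period": "Q3  2024", "currency": "USD", "eps": "0.24"}

rate = field_error_rate(extracted, truth)
target = 0.02  # hypothetical pilot threshold
print(f"error rate: {rate:.1%}, target met: {rate <= target}")
```

Exact match is deliberately strict for numeric fields, where a transposed digit is a real error; a pilot might agree on looser rules for free-text fields.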
What is the rough total addressable market
Top-down context:
Broad “Document AI” software is estimated at about USD 14.66B in 2025, growing to ~USD 27.62B by 2030, while the narrower Intelligent Document Processing (IDP) segment is ~USD 2.3B in 2024 with strong growth expected (MarketsandMarkets, Grand View Research).
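The growth rate implied by the two Document AI figures can be checked with the standard compound annual growth rate formula, (end / start)^(1 / years) - 1. The sketch below assumes the 2025 and 2030 estimates are five years apart and come from the same, comparable series.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two point estimates."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# USD billions, taken from the estimates quoted above.
doc_ai_cagr = cagr(start=14.66, end=27.62, years=2030 - 2025)
print(f"Implied Document AI CAGR: {doc_ai_cagr:.1%}")
```

This works out to roughly 13.5% a year.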
Bottom-up calculation:
Focus on document‑heavy mid‑market and enterprise buyers in finance, legal, and healthcare. Assuming ~25,000 target organizations globally, 10% near‑term adoption, and an average annual contract of USD 60k for parsing/editing/API usage gives 25,000 × 10% × USD 60k ≈ USD 150M of serviceable near‑term opportunity. Adding ~2,000 large‑enterprise customers at a USD 200k average contract (2,000 × USD 200k) would add ~USD 400M of potential over time.
Assumptions:
- ~25k relevant mid‑market/enterprise orgs with recurring document‑processing needs in finance/legal/healthcare.
- 10% near‑term adoption for API parsing/editing given incumbent spend and evaluation cycles.
- Average contract sizes of ~USD 60k (mid‑market) and ~USD 200k (large enterprise) reflecting usage, SLAs, and support.
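The bottom-up arithmetic can be reproduced in a few lines, which also makes it easy to vary the assumptions. The inputs below are the ones listed above; the variable and function names are only illustrative.

```python
def opportunity(orgs: int, adoption: float, avg_contract_usd: int) -> float:
    """Annual revenue opportunity = organizations x adoption rate x average contract."""
    return orgs * adoption * avg_contract_usd


# Near-term: ~25k target orgs, 10% adoption, USD 60k average contract.
near_term = opportunity(25_000, 0.10, 60_000)

# Longer-term upside: ~2,000 large-enterprise customers at USD 200k each.
large_upside = opportunity(2_000, 1.0, 200_000)

print(f"near-term: USD {near_term / 1e6:.0f}M")
print(f"large-enterprise upside: USD {large_upside / 1e6:.0f}M")

# Sensitivity of the near-term figure to the adoption assumption.
for rate in (0.05, 0.10, 0.20):
    print(f"adoption {rate:.0%}: USD {opportunity(25_000, rate, 60_000) / 1e6:.0f}M")
```

Because the result is linear in each input, halving adoption to 5% halves the near-term figure to about USD 75M, so the adoption rate is the assumption worth testing first.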
Who are some of their notable competitors
- Google Document AI: Managed document processors on Google Cloud for text, tables, and entity extraction, with strong GCP integrations—often the default for GCP‑standardized teams.
- Amazon Textract: AWS OCR/extraction for PDFs and images with queries/adapters; attractive to teams standardized on AWS and needing elastic scale.
- ABBYY (Vantage / FlexiCapture): Established enterprise OCR/IDP platforms with on‑prem options and complex workflow tooling; chosen for mature controls and legacy integrations.
- Rossum: AI‑first document automation focused on transactional docs (invoices, POs) with ERP/AP integrations; favored for operations automation.
- Hyperscience: Enterprise IDP with human‑in‑the‑loop, orchestration, and auditability; used where accuracy and compliance are paramount, including public sector.