Pulse

Production-grade unstructured document extraction

Summer 2024 • Active • 2024

Disclaimer

FYI Combinator is not affiliated with Y Combinator. Reports are generated by AI Research Agents and may not be 100% accurate.

Report from 6 months ago

What do they actually do

Pulse runs a hosted service and API that turns messy documents (PDFs, scans, Excel/Word files, slide decks, even handwriting and charts) into consistent, machine‑readable data you can use in search, analytics, or RAG pipelines. You can upload files in their web app to inspect results, or call their API to run pipelines programmatically (docs, quickstart).

The product uses a schema‑first workflow: it normalizes each document into an inspectable intermediate representation (you can see reading order, tables, and bounding boxes), then maps outputs into a target schema so fields stay consistent across many file types. Production features include table, chart, and handwriting support, deduplication and chunking for downstream models, and tooling to debug outputs at scale (docs, approach, one billion pages).
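The schema‑first idea described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `Block`, `map_to_schema`, the field names, and the label‑matching rule are hypothetical and are not Pulse's API or data model. The sketch only shows the general shape of the workflow: an intermediate representation that carries reading order and bounding boxes, followed by a mapping step that always emits the same fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Block:
    """One element of a hypothetical intermediate representation (IR)."""
    kind: str                                  # "text", "table", "chart", ...
    text: str                                  # extracted content
    page: int                                  # 1-based page number
    bbox: Tuple[float, float, float, float]    # (x0, y0, x1, y1) on the page
    order: int                                 # position in reading order


def map_to_schema(blocks: List[Block], schema: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Fill each schema field from the first block, in reading order,
    whose text starts with that field's label. Unmatched fields stay None."""
    record: Dict[str, Optional[str]] = {name: None for name in schema}
    for block in sorted(blocks, key=lambda b: b.order):
        for name, label in schema.items():
            prefix = label.lower() + ":"
            if record[name] is None and block.text.lower().startswith(prefix):
                record[name] = block.text[len(prefix):].strip()
    return record


# Hypothetical IR for a one-page invoice, deliberately listed out of order.
blocks = [
    Block("text", "Total: $500", 1, (50, 700, 200, 720), order=1),
    Block("text", "Invoice No: 123", 1, (50, 40, 200, 60), order=0),
]
schema = {"invoice_number": "Invoice No", "total": "Total", "due_date": "Due Date"}
record = map_to_schema(blocks, schema)
print(record)
```

Because every schema field is present in the output even when nothing matches, downstream consumers see the same keys regardless of the source file type, which is the consistency property the paragraph above describes.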

Who are their target customer(s)

Investment firms and deal teams reviewing CIMs and diligence packs: Analysts spend days pulling the same financial and KPI fields from PDFs and decks; manual extraction slows deals and risks missed details (use case).
Real‑estate analysts and property managers handling rent rolls and leases: Data lives across PDFs, spreadsheets, and scans; teams waste time cleaning and reconciling before analysis (use case).
Finance and accounting teams processing invoices, statements, and complex spreadsheets: Manual entry and brittle spreadsheet parsing cause errors and backlogs; they need reliable, consistent field extraction across file formats (product focus).
Insurance underwriting and claims operations with scanned, legacy forms: Messy scans, rotated pages, and inconsistent layouts force manual review; they need robust extraction that handles real‑world scans (rotation model).
Legal and compliance teams managing contracts and regulatory documents: Key clauses and metadata are buried across PDFs and Word files; without consistent extraction, searches and compliance checks are slow and error‑prone (docs).

How would they acquire their first 10, 50, and 100 customers

First 10: Run founder‑led pilots in 8–12 target accounts; ingest a real dataset, define schemas, and show time saved in 2–6 weeks, converting successful pilots into paid deals and case studies (quickstart).
First 50: Package templates and onboarding from early wins into 3–5 vertical playbooks; run targeted outbound and short paid pilots with clear SLAs and published results to close similar accounts (use cases).
First 100: Productize common templates with self‑serve pricing and SDKs, add integrations with storage and vector stores, and scale through content, webinars, and referrals while a small CS team drives repeatable deployments (approach).

What is the rough total addressable market

Top-down context:

Conservatively, enterprise Intelligent Document Processing (IDP) spend is in the low single‑digit billions: Gartner cites ~$2.09B by 2026. Including broader OCR and document capture spend puts the market at ~$10B+ today, with long‑term estimates for IDP and adjacent automation reaching tens of billions (Grand View Research, Precedence).

Bottom-up calculation:

Focus on document‑heavy buyers across finance, insurance, legal, and real estate: ~20–30k global mid‑to‑large organizations, each adopting 1–2 production pipelines at an average of $30k–$80k ARR per organization, imply ~$0.6B–$2.4B in practical near‑term spend addressable by a platform like Pulse.

Assumptions:

20–30k organizations with recurring document‑extraction needs across target verticals
Average contract value of $30k–$80k ARR for production deployments (schemas, integrations, support)
Estimates exclude very small businesses and include only buyers likely to run ongoing pipelines, not one‑off projects
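
The bottom‑up range can be checked with a few lines of arithmetic. This is a sketch under the report's own assumptions; the function name `bottom_up_range` is illustrative. It multiplies the low organization count by the low contract value and the high count by the high contract value.

```python
def bottom_up_range(orgs_low: int, orgs_high: int, acv_low: int, acv_high: int) -> tuple:
    """Return (low, high) annual spend in dollars: organizations x average contract value."""
    return orgs_low * acv_low, orgs_high * acv_high


# Inputs taken from the assumptions listed above.
low, high = bottom_up_range(20_000, 30_000, 30_000, 80_000)
print(f"${low / 1e9:.1f}B to ${high / 1e9:.1f}B")  # prints "$0.6B to $2.4B"
```

The quoted $0.6B–$2.4B range is reproduced only when the $30k–$80k figure is treated as ARR per organization. Multiplying additionally by 1–2 pipelines per organization would raise the upper bound to $4.8B.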

Who are some of their notable competitors

Google Document AI: Google Cloud’s document‑processing service with prebuilt/custom processors and tight GCP integrations; often chosen by teams standardizing on Google Cloud.
Amazon Textract: AWS’s managed OCR/key‑value/table extraction, used as a building block for document pipelines within AWS‑centric stacks.
Microsoft Azure Document Intelligence (Form Recognizer): Azure’s prebuilt and trainable document models with enterprise integrations; fits buyers standardized on Microsoft/Azure.
ABBYY (Vantage / FlexiCapture): Enterprise IDP suite used in complex, regulated workflows; strong on on‑prem/hybrid deployments and long‑standing automation tooling.
Rossum: Specialist in transactional documents (invoices, POs) with template‑free extraction and validation workflows; common in finance/AP teams.