
Foundry

Infrastructure to ship real apps with AI - end-to-end in the browser

Fall 2024 · Active · Website
Developer Tools · Generative AI · Automation

Report from 3 days ago

What do they actually do

Foundry runs isolated, repeatable browser environments so teams can build, test, and score AI agents that operate inside web apps. It offers a hosted simulator and a Python SDK (AWE) that hands an agent an instrumented browser session (via a CDP endpoint), records every action and DOM change, and returns a final state plus an event trace for scoring and debugging (homepage).
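
To make the run flow concrete, here is a rough Python sketch of what driving such a simulator could look like. The AWE SDK's actual interface isn't documented in this report, so every name below (`awe`, `Simulator`, `run`, `trace`, `score`) is a hypothetical stand-in for the described steps: the SDK hands the agent a CDP endpoint, records the run, and returns a final state plus an event trace.

```python
# Hypothetical sketch only: the real AWE API is not shown in the source,
# so all names here are assumptions that mirror the described flow.
import awe  # hypothetical package name


def my_agent(cdp_url: str) -> None:
    """The agent drives the instrumented browser via the CDP endpoint it is handed."""
    # e.g. attach a Playwright/CDP client here and perform the web task
    ...


sim = awe.Simulator(task="sample-web-task")  # hypothetical constructor
run = sim.run(agent=my_agent)                # hands the agent a CDP URL and records
                                             # every action and DOM change

print(run.final_state)                       # final state of the web app
for event in run.trace:                      # event trace for scoring and debugging
    print(event.timestamp, event.kind)

score = sim.score(run)                       # built-in scoring over the trace
```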

Because the environment is controlled and deterministic, runs are reproducible, which is useful for benchmarking, debugging, and collecting training data for supervised or RL loops. Today the product is in private beta with an evaluation stack, an SDK, leaderboards, a sample dataset listed as "coming soon," and an apply-for-access flow targeting researchers and ML teams (homepage; benchmarking blog; YC profile).
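
Under the same hypothetical API as the sketch above, reproducibility can be checked directly: a deterministic environment should produce identical event traces for identical runs, which is what makes benchmark scores and regression checks trustworthy. The `as_dict()` serializer below is likewise an assumption.

```python
# Illustrative reproducibility check (hypothetical API continued from above):
# two runs of the same agent on the same task should yield identical traces.
import hashlib
import json


def trace_digest(run) -> str:
    # Serialize the event trace deterministically and hash it for cheap comparison.
    payload = json.dumps([event.as_dict() for event in run.trace], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


run_a = sim.run(agent=my_agent)
run_b = sim.run(agent=my_agent)
assert trace_digest(run_a) == trace_digest(run_b), "non-deterministic run detected"
```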

Who are their target customer(s)

  • AI researchers and academic labs building web agents: They need repeatable, comparable experiments, but browser runs are flaky and shared benchmarks are scarce, making it hard to measure progress or publish reliable results (benchmarking blog).
  • ML engineers trying to ship agents into production: Demos that work once often fail in real environments; nondeterministic browser behavior is hard to reproduce and debug, slowing deployment and increasing risk (homepage; YC profile).
  • Teams that need training data and labels for web agents: Collecting high‑quality, annotated trajectories is time‑consuming and error‑prone without full run capture and consistent environments (homepage).
  • QA and automation engineers for enterprise web workflows: They can’t safely test bots against production and need isolated, controlled environments with test data to validate auth flows, edge cases, and compliance (homepage; YC profile).
  • Internal research/benchmarking teams running leaderboards: Maintaining complex web-task benchmarks is costly, and inconsistent environments make results unreliable; they need standardized tasks, scoring, and hosted evaluation (homepage).

How would they acquire their first 10, 50, and 100 customers

  • First 10: Personally onboard early beta applicants and research labs with 1:1 setup, tailored sample tasks, and free credits; encourage co-published results or leaderboard entries to create visible proof points (homepage; blog).
  • First 50: Publish high‑quality sample datasets and leaderboards, run small challenges and workshops, and provide templates so labs and builders can self‑serve quickly (homepage).
  • First 100: Run paid pilots with mid‑size ML/QA teams to document reductions in flaky runs and faster debugging, publish case studies, and use targeted outbound plus referrals; offer a managed data service to convert research users to paying customers (YC profile).

What is the rough total addressable market

Top-down context:

Foundry spans parts of software testing/automation (~$55.8B, 2024), RPA software (~$3.2B, 2023), MLOps (~$2.19B, 2024), and data collection/labeling (~$3.77B, 2024), a broad ~$65B opportunity when summed (GMInsights; Gartner RPA; Grand View MLOps; Grand View Labeling).

Bottom-up calculation:

A conservative SAM focuses on browser‑agent evaluation, reproducible simulation, and dataset tooling: ~10% of the web‑testing slice of testing (~$1.7B), ~30% of MLOps (~$0.66B midpoint), ~20% of data labeling (~$0.75B), and ~10% of RPA (~$0.32B), for roughly $3.4B today (sources above); the arithmetic is sketched after the assumptions below.

Assumptions:

  • ~30% of software testing is web; ~10% of that requires agent‑grade deterministic simulation and auditing.
  • ~25–35% of MLOps spend relates to agent development/evaluation requiring environment‑level scoring and traces.
  • ~20% of data labeling and ~10% of RPA involve browser‑based automation that benefits from repeatable simulation.
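
The bottom-up figure follows directly from these assumptions; here is a quick back-of-the-envelope check (all values in $B, shares as stated above):

```python
# Back-of-the-envelope SAM check using the cited market sizes (in $B) and the
# stated share assumptions; the shares come from this report, not external data.
testing, rpa, mlops, labeling = 55.8, 3.2, 2.19, 3.77

web_agent_testing = testing * 0.30 * 0.10   # 30% of testing is web, 10% agent-grade -> ~1.67
mlops_agents      = mlops * 0.30            # midpoint of the 25-35% assumption      -> ~0.66
browser_labeling  = labeling * 0.20         # 20% browser-based                      -> ~0.75
browser_rpa       = rpa * 0.10              # 10% browser-based                      -> ~0.32

sam = web_agent_testing + mlops_agents + browser_labeling + browser_rpa
print(f"${sam:.1f}B")                       # -> $3.4B, matching the SAM above
```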

Who are some of their notable competitors

  • Playwright: Open‑source browser automation for writing tests and scripts. Overlap: teams automate and reproduce interactions. Difference: a library, not a hosted, deterministic simulator with built‑in recording, scoring, datasets, or leaderboards (docs).
  • BrowserStack: Cloud browsers/devices for cross‑browser testing at scale. Overlap: hosted browsers and replay. Difference: focuses on functional testing, not agent benchmarking, event‑level traces for ML, or RL‑friendly deterministic snapshots and scoring (site).
  • Browserless: Hosted headless Chrome/CDP endpoints. Overlap: remote CDP URLs to drive sessions. Difference: raw endpoints without reproducible snapshotting, run instrumentation, or dataset/benchmark tooling built in (site). See the connection sketch after this list.
  • Cypress: End‑to‑end testing framework with dashboard and time‑travel debugging. Overlap: reproducible runs and rich failure debugging. Difference: targets developer/QA app testing, not ML agent evaluation and training‑data collection (site).
  • UiPath (and RPA peers): Enterprise RPA platform for scripted automation, deployment, and auditing. Overlap: enterprise automation workflows. Difference: rule‑based RPA vs. Foundry’s focus on learned web agents, benchmarking, trajectory collection, and retraining (UiPath).
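
For context on the "raw CDP endpoint" distinction, the sketch below shows the typical pattern of attaching Playwright to a remote Chromium session (the endpoint URL is a placeholder): the hosted service supplies the browser, but any recording, reproducible snapshotting, or scoring layered on top is left to the caller.

```python
# Minimal sketch: drive a remote browser over a raw CDP endpoint with Playwright.
# The endpoint URL is a placeholder; trace capture and scoring are up to the caller.
from playwright.sync_api import sync_playwright

CDP_URL = "wss://hosted-chrome.example?token=..."  # placeholder endpoint

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(CDP_URL)  # attach to the remote session
    context = browser.contexts[0] if browser.contexts else browser.new_context()
    page = context.new_page()
    page.goto("https://example.com")
    # ...agent actions would go here; recording them is the caller's responsibility...
    browser.close()
```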