
Lemma

Continuous learning for AI agents

Fall 2025 · Active · Founded 2025
Artificial Intelligence · Developer Tools · B2B · Infrastructure · AI

Report from 27 days ago

What do they actually do

Lemma provides a hosted tool that connects to your production AI agents, watches real user traffic and outcomes, flags failures or drift, and pinpoints the exact step that broke. It then runs structured experiments (e.g., prompt/template variants), analyzes results, and proposes prompt changes you can apply via API or have Lemma open as a pull request in your repo https://www.uselemma.ai/ https://www.ycombinator.com/companies/uselemma.

The product is publicly available with demos and a free trial, and it’s being marketed to engineering teams shipping customer‑facing AI features. Lemma is listed as a YC Fall 2025 company and appears to be in early commercial rollout focused on pilots rather than broad self‑serve plans https://www.uselemma.ai/ https://www.ycombinator.com/companies/uselemma.

Who are their target customer(s)

  • Engineering lead at an AI-native startup running customer-facing agents: Agents break in production and it’s time‑consuming to locate which prompt or step failed; this drags engineers away from core product work https://www.uselemma.ai/ https://www.ycombinator.com/companies/uselemma.
  • ML or prompt engineer responsible for reliability: Lacks a repeatable way to surface failing cases, run controlled experiments, and validate improvements; current prompt tuning is ad hoc and slow https://www.uselemma.ai/.
  • Product manager for an AI feature: Can’t easily see which failures matter most to users or quantify impact from model/prompt changes, making prioritization and ROI proof hard https://www.uselemma.ai/.
  • Customer-support or bot-ops lead running AI assistants at scale: Worried about incorrect or unsafe responses; needs a low‑risk path to test and roll out fixes (including code changes) quickly https://www.uselemma.ai/.
  • Security/compliance or platform ops at larger companies: Requires auditability, approvals, and controls before any automated prompt or code changes; wary of tools that modify production behavior without governance https://www.uselemma.ai/.

How would they acquire their first 10, 50, and 100 customers

  • First 10: Run high‑touch pilots with YC‑network companies and inbound demo/trial leads; Lemma’s team handles onboarding and iterates until it can show measurable reductions in agent failures, delivering changes via API or PR https://www.uselemma.ai/ https://www.ycombinator.com/companies/uselemma.
  • First 50: Turn pilots into case studies and references; run targeted outbound to engineering/ML leads and developer meetups, and offer a guided self‑serve trial emphasizing GitHub PR/API integration to reduce adoption friction https://www.uselemma.ai/.
  • First 100: Add enterprise controls (audit trails, approvals, RBAC) and formal SLAs; build integrations and partner channels (LLM platforms, CI/CD, observability) to land mid‑market/enterprise using published before/after metrics https://www.uselemma.ai/.

What is the rough total addressable market

Top-down context:

Lemma sits in AI/ML monitoring, observability, and continuous‑improvement tooling. Public MLOps estimates range from low billions today to mid‑tens of billions by 2030, suggesting a narrow TAM of roughly $2–$6B and an expanded TAM up to the low‑tens of billions as scopes broaden (Fortune Business Insights; Grand View Research). Broader observability and generative‑AI software markets are larger but overlap with MLOps and should not be double‑counted (MarketsandMarkets; Grand View Research).

Bottom-up calculation:

Example framing: 40k–120k teams globally running production LLM agents over the next 5–7 years, with an average annual spend of ~$50k on monitoring/experimentation and continuous learning, implies ~$2B–$6B in annual spend addressable by vendors like Lemma.

Assumptions:

  • Tens of thousands of teams will operate production LLM agents (not just pilots) over a 5–7 year horizon.
  • Average annual contract value for reliability/experimentation tooling is ~$25k–$100k, centered near $50k for mid‑market.
  • Adoption is primarily among AI‑native startups and enterprise teams with governance needs; overlapping budgets with MLOps/observability are not double‑counted.
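
A minimal Python sketch of the bottom‑up arithmetic above, using the illustrative figures from this section (team counts and average spend are the report's assumptions, not Lemma's own data):

  # Back-of-envelope TAM check; inputs are the illustrative assumptions above, not Lemma data
  teams_low, teams_high = 40_000, 120_000    # teams running production LLM agents (5-7 yr horizon)
  avg_annual_spend = 50_000                  # ~$50k average annual spend per team (USD)

  tam_low = teams_low * avg_annual_spend     # 2,000,000,000 -> ~$2B
  tam_high = teams_high * avg_annual_spend   # 6,000,000,000 -> ~$6B
  print(f"Addressable annual spend: ${tam_low/1e9:.1f}B - ${tam_high/1e9:.1f}B")

Because the estimate is linear in contract value, moving the assumed ACV toward the $25k or $100k ends of the range scales the result roughly proportionally, which is why the ACV assumption dominates the outcome.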

Who are some of their notable competitors

  • LangSmith (LangChain): Tracing, evaluation, and monitoring for LLM apps; helps teams debug and improve prompts/agents in development and production.
  • HoneyHive: LLM evaluation, prompt experimentation, and monitoring platform aimed at shipping reliable AI features faster.
  • Humanloop: Prompt management and evaluation workflows with experiment tracking to improve LLM app quality.
  • Arize AI (Phoenix): Model/LLM observability and evaluation; open‑source Phoenix plus enterprise tooling for monitoring and drift detection.
  • Vellum AI: Prompt registry, testing, and workflow tools for teams building LLM applications; supports experiments and review before deployment.