What do they actually do
Mohi makes a debugging and monitoring tool for teams building multi‑step LLM “agents.” Developers add Mohi’s SDK to their agent code so each run is recorded with prompts, tool calls, intermediate outputs, timing, and errors. Mohi then shows these runs as trace views and metrics to help engineers find failure points, spot regressions, and iterate on prompts or logic. The site also mentions AI‑assisted prompt tuning and auto‑optimization features aimed at turning insights into fixes (Mohi).
The product appears to be demo‑first and in private beta: the website emphasizes scheduling a demo rather than self‑serve sign‑ups, and YC lists a small founding team in the S25 batch, consistent with an early pilot stage (Mohi, YC).
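To make the instrumentation pattern concrete, below is a minimal sketch of what SDK-based agent tracing of this kind typically looks like: each step's inputs, output, latency, and errors are recorded so a run can be replayed as a trace. The `Run` class and `trace_step` decorator are hypothetical illustrations for this memo, not Mohi's actual API.

```python
import time
import traceback
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Run:
    """Hypothetical trace container: each step records inputs, output,
    latency, and any error, i.e. the kind of data an agent-observability
    SDK would capture and ship to a backend."""
    steps: list[dict] = field(default_factory=list)

    def trace_step(self, name: str) -> Callable:
        def decorator(fn: Callable) -> Callable:
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                record: dict = {"step": name, "inputs": {"args": args, "kwargs": kwargs}}
                start = time.perf_counter()
                try:
                    result = fn(*args, **kwargs)
                    record["output"] = result
                    return result
                except Exception:
                    record["error"] = traceback.format_exc()
                    raise
                finally:
                    record["latency_s"] = time.perf_counter() - start
                    self.steps.append(record)  # a real SDK would send this to its backend
            return wrapper
        return decorator


run = Run()

@run.trace_step("plan")
def plan(task: str) -> str:
    return f"1. search for '{task}'  2. summarize findings"

@run.trace_step("tool:search")
def search(query: str) -> list[str]:
    return [f"stub result about {query}"]

plan("agent observability")
search("agent observability")
print(run.steps)  # the per-step trace an engineer would browse in a trace view
```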
Who are their target customer(s)
- Early‑stage startup engineers building multi‑step LLM agents: They struggle to reproduce failures across many intermediate calls and external APIs. They need detailed traces and session views to see exactly where an agent goes wrong (Mohi).
- Platform/ML engineers at mid‑size teams running agents in production: They lack real‑time visibility and alerting to detect regressions, latency spikes, or systemic errors across many runs. They need live metrics and monitoring for agent behavior (Mohi).
- Prompt engineers and LLM experimenters: They spend time on manual A/B runs and can’t easily inspect intermediate outputs. They want traces plus concrete, AI‑assisted suggestions for prompt improvements (Mohi).
- QA and reliability engineers for agent workflows: They can’t easily run reproducible end‑to‑end tests or triage flaky behavior. They need recorded sessions with inputs/outputs, timing, and error signals to pinpoint root causes (Mohi).
- Engineering leads/CTOs deciding pilot → production: They need confidence in observability, access control, and reliability trends before rollout. They want proof of stable behavior over time and operational readiness (Mohi).
How would they acquire their first 10, 50, and 100 customers
- First 10: Leverage YC and founder networks to recruit a handful of agent‑building startups into free, time‑boxed pilots with white‑glove integration and a written failure report; convert successful pilots to paid and secure testimonials/referrals.
- First 50: Publish 2–3 short case studies from early pilots and run targeted outreach (LinkedIn, GitHub/Discord communities around agent frameworks, invite‑only webinars). Standardize a paid pilot pack with fixed scope and success metrics so one engineer can onboard multiple teams per month.
- First 100: Launch a self‑serve path with clear SDK guides and a low‑friction paid tier for small teams, while 1–2 BD/implementation reps pursue mid‑market accounts. Add partnership integrations with popular agent frameworks and show up at 1–2 developer/ML conferences for steady inbound.
What is the rough total addressable market
Top-down context:
Nearby markets suggest meaningful spend: observability platforms (~$2.4B in 2024), data observability (~$2.14B in 2023), and AI developer tools (~$9.8B in 2025). Broader AI software spend provides long‑term headroom (FMI, Grand View, Statista, ABI Research).
Bottom-up calculation:
Core near‑term niche: agent‑specific observability at ~10%–25% of observability spend → ~$240M–$600M today. With adjacent expansion into AI dev tools and data observability, a realistic SAM is ~5%–10% of their ~$11.9B combined spend → ~$0.6B–$1.2B over the next few years (FMI, Grand View, Statista).
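A quick worked version of the bottom‑up arithmetic, using the market figures cited above; the percentage shares are the stated assumptions, not measured data:

```python
# Bottom-up TAM arithmetic using the cited market estimates (figures in $B).
observability_2024 = 2.4    # observability platforms (2024)
data_observability = 2.14   # data observability (2023)
ai_dev_tools = 9.8          # AI developer tools (2025)

# Core near-term niche: agent-specific observability as a share of observability spend.
niche_low, niche_high = 0.10 * observability_2024, 0.25 * observability_2024
print(f"Core niche: ${niche_low * 1000:.0f}M to ${niche_high * 1000:.0f}M")   # ~$240M to $600M

# Adjacent expansion: 5%-10% of combined AI dev tools + data observability spend.
combined_adjacent = ai_dev_tools + data_observability                          # ~$11.9B
sam_low, sam_high = 0.05 * combined_adjacent, 0.10 * combined_adjacent
print(f"Expanded SAM: ${sam_low:.1f}B to ${sam_high:.1f}B")                    # ~$0.6B to $1.2B
```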
Assumptions:
- Agent observability captures 10%–25% of general observability spend in the near term as agentic apps grow.
- Mohi expands beyond pilots with production features to compete for 5%–10% of adjacent AI tooling budgets.
- Market estimates (2023–2025) are directionally accurate and comparable despite differing methodologies.
Who are some of their notable competitors
- LangSmith (LangChain): Observability and evaluation for LLM apps and agents with tracing, monitoring, alerts, and insights; framework‑agnostic and widely adopted alongside LangChain/LangGraph (LangSmith Observability, Docs).
- Langfuse: Open‑source LLM engineering platform for tracing, evals, prompt management, and metrics; supports OpenTelemetry and popular frameworks and can be self‑hosted (Langfuse, Docs).
- HoneyHive: AI observability and evaluation focused on agents, with distributed tracing, experiments/datasets, online evals, and monitoring/alerts; OpenTelemetry‑native with enterprise options (HoneyHive, Docs).
- Helicone: AI gateway and open‑source LLM observability offering proxy‑based logging, analytics, cost tracking, experiments, and integrations across providers and frameworks (Helicone, Guides).
- Arize Phoenix (OSS): Open‑source LLM tracing and evaluation with prompt playgrounds, datasets/experiments, and OTEL‑based instrumentation; self‑hostable with broad integrations (Phoenix, Docs).