What do they actually do
Kashikoi provides a hosted simulation and benchmarking service for AI agents. Teams connect their agents, run realistic multi‑turn scenarios, and get evaluation reports that highlight failures, edge cases, costs, and optimization opportunities. The public demo content shows dashboards with run counts, success rates, and per‑run cost summaries; the current flow centers on demos and pilots, with custom connectors built for each customer’s stack (homepage, YC profile).
They also ship a focused tool for security teams: an Okta Log Generator that creates realistic, correlated Okta system‑log sequences as downloadable JSON. It supports configurable environments, displays generation limits (e.g., up to roughly 100 logs per generation, with example runs estimating about 25 independent user sessions), and includes a free/Pro model with feature voting on future log types (Okta Log Generator).
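For a concrete sense of what such synthetic telemetry looks like, below is a minimal illustrative sketch of two correlated, Okta‑style system‑log events that share one actor and session. The field names follow Okta's public System Log schema; the values, helper functions, and correlation approach are assumptions for illustration, not Kashikoi's actual generator output.

```python
# Illustrative sketch only: builds two correlated Okta-style System Log events
# (a session start followed by an SSO event) that share the same actor and
# session identifier. Field names follow Okta's public System Log schema; the
# values are synthetic assumptions, not output from Kashikoi's generator.
import json
import uuid
from datetime import datetime, timedelta, timezone


def make_event(event_type: str, user_email: str, session_id: str,
               ip: str, published: datetime) -> dict:
    """Build one minimal Okta-style system log record."""
    return {
        "uuid": str(uuid.uuid4()),
        "published": published.isoformat().replace("+00:00", "Z"),
        "eventType": event_type,
        "severity": "INFO",
        "actor": {
            "id": "00u" + uuid.uuid4().hex[:17],   # synthetic user id
            "type": "User",
            "alternateId": user_email,
            "displayName": user_email.split("@")[0],
        },
        "client": {
            "ipAddress": ip,
            "userAgent": {"os": "Mac OS X", "browser": "CHROME"},
        },
        "outcome": {"result": "SUCCESS", "reason": None},
        "authenticationContext": {"externalSessionId": session_id},
    }


def make_session(user_email: str, ip: str) -> list[dict]:
    """One synthetic 'user session': a login, then an SSO event 30s later."""
    session_id = "idx" + uuid.uuid4().hex[:20]
    start = datetime.now(timezone.utc)
    return [
        make_event("user.session.start", user_email, session_id, ip, start),
        make_event("user.authentication.sso", user_email, session_id, ip,
                   start + timedelta(seconds=30)),
    ]


if __name__ == "__main__":
    logs = make_session("alice@example.com", "203.0.113.10")
    print(json.dumps(logs, indent=2))  # downloadable-JSON-style output
```

The correlation is the interesting part for SOC use: events in a "session" share an actor and externalSessionId, which is what lets detection rules that join on user or session behave as they would against real telemetry.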
Today’s users are early AI product and platform teams that need repeatable multi‑turn agent evaluations, and security/SOC teams that need high‑quality synthetic telemetry to test detection rules without touching sensitive production data. The company is early‑stage (Spring 2025 YC) with live demos, public tools, and a workflow that moves from demo/waitlist to customer integrations (homepage, YC profile, Okta Log Generator).
Who are their target customer(s)
- AI product managers shipping conversational features: They rely on manual testing and anecdotal feedback, so important edge cases and regressions slip into production, creating user‑visible failures and rework.
- ML engineers and platform teams responsible for agent reliability: They spend excessive time hand‑tuning prompts/models because they lack an automated way to run many realistic multi‑turn tests and track regressions over time.
- Security/SOC teams validating detection rules and playbooks: They need realistic, correlated synthetic logs to test detections safely; current test data is limited, unrealistic, or blocked by data‑sensitivity constraints.
- Customer support ops deploying multi‑step assistants: They struggle to simulate long, complex conversations and failure modes before rollout, so bots break on real user journeys and require costly manual fixes.
- Compliance/QA/risk teams approving agents for production: They lack standardized, auditable verification and continuous testing to show agents meet policy and reliability requirements before rollout.
How would they acquire their first 10, 50, and 100 customers
- First 10: Run targeted enterprise pilots via the founders’ network and YC intros. Offer time‑boxed pilots with a custom connector, and deliver a concrete failure/optimization report to drive conversion (homepage, YC profile).
- First 50: Turn early pilots into two public case studies (AI product and SOC/security) and use them, along with the live Okta demo, as lead magnets for similar teams; offer low‑friction pilot contracts with clear success metrics (Okta Log Generator, homepage).
- First 100: Launch a self‑serve Pro tier with credits and common connectors, publish integration docs, and add channel partners (SIEM/observability/consultancies). Pair with developer content, targeted SEO, and referral incentives (homepage, Okta Log Generator).
What is the rough total addressable market
Top-down context:
Kashikoi sits across conversational AI (~$11.6B in 2024, growing to ~$40B+ by 2030), MLOps (multi‑billion and growing), synthetic data (projected to reach ~$2.1B by ~2028), SIEM/security monitoring (multi‑billion and expanding), and observability (multi‑billion). Together these markets point to tens of billions of dollars in adjacent spend over the next 3–5 years (Grand View Research; Knowledge Sourcing; MarketsandMarkets; BCC Research; Technavio).
Bottom-up calculation:
Illustrative beachhead SAM: assume roughly 2,500 target teams across the Global 2000 and tech mid‑market (AI product/platform plus SOC) that will pay specifically for agent simulation/evaluation and synthetic correlated traces. At a $40k average ARR, this implies a ~$100M SAM, expandable with additional verticals and connectors (a worked sketch follows the assumptions below).
Assumptions:
- Roughly 1–2 qualified buyer teams per target account initially (AI agent owners and/or SOC)
- Average annual contract value around $25k–$75k for enterprise‑grade eval/simulation
- Early focus on enterprise/upper mid‑market; expansion adds more accounts and upsell through more scenarios/connectors
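As a quick sanity check, here is a minimal arithmetic sketch of the bottom‑up figure using only the assumptions stated above; the team count and ACV band are the section's own illustrative inputs, not measured data.

```python
# Bottom-up SAM sketch using the section's illustrative assumptions only.
TARGET_TEAMS = 2_500  # qualified buyer teams (Global 2000 + tech mid-market)
ACV_LOW, ACV_AVG, ACV_HIGH = 25_000, 40_000, 75_000  # annual contract value band (USD)


def sam(teams: int, acv: int) -> int:
    """Serviceable addressable market = paying teams * average annual contract value."""
    return teams * acv


for label, acv in [("low", ACV_LOW), ("average", ACV_AVG), ("high", ACV_HIGH)]:
    print(f"SAM at {label} ACV (${acv:,}): ${sam(TARGET_TEAMS, acv):,}")

# Average case: 2,500 * $40,000 = $100,000,000, i.e. the ~$100M SAM cited above;
# the $25k-$75k ACV band brackets it between roughly $62.5M and $187.5M.
```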
Who are some of their notable competitors
- LangSmith (LangChain): Evaluation, tracing, and dataset tooling for LLM applications; often used to test and monitor agentic workflows.
- Humanloop: Platform for LLM app development with evaluation, feedback, and iteration workflows used by product and ML teams.
- Giskard: Testing and quality assurance for ML/LLM applications, including evaluation suites and guardrails for enterprise use.
- Arize Phoenix: Open‑source observability and evaluation for LLM/RAG systems, used to diagnose failures and measure quality.
- Splunk Attack Range (security): Open‑source lab that simulates adversary behavior and generates telemetry for detection engineering; used by SOC teams to validate SIEM detections.