Cekura

Voice AI and Chat AI agents: Testing and Observability

Fall 2024 · Active · Website
Report from 2 months ago

What do they actually do

Cekura provides testing and observability for conversational agents (voice and chat). Teams connect their bot or voice stack, generate large batches of simulated conversations, and score results on metrics like instruction‑following, empathy, hallucination, latency, and tool usage. In production, it monitors real conversations to flag regressions and failures, so teams can catch issues before users do (site; YC).

It offers a dashboard and APIs to run simulations, review transcripts and logs, set alerts, replay real calls, and embed checks into CI/CD so releases can be gated on conversation‑level tests. It integrates with common voice/chat stacks (e.g., Retell, VAPI, ElevenLabs, LiveKit, Pipecat) and supports agent descriptions to auto‑generate scenarios and evaluators (site; Docs overview; Agents docs).
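To make the CI/CD gating concrete, here is a minimal sketch of what a conversation-level release gate might look like. The metric names, thresholds, and results schema are illustrative assumptions, not Cekura's actual API; in a real pipeline the scores would come from a batch of simulated conversations fetched via the vendor's dashboard or API.

```python
# Hypothetical CI gate: fail the build if simulated-conversation scores
# drop below thresholds. Metric names and thresholds are illustrative.

THRESHOLDS = {
    "instruction_following": 0.90,   # minimum acceptable score (0-1)
    "hallucination_free": 0.95,      # minimum acceptable score (0-1)
    "latency_p95_ms": 1500,          # upper bound in milliseconds
}

def gate(results: dict) -> list:
    """Return a list of failed checks; an empty list means the release can ship."""
    failures = []
    for metric, threshold in THRESHOLDS.items():
        value = results.get(metric)
        if value is None:
            failures.append(f"{metric}: missing from results")
        elif metric.endswith("_ms"):
            # Latency-style metrics are upper bounds.
            if value > threshold:
                failures.append(f"{metric}: {value} > {threshold}")
        elif value < threshold:
            failures.append(f"{metric}: {value:.2f} < {threshold}")
    return failures

if __name__ == "__main__":
    # In practice these numbers would be aggregated over many simulated runs.
    sample = {"instruction_following": 0.93,
              "hallucination_free": 0.97,
              "latency_p95_ms": 1200}
    failed = gate(sample)
    print("PASS" if not failed else "FAIL: " + "; ".join(failed))
```

A CI job would run this after the simulation batch completes and block the deploy on a non-empty failure list.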

Cekura focuses on enterprises and startups deploying agents in healthcare, banking/finance, logistics, recruitment, and retail, and reports working with 70+ customers across these sectors (YC; site).

Who are their target customer(s)

  • Product or engineering leads building voice/chat agents: They need confidence that prompt or model changes won’t break critical flows, but lack reliable pre‑release tests and fast feedback on regressions (site; YC).
  • QA and test engineers for conversational UX: Creating realistic multi‑turn test suites (personas, accents, noise, interruptions) is manual and slow, so edge cases slip into production (site).
  • DevOps/CI engineers who gate releases: They can’t easily automate conversation‑level checks in CI or block deploys when core flows fail, leading to rollbacks and incidents (CI/CD docs).
  • Voice/speech engineers integrating ASR/TTS and telephony: Reproducing real‑world audio conditions and measuring latency, tool calls, and failure modes across providers is difficult without specialized simulation and telemetry (site).
  • Compliance, trust, and risk teams in regulated industries: They need monitoring, transcripts, and alerts to detect hallucinations or policy violations and to audit incidents quickly, but lack purpose‑built tools (Agents & monitoring docs).

How would they acquire their first 10, 50, and 100 customers

  • First 10: Run tightly scoped, hands‑on pilots with product/engineering leads at startups and mid‑size teams to integrate Cekura, generate reproducible test suites, and deliver concrete failure reports using YC/partner intros for access (YC; site).
  • First 50: Publish case studies from pilots, hire 1–2 AEs plus a customer‑facing engineer to speed onboarding, and ship integrations/recipes so technical evaluations complete in days (Docs).
  • First 100: Scale via channel partnerships with voice/CCaaS vendors, list CI/CD templates in developer marketplaces for self‑serve, and run targeted outbound into regulated verticals using monitoring/compliance as the entry point (site; Agents docs; raise post).

What is the rough total addressable market

Top-down context:

Conversational AI software/services are projected to reach tens of billions by the late 2020s (IDC cites ~$31.9B by 2028), CCaaS is a multi‑billion category (~$6.2B in 2024), while AI‑enabled testing (~$857M in 2024) and deep observability (low‑hundreds of millions) are growing quickly (IDC; Synergy Research; Fortune Business Insights; Frost & Sullivan via Gigamon).

Bottom-up calculation:

If thousands to low tens of thousands of enterprise teams run conversational agents, 15–30% of them adopt third‑party testing/observability, and each program spends roughly $20k–$200k per year, the implied near‑term TAM ranges from roughly $15M at the most conservative combination of assumptions up to ~$1.2B at the top end, with mid‑range assumptions landing in the low hundreds of millions of dollars annually. This scales with broader agent adoption and regulatory pressure.

Assumptions:

  • Number of active enterprise conversational agent programs is in the 5,000–20,000 range globally.
  • 15–30% of those programs purchase dedicated third‑party testing/observability vs. in‑house only.
  • Annual spend per program on testing/observability averages ~$20k–$200k depending on complexity and scale.
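The bottom-up arithmetic above can be sanity-checked with a quick script. The three scenarios are illustrative combinations of the stated assumption ranges, not measured data:

```python
# Back-of-envelope TAM check using the assumptions listed above.

def tam(programs: int, adoption: float, annual_spend: float) -> float:
    """Annual third-party spend on conversational-agent testing/observability."""
    return programs * adoption * annual_spend

low  = tam(5_000, 0.15, 20_000)     # most conservative end of every range
mid  = tam(10_000, 0.22, 100_000)   # mid-range scenario
high = tam(20_000, 0.30, 200_000)   # most aggressive end of every range

print(f"low  = ${low / 1e6:,.0f}M")    # $15M
print(f"mid  = ${mid / 1e6:,.0f}M")    # $220M
print(f"high = ${high / 1e9:,.1f}B")   # $1.2B
```

Note that the mid and high scenarios support the low-hundreds-of-millions-to-billions framing, while the fully conservative combination lands an order of magnitude lower.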

Who are some of their notable competitors

  • Cyara Botium: Enterprise testing and monitoring for chatbots/voicebots with scripted and AI‑driven scenarios, regression/load testing, and continuous monitoring; overlaps with Cekura on automated testing and observability (product; docs).
  • Bespoken: Voice‑focused end‑to‑end testing and virtual device tooling for Alexa/Google/IVR with test scripts, prerecorded‑audio runs, and monitoring; strong in legacy voice platform simulation (docs).
  • Langfuse: LLM observability, prompt/version management, and evaluations for text/agent workflows; emphasizes tracing and prompt experiments rather than audio/telephony simulations (docs).
  • Observe.AI: Contact‑center conversation intelligence and production agents with post‑interaction QA, real‑time assistance, and compliance monitoring; overlaps on continuous monitoring/regression detection for voice agents (platform overview).
  • TruEra: Model quality and ML monitoring platform focusing on diagnostics, drift detection, and explainability; broader ML scope rather than conversational‑flow audio simulations (product).