What do they actually do
Zep AI provides a hosted memory and context-engineering service for AI agents. Developers send conversation turns and business events (like orders or profile updates) to Zep via SDKs or API. Zep stores this information as a temporal knowledge graph and, on each new user message, returns a compact block of the most relevant context for the LLM, so agents respond more accurately and personally (Quickstart, Concepts). The service exposes a dashboard/playground, Python/TypeScript/Go SDKs, and an API for ingestion and retrieval, with documented sub-200ms latency for assembled-context retrieval in typical flows (Quickstart latency).
Zep also publishes Graphiti, the temporal knowledge-graph framework that underpins its approach, along with examples and integrations to help teams adopt the pattern in production (Graphiti, GitHub). For larger customers, Zep offers enterprise features such as SOC 2, HIPAA BAA, bring-your-own-key/cloud, and dedicated deployments, with a tiered path from development to production (Pricing/Enterprise).
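The ingest-then-retrieve loop described above can be sketched in a few lines. This is a self-contained toy, not Zep's actual SDK: the class, method names, and the keyword-overlap relevance heuristic are all invented for illustration (a real service would use graph traversal and semantic search).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryStore:
    """Toy in-memory stand-in for a hosted memory service."""
    events: list = field(default_factory=list)

    def add(self, user_id: str, kind: str, payload: str) -> None:
        # Ingest a conversation turn or a business event.
        self.events.append({
            "user_id": user_id,
            "kind": kind,
            "payload": payload,
            "ts": datetime.now(timezone.utc),
        })

    def context(self, user_id: str, query: str, limit: int = 3) -> str:
        # Naive relevance: keep this user's events sharing a word with
        # the query, most recent first, and assemble a compact block.
        words = set(query.lower().split())
        hits = [e for e in self.events
                if e["user_id"] == user_id
                and words & set(e["payload"].lower().split())]
        hits = sorted(hits, key=lambda e: e["ts"], reverse=True)[:limit]
        return "\n".join(f"[{e['kind']}] {e['payload']}" for e in hits)

store = MemoryStore()
store.add("u1", "event", "order 1234 shipped to Berlin")
store.add("u1", "message", "I moved to Berlin last month")
block = store.context("u1", "where is my Berlin order?")
# `block` would be prepended to the LLM prompt on the next turn.
```

The point of the pattern is that the agent code only makes two calls per turn (ingest the new message, fetch the assembled context) instead of running its own retrieval pipeline.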
Who are their target customer(s)
- Developers building personalized chatbots and assistants: Agents forget prior interactions and require custom pipelines to gather chat history and business data for each turn, slowing development and degrading response quality (Quickstart).
- Product teams at startups prototyping stateful agents: They need faster ways to capture events, summarize user state, and tune retention without building memory stacks from scratch, which delays pilots and iteration.
- Customer support and customer success teams using AI for replies: Assistants give inconsistent answers when they can’t combine chat history with orders, profiles, and other events; teams need reliable context assembly across data sources (Cookbook).
- ML engineers and backend teams running production agents: Operating a homegrown memory service creates latency, retention-policy, and security burdens; they prefer a low-latency, scalable hosted API/SDK (Quickstart).
- Platforms or engineering teams building multi-step, stateful workflows: They must track facts and their changes over time so agents don’t act on stale information; a temporal-graph approach helps manage versions and temporal relationships (Graphiti).
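The temporal-graph idea in the last bullet amounts to storing facts with validity intervals and querying them as of a point in time, so stale values are superseded rather than overwritten. A simplified illustration (not Graphiti's actual API; all names here are invented):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None = still current

facts: list[Fact] = []

def assert_fact(subject: str, predicate: str, obj: str, at: datetime) -> None:
    # Close out any currently-valid fact for (subject, predicate),
    # then record the new version with an open-ended interval.
    for f in facts:
        if f.subject == subject and f.predicate == predicate and f.valid_to is None:
            f.valid_to = at
    facts.append(Fact(subject, predicate, obj, valid_from=at))

def lookup(subject: str, predicate: str, as_of: datetime) -> Optional[str]:
    # Return the value that was valid at `as_of`, if any.
    for f in facts:
        if (f.subject == subject and f.predicate == predicate
                and f.valid_from <= as_of
                and (f.valid_to is None or as_of < f.valid_to)):
            return f.obj
    return None

assert_fact("u1", "city", "Paris", datetime(2024, 1, 1))
assert_fact("u1", "city", "Berlin", datetime(2024, 6, 1))
lookup("u1", "city", datetime(2024, 3, 1))   # "Paris"
lookup("u1", "city", datetime(2024, 9, 1))   # "Berlin"
```

Because old versions are closed rather than deleted, an agent can both fetch the current value and audit what it believed at any earlier time.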
How would they acquire their first 10, 50, and 100 customers
- First 10: Target developer-led startups and accelerator teams with a runnable demo repo, direct outreach in YC/GitHub/dev communities, free credits, and a 1–2 hour guided integration session to get a pilot live quickly.
- First 50: Publish focused cookbooks and end-to-end examples (support, onboarding, sales), record walkthroughs and host office hours, and ship connectors to popular agent frameworks; convert pilots into two public case studies and add referral credits.
- First 100: Keep self-serve for SMBs and add a light enterprise motion (SDR + paid pilots) for mid-market teams needing SLAs, SOC 2/HIPAA, BYOC/BYOK; turn pilots into contracts and expand via marketplace listings and platform partnerships.
What is the rough total addressable market
Top-down context:
By 2030, conversational AI is projected at about $41.4B and vector/embedding databases at about $7.34B, a combined $48.74B pool that a memory/context service can sell into (Grand View Research: Conversational AI, Grand View Research: Vector DB). Allocating 5–15% of this spend to a dedicated memory layer implies ~$2.44B–$7.31B TAM, with a mid-case near ~$4.9B.
Bottom-up calculation:
Assume 75k–120k organizations globally adopt managed memory by 2030 (a 5–8% attach rate among companies deploying LLM assistants). With a blended ARPA of $20k–$40k/year (mix of SMB usage fees and enterprise SLAs/controls), TAM ranges from ~$1.5B to ~$4.8B. Higher attach (≈10%) and enterprise-heavy mix (~$50k ARPA) would push TAM toward ~$6B, broadly consistent with the top-down mid-case.
Assumptions:
- 5–10% of companies deploying AI assistants choose a dedicated managed memory layer over in-house builds.
- Blended ARPA spans $20k–$50k/year reflecting usage-based fees plus compliance/SLAs for larger customers.
- TAM refers to 2030 steady-state; near-term (next 1–3 years) will be smaller as adoption ramps.
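For reference, the bottom-up ranges above reduce to simple multiplication over the stated assumptions:

```python
# Reproduce the bottom-up TAM ranges (org counts and ARPA from the
# assumptions above; dollars per year).
def tam(orgs: int, arpa: int) -> int:
    return orgs * arpa

low  = tam(75_000, 20_000)    # conservative attach, SMB-heavy mix
high = tam(120_000, 40_000)   # higher attach, richer mix
enterprise = tam(120_000, 50_000)  # enterprise-heavy ARPA scenario

print(f"low:  ${low / 1e9:.1f}B")         # $1.5B
print(f"high: ${high / 1e9:.1f}B")        # $4.8B
print(f"enterprise: ${enterprise / 1e9:.1f}B")  # $6.0B
```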
Who are some of their notable competitors
- LlamaIndex: Framework for retrieval, indexing, and agent memory with a growing ecosystem and managed services; often used as an alternative to building a separate memory layer.
- LangChain: Widely used agent/RAG framework with memory components and LangGraph; a common DIY path for teams to assemble their own context pipelines.
- Vectara: Hosted RAG platform offering semantic search, summarization, and connectors; provides an end-to-end retrieval layer that can substitute for a separate memory service.
- Pinecone: Managed vector database focused on low-latency, scalable retrieval; frequently used as the backbone for agent memory and context recall.
- Weaviate: Open-source vector database with a managed cloud offering; popular for search and RAG use cases that overlap with agent memory workloads.