Butter

Muscle Memory Cache for Agents

Winter 2025 · Active · Website

Report from 8 days ago

What do they actually do

Butter runs an OpenAI‑compatible API proxy between your code and the model provider. On first run, it forwards your chat/completions request to the model and records the full request→tool→response trajectory. On later runs that match a stored template, it serves the response from cache immediately instead of calling the model. The cache is template‑aware: you can mark variable parts of prompts (like names, dates, IDs) so one stored trajectory can serve many similar runs rather than only exact duplicates (docs, quickstart, bindings).
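As a rough sketch of what that integration pattern typically looks like (the proxy base URL below is a placeholder, not Butter's documented endpoint), an OpenAI‑compatible client only needs its base URL repointed at the proxy:

```python
# Hypothetical sketch: routing an OpenAI-compatible client through a caching
# proxy. The base_url is a placeholder, not Butter's documented endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.example.com/v1",  # placeholder proxy endpoint
    api_key="YOUR_PROVIDER_KEY",
)

# First run: the proxy forwards the request to the model provider and records
# the full request -> tool -> response trajectory. Later runs that match a
# stored template are served straight from cache.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize invoice INV-1234 for Acme Corp."}],
)
print(resp.choices[0].message.content)
```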

Early users are developer teams building repeatable, non‑interactive agents (e.g., back‑office automations, desktop/computer‑use tasks). Butter explicitly recommends it for “replayable” flows and not creative, one‑off chat use cases. The proxy is currently flagged as experimental in the docs (who it’s for, docs).

Today this buys you more deterministic behavior for repeatable runs, lower latency on cache hits, and lower model spend for repetitive workloads. The team documents current limitations around template matching and variable delimiting, and is focused on making the cache smarter and more robust (e.g., auto‑inferring variables, better fuzzy matching, small deterministic transformers) (template‑aware caching).
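To make the template idea concrete, here is a toy sketch of how marked variable slots let one stored trajectory serve many similar prompts. The `{{var}}` delimiter syntax and the matching logic are illustrative assumptions, not Butter's actual format or implementation:

```python
# Illustrative only: a minimal template matcher showing how marked variables
# (names, dates, IDs) let one cached trajectory serve many similar runs.
import re

def template_to_regex(template: str) -> re.Pattern:
    """Turn 'Summarize invoice {{invoice_id}} for {{customer}}.' into a regex."""
    pattern = re.escape(template)
    # Replace escaped {{name}} placeholders with named capture groups.
    pattern = re.sub(r"\\\{\\\{(\w+)\\\}\\\}", r"(?P<\1>.+?)", pattern)
    return re.compile(f"^{pattern}$")

cache = {
    "Summarize invoice {{invoice_id}} for {{customer}}.":
        "Cached trajectory: fetch {{invoice_id}}, total it, email {{customer}}.",
}

def lookup(prompt: str):
    for template, cached in cache.items():
        match = template_to_regex(template).match(prompt)
        if match:
            # Cache hit: substitute the captured variables back into the stored response.
            result = cached
            for name, value in match.groupdict().items():
                result = result.replace("{{" + name + "}}", value)
            return result
    return None  # cache miss: fall through to the model provider

print(lookup("Summarize invoice INV-1234 for Acme Corp."))
```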

Who are their target customer(s)

  • Developer teams building repeatable agent automations: They run the same sequence many times with different inputs, but small output variations and non‑determinism break the workflow. Repeated model calls also add latency and cost.
  • Operations or business‑process owners (finance, HR, customer ops): They need predictable, auditable outputs so automations can run hands‑off. Current AI outputs vary enough to force manual checks or slow audits.
  • RPA/desktop‑automation engineers: Scripted interactions fail when AI outputs change wording or formatting. Calling the model on every step makes end‑to‑end runs slow and expensive.
  • Platform/infrastructure teams running many agents: They face high, spiky model spend and throughput/latency issues when fleets of agents repeatedly ask models for routine responses.
  • QA, audit, and compliance owners: Variable AI outputs and poor reproducibility make it hard to test, certify, and investigate automated processes after failures.

How would they acquire their first 10, 50, and 100 customers

  • First 10: Founder‑led, hands‑on pilots with engineer‑to‑engineer outreach. Use YC/GitHub networks to find teams with repeatable agents, run time‑boxed free pilots, and show cache‑hit rates, latency drops, and cost savings; fix bugs found in these pilots.
  • First 50: Lean into developer channels and templates. Publish short how‑tos for popular agent frameworks, post in targeted communities (HN/Reddit/GitHub issues), open Slack/Discord, and enable self‑serve signup with starter templates; convert successful trials to paid.
  • First 100: Add partnerships and a light commercial motion. Partner with RPA vendors/consultancies, list in marketplaces, and run 30–60 day paid pilots for ops/platform teams. Publish brief case studies and reproducible benchmarks for determinism and scale.

What is the rough total addressable market

Top-down context:

Butter sits at the intersection of enterprise LLM/inference spend and workflow/RPA automation software. Recent 2024 estimates put enterprise LLM spend around $6.7B–$8.4B and RPA around $18.2B, implying a combined space on the order of $25–27B today (GM Insights, Menlo Ventures press summary, Fortune Business Insights).

Bottom-up calculation:

As a pragmatic near‑term SAM, assume ~8,000 enterprises run repeatable LLM‑driven automations, with one team per enterprise adopting a caching proxy at an average of $40k ARR. That yields roughly $320M in immediately serviceable demand within the larger TAM.

Assumptions:

  • ~8,000 enterprises globally operate repeatable LLM automations in 2025.
  • 1 adopting team per enterprise for this use case in the near term.
  • Average ARR per team for a caching/proxy product ≈ $40k.
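A quick sanity check of that arithmetic under the stated assumptions:

```python
# Bottom-up SAM check using the assumptions above.
enterprises = 8_000        # enterprises running repeatable LLM automations in 2025
teams_per_enterprise = 1   # adopting teams per enterprise, near term
arr_per_team = 40_000      # average ARR per team (USD)

sam = enterprises * teams_per_enterprise * arr_per_team
print(f"Serviceable market: ${sam / 1e6:.0f}M")  # -> Serviceable market: $320M
```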

Who are some of their notable competitors

  • GPTCache: Open‑source library to cache LLM responses (semantic/embedding and keyword strategies). Appeals to teams that prefer in‑app caching over a managed proxy.
  • Helicone: LLM API proxy for logging, usage tracking, and ops. Adjacent to Butter for teams that want observability and cost controls at the API layer.
  • Langfuse: Open‑source LLM observability and analytics. Competes for budget/attention in LLM ops stacks where teams instrument prompts, traces, and outcomes.
  • Vellum: Prompt/workflow management and evaluation for production LLM apps. Overlaps where customers want reliable, versioned flows and guardrails rather than ad‑hoc prompts.
  • UiPath (GenAI features): Leading RPA platform adding GenAI to automations. Not a caching proxy, but a powerful alternative path for enterprises aiming for deterministic, large‑scale automations.