Spongecake

Helping developers build computer use agents

Summer 2024 · active · 2024
Generative AI · B2B · AI

Report from 29 days ago

What do they actually do

Spongecake is an open‑source developer toolkit that spins up a small Linux desktop inside Docker and lets you drive it with an LLM “computer‑use” model. It exposes a VNC view you can watch and an SDK/API for sending clicks and keypresses, capturing screenshots, and running shell commands. The code, examples, and PyPI package are public and MIT‑licensed; there is no hosted SaaS (sources: GitHub README, PyPI).

A typical flow is: install from PyPI or clone the repo, run the setup, start the frontend and backend, then create a Desktop() in Python and call start() to launch the containerized Xfce desktop. You then supply your OpenAI API key for the Computer Use model and call action(...) to let the model operate the UI; the SDK returns a status of COMPLETE, NEEDS_INPUT, NEEDS_SECURITY_CHECK, or ERROR so your script can proceed or pause. The repo includes runnable examples (LinkedIn prospecting, Amazon shopping, data entry) that show concrete patterns (sources: GitHub README, PyPI).
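
The status handling described above can be sketched as a small loop. This is only an illustration under stated assumptions: FakeDesktop is a hypothetical stand-in that returns scripted statuses, not the real SDK class, and the signatures of start() and action(), the Status enum, and the run_task helper are all invented for this sketch. Only the four status names come from the text.

```python
from enum import Enum


class Status(Enum):
    # Status names come from the text; the enum itself is an assumption.
    COMPLETE = "COMPLETE"
    NEEDS_INPUT = "NEEDS_INPUT"
    NEEDS_SECURITY_CHECK = "NEEDS_SECURITY_CHECK"
    ERROR = "ERROR"


class FakeDesktop:
    """Hypothetical stand-in for the SDK desktop (no Docker, no model)."""

    def __init__(self, scripted):
        self._scripted = list(scripted)  # statuses to hand back, in order
        self.running = False

    def start(self):
        self.running = True

    def action(self, prompt):
        # A real agent would operate the UI here; this just replays the script.
        return self._scripted.pop(0)


def run_task(desktop, prompt, answer_input, approve_check, max_turns=5):
    """Call action() until the task completes, fails, or runs out of turns."""
    desktop.start()
    for _ in range(max_turns):
        status = desktop.action(prompt)
        if status is Status.COMPLETE:
            return "done"
        if status is Status.ERROR:
            return "failed"
        if status is Status.NEEDS_INPUT:
            prompt = answer_input()  # pass the user's reply back to the agent
        elif status is Status.NEEDS_SECURITY_CHECK:
            if not approve_check():
                return "rejected"  # a human declined the safety check
    return "gave_up"


# Usage: the agent asks for input, then for approval, then finishes.
desktop = FakeDesktop(
    [Status.NEEDS_INPUT, Status.NEEDS_SECURITY_CHECK, Status.COMPLETE]
)
result = run_task(
    desktop,
    "fill in the form",
    answer_input=lambda: "use the test account",
    approve_check=lambda: True,
)
print(result)  # done
```

Capping the number of turns keeps a script from looping forever if the model keeps asking for input, which matters given how flaky LLM‑driven UI automation can be.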

Today it runs locally on Linux via Docker and is not a cloud product; Windows/macOS support and browser‑only modes are on the roadmap. Like other LLM‑driven UI automation, it can be slow or flaky when scrolling or targeting UI elements; the project ships DOM extraction helpers to mitigate some of this, but model behavior remains a real constraint. Telemetry is anonymous and can be disabled via an environment variable (sources: GitHub README, Show HN discussion).

Who are their target customer(s)

  • Developers building "computer‑use" agents or prototyping automation: They need a reproducible desktop they can programmatically control with an LLM without wiring brittle browser scripts or managing full VMs. They also want a simple action API to handle clicks/keypresses/screenshots and interactive prompts (source: GitHub README).
  • Growth hackers and freelancers automating repetitive web tasks: They often work on sites with no API or with fragile selectors. They want a tool that mimics real user interactions across sites without building one‑off bots each time; examples like LinkedIn/Amazon help them start quickly (source: GitHub examples).
  • QA and product testers needing repeatable GUI test environments: They want consistent, isolated desktops for end‑to‑end flows without provisioning heavy VMs or fighting environment drift. A Dockerized Linux GUI makes runs more predictable across tests (source: PyPI).
  • ML researchers/engineers iterating on LLM UI agents: They need an observable sandbox that emits screenshots, DOM extracts, and action logs to study model behavior and failure modes, with easy integration to different computer‑use models (source: PyPI).
  • Internal tools engineers automating legacy desktop apps: They need to embed automation inside larger scripts, surface safety checks or human approvals, and run multiple jobs in parallel without disturbing other services. Clear action statuses and local deployment help with control and integration (source: GitHub README).

How would they acquire their first 10, 50, and 100 customers

  • First 10: Personally onboard early repo engagers (HN commenters, GitHub stargazers, YC contacts) via 1:1 sessions, get one real workflow running per user, and turn each into a short case study or quote with incentives like priority fixes and a custom example.
  • First 50: Ship 3–5 runnable templates with one‑click Docker setups and short screencasts, post them to targeted dev communities (HN, Reddit, dev.to, relevant Slack/Discord), and run two public demos plus weekly office hours to convert users within a 10‑minute onboarding window.
  • First 100: Publish integration guides for major LLM/tooling ecosystems (e.g., OpenAI examples, LangChain patterns), list on developer marketplaces/Docker Hub, add a “who we help” page pointing each persona to templates, and offer fixed‑price onboarding/training while sponsoring one targeted meetup/newsletter.

What is the rough total addressable market

Top-down context:

Spongecake sits at the intersection of AI developer tools, automation/RPA, and test automation. These markets are measured in the billions today and are growing quickly (sources: Grand View Research on AI code tools, Grand View Research on RPA, Fortune Business Insights on automation testing).

Bottom-up calculation:

As an initial niche, assume 150k developers globally who actively build automation/QA/LLM‑agent workflows (a small fraction of ~47M developers) and a $20–$50 monthly price for supported/enterprise features. Multiplying 150,000 users by $20–$50 per month over 12 months implies roughly $36M–$90M in annual spend addressable by a productized offering (source: SlashData developer population).

Assumptions:

  • Share of developers doing relevant automation/QA/LLM work is a small subset of the 47M global developer population.
  • Pricing is seat- or workflow-based at roughly $20–$50 per user/month for paid features/support.
  • Counts exclude broader enterprise RPA budgets and double‑counting across categories.
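
The bottom-up figure can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the numbers stated above (150k target developers, ~47M developers worldwide, $20–$50 per user per month); the variable and function names are illustrative.

```python
DEVELOPERS_TOTAL = 47_000_000    # approximate global developer population
TARGET_DEVELOPERS = 150_000      # assumed niche building automation/QA/agents
PRICE_LOW, PRICE_HIGH = 20, 50   # assumed USD per user per month


def annual_spend(users, monthly_price):
    """Annual revenue if every user pays the monthly price for 12 months."""
    return users * monthly_price * 12


low = annual_spend(TARGET_DEVELOPERS, PRICE_LOW)
high = annual_spend(TARGET_DEVELOPERS, PRICE_HIGH)
share = TARGET_DEVELOPERS / DEVELOPERS_TOTAL

print(f"${low / 1e6:.0f}M to ${high / 1e6:.0f}M per year; {share:.2%} of developers")
# prints: $36M to $90M per year; 0.32% of developers
```

The estimate scales linearly with both inputs, so doubling the assumed share of relevant developers doubles the range.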

Who are some of their notable competitors

  • Browser Use: Open‑source Python library for letting AI agents control Chromium to automate web tasks. Overlaps on LLM‑driven browser automation without a full desktop (source: GitHub).
  • Playwright: Widely used end‑to‑end web testing and automation framework. Not LLM‑native, but a common alternative for browser automation where deterministic scripts suffice (source: Playwright).
  • Robocorp: Python‑centric RPA stack (open‑source libraries plus orchestration) for automating desktop and web workflows; a developer‑friendly alternative to legacy RPA (sources: Robocorp, rpaframework).
  • UiPath: Enterprise RPA platform with broad desktop/web automation, orchestration, and compliance features; a likely incumbent for enterprises considering GUI automation (source: UiPath).
  • OpenDevin/OpenHands: Open‑source dev agent that operates a sandboxed environment with browser and terminal; adjacent to “computer‑use” agents for complex software tasks (source: GitHub notice of move).