What do they actually do
Aqua Voice makes a desktop voice-to-text app for macOS and Windows that types into whatever text box is active, so you can speak into Gmail, Slack, your IDE, terminals, and more. There’s also a browser-based sandbox to try it without installing anything (homepage, sandbox).
When you hold a hotkey and speak, it streams text and can also interpret simple commands to edit what’s already on screen (e.g., turning lines into a list). It includes a custom dictionary for names/terms, global style/instruction rules, and a history view to replay or undo transcriptions (YC listing, changelog).
Under the hood, Aqua uses its own speech model, Avalon, which is built into the app and also offered as a developer API. They position Avalon as tuned for “people talking to computers” and publish benchmark accuracy/latency numbers; the API is presented as drop‑in compatible with Whisper‑style integrations (Avalon API, Introducing Avalon).
Who are their target customer(s)
- Long-form writers and editors: Switching between speaking ideas and typing edits breaks flow, and cleaning up raw dictation takes time. They want continuous dictation with real-time cleanups so drafts stay readable (demo).
- Developers and technical authors: General-purpose speech tools miss technical terms (e.g., kubectl, model names), causing errors and manual fixes. They need reliable recognition of jargon directly in IDEs/CLIs (Avalon API).
- Knowledge workers with heavy email/chat/docs workloads: Moving across apps and retyping boilerplate is slow, and some voice tools don't insert text reliably in every app. They need a desktop client that types into any active field without extra steps (sandbox).
- Small teams and orgs with billing/privacy requirements: They need centralized billing, admin controls, and privacy/audit settings that consumer dictation tools lack. Aqua has a Team plan and is rolling out org controls (pricing, changelog).
How would they acquire their first 10, 50, and 100 customers
- First 10: Founder-led onboarding of friends, YC peers, and early sandbox signups; hands-on setup sessions to gather feedback and quickly ship fixes (YC page, sandbox).
- First 50: Open the free 1,000-word tier and promote the live demo in writer/dev communities (e.g., HN), plus short webinars and before/after dictation examples to drive trial signups (pricing, HN launch).
- First 100: Convert free users to paid plans and small-team accounts via trials and simple case studies; push the Team plan with centralized billing and ship sample IDE/Slack/docs integrations using the Avalon API (plans/changelog, Avalon API).
What is the rough total addressable market
Top-down context:
Speech-to-text APIs are estimated at roughly $3.8–5.0B in 2024, with broader speech/voice recognition forecasts reaching tens of billions by 2030 (Grand View Research, Allied/PR Newswire, MarketsandMarkets).
Bottom-up calculation:
At $96/year (Pro), 0.5–1% penetration of a 47.2M developer base (236,000–472,000 paying users) implies roughly $22.7M–$45.3M ARR. As a separate top-down cross-check, 1% of a $3.8B API market would be about $38M in revenue (pricing, SlashData dev count, Grand View Research).
Assumptions:
- Annual Pro price is $96/year based on $8/month billed annually.
- Global developer population baseline is ~47.2M.
- Use the conservative $3.8B 2024 API market size for share scenarios.
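
The arithmetic behind these figures can be reproduced with a short sketch. It uses only the three assumptions listed above; the function and constant names are illustrative and not taken from any Aqua source, and the penetration and share percentages are the scenario values from the text rather than forecasts.

```python
# Inputs taken from the assumptions in the text.
PRO_PRICE_PER_YEAR = 96            # USD: $8/month billed annually
DEVELOPER_BASE = 47_200_000        # global developer population baseline
API_MARKET_2024 = 3_800_000_000    # USD: conservative 2024 speech-to-text API market size


def bottom_up_arr(population, penetration, price_per_year):
    """Annual recurring revenue if a fraction `penetration` of `population` pays `price_per_year`."""
    return population * penetration * price_per_year


def market_share_revenue(market_size, share):
    """Revenue implied by capturing a fraction `share` of `market_size`."""
    return market_size * share


# Bottom-up scenarios: 0.5% and 1% of developers on the Pro plan.
low = bottom_up_arr(DEVELOPER_BASE, 0.005, PRO_PRICE_PER_YEAR)
high = bottom_up_arr(DEVELOPER_BASE, 0.01, PRO_PRICE_PER_YEAR)

# Top-down cross-check: 1% of the API market.
api_share = market_share_revenue(API_MARKET_2024, 0.01)

print(f"Bottom-up ARR: ${low / 1e6:.1f}M to ${high / 1e6:.1f}M")  # $22.7M to $45.3M
print(f"1% of API market: ${api_share / 1e6:.0f}M")               # $38M
```

Changing the penetration or share arguments shows how sensitive the estimate is: each additional 0.1% of the developer base adds about $4.5M in ARR at the Pro price.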
Who are some of their notable competitors
- Apple macOS Dictation / Voice Control: Built-in Mac features for dictation and voice commands that work system‑wide, giving macOS users a default way to speak text and edit without extra software.
- Nuance Dragon (Microsoft): A long-standing professional Windows dictation product with custom vocabularies and voice commands, widely used in enterprise and heavy dictation workflows.
- Otter.ai: Cloud transcription for meetings and conversations with team workspaces and searchable archives; competes for users prioritizing shared transcripts and summaries over local text entry.
- OpenAI Whisper / Speech-to-text API: A popular developer API for transcription; a direct alternative to Avalon for teams that already have STT infrastructure built around it and can add their own post-processing.
- Deepgram: Developer-focused speech-to-text with low-latency streaming and custom vocabularies; commonly used to embed transcription and voice features into products.