Aqua Voice

Talk into any text box - write 4x faster than typing.

Winter 2024 • Active • 2024

Artificial Intelligence • Consumer • Productivity • AI

Disclaimer

FYI Combinator is not affiliated with Y Combinator. Reports are generated by AI Research Agents and may not be 100% accurate.

Report from 6 months ago

What do they actually do

Aqua Voice makes a desktop voice-to-text app for macOS and Windows that types into whatever text box is active, so you can speak into Gmail, Slack, your IDE, terminals, and more. There’s also a browser-based sandbox to try it without installing anything (homepage, sandbox).

When you hold a hotkey and speak, the app streams text into the active field and can also interpret simple spoken commands that edit text already on screen (e.g., turning lines into a list). It also includes a custom dictionary for names and terms, global style and instruction rules, and a history view for replaying or undoing transcriptions (YC listing, changelog).

Under the hood, Aqua uses its own speech model, Avalon, which is built into the app and also offered as a developer API. They position Avalon as tuned for “people talking to computers” and publish benchmark accuracy/latency numbers; the API is presented as drop‑in compatible with Whisper‑style integrations (Avalon API, Introducing Avalon).
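"Drop-in compatible" usually means a client written for OpenAI's Whisper transcription API can be repointed by changing only the base URL, API key, and model name. The sketch below builds such a request with the Python standard library, assuming Avalon mirrors OpenAI's `POST {base}/audio/transcriptions` endpoint with `model` and `file` form fields. `AVALON_BASE_URL` and the model name `avalon-1` are hypothetical placeholders, not values from Aqua's documentation.

```python
import json
import urllib.request
import uuid

# OpenAI's documented base URL for its Whisper transcription endpoint.
OPENAI_BASE_URL = "https://api.openai.com/v1"
# HYPOTHETICAL placeholder: the report does not give Avalon's real base URL.
AVALON_BASE_URL = "https://avalon.example.com/v1"


def build_transcription_request(base_url: str, api_key: str, audio: bytes,
                                filename: str, model: str) -> urllib.request.Request:
    """Build, but do not send, a Whisper-style multipart transcription request.

    Assumes the target mirrors OpenAI's POST {base}/audio/transcriptions
    endpoint with `model` and `file` form fields.
    """
    boundary = uuid.uuid4().hex
    # Text form field naming the model.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    # File form field header; the raw audio bytes follow it.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + b"\r\n" + closing
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(request: urllib.request.Request) -> str:
    """Send the request and return the `text` field of the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

Switching providers then changes only the arguments: `build_transcription_request(OPENAI_BASE_URL, key, audio, "clip.wav", "whisper-1")` versus `build_transcription_request(AVALON_BASE_URL, key, audio, "clip.wav", "avalon-1")`, where the Avalon URL and model name remain illustrative assumptions.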

Who are their target customer(s)

Long-form writers and editors: Switching between speaking ideas and typing edits breaks flow, and cleaning up raw dictation takes time. They want continuous dictation with real-time cleanup so drafts stay readable (demo).
Developers and technical authors: General-purpose speech tools miss technical terms (e.g., kubectl, model names), causing errors and manual fixes. They need reliable recognition of jargon directly in IDEs/CLIs (Avalon API).
Knowledge workers with heavy email/chat/docs workloads: Moving across apps and repeating boilerplate is slow, and some voice tools don't paste reliably into every app. They need a desktop client that types into any active field without extra steps (sandbox).
Small teams and orgs with billing/privacy requirements: They need centralized billing, admin controls, and privacy/audit settings that consumer dictation tools lack. Aqua has a Team plan and is rolling out org controls (pricing, changelog).

How would they acquire their first 10, 50, and 100 customers

First 10: Founder-led onboarding of friends, YC peers, and early sandbox signups; hands-on setup sessions to gather feedback and quickly ship fixes (YC page, sandbox).
First 50: Offer the free 1,000-word tier and promote the live demo in writer/dev communities (e.g., HN), plus short webinars and before/after dictation examples to drive trial signups (pricing, HN launch).
First 100: Convert free users and small teams to paid plans through trials and simple case studies; push the Team plan with centralized billing, and ship sample IDE/Slack/docs integrations built on the Avalon API (plans/changelog, Avalon API).

What is the rough total addressable market

Top-down context:

Speech-to-text APIs are estimated at roughly $3.8–5.0B in 2024, with broader speech/voice recognition forecasts reaching tens of billions by 2030 (Grand View Research, Allied/PR Newswire, MarketsandMarkets).

Bottom-up calculation:

At $96/year (Pro), 0.5–1% penetration of a 47.2M developer base implies roughly $22.7M–$45.3M ARR; separately, 1% of a $3.8B API market would be about $38M in revenue (pricing, SlashData dev count, Grand View Research).

Assumptions:

Annual Pro price is $96/year based on $8/month billed annually.
Global developer population baseline is ~47.2M.
Use the conservative $3.8B 2024 API market size for share scenarios.
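The bottom-up figures above reduce to price times paying users, plus a flat market-share product. The sketch below reproduces them from the report's stated inputs only ($96/year, 47.2M developers, $3.8B market); the helper names are illustrative, and the model deliberately ignores churn, discounts, and non-developer segments.

```python
# Inputs taken from the report's assumptions; penetration is in basis points
# (1 bp = 0.01%) so every step stays in exact integer arithmetic.
PRO_PRICE_PER_YEAR = 8 * 12          # $8/month billed annually -> $96/year
DEVELOPERS = 47_200_000              # global developer baseline (SlashData)
API_MARKET_2024 = 3_800_000_000      # conservative 2024 STT API market, USD


def seat_arr(population: int, penetration_bps: int, price_per_year: int) -> int:
    """ARR if `penetration_bps` of `population` pays `price_per_year`."""
    paying_users = population * penetration_bps // 10_000
    return paying_users * price_per_year


def market_share_revenue(market_size: int, share_bps: int) -> int:
    """Revenue from capturing `share_bps` of a market worth `market_size` dollars."""
    return market_size * share_bps // 10_000


low_arr = seat_arr(DEVELOPERS, 50, PRO_PRICE_PER_YEAR)    # 0.5% penetration
high_arr = seat_arr(DEVELOPERS, 100, PRO_PRICE_PER_YEAR)  # 1% penetration
api_revenue = market_share_revenue(API_MARKET_2024, 100)  # 1% share

print(f"Bottom-up ARR: ${low_arr / 1e6:.1f}M to ${high_arr / 1e6:.1f}M")
# prints: Bottom-up ARR: $22.7M to $45.3M
print(f"1% of API market: ${api_revenue / 1e6:.1f}M")
# prints: 1% of API market: $38.0M
```

Working in basis points rather than float percentages keeps the intermediate user counts (236,000 and 472,000) exact, so the rounded dollar figures match the report's $22.7M, $45.3M, and $38M.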

Who are some of their notable competitors

Apple macOS Dictation / Voice Control: Built-in Mac features for dictation and voice commands that work system‑wide, giving macOS users a default way to speak text and edit without extra software.
Nuance Dragon (Microsoft): A long-standing professional Windows dictation product with custom vocabularies and voice commands, widely used in enterprise and heavy dictation workflows.
Otter.ai: Cloud transcription for meetings and conversations with team workspaces and searchable archives; competes for users prioritizing shared transcripts and summaries over local text entry.
OpenAI Whisper / Speech-to-text API: A popular developer API for transcription; a direct alternative to Avalon for teams that would rather stay on established STT infrastructure and add their own post-processing.
Deepgram: Developer-focused speech-to-text with low-latency streaming and custom vocabularies; commonly used to embed transcription and voice features into products.