sync.

AI lipsync tool for video content creators

Winter 2024 · Active

Report from 26 days ago

What do they actually do

sync. runs a hosted AI lipsync service you can use via a web Studio or through an API/SDK. It edits a person’s mouth movements in an existing video so the lips match a target audio track (or generated TTS), while aiming to preserve the speaker’s style and facial detail (sources: sync.so homepage, models docs).

They offer several models: lipsync-1.9; lipsync-2, which is zero‑shot and style‑preserving; and lipsync-2-pro, which adds diffusion‑based super‑resolution for finer detail (teeth, beards) and supports up to 4K output. The result is a finished video file: you can provide a video plus audio, or a script plus a voice clone/TTS, and receive a lipsynced video back via Studio or API (sources: lipsync-2-pro page, models docs, pricing).

Typical workflow: upload a source video, provide target audio (or text plus a voice ID), and pick a model and options (e.g., obstruction detection); the service then detects faces, associates the audio with the active speaker’s face, synthesizes matching mouth motion, and returns a new video. They publish SDKs, examples, and integration guides for automation and batch jobs. Best results require natural speaking motion in the input; still frames and heavy occlusions remain challenging, though the obstruction‑detection option can help at the cost of slower processing (sources: models docs, docs index).
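The request side of that workflow can be sketched in a few lines. This is a minimal illustration, not the documented schema: the endpoint URL and every field name below are assumptions, so check sync.’s official API docs before using anything like it.

```python
import json

# Hypothetical endpoint for illustration only; the real URL is in sync.'s docs.
API_URL = "https://api.sync.so/v2/generate"

def build_lipsync_request(video_url: str, audio_url: str,
                          model: str = "lipsync-2",
                          detect_obstructions: bool = False) -> dict:
    """Assemble an illustrative payload for a hosted lipsync job.

    Mirrors the workflow described above: a source video, a target audio
    track, a model choice, and optional flags such as obstruction detection.
    Field names are assumptions, not sync.'s documented schema.
    """
    return {
        "model": model,
        "input": [
            {"type": "video", "url": video_url},
            {"type": "audio", "url": audio_url},
        ],
        "options": {"detect_obstructions": detect_obstructions},
    }

payload = build_lipsync_request(
    "https://example.com/source.mp4",
    "https://example.com/dubbed.wav",
    model="lipsync-2-pro",
)
print(json.dumps(payload, indent=2))
```

Batch jobs would loop this over many video/audio pairs and POST each payload to the API, polling for the finished video file.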

Who are their target customer(s)

  • Individual creators: need quick dialogue changes or translations for short videos without re‑shoots, and want a simple web tool with predictable, low per‑second costs (Studio/use cases, pricing).
  • Small agencies and video producers: must deliver polished client videos fast and need high fidelity (fine facial detail, teeth, beards) without manual frame‑by‑frame fixes (lipsync-2-pro, plan limits).
  • Ad and marketing teams: need rapid localization and A/B variants while keeping on‑camera style consistent and avoiding lip‑sync artefacts that hurt performance (use cases, style preservation).
  • Localization / video‑translation vendors: process large volumes and require batch automation, voice cloning/TTS support, and reliable active‑speaker detection for long, multi‑speaker videos (pricing/enterprise, docs).
  • Product and engineering teams embedding lipsync: need an API/SDK with clear limits (video length, concurrency), predictable latency and costs, and examples to automate end‑to‑end workflows (docs, pricing/limits).

How would they acquire their first 10, 50, and 100 customers

  • First 10: Founder‑led, hands‑on pilots with creators and small agencies from networks/YC; offer free credits and 1:1 onboarding to iterate inputs until production‑ready, then publish short case‑study videos and testimonials.
  • First 50: Seed creator communities (Discord, Reddit, editor groups), target micro‑influencers with time‑limited credits, and ship short how‑to tutorials for common workflows (translation, ad variants) to convert trials to paid usage.
  • First 100: Lean into product‑led developer and agency expansion with SDK examples, integration guides, and batch/whitelabel pilots; add one seller to close multi‑video pilots with ad/marketing teams and localization vendors using simple pilot pricing and case studies.

What is the rough total addressable market

Top-down context:

Relevant spend spans video editing software (~$2.3–2.5B), AI video generation tools (~$0.8–0.9B in the mid‑2020s), language services (~$27B, with a meaningful media/video subset), and the scale driver of digital video ad spend (~$190–191B). sync.’s near‑term direct TAM sits in tools and video‑localization budgets, with longer‑term upside as it expands into broader production workflows (sources: video editing, AI video generators, language services, digital video ad spend).

Bottom-up calculation:

Direct product TAM today can be approximated by summing tool and video‑localization budgets: ~$2.3B (video editing) + ~$0.6–0.9B (AI video tools) + ~10% of $27B for video localization (~$2.7B) ≈ low single‑digit billions (~$5–6B) before overlap. A practical early SAM could be $50–$250M if the company captures ~1–5% of that direct TAM, expanding with enterprise/batch use (sources: language services, AI video generators, video editing).

Assumptions:

  • Around 10% of language services spend is attributable to video dubbing/subtitling/localization.
  • Overlap across software/tool categories is limited for a coarse TAM view.
  • Early capture rates are in the 1–5% range of direct TAM given competition and switching costs.
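The bottom‑up arithmetic above can be sanity‑checked in a few lines. All dollar inputs are the report’s own estimates, and the 10% localization share and 1–5% capture rates are its stated assumptions, not market data:

```python
# Back-of-envelope check of the direct-TAM and early-SAM figures above.
VIDEO_EDITING = 2.3e9            # video editing software (~$2.3B)
AI_VIDEO_TOOLS = (0.6e9, 0.9e9)  # AI video generation tools (~$0.6-0.9B)
LANGUAGE_SERVICES = 27e9         # total language services spend (~$27B)
VIDEO_LOCALIZATION_SHARE = 0.10  # assumption: ~10% of language services

def direct_tam() -> tuple[float, float]:
    """Sum tool and video-localization budgets, ignoring category overlap."""
    localization = LANGUAGE_SERVICES * VIDEO_LOCALIZATION_SHARE  # ~$2.7B
    low = VIDEO_EDITING + AI_VIDEO_TOOLS[0] + localization
    high = VIDEO_EDITING + AI_VIDEO_TOOLS[1] + localization
    return low, high

def early_sam(tam: float, capture=(0.01, 0.05)) -> tuple[float, float]:
    """Apply the assumed 1-5% early capture range."""
    return tam * capture[0], tam * capture[1]

low, high = direct_tam()
print(f"Direct TAM: ${low / 1e9:.1f}B-${high / 1e9:.1f}B")       # ~$5.6B-$5.9B
sam_lo, sam_hi = early_sam(low)
print(f"Early SAM: ${sam_lo / 1e6:.0f}M-${sam_hi / 1e6:.0f}M")   # ~$56M-$280M
```

The sum lands in the quoted ~$5–6B band, and 1–5% capture of the low end reproduces roughly the $50–$250M SAM range.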

Who are some of their notable competitors

  • Flawless: Film/TV‑grade “visual dubbing” and performance editing aimed at studio workflows with VFX‑level fidelity and consent tooling; overlaps on high‑end dubbing but focuses on professional post‑production rather than a self‑serve/API product (Variety).
  • D‑ID: Talking‑head/avatar generation and video translation via Studio/API; strong for avatar‑led content and multilingual assets, but more avatar/agent‑first than a dedicated lipsync editor for real footage (Studio/API, dubbing features).
  • Synthesia: End‑to‑end AI video platform with avatars and AI dubbing (lip sync + voice preservation) built for enterprise templates and scale; relies heavily on synthetic presenters vs. editing a specific live‑action performance (AI Dubbing).
  • Papercup: AI dubbing and localization service emphasizing automated translation + human review; overlaps on localization but is voice/translation‑first rather than a hosted per‑video face re‑animation product (overview, RWS partner).
  • Wav2Lip (open source): Free research model many teams prototype with; requires engineering effort and extra processing for high fidelity, and lacks the managed Studio/API experience that sync. provides (GitHub).