What do they actually do
DeepGrove publishes Bonsai, an open ~0.5B‑parameter language model that uses ternary weights (−1, 0, +1) to shrink model size and compute for edge deployment. The model, code, and a short Transformers usage example are public on Hugging Face and GitHub; the authors note it is not instruction‑tuned and recommend fine‑tuning for downstream tasks (Hugging Face model card, GitHub repo). Early users are researchers, hobbyists, and ML teams who download and run the open model, as reflected in repo stars and model downloads (Hugging Face, GitHub).
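For orientation, a minimal loading sketch in the spirit of the published Transformers example is below. The repo id, dtype choice, and the trust_remote_code flag are assumptions to verify against the model card, not confirmed details.

```python
# Minimal sketch of loading Bonsai via Hugging Face Transformers.
# The repo id below is assumed from the public model card; check before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "deepgrove/Bonsai"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # today inference runs in standard BF16/FP16
    trust_remote_code=True,      # may be required if the architecture is custom
)

# The model is not instruction-tuned, so treat outputs as raw completions.
prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```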
Today, inference runs through standard libraries at standard precision (BF16/FP16). The team says they are building custom mixed‑precision/low‑bit kernels so ternary weights translate into actual runtime memory and compute savings, not just smaller checkpoints, and they plan to release more model variants and tooling (Hugging Face model card, GitHub README).
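A back‑of‑the‑envelope sketch of why those kernels matter (illustrative packing only, not DeepGrove's implementation): ternary values fit in 2 bits, so ~0.5B weights shrink from roughly 1 GB in BF16 to roughly 125 MB once packed, but only if the runtime can consume the packed format directly.

```python
# Rough illustration of the memory headroom from packed ternary storage.
import numpy as np

n_params = 500_000_000               # ~0.5B parameters

bf16_bytes = n_params * 2            # 16 bits per weight
ternary_bytes = n_params * 2 // 8    # 2 bits per ternary weight {-1, 0, +1}, ~125 MB

print(f"BF16 weights:  {bf16_bytes / 1e9:.2f} GB")
print(f"2-bit packed:  {ternary_bytes / 1e9:.2f} GB")

# Toy pack/unpack: map {-1, 0, +1} -> {0, 1, 2}, then fit four codes per byte.
w = np.random.choice([-1, 0, 1], size=16).astype(np.int8)
codes = (w + 1).astype(np.uint8)
packed = codes[0::4] | (codes[1::4] << 2) | (codes[2::4] << 4) | (codes[3::4] << 6)
unpacked = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).reshape(-1)
assert np.array_equal(unpacked.astype(np.int8) - 1, w)  # round-trip is lossless
```

Realizing compute savings (not just storage savings) requires kernels that operate on the packed representation directly, which is the gap the team says it is closing.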
The company is very early (YC Summer 2025) and hiring ML/engineering talent. Their stated goal is to run “frontier” intelligence on phones and other constrained devices, with public materials pointing toward a combined model + runtime stack for on‑device deployment (YC company page, homepage).
Who are their target customer(s)
- Mobile app developers building offline assistants, summarization, and search into iOS/Android apps: Cloud models add latency and privacy risk; current capable models are too large or power‑hungry for phones. They need compact models and an easy on‑device runtime (homepage, Bonsai card).
- Consumer IoT and embedded‑device makers (smart speakers, wearables, sensors): Tight memory/compute and intermittent connectivity make cloud‑dependent models unreliable; they need lightweight local models that run within device power and RAM limits (homepage, Bonsai card).
- Regulated enterprises (healthcare, finance, gov/enterprise apps): They must keep sensitive data on‑prem or on‑device; current on‑device options are too weak or hard to integrate. They want efficient, auditable open models and a supported runtime (Bonsai card, GitHub).
- Startup ML teams and researchers working on compact models/quantization: Turning research models into fast, memory‑efficient runtimes requires custom kernels and engineering; Bonsai’s ternary weights need optimized inference to realize speedups (Bonsai card, GitHub).
- Edge runtime/SDK builders and chip vendors: They lack models designed for aggressive quantization and tight co‑optimization with hardware accelerators; they want models plus kernels that map cleanly to NPUs/GPUs (homepage, Bonsai card).
How would they acquire their first 10, 50, and 100 customers
- First 10: Contact users who starred/downloaded Bonsai and offer hands‑on help (fine‑tuning, integration, short engineering sprints) to co‑build small case studies and convert them into reference customers (Hugging Face, GitHub).
- First 50: Run a limited SDK preview and funded pilots targeting mobile/IoT teams, with device reference apps (phone assistant, offline summarizer, Raspberry Pi demo), clear docs, and rapid technical support (homepage, Bonsai notes on kernels).
- First 100: Convert free pilots into paid engagements with NDAs/compliance in place, provide on‑device integration help and enterprise support contracts, and launch select hardware partnerships to preinstall/co‑optimize the runtime (YC page, GitHub/runtime roadmap).
What is the rough total addressable market
Top-down context:
The directly relevant market is on‑device AI software and runtimes, projected to reach about $36.6B by 2030; adding a modest slice of regulated on‑prem enterprise demand puts a practical target around $40–60B (Grand View on‑device AI, Fortune Business Insights AI market).
Bottom-up calculation:
Assume 3–6B active devices running on‑device LLMs by 2030 with $0.50–$2 per‑device/year bundled software/runtime fees via OEMs ($1.5–$12B), plus 200–500 regulated enterprises at $0.5–$2M/year ($0.1–$1B). That implies roughly $1.6–13B in annual software revenue potential for a vendor in this category, a meaningful slice of the $36.6B on‑device AI market by 2030 (Grand View on‑device AI).
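The same ranges, reproduced as a quick arithmetic check (the inputs are only the assumptions stated above):

```python
# Bottom-up TAM arithmetic, illustrative only.
low_devices, high_devices = 3e9, 6e9     # active devices running on-device LLMs by 2030
low_fee, high_fee = 0.50, 2.00           # per-device/year bundled software fee ($)
device_rev = (low_devices * low_fee, high_devices * high_fee)      # $1.5B - $12B

low_ent, high_ent = 200, 500             # regulated enterprise customers
low_acv, high_acv = 0.5e6, 2.0e6         # annual contract value ($)
enterprise_rev = (low_ent * low_acv, high_ent * high_acv)          # $0.1B - $1B

total = (device_rev[0] + enterprise_rev[0], device_rev[1] + enterprise_rev[1])
print(f"Devices:    ${device_rev[0]/1e9:.1f}B - ${device_rev[1]/1e9:.1f}B")
print(f"Enterprise: ${enterprise_rev[0]/1e9:.1f}B - ${enterprise_rev[1]/1e9:.1f}B")
print(f"Total:      ${total[0]/1e9:.1f}B - ${total[1]/1e9:.1f}B")  # ~$1.6B - $13B
```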
Assumptions:
- Per‑device software royalties or equivalent OEM bundling are viable for phones, wearables, and smart/home devices.
- Enterprise on‑prem/on‑device demand remains a small but durable portion of overall AI budgets by 2030.
- Definitions in market reports align with software/runtime spend rather than being dominated by hardware.
Who are some of their notable competitors
- llama.cpp / ggml ecosystem: Open‑source tooling to run quantized LLMs locally across CPUs/NPUs/GPUs; for many developers it’s the fastest path to on‑device inference today, overlapping with DeepGrove’s model+runtime pitch (llama.cpp); see the sketch after this list.
- Hugging Face (model hub + Optimum): Hosts compact/quantized models and provides optimization/export pipelines (Optimum) for many hardware targets, offering an end‑to‑end ecosystem that can substitute for a bespoke ternary model stack (Optimum docs).
- Google’s on‑device stack (TensorFlow Lite / LiteRT): A widely adopted mobile/embedded inference toolchain with conversion, quantization, and profiling; default choice for many Android and IoT deployments (LiteRT overview, LiteRT announcement).
- Apple Core ML: Apple’s native on‑device ML framework and conversion tools integrated with iOS and Apple Silicon; the mainstream option for offline AI in iPhone apps (Core ML docs).
- Chip vendors + SDKs (Qualcomm, Arm, NVIDIA): Hardware vendors provide optimized runtimes/model compilers (e.g., Qualcomm AI Engine Direct/AI Hub, Arm Compute Library/Ethos, NVIDIA TensorRT/Jetson), letting teams tune models to existing accelerators instead of adopting new model formats (Qualcomm, Arm, TensorRT/Jetson).
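To illustrate why the llama.cpp path sets the bar for developer convenience, the sketch below runs a locally downloaded quantized GGUF checkpoint through the llama-cpp-python bindings; the model path is a placeholder and the quantization level is arbitrary.

```python
# Illustrative only: local inference on a quantized GGUF model via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/some-small-model.Q4_K_M.gguf",  # placeholder path, not a real artifact
    n_ctx=2048,     # context window
    n_threads=4,    # CPU threads; llama.cpp also supports GPU/Metal offload
)

out = llm("Summarize why on-device inference matters:", max_tokens=64)
print(out["choices"][0]["text"])
```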