Exla

An SDK to run transformer models anywhere

Winter 2025 · Active · Website
Edge Computing · Semiconductors · Computer Vision · AI

Report from 12 days ago

What do they actually do

Exla builds InferX, an SDK that lets developers run popular transformer and vision models across different hardware with a single API. The open-source toolkit auto-detects the device (Jetson, GPU, CPU, mobile) and loads an optimized path, so you can pip-install, pick a model from their catalog (e.g., CLIP, Whisper, MobileNet, ResNet), and run inference locally (Quickstart / docs; Models).
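
To make that flow concrete, here is a minimal sketch of what an install-pick-run workflow like this typically looks like. The package, function, and model-loading names below are illustrative assumptions, not Exla's documented API; their Quickstart has the real calls.

```python
# Hypothetical sketch of the "pip install, pick a model, run locally" flow
# described above. Every name here (exla_inferx, load_model, run) is an
# assumption for illustration -- see Exla's Quickstart for the actual API.

# pip install exla-inferx   # assumed package name

import exla_inferx as inferx

# The SDK is described as auto-detecting the device (Jetson, GPU, CPU,
# mobile) and loading an optimized execution path for it.
model = inferx.load_model("whisper")  # catalog models: CLIP, Whisper, ...

# Run inference locally; inputs and outputs depend on the chosen model.
transcript = model.run("audio_clip.wav")
print(transcript)
```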

They also ship a mobile SDK and an example Android app so teams can download, cache, and run models offline on-device, with progress callbacks and consistent APIs across platforms (Android SDK docs). The site and docs show pre-optimized models and tools to optimize your own models for specific targets like Jetson, Raspberry Pi, and Apple Silicon (Homepage, Quickstart).
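
The Android SDK itself is Kotlin/Java, but the first-run pattern the docs describe (download once, cache locally, report progress) is easy to sketch. The sketch below uses plain Python and standard-library calls to show the pattern only; none of the names are Exla's actual interface.

```python
# Illustrative sketch of the download / cache / progress-callback pattern
# the mobile SDK docs describe. Names and paths are assumptions, not the
# SDK's real API.
import os
import urllib.request

CACHE_DIR = os.path.expanduser("~/.cache/models")  # assumed cache location

def fetch_model(name, url, on_progress):
    """Download a model once, cache it locally, and report progress."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, name)
    if os.path.exists(path):   # already cached: no network, instant load
        return path

    def hook(blocks, block_size, total_size):
        if total_size > 0:
            on_progress(min(blocks * block_size / total_size, 1.0))

    urllib.request.urlretrieve(url, path, reporthook=hook)
    return path

# First call downloads and streams progress; later calls hit the cache.
path = fetch_model("mobilenet.onnx", "https://example.com/mobilenet.onnx",
                   on_progress=lambda frac: print(f"downloaded {frac:.0%}"))
```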

Recently, Exla added an on‑demand GPU service for heavier optimization and inference workloads. The company reports sizable speed and memory gains from its optimizations (up to 80% smaller footprint and 3–20× faster), but these are their own benchmarks at this stage. The product is in active private beta with pilots in robotics, manufacturing cameras, drones, and in‑car assistants (Homepage, YC profile).

Who are their target customer(s)

  • Robotics team building autonomous robots: Need reliable real‑time perception and control on small boards (e.g., Jetson). Current models are too big/slow and performance varies across hardware and firmware, causing missed deadlines and brittle deployments.
  • Manufacturing/vision system integrator: Must run camera detection on the factory floor with no cloud and strict uptime. Deploying/maintaining models across varied edge hardware is slow and fragile; model size, thermal/power limits, and latency spikes risk production quality.
  • Drone platform engineer: Tight weight, battery, and compute budgets require compact, fast models. Oversized models cut flight time or force offloading; updating and testing models across a distributed fleet with mixed hardware is risky and time-consuming.
  • In‑car assistant/product developer: Requires offline speech/vision with low latency on vehicle SoCs under safety/privacy constraints. Integration cycles are long, with limited control over memory/storage and certification requirements.
  • Mobile app developer adding offline ML features: Wants offline features without bloating app size or draining battery. Packaging/optimizing for many Android devices, managing first‑run downloads/caching, and inconsistent performance across phones hurt UX.

How would they acquire their first 10, 50, and 100 customers

  • First 10: Founder‑led, hands‑on 4–8 week pilots that integrate InferX on the customer’s exact hardware, deliver a working demo, and train their team; sourced via YC/founder intros, targeted LinkedIn outreach, and relevant GitHub projects.
  • First 50: Turn the early wins into three polished reference integrations (e.g., Jetson robot, factory camera, Android in‑car app) with reproducible scripts and case studies, then target lookalike accounts and OEM integrators using that playbook.
  • First 100: Make onboarding self‑serve (better SDK UX, one‑click optimizer, device checks) with paid support tiers, while expanding field engineers and channel partnerships to convert mid‑market accounts and upsell the GPU‑on‑demand service.

What is the rough total addressable market

Top-down context:

Edge AI is projected at about $24.9B in 2025 and ~$66.5B by 2030, including hardware and software (Grand View Research). Within that, on‑device AI software/platform spend is a multi‑billion segment expected to grow rapidly (GlobeNewswire summary).

Bottom-up calculation:

Near‑term TAM comes from summing conservative software shares of adjacent markets: Edge AI 2025 ($24.9B at ~15% software ≈ $3.7B; Grand View Research); industrial machine vision (~$11–12B baseline at ~25% software ≈ $2.8–3.0B; MarketsandMarkets); robotics (~$45–50B at ~10% software ≈ $4.5–5B; GMI Insights); commercial drones (~$13–17B at ~25% software ≈ $3–4B; Fortune Business Insights); plus mobile on‑device SDK/platform spend (~$2–3B; GlobeNewswire). Treated conservatively to avoid double counting, this yields roughly $10–15B in near‑term software TAM (the arithmetic is spelled out in the sketch after the assumptions below).

Assumptions:

  • Only the software/platform slice (SDKs, runtimes, optimization) is counted; hardware spend is excluded.
  • Software share per vertical estimated at ~10–30% depending on capital intensity and typical stack makeup.
  • Overlap across segments (e.g., robotics using machine vision) is adjusted conservatively to avoid double counting.
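
To make the bottom-up sum auditable, here is the arithmetic spelled out. All inputs are the estimates quoted above (segment midpoints where a range was given); the final $10–15B figure is the raw sum after the report's conservative overlap haircut.

```python
# Bottom-up TAM arithmetic from the figures above (all amounts in $B).
# Each entry: (estimated segment size, assumed software share).
segments = {
    "edge AI 2025":      (24.9, 0.15),  # ~$3.7B software slice
    "machine vision":    (11.5, 0.25),  # midpoint of $11-12B -> ~$2.9B
    "robotics":          (47.5, 0.10),  # midpoint of $45-50B -> ~$4.8B
    "commercial drones": (15.0, 0.25),  # midpoint of $13-17B -> ~$3.8B
}
raw = sum(size * share for size, share in segments.values())
raw += 2.5  # mobile on-device SDK/platform spend, midpoint of ~$2-3B

# Raw sum is ~$17.6B; the report haircuts for cross-segment overlap
# (e.g., robotics that already uses machine vision) to roughly $10-15B.
print(f"raw software sum: ~${raw:.1f}B")
```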

Who are some of their notable competitors

  • NVIDIA TensorRT: Inference optimizer and runtime for NVIDIA GPUs, widely used to accelerate models on edge and datacenter hardware.
  • ONNX Runtime: Cross‑platform inference engine with hardware acceleration backends for CPUs, GPUs, and NPUs; a common choice for deploying models across devices (see the sketch after this list).
  • Intel OpenVINO: Toolkit for optimizing and deploying models on Intel CPUs, integrated GPUs, and VPUs, aimed at edge inferencing.
  • TensorFlow Lite: Lightweight runtime for mobile and embedded devices, used to run optimized models on Android and other platforms.
  • Apache TVM / MLC: Compiler stack and community projects (e.g., MLC) focused on running and optimizing models across heterogeneous hardware, including mobile and edge.
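
For a concrete feel of the category, here is a minimal (real) ONNX Runtime snippet showing the deployment pattern most of these runtimes share: one model artifact plus a prioritized list of hardware backends, with the engine selecting the first backend available on the device. The model file and input shape are placeholders.

```python
# Minimal ONNX Runtime example of cross-device deployment: providers are
# tried in order, so the same code uses CUDA where present and falls back
# to the CPU elsewhere. "model.onnx" and the input shape are placeholders.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)  # e.g., an image batch
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```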