
nCompass Technologies

Deploy hardware-accelerated AI models with only one line of code

Winter 2024 · Active · 2024 · Website
Hardware · Open Source · API · Cloud Computing · AI

Report from 26 days ago

What do they actually do

nCompass ships two products today. First, a performance‑profiling IDE extension (ncprof) for GPU/accelerated code that runs inside VS Code/Cursor with a Python SDK. Engineers add tracepoints, run standard profilers (Nsight Systems/Compute, perf, PyTorch profiler), then open interactive traces in the editor and jump from hotspots to the exact source line. The current release supports manual profiling and trace analysis; AI suggestions are marked as “coming soon” in the docs (quick start, ncprof docs, repo).
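ncprof's actual SDK surface isn't documented in this report; as a stand-in, the core workflow it automates (profile a run, then map hotspots back to file and line) can be sketched with Python's stdlib cProfile. The function names here are illustrative only.

```python
import cProfile
import io
import pstats


def hot_loop(n):
    # Simulated hotspot: quadratic work a profiler should surface.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def main():
    return hot_loop(200)


prof = cProfile.Profile()
prof.enable()
main()
prof.disable()

# Sort by cumulative time and print the top entries; each row carries
# filename:lineno(function), which is what an IDE-integrated profiler
# lets you jump to directly.
stream = io.StringIO()
pstats.Stats(prof, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The printed rows include `hot_loop` with its source location; ncprof's pitch is to make that file:line hop a click inside the editor rather than a manual lookup.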

Second, a hosted LLM inference service with a self‑serve API and on‑prem options. It focuses on high concurrency and low time‑to‑first‑token for open‑source and custom models, with “no rate limits,” dedicated/private deployments, and autoscaling. The team attributes efficiency to scheduling and kernel‑level optimizations; those are vendor‑reported results in their launch materials and a partner case study (YC page, Show‑HN thread, Ori case study).
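The exact API shape isn't specified in this report; many hosted inference services expose an OpenAI-compatible chat completions route, so a hedged sketch of a self-serve call looks like the following. The URL, key, and model name are placeholders, not nCompass's documented endpoint.

```python
import json
import urllib.request

# Placeholder endpoint and credentials -- an assumption, not nCompass's
# documented API.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # example open-source model
    "messages": [
        {"role": "user", "content": "Summarize GPU profiling in one line."}
    ],
    "max_tokens": 64,
    # Streaming is how a caller actually benefits from low time-to-first-token.
    "stream": True,
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(request) would send it; omitted so the sketch
# stays runnable without credentials or network access.
```

Swapping a base URL into an existing OpenAI-style client is the "integration friction" bar such services aim to clear.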

The profiling tool serves engineers who need to find GPU bottlenecks without switching tools, and the hosting product serves teams that want low‑latency, predictable inference (including on‑prem/private) for open‑source or fine‑tuned models. Near‑term roadmap includes an IDE agent that interprets traces and suggests fixes, broader model coverage (e.g., vision), and better efficiency dashboards (docs, Show‑HN).

Who are their target customer(s)

  • GPU/accelerator engineers profiling performance in VS Code/Cursor: They lose time juggling Nsight/perf/PyTorch profiler and their editor to map a hot trace event back to the exact source line. They want to open a trace and navigate directly to offending code without context switching.
  • Developers building on open‑source or fine‑tuned LLMs: They hit rate limits, see unpredictable latency/costs, and face integration friction. They want a simple, low‑latency API and the option to host custom models without heavy setup.
  • Infrastructure/SRE teams running GPU inference clusters: They struggle with low GPU utilization and latency spikes under many concurrent small requests. They need better request scheduling/packing to cut GPU counts and stabilize latency.
  • Enterprises with compliance or data‑sovereignty constraints: They can’t send sensitive data/models to public SaaS endpoints. They need on‑prem or private deployments that autoscale while keeping spend predictable and controllable.
  • Teams serving heavy multimodal or specialized models: Text‑tuned stacks often underperform on large vision/multimodal models, with poor throughput and slow startup. They need improved runtime efficiency and wider model support.
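The utilization pain point in the infrastructure/SRE bullet comes down to request packing. A toy scheduler (illustrative only, not nCompass's algorithm; real engines batch at the token level and overlap admission with compute) shows why batching concurrent small requests raises throughput:

```python
from collections import deque


class ToyScheduler:
    """Continuous-batching sketch: admit queued requests into free batch
    slots each step, retiring them as their tokens run out."""

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size
        self.queue = deque()
        self.active = []

    def submit(self, request_id, tokens_remaining):
        self.queue.append({"id": request_id, "left": tokens_remaining})

    def step(self):
        # Admit waiting requests into any free batch slots.
        while self.queue and len(self.active) < self.max_batch_size:
            self.active.append(self.queue.popleft())
        # "Decode" one token for every active request in parallel.
        finished = []
        for req in self.active:
            req["left"] -= 1
            if req["left"] == 0:
                finished.append(req["id"])
        self.active = [r for r in self.active if r["left"] > 0]
        return finished


sched = ToyScheduler(max_batch_size=2)
for i, tokens in enumerate([1, 3, 2]):
    sched.submit(i, tokens)

completed, steps = [], 0
while sched.active or sched.queue:
    completed.extend(sched.step())
    steps += 1
print(f"{len(completed)} requests in {steps} steps")  # 3 requests in 3 steps
```

Three requests totaling six decode tokens finish in three batched steps instead of six serial ones; that gap is the GPU-count and latency-stability win the bullet describes.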

How would they acquire their first 10, 50, and 100 customers

  • First 10: Directly onboard developers who installed/starred the ncprof repo or engaged on GitHub/HN/YC; offer hands‑on sessions and short‑term API/cluster credits to get a trace or model live within a day (docs, GitHub, HN, YC).
  • First 50: Publish 5‑minute tutorials and one‑line SDK snippets, trigger in‑extension prompts and targeted emails to active users, and host two technical webinars showing concrete cost/latency wins using the Ori case study to convert mid‑sized teams (YC, Ori).
  • First 100: Productize pilots with time‑boxed contracts (on‑prem or dedicated) and SLAs; prioritize self‑serve converts, and add 2–3 channel partners (cloud/orchestration/SIs) while publishing pilot case studies to speed procurement (Ori, HN).

What is the rough total addressable market

Top-down context:

Industry reports place AI inference in the low hundreds of billions by 2030, and AI infrastructure/data‑center GPUs in similar ranges; MLOps/APM developer tooling is in the multi‑billion tier and growing fast (MarketsandMarkets inference, Grand View AI infra, Fortune MLOps/APM, Fortune APM).

Bottom-up calculation:

Inference hosting: assume ~25,000 orgs globally running open‑source/custom LLMs with average annual inference infra/hosting spend of ~$400k → ~$10B SAM. Profiling/optimization tooling: assume ~15,000 orgs with GPU‑heavy codebases buying specialized performance tools at ~$75k/year on average → ~$1.1B. Combined bottom‑up SAM ≈ $11B.
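The bottom-up figures reduce to simple multiplication; restated so the arithmetic is checkable:

```python
# Bottom-up SAM arithmetic from the stated assumptions.
hosting_orgs = 25_000
hosting_spend = 400_000                      # ~$400k avg annual hosting spend
hosting_sam = hosting_orgs * hosting_spend   # $10.0B

tooling_orgs = 15_000
tooling_spend = 75_000                       # ~$75k/year for perf tooling
tooling_sam = tooling_orgs * tooling_spend   # $1.125B

combined_sam = hosting_sam + tooling_sam     # ~$11.1B
print(f"hosting ${hosting_sam / 1e9:.1f}B + tooling ${tooling_sam / 1e9:.3f}B "
      f"= ${combined_sam / 1e9:.3f}B")
```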

Assumptions:

  • Counts reflect orgs actively operating open/custom LLMs or GPU‑heavy services (not the entire AI adopter base).
  • Average annual spend includes GPUs, serving infra, and managed/on‑prem software but excludes model training.
  • Seat/server‑based tooling spend consolidated to org‑level averages; large enterprises skew higher, startups lower.

Who are some of their notable competitors

  • Together AI: Cloud inference for open‑source models with managed hosting and fine‑tuning; a direct alternative for teams wanting hosted OSS LLMs.
  • Fireworks.ai: High‑performance inference platform for OSS models with focus on low latency and scale; overlaps on developer‑facing APIs.
  • NVIDIA NIM: NVIDIA’s microservices for deploying AI models; relevant for enterprises standardizing on NVIDIA’s serving stack and on‑prem options.
  • vLLM: Open‑source LLM serving engine optimized for throughput/latency; often the baseline many teams consider for self‑hosting.
  • NVIDIA Nsight Systems/Compute: NVIDIA’s official GPU profiling tools; widely used today and the incumbent alternative to nCompass’s IDE‑based workflow.