
Tensorfuse

Run serverless GPUs on your own cloud

Winter 2024 · Active · Website

Report from 26 days ago

What do they actually do

Tensorfuse installs a serverless GPU platform inside a customer’s AWS account so teams can deploy, autoscale, and operate GPU-backed APIs and jobs without building Kubernetes + GPU infrastructure themselves. The onboarding flow uses CloudFormation to create an EKS cluster under a cross‑account IAM role, keeping data and models within the customer’s VPC (Tensorfuse intro; Getting started).

The live product includes a runtime (Tensorkube) that builds your container, deploys it with GPU resources, and exposes it as an HTTPS endpoint with autoscaling. It supports common inference servers and frameworks (e.g., vLLM, TensorRT) and can run text, audio, image, and custom models. It also provides job queues, batch jobs, dev containers, fine‑tuning workflows, and multi‑LoRA inference. Billing combines monthly plans with a usage metric called Managed GPU Hours (MGH) (Deployments; Product/features; Site/pricing).

The setup is self‑serve via web console/CLI and typically involves connecting AWS, running the permissions stack, and provisioning the EKS cluster (docs cite ~25–30 minutes for cluster creation). YC materials claim teams can get deployments running in under an hour, and the site lists testimonials and case studies from early customers using Tensorfuse to get production retrievers and LLM pipelines online quickly (Getting started; YC listing; Homepage testimonials).
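The runtime flow described above (build a container, deploy it with GPU resources, expose an autoscaled HTTPS endpoint) can be sketched generically. The spec fields and toy autoscaler below are hypothetical illustrations of how such a platform is typically configured, not Tensorkube’s actual schema or scaling algorithm:

```python
import math

# Illustrative sketch of what a serverless-GPU deployment spec typically
# captures (image, GPU resources, autoscaling bounds). Field names are
# hypothetical, NOT Tensorfuse/Tensorkube's actual configuration schema.
deployment_spec = {
    "name": "llm-endpoint",
    "image": "my-registry/vllm-server:latest",  # container built from your code
    "gpu": {"type": "A10G", "count": 1},        # GPU class requested per replica
    "autoscaling": {
        "min_replicas": 0,         # scale to zero when idle: no idle GPU cost
        "max_replicas": 8,         # cap burst capacity
        "target_concurrency": 4,   # in-flight requests per replica before scaling out
    },
    "endpoint": {"protocol": "https", "path": "/v1/completions"},
}

def replicas_needed(in_flight_requests: int, spec: dict) -> int:
    """Toy autoscaler: ceil(in-flight / target concurrency), clamped to bounds."""
    auto = spec["autoscaling"]
    desired = math.ceil(in_flight_requests / auto["target_concurrency"])
    return max(auto["min_replicas"], min(auto["max_replicas"], desired))

print(replicas_needed(0, deployment_spec))    # idle -> 0 replicas (scale to zero)
print(replicas_needed(10, deployment_spec))   # ceil(10/4) -> 3 replicas
print(replicas_needed(100, deployment_spec))  # clamped at max_replicas -> 8
```

The concurrency-based rule here mirrors how request-driven autoscalers commonly work, but the real platform’s scaling signals and defaults may differ.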

Who are their target customer(s)

  • Startup ML/engineering teams shipping model APIs: They need GPU-backed endpoints fast without spending weeks on Kubernetes, GPU drivers, autoscaling, and observability. They prefer to run in their AWS account and avoid undifferentiated infra work.
  • Data scientists and ML researchers running fine‑tuning and batch jobs: They want a straightforward way to queue and run fine‑tuning/training on cloud GPUs in their own account without managing job orchestration, containers, or cluster setup.
  • Product teams building LLM pipelines and retrievers: They need reliable throughput and latency with support for common inference servers, plus simple deployment of retrievers and multi‑model endpoints into production.
  • Platform/SRE teams at mid‑market and enterprises: They must keep data/models in their own cloud and meet RBAC, SSO, auditability, and compliance requirements (e.g., SOC2/HIPAA) while enabling teams to run GPU workloads.
  • Apps with spiky or latency‑sensitive inference traffic: They struggle with idle GPU cost and cold starts; they need autoscaling and faster container start techniques to keep latency low without overprovisioning.
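The idle-GPU pain point in the last bullet can be made concrete with back-of-envelope arithmetic. The hourly rate and utilization below are illustrative assumptions, not quoted Tensorfuse or AWS prices:

```python
# Back-of-envelope cost comparison for spiky inference traffic:
# an always-on GPU vs. scale-to-zero autoscaling.
# All numbers are illustrative assumptions, not real pricing.
GPU_HOURLY_RATE = 1.00   # assumed $/GPU-hour
HOURS_PER_MONTH = 730
BUSY_FRACTION = 0.10     # traffic actually uses the GPU ~10% of the time

always_on_cost = GPU_HOURLY_RATE * HOURS_PER_MONTH
scale_to_zero_cost = GPU_HOURLY_RATE * HOURS_PER_MONTH * BUSY_FRACTION

print(f"always-on:     ${always_on_cost:.0f}/mo")      # $730/mo
print(f"scale-to-zero: ${scale_to_zero_cost:.0f}/mo")  # $73/mo
```

At 10% utilization the always-on setup pays roughly 10x more per month, which is the gap autoscaling and faster container starts aim to close without adding cold-start latency.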

How would they acquire their first 10, 50, and 100 customers

  • First 10: Do white‑glove PoCs for YC and early LLM startups: install the CloudFormation/EKS stack in their AWS account and prove a production endpoint within a week, using YC intros and existing testimonials to convert pilots.
  • First 50: Lean on product‑led trials: publish ready‑to‑run examples and one‑hour workshops, pair with trial credits and templated deployment YAMLs so teams can self‑validate inference, autoscaling, and fine‑tuning.
  • First 100: Add AWS/channel partners and a standardized two‑week onboarding playbook with sales engineering and customer success. Publish SOC2/SSO/RBAC docs and case studies to win platform/SRE buyers and move from PoCs to paid MGH contracts.

What is the rough total addressable market

Top-down context:

Analyst estimates put today’s AI inference market around $90–104B and growing quickly, with broader data‑center GPU markets in the tens to low hundreds of billions over the next 3–5 years (Grand View Research; Fortune Business Insights; MarketsandMarkets).

Bottom-up calculation:

Tensorfuse addresses the slice of AI inference/GPU spend where teams run workloads in their own cloud accounts and want managed tooling instead of building Kubernetes+GPU plumbing. If cloud inference/GPU spend is ~50% of the total and 15–30% of that prefers in‑account operation over fully managed services, the SAM is roughly $10–30B today; near‑term SOM for one focused vendor is likely in the tens to low hundreds of millions, depending on execution (market context).

Assumptions:

  • Cloud accounts represent ~50% of total AI inference/GPU spend that Tensorfuse can plausibly touch.
  • Within cloud spend, 15–30% of buyers prefer to run in their own accounts vs. fully managed hyperscaler services.
  • Adoption depends on enterprise comfort with third‑party control planes, compliance, and competition from hyperscalers/neoclouds.
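The bottom-up calculation can be reproduced in a few lines. Note that reaching the report’s $10–30B SAM requires a base spend of roughly $130–200B (i.e., including the broader data-center GPU market cited top-down, not just the $90–104B inference slice); that base figure is an implied assumption made explicit here:

```python
# Reconstruct the bottom-up SAM range from the report's stated assumptions.
# base_spend is an implied assumption: for the $10-30B SAM to hold, the base
# must span inference plus the broader data-center GPU market.
base_spend = (130e9, 200e9)      # assumed total addressable AI/GPU spend, $
cloud_share = 0.50               # ~50% of spend sits in cloud accounts
in_account_pref = (0.15, 0.30)   # 15-30% prefer in-account over fully managed

sam_low = base_spend[0] * cloud_share * in_account_pref[0]
sam_high = base_spend[1] * cloud_share * in_account_pref[1]

print(f"SAM range: ${sam_low / 1e9:.1f}B - ${sam_high / 1e9:.1f}B")
# SAM range: $9.8B - $30.0B, consistent with the report's "roughly $10-30B"
```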

Who are some of their notable competitors

  • AWS SageMaker: AWS’s native service for training and hosting models, including GPU‑backed production endpoints and multi‑model endpoints, chosen when teams want a fully AWS‑managed path inside their account (SageMaker docs).
  • Run:ai: GPU orchestration for self‑hosted environments focused on pooling GPUs, improving utilization, and reducing model cold starts for enterprise multi‑tenant use cases (Run:ai docs).
  • Cortex: Open‑source tooling to deploy ML APIs into your AWS account, creating and managing clusters in your VPC with autoscaling GPU endpoints; a lower‑level, code‑first alternative (Cortex docs).
  • Modal: A hosted serverless GPU platform for quickly deploying GPU containers and HTTP endpoints with low startup times; convenient when in‑account control/compliance isn’t required (Modal blog).
  • Replicate: Hosted model deployments that turn pushed models into API endpoints on Replicate’s GPU fleet; simple to get started but runs on Replicate infra by default rather than in the customer’s AWS account (Replicate docs).