RunRL

Reinforcement learning as a service

Spring 2025 · Active

Report from 14 days ago

What do they actually do

RunRL provides a hosted platform with a Python SDK and REST API that runs reinforcement‑learning fine‑tuning on language models whose weights the user can access. Users upload prompts or datasets, define a reward function (Python code, an automatic “judge” model, or an environment), launch an RL job, and receive trained checkpoints along with run statistics to deploy back into their systems (RunRL home, Docs overview, Quickstart, REST API).
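
To make the reward‑function step concrete, here is a minimal sketch of the kind of plain‑Python reward a user might upload; the function name, signature, and scoring rule are illustrative assumptions for this sketch, not RunRL's documented SDK interface.

    # Illustrative only: a user-defined reward function of the sort the docs
    # describe (plain Python that scores a model completion). The signature and
    # argument names are assumptions for this sketch, not RunRL's documented API.
    import re


    def reward(prompt: str, completion: str, target: str) -> float:
        """Score a completion for a simple answer-extraction task.

        Returns 1.0 for an exact match on the final number in the completion,
        0.5 for a partial match, 0.0 otherwise; RL training maximizes this signal.
        """
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        if numbers and numbers[-1] == target:
            return 1.0
        # Partial credit if the target string appears anywhere in the completion.
        return 0.5 if target in completion else 0.0

Per the description above, the same slot could instead be filled by an automatic “judge” model or an environment rather than hand‑written scoring.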

Today, their materials target researchers and developer teams building LLM‑based agents (examples include chemistry tasks, browser agents, and code generation). The company is early‑stage (YC S25), with a small team and public messaging that implies beta users but no published revenue or MAU metrics (YC profile, Why Run RL?, Launch HN). Pricing is usage‑based: self‑serve at $80 per node‑hour, quoted alongside an equivalent $10 per H100‑hour (implying an 8×H100 node), with options for enterprise contracts, on‑prem/VPC deployments, and larger GPU clusters. They also list an agent‑oriented product called AgentFlow for continuous/production workflows (pricing, products).
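
As a quick check on the listed numbers, the sketch below converts node‑hours to GPU‑hours and prices a hypothetical job; the 8×H100 node size is inferred from the $80 vs $10 figures, and the run length is a made‑up illustration, not RunRL data.

    # Back-of-the-envelope cost math from the listed self-serve pricing.
    # The 8-GPU node size is inferred from $80/node-hour vs $10/H100-hour;
    # the example job size below is an illustrative assumption only.
    NODE_HOUR_USD = 80.0   # listed self-serve price per node-hour
    H100_HOUR_USD = 10.0   # listed equivalent price per H100-hour
    GPUS_PER_NODE = NODE_HOUR_USD / H100_HOUR_USD  # -> 8.0, i.e. an 8xH100 node

    # Hypothetical job: 2 nodes for 12 hours of RL fine-tuning.
    nodes, hours = 2, 12
    job_cost = nodes * hours * NODE_HOUR_USD

    print(f"{GPUS_PER_NODE:.0f} H100s per node implied by pricing")
    print(f"Example job: {nodes} nodes x {hours} h = ${job_cost:,.0f}")  # $1,920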

Who are their target customer(s)

  • ML/RL researchers and engineers running experiments: They want to apply RL to LLMs without building orchestration, GPU tooling, or training pipelines from scratch; they need a system that accepts a reward function and returns usable checkpoints (docs, pricing).
  • Product teams building domain‑specific agents (code, browser, chemistry): General LLMs are inconsistent on niche tasks; these teams need a reliable way to specialize behavior using a clear reward signal rather than ad‑hoc prompt tweaks (use cases, products).
  • Small developer teams/startups without RL expertise: They don’t want to implement RL algorithms or infra; they need simple SDKs/APIs, quickstarts, and tooling to define/debug rewards and automate runs (quickstart, PyPI SDK).
  • Enterprise ML/platform teams owning production deployments: They require on‑prem/VPC options, strict data controls, and predictable scaling on large GPU fleets with enterprise support and contracts (enterprise/pricing).
  • Teams running agents in production seeking continuous improvement: They need a safe loop to collect behavior, score it, and retrain so agents improve over time without re‑architecting their stack (AgentFlow/products).

How would they acquire their first 10, 50, and 100 customers

  • First 10: Convert hands‑on pilots via YC and founder networks: offer free compute credits, help wire up a reward function in onboarding, and deliver a deployable checkpoint that demonstrates concrete task gains (YC profile, quickstart).
  • First 50: Publish ready‑to‑run templates and tutorials (agents, code, chemistry) and run targeted workshops/contests to turn participants into paid trials (use cases, PyPI SDK).
  • First 100: Layer in account‑led pilots with platform/ML teams, ship on‑prem/VPC and simple contracts for compliance buyers, and sign 2–3 consulting/cloud partners to resell integration/deployment services (enterprise, pricing).

What is the rough total addressable market

Top-down context:

Practical near‑term TAM aligns with AI development tools and fine‑tuning services: Statista estimates ~US$9.8B for AI development tool software in 2025, and one report pegs LLM fine‑tuning services at ~US$1.4B in 2024 (Statista, Dataintelo). The broader enterprise LLM market is worth several billion dollars today, and AI infrastructure spend is scaling into the tens to hundreds of billions per IDC and Gartner (FBI, IDC, Gartner).

Bottom-up calculation:

If ~5,000 organizations run domain‑specific LLMs and 20% adopt an RL fine‑tuning platform at an average blended software+compute spend of US$500k/year, the near‑term bottom‑up TAM is ~US$500M. At 10% adoption among 10,000 such teams with US$1M/year in enterprise deployments, TAM approaches ~US$1B (illustrative).
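
The bottom‑up figures above can be restated directly; the snippet below simply reproduces that arithmetic, using only the illustrative assumptions already given.

    # Bottom-up TAM arithmetic restated from the paragraph above; the inputs
    # are the same illustrative assumptions, not measured data.
    def tam(orgs: int, adoption: float, annual_spend_usd: float) -> float:
        """TAM = number of adopting organizations x average annual spend."""
        return orgs * adoption * annual_spend_usd

    near_term = tam(orgs=5_000, adoption=0.20, annual_spend_usd=500_000)
    enterprise = tam(orgs=10_000, adoption=0.10, annual_spend_usd=1_000_000)

    print(f"Near-term TAM:  ${near_term / 1e6:,.0f}M")   # ~$500M
    print(f"Enterprise TAM: ${enterprise / 1e9:,.1f}B")  # ~$1.0B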

Assumptions:

  • A meaningful subset of LLM teams require access to model weights and are willing to use RL‑based fine‑tuning rather than prompt‑only approaches.
  • Average annual spend blends platform fees with compute billed per training hour (e.g., list pricing cites $80/node‑hour; $10/H100‑hour equivalent) (pricing).
  • Adoption depends on availability of reward functions, data controls (on‑prem/VPC), and integration effort.

Who are some of their notable competitors

  • Predibase: End‑to‑end platform for reinforcement fine‑tuning (“RFT”) with hosted UI, secure reward server, and integrated serving/monitoring—positioned to let teams run RFT without building infra (RFT overview).
  • Hugging Face (TRL + ecosystem): Widely used open‑source RLHF/RL fine‑tuning tools (TRL) plus managed training/inference and the Hub; appeals to teams that prefer OSS + self‑hosted or managed workflows (TRL docs, RLHF guide).
  • OpenAI: Managed reinforcement fine‑tuning pipelines for OpenAI models using grader/judge‑style rewards; attractive if customers accept model lock‑in and a closed‑weight, fully managed stack (RFT guide).
  • Amazon SageMaker / AWS ML stack: Platform components (data labeling, reward‑model training, managed training) that enterprises can stitch together to run RLHF/RFT inside their VPC at scale (AWS RLHF post).
  • Scale AI and similar data vendors: Provide human‑in‑the‑loop data, preference labeling, judge/review workflows, and eval pipelines used alongside RL training stacks; partial substitute/adjacent to RLaaS platforms (Scale RLHF).