What do they actually do
Percival ships Percy, a lightweight tool that runs inside a researcher’s Python workspace (Jupyter, VS Code, etc.) and generates Python code to clean, transform, and explore messy, real‑world datasets. You install Percy, open its sidebar, optionally connect external data sources, and ask it to prepare or analyze data; Percy proposes code and can execute it in your environment so you can iterate quickly on results (Percival homepage, FAQ).
Percy is available today as a downloadable closed beta and is free to use during the beta (with token limits). The product is aimed at researchers and data analysts who already work in Python and want to speed up data preparation and exploratory analysis without leaving their existing tools (FAQ, Percival homepage, YC page).
Public writeups have occasionally described a different use case (document/ERP automation for distributors), but the company’s site and YC profile currently emphasize Percy as an AI copilot for research data analysis; treat older third‑party descriptions as out‑of‑date or unrelated pilots (Fondo post, LinkedIn post, Percival homepage, YC page).
Who are their target customer(s)
- Academic researchers (PhD students, postdocs) working in Python notebooks: They spend many hours turning messy experiment/survey outputs into analysis‑ready tables, writing repetitive cleaning code and struggling to keep transformations reproducible (Percival homepage, YC page).
- Industry data scientists on small teams: They need to explore new datasets quickly for prototypes and analyses but get slowed by ad‑hoc cleaning and switching between tools (Percival homepage, YC page).
- Research engineers / data assistants handling external data sources: They ingest CSV exports, instrument files, and partner feeds with inconsistent formats, which leads to brittle one‑off scripts and time lost reconnecting broken pipelines (FAQ, Percival homepage).
- Team leads and lab managers: They need analyses to be auditable and repeatable but can’t easily see what cleaning steps were applied or reproduce colleagues’ results, slowing reviews and onboarding (YC page, Percival homepage).
- Independent analysts / consultants doing one‑off Python analyses: They lose billable time writing repetitive cleaning code and verifying, for each client project, that the transformed data matches the source (Percival homepage, FAQ).
How would they acquire their first 10, 50, and 100 customers
- First 10: Direct outreach to PhD students, postdocs, and research engineers in founder and YC networks; run hands‑on onboarding sessions on a real dataset to produce a reproducible notebook and secure a short case‑study quote (Percival homepage, YC page).
- First 50: Host small-group workshops and office hours with university departments and small data teams; publish how‑to notebooks and before/after examples, and recruit a few campus/company ambassadors in exchange for feedback and introductions.
- First 100: Reduce install friction (pip/conda; VS Code/Jupyter extensions) and ship turnkey templates for common file types so value appears in minutes; partner with university core facilities and small consultancies for paid lab pilots and add a simple referral credit to turn single‑user wins into multi‑seat trials.
What is the rough total addressable market
Top-down context:
Percival sits in the data‑preparation/data‑cleaning software category. IMARC estimates the global data‑preparation market at about USD 6.5B in 2024 with strong growth; other firms put the 2024–2025 figure in a similar range, with rapid expansion thereafter (IMARC, Precedence Research).
Bottom-up calculation:
A conservative SAM for Percy’s Python‑first researcher/small‑team niche assumes 10–20% of category spend comes from code‑centric users who work in notebooks/editors. Applying that share to a ~USD 6.5B baseline implies a SAM of roughly USD 650M–1.3B (IMARC). The global researcher population, which numbers in the millions, underscores the scale of the user base, though not all of it will be addressable or paying (Our World in Data, UNESCO).
Assumptions:
- 10–20% of data‑prep spend is by code‑first researchers and small teams using notebooks/editors.
- These users adopt Python‑native assistants and pay for productivity/reproducibility once Percy leaves closed beta.
- The ~USD 6.5B 2024 market size is a reasonable baseline for current planning despite variance across reports.
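The arithmetic behind the SAM range can be reproduced in a few lines. This is a minimal sketch that only restates the assumptions listed above (a ~USD 6.5B baseline and a 10–20% code‑first share); the constant and function names are illustrative and do not come from any cited source.

```python
# Illustrative SAM arithmetic. Every input restates an assumption from the
# text above; the names are hypothetical and chosen only for this sketch.
MARKET_SIZE_USD = 6.5e9        # ~USD 6.5B data-preparation market, 2024 baseline
CODE_FIRST_SHARE_LOW = 0.10    # assumed low share of spend from code-first users
CODE_FIRST_SHARE_HIGH = 0.20   # assumed high share of spend from code-first users


def serviceable_market(total_usd: float, share: float) -> float:
    """Return the slice of total_usd attributed to the given share (0 to 1)."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return total_usd * share


sam_low = serviceable_market(MARKET_SIZE_USD, CODE_FIRST_SHARE_LOW)
sam_high = serviceable_market(MARKET_SIZE_USD, CODE_FIRST_SHARE_HIGH)

print(f"SAM range: USD {sam_low / 1e6:,.0f}M to USD {sam_high / 1e9:.1f}B")
```

Because the result scales linearly with both inputs, a different baseline from another report or a different share shifts the range proportionally.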
Who are some of their notable competitors
- GitHub Copilot (in notebooks/VS Code): General code generation and assistance inside Jupyter notebooks via VS Code; helps write and edit Python for data analysis and cleaning tasks.
- Jupyter AI (official Jupyter extension): An extension that integrates LLMs directly into Jupyter for chat and code generation via magic commands, overlapping with Percy’s in‑notebook assistance.
- Databricks Assistant: A context‑aware assistant embedded in Databricks notebooks to generate and fix code for data/ML workflows; notable for teams already on Databricks (Docs).
- Alteryx Designer Cloud (Trifacta): A mature, GUI‑driven data preparation platform (formerly Trifacta) used widely for cleaning and transformation; different modality but competes for the same workflows.
- PandasAI (open‑source): A Python library that brings LLM‑based code generation and data cleaning/visualization to pandas, overlapping with Percy’s Python‑native assistant approach.