Model-Market Fit: The Capability Threshold Framework for AI Startups
Research Date: 2026-01-26
Source URL: https://www.nicolasbustamante.com/p/model-market-fit
Reference URLs
- Model-Market Fit (Primary Source)
- Are LLMs Plateauing? No. You Are. (Related Article)
- Marc Andreessen - The Only Thing That Matters (2007)
- Vals.ai Finance Agent Benchmark
- Vals.ai LegalBench Benchmark
- Nicolas Bustamante LinkedIn
- Fintool
- Doctrine
Summary
Nicolas Bustamante, co-founder of Doctrine (AI legal tech) and Fintool (AI financial copilot), introduces Model-Market Fit (MMF) as a prerequisite layer beneath traditional product-market fit for AI startups. The framework extends Marc Andreessen’s influential 2007 essay “The Only Thing That Matters” by adding model capability as the variable that determines whether markets can adopt AI products at all.
The central thesis holds that when MMF exists—when underlying models can perform the core task a market demands—Andreessen’s framework applies perfectly and markets “pull the product out of the startup.” When MMF does not exist, no amount of engineering, UX design, or go-to-market strategy can compensate for models that cannot perform the fundamental job to be done.
Bustamante demonstrates this pattern through case studies in legal AI (which exploded after GPT-4 crossed the capability threshold in March 2023) and coding assistants (which became indispensable after Claude 3.5 Sonnet in June 2024), contrasted with domains where MMF remains absent: mathematical proof generation, high-stakes finance, and autonomous drug discovery.
The MMF Framework
Conceptual Foundation
The framework builds on Andy Rachleff’s insight (popularized by Andreessen) that market matters more than team or product because great markets pull products out of startups. Bustamante argues that for AI products specifically, model capability determines whether that gravitational pull can begin at all.
The MMF Test
Bustamante proposes a three-component test for determining whether MMF exists:
| Component | Definition |
|---|---|
| Same inputs as human expert | Model receives what the human would receive—documents, data, context—without magical preprocessing |
| Output customer would pay for | Production-quality work solving a real problem, not a demo or proof of concept |
| Without significant human correction | Human may review, refine, or approve, but not rewrite 50% of output |
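Read as logic, the test is a strict conjunction: failing any one component fails the whole test. A minimal sketch in Python (the class, field, and method names are illustrative, not from the source):

```python
from dataclasses import dataclass

@dataclass
class MMFAssessment:
    """Illustrative encoding of the three-component MMF test."""
    same_inputs_as_expert: bool          # documents, data, context; no magical preprocessing
    output_customer_would_pay_for: bool  # production-quality work, not a demo
    no_significant_correction: bool      # review/refine/approve is fine; rewriting half is not

    def has_mmf(self) -> bool:
        # MMF is a conjunction: any single failing component fails the test.
        return (self.same_inputs_as_expert
                and self.output_customer_would_pay_for
                and self.no_significant_correction)

print(MMFAssessment(True, True, False).has_mmf())  # False: a demo, not a product
```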
The Human-in-the-Loop Diagnostic
A critical diagnostic for MMF presence versus absence lies in how “human-in-the-loop” functions within a product:
| MMF Status | Human-in-the-Loop Role | Test |
|---|---|---|
| Present | Feature | Maintains quality, builds trust, handles edge cases. AI does work; human provides oversight. |
| Absent | Crutch | Hides that AI cannot perform core task. Human compensates, not augments. |
The definitive test: “If all human correction were removed from this workflow, would customers still pay? If the answer is no, there’s no MMF. There’s only a demo.”
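Because the test is a single counterfactual question, it reduces to a one-branch classifier; a sketch with a hypothetical function name:

```python
def hitl_role(customers_would_pay_without_human_correction: bool) -> str:
    """Classify human-in-the-loop as feature (MMF present) or crutch (MMF absent)."""
    if customers_would_pay_without_human_correction:
        return "feature"  # the AI does the work; the human provides oversight
    return "crutch"       # the human compensates for the AI; there is only a demo
```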
Case Studies: MMF Existence and Absence
Legal AI: GPT-4 (March 2023)
Legal AI exemplifies MMF unlocking a dormant market. Before 2023, legal tech AI companies struggled to cross $100M ARR despite market demand. Bustamante draws on firsthand experience founding Doctrine in 2016:
Pre-MMF State (Pre-2023):
- BERT and similar transformer models excelled at classification tasks (document sorting, contract type identification, issue flagging)
- Legal work requires generation and reasoning: drafting memos synthesizing case law, summarizing depositions while preserving nuanced arguments, generating tailored discovery requests
- Traditional ML could categorize contracts but could not write coherent briefs explaining enforceability under specific state laws
Post-MMF State (Post-GPT-4):
- Within 18 months of GPT-4 release, Silicon Valley legal startups raised hundreds of millions in funding
- Thomson Reuters acquired Casetext for $650 million
- Doctrine’s business grew substantially
- Legal AI “minted more unicorns in 12 months than in the previous 10 years combined”
The market demand remained constant; model capability crossed the threshold.
Coding Assistants: Claude 3.5 Sonnet (June 2024)
Coding assistants demonstrate a similar pattern at a different threshold:
Pre-Sonnet: GitHub Copilot had millions of users, but the experience was “autocomplete that occasionally helps.” Bustamante reports trying Cursor early, finding it “meh,” and deleting it repeatedly.
Post-Sonnet: “Within a week, I couldn’t work without Cursor. Neither could anyone on my team. The product became the workflow.”
Cursor’s growth went vertical not due to new feature development but because Claude 3.5 Sonnet crossed the capability threshold for genuine codebase understanding and high-quality code generation.
Mathematical Proofs: MMF Absent
Mathematical proof generation represents a market where MMF remains uncrossed despite significant demand:
- Research institutions, defense contractors, and tech companies would pay millions for genuine mathematical reasoning
- Models can verify known proofs, assist with mechanical steps, and occasionally produce insights on bounded problems
- Originating novel proofs on open problems remains beyond current capability
However, progress is occurring at the frontier. Bustamante cites Sébastien Bubeck’s experiment where GPT-5-Pro improved a bound in convex optimization from 1/L to 1.5/L, reasoning for 17 minutes to generate a correct proof. This suggests MMF for mathematical reasoning may be approaching.
High-Stakes Finance: MMF Absent
Financial analysis presents a stark capability gap:
Capability Challenges:
- Excel output remains unreliable for complex financial models
- AI struggles to combine quantitative analysis with qualitative insights from extensive documents
- End-to-end reasoning that justifies million-dollar positions exceeds current capability
Benchmark Evidence (Vals.ai):
| Benchmark | Top Model Accuracy | Top Performer |
|---|---|---|
| LegalBench | 87.04% | Gemini 3 Pro |
| Finance Agent | 56.55% | GPT 5.1 |
The roughly 30-point accuracy gap between the legal and finance benchmarks quantifies the MMF disparity. Legal has crossed the production-grade threshold; finance has not.
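One way to see the disparity at a glance, using a hypothetical production-grade bar (the 85% cutoff below is an illustrative assumption; the source defines no numeric threshold):

```python
PRODUCTION_THRESHOLD = 0.85  # hypothetical bar for production-grade accuracy

benchmarks = {"LegalBench": 0.8704, "Finance Agent": 0.5655}
for name, score in benchmarks.items():
    status = "MMF plausible" if score >= PRODUCTION_THRESHOLD else "MMF absent"
    print(f"{name}: {score:.2%} -> {status}")

# LegalBench: 87.04% -> MMF plausible
# Finance Agent: 56.55% -> MMF absent
```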
The 80/99 Accuracy Gap
Bustamante identifies a critical distinction in accuracy requirements across market types:
| Market Type | Acceptable Accuracy | Rationale |
|---|---|---|
| Unregulated | ~80% | AI writing drafts of marketing copy creates value even with heavy editing |
| Regulated | ~99% | Contract review missing 20% of clauses creates liability, not value |
“The gap between 80% and 99% accuracy is often infinite in practice. It’s the difference between ‘promising demo’ and ‘production system.’”
Many AI startups occupy this gap, raising capital on demonstrations while awaiting capability that would make products actually functional.
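A back-of-the-envelope calculation shows why the gap behaves as “infinite in practice”; the clause count and per-miss cost below are hypothetical assumptions, not figures from the source:

```python
# Hypothetical contract-review workload: 1,000 clauses, with each missed
# material clause carrying an expected liability of $50,000.
clauses = 1_000
cost_per_miss = 50_000

for accuracy in (0.80, 0.99):
    missed = clauses * (1 - accuracy)
    liability = missed * cost_per_miss
    print(f"{accuracy:.0%} accuracy -> {missed:.0f} missed clauses, "
          f"${liability:,.0f} expected liability")

# 80% accuracy -> 200 missed clauses, $10,000,000 expected liability
# 99% accuracy -> 10 missed clauses, $500,000 expected liability
```

At 80%, the expected liability can dwarf the value of the work, which is why partial accuracy creates liability rather than value in regulated markets.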
Strategic Framework
The Timing Dilemma
Building for current versus anticipated MMF creates a strategic dilemma:
Arguments for Waiting:
- Building around absent MMF means betting on improvements outside one’s control
- Runway burns while model providers determine capability timelines
- The required capability might arrive differently than anticipated, or not at all within survival horizon
Arguments for Being Early:
- When MMF unlocks, success requires more than model capability:
  - Domain-specific data pipelines
  - Regulatory relationships built over years
  - Customer trust
  - Deep workflow integration
  - Understanding of how professionals actually work
- Teams closest to problems shape how models get evaluated, fine-tuned, and deployed
The Dangerous Zone
Bustamante identifies the “dangerous zone” as MMF estimated at 24-36 months away:
- Close enough to seem imminent
- Far enough to burn through multiple funding rounds waiting
The resolution depends on market size. Healthcare and financial services markets are sufficiently massive that even Anthropic and OpenAI pursue them despite mixed current results. Expected value calculation:
expected_value = probability_of_MMF_arriving × market_size × likely_share
For trillion-dollar markets, the risk-reward calculation permits early positioning despite uncertain capability timelines.
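Plugging illustrative numbers into this formula shows the logic; the probability, market size, and share below are hypothetical assumptions, not the author’s figures:

```python
def expected_value(p_mmf_arrives: float, market_size: float, likely_share: float) -> float:
    """Expected value of early positioning, per the formula above."""
    return p_mmf_arrives * market_size * likely_share

# Hypothetical: 30% chance MMF arrives within the startup's runway,
# a $1T market, and a 0.5% achievable share.
ev = expected_value(0.30, 1e12, 0.005)
print(f"${ev:,.0f}")  # $1,500,000,000
```

Even at a modest probability that MMF arrives in time, a sufficiently large market keeps the expected value high, which is the risk-reward logic behind early positioning.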
The Agentic Threshold
Beyond raw intelligence, Bustamante identifies a second capability frontier: the ability to work autonomously over extended periods.
Current MMF Limitations
Existing MMF examples (legal document review, coding assistance) involve fundamentally short-horizon tasks. Prompt in, output out, maybe a few tool calls. Models produce useful output in seconds or minutes.
High-Value Work Requirements
The highest-value knowledge work operates differently:
| Role | Work Pattern |
|---|---|
| Financial analyst | Days building models, stress-testing assumptions, synthesizing dozens of sources |
| Strategy consultant | Weeks of research, interviews, and analysis producing iterative deliverables |
| Drug discovery researcher | Months designing and executing experimental campaigns |
Agentic Capability Requirements
The agentic threshold requires:
| Capability | Definition |
|---|---|
| Persistence | Maintaining goals and context across hours or days |
| Recovery | Recognizing failures, diagnosing problems, attempting alternative approaches |
| Coordination | Breaking complex objectives into subtasks, executing in sequence |
| Judgment | Knowing when to proceed versus when to stop and request guidance |
Current agents handle tasks measured in minutes. Tasks measured in days represent a phase change in capability, not an incremental improvement.
This explains why finance lacks MMF despite models being “good at reading documents.” Reading a 10-K is a 30-second task. Building an investment thesis is a multi-day workflow requiring data gathering, model building, scenario testing, and coherent synthesis across the entire process.
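The four capabilities in the table above map naturally onto an agent control loop. The sketch below is schematic, with stubbed planning and execution; all class, method, and field names are invented for illustration, not drawn from the source:

```python
from dataclasses import dataclass, field

@dataclass
class Result:
    ok: bool
    summary: str
    diagnosable: bool = True  # can the agent work out what went wrong?

@dataclass
class LongHorizonAgent:
    goal: str
    memory: list[str] = field(default_factory=list)  # persistence: context across the run
    max_retries: int = 3

    def plan(self, goal: str) -> list[str]:
        # Coordination: break the objective into ordered subtasks (stubbed here).
        return [f"{goal}: step {i}" for i in (1, 2, 3)]

    def execute(self, task: str) -> Result:
        # Stub executor; a real agent would call models and tools here.
        return Result(ok=True, summary=f"done: {task}")

    def run(self) -> None:
        for task in self.plan(self.goal):
            done = False
            for _ in range(self.max_retries):
                result = self.execute(task)
                if result.ok:
                    self.memory.append(result.summary)  # persistence: keep what was learned
                    done = True
                    break
                if result.diagnosable:
                    task = f"{task} (revised)"  # recovery: diagnose, revise, retry
                else:
                    break  # undiagnosable failure: fall through to judgment
            if not done:
                # Judgment: stop and request guidance rather than proceed blindly.
                print(f"requesting guidance on: {task}")

agent = LongHorizonAgent(goal="build an investment thesis")
agent.run()
print(agent.memory)
```

Persistence lives in `memory`, coordination in `plan`, recovery in the retry-with-revision branch, and judgment in the fallback that asks for guidance instead of pressing on.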
Complementary Framework: Benchmark Saturation vs. Capability Stagnation
Bustamante’s related article “Are LLMs Plateauing? No. You Are.” (October 2025) addresses the perception that LLM progress has stalled:
The Benchmark Saturation Problem
Many users perceive an LLM plateau because they test on tasks that earlier models already saturated:
- Translation: GPT-4o achieved ~100% accuracy; successors show no improvement because there is no room to improve
- Simple math, concept explanation, email rewriting: solved problems
The analogy: “measuring a rocket’s speed with a car speedometer. Once you hit the max reading, everything looks the same.”
Intelligence Manifests at the Frontier
Raw LLM intelligence continues exploding, but the gains appear only at the frontier, on tasks that push the absolute limits of reasoning:
Evidence: GPT-5-Pro produced novel mathematical proofs, including improving a bound in convex optimization from 1/L to 1.5/L. The model “reasoned for 17 minutes to generate a correct proof for an open problem.”
This represents creation of new knowledge, not solving known problems.
The Distinction: Intelligence vs. Usefulness
The critical insight: “Intelligence without application is just a party trick. Intelligence with tool use is the revolution.”
Current frontier models outperform most humans on most intellectual tasks:
- Legal analysis: better than most lawyers
- Medical diagnosis: better than most doctors
- Code review: better than most senior engineers
- Financial modeling: better than most analysts
What remains missing is tool orchestration and persistence—the ability to work over time toward goals using external resources.
Key Findings
- Model-Market Fit (MMF) serves as a prerequisite layer beneath product-market fit for AI startups; without it, markets cannot pull products regardless of demand intensity
- Capability thresholds are discrete, not continuous: markets dormant for years explode within months when models cross specific thresholds (legal AI with GPT-4, coding with Claude 3.5 Sonnet)
- The human-in-the-loop diagnostic distinguishes MMF presence (oversight as feature) from absence (compensation as crutch)
- Benchmark evidence quantifies MMF disparities: LegalBench at 87.04% vs Finance Agent at 56.55% explains why legal AI thrives while finance AI struggles
- The 80/99 accuracy gap is “infinite in practice” for regulated industries where partial accuracy creates liability rather than value
- The agentic threshold represents a second capability frontier: sustained autonomous operation over hours/days, not just prompt-response cycles
- Perceived LLM plateau reflects benchmark saturation, not capability stagnation; frontier performance continues advancing on genuinely difficult tasks
- Strategic positioning must balance the risk of building for absent MMF against the advantage of domain expertise when MMF eventually arrives
References
- Model-Market Fit - Nicolas Bustamante - January 19, 2026
- Are LLMs Plateauing? No. You Are. - Nicolas Bustamante - October 22, 2025
- The Only Thing That Matters - Marc Andreessen - June 25, 2007
- Vals.ai Finance Agent Benchmark - Accessed January 2026
- Fintool Technology - Fintool v4 benchmark results
- Nicolas Bustamante - About - Author background