Model-Market Fit: The Capability Threshold Framework for AI Startups

Research Date: 2026-01-26
Source URL: https://www.nicolasbustamante.com/p/model-market-fit

Summary

Nicolas Bustamante, co-founder of Doctrine (AI legal tech) and Fintool (AI financial copilot), introduces Model-Market Fit (MMF) as a prerequisite layer beneath traditional product-market fit for AI startups. The framework extends Marc Andreessen’s influential 2007 essay “The Only Thing That Matters” by adding model capability as a determining variable in whether markets can adopt AI products at all.

The central thesis holds that when MMF exists—when underlying models can perform the core task a market demands—Andreessen’s framework applies perfectly and markets “pull the product out of the startup.” When MMF does not exist, no amount of engineering, UX design, or go-to-market strategy can compensate for models that cannot perform the fundamental job to be done.

Bustamante demonstrates this pattern through case studies in legal AI (which exploded after GPT-4 crossed the capability threshold in March 2023) and coding assistants (which became indispensable after Claude 3.5 Sonnet in June 2024), contrasted with domains where MMF remains absent: mathematical proof generation, high-stakes finance, and autonomous drug discovery.

The MMF Framework

Conceptual Foundation

The framework builds on Andy Rachleff’s insight (popularized by Andreessen) that market matters more than team or product because great markets pull products out of startups. Bustamante argues that for AI products specifically, model capability determines whether that gravitational pull can begin at all.

The MMF Test

Bustamante proposes a three-component test for determining whether MMF exists:

  • Same inputs as human expert: the model receives what the human would receive (documents, data, context) without magical preprocessing
  • Output customer would pay for: production-quality work solving a real problem, not a demo or proof of concept
  • Without significant human correction: a human may review, refine, or approve, but not rewrite 50% of the output

The Human-in-the-Loop Diagnostic

A critical diagnostic for MMF presence versus absence lies in how “human-in-the-loop” functions within a product:

  • MMF present: human-in-the-loop is a feature. It maintains quality, builds trust, and handles edge cases. The AI does the work; the human provides oversight.
  • MMF absent: human-in-the-loop is a crutch. It hides that the AI cannot perform the core task. The human compensates rather than augments.

The definitive test: “If all human correction were removed from this workflow, would customers still pay? If the answer is no, there’s no MMF. There’s only a demo.”
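
A minimal sketch of how the test might be encoded as a checklist. The three components and the 50% rewrite threshold follow the article; the field names and structure are illustrative assumptions, not from the source:

```python
from dataclasses import dataclass

@dataclass
class WorkflowObservation:
    # Illustrative fields for auditing one AI workflow; the names are assumptions.
    same_inputs_as_expert: bool            # no magical preprocessing a human expert wouldn't get
    output_is_sellable: bool               # production-quality work, not a demo
    human_rewrite_fraction: float          # share of the output humans rewrite, 0.0 to 1.0
    customers_pay_without_human_fix: bool  # the definitive human-in-the-loop test

def has_model_market_fit(obs: WorkflowObservation) -> bool:
    """Three-component MMF test plus the human-in-the-loop diagnostic."""
    three_components = (
        obs.same_inputs_as_expert
        and obs.output_is_sellable
        and obs.human_rewrite_fraction < 0.5  # review and refine is fine; rewriting half is not
    )
    # If customers would not pay once human correction is removed, it is only a demo.
    return three_components and obs.customers_pay_without_human_fix
```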

Case Studies: MMF Existence and Absence

Legal AI: GPT-4 (March 2023)

Legal AI exemplifies MMF unlocking a dormant market. Before 2023, legal tech AI companies struggled to cross $100M ARR despite market demand. Bustamante draws on firsthand experience founding Doctrine in 2016:

Pre-MMF State (Pre-2023):

  • BERT and similar transformer models excelled at classification tasks (document sorting, contract type identification, issue flagging)
  • Legal work requires generation and reasoning: drafting memos synthesizing case law, summarizing depositions while preserving nuanced arguments, generating tailored discovery requests
  • Traditional ML could categorize contracts but could not write coherent briefs explaining enforceability under specific state laws

Post-MMF State (Post-GPT-4):

  • Within 18 months of GPT-4’s release, Silicon Valley legal startups raised hundreds of millions in funding
  • Thomson Reuters acquired Casetext for $650 million
  • Doctrine’s business grew substantially
  • Legal AI “minted more unicorns in 12 months than in the previous 10 years combined”

The market demand remained constant; model capability crossed the threshold.

Coding Assistants: Claude 3.5 Sonnet (June 2024)

Coding assistants demonstrate a similar pattern at a different threshold:

Pre-Sonnet: GitHub Copilot had millions of users, but the experience was “autocomplete that occasionally helps.” Bustamante reports trying Cursor early, finding it “meh,” and deleting it repeatedly.

Post-Sonnet: “Within a week, I couldn’t work without Cursor. Neither could anyone on my team. The product became the workflow.”

Cursor’s growth went vertical not due to new feature development but because Claude 3.5 Sonnet crossed the capability threshold for genuine codebase understanding and high-quality code generation.

Mathematical Proofs: MMF Absent

Mathematical proof generation represents a market where MMF remains uncrossed despite significant demand:

  • Research institutions, defense contractors, and tech companies would pay millions for genuine mathematical reasoning
  • Models can verify known proofs, assist with mechanical steps, and occasionally produce insights on bounded problems
  • Originating novel proofs on open problems remains beyond current capability

However, progress is occurring at the frontier. Bustamante cites Sébastien Bubeck’s experiment where GPT-5-Pro improved a bound in convex optimization from 1/L to 1.5/L, reasoning for 17 minutes to generate a correct proof. This suggests MMF for mathematical reasoning may be approaching.

High-Stakes Finance: MMF Absent

Financial analysis presents a stark capability gap:

Capability Challenges:

  • Excel output remains unreliable for complex financial models
  • AI struggles to combine quantitative analysis with qualitative insights from extensive documents
  • End-to-end reasoning that justifies million-dollar positions exceeds current capability

Benchmark Evidence (Vals.ai):

  • LegalBench: 87.04% top-model accuracy (Gemini 3 Pro)
  • Finance Agent: 56.55% top-model accuracy (GPT 5.1)

The 30-point accuracy gap between legal and finance benchmarks quantifies the MMF disparity. Legal has crossed the production-grade threshold; finance has not.

The 80/99 Accuracy Gap

Bustamante identifies a critical distinction in accuracy requirements across market types:

  • Unregulated markets (~80% acceptable): AI writing drafts of marketing copy creates value even with heavy editing
  • Regulated markets (~99% required): contract review missing 20% of clauses creates liability, not value

“The gap between 80% and 99% accuracy is often infinite in practice. It’s the difference between ‘promising demo’ and ‘production system.’”

Many AI startups occupy this gap, raising capital on demonstrations while waiting for the capability that would make their products genuinely functional.
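
A back-of-the-envelope illustration of why the gap is “infinite in practice” (the 50-clause contract and the independence assumption are hypothetical, not from the article): if each clause is reviewed independently at a given per-clause accuracy, the chance of a fully clean review collapses at 80%.

```python
# Hypothetical illustration: probability that a 50-clause contract review
# contains zero missed clauses, assuming independent per-clause accuracy.
clauses = 50
for per_clause_accuracy in (0.80, 0.99):
    p_clean = per_clause_accuracy ** clauses
    print(f"{per_clause_accuracy:.0%} per clause -> {p_clean:.2%} chance of a clean review")
# 80% per clause -> 0.00% chance of a clean review  (about 1 in 70,000)
# 99% per clause -> 60.50% chance of a clean review
```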

Strategic Framework

The Timing Dilemma

Building for current versus anticipated MMF creates a strategic dilemma:

Arguments for Waiting:

  • Building around absent MMF means betting on improvements outside one’s control
  • Runway burns while model providers determine capability timelines
  • The required capability might arrive differently than anticipated, or not at all within survival horizon

Arguments for Being Early:

  • When MMF unlocks, success requires more than model capability:
    • Domain-specific data pipelines
    • Regulatory relationships built over years
    • Customer trust
    • Deep workflow integration
    • Understanding of how professionals actually work
  • Teams closest to problems shape how models get evaluated, fine-tuned, and deployed

The Dangerous Zone

Bustamante identifies the “dangerous zone” as MMF estimated at 24-36 months away:

  • Close enough to seem imminent
  • Far enough to burn through multiple funding rounds waiting

The resolution depends on market size. Healthcare and financial services markets are sufficiently massive that even Anthropic and OpenAI pursue them despite mixed current results. Expected value calculation:

expected_value = probability_of_MMF_arriving × market_size × likely_share

For trillion-dollar markets, the risk-reward calculation permits early positioning despite uncertain capability timelines.
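
As a sketch of how that calculation plays out, with hypothetical numbers (none of the figures below come from the article):

```python
def expected_value(p_mmf_arrives: float, market_size: float, likely_share: float) -> float:
    """Bustamante's expected-value expression for early positioning."""
    return p_mmf_arrives * market_size * likely_share

# Hypothetical inputs: a trillion-dollar market, a 30% chance the capability
# arrives within the startup's survival horizon, and a 1% eventual share.
ev = expected_value(p_mmf_arrives=0.30, market_size=1e12, likely_share=0.01)
print(f"${ev:,.0f}")  # $3,000,000,000 -- large enough to justify early positioning
```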

The Agentic Threshold

Beyond raw intelligence, Bustamante identifies a second capability frontier: the ability to work autonomously over extended periods.

Current MMF Limitations

Existing MMF examples (legal document review, coding assistance) involve fundamentally short-horizon tasks. Prompt in, output out, maybe a few tool calls. Models produce useful output in seconds or minutes.

High-Value Work Requirements

The highest-value knowledge work operates differently:

  • Financial analyst: days building models, stress-testing assumptions, synthesizing dozens of sources
  • Strategy consultant: weeks of research, interviews, and analysis producing iterative deliverables
  • Drug discovery researcher: months designing and executing experimental campaigns

Agentic Capability Requirements

The agentic threshold requires:

  • Persistence: maintaining goals and context across hours or days
  • Recovery: recognizing failures, diagnosing problems, attempting alternative approaches
  • Coordination: breaking complex objectives into subtasks, executing them in sequence
  • Judgment: knowing when to proceed versus when to stop and request guidance

Current agents handle tasks measured in minutes. Tasks measured in days represent a phase change in capability, not an incremental improvement.

This explains why finance lacks MMF despite models being “good at reading documents.” Reading a 10-K is a 30-second task. Building an investment thesis is a multi-day workflow requiring data gathering, model building, scenario testing, and coherent synthesis across the entire process.
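
A schematic of what these four requirements imply for an agent’s control loop (a hypothetical sketch; the function names are placeholders, not a real agent API):

```python
# Hypothetical control loop illustrating the four agentic requirements.
# plan_subtasks, execute, diagnose, and ask_human are placeholders the caller supplies.

def run_agent(goal, plan_subtasks, execute, diagnose, ask_human, max_attempts=3):
    memory = {"goal": goal, "completed": [], "notes": []}  # persistence: context survives across steps
    for subtask in plan_subtasks(goal):                    # coordination: break the goal into ordered subtasks
        for attempt in range(max_attempts):
            result = execute(subtask, memory)
            if result.ok:
                memory["completed"].append(subtask)
                break
            memory["notes"].append(diagnose(result))       # recovery: diagnose the failure, retry differently
        else:
            # judgment: stop and request guidance instead of looping forever
            ask_human(f"Stuck on {subtask!r} after {max_attempts} attempts", memory)
            return memory
    return memory
```

The sketch is trivial for tasks measured in minutes; the phase change is making the same loop hold up when `memory` must stay coherent across days of subtasks.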

Complementary Framework: Benchmark Saturation vs. Capability Stagnation

Bustamante’s related article “Are LLMs Plateauing? No. You Are.” (October 2025) addresses the perception that LLM progress has stalled:

The Benchmark Saturation Problem

Many users perceive LLM plateau because they test on tasks earlier models already saturated:

  • Translation: GPT-4o achieved ~100% accuracy; successors show no improvement because there is no room to improve
  • Simple math, concept explanation, email rewriting: solved problems

The analogy: “measuring a rocket’s speed with a car speedometer. Once you hit the max reading, everything looks the same.”

Intelligence Manifests at the Frontier

Raw LLM intelligence continues to advance rapidly, but the gains show up only at the frontier, on tasks that push absolute reasoning limits:

Evidence: GPT-5-Pro produced novel mathematical proofs, including improving a bound in convex optimization from 1/L to 1.5/L. The model “reasoned for 17 minutes to generate a correct proof for an open problem.”

This represents creation of new knowledge, not solving known problems.

The Distinction: Intelligence vs. Usefulness

The critical insight: “Intelligence without application is just a party trick. Intelligence with tool use is the revolution.”

Current frontier models outperform most humans on most intellectual tasks:

  • Legal analysis: better than most lawyers
  • Medical diagnosis: better than most doctors
  • Code review: better than most senior engineers
  • Financial modeling: better than most analysts

What remains missing is tool orchestration and persistence—the ability to work over time toward goals using external resources.

Key Findings

  • Model-Market Fit (MMF) serves as a prerequisite layer beneath product-market fit for AI startups; without it, markets cannot pull products regardless of demand intensity
  • Capability thresholds are discrete, not continuous: markets dormant for years explode within months when models cross specific thresholds (legal AI with GPT-4, coding with Claude 3.5 Sonnet)
  • The human-in-the-loop diagnostic distinguishes MMF presence (oversight as feature) from absence (compensation as crutch)
  • Benchmark evidence quantifies MMF disparities: LegalBench at 87.04% vs. Finance Agent at 56.55% explains why legal AI thrives while finance AI struggles
  • The 80/99 accuracy gap is “infinite in practice” for regulated industries where partial accuracy creates liability rather than value
  • The agentic threshold represents a second capability frontier: sustained autonomous operation over hours/days, not just prompt-response cycles
  • Perceived LLM plateau reflects benchmark saturation, not capability stagnation; frontier performance continues advancing on genuinely difficult tasks
  • Strategic positioning must balance the risk of building for absent MMF against the advantage of domain expertise when MMF eventually arrives

References

  1. Model-Market Fit - Nicolas Bustamante - January 19, 2026
  2. Are LLMs Plateauing? No. You Are. - Nicolas Bustamante - October 22, 2025
  3. The Only Thing That Matters - Marc Andreessen - June 25, 2007
  4. Vals.ai Finance Agent Benchmark - Accessed January 2026
  5. Fintool Technology - Fintool v4 benchmark results
  6. Nicolas Bustamante - About - Author background