Model-Market Fit: The Capability Threshold Framework for AI Startups

Research Date: 2026-01-26
Source URL: https://www.nicolasbustamante.com/p/model-market-fit

Summary

Nicolas Bustamante, co-founder of Doctrine (AI legal tech) and Fintool (AI financial copilot), introduces Model-Market Fit (MMF) as a prerequisite layer beneath traditional product-market fit for AI startups. The framework extends Marc Andreessen’s influential 2007 essay “The Only Thing That Matters” by adding model capability as a determining variable in whether markets can adopt AI products at all.

The central thesis holds that when MMF exists—when underlying models can perform the core task a market demands—Andreessen’s framework applies perfectly and markets “pull the product out of the startup.” When MMF does not exist, no amount of engineering, UX design, or go-to-market strategy can compensate for models that cannot perform the fundamental job to be done.

Bustamante demonstrates this pattern through case studies in legal AI (which exploded after GPT-4 crossed the capability threshold in March 2023) and coding assistants (which became indispensable after Claude 3.5 Sonnet in June 2024), contrasted with domains where MMF remains absent: mathematical proof generation, high-stakes finance, and autonomous drug discovery.

The MMF Framework

Conceptual Foundation

The framework builds on Andy Rachleff’s insight (popularized by Andreessen) that market matters more than team or product because great markets pull products out of startups. Bustamante argues that for AI products specifically, model capability determines whether that gravitational pull can begin at all.

The MMF Test

Bustamante proposes a three-component test for determining whether MMF exists:

  • Same inputs as human expert: the model receives what the human would receive (documents, data, context) without magical preprocessing
  • Output customer would pay for: production-quality work solving a real problem, not a demo or proof of concept
  • Without significant human correction: a human may review, refine, or approve, but not rewrite 50% of the output

The Human-in-the-Loop Diagnostic

A critical diagnostic for MMF presence versus absence lies in how “human-in-the-loop” functions within a product:

  • MMF present: human-in-the-loop is a feature. It maintains quality, builds trust, and handles edge cases. The AI does the work; the human provides oversight.
  • MMF absent: human-in-the-loop is a crutch. It hides that the AI cannot perform the core task. The human compensates rather than augments.

The definitive test: “If all human correction were removed from this workflow, would customers still pay? If the answer is no, there’s no MMF. There’s only a demo.”
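
A minimal sketch of how the test might be encoded as a checklist. The three components and the 50% rewrite threshold follow the article; the field names and structure are illustrative assumptions, not from the source:

```python
from dataclasses import dataclass

@dataclass
class WorkflowObservation:
    # Illustrative fields for auditing one AI workflow; the names are assumptions.
    same_inputs_as_expert: bool            # no magical preprocessing a human expert wouldn't get
    output_is_sellable: bool               # production-quality work, not a demo
    human_rewrite_fraction: float          # share of the output humans rewrite, 0.0 to 1.0
    customers_pay_without_human_fix: bool  # the definitive human-in-the-loop test

def has_model_market_fit(obs: WorkflowObservation) -> bool:
    """Three-component MMF test plus the human-in-the-loop diagnostic."""
    three_components = (
        obs.same_inputs_as_expert
        and obs.output_is_sellable
        and obs.human_rewrite_fraction < 0.5  # review and refine is fine; rewriting half is not
    )
    # If customers would not pay once human correction is removed, it is only a demo.
    return three_components and obs.customers_pay_without_human_fix
```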

Case Studies: MMF Existence and Absence

Legal AI: GPT-4 (March 2023)

Legal AI exemplifies MMF unlocking a dormant market. Before 2023, legal tech AI companies struggled to cross $100M ARR despite market demand. Bustamante draws on firsthand experience founding Doctrine in 2016:

Pre-MMF State (Pre-2023):

  • BERT and similar transformer models excelled at classification tasks (document sorting, contract type identification, issue flagging)
  • Legal work requires generation and reasoning: drafting memos synthesizing case law, summarizing depositions while preserving nuanced arguments, generating tailored discovery requests
  • Traditional ML could categorize contracts but could not write coherent briefs explaining enforceability under specific state laws

Post-MMF State (Post-GPT-4):

  • Within 18 months of GPT-4’s release, Silicon Valley legal startups raised hundreds of millions in funding
  • Thomson Reuters acquired Casetext for $650 million
  • Doctrine’s business grew substantially
  • Legal AI “minted more unicorns in 12 months than in the previous 10 years combined”

The market demand remained constant; model capability crossed the threshold.

Coding Assistants: Claude 3.5 Sonnet (June 2024)

Coding assistants demonstrate a similar pattern at a different threshold:

Pre-Sonnet: GitHub Copilot had millions of users, but the experience was “autocomplete that occasionally helps.” Bustamante reports trying Cursor early, finding it “meh,” and deleting it repeatedly.

Post-Sonnet: “Within a week, I couldn’t work without Cursor. Neither could anyone on my team. The product became the workflow.”

Cursor’s growth went vertical not due to new feature development but because Claude 3.5 Sonnet crossed the capability threshold for genuine codebase understanding and high-quality code generation.

Mathematical Proofs: MMF Absent

Mathematical proof generation represents a market where MMF remains uncrossed despite significant demand:

  • Research institutions, defense contractors, and tech companies would pay millions for genuine mathematical reasoning
  • Models can verify known proofs, assist with mechanical steps, and occasionally produce insights on bounded problems
  • Originating novel proofs on open problems remains beyond current capability

However, progress is occurring at the frontier. Bustamante cites Sébastien Bubeck’s experiment where GPT-5-Pro improved a bound in convex optimization from 1/L to 1.5/L, reasoning for 17 minutes to generate a correct proof. This suggests MMF for mathematical reasoning may be approaching.

High-Stakes Finance: MMF Absent

Financial analysis presents a stark capability gap:

Capability Challenges:

  • Excel output remains unreliable for complex financial models
  • AI struggles to combine quantitative analysis with qualitative insights from extensive documents
  • End-to-end reasoning that justifies million-dollar positions exceeds current capability

Benchmark Evidence (Vals.ai):

  • LegalBench: 87.04% top-model accuracy (Gemini 3 Pro)
  • Finance Agent: 56.55% top-model accuracy (GPT 5.1)

The 30-point accuracy gap between legal and finance benchmarks quantifies the MMF disparity. Legal has crossed the production-grade threshold; finance has not.

The 80/99 Accuracy Gap

Bustamante identifies a critical distinction in accuracy requirements across market types:

  • Unregulated markets (~80% acceptable): AI writing drafts of marketing copy creates value even with heavy editing
  • Regulated markets (~99% required): contract review missing 20% of clauses creates liability, not value

“The gap between 80% and 99% accuracy is often infinite in practice. It’s the difference between ‘promising demo’ and ‘production system.’”

Many AI startups occupy this gap, raising capital on demonstrations while waiting for the capability that would make their products genuinely functional.
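
A back-of-the-envelope illustration of why the gap is “infinite in practice” (the 50-clause contract and the independence assumption are hypothetical, not from the article): if each clause is reviewed independently at a given per-clause accuracy, the chance of a fully clean review collapses at 80%.

```python
# Hypothetical illustration: probability that a 50-clause contract review
# contains zero missed clauses, assuming independent per-clause accuracy.
clauses = 50
for per_clause_accuracy in (0.80, 0.99):
    p_clean = per_clause_accuracy ** clauses
    print(f"{per_clause_accuracy:.0%} per clause -> {p_clean:.2%} chance of a clean review")
# 80% per clause -> 0.00% chance of a clean review  (about 1 in 70,000)
# 99% per clause -> 60.50% chance of a clean review
```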

Strategic Framework

The Timing Dilemma

Building for current versus anticipated MMF creates a strategic dilemma:

Arguments for Waiting:

  • Building around absent MMF means betting on improvements outside one’s control
  • Runway burns while model providers determine capability timelines
  • The required capability might arrive differently than anticipated, or not at all within survival horizon

Arguments for Being Early:

  • When MMF unlocks, success requires more than model capability:
    • Domain-specific data pipelines
    • Regulatory relationships built over years
    • Customer trust
    • Deep workflow integration
    • Understanding of how professionals actually work
  • Teams closest to problems shape how models get evaluated, fine-tuned, and deployed

The Dangerous Zone

Bustamante identifies the “dangerous zone” as MMF estimated at 24-36 months away:

  • Close enough to seem imminent
  • Far enough to burn through multiple funding rounds waiting

The resolution depends on market size. Healthcare and financial services markets are sufficiently massive that even Anthropic and OpenAI pursue them despite mixed current results. Expected value calculation:

expected_value = probability_of_MMF_arriving × market_size × likely_share

For trillion-dollar markets, the risk-reward calculation permits early positioning despite uncertain capability timelines.
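
As a sketch of how that calculation plays out, with hypothetical numbers (none of the figures below come from the article):

```python
def expected_value(p_mmf_arrives: float, market_size: float, likely_share: float) -> float:
    """Bustamante's expected-value expression for early positioning."""
    return p_mmf_arrives * market_size * likely_share

# Hypothetical inputs: a trillion-dollar market, a 30% chance the capability
# arrives within the startup's survival horizon, and a 1% eventual share.
ev = expected_value(p_mmf_arrives=0.30, market_size=1e12, likely_share=0.01)
print(f"${ev:,.0f}")  # $3,000,000,000 -- large enough to justify early positioning
```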

The Agentic Threshold

Beyond raw intelligence, Bustamante identifies a second capability frontier: the ability to work autonomously over extended periods.

Current MMF Limitations

Existing MMF examples (legal document review, coding assistance) involve fundamentally short-horizon tasks. Prompt in, output out, maybe a few tool calls. Models produce useful output in seconds or minutes.

High-Value Work Requirements

The highest-value knowledge work operates differently:

  • Financial analyst: days building models, stress-testing assumptions, synthesizing dozens of sources
  • Strategy consultant: weeks of research, interviews, and analysis producing iterative deliverables
  • Drug discovery researcher: months designing and executing experimental campaigns

Agentic Capability Requirements

The agentic threshold requires:

  • Persistence: maintaining goals and context across hours or days
  • Recovery: recognizing failures, diagnosing problems, attempting alternative approaches
  • Coordination: breaking complex objectives into subtasks, executing them in sequence
  • Judgment: knowing when to proceed versus when to stop and request guidance

Current agents handle tasks measured in minutes. Tasks measured in days represent a phase change in capability, not an incremental improvement.

This explains why finance lacks MMF despite models being “good at reading documents.” Reading a 10-K is a 30-second task. Building an investment thesis is a multi-day workflow requiring data gathering, model building, scenario testing, and coherent synthesis across the entire process.
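
A schematic of what these four requirements imply for an agent’s control loop (a hypothetical sketch; the function names are placeholders, not a real agent API):

```python
# Hypothetical control loop illustrating the four agentic requirements.
# plan_subtasks, execute, diagnose, and ask_human are placeholders the caller supplies.

def run_agent(goal, plan_subtasks, execute, diagnose, ask_human, max_attempts=3):
    memory = {"goal": goal, "completed": [], "notes": []}  # persistence: context survives across steps
    for subtask in plan_subtasks(goal):                    # coordination: break the goal into ordered subtasks
        for attempt in range(max_attempts):
            result = execute(subtask, memory)
            if result.ok:
                memory["completed"].append(subtask)
                break
            memory["notes"].append(diagnose(result))       # recovery: diagnose the failure, retry differently
        else:
            # judgment: stop and request guidance instead of looping forever
            ask_human(f"Stuck on {subtask!r} after {max_attempts} attempts", memory)
            return memory
    return memory
```

The sketch is trivial for tasks measured in minutes; the phase change is making the same loop hold up when `memory` must stay coherent across days of subtasks.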

Complementary Framework: Benchmark Saturation vs. Capability Stagnation

Bustamante’s related article “Are LLMs Plateauing? No. You Are.” (October 2025) addresses the perception that LLM progress has stalled:

The Benchmark Saturation Problem

Many users perceive LLM plateau because they test on tasks earlier models already saturated:

  • Translation: GPT-4o achieved ~100% accuracy; successors show no improvement because there is no room to improve
  • Simple math, concept explanation, email rewriting: solved problems

The analogy: “measuring a rocket’s speed with a car speedometer. Once you hit the max reading, everything looks the same.”

Intelligence Manifests at the Frontier

Raw LLM intelligence continues to advance rapidly, but the gains show up only at the frontier, on tasks that push absolute reasoning limits:

Evidence: GPT-5-Pro produced novel mathematical proofs, including improving a bound in convex optimization from 1/L to 1.5/L. The model “reasoned for 17 minutes to generate a correct proof for an open problem.”

This represents creation of new knowledge, not solving known problems.

The Distinction: Intelligence vs. Usefulness

The critical insight: “Intelligence without application is just a party trick. Intelligence with tool use is the revolution.”

Current frontier models outperform most humans on most intellectual tasks:

  • Legal analysis: better than most lawyers
  • Medical diagnosis: better than most doctors
  • Code review: better than most senior engineers
  • Financial modeling: better than most analysts

What remains missing is tool orchestration and persistence—the ability to work over time toward goals using external resources.

Key Findings

  • Model-Market Fit (MMF) serves as a prerequisite layer beneath product-market fit for AI startups; without it, markets cannot pull products regardless of demand intensity
  • Capability thresholds are discrete, not continuous: markets dormant for years explode within months when models cross specific thresholds (legal AI with GPT-4, coding with Claude 3.5 Sonnet)
  • The human-in-the-loop diagnostic distinguishes MMF presence (oversight as feature) from absence (compensation as crutch)
  • Benchmark evidence quantifies MMF disparities: LegalBench at 87.04% vs. Finance Agent at 56.55% explains why legal AI thrives while finance AI struggles
  • The 80/99 accuracy gap is “infinite in practice” for regulated industries where partial accuracy creates liability rather than value
  • The agentic threshold represents a second capability frontier: sustained autonomous operation over hours/days, not just prompt-response cycles
  • Perceived LLM plateau reflects benchmark saturation, not capability stagnation; frontier performance continues advancing on genuinely difficult tasks
  • Strategic positioning must balance the risk of building for absent MMF against the advantage of domain expertise when MMF eventually arrives

References

  1. Model-Market Fit - Nicolas Bustamante - January 19, 2026
  2. Are LLMs Plateauing? No. You Are. - Nicolas Bustamante - October 22, 2025
  3. The Only Thing That Matters - Marc Andreessen - June 25, 2007
  4. Vals.ai Finance Agent Benchmark - Accessed January 2026
  5. Fintool Technology - Fintool v4 benchmark results
  6. Nicolas Bustamante - About - Author background