Thorsten Ball on Code Review in the Age of AI Agents: Testing Over Reading

Research Date: 2026-02-06
Source URL: https://x.com/thorstenball/status/2015781695664832839
Author: Thorsten Ball (@thorstenball), Software Engineer at Sourcegraph (Amp)

Summary

On January 26, 2026, Thorsten Ball posted a 5:17 video recorded outdoors in a snowstorm to X (formerly Twitter). The video’s central provocation: “If I know it works, what do I care about the code?” Ball argues that developers working with AI coding agents do not need to understand every line of generated code. Instead, he advocates for a testing-first verification methodology where outcome validation replaces traditional line-by-line code review.

The argument is documented across two issues of Ball’s newsletter Register Spill. Issue #71 contains the full essay-length version of the position, detailing his testing methodology and criteria for when deeper code inspection remains warranted. Issue #72, published after the video, addresses viewer reactions, extends the argument to question whether human-readable code formatting itself is becoming less important, and positions the debate within a broader thesis about AI adoption patterns among programmers.

Ball’s position draws from his professional experience building Amp, Sourcegraph’s AI coding agent, and from years of software engineering practice including work at Zed on editor tooling. The post received significant engagement: 184.9K views, 233 likes, 27 reposts, and 121 bookmarks.

The Core Argument: Outcome Verification Over Code Inspection

The Provocation

Ball’s title question, “If I know it works, what do I care about the code?”, deliberately inverts a deeply held assumption in software engineering: that understanding the code is a prerequisite for trusting it. His position is not that code quality is irrelevant, but that the verification method should shift from reading to testing when an AI agent produces the code.

The argument rests on a practical observation: when a developer writes code personally, reading it is the natural verification path because the developer constructed it line by line. When an agent writes code, the developer never had that construction context. Reading the output from scratch is slower and less reliable than verifying behavior directly.

Ball’s Testing Methodology

As detailed in Joy & Curiosity #71, Ball follows a structured verification approach:

  1. Pre-planning: Before the agent begins, the developer must know what the resulting code should accomplish and how to verify correctness. This is not optional. Without a clear expected outcome, no amount of testing compensates for a missing specification.

  2. Comprehensive testing: Once the agent completes its work, the developer tests using every available tool. Ball specifically lists:

    • Running unit tests
    • Manual browser testing
    • Database state inspection
    • Curl commands against APIs
    • Happy path validation
    • Edge case testing (no data, existing data, real data, fake data)

  3. Context-aware depth adjustment: Testing intensity scales with the code’s blast radius. A peripheral UI tweak warrants less scrutiny than a payment processing change. Ball does not provide a formula for this calibration but frames it as professional judgment.

  4. Mental model comparison: After testing, the developer compares actual behavior against their mental model of expected functionality. Discrepancies trigger deeper investigation.
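The checklist above is about behavior, not implementation. A minimal sketch of what “happy path plus edge cases” outcome verification could look like, using a hypothetical agent-written `parse_tags` function (an illustrative stand-in, not an example from Ball’s material):

```python
# Hypothetical example: verify an agent-written function by its outcomes.
# `parse_tags` stands in for code the developer never read line by line.

def parse_tags(raw: str) -> list[str]:
    """Stand-in for agent-generated code under test."""
    return [t.strip() for t in raw.split(",") if t.strip()]

def test_happy_path():
    # Step 1 of Ball's method: we knew the expected outcome before testing.
    assert parse_tags("a, b, c") == ["a", "b", "c"]

def test_edge_cases():
    assert parse_tags("") == []        # no data
    assert parse_tags(" , ,") == []    # garbage data
    assert parse_tags("solo") == ["solo"]  # minimal real data

if __name__ == "__main__":
    test_happy_path()
    test_edge_cases()
    print("all outcome checks passed")
```

The point of the sketch is the shape of the verification, not the function: expected outcomes are written down first, then behavior is checked against them, without reading the implementation.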

When Code Review Still Matters

Ball does not advocate abandoning code review entirely. After successful testing, he performs spot-checks focused on specific risk categories:

| Review trigger | Rationale |
| --- | --- |
| Data storage patterns | Where and how data is persisted affects durability and privacy |
| Architectural choices | Structural decisions may constrain future changes |
| Security implications | Authentication, authorization, and data exposure require scrutiny |
| Unexpected dependencies | New libraries or services may introduce supply chain risk |

His formulation: “But do I know every line of code? Not if I don’t have to.” The word “have” carries the weight. Certain categories of code require line-level understanding; most do not.

The Philosophy: What Does “Caring About Code” Mean?

Code Readability in the Age of AI Explanation

In Joy & Curiosity #72, Ball extends the argument beyond review methodology to question a foundational principle of software engineering: that code must be written for human readability. His reasoning: if a developer can ask a model to explain any piece of code “in any language you want, with jokes and puns, as a poem or as a song,” the traditional investment in making code self-documenting loses some of its justification.

This is a stronger claim than the testing-over-reading argument. It suggests that the decades of pedagogical emphasis on clean code, meaningful variable names, and self-documenting structure may have been optimizing for a constraint (human reading speed and comprehension limits) that AI tools are relaxing.

Ball does not fully commit to this position. He frames it as an observation about where the field is heading rather than a prescriptive recommendation. The implication, however, is direct: if understanding code is no longer bottlenecked by reading it, the relative value of writing readable code decreases.

The One-Way Door of AI Adoption

Ball characterizes serious AI tool adoption as “a one way door.” Once a developer experiences effective agent-based workflows with frontier models, they do not return to dismissing the technology. He identifies several reasons why some programmers have not crossed this threshold:

  • Insufficient engagement: Trying ChatGPT once with a vague prompt and concluding “AI can’t code” does not constitute serious evaluation.
  • Inferior models: Testing with local or older models distorts understanding of current capabilities and trajectory.
  • Missing feedback loops: Copy-pasting code from a chat interface differs fundamentally from agent-based workflows where the model receives compiler output, test results, and linter feedback.
  • Evaluation bias: Judging AI-generated code by human writing standards (style, naming, structure) rather than by functional correctness.

The Philosophical Trajectory

Ball’s broader arc, traced across multiple essays, moves from viewing programming as a craft defined by manual skill to viewing it as a discipline defined by judgment and direction. In the Changelog podcast (#648), he states that developers should focus on the “what” and “why” rather than the mechanical “how.” He describes his own shift away from 15 years of expert Vim usage, acknowledging that “the age of fast mechanical movement in an editor… is kind of over” when models outpace human typing speed.

This is not a rejection of craft. In “There’s Beauty in AI,” Ball describes finding new aesthetic satisfaction in understanding latent space, training dynamics, and prompt engineering as a form of precise technical control. The craft shifts rather than disappears.

AI Agent Workflow: From Writing Code to Directing Agents

Amp and the Agent Architecture

Ball works on Amp at Sourcegraph, an AI coding agent. His understanding of agent internals informs his position on code review. In the Changelog podcast, he explains agent architecture as a loop: the model calls tools, receives feedback, and iterates until a task is complete. The developer’s role in this loop is not to write code but to:

  1. Define what needs to be built
  2. Set up feedback mechanisms (tests, linters, type checkers)
  3. Verify outcomes
  4. Intervene when the agent’s approach diverges from requirements

This model treats the developer more like a technical lead directing a junior engineer than a solo implementer. The code review question maps onto this analogy: a lead does not read every line a junior writes but does verify outcomes, review architectural decisions, and check for known risk patterns.
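The loop Ball describes can be sketched in simplified form. Here, `call_model` and the single `run_tests` tool are hypothetical stand-ins for illustration, not Amp’s actual API:

```python
# Simplified sketch of an agent loop: the model proposes a tool call,
# the harness executes it and feeds the result back, until the task is done.

def call_model(history: list[str]) -> dict:
    """Stand-in for a frontier-model API call; returns the next action."""
    # A real implementation would send `history` to an LLM endpoint.
    if any("tests passed" in h for h in history):
        return {"action": "done"}
    return {"action": "run_tests"}

TOOLS = {
    # Stand-in tool; a real agent would expose editors, shells, linters, etc.
    "run_tests": lambda: "tests passed: 12/12",
}

def agent_loop(task: str, max_steps: int = 10) -> list[str]:
    history = [f"task: {task}"]
    for _ in range(max_steps):
        step = call_model(history)
        if step["action"] == "done":
            break
        feedback = TOOLS[step["action"]]()  # execute tool, capture output
        history.append(feedback)            # feed results back to the model
    return history
```

The developer’s four responsibilities map directly onto this loop: the task definition is the input, the tools are the feedback mechanisms, and verification and intervention happen on the returned history.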

The Feedback Loop as Quality Guarantee

Ball emphasizes that effective agent workflows depend on tight feedback loops. An agent that receives compiler errors, failing tests, and linter warnings can self-correct. An agent operating without feedback (e.g., generating code in a chat window without execution) produces notably worse results. This explains why naive evaluations of AI coding ability often underestimate frontier model capabilities: the evaluation conditions lack the feedback infrastructure that makes agents effective.

Amp’s approach, as described by Ball, gives models sufficient tokens and computational resources rather than restricting them. The philosophy: let the agent work, provide it with comprehensive feedback, and evaluate the result rather than micromanaging the process.
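A minimal sketch of what assembling that feedback might look like, assuming generic shell check commands rather than anything specific to Amp’s internals:

```python
# Sketch of feedback collection: run the project's own checks and capture
# their output so it can be handed back to the model for self-correction.
import subprocess

def collect_feedback(commands: list[tuple[str, list[str]]]):
    """Run each named check; return (name, exit_code, combined output)."""
    results = []
    for name, cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((name, proc.returncode, proc.stdout + proc.stderr))
    return results

# Example usage with common (assumed) toolchain commands:
# collect_feedback([("tests", ["pytest", "-q"]), ("lint", ["ruff", "check", "."])])
```

Without this layer, the model only ever sees its own text; with it, every compiler error and failing test becomes a correction signal, which is the gap Ball points to between chat-window usage and agent workflows.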

Collaborative Coding as Precedent

In Joy & Curiosity #72, Ball notes that some viewers of his snowstorm video “lack collaborative coding experience.” This reference implies that his position is not specific to AI. In team-based development, engineers routinely trust code written by colleagues after verifying behavior through code review (focused on risk areas) and testing. The argument extends this existing practice to AI-generated code, with the modification that testing deserves even more emphasis because the “colleague” producing the code does not have the institutional context a human teammate would.

Reception and Reactions

The video generated 184.9K views and 27 replies, though the reply content sits behind X’s login wall. Ball’s follow-up in Joy & Curiosity #72 suggests a mixed reception: some viewers did not watch the full video before responding, and others lacked the collaborative coding experience that makes his analogy intuitive.

The engagement metrics (233 likes, 121 bookmarks) indicate the topic resonated with the developer community, though the specific content of replies was not accessible for this analysis.

Contextual Position Within Ball’s Body of Work

This video and its surrounding essays represent one point on a trajectory Ball has documented publicly:

| Period | Position | Source |
| --- | --- | --- |
| Pre-2024 | Viewed LLMs as “slot machines” requiring luck-based prompting | There’s Beauty in AI |
| Mid-2024 | Began using Cursor Tab at Zed, found it faster than 15 years of Vim | There’s Beauty in AI |
| Late 2024 | Documented daily AI usage across debugging, testing, and writing | How I Use AI |
| Early 2025 | Argued all serious programmers use AI tools; questioned dismissal | They All Use It |
| Late 2025 | Left Zed for Sourcegraph to work on Amp, an AI coding agent | X post, December 2025 |
| Jan 2026 | Argued testing replaces reading for AI-generated code verification | This video and Joy & Curiosity #71 |
| Jan 2026 | Questioned whether human-readable code formatting itself matters | Joy & Curiosity #72 |

Each position builds on the previous one. The snowstorm video is not an isolated provocation but a logical extension of Ball’s evolving framework for AI-assisted software development.

Key Findings

  • Ball’s position separates code verification (testing behavior) from code review (reading implementation), arguing the former is more reliable and efficient for AI-generated code.
  • The testing methodology is structured and context-sensitive, not a blanket “trust the AI” position. Specific risk categories (storage, architecture, security, dependencies) still warrant code-level review.
  • The broader philosophical argument questions whether decades of emphasis on human-readable code are optimizing for a constraint that AI tools are relaxing.
  • Ball frames AI adoption as a one-way door, asserting that serious engagement with frontier models and agent feedback loops consistently converts skeptics.
  • The argument draws from professional experience building Amp (a production AI coding agent) and from collaborative software engineering practices that predate AI tools.

References

  1. Thorsten Ball’s X Post - January 26, 2026
  2. Joy & Curiosity #71 - Register Spill newsletter
  3. Joy & Curiosity #72 - Register Spill newsletter
  4. Changelog Interviews #648 - Thorsten Ball on Amp and agent workflows
  5. There’s Beauty in AI - Register Spill essay
  6. How I Use AI - Register Spill essay
  7. They All Use It - Register Spill essay
  8. Thorsten Ball’s Personal Site - Author biography and blog