Agent-Native Architectures: Building Apps After Code Ends

Research Date: 2026-01-20
Publication Date: 2026-01-09
Source URL: https://every.to/guides/agent-native
Authors: Dan Shipper (Every) and Claude (Anthropic)

Summary

This technical guide presents a framework for building “agent-native” software—applications where AI agents operate as first-class citizens rather than afterthought integrations. Co-authored by Dan Shipper and Claude, the document synthesizes principles from production applications (Reader, Anecdote) built at Every, combined with architectural patterns that emerged through collaborative development.

The core thesis posits that Claude Code demonstrated a fundamental insight: a capable coding agent is effectively a general-purpose agent. The same architecture enabling codebase refactoring can organize files, manage reading lists, or automate workflows. The Claude Code SDK makes this pattern accessible, allowing developers to build applications where features are outcomes described in prompts rather than logic written in code.

The guide distinguishes between tested patterns (marked as proven) and speculative contributions from Claude (marked as “needs validation”), providing transparency about the maturity level of different recommendations.

The Five Pillars of Agent-Native Design

Parity

The foundational principle: agents must achieve any outcome users can accomplish through the UI. Without parity, agents encounter dead ends when users request legitimate actions.

Test: Pick any UI action. Can the agent accomplish it?

Implementation discipline: When adding any UI capability, verify the agent can achieve the same outcome through available tools or tool combinations.

Granularity

Tools should be atomic primitives. Features are outcomes achieved by agents operating in loops with judgment—not choreographed sequences executed by code.

Test: To change behavior, do you edit prompts or refactor code?

| Approach | Characteristics |
| --- | --- |
| Less granular | classify_and_organize_files(files) bundles judgment into the tool; limits flexibility |
| More granular | read_file, write_file, move_file, bash primitives; agent decides; prompt describes outcome |
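The contrast can be sketched in Python. This is an illustrative toy, not code from the guide: the in-memory FILES store and all function names are invented for demonstration.

```python
from pathlib import PurePosixPath

# Toy in-memory "filesystem" for illustration only.
FILES = {"/inbox/report.txt": "Q3 numbers", "/inbox/notes.md": "meeting notes"}

# Less granular: the classification policy is frozen inside the tool.
# Changing behavior means refactoring this function.
def classify_and_organize_files(files):
    return {path: "/docs" if path.endswith(".md") else "/data" for path in files}

# More granular: atomic primitives. The agent supplies the judgment,
# and the prompt describes the outcome.
def list_files(prefix="/"):
    return [p for p in FILES if p.startswith(prefix)]

def read_file(path):
    return FILES[path]

def move_file(src, dst_dir):
    FILES[str(PurePosixPath(dst_dir) / PurePosixPath(src).name)] = FILES.pop(src)
```

With the granular set, "organize my inbox by topic" is a prompt change; with the bundled tool, it is a code change.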

Composability

With atomic tools and parity established, new features emerge from new prompts without code changes. This applies to both developers shipping features and users customizing behavior.

Example prompt-based feature:

“Review files modified this week. Summarize key changes. Based on incomplete items and approaching deadlines, suggest three priorities for next week.”

The agent composes list_files, read_file, and judgment to achieve the outcome.
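A minimal sketch of that idea: shipping a feature means registering a prompt against tools that already exist. The make_feature helper is hypothetical, not part of any SDK.

```python
# Hypothetical sketch: a "feature" is a prompt paired with existing tools.
def make_feature(name, prompt, tools):
    """Ship a feature by pairing an outcome prompt with already-available tools."""
    return {"name": name, "prompt": prompt.strip(), "tools": sorted(tools)}

weekly_review = make_feature(
    "weekly-review",
    """Review files modified this week. Summarize key changes. Based on
    incomplete items and approaching deadlines, suggest three priorities
    for next week.""",
    ["read_file", "list_files"],
)
```

Note that no new tool appears in the feature definition; only the prompt is new.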

Emergent Capability

Agents accomplish tasks not explicitly designed for. This creates a flywheel:

  1. Build with atomic tools and parity
  2. Users request unanticipated capabilities
  3. Agent composes tools to accomplish them (or fails, revealing gaps)
  4. Observe patterns in requests
  5. Add domain tools or prompts for common patterns
  6. Repeat

Test: Can the agent handle open-ended requests within your domain?

This reveals latent demand—instead of guessing features, developers observe what users actually request and formalize patterns that emerge.

Improvement Over Time

Agent-native applications improve without shipping code through:

  • Accumulated context: State persists across sessions via context files
  • Developer-level refinement: Ship updated prompts for all users
  • User-level customization: Users modify prompts for their workflows
  • Self-modification (advanced): Agents edit own prompts or code with safety rails

Files as the Universal Interface

Agents demonstrate native fluency with filesystem operations. Claude Code succeeds because bash + filesystem represents the most battle-tested agent interface.

Design principle: If a human can look at the file structure and understand what’s happening, an agent probably can too.

The context.md Pattern

A file providing portable working memory without code changes:

# Context

## Who I Am
Reading assistant for the Every app.

## What I Know About This User
- Interested in military history and Russian literature
- Prefers concise analysis
- Currently reading *War and Peace*

## What Exists
- 12 notes in /notes
- Three active projects
- User preferences at /preferences.md

## Recent Activity
- User created "Project kickoff" (two hours ago)
- Analyzed passage about Austerlitz (yesterday)

## My Guidelines
- Don't spoil books they're reading
- Use their interests to personalize insights

## Current State
- No pending tasks
- Last sync: 10 minutes ago

The agent reads this file at session start and updates it as state changes.
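The update half of this pattern can be sketched in Python. load_context and log_activity are hypothetical helpers; the section name follows the example file above.

```python
from pathlib import Path

def load_context(path):
    """Read context.md at session start; fall back to an empty skeleton."""
    p = Path(path)
    return p.read_text() if p.exists() else "# Context\n"

def log_activity(context, line):
    """Prepend a bullet under '## Recent Activity', creating it if absent,
    so the newest activity is read first."""
    if "## Recent Activity" not in context:
        context += "\n## Recent Activity\n"
    head, _, tail = context.partition("## Recent Activity\n")
    return head + "## Recent Activity\n" + f"- {line}\n" + tail
```

Because the state lives in a plain file, both the user and the agent can inspect and correct it.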

Files vs. Database

| Use Files For | Use Database For |
| --- | --- |
| Content users should read/edit | High-volume structured data |
| Configuration benefiting from version control | Data requiring complex queries |
| Agent-generated content | Ephemeral state (sessions, caches) |
| Anything benefiting from transparency | Data with relationships |
| Large text content | Data requiring indexing |

Principle: Files for legibility, databases for structure. When uncertain, prefer files—they provide transparency and user inspection.

From Primitives to Domain Tools

Start with pure primitives (bash, file operations, basic storage) to prove architecture and reveal actual agent needs. Add domain-specific tools deliberately as patterns emerge.

Reasons to add domain tools:

  • Vocabulary: A create_note tool teaches the agent what “note” means in your system
  • Guardrails: Some operations need validation beyond agent judgment
  • Efficiency: Common operations can be bundled for speed and cost

Rule for domain tools: They represent one conceptual action from the user’s perspective. Include mechanical validation, but judgment about what/whether to act belongs in prompts.

Critical: Keep primitives available. Domain tools are shortcuts, not gates. Unless specific security or data integrity concerns exist, agents should access underlying primitives for edge cases.
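A sketch of a domain tool built on top of a primitive. The create_note tool, its slug rules, and the notes directory are invented for illustration; the point is that validation here is mechanical, while judgment about whether to create the note stays in the prompt.

```python
import re
import tempfile
from pathlib import Path

NOTES_DIR = Path(tempfile.mkdtemp()) / "notes"  # illustrative location

def write_file(path, text):
    """Primitive: stays available to the agent alongside the domain tool."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_text(text)

def create_note(title, body):
    """Domain shortcut: one conceptual action with mechanical validation only."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("title must contain at least one alphanumeric character")
    path = NOTES_DIR / f"{slug}.md"
    write_file(path, f"# {title}\n\n{body}\n")  # built on the primitive
    return path
```

Because create_note is implemented on write_file rather than replacing it, the agent can still reach the primitive for edge cases the shortcut does not cover.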

Agent Execution Patterns

Completion Signals

Agents need explicit completion mechanisms—not heuristic detection:

.success("Result")   // continue loop
.error("Message")    // continue (retry possible)
.complete("Done")    // stop loop

Completion is separate from success/failure. A tool can succeed and stop, or fail and signal continue for recovery.
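The separation can be modeled as data rather than heuristics. This Python sketch (all names invented) keeps the continue/stop signal orthogonal to success/failure:

```python
from dataclasses import dataclass
from enum import Enum

class Signal(Enum):
    CONTINUE = "continue"
    STOP = "stop"

@dataclass
class ToolResult:
    signal: Signal
    ok: bool
    message: str

def success(msg):  return ToolResult(Signal.CONTINUE, True, msg)
def error(msg):    return ToolResult(Signal.CONTINUE, False, msg)
def complete(msg): return ToolResult(Signal.STOP, True, msg)

def run_loop(steps):
    """Run tool steps until one signals completion -- not until one merely
    succeeds, and not based on iteration counts or output sniffing."""
    for step in steps:
        result = step()
        if result.signal is Signal.STOP:
            return result
    return error("ran out of steps without a completion signal")
```

An error result keeps the loop alive so the agent can attempt recovery; only an explicit complete() ends it.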

Model Tier Selection

Match model capability to task complexity:

| Task Type | Tier | Reasoning |
| --- | --- | --- |
| Research agent | Balanced | Tool loops, good reasoning |
| Chat | Balanced | Fast enough for conversation |
| Complex synthesis | Powerful | Multi-source analysis |
| Quick classification | Fast | High volume, simple task |
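A routing table like this reduces to a small lookup. The tier names follow the table above; the task-type keys and the default are assumptions for illustration.

```python
# Hypothetical tier routing; tiers follow the table, keys are illustrative.
TIER_BY_TASK = {
    "research": "balanced",
    "chat": "balanced",
    "synthesis": "powerful",
    "classification": "fast",
}

def pick_tier(task_type, default="balanced"):
    """Fall back to a mid-tier model for unrecognized task types."""
    return TIER_BY_TASK.get(task_type, default)
```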

Partial Completion

For multi-step tasks, track progress at task level with states: pending, in_progress, completed, failed, skipped.
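A minimal sketch of task-level tracking with those five states; Task, advance, and progress are illustrative names, not from the guide.

```python
from dataclasses import dataclass

STATES = {"pending", "in_progress", "completed", "failed", "skipped"}

@dataclass
class Task:
    name: str
    state: str = "pending"

def advance(task, state):
    """Move a task to a new state, rejecting anything outside the five states."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    task.state = state
    return task

def progress(tasks):
    """Fraction of tasks that are finished (completed or skipped)."""
    done = sum(t.state in {"completed", "skipped"} for t in tasks)
    return done / len(tasks) if tasks else 1.0
```

Tracking at the task level means a resumed agent can report "2 of 5 done" instead of restarting from scratch.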

Context Limits

Design for bounded context from the start:

  • Tools support iterative refinement (summary → detail → full)
  • Provide mid-session consolidation (“summarize learnings and continue”)
  • Assume context will eventually fill
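The summary → detail → full idea can be sketched as a single tool parameter; the cutoff lengths here are arbitrary placeholders, not recommendations.

```python
def read_at_level(text, level="summary"):
    """Return progressively more of a document so the agent spends context
    only where needed. Cutoffs are illustrative placeholders."""
    if level == "summary":
        return text[:200]
    if level == "detail":
        return text[:2000]
    if level == "full":
        return text
    raise ValueError(f"unknown level: {level}")
```

The agent can skim many files at the summary level and pay for full content only on the few that matter.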

Mobile-Specific Patterns

Mobile presents unique constraints: agents are long-running while iOS apps are not. Apps may be backgrounded after a few seconds and terminated so the system can reclaim memory.

Checkpoint and Resume

What to checkpoint: Agent type, messages, iteration count, task list, custom state, timestamp

When to checkpoint: On app backgrounding, after each tool result, periodically during long operations

Resume flow: Load interrupted sessions → Filter by validity (one-hour default) → Show resume prompt → Restore messages and continue
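The checkpoint payload and the one-hour validity filter can be sketched as follows, in Python rather than Swift for brevity; the field names mirror the list above but are otherwise assumptions.

```python
import json
import time

RESUME_WINDOW_SECONDS = 3600  # the one-hour validity default from the text

def checkpoint(session):
    """Serialize the fields listed above: agent type, messages, iteration
    count, task list, custom state -- stamping the checkpoint time."""
    session = dict(session, timestamp=time.time())
    return json.dumps(session)

def resumable(blob, now=None):
    """Deserialize a checkpoint and report whether it is still fresh enough
    to offer a resume prompt."""
    state = json.loads(blob)
    now = time.time() if now is None else now
    return (now - state["timestamp"]) <= RESUME_WINDOW_SECONDS, state
```

On launch, the app would load all checkpoints, drop the stale ones, and offer to resume the rest.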

iOS Storage Architecture

iCloud-first with local fallback:

1. iCloud Container (preferred)
   iCloud.com.{bundleId}/Documents/
   ├── Library/
   ├── Research/books/
   ├── Chats/
   └── Profile/

2. Local Documents (fallback)
   ~/Documents/

3. Migration layer
   Auto-migrate local → iCloud

Background Execution

iOS provides approximately 30 seconds of background time. Use it to:

  • Complete current tool call if possible
  • Checkpoint session state
  • Transition gracefully to backgrounded state

For truly long-running agents, consider server-side orchestration with mobile as viewer and input mechanism.

Anti-Patterns

Architectural Anti-Patterns

| Pattern | Problem |
| --- | --- |
| Agent as router | Agent routes to functions rather than acting with judgment |
| Build app, then add agent | Agent limited to existing features; no emergent capability |
| Request/response thinking | Misses the loop; agents pursue outcomes through iterations |
| Defensive tool design | Over-constrained inputs prevent unanticipated capabilities |
| Happy path in code | Code handles edge cases; agent becomes mere caller |

Specific Anti-Patterns

  • Workflow-shaped tools: analyze_and_organize bundles judgment; break into primitives
  • Orphan UI actions: User can do something agent cannot achieve
  • Context starvation: Agent lacks awareness of available resources
  • Gates without reason: Domain tools restrict access unintentionally
  • Heuristic completion detection: Detecting completion through iteration counts or output checks

Success Criteria

Architecture Checklist

  • Agent achieves anything users achieve through UI (parity)
  • Tools are atomic primitives; domain tools are shortcuts (granularity)
  • New features via new prompts (composability)
  • Agent accomplishes unplanned tasks (emergent capability)
  • Behavior changes through prompt edits, not code refactoring

Implementation Checklist

  • System prompt includes available resources and capabilities
  • Agent and user share the same data space
  • Agent actions reflect immediately in UI
  • Every entity has full CRUD capability
  • External APIs use dynamic capability discovery where appropriate
  • Agents explicitly signal completion

The Ultimate Test

Describe an outcome within the application’s domain that no specific feature was built for. Can the agent figure out how to accomplish it, operating in a loop until success?

  • If yes: The application is agent-native
  • If no: The architecture is too constrained

Key Findings

  • Agent-native architecture treats features as prompt-described outcomes rather than coded logic
  • The five pillars (parity, granularity, composability, emergent capability, improvement) form an interdependent system
  • Files provide the most robust agent interface due to existing LLM fluency with filesystem operations
  • Mobile requires explicit checkpoint/resume patterns due to iOS backgrounding constraints
  • Latent demand discovery—observing what users ask agents to do—replaces speculative feature development
  • Domain tools should be shortcuts enabling efficiency, not gates restricting capability

References

  1. Agent-native Architectures Guide - Accessed 2026-01-20
  2. Dan Shipper Twitter Announcement - 2026-01-09
  3. Compound Engineering Plugin - GitHub Repository