Designing for Agents: Rules Make (or Break) Code Quality

Sep 12, 2025

Great agent output isn’t luck—it’s the product of programmable context. Treat design docs, goal intent, architecture, API contracts, user flows, and rules not as paperwork but as infrastructure your agents consume. The best teams don’t prompt harder; they architect the context.

Teams that front-load goal docs, lightweight architecture sketches, explicit API contracts, concrete user flows, and house rules see measurably fewer iterations and cleaner first-pass code. 

Below are the core patterns, copy-paste templates, and metrics, plus tool-specific implementations for Claude Code, OpenAI Codex, and Cursor.

Core Patterns (what the community says actually works)

1) Context Architecture (the Context Pyramid)

Successful teams layer context so the agent always has the right information at the right granularity.

  • Strategic Context: project vision, success metrics, non-goals

  • System Context: architecture, decisions, tech stack, API/database contracts

  • Module Context: component purpose, dependencies, test coverage, change history

  • Task Context: workplan, acceptance criteria, edge cases, constraints

  • Code Context: similar examples, recent changes, .cursorrules / repo rules

Minimal artifact set that maps to the pyramid

project-root/
├─ README.md                          # Vision, problem, success criteria, non-goals
├─ docs/
│  ├─ architecture.md                 # Components, data flows, ADRs summary
│  ├─ decisions.md                    # ADR details (what/why/when)
│  ├─ api/
│  │  └─ openapi.yaml                 # API contracts; single source of truth
│  └─ db/
│     └─ schema.sql                   # DB schema / migrations reference
├─ src/
│  ├─ <module>/
│  │  ├─ AI_SUMMARY.md                # Module context (purpose, deps, invariants)
│  │  └─ examples/                    # “Good style” code patterns for the agent
│  └─ ...
├─ .cursorrules                       # Explicit rules & constraints (see below)
├─ CONTRIBUTING.md                    # Code style, tests, CI, commit/tag rules
└─ TASKS/
   └─ <YYYY-MM-DD>-<task-name>.md     # Workplans w/ acceptance criteria

Why this works: agents don’t need your whole codebase; they need curated context. This layout lets you hand each task a focused slice.
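
The artifact teams most often skip is the module-level AI_SUMMARY.md. A minimal sketch for an auth module (the headings are suggestions, not a fixed schema; paths and details are illustrative):

# AI_SUMMARY: auth module
## Purpose
Signup, login, session handling. No profile or billing logic lives here.
## Key dependencies
- src/api/client.ts (HTTP), src/db/users.ts (persistence)
## Invariants
- Passwords are hashed before they leave this module
- Public exports in index.ts are a stable API; do not rename
## Tests
- tests/auth/* (unit + integration); run with `pnpm test:auth`
## Recent changes
- Session storage moved behind the ISessionClient adapter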

2) Task Decomposition (the Goldilocks rule)

Make tasks atomic & verifiable (15–30 minutes of work).

Good

  • “Add email validation to the signup form”

  • “Create Button component with hover + disabled states and tests”

Bad

  • “Build authentication system”

  • “Fix all bugs”

Each task file in TASKS/ should include: objective, constraints, steps, acceptance criteria, test plan, out-of-scope.

3) Progressive Enhancement (ship a skeleton, then strengthen)

Stage work in predictable layers:

  1. Foundation: files, types, minimal UI

  2. Core: happy path, basic tests, API wiring

  3. Robustness: validation, loading/error states, edge cases

  4. Optimization: perf, caching, lazy loading, bundle size

  5. Refinement: docs, a11y, polish, animations

This preserves a working state and gives agents immediate feedback.

4) Explicit Constraints (tell the agent what not to do)

Put constraints right next to the task:

## Constraints

- Do NOT modify existing tests

- Do NOT change the DB schema

- ONLY touch files in `src/auth/*` and `tests/auth/*`

- Preserve all public API signatures

This single section eliminates most “helpful refactors.”

Templates you can copy-paste

A. Project Intent (Strategic Context)

# Project Intent: Personal Weather Dashboard

## Vision

A clean, minimalist dashboard showing weather for multiple cities I care about.

## Problem

I check 3+ cities across different apps daily; I want one unified view.

## Success Criteria

- [ ] Shows current weather for 3+ cities

- [ ] Auto-updates every 30 minutes

- [ ] Desktop + mobile responsive

- [ ] TTI < 2s

## Constraints

- Technical: free weather APIs only

- Time: one weekend

- Resources: solo dev, modern web stack

## Non-Goals

- No accounts/personalization

- No >5-day forecasts

- No native apps

B. User Flow (System Context for UX tasks)

Bullet points are perfect—agents don’t need BPMN:

Homepage → Search/City List → City Weather → Add to Dashboard → Dashboard

Checkout flow:

- Add to cart

- Apply coupon (optional)

- Enter billing + address

- Fraud check (Team A)

- Confirmation

C. API Contract (Single Source of Truth)

Keep it in docs/api/openapi.yaml and reference it in prompts.

openapi: 3.0.3
info: { title: Weather API, version: 1.0.0 }
paths:
  /weather:
    get:
      parameters:
        - in: query
          name: city
          required: true
          schema: { type: string }
      responses:
        '200':
          description: Current weather for the requested city
          content:
            application/json:
              schema:
                type: object
                required: [city, tempC, updatedAt]
                properties:
                  city: { type: string }
                  tempC: { type: number }
                  updatedAt: { type: string, format: date-time }
        '404':
          description: City not found
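
When the client is TypeScript, the same contract can be mirrored as a runtime validator so drift between spec and code fails loudly instead of silently. A minimal sketch assuming Zod at the boundary (file path and function name are illustrative):

// src/api/weather.ts: illustrative runtime mirror of docs/api/openapi.yaml
import { z } from "zod";

// Mirrors the 200 response schema for GET /weather
export const WeatherResponse = z.object({
  city: z.string(),
  tempC: z.number(),
  updatedAt: z.string().datetime(), // matches format: date-time
});
export type WeatherResponse = z.infer<typeof WeatherResponse>;

export async function getWeather(city: string): Promise<WeatherResponse> {
  const res = await fetch(`/weather?city=${encodeURIComponent(city)}`);
  if (res.status === 404) throw new Error(`City not found: ${city}`);
  if (!res.ok) throw new Error(`Weather API error: ${res.status}`);
  // Parse instead of cast: fails if the API drifts from the contract
  return WeatherResponse.parse(await res.json());
}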

D. Task Workplan (Task Context)

# TASK: Add email validation to signup

## Objective

Reject invalid emails client-side and server-side, with helpful messaging.

## Constraints

- Do NOT modify DB schema

- ONLY edit `src/auth/signup/*` and `tests/auth/*`

## Steps

1) Add client-side `validateEmail(email)` w/ RFC-lite regex + tests

2) Show inline error state + aria-describedby

3) Add server-side validation in `POST /signup`

4) Tests: unit (client), integration (API), a11y snapshot

## Acceptance Criteria

- Invalid emails blocked client & server

- Error message meets a11y (role=alert)

- Tests pass: `pnpm test:auth`
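
A minimal sketch of the client-side validator and unit test this workplan asks for, assuming a TypeScript codebase with Vitest/Jest-style globals (paths and names are illustrative):

// src/auth/signup/validateEmail.ts (illustrative path)
// "RFC-lite": one @, no whitespace, and a dot in the domain; strict checks stay server-side.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

export function validateEmail(email: string): boolean {
  return EMAIL_RE.test(email.trim());
}

// tests/auth/validateEmail.test.ts (illustrative path)
import { validateEmail } from "../../src/auth/signup/validateEmail";

describe("validateEmail", () => {
  it("accepts a plain address", () => {
    expect(validateEmail("ada@example.com")).toBe(true);
  });
  it("rejects a missing domain and embedded whitespace", () => {
    expect(validateEmail("ada@")).toBe(false);
    expect(validateEmail("ada @example.com")).toBe(false);
  });
});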

Anti-patterns (avoid these traps)

  • Magic Prompt Fallacy: one mega-prompt won’t build your app → iterate with checkpoints

  • Context Overload: don’t paste the repo → curate module + task context

  • Yes-Man Trap: never accept changes without tests & review

  • Scope Creep Enablement: constraints stop “I also refactored X…”

Metrics (how you’ll know it’s working)

Goal: Make AI work observable and actionable. Track a few metrics, wire them to automatic checks, and define what happens when they drift.

Velocity

  • Iterations per task: % of tasks merged in ≤3 review cycles. Target: ≥80%.

  • Context prep time: % of task time spent on intent/flows/contracts/module summaries. Target: 10–20%.

Quality

  • Tests included: % of AI PRs that add/modify tests and keep coverage ≥ baseline. Target: ~100%, non-decreasing coverage.

  • Major revision rate: AI PRs needing substantial rework (>30% lines changed after first review or out-of-scope files touched). Target: <20%.

Process

  • Bug intro rate: Bugs per 1k LOC within 14 days of merge, AI vs human. Target: AI ≤ manual baseline.

  • Docs:Code ratio: LOC in docs/context vs LOC in code for each PR. Target: ≈1:4 (team-tunable band).

How to Track (minimal)

  • One PR per task, title TASK:<id>.

  • Label PRs source:AI vs source:human.

  • CI gates (see the sketch after this list):

    • Block if code changed and no tests changed.

    • Coverage must not drop.

    • Post Docs:Code ratio on PR.

  • Count review cycles from CHANGES_REQUESTED events.

  • Attribute bugs to PRs; compute bugs/kloc by source label.
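
The “no tests changed” gate and the Docs:Code ratio can live in one small script run against the PR’s merge base (coverage enforcement is best left to your coverage tool). A minimal sketch in Node/TypeScript; the env var, globs, and path conventions are illustrative and should be tuned to your layout:

// scripts/pr-gates.ts (illustrative): "tests changed" gate + Docs:Code ratio
import { execSync } from "node:child_process";

const base = process.env.BASE_REF ?? "origin/main"; // illustrative env var
const files = execSync(`git diff --name-only ${base}...HEAD`, { encoding: "utf8" })
  .split("\n")
  .filter(Boolean);

const isTest = (f: string) => /(^|\/)tests?\//.test(f) || /\.(test|spec)\./.test(f);
const isDoc = (f: string) => f.startsWith("docs/") || f.endsWith(".md");
const isCode = (f: string) => f.startsWith("src/") && !isTest(f);

// Gate: block if code changed but no tests changed
if (files.some(isCode) && !files.some(isTest)) {
  console.error("FAIL: code changed without test changes");
  process.exit(1);
}

// Docs:Code ratio from added lines; a later CI step posts this as a PR comment
const addedLines = (paths: string[]): number => {
  if (paths.length === 0) return 0;
  return execSync(`git diff --numstat ${base}...HEAD -- ${paths.join(" ")}`, { encoding: "utf8" })
    .split("\n")
    .filter(Boolean)
    .reduce((sum, line) => sum + (parseInt(line.split("\t")[0], 10) || 0), 0);
};
console.log(`Docs:Code ≈ ${addedLines(files.filter(isDoc))}:${addedLines(files.filter(isCode))}`);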

What To Do When Metrics Move

| Signal | Likely Cause | First Action |
| --- | --- | --- |
| Iterations >3 | Tasks too big/fuzzy | Split into 15–30 min atoms; add explicit constraints |
| Major revisions >20% | Weak module context | Refresh AI_SUMMARY.md; add invariants/examples |
| Missing tests / coverage ↓ | Process gap | Block merge; have agent generate tests first |
| AI bugs/kloc > human | Context gaps / big diffs | Require repro test first; add static analysis gates |
| Docs:Code << target | Context starvation | Add acceptance criteria, flows, and constraints |
| Docs:Code >> target | Doc bloat | Consolidate into OpenAPI/ADRs; remove stale docs |

Tiny PR Template (keeps you honest)

## Task

TASK:<id> — <title>

## Acceptance

- [ ] …

## Evidence

- Tests: <files/test names>

- Coverage: <auto from CI>

- Scope paths touched: <auto>

- Docs updated: [ ] AI_SUMMARY.md [ ] OpenAPI [ ] ADR

Cadence: Review the dashboard weekly (iterations, tests/coverage, bugs/kloc, Docs:Code). In the monthly retro, convert top outliers into rule/template updates.

Tool-specific implementations

A) Claude Code (Anthropic)

Claude Code natively reads CLAUDE.md files and can be shaped via allowed tools, slash commands, and headless mode. Use these to encode your design/goal docs and rules.

Place these files:

  • CLAUDE.md (repo root): paste “GOAL / ARCH / RULES” summaries, common commands, test scripts, code style, gotchas; Claude will automatically load them.

  • .claude/commands/*.md: reusable, parameterized runbooks (e.g., “fix issue #…”, “apply API client migration”).

  • .mcp.json: pre-wire external tools (Puppeteer, Sentry, internal APIs) so every engineer—and Claude—has the same capabilities.

Example: CLAUDE.md (excerpt)

# Code style

- TS strict; no any in app/

- Prefer adapter seams & feature flags for migrations

# Workflow

- Always run: npm run typecheck && npm run test:unit

- For session work: use FEATURE_SESSION_V2 flag; adapter ISessionClient

# Commands (use slash menu)

- /project:fix-github-issue <id>

- /project:migrate-session-v2
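
The slash commands referenced above are plain markdown prompt files; Claude Code substitutes $ARGUMENTS with whatever follows the command. A minimal sketch of .claude/commands/fix-github-issue.md (the steps are illustrative):

Find and fix GitHub issue #$ARGUMENTS.
1. Run `gh issue view $ARGUMENTS` to read the issue and its comments.
2. Locate the relevant code and restate the plan before editing.
3. Implement the fix, add or update tests, then run: npm run typecheck && npm run test:unit
4. Commit with a descriptive message and open a PR that references the issue.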

Permissions & safety: pre-allow Edit, git commit, and safe bash ranges via /permissions or settings; keep destructive tools on “ask”.

Automate checks: run headless mode in CI to lint PR descriptions or triage issues with the same rules you coded in CLAUDE.md.

Why this fits: Anthropic’s best-practices guide explicitly recommends CLAUDE.md for style/tests/workflow and curating allowed tools; teams also use custom commands to standardize multi-step tasks.

B) OpenAI Codex (2025)

The latest Codex ships as a CLI + IDE extension (VS Code, Cursor, etc.) and can run locally or in an OpenAI cloud sandbox with approval modes. It consumes IDE context, so short prompts + strong repo docs work best.

Wire in your docs/rules:

  • Keep GOAL.md, ARCH.md, FLOWS.md, and openapi.yaml at repo root; open them in the IDE so Codex ingests them as context.

  • Start in Read-only/Chat, then escalate to Agent/Full Access only after Codex restates the plan and lists its fast-fail checks.

  • Prefer sandbox runs for long tasks (test suites, migrations); promote PRs from the sandbox.

Kickoff prompt (IDE chat):

Read GOAL.md, ARCH.md, and API/openapi.yaml.

Restate assumptions and propose a step-by-step plan (no edits yet).

List fast-fail checks (commands + expected signals). Wait for approval.

Approval prompt (after review):

Proceed in Agent mode, local environment.

Scope: ApiClient.ts, SessionService.ts only.

Outputs: minimal diff, updated tests, and a validation script.

Why this fits: Codex’s IDE extension uses open files and selections as implicit context; the new releases emphasize agentic workflows and multi-hour tasks—plans/approvals keep it aligned and safe. (Benchmarks and product notes also highlight agentic coding and IDE integration.)

Prompt-engineering note: OpenAI’s guidance stresses concrete instructions, up-to-date models, and structured prompts—mirrored by the plan→validate pattern above.

C) Cursor (IDE)

Cursor supports persistent Project Rules via .cursor/rules/*.mdc or the Rules UI. Treat these as your “policy as code” file set—encoding style, patterns, and step-by-step runbooks the agent and inline edits must follow.

Create rule files:
.cursor/rules/session-v2.mdc

---
description: Migrate to Session API v2 via adapter+flag
globs: ["app/**", "lib/**"]
alwaysApply: true
---

# Constraints
- Use ISessionClient adapter; do not touch UI text
- Guard with FEATURE_SESSION_V2; default false
- Validate via "pnpm test e2e:login_flow"

# Steps
1) Update ApiClient.ts to call /api/session/v2 (see API/openapi.yaml)
2) Implement ISessionClient.getMe(), .login()
3) Add contract tests using example payloads
4) Run fast-fail commands; paste logs in PR

Operationalize knowledge:

  • Use the Rules panel to attach or toggle rules per workspace; reference domain docs via @DocName for retrieval-augmented edits.

  • Keep a top-level core rule (“no any in app/”, “Zod at boundaries”, “Given–When–Then tests”). Community guidance shows that .cursorrules/Rules files reduce repetitive corrections (see the sketch after this list).

  • For Composer/Agent workflows, several teams run a PRD-driven loop (PRD in repo, rules force one story at a time, .ai/ folder for history).
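
A minimal sketch of such a core rule in the same .mdc format (globs and wording are illustrative):

.cursor/rules/core.mdc

---
description: House rules for all agent edits
globs: ["app/**", "lib/**", "src/**"]
alwaysApply: true
---
- TypeScript strict; no `any` in app/
- Validate external data with Zod at API and form boundaries
- Tests follow Given–When–Then naming; colocate under tests/
- Do not edit generated files or existing tests without explicit instruction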

Why this fits: Cursor’s docs and community threads describe Rules as system-level, persistent instructions that the Agent and Inline Edit follow—exactly the place to encode your guidelines and flows.

Validation scaffolding the agent can run

Add a machine-runnable checklist so agents can prove alignment:

#!/usr/bin/env bash
# validate.sh
set -euo pipefail
pnpm install --frozen-lockfile
pnpm typecheck
pnpm test:unit
pnpm test:e2e --filter=login_flow
pnpm lint
echo "OK: typecheck, unit, e2e(login_flow), lint all green"

  • Claude: call from a custom slash command (/project:validate) or in headless CI.

  • Codex: run in sandbox or local Agent task; attach logs to PR.

  • Cursor: reference from a Rule or have the Agent run it post-edit.

Pitfalls this structure avoids

  • Spec ambiguity → rework: API schema & examples prevent hallucinated fields and “works on mock” regressions.

  • Style drift: Repo-level rules (CLAUDE.md, Cursor Rules) keep edits consistent across files and sessions.

  • Unsafe automation: Tool allowlists (Claude) and approval modes (Codex) gate risky actions until the plan is reviewed.