Building Autonomous SWE Agents for Claude Code: Beyond Human-in-the-Loop

By D3OXY

Most of us use AI coding tools interactively. Give it a task, watch it work, intervene when it goes off track. It’s pair programming with a robot. But what if you could define what “done” looks like and let the agent figure out the rest?

I spent a weekend building exactly that – a Senior SWE Agent System for Claude Code that handles GitHub issues from analysis to PR creation, tracks progress via issue comments, and can be picked up by another session (or another developer) if interrupted.

A note on what this is: This isn’t a “copy my prompts” post. I’m not selling a course or giving you a magic system to download. What I am giving you is the idea and the architecture. Take it, break it, rebuild it for your workflow. The specific commands and skills I share are starting points – iterate on them, throw out what doesn’t work, add what’s missing for your codebase.

The Problem with Interactive AI Coding

Interactive AI coding has a context window problem. You start a task, make progress, and then the context window fills up or you have to step away.

When you come back, you’re starting from scratch. The AI has no memory of what was done, what was planned, or where you left off.

Inspiration: Ralph Wiggum

The approach I took is heavily inspired by Ralph Wiggum – a pattern created by Geoffrey Huntley for running AI coding CLIs in loops, letting them work autonomously on task lists. The core principles:

  1. Define the end state (PRD)
  2. Track progress in a file the agent can read/write
  3. Let the agent decide what to do next
  4. Loop until done

Matt Pocock’s insight took this further: use GitHub issues as the memory store. The AI Hero article breaks down the patterns in detail.

I combined both ideas – Ralph’s autonomous loop structure with GitHub issues as the persistent memory layer.

Why Claude Code?

There’s no shortage of AI coding tools: Cursor, Copilot CLI, Aider, OpenCode, Codex, and various agent frameworks like AutoGPT, CrewAI, or custom LangChain harnesses. I’ve tried most of them. Here’s why I landed on Claude Code:

Good Model + Right Harness

A great model with a bad harness is frustrating. A mediocre model with a great harness is… still mediocre. You need both.

As of this writing, Claude Opus 4.5 is, in my experience, the best model for agentic coding. But raw capability isn't enough; you need the tooling to channel it.

I’ve run the same prompts through GPT 5.2 (it’s good, but the speed is awful), Gemini 3 Pro, and various open-source models. Opus produces the best code – but without the harness (commands, agents, skills, progress tracking), it’s still just interactive pair programming with a context window limit.

Built-in Primitives

Claude Code gives you the building blocks out of the box:

| Feature        | What It Does              | Why It Matters                              |
| -------------- | ------------------------- | ------------------------------------------- |
| Slash Commands | User-defined entry points | No framework needed                         |
| Sub-Agents     | Isolated 200k contexts    | Complex workflows without context pollution |
| Skills         | Auto-discovered knowledge | Domain expertise without prompt stuffing    |
| Hooks          | Event-driven automation   | Run scripts on tool events                  |
| MCP Servers    | External tool integration | Connect to anything                         |

With other tools, you’re either building these primitives yourself or fighting the framework’s opinions. Claude Code’s primitives are minimal but composable.

No Extra Infrastructure

Once you have Claude Code installed, extending it is trivial:

# Add a new command
mkdir -p ~/.claude/commands
echo "your prompt" > ~/.claude/commands/my-command.md
# Use /my-command immediately

# Add a new agent (create the directory first, as with commands)
mkdir -p ~/.claude/agents
echo "agent config" > ~/.claude/agents/my-agent.md

# Add a skill
mkdir -p ~/.claude/skills/my-skill
echo "skill knowledge" > ~/.claude/skills/my-skill/SKILL.md

No additional services to deploy. No vector databases. No memory backends. No orchestration layer.

Everything I built runs locally. The “database” is GitHub issues. The “memory” is git history and issue comments. The “orchestration” is the commands and agents talking to gh CLI.

Personal Experience

I’ve shipped production features using this setup that would have taken 3-4x longer previously. The key wins:

  1. Context persistence: I can /continue-work after lunch, after a weekend, or after my laptop crashes
  2. Handoff ability: A teammate can pick up my issue with full context
  3. Audit trail: Every decision is documented in GitHub comments
  4. Interruption resilience: /adjust handles pivots without starting over

The closest alternative is probably running your own Ralph loop with any harness, but you lose the sub-agent isolation and skill auto-discovery that Claude Code provides natively.

The Architecture

The system has three layers:

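| Layer          | Purpose             | Location            |
| -------------- | ------------------- | ------------------- |
| Slash Commands | User entry points   | ~/.claude/commands/ |
| Sub-Agents     | Specialized workers | ~/.claude/agents/   |
| Skills         | Domain knowledge    | ~/.claude/skills/   |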

Commands (What You Invoke)

/start-work 123      → Analyze, plan, begin implementation
/continue-work 123   → Resume from last checkpoint
/adjust "reason"     → Mid-work pivot, propagates changes
/check-status 123    → View hierarchical status tree
/create-pr 123       → Create PR with proper formatting
/wrap-up 123         → Finalize and close
/sync-progress 123   → Reconcile plan with reality

Here’s what /start-work does under the hood:

/start-work 123
       │
       ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   ANALYST    │───►│   PLANNER    │───►│  TRACKER     │
│  Classify    │    │  Create PRD  │    │  Post        │
│  Scope       │    │  Decompose?  │    │  "STARTED"   │
│  Find gaps   │    │              │    │  comment     │
└──────────────┘    └──────────────┘    └──────────────┘
                           │
              ┌────────────┴────────────┐
              ▼                         ▼
        ┌───────────┐            ┌───────────┐
        │  SMALL/   │            │   LARGE   │
        │  MEDIUM   │            │  (decomp) │
        └───────────┘            └───────────┘
              │                         │
              │                         ▼
              │               ┌──────────────────┐
              │               │ Create sub-issues│
              │               │ gh issue create  │
              │               │ for each part    │
              │               └──────────────────┘
              │                         │
              └────────────┬────────────┘
                           ▼
                  ┌──────────────────┐
                  │  Append PRD to   │
                  │  issue body      │
                  │  gh issue edit   │
                  └──────────────────┘
                           │
                           ▼
                  ┌──────────────────┐
                  │  Create branch   │
                  │  gh issue develop│
                  │  --checkout      │
                  └──────────────────┘
                           │
                           ▼
                  ┌──────────────────┐
                  │  IMPLEMENTER     │
                  │  Begin coding    │
                  └──────────────────┘
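If you want to see the ordering without any agents involved, here's a rough shell sketch of the SMALL/MEDIUM path once planning is done. It assumes the planner has already written the full new body (original description plus PRD section) to a file. The `run` wrapper, the `DRY_RUN` switch and `start_work_small` are my own hypothetical names; the three `gh` subcommands are the standard CLI ones.

```shell
#!/bin/sh
# Hypothetical sketch of the SMALL/MEDIUM path of /start-work after planning.
# run: executes a command, or only prints it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# $1 = issue number, $2 = file holding the original description + PRD section.
start_work_small() {
  id=$1
  body_file=$2
  # TRACKER: announce the start on the issue.
  run gh issue comment "$id" --body "<!-- PROGRESS --> Checkpoint: STARTED"
  # Append PRD: replace the body with the prepared version (original text kept on top).
  run gh issue edit "$id" --body-file "$body_file"
  # Create a branch linked to the issue and check it out.
  run gh issue develop "$id" --checkout
}
```

With `DRY_RUN=1` it only prints the three commands in order, which is a cheap way to sanity-check the sequence before pointing it at a real issue.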

Agents (Who Does the Work)

Each agent has its own 200k context window and specialized tools. This isolation is crucial – the main conversation stays clean while agents do heavy lifting in their own contexts.

# ~/.claude/agents/issue-analyst.md
---
name: issue-analyst
description: Classifies issues, assesses scope, identifies gaps
model: opus # Opus 4.5 - best for agentic tasks
allowed-tools: Read, Bash(gh issue view:*), Bash(gh issue list:*), Bash(git log:*), Bash(git status)
skills: issue-analysis
---

All agents use model: opus (Claude Opus 4.5) because it handles complex multi-step reasoning without getting lost. You could use Sonnet for simpler agents like progress-tracker, but I’ve found the quality difference is worth the cost for autonomous work.

The six agents and their responsibilities:

| Agent            | Purpose                                | Tools                                            |
| ---------------- | -------------------------------------- | ------------------------------------------------ |
| issue-analyst    | Classify type, assess scope, find gaps | gh issue view, gh issue list, git log, git status |
| planner          | Create PRDs, decompose issues          | above + gh issue create, gh issue edit, Write    |
| implementer      | Write code following standards         | Read, Write, Edit, Bash                          |
| tester           | Write and run tests                    | Read, Write, pnpm test, vitest                   |
| pr-manager       | Create PRs, handle reviews             | gh pr create, gh pr view, git push               |
| progress-tracker | Monitor state, post updates            | gh issue view, gh issue comment, git log         |

Tool restrictions matter. The issue-analyst can read issues and git history but can't edit anything. The planner can create/edit issues but not write code. The pr-manager can push and create PRs but not modify files. This follows the principle of least privilege: each agent gets exactly what it needs, nothing more.

Skills (What They Know)

Skills are auto-discovered knowledge that agents can tap into. Unlike commands (which you invoke explicitly), Skills are automatically loaded when Claude detects they’re relevant to your request.

~/.claude/skills/
├── issue-analysis/    → Classification, scope, gaps
│   ├── SKILL.md       → Main definition
│   ├── CLASSIFICATION.md → Bug vs feature vs enhancement
│   ├── SCOPE.md       → Small/medium/large assessment
│   └── DECOMPOSITION.md → How to split large issues
├── implementation/    → Code standards, commits, PRs
└── progress-sync/     → State detection, reconciliation

Each skill has a SKILL.md that describes when it should be used:

# ~/.claude/skills/issue-analysis/SKILL.md
---
name: issue-analysis
description: Analyzes GitHub issues to classify type, assess scope, and identify information gaps. Use when starting work on any issue.
---

The agents reference skills in their configuration:

skills: issue-analysis, implementation

This injects the skill’s knowledge into the agent’s context when it’s spawned.

The Secret Sauce: Progress in the Issue

Here’s where it gets interesting. Instead of a local progress file, the PRD lives in the GitHub issue itself:

[USER'S ORIGINAL DESCRIPTION - NEVER MODIFIED]

---

<!-- SWE_PRD_START -->

## Implementation Plan

### Classification

| Type    | Scope  | Labels  |
| ------- | ------ | ------- |
| feature | MEDIUM | feature |

### Implementation Steps

-   [x] Step 1: Define interfaces
-   [x] Step 2: Core implementation
-   [ ] Step 3: Error handling ← current
-   [ ] Step 4: Tests

### Current State

-   **Checkpoint**: IN_PROGRESS
-   **Branch**: `issue-123-feature`
-   **Last Activity**: 2026-01-11T10:30:00Z
<!-- SWE_PRD_END -->

The magic: everything between the markers is machine-readable. The original user description stays untouched above the separator.
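The "never modified" rule boils down to this: only ever rewrite what sits between the markers. Here's a minimal sketch of that rule, assuming the current body arrives on stdin and the new PRD text is passed as an argument. `upsert_prd` is a hypothetical helper, not a gh feature.

```shell
#!/bin/sh
# upsert_prd (hypothetical helper): reads an issue body on stdin and prints it
# with the PRD section replaced, or appended if the markers aren't there yet.
# $1 = new PRD text, without the markers.
upsert_prd() {
  body=$(cat)
  case "$body" in
    *'<!-- SWE_PRD_START -->'*)
      # Markers present: keep every line outside them, swap what's inside.
      # The PRD goes through the environment so awk doesn't touch backslashes.
      printf '%s\n' "$body" | PRD="$1" awk '
        index($0, "<!-- SWE_PRD_START -->") { print; print ENVIRON["PRD"]; skip = 1; next }
        index($0, "<!-- SWE_PRD_END -->")   { skip = 0 }
        !skip { print }'
      ;;
    *)
      # First run: the user's text stays on top, then a separator, then the PRD.
      printf '%s\n\n---\n\n%s\n%s\n%s\n' "$body" \
        '<!-- SWE_PRD_START -->' "$1" '<!-- SWE_PRD_END -->'
      ;;
  esac
}
```

In practice you'd feed it `gh issue view 123 --json body --jq .body` and pipe the result into `gh issue edit 123 --body-file -`. Matching the markers with `index()` instead of exact line equality keeps it working if a body comes back with CRLF line endings.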

Progress updates go into comments:

<!-- PROGRESS -->

## Progress Update

**Checkpoint**: IN_PROGRESS
**Timestamp**: 2026-01-11T10:30:00Z

### Completed

-   Created validation interfaces
-   Implemented core logic

### Context for Continuity

-   Using Strategy pattern for validators
-   Following existing patterns in src/auth/
<!-- /PROGRESS -->

Any agent (or human) can read these comments and know exactly where things stand.
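Generating these comments is plain string templating. Here's a sketch assuming one argument for the checkpoint and one per completed item. `progress_comment` is a hypothetical helper, and I've left out the "Context for Continuity" section to keep it short.

```shell
#!/bin/sh
# progress_comment (hypothetical helper): prints a PROGRESS comment body.
# $1 = checkpoint name; every remaining argument becomes a "Completed" bullet.
progress_comment() {
  checkpoint=$1
  shift
  printf '<!-- PROGRESS -->\n\n## Progress Update\n\n'
  printf '**Checkpoint**: %s\n' "$checkpoint"
  # UTC timestamp in the same ISO 8601 shape the PRD uses.
  printf '**Timestamp**: %s\n\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf '### Completed\n\n'
  for item in "$@"; do
    printf '%s\n' "-   $item"
  done
  printf '<!-- /PROGRESS -->\n'
}
```

Posting it is then a one-liner: `gh issue comment 123 --body "$(progress_comment IN_PROGRESS 'Implemented core logic')"`.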

The /continue-work Flow

When you run /continue-work 123, here’s what happens:

                    /continue-work 123
                            │
                            ▼
            ┌───────────────────────────────┐
            │  1. FETCH ISSUE STATE         │
            │  gh issue view 123 --json     │
            │  body, comments, state        │
            └───────────────────────────────┘
                            │
                            ▼
            ┌───────────────────────────────┐
            │  2. PARSE PRD SECTION         │
            │  Extract between markers:     │
            │  • Implementation steps       │
            │  • Current checkpoint         │
            │  • Branch name                │
            │  • Files to modify            │
            └───────────────────────────────┘
                            │
                            ▼
            ┌───────────────────────────────┐
            │  3. DETECT GIT STATE          │
            │  git branch --show-current    │
            │  git log main..HEAD           │
            │  git status --short           │
            └───────────────────────────────┘
                            │
                            ▼
            ┌───────────────────────────────┐
            │  4. COMPARE PLANNED vs ACTUAL │
            └───────────────────────────────┘
                            │
              ┌─────────────┴─────────────┐
              ▼                           ▼
        ┌───────────┐               ┌───────────┐
        │  MATCH    │               │ DIVERGE   │
        └───────────┘               └───────────┘
              │                           │
              │                           ▼
              │                 ┌───────────────────┐
              │                 │ Show differences  │
              │                 │ Offer /sync       │
              │                 └───────────────────┘
              │                           │
              └─────────────┬─────────────┘
                            ▼
            ┌───────────────────────────────┐
            │  5. RESUME FROM CHECKPOINT    │
            │  Display progress summary     │
            │  Continue implementation      │
            └───────────────────────────────┘

The beauty is that no local state is required. Everything lives in Git and GitHub. You can /continue-work from a different machine, a different Claude session, or even have a teammate pick up where you left off.
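Step 2 of that flow is mostly text parsing. Here's a sketch, assuming the PRD uses the exact **Checkpoint** and **Branch** labels and the checkbox steps from the example earlier. `prd_state` is a hypothetical name, and a real command would also pull out the files list.

```shell
#!/bin/sh
# prd_state (hypothetical helper): reads an issue body on stdin and prints the
# fields /continue-work needs from the PRD section, one key=value per line.
prd_state() {
  # Keep only the lines between the markers, so comments elsewhere are ignored.
  sed -n '/<!-- SWE_PRD_START -->/,/<!-- SWE_PRD_END -->/p' |
    awk '
      { sub(/\r$/, "") }   # tolerate CRLF line endings
      /\*\*Checkpoint\*\*:/ {
        sub(/.*\*\*Checkpoint\*\*: */, ""); print "checkpoint=" $0
      }
      /\*\*Branch\*\*:/ {
        sub(/.*\*\*Branch\*\*: */, ""); gsub(/`/, ""); print "branch=" $0
      }
      # The first unchecked box is where work resumes.
      /^- +\[ ]/ && !found {
        sub(/^- +\[ ] */, ""); print "next_step=" $0; found = 1
      }'
}
```

Fed with `gh issue view 123 --json body --jq .body | prd_state`, the output gives the command a checkpoint, a branch to compare against `git branch --show-current`, and the step to resume from.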

The /adjust Command: Mid-Work Pivots

This is my favorite part. Mid-implementation, you realize the approach won’t work. Instead of starting over:

/adjust "JWT won't work with our session system. Need to use cookies instead."

The command propagates changes through the entire issue hierarchy:

                     /adjust "JWT won't work..."
                              │
                              ▼
                 ┌────────────────────────┐
                 │   CURRENT ISSUE #102   │
                 │   • Update PRD steps   │
                 │   • Post ADJUSTMENT    │
                 │     comment            │
                 └────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
┌────────────────────────┐       ┌────────────────────────┐
│   PARENT ISSUE #100    │       │  SIBLING ISSUES        │
│   • Update sub-issue   │       │                        │
│     table status       │       │  #101: No impact       │
│   • Update dependency  │       │  #103: Now BLOCKED     │
│     graph              │       │        (depends on     │
│   • Post notification  │       │         #102)          │
│     comment            │       │  #104: No impact       │
└────────────────────────┘       └────────────────────────┘

The adjustment comment documents everything:

<!-- ADJUSTMENT -->

## Plan Adjustment

**Reason**: JWT won't work with our session system. Need cookies.
**Timestamp**: 2026-01-11T10:30:00Z

### Impact Analysis

-   Current scope: Changed auth approach
-   Dependencies affected: #103 now blocked

### Propagated To

-   Parent #100: Updated sub-issue status
-   Sibling #103: Marked as blocked
<!-- /ADJUSTMENT -->

The whole hierarchy stays in sync.
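Most of the parent-issue update is string surgery on the sub-issue table. Here's a sketch, assuming the table layout from the parent PRD example later in this post (issue number in the first column, status in the third). `set_sub_issue_status` is a hypothetical helper.

```shell
#!/bin/sh
# set_sub_issue_status (hypothetical helper): reads a parent issue body on stdin
# and rewrites the Status cell of one row in its Sub-Issues table.
# $1 = sub-issue number without "#", $2 = new status text.
set_sub_issue_status() {
  awk -v id="#$1" -v status="$2" '
    BEGIN { FS = OFS = "|" }
    {
      # Rows look like "| #103 | Frontend | Blocked | #102 | ... |", so after
      # splitting on "|" field 2 is the issue number and field 4 the status.
      cell = $2
      gsub(/^ +| +$/, "", cell)
      if (NF > 4 && cell == id) $4 = " " status " "
      print
    }'
}
```

Something like `gh issue view 100 --json body --jq .body | set_sub_issue_status 103 "⏳ Blocked" | gh issue edit 100 --body-file -` would apply it. Rebuilding the row drops the column padding, which Markdown ignores anyway.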

Tips and Tricks

1. Use HTML Comment Markers

Wrap machine-readable sections in HTML comments:

<!-- SWE_PRD_START -->

...content...

<!-- SWE_PRD_END -->

GitHub hides HTML comments when it renders Markdown, so humans never see them, but agents reading the raw issue body can parse them reliably.

2. Reality Wins

When reconciling planned vs actual state, update the plan to match reality, not vice versa. If a user did manual work, respect it:

# In your sync-progress command

### Reconciliation Rule

-   Work done not in plan → Add to plan
-   Plan items done differently → Update plan
-   User changes → Always preserve

3. Decompose Large Issues

The scope assessment matrix I use:

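| Scope  | Files | LOC     | Sessions | Components     |
| ------ | ----- | ------- | -------- | -------------- |
| SMALL  | 1-3   | <100    | 1        | 1              |
| MEDIUM | 4-10  | 100-500 | 1-2      | 2-3            |
| LARGE  | 10+   | 500+    | 3+       | 4+ (decompose) |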
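If you want the scope call to be repeatable rather than vibes, the first two columns translate directly into a function. This sketch only looks at files and lines of code, reads "10+" and "500+" as "more than" so the tiers don't overlap, and lets whichever estimate lands in the bigger tier win. All three choices are my assumptions, and `classify_scope` is a hypothetical name.

```shell
#!/bin/sh
# classify_scope (hypothetical helper): maps rough estimates onto the matrix.
# $1 = number of files touched, $2 = estimated lines of code changed.
# Assumptions: the bigger tier wins, and "10+" / "500+" mean "more than".
classify_scope() {
  files=$1
  loc=$2
  if [ "$files" -gt 10 ] || [ "$loc" -gt 500 ]; then
    echo LARGE   # the planner should decompose into sub-issues
  elif [ "$files" -ge 4 ] || [ "$loc" -ge 100 ]; then
    echo MEDIUM
  else
    echo SMALL
  fi
}
```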

For LARGE scope, the planner agent automatically creates sub-issues:

gh issue create \
  --title "[Sub] Database schema for users" \
  --body "Part of #100

## Objective
Create the database schema for user authentication.

## Dependencies
- None (this is first)

## Acceptance Criteria
- [ ] User table created
- [ ] Migration scripts work" \
  --label "sub-issue,feature"

The parent issue’s PRD tracks all sub-issues in a table:

### Sub-Issues

| #    | Title     | Status     | Blocked By | Branch                 | PR   |
| ---- | --------- | ---------- | ---------- | ---------------------- | ---- |
| #101 | DB Schema | ✅ Done    | -          | `issue-100/101-schema` | #150 |
| #102 | Auth API  | 🔄 Active  | #101       | `issue-100/102-api`    | -    |
| #103 | Frontend  | ⏳ Blocked | #102       | -                      | -    |

Dependency graphs ensure work happens in the right order:

#101 ──► #102 ──► #104
              └──► #103
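Deciding what can start next is a small dependency check. Here's a sketch, assuming the "Blocked By" column is flattened into one line per sub-issue (the issue number followed by its blockers). `ready_sub_issues` is hypothetical.

```shell
#!/bin/sh
# ready_sub_issues (hypothetical helper): reads one line per sub-issue on stdin,
# "<issue> <blocker> <blocker> ...", and prints the issues that aren't done yet
# but whose blockers all are. $1 = space-separated list of finished issues.
ready_sub_issues() {
  awk -v finished_list="$1" '
    BEGIN {
      n = split(finished_list, f, " ")
      for (i = 1; i <= n; i++) finished[f[i]] = 1
    }
    NF == 0 { next }                     # ignore blank lines
    {
      if ($1 in finished) next           # already done
      for (i = 2; i <= NF; i++)          # every blocker must be done
        if (!($i in finished)) next
      print $1
    }'
}
```

With the graph above, finishing #101 unlocks only #102; once #102 lands, #103 and #104 become ready together.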

Labels like epic (parent) and sub-issue (children) make the hierarchy visible in GitHub’s issue list.

The full decomposition structure looks like this:

┌─────────────────────────────────────────────────────────────────────┐
│                    PARENT ISSUE #100                                │
│                    Labels: epic, feature                            │
│─────────────────────────────────────────────────────────────────────│
│  User Description: "Implement user authentication"                  │
│─────────────────────────────────────────────────────────────────────│
│  PRD: Sub-issue table, dependency graph, progress 2/4               │
└─────────────────────────────────────────────────────────────────────┘
          │
          ├────────────────────────────────────────────────┐
          │                                                │
          ▼                                                ▼
┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐
│ SUB-ISSUE #101    │  │ SUB-ISSUE #102    │  │ SUB-ISSUE #103    │
│ Labels: sub-issue │  │ Labels: sub-issue │  │ Labels: sub-issue │
│─────────────────  │  │─────────────────  │  │─────────────────  │
│ DB Schema         │  │ Auth API          │  │ Frontend          │
│ Status: ✅ Done   │  │ Status: 🔄 Active │  │ Status: ⏳ Blocked│
│ PR: #150 merged   │  │ Branch: 102-api   │  │ Blocked by: #102  │
│─────────────────  │  │─────────────────  │  │─────────────────  │
│ "Part of #100"    │  │ "Part of #100"    │  │ "Part of #100"    │
└───────────────────┘  └───────────────────┘  └───────────────────┘
          │                     │                      │
          ▼                     ▼                      ▼
    Branch: 101-schema    Branch: 102-api        (not created)
    PR: #150 ✓            PR: (pending)

4. Branch Naming Convention

Make it parseable:

issue-{id}-{slug}           # Standalone
issue-{parent}/{child}-{slug}  # Sub-issue
issue-{parent}/{a}-{b}-{slug}  # Grouped sub-issues

The agent can extract issue numbers from branch names to maintain context.
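Here's a sketch of that extraction, assuming slugs always start with a lowercase letter (that's what separates the last issue number from the slug in grouped branches). `parse_branch` is a hypothetical helper.

```shell
#!/bin/sh
# parse_branch (hypothetical helper): prints the issue numbers encoded in a
# branch name that follows the convention above; fails for anything else.
# Assumption: slugs start with a lowercase letter.
parse_branch() {
  printf '%s\n' "$1" | awk '
    # issue-{parent}/{child}-{slug} or issue-{parent}/{a}-{b}-{slug}
    /^issue-[0-9]+\/[0-9]+(-[0-9]+)*-[a-z][a-z0-9-]*$/ {
      split($0, parts, "/")            # "issue-100", "101-102-auth"
      parent = substr(parts[1], 7)     # drop the "issue-" prefix
      n = split(parts[2], seg, "-")
      ids = seg[1]
      for (i = 2; i <= n && seg[i] ~ /^[0-9]+$/; i++) ids = ids " " seg[i]
      print "parent=" parent " issues=" ids
      found = 1
      exit
    }
    # issue-{id}-{slug}
    /^issue-[0-9]+-[a-z][a-z0-9-]*$/ {
      split($0, seg, "-")
      print "issues=" seg[2]
      found = 1
      exit
    }
    END { exit !found }'
}
```

Calling `parse_branch "$(git branch --show-current)"` gives a command the issue numbers to load without asking you.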

5. Checkpoints, Not Steps

Track state with checkpoints, not step numbers:

┌─────────┐    ┌────────────────┐    ┌─────────────┐    ┌───────────────────┐
│ STARTED │───►│ BRANCH_CREATED │───►│ IN_PROGRESS │───►│ CHANGES_COMMITTED │
└─────────┘    └────────────────┘    └─────────────┘    └───────────────────┘
     │                                      │                     │
     │                                      │                     ▼
     │                                      │           ┌────────────────┐
     │                                      │           │   PR_CREATED   │
     │                                      │           └────────────────┘
     │                                      │                     │
     │                                      ▼                     ▼
     │                               ┌───────────┐         ┌───────────┐
     └──────────────────────────────►│  BLOCKED  │         │ COMPLETED │
                                     └───────────┘         └───────────┘
                                           │                     ▲
                                           └─────────────────────┘
                                              (when unblocked)

Checkpoints are resilient to plan changes. Step numbers break when you add/remove steps.
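Because the checkpoints form a small, fixed state machine, a command can check an update before posting it. This sketch transcribes the arrows in the diagram as drawn, including BLOCKED going straight to COMPLETED; if your flow resumes work after an unblock, you'd add a BLOCKED to IN_PROGRESS edge. `can_transition` is a hypothetical name.

```shell
#!/bin/sh
# can_transition (hypothetical helper): succeeds if moving from checkpoint $1
# to checkpoint $2 follows an arrow in the checkpoint diagram, fails otherwise.
can_transition() {
  case "$1 -> $2" in
    "STARTED -> BRANCH_CREATED" | \
    "BRANCH_CREATED -> IN_PROGRESS" | \
    "IN_PROGRESS -> CHANGES_COMMITTED" | \
    "CHANGES_COMMITTED -> PR_CREATED" | \
    "PR_CREATED -> COMPLETED" | \
    "STARTED -> BLOCKED" | \
    "IN_PROGRESS -> BLOCKED" | \
    "BLOCKED -> COMPLETED")
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

Note that nothing here mentions step numbers, so adding or removing PRD steps never invalidates a stored checkpoint.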

6. Handle Manual Interventions Gracefully

The /sync-progress command compares planned state vs actual state:

# What the agent checks
git branch --show-current          # Are we on the right branch?
git log main..HEAD --oneline       # What commits exist?
git diff --name-only main..HEAD    # What files changed?
gh issue view 123 --json body      # What does the PRD say?

When divergence is detected (you did work manually), the agent updates the plan:

| Divergence                | Resolution                |
| ------------------------- | ------------------------- |
| Extra commits not in plan | Add completed work to PRD |
| Files changed not in plan | Add to files list         |
| Steps done out of order   | Reorder and mark complete |
| Plan edited by user       | Respect user's changes    |

The key principle: reality wins. The agent updates the PRD to match what actually happened, never the other way around.
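The "files changed not in plan" check is a set difference between the PRD's files list and what git reports. Here's a sketch, assuming both lists are newline-separated paths. `unplanned_files` is hypothetical, and the second argument would typically come from `git diff --name-only main..HEAD`.

```shell
#!/bin/sh
# unplanned_files (hypothetical helper): $1 = files listed in the PRD,
# $2 = files actually changed; both newline-separated. Prints files that changed
# but are missing from the plan, in the order they were given.
unplanned_files() {
  printf '%s\n' "$2" | PLANNED="$1" awk '
    BEGIN {
      n = split(ENVIRON["PLANNED"], p, "\n")
      for (i = 1; i <= n; i++) planned[p[i]] = 1
    }
    # Skip blank lines, planned files, and duplicates.
    NF && !($0 in planned) && !seen[$0]++ { print }'
}
```

Whatever it prints is what the agent appends to the PRD's files list; nothing is ever removed from the working tree to match the plan.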

7. Use GitHub CLI, Skip the MCP

The entire system runs on gh commands. No GitHub MCP server needed.

# Fetch issue with full context
gh issue view $ID --json title,body,labels,comments,state

# Append PRD to issue (preserving original content)
gh issue edit $ID --body "$UPDATED_BODY"

# Post progress comment
gh issue comment $ID --body "<!-- PROGRESS -->..."

# Create linked branch
gh issue develop $ID --checkout

# Create PR with issue reference
gh pr create --title "feat: ..." --body "Closes #$ID"

I see people adding GitHub MCP servers to their setup. Don’t. Modern models like Opus 4.5 are already excellent at using CLI tools. The gh CLI is well-documented, the model knows it, and it just works.

MCP also adds complexity, and its context cost matters more than people realize. Every MCP server you add injects its tool schemas into context. GitHub MCP alone can add thousands of tokens. Multiply that by a few MCP servers and you're burning context on tool definitions instead of actual code.

The CLI is already there, already authenticated (you ran gh auth login once), and the model uses it reliably. Save MCP for tools that genuinely don’t have CLI equivalents.

When to Use This Approach

Good for:

Skip it for:

The Full Picture

┌─────────────────────────────────────────────────────────────────────────────┐
│                                    YOU                                      │
│ ┌─────────────┐ ┌───────────────┐ ┌────────┐ ┌───────────┐ ┌──────────────┐ │
│ │/start-work  │ │/continue-work │ │/adjust │ │/create-pr │ │/sync-progress│ │
│ └─────────────┘ └───────────────┘ └────────┘ └───────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            MAIN COORDINATOR                                 │
│                                                                             │
│   • Parses command arguments                                                │
│   • Delegates to specialized agents                                         │
│   • Orchestrates sequential workflow                                        │
│   • Propagates changes through issue hierarchy                              │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
         ┌─────────────┬───────────────┼───────────────┬─────────────┐
         ▼             ▼               ▼               ▼             ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│   ANALYST   │ │   PLANNER   │ │ IMPLEMENTER │ │   TESTER    │ │  TRACKER    │
│─────────────│ │─────────────│ │─────────────│ │─────────────│ │─────────────│
│ Classify    │ │ Create PRD  │ │ Write code  │ │ Write tests │ │ Detect state│
│ Scope       │ │ Decompose   │ │ Commit      │ │ Run suites  │ │ Post updates│
│ Find gaps   │ │ Dependencies│ │ Follow stds │ │ Coverage    │ │ Sync changes│
├─────────────┤ ├─────────────┤ ├─────────────┤ ├─────────────┤ ├─────────────┤
│ Skills:     │ │ Skills:     │ │ Skills:     │ │ Skills:     │ │ Skills:     │
│ issue-      │ │ issue-      │ │ implement-  │ │ implement-  │ │ progress-   │
│ analysis    │ │ analysis,   │ │ ation       │ │ ation       │ │ sync        │
│             │ │ implement-  │ │             │ │             │ │             │
│             │ │ ation       │ │             │ │             │ │             │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
         │             │               │               │             │
         └─────────────┴───────────────┼───────────────┴─────────────┘
                                       │
                                       ▼
                          ┌────────────────────────┐
                          │      GitHub CLI        │
                          │  gh issue | gh pr      │
                          └────────────────────────┘
                                       │
         ┌─────────────────────────────┼─────────────────────────────┐
         ▼                             ▼                             ▼
┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│  GITHUB ISSUE   │         │   GIT REPO      │         │  PULL REQUEST   │
│─────────────────│         │─────────────────│         │─────────────────│
│                 │         │                 │         │                 │
│ ┌─────────────┐ │         │ Branch:         │         │ Title + Body    │
│ │ User Desc   │ │         │ issue-123-feat  │         │ Closes #123     │
│ │ (untouched) │ │         │                 │         │                 │
│ └─────────────┘ │         │ Commits:        │         │ Reviews         │
│       ---       │         │ feat: add X     │         │ CI Checks       │
│ ┌─────────────┐ │         │ feat: add Y     │         │                 │
│ │ PRD Section │ │◄────────│ fix: handle Z   │────────►│ Linked to       │
│ │ (machine-   │ │         │                 │         │ Issue #123      │
│ │  readable)  │ │         └─────────────────┘         │                 │
│ └─────────────┘ │                                     └─────────────────┘
│                 │
│ Comments:       │
│ ┌─────────────┐ │
│ │ PROGRESS    │ │
│ │ updates     │ │
│ └─────────────┘ │
│ ┌─────────────┐ │
│ │ ADJUSTMENT  │ │
│ │ pivots      │ │
│ └─────────────┘ │
└─────────────────┘

Data Flow:

  1. You invoke a command → Coordinator delegates to agents
  2. Agents use Skills for domain knowledge
  3. All GitHub operations go through gh CLI
  4. State persists in issue (PRD + comments) and git (branches + commits)
  5. Any session can resume by reading the issue state

Get Started

The full implementation lives in ~/.claude/: the seven slash commands, six agents, and three skills described above.

You can build something similar by starting with just two commands:

  1. /start-work - Creates PRD, posts initial progress
  2. /continue-work - Parses PRD, detects state, resumes

Then layer in the rest as you need them.

Credits

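This approach wouldn't exist without Geoffrey Huntley's Ralph Wiggum pattern and Matt Pocock's idea of using GitHub issues as the agent's memory, laid out in his AI Hero article.

The key insight they share: define the end state, let the agent figure out how to get there, and track progress in a way that survives context switches.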

Now go build your own version. Break things. Make it better.

And honestly, a year ago this would have been a frustrating exercise in prompt engineering. Today, it just works.

Happy Vibe Coding. 🤖