Building Autonomous SWE Agents for Claude Code: Beyond Human-in-the-Loop
By
D3OXY
Most of us use AI coding tools interactively. Give it a task, watch it work, intervene when it goes off track. It’s pair programming with a robot. But what if you could define what “done” looks like and let the agent figure out the rest?
I spent a weekend building exactly that – a Senior SWE Agent System for Claude Code that handles GitHub issues from analysis to PR creation, tracks progress via issue comments, and can be picked up by another session (or another developer) if interrupted.
A note on what this is: This isn’t a “copy my prompts” post. I’m not selling a course or giving you a magic system to download. What I am giving you is the idea and the architecture. Take it, break it, rebuild it for your workflow. The specific commands and skills I share are starting points – iterate on them, throw out what doesn’t work, add what’s missing for your codebase.
Interactive AI coding has a context window problem. You start a task, make progress, then the session gets interrupted or the context fills up.
When you come back, you’re starting from scratch. The AI has no memory of what was done, what was planned, or where you left off.
The approach I took is heavily inspired by Ralph Wiggum – a pattern created by Geoffrey Huntley for running AI coding CLIs in loops, letting them work autonomously on task lists. The core principles:
Matt Pocock’s insight took this further: use GitHub issues as the memory store. The AI Hero article breaks down the patterns in detail.
I combined both ideas – Ralph’s autonomous loop structure with GitHub issues as the persistent memory layer.
There’s no shortage of AI coding tools: Cursor, Copilot CLI, Aider, OpenCode, Codex, and various agent frameworks like AutoGPT, CrewAI, or custom LangChain harnesses. I’ve tried most of them. Here’s why I landed on Claude Code:
A great model with a bad harness is frustrating. A mediocre model with a great harness is… still mediocre. You need both.
As of today, Claude Opus 4.5 is the best model I’ve used for agentic coding. But raw capability isn’t enough – you need the tooling to channel it.
I’ve run the same prompts through GPT 5.2 (it’s good, but the speed is awful), Gemini 3 Pro, and various open-source models. Opus produces the best code – but without the harness (commands, agents, skills, progress tracking), it’s still just interactive pair programming with a context window limit.
Claude Code gives you the building blocks out of the box:
| Feature | What It Does | Why It Matters |
|---|---|---|
| Slash Commands | User-defined entry points | No framework needed |
| Sub-Agents | Isolated 200k contexts | Complex workflows without context pollution |
| Skills | Auto-discovered knowledge | Domain expertise without prompt stuffing |
| Hooks | Event-driven automation | Run scripts on tool events |
| MCP Servers | External tool integration | Connect to anything |
With other tools, you’re either building these primitives yourself or fighting the framework’s opinions. Claude Code’s primitives are minimal but composable.
Once you have Claude Code installed, extending it is trivial:
# Add a new command
mkdir -p ~/.claude/commands
echo "your prompt" > ~/.claude/commands/my-command.md
# Use /my-command immediately
# Add a new agent (create the directory first, or the redirect fails)
mkdir -p ~/.claude/agents
echo "agent config" > ~/.claude/agents/my-agent.md
# Add a skill
mkdir -p ~/.claude/skills/my-skill
echo "skill knowledge" > ~/.claude/skills/my-skill/SKILL.md
No additional services to deploy. No vector databases. No memory backends. No orchestration layer.
Everything I built runs locally. The “database” is GitHub issues. The “memory” is git history and issue comments. The “orchestration” is the commands and agents talking to the gh CLI.
I’ve shipped production features using this setup that would have taken 3-4x longer previously. The key wins:
- /continue-work after lunch, after a weekend, or after my laptop crashes
- /adjust handles pivots without starting over
The closest alternative is probably running your own Ralph loop with any harness, but you lose the sub-agent isolation and skill auto-discovery that Claude Code provides natively.
The system has three layers:
| Layer | Purpose | Location |
|---|---|---|
| Slash Commands | User entry points | ~/.claude/commands/ |
| Sub-Agents | Specialized workers | ~/.claude/agents/ |
| Skills | Domain knowledge | ~/.claude/skills/ |
/start-work 123 → Analyze, plan, begin implementation
/continue-work 123 → Resume from last checkpoint
/adjust "reason" → Mid-work pivot, propagates changes
/check-status 123 → View hierarchical status tree
/create-pr 123 → Create PR with proper formatting
/wrap-up 123 → Finalize and close
/sync-progress 123 → Reconcile plan with reality
Here’s what /start-work does under the hood:
/start-work 123
│
▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ ANALYST │───►│ PLANNER │───►│ TRACKER │
│ Classify │ │ Create PRD │ │ Post │
│ Scope │ │ Decompose? │ │ "STARTED" │
│ Find gaps │ │ │ │ comment │
└──────────────┘ └──────────────┘ └──────────────┘
│
┌────────────┴────────────┐
▼ ▼
┌───────────┐ ┌───────────┐
│ SMALL/ │ │ LARGE │
│ MEDIUM │ │ (decomp) │
└───────────┘ └───────────┘
│ │
│ ▼
│ ┌──────────────────┐
│ │ Create sub-issues│
│ │ gh issue create │
│ │ for each part │
│ └──────────────────┘
│ │
└────────────┬────────────┘
▼
┌──────────────────┐
│ Append PRD to │
│ issue body │
│ gh issue edit │
└──────────────────┘
│
▼
┌──────────────────┐
│ Create branch │
│ gh issue develop│
│ --checkout │
└──────────────────┘
│
▼
┌──────────────────┐
│ IMPLEMENTER │
│ Begin coding │
└──────────────────┘
Each agent has its own 200k context window and specialized tools. This isolation is crucial – the main conversation stays clean while agents do heavy lifting in their own contexts.
# ~/.claude/agents/issue-analyst.md
---
name: issue-analyst
description: Classifies issues, assesses scope, identifies gaps
model: opus # Opus 4.5 - best for agentic tasks
allowed-tools: Read, Bash(gh issue view:*), Bash(gh issue list:*), Bash(git log:*), Bash(git status)
skills: issue-analysis
---
All agents use model: opus (Claude Opus 4.5) because it handles complex multi-step reasoning without getting lost. You could use Sonnet for simpler agents like progress-tracker, but I’ve found the quality difference is worth the cost for autonomous work.
The six agents and their responsibilities:
| Agent | Purpose | Tools |
|---|---|---|
| issue-analyst | Classify type, assess scope, find gaps | gh issue view, gh issue list, git log, git status |
| planner | Create PRDs, decompose issues | above + gh issue create, gh issue edit, Write |
| implementer | Write code following standards | Read, Write, Edit, Bash |
| tester | Write and run tests | Read, Write, pnpm test, vitest |
| pr-manager | Create PRs, handle reviews | gh pr create, gh pr view, git push |
| progress-tracker | Monitor state, post updates | gh issue view, gh issue comment, git log |
Tool restrictions matter. The issue-analyst is read-only: it can read files, issues, and git history, but it cannot edit anything. The planner can create/edit issues but not write code. The pr-manager can push and create PRs but not modify files. This follows the principle of least privilege – each agent gets exactly what it needs, nothing more.
Skills are auto-discovered knowledge that agents can tap into. Unlike commands (which you invoke explicitly), Skills are automatically loaded when Claude detects they’re relevant to your request.
~/.claude/skills/
├── issue-analysis/ → Classification, scope, gaps
│ ├── SKILL.md → Main definition
│ ├── CLASSIFICATION.md → Bug vs feature vs enhancement
│ ├── SCOPE.md → Small/medium/large assessment
│ └── DECOMPOSITION.md → How to split large issues
├── implementation/ → Code standards, commits, PRs
└── progress-sync/ → State detection, reconciliation
Each skill has a SKILL.md that describes when it should be used:
# ~/.claude/skills/issue-analysis/SKILL.md
---
name: issue-analysis
description: Analyzes GitHub issues to classify type, assess scope, and identify information gaps. Use when starting work on any issue.
---
The agents reference skills in their configuration:
skills: issue-analysis, implementation
This injects the skill’s knowledge into the agent’s context when it’s spawned.
Here’s where it gets interesting. Instead of a local progress file, the PRD lives in the GitHub issue itself:
[USER'S ORIGINAL DESCRIPTION - NEVER MODIFIED]
---
<!-- SWE_PRD_START -->
## Implementation Plan
### Classification
| Type | Scope | Labels |
| ------- | ------ | ------- |
| feature | MEDIUM | feature |
### Implementation Steps
- [x] Step 1: Define interfaces
- [x] Step 2: Core implementation
- [ ] Step 3: Error handling ← current
- [ ] Step 4: Tests
### Current State
- **Checkpoint**: IN_PROGRESS
- **Branch**: `issue-123-feature`
- **Last Activity**: 2026-01-11T10:30:00Z
<!-- SWE_PRD_END -->
The magic: everything between the markers is machine-readable. The original user description stays untouched above the separator.
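To make the marker idea concrete, here is a minimal sketch of pulling the PRD section out of an issue body with sed. The function name extract_prd is illustrative, not one of the commands described in this post, and it assumes each marker sits on its own line.

```shell
# extract_prd: read an issue body on stdin and print only the lines
# strictly between the PRD markers (the markers themselves are dropped).
# Hypothetical helper; assumes each marker is on a line of its own.
extract_prd() {
  sed -n '/<!-- SWE_PRD_START -->/,/<!-- SWE_PRD_END -->/{
    /<!-- SWE_PRD_START -->/d
    /<!-- SWE_PRD_END -->/d
    p
  }'
}
```

In practice the input would come from something like `gh issue view 123 --json body --jq .body | extract_prd`.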
Progress updates go into comments:
<!-- PROGRESS -->
## Progress Update
**Checkpoint**: IN_PROGRESS
**Timestamp**: 2026-01-11T10:30:00Z
### Completed
- Created validation interfaces
- Implemented core logic
### Context for Continuity
- Using Strategy pattern for validators
- Following existing patterns in src/auth/
<!-- /PROGRESS -->
Any agent (or human) can read these comments and know exactly where things stand.
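As a rough sketch of how a resuming session might read those comments, the helper below pulls the most recent checkpoint out of a stream of comment bodies. The name read_checkpoint is hypothetical, and it assumes comments arrive oldest first and that the checkpoint line follows the `**Checkpoint**: VALUE` shape shown above.

```shell
# read_checkpoint: read one or more comment bodies on stdin and print the
# checkpoint value from the last "**Checkpoint**: VALUE" line seen.
# Hypothetical helper; assumes input is ordered oldest to newest.
read_checkpoint() {
  sed -n 's/^.*\*\*Checkpoint\*\*: *\([A-Z_][A-Z_]*\).*$/\1/p' | tail -n 1
}
```

A possible invocation is `gh issue view 123 --json comments --jq '.comments[].body' | read_checkpoint`.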
/continue-work Flow
When you run /continue-work 123, here’s what happens:
/continue-work 123
│
▼
┌───────────────────────────────┐
│ 1. FETCH ISSUE STATE │
│ gh issue view 123 --json │
│ body, comments, state │
└───────────────────────────────┘
│
▼
┌───────────────────────────────┐
│ 2. PARSE PRD SECTION │
│ Extract between markers: │
│ • Implementation steps │
│ • Current checkpoint │
│ • Branch name │
│ • Files to modify │
└───────────────────────────────┘
│
▼
┌───────────────────────────────┐
│ 3. DETECT GIT STATE │
│ git branch --show-current │
│ git log main..HEAD │
│ git status --short │
└───────────────────────────────┘
│
▼
┌───────────────────────────────┐
│ 4. COMPARE PLANNED vs ACTUAL │
└───────────────────────────────┘
│
┌─────────────┴─────────────┐
▼ ▼
┌───────────┐ ┌───────────┐
│ MATCH │ │ DIVERGE │
└───────────┘ └───────────┘
│ │
│ ▼
│ ┌───────────────────┐
│ │ Show differences │
│ │ Offer /sync │
│ └───────────────────┘
│ │
└─────────────┬─────────────┘
▼
┌───────────────────────────────┐
│ 5. RESUME FROM CHECKPOINT │
│ Display progress summary │
│ Continue implementation │
└───────────────────────────────┘
The beauty is that no local state is required. Everything lives in Git and GitHub. You can /continue-work from a different machine, a different Claude session, or even have a teammate pick up where you left off.
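The "parse PRD" and "resume from checkpoint" steps can be sketched with two small awk helpers that work on the checklist format shown earlier. Both names (next_step, progress) are made up for illustration, and the real commands may well parse the PRD differently.

```shell
# next_step: print the first unchecked "- [ ]" item from a PRD on stdin,
# without the checkbox or the trailing "← current" marker.
next_step() {
  awk '/^- \[ \] / {
    sub(/^- \[ \] /, "")
    sub(/ *← current$/, "")
    print
    exit
  }'
}

# progress: print "done/total" for the checklist items in a PRD on stdin.
progress() {
  awk '/^- \[x\] / { d++ } /^- \[[ x]\] / { t++ } END { printf "%d/%d\n", d, t }'
}
```

Together they give a resuming session a one-line summary of where the plan stands before any code is touched.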
/adjust Command: Mid-Work Pivots
This is my favorite part. Mid-implementation, you realize the approach won’t work. Instead of starting over:
/adjust "JWT won't work with our session system. Need to use cookies instead."
The command propagates changes through the entire issue hierarchy:
/adjust "JWT won't work..."
│
▼
┌────────────────────────┐
│ CURRENT ISSUE #102 │
│ • Update PRD steps │
│ • Post ADJUSTMENT │
│ comment │
└────────────────────────┘
│
┌───────────────┴───────────────┐
▼ ▼
┌────────────────────────┐ ┌────────────────────────┐
│ PARENT ISSUE #100 │ │ SIBLING ISSUES │
│ • Update sub-issue │ │ │
│ table status │ │ #101: No impact │
│ • Update dependency │ │ #103: Now BLOCKED │
│ graph │ │ (depends on │
│ • Post notification │ │ #102) │
│ comment │ │ #104: No impact │
└────────────────────────┘ └────────────────────────┘
The adjustment comment documents everything:
<!-- ADJUSTMENT -->
## Plan Adjustment
**Reason**: JWT won't work with our session system. Need cookies.
**Timestamp**: 2026-01-11T10:30:00Z
### Impact Analysis
- Current scope: Changed auth approach
- Dependencies affected: #103 now blocked
### Propagated To
- Parent #100: Updated sub-issue status
- Sibling #103: Marked as blocked
<!-- /ADJUSTMENT -->
The whole hierarchy stays in sync.
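For illustration, the skeleton of such an adjustment comment could be produced by a tiny shell function. The name render_adjustment and its two-argument shape are assumptions; the impact analysis and propagation sections would be filled in by the agent and are left out here.

```shell
# render_adjustment REASON TIMESTAMP
# Print a minimal adjustment comment wrapped in the ADJUSTMENT markers.
# Hypothetical helper: only the reason and timestamp fields are rendered.
render_adjustment() {
  reason=$1
  timestamp=$2
  cat <<EOF
<!-- ADJUSTMENT -->
## Plan Adjustment
**Reason**: $reason
**Timestamp**: $timestamp
<!-- /ADJUSTMENT -->
EOF
}
```

It could then be posted with `gh issue comment 102 --body "$(render_adjustment "Need cookies." "$(date -u +%Y-%m-%dT%H:%M:%SZ)")"`.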
Wrap machine-readable sections in HTML comments:
<!-- SWE_PRD_START -->
...content...
<!-- SWE_PRD_END -->
GitHub does not show HTML comments in the rendered issue, so humans never see the markers, but agents can parse them reliably from the raw body.
When reconciling planned vs actual state, update the plan to match reality, not vice versa. If a user did manual work, respect it:
# In your sync-progress command
### Reconciliation Rule
- Work done not in plan → Add to plan
- Plan items done differently → Update plan
- User changes → Always preserve
The scope assessment matrix I use:
| Scope | Files | LOC | Sessions | Components |
|---|---|---|---|---|
| SMALL | 1-3 | <100 | 1 | 1 |
| MEDIUM | 4-10 | 100-500 | 1-2 | 2-3 |
| LARGE | 10+ | 500+ | 3+ | 4+ (decompose) |
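A minimal sketch of how the matrix could be applied mechanically, using only the Files and LOC columns. Two things here are my assumptions rather than the author's rules: the highest tier reached by either dimension wins, and the boundary values (exactly 10 files, exactly 500 LOC) count as MEDIUM.

```shell
# classify_scope FILES LOC
# Print SMALL, MEDIUM, or LARGE based on estimated file and line counts.
# Assumption: whichever dimension lands in the higher tier decides.
classify_scope() {
  files=$1
  loc=$2
  if [ "$files" -gt 10 ] || [ "$loc" -gt 500 ]; then
    echo LARGE
  elif [ "$files" -ge 4 ] || [ "$loc" -ge 100 ]; then
    echo MEDIUM
  else
    echo SMALL
  fi
}
```

For example, `classify_scope 5 200` prints MEDIUM, while `classify_scope 3 600` prints LARGE because the line count alone crosses the threshold.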
For LARGE scope, the planner agent automatically creates sub-issues:
gh issue create \
--title "[Sub] Database schema for users" \
--body "Part of #100
## Objective
Create the database schema for user authentication.
## Dependencies
- None (this is first)
## Acceptance Criteria
- [ ] User table created
- [ ] Migration scripts work" \
--label "sub-issue,feature"
The parent issue’s PRD tracks all sub-issues in a table:
### Sub-Issues
| # | Title | Status | Blocked By | Branch | PR |
| ---- | --------- | ---------- | ---------- | ---------------------- | ---- |
| #101 | DB Schema | ✅ Done | - | `issue-100/101-schema` | #150 |
| #102 | Auth API | 🔄 Active | #101 | `issue-100/102-api` | - |
| #103 | Frontend | ⏳ Blocked | #102 | - | - |
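Because the table is plain pipe-delimited markdown, an agent (or a script) can query it directly. The sketch below lists the sub-issues that are blocked by a given issue; blocked_by is a hypothetical helper and it assumes the column order shown above, with "Blocked By" as the fourth column.

```shell
# blocked_by ISSUE
# Read a sub-issue table on stdin and print the issues whose
# "Blocked By" column equals ISSUE (for example "#102").
blocked_by() {
  target=$1
  awk -F'|' -v t="$target" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Data rows start with an issue reference; header and separator rows do not.
    trim($2) ~ /^#[0-9]+$/ && trim($5) == t { print trim($2) }
  '
}
```

This is the kind of lookup /adjust needs when deciding which siblings to mark as blocked.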
Dependency graphs ensure work happens in the right order:
#101 ──► #102 ──► #104
└──► #103
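One way to turn a graph like this into a work order is the standard tsort utility, which performs a topological sort over "before after" pairs. This is a sketch of the idea rather than what the planner agent actually runs, and work_order is a made-up wrapper name.

```shell
# work_order: read "blocker dependent" pairs on stdin and print the issues
# in an order where every blocker comes before the issues that depend on it.
work_order() {
  tsort
}

# Edges from the graph above: #101 blocks #102, which blocks #103 and #104.
printf '%s\n' '101 102' '102 104' '102 103' | work_order
```

The relative order of #103 and #104 is not fixed, since neither depends on the other.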
Labels like epic (parent) and sub-issue (children) make the hierarchy visible in GitHub’s issue list.
The full decomposition structure looks like this:
┌─────────────────────────────────────────────────────────────────────┐
│ PARENT ISSUE #100 │
│ Labels: epic, feature │
│─────────────────────────────────────────────────────────────────────│
│ User Description: "Implement user authentication" │
│─────────────────────────────────────────────────────────────────────│
│ PRD: Sub-issue table, dependency graph, progress 2/4 │
└─────────────────────────────────────────────────────────────────────┘
│
├────────────────────────────────────────────────┐
│ │
▼ ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ SUB-ISSUE #101 │ │ SUB-ISSUE #102 │ │ SUB-ISSUE #103 │
│ Labels: sub-issue │ │ Labels: sub-issue │ │ Labels: sub-issue │
│───────────────── │ │───────────────── │ │───────────────── │
│ DB Schema │ │ Auth API │ │ Frontend │
│ Status: ✅ Done │ │ Status: 🔄 Active │ │ Status: ⏳ Blocked│
│ PR: #150 merged │ │ Branch: 102-api │ │ Blocked by: #102 │
│───────────────── │ │───────────────── │ │───────────────── │
│ "Part of #100" │ │ "Part of #100" │ │ "Part of #100" │
└───────────────────┘ └───────────────────┘ └───────────────────┘
│ │ │
▼ ▼ ▼
Branch: 101-schema Branch: 102-api (not created)
PR: #150 ✓ PR: (pending)
Make branch names parseable:
issue-{id}-{slug} # Standalone
issue-{parent}/{child}-{slug} # Sub-issue
issue-{parent}/{a}-{b}-{slug} # Grouped sub-issues
The agent can extract issue numbers from branch names to maintain context.
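A small sketch of that extraction, assuming the naming scheme above. branch_issues is a hypothetical helper; for grouped sub-issue branches it only reports the first child number, which is a simplification.

```shell
# branch_issues BRANCH
# Print "<parent> <child>" for sub-issue branches, "<id>" for standalone
# branches, and nothing for branches that do not follow the convention.
branch_issues() {
  printf '%s\n' "$1" | sed -n \
    -e 's|^issue-\([0-9][0-9]*\)/\([0-9][0-9]*\)-.*$|\1 \2|p' \
    -e 's|^issue-\([0-9][0-9]*\)-.*$|\1|p'
}
```

Combined with `git branch --show-current`, this lets a fresh session work out which issue it is on without any local state.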
Track state with checkpoints, not step numbers:
┌─────────┐ ┌────────────────┐ ┌─────────────┐ ┌───────────────────┐
│ STARTED │───►│ BRANCH_CREATED │───►│ IN_PROGRESS │───►│ CHANGES_COMMITTED │
└─────────┘ └────────────────┘ └─────────────┘ └───────────────────┘
│ │ │
│ │ ▼
│ │ ┌────────────────┐
│ │ │ PR_CREATED │
│ │ └────────────────┘
│ │ │
│ ▼ ▼
│ ┌───────────┐ ┌───────────┐
└──────────────────────────────►│ BLOCKED │ │ COMPLETED │
└───────────┘ └───────────┘
│ ▲
└─────────────────────┘
(when unblocked)
Checkpoints are resilient to plan changes. Step numbers break when you add/remove steps.
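The checkpoint flow can be written down as an explicit transition table so that an agent never posts an impossible state change. The edges below are my reading of the diagram, and the BLOCKED edges in particular are an assumption: here any of the first three states can become BLOCKED, and unblocking returns to IN_PROGRESS. Adjust the table to match your own flow.

```shell
# can_transition FROM TO
# Succeed (exit status 0) if moving from checkpoint FROM to TO is allowed.
# Hypothetical helper; the BLOCKED edges are assumed, not taken from the source.
can_transition() {
  case "$1>$2" in
    'STARTED>BRANCH_CREATED' | 'BRANCH_CREATED>IN_PROGRESS' | \
    'IN_PROGRESS>CHANGES_COMMITTED' | 'CHANGES_COMMITTED>PR_CREATED' | \
    'PR_CREATED>COMPLETED')
      return 0 ;;
    'STARTED>BLOCKED' | 'BRANCH_CREATED>BLOCKED' | 'IN_PROGRESS>BLOCKED' | \
    'BLOCKED>IN_PROGRESS')
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

Because the check is keyed on checkpoint names rather than step numbers, adding or removing plan steps does not invalidate it.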
The /sync-progress command compares planned state vs actual state:
# What the agent checks
git branch --show-current # Are we on the right branch?
git log main..HEAD --oneline # What commits exist?
git diff --name-only main..HEAD # What files changed?
gh issue view 123 --json body # What does the PRD say?
When divergence is detected (you did work manually), the agent updates the plan:
| Divergence | Resolution |
|---|---|
| Extra commits not in plan | Add completed work to PRD |
| Files changed not in plan | Add to files list |
| Steps done out of order | Reorder and mark complete |
| Plan edited by user | Respect user’s changes |
The key principle: reality wins. The agent updates the PRD to match what actually happened, never the other way around.
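The "files changed not in plan" row can be sketched as a simple set difference. unplanned_files is a hypothetical helper that takes two newline-separated lists: the files named in the PRD and the files reported by `git diff --name-only main..HEAD`.

```shell
# unplanned_files PLANNED ACTUAL
# Print every path in ACTUAL that does not appear in PLANNED.
# Both arguments are newline-separated lists of paths.
unplanned_files() {
  planned=$1
  actual=$2
  # -F fixed strings, -x whole-line match, -v keep only the non-matches.
  printf '%s\n' "$actual" | grep -Fxv -e "$planned" || true
}
```

Whatever this prints is what the agent would add to the PRD's file list, in keeping with the rule that the plan follows reality.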
The entire system runs on gh commands. No GitHub MCP server needed.
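# Fetch issue with full context
gh issue view "$ID" --json title,body,labels,comments,state
# Append PRD to issue. --body replaces the whole body, so $UPDATED_BODY
# must contain the original description followed by the PRD section.
gh issue edit "$ID" --body "$UPDATED_BODY"
# Post progress comment
gh issue comment "$ID" --body "<!-- PROGRESS -->..."
# Create linked branch; --name applies the issue-{id}-{slug} convention
# ($SLUG is a placeholder; without --name, gh picks its own branch name)
gh issue develop "$ID" --name "issue-$ID-$SLUG" --checkout
# Create PR with issue reference
gh pr create --title "feat: ..." --body "Closes #$ID"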
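Since `gh issue edit --body` overwrites the entire body, the updated body has to be assembled before the call. Here is a minimal sketch of that assembly, assuming the marker layout described earlier; upsert_prd is an illustrative name, and anything that follows the end marker in an existing body is dropped by this simple version.

```shell
# upsert_prd BODY PRD
# Print BODY with its PRD section replaced by PRD. If BODY has no PRD
# section yet, append a "---" separator and a new section instead.
upsert_prd() {
  body=$1
  prd=$2
  case $body in
    *'<!-- SWE_PRD_START -->'*)
      # Keep only what precedes the start marker (the user's description).
      head_part=$(printf '%s\n' "$body" | sed '/<!-- SWE_PRD_START -->/,$d') ;;
    *)
      head_part=$(printf '%s\n---\n' "$body") ;;
  esac
  printf '%s\n<!-- SWE_PRD_START -->\n%s\n<!-- SWE_PRD_END -->\n' "$head_part" "$prd"
}
```

The result could be fed back with `gh issue edit 123 --body "$(upsert_prd "$(gh issue view 123 --json body --jq .body)" "$PRD")"`, where `$PRD` holds the new plan text.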
I see people adding GitHub MCP servers to their setup. Don’t. Modern models like Opus 4.5 are already excellent at using CLI tools. The gh CLI is well-documented, the model knows it, and it just works.
MCP adds complexity, and the part people underestimate is the context cost. Every MCP server you add injects its tool schemas into context. GitHub MCP alone can add thousands of tokens. Multiply that by a few MCP servers and you’re burning context on tool definitions instead of actual code.
The CLI is already there, already authenticated (you ran gh auth login once), and the model uses it reliably. Save MCP for tools that genuinely don’t have CLI equivalents.
✅ Good for:
❌ Skip it for:
┌─────────────────────────────────────────────────────────────────────────────┐
│ YOU │
│ ┌─────────────┐ ┌───────────────┐ ┌────────┐ ┌───────────┐ ┌─────────────┐ │
│ │/start-work │ │/continue-work │ │/adjust │ │/create-pr │ │/sync-progress│ │
│ └─────────────┘ └───────────────┘ └────────┘ └───────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ MAIN COORDINATOR │
│ │
│ • Parses command arguments │
│ • Delegates to specialized agents │
│ • Orchestrates sequential workflow │
│ • Propagates changes through issue hierarchy │
└─────────────────────────────────────────────────────────────────────────────┘
│
┌─────────────┬───────────────┼───────────────┬─────────────┐
▼ ▼ ▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ ANALYST │ │ PLANNER │ │ IMPLEMENTER │ │ TESTER │ │ TRACKER │
│─────────────│ │─────────────│ │─────────────│ │─────────────│ │─────────────│
│ Classify │ │ Create PRD │ │ Write code │ │ Write tests │ │ Detect state│
│ Scope │ │ Decompose │ │ Commit │ │ Run suites │ │ Post updates│
│ Find gaps │ │ Dependencies│ │ Follow stds │ │ Coverage │ │ Sync changes│
├─────────────┤ ├─────────────┤ ├─────────────┤ ├─────────────┤ ├─────────────┤
│ Skills: │ │ Skills: │ │ Skills: │ │ Skills: │ │ Skills: │
│ issue- │ │ issue- │ │ implement- │ │ implement- │ │ progress- │
│ analysis │ │ analysis, │ │ ation │ │ ation │ │ sync │
│ │ │ implement- │ │ │ │ │ │ │
│ │ │ ation │ │ │ │ │ │ │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │ │
└─────────────┴───────────────┼───────────────┴─────────────┘
│
▼
┌────────────────────────┐
│ GitHub CLI │
│ gh issue | gh pr │
└────────────────────────┘
│
┌─────────────────────────────┼─────────────────────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ GITHUB ISSUE │ │ GIT REPO │ │ PULL REQUEST │
│─────────────────│ │─────────────────│ │─────────────────│
│ │ │ │ │ │
│ ┌─────────────┐ │ │ Branch: │ │ Title + Body │
│ │ User Desc │ │ │ issue-123-feat │ │ Closes #123 │
│ │ (untouched) │ │ │ │ │ │
│ └─────────────┘ │ │ Commits: │ │ Reviews │
│ --- │ │ feat: add X │ │ CI Checks │
│ ┌─────────────┐ │ │ feat: add Y │ │ │
│ │ PRD Section │ │◄────────│ fix: handle Z │────────►│ Linked to │
│ │ (machine- │ │ │ │ │ Issue #123 │
│ │ readable) │ │ └─────────────────┘ │ │
│ └─────────────┘ │ └─────────────────┘
│ │
│ Comments: │
│ ┌─────────────┐ │
│ │ PROGRESS │ │
│ │ updates │ │
│ └─────────────┘ │
│ ┌─────────────┐ │
│ │ ADJUSTMENT │ │
│ │ pivots │ │
│ └─────────────┘ │
└─────────────────┘
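Data flow: every read and write between the agents and GitHub goes through the gh CLI.
The full implementation lives in ~/.claude/, split across the commands/, agents/, and skills/ directories.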
You can build something similar by starting with just two commands:
- /start-work - Creates PRD, posts initial progress
- /continue-work - Parses PRD, detects state, resumes
Then layer in the rest as you need them.
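This approach wouldn’t exist without Geoffrey Huntley’s Ralph Wiggum pattern and Matt Pocock’s idea of using GitHub issues as the memory store.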
The key insight from all of them: define the end state, let the agent figure out how to get there, and track progress in a way that survives context switches.
Now go build your own version. Break things. Make it better.
And honestly, a year ago, this would have been a frustrating exercise in prompt engineering. Today, it just works.
Happy Vibe Coding. 🤖