Ship PR Until Green: Anatomy of a Loop
Dissect Ship PR Until Green field by field — goal, exit condition, max-iterations, check command, and the kickoff prompt that drives Claude Code to a passing PR.
The most useful agent loop I run isn't clever. It does one boring thing extremely well: it opens a pull request and keeps working until every CI check passes. No watching the Actions tab. No "is it green yet?" Slack pings. The agent implements, pushes, reads the failing logs, fixes, pushes again, and stops the moment gh pr checks reports all green — or hits a hard cap of ten iterations and reports back.
This is "Ship PR Until Green," the flagship loop in the Loops channel. It looks like magic the first time you run it, but there's nothing magical underneath. It's four fields wired together: a goal, an exit condition, a max-iterations cap, and a deterministic check command. Get those four right and you have a reliable loop. Get any one wrong and you have an agent that either quits early or spins forever. This piece pulls the loop apart field by field so you can build your own.
Key Takeaways
- A loop is act → observe → decide → repeat until a verifiable exit condition holds or a max-iterations cap fires. Ship PR Until Green is the canonical example — see what loop engineering actually is.
- The exit condition must be objective and machine-checkable. "All PR checks are success" beats "the PR looks done" because a script can verify it. This is the heart of every loop.
- The check command runs between every iteration. For this loop it's gh pr checks: the agent's eyes on reality, not its own optimism.
- Max-iterations is the seatbelt, not the engine. Ten passes is enough for real fixes and short enough to cap your bill if something goes sideways.
- Anti-gaming guardrails stop the agent from faking green: no deleting tests, no skipping checks, no committing --no-verify.
The Four Fields That Define the Loop
Every well-formed loop in the channel ships as a small spec before it ships a prompt. Ship PR Until Green's spec is short enough to read in one breath:
Loop: Ship PR Until Green
Goal: A PR is open with all CI checks passing
Exit condition: All PR checks report status "success"
Max iterations: 10
Check command: gh pr checks
Anti-gaming: Never delete/skip tests, never push with --no-verify,
never mark required checks as non-required
Those six entries are the entire contract: the four core fields, plus a name and the anti-gaming rules. The kickoff prompt (below) is just prose that tells the agent to honor it. Let's take each field in turn.
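As a rough illustration, the spec above could be held as plain data so that a script, rather than a person, enforces the cap and the check command. This is a minimal sketch: the class name LoopSpec, its field names, and the validation rules are assumptions made for this example, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopSpec:
    """The loop contract as data. All names here are illustrative."""

    name: str
    goal: str
    exit_condition: str
    max_iterations: int
    check_command: tuple[str, ...]
    anti_gaming: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # A loop with no positive cap or no check command is not well-formed.
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")
        if not self.check_command:
            raise ValueError("check_command must not be empty")


# The values below are copied from the spec in the article.
SHIP_PR_UNTIL_GREEN = LoopSpec(
    name="Ship PR Until Green",
    goal="A PR is open with all CI checks passing",
    exit_condition='All PR checks report status "success"',
    max_iterations=10,
    check_command=("gh", "pr", "checks"),
    anti_gaming=(
        "Never delete/skip tests",
        "Never push with --no-verify",
        "Never mark required checks as non-required",
    ),
)
```

Freezing the dataclass is a small design choice: once a run starts, nothing (including the agent) should be able to raise the cap or swap the check command.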
The Goal: A Verifiable End State
The goal is not "fix the build." It's a state of the world you can point at: a PR is open and all of its CI checks are passing. The difference matters. "Fix the build" is an activity — the agent could do it forever and never know it's done. "All checks are success on the open PR" is a state — at any moment it's either true or false, and a command can tell you which.
This is the shift from one-shot prompting to closed-loop thinking, covered in "from one-shot to closed-loop thinking." You're not describing steps; you're describing the finish line and letting the agent find its own path there. Claude Code's native /goal primitive is built exactly for this: it runs until a verifiable end-state holds.
The Exit Condition and Check Command
The exit condition is the goal expressed as something the check command can answer with a yes or no. Here they're tightly coupled:
- Exit condition: all PR checks report success
- Check command: gh pr checks <pr>
gh pr checks is a good fit because it reports CI's verdict rather than the agent's. For a given commit, once the checks have finished, it returns the same answer every time, and it exits non-zero while anything is pending or failing. The agent runs it at the end of every iteration, after pushing, reads the result, and declares victory only when the command says all checks passed. It never trusts its own judgment about whether the code is "probably fine."
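To make the yes-or-no reading concrete, here is a small sketch that maps the check command's exit status to one of three states. It assumes that gh exits 0 when every check passed and 8 while checks are still pending; that matches my reading of the GitHub CLI manual, but confirm it against your installed version. CheckState, classify_exit_code, and run_check are names invented for this example.

```python
import subprocess
from enum import Enum


class CheckState(Enum):
    GREEN = "green"
    PENDING = "pending"
    FAILING = "failing"


# Assumption: `gh pr checks` uses exit code 8 for "checks still pending".
PENDING_EXIT_CODE = 8


def classify_exit_code(code: int) -> CheckState:
    """Turn an exit status into a loop decision input."""
    if code == 0:
        return CheckState.GREEN
    if code == PENDING_EXIT_CODE:
        return CheckState.PENDING
    # Any other non-zero status (failed checks, auth errors, no PR found)
    # is treated as "not green" in this sketch.
    return CheckState.FAILING


def run_check(argv: tuple[str, ...] = ("gh", "pr", "checks")) -> CheckState:
    """Run the check command and classify its exit status."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return classify_exit_code(result.returncode)
```

Only CheckState.GREEN should end the loop successfully. Treating PENDING as green is the classic way to quit early.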
A bad check command is the number one reason loops misbehave — non-deterministic checks make an agent loop forever, which I unpack in why your agent loops forever. If your check returns "flaky pass" half the time, the agent will keep "fixing" code that was already correct.
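One cheap probe before trusting a check inside a loop is to run it several times against an unchanged commit and compare the answers. The helper below is a hypothetical sketch of that idea; is_deterministic is not part of any tool, and agreeing on a handful of runs is evidence of stability, not proof.

```python
from typing import Callable, Hashable


def is_deterministic(check: Callable[[], Hashable], runs: int = 3) -> bool:
    """Return True if `check` gives the same answer on each of `runs` calls.

    Call this only while the code under test is unchanged, otherwise a
    legitimate change in the answer will look like flakiness.
    """
    if runs < 2:
        raise ValueError("need at least 2 runs to compare")
    answers = {check() for _ in range(runs)}
    return len(answers) == 1
```

If this returns False for your check, fix or quarantine the flaky part before handing the check to an agent.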
Max Iterations: The Seatbelt
Ten iterations. That's the cap. Why ten?
In my experience, a real CI failure on a focused PR gets resolved in two to four passes: implement, watch one check fail, fix it, watch a second fail, fix that, done. Ten gives generous headroom for a stubborn flaky integration test or a lint rule the agent didn't anticipate. Beyond ten, you're almost never converging; you're thrashing, and you want a human to look.
The cap is a cost ceiling and a safety rail, not the thing that ends a healthy run. In a healthy run the exit condition fires first, around iteration three. The cap only matters on the bad days — and on the bad days you very much want it.
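The relationship between the exit condition and the cap fits in a few lines. In this sketch, act stands in for one pass of agent work (implement, fix, push) and is_green stands in for the check command. Both are hypothetical callables supplied by the caller, and LoopResult is an illustrative name.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    green: bool      # True if the exit condition fired
    iterations: int  # passes actually used
    reason: str      # "exit-condition" or "cap"


def run_loop(
    act: Callable[[int], None],
    is_green: Callable[[], bool],
    max_iterations: int = 10,
) -> LoopResult:
    """Act, observe, decide, repeat, until green or the cap fires."""
    for i in range(1, max_iterations + 1):
        act(i)          # one pass of work: implement or fix, then push
        if is_green():  # observe reality via the check command
            return LoopResult(True, i, "exit-condition")
    # The seatbelt: stop and hand back to a human.
    return LoopResult(False, max_iterations, "cap")
```

In a healthy run the function returns from inside the loop body. The final return is reached only on the bad days.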
What Does the Kickoff Prompt Look Like?
The kickoff prompt is the paste-ready instruction you hand the agent. It encodes the four fields as plain instructions plus the guardrails. Here's the shape of it:
# Kickoff: Ship PR Until Green
claude "You are running the Ship PR Until Green loop.
GOAL: Open a pull request for the current branch and drive it until
every CI check passes.
LOOP (max 10 iterations):
1. Implement the change for the task described in TASK.md.
2. Run the test suite locally. Fix anything red.
3. Commit and push. Open a PR if one isn't open yet.
4. Run: gh pr checks --watch
5. If all checks are 'success' -> STOP and report the PR URL.
6. If any check failed -> read its logs, fix the cause, go to step 3.
7. If you reach iteration 10 without all-green -> STOP and report
which checks are still failing and why.
GUARDRAILS (violating any of these is a failed run):
- Never delete or skip a test to make checks pass.
- Never push with --no-verify or bypass pre-commit hooks.
- Never edit CI config to mark a required check as non-required.
- Each iteration must end by actually running gh pr checks.
Self-pace. Do not ask me for confirmation between iterations."
Notice what the prompt does and doesn't do. It tells the agent the loop body, the exit, the cap, and the rules — then gets out of the way. It does not spell out which file to edit or which test will fail, because the agent discovers that by observing reality each pass. That observation step is what makes it a loop rather than a script.
How Does It Compare to Running It By Hand?
| Dimension | Manual PR babysitting | Ship PR Until Green |
|---|---|---|
| Who watches CI | You, refreshing the tab | The agent, via gh pr checks |
| Context switches | Many (back to other work, back to CI) | Zero until it's done or capped |
| Stops when | You notice it's green | Exit condition fires automatically |
| Runaway risk | None, but slow | Capped at 10 iterations |
| Fakes success? | Never (you'd notice) | Blocked by anti-gaming guardrails |
| Typical wall-clock | Spread over hours | One unattended run |
The manual column is "safe but slow." The loop column trades a small amount of trust — backed by guardrails — for getting your attention back. If you want the deeper rationale for why this primitive is the right tool versus a raw prompt, see loop engineering vs prompt engineering.
You can run the loop today from its recipe page at /skills/ship-pr-until-green, which ships the kickoff prompt above ready to paste. The same skeleton powers its sibling, /skills/test-until-green, when you only care about the local suite and not the full PR.
Frequently Asked Questions
What happens if the loop hits 10 iterations without going green?
It stops and reports. The cap is a hard stop, not a retry-forever signal. The agent returns which checks are still failing and its best read on why, so you can take over with full context instead of starting cold. Hitting the cap is information: it usually means the failure is environmental (a flaky external service) or the task was underspecified.
Can the agent cheat by deleting failing tests?
That's exactly what the anti-gaming guardrails exist to prevent. The kickoff prompt forbids deleting or skipping tests, pushing with --no-verify, and downgrading required checks. Metric-gaming is a real failure mode for loops that optimize a number, which is why the exit condition is "checks genuinely pass," not "the checks command returns zero by any means."
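A prompt-level rule can be backed by a mechanical spot check on what the agent actually changed. The sketch below scans the output of git diff --name-status for deleted test files and edits to workflow files. The path patterns are assumptions about repository layout, and the function name is invented for this example. It cannot see skipped tests inside a file or a push made with --no-verify, so treat it as one tripwire among several.

```python
def guardrail_violations(name_status: str) -> list[str]:
    """Flag suspicious entries in `git diff --name-status` output.

    Heuristic only: "test" in the path and the .github/workflows/ prefix
    are assumed conventions, not universal ones.
    """
    violations = []
    for line in name_status.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # blank or malformed line
        # For renames (e.g. "R100\told\tnew") the last column is the new path.
        status, path = parts[0], parts[-1]
        is_test = "test" in path.lower()
        is_ci = path.startswith(".github/workflows/")
        if status.startswith("D") and is_test:
            violations.append(f"deleted test file: {path}")
        elif is_ci:
            violations.append(f"CI config touched: {path}")
    return violations
```

A non-empty result does not prove cheating (a legitimate task may touch CI config), but it is a good reason to read the diff before merging.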
Do I need GitHub Actions specifically?
No. The loop needs a deterministic check command. gh pr checks is the GitHub-native one, but the same anatomy works with any CI that exposes pass/fail to the CLI. Swap the check command, keep the goal-exit-cap structure.
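Swapping the check command can be as small as changing one argument list. This sketch wraps any CLI that signals pass or fail through its exit status. make_check and the timeout default are assumptions made for this example, and any command you pass besides gh pr checks must be one your CI actually provides.

```python
import subprocess
from typing import Callable, Sequence


def make_check(argv: Sequence[str], timeout_s: float = 600.0) -> Callable[[], bool]:
    """Wrap a pass/fail CLI as a yes/no check: exit status 0 means green."""

    def check() -> bool:
        try:
            done = subprocess.run(list(argv), capture_output=True, timeout=timeout_s)
        except (OSError, subprocess.TimeoutExpired):
            # A missing binary or a hung command is never "green".
            return False
        return done.returncode == 0

    return check


# GitHub-native check from the article:
#   github_check = make_check(["gh", "pr", "checks"])
```

The goal, exit condition, and cap stay exactly as they were. Only the command behind the yes-or-no answer changes.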
How is this different from Claude Code's /loop command?
/loop is the generic primitive — it repeats a prompt on an interval or until a stop condition. Ship PR Until Green is a specific, opinionated configuration of that idea with a chosen goal, exit, cap, and guardrails. Read more on the three native primitives in /loop, /goal & /schedule in Claude Code.
Is ten iterations a hard rule?
No — it's a sensible default. Tune it to your CI's flakiness and your risk tolerance. Faster, more reliable suites can run with a lower cap; gnarly monorepos with slow integration stages might justify more. Just keep a cap. A loop with no max-iterations is a bug waiting to bill you.