PR Babysitter: Auto-Answer Review Comments
The PR Babysitter loop watches an open pull request, auto-fixes failing CI, and addresses reviewer comments iteration after iteration until the PR is merge-ready.
The last mile of shipping a pull request is death by a thousand small nudges. CI goes red over a lint rule. A reviewer asks for a rename. CI goes red again on a flaky integration test. Another reviewer wants a test added. None of it is hard, but it is scattered across hours and days, and every context switch back to the PR costs you focus you would rather spend elsewhere. The PR Babysitter loop exists to absorb that last mile.
PR Babysitter is an agent loop that watches a single open PR and keeps working it toward merge-ready: it detects failing CI and fixes it, reads reviewer comments and addresses them, pushes the changes, and loops back to watch again. It is a focused application of loop engineering — the exit condition is "this PR is merge-ready," and every iteration is grounded in the PR's observed state.
Key Takeaways
- PR Babysitter is a loop that watches one open PR and drives it to merge-ready by auto-fixing failing CI and addressing reviewer comments, pass after pass.
- It differs from a ship-PR-until-green loop by handling human review feedback, not just CI — it reads comments and responds with code changes.
- The exit condition is "merge-ready": all required checks green, all review threads addressed, and no merge conflicts, a verifiable state the loop can evaluate each pass.
- It needs strong guardrails — protected paths and a human merge gate — so it answers reviewers honestly rather than gaming the checks.
- The ready-made recipe lives at /skills/pr-babysitter with a goal, exit condition, iteration cap, and paste-ready kickoff prompt.
What does PR Babysitter actually do?
Each iteration of the loop runs the same arc against a single PR:
- Observe the PR state. Pull the latest CI status and any new review comments since the last pass.
- Decide what's blocking merge. Is CI red? Are there unresolved review threads? Does the branch conflict with base? If none of these, the PR is merge-ready and the loop exits.
- Act on the top blocker. Fix the failing check, or make the code change a reviewer requested, then push.
- Loop. Watch again — new CI runs, new comments — and repeat.
The key word is watches. Unlike a fire-and-forget loop, PR Babysitter is reactive: it sits on the PR and responds to what humans and CI throw at it over time. That makes it a close cousin of a self-improving agent, in that it incorporates external feedback into each successive pass.
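The observe, decide, act, loop arc above can be sketched in a few lines of Python. This is a minimal illustration, not the recipe's actual implementation: `PRState`, `top_blocker`, `babysit`, and the `observe`/`act` callables are hypothetical names, and the blocker priority (CI first, then review threads, then conflicts) is an assumption. It also treats a merge conflict as a blocker, in line with the merge-ready definition later in the article.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Blocker = Tuple[str, str]  # (kind, detail), e.g. ("ci", "lint")


@dataclass
class PRState:
    """Snapshot of one PR on a single pass (hypothetical shape)."""
    failing_checks: List[str] = field(default_factory=list)
    open_threads: List[str] = field(default_factory=list)
    has_conflicts: bool = False


def top_blocker(state: PRState) -> Optional[Blocker]:
    """Decide step: return the first thing blocking merge, or None."""
    # Priority order is an assumption: CI, then review, then conflicts.
    if state.failing_checks:
        return ("ci", state.failing_checks[0])
    if state.open_threads:
        return ("review", state.open_threads[0])
    if state.has_conflicts:
        return ("conflict", "base branch")
    return None


def babysit(
    observe: Callable[[], PRState],
    act: Callable[[Blocker], None],
    max_iterations: int = 10,
) -> str:
    """Run observe -> decide -> act until merge-ready or the cap is hit."""
    for _ in range(max_iterations):
        blocker = top_blocker(observe())
        if blocker is None:
            return "merge-ready"  # exit condition met; a human still merges
        act(blocker)  # fix the check or address the comment, then push
    return "escalate"  # cap reached: hand the PR back to a human
```

In a real run, `observe` would read the PR's CI and review state and `act` would hand the blocker to the agent; here they are plain callables so the control flow stays visible.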
How is it different from ship-PR-until-green?
These two loops look similar but solve different halves of the problem.
| Dimension | ship-PR-until-green | PR Babysitter |
|---|---|---|
| Watches | CI checks | CI checks and review comments |
| Reacts to | Failing builds/tests | Failing checks + human feedback |
| Exit condition | All checks green | Merge-ready (green + threads resolved + no conflicts) |
| Human in loop | Optional | Expected (reviewers feed it) |
| Best when | Mechanical PRs | PRs under active human review |
A ship-PR-until-green loop gets you to green CI. PR Babysitter goes one step further and gets you through the social part — the back-and-forth with reviewers that CI can't see. If your PR has no human reviewers, ship-PR-until-green is enough. If it does, PR Babysitter handles the conversation.
What does "merge-ready" mean as an exit condition?
The whole loop hinges on a crisp, verifiable definition of done. "Merge-ready" decomposes into conditions a machine can check each pass:
- All required CI checks pass. An objective signal, read straight from the PR's check status.
- All review threads are resolved or addressed. Each requested change has a corresponding code change or a reply, with no open blocking comments.
- No merge conflicts with the base branch.
When all three hold, the loop stops. This is why exit conditions are the heart of every loop: "merge-ready" sounds fuzzy until you decompose it into checkable parts, at which point the agent can evaluate it without judgment. The underlying watch-and-react mechanics follow the act-observe-decide-repeat cycle in the Agent SDK agent-loop guide.
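As an illustration of how the three conditions become checkable, here is a small sketch of a merge-ready predicate. The `Thread` type, the `"success"` status string, and the function names are assumptions made for the example; a real loop would map them onto whatever its check command returns.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Thread:
    """One review thread (hypothetical shape)."""
    resolved: bool
    blocking: bool = True


def unmet_conditions(
    required_checks: Dict[str, str],
    threads: List[Thread],
    has_conflicts: bool,
) -> List[str]:
    """Return the merge-ready conditions that do not hold yet."""
    unmet = []
    # Condition 1: every required check reports success.
    if any(status != "success" for status in required_checks.values()):
        unmet.append("ci")
    # Condition 2: no blocking thread is still open.
    if any(t.blocking and not t.resolved for t in threads):
        unmet.append("review")
    # Condition 3: the branch merges cleanly into base.
    if has_conflicts:
        unmet.append("conflicts")
    return unmet


def is_merge_ready(
    required_checks: Dict[str, str],
    threads: List[Thread],
    has_conflicts: bool,
) -> bool:
    """Exit condition: all three conditions hold."""
    return not unmet_conditions(required_checks, threads, has_conflicts)
```

Returning the list of unmet conditions, rather than a bare boolean, gives the loop something concrete to act on in its next pass.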
What guardrails keep it honest with reviewers?
A loop optimizing for "merge-ready" has tempting cheats: resolve review threads without actually addressing them, or weaken a test a reviewer flagged. Either one makes the PR look ready while making the codebase worse. These guardrails prevent that:
- A human merge gate. PR Babysitter makes the PR mergeable; a person hits merge. The loop never self-merges.
- Protected paths and a coverage floor. The loop can't silence a reviewer by gutting tests.
- An address-don't-dismiss policy. Resolving a thread requires a real code change or a substantive reply, not a silent close.
- A max-iterations cap. A PR stuck in an endless review cycle eventually pages a human instead of grinding forever.
With those, the loop becomes a tireless contributor that does the tedious follow-ups, while final judgment stays with you. For the failure modes to avoid in feedback-driven loops, see why agents loop forever.
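Three of these guardrails reduce to checks the loop can run before it pushes or resolves anything. The sketch below is one possible shape, under stated assumptions: the glob patterns, the 80% floor, the 20-character reply threshold, and all function names are hypothetical, not values taken from the recipe.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional

# Hypothetical protected globs; a real project would choose its own.
PROTECTED_GLOBS = [".github/workflows/*", "coverage.cfg"]


def protected_violations(
    changed_paths: Iterable[str],
    protected_globs: Iterable[str] = PROTECTED_GLOBS,
) -> List[str]:
    """Return changed paths that match a protected glob."""
    globs = list(protected_globs)
    return [p for p in changed_paths if any(fnmatchcase(p, g) for g in globs)]


def coverage_ok(coverage_after: float, floor: float = 80.0) -> bool:
    """Coverage floor: reject a push that drops coverage below the floor."""
    return coverage_after >= floor


def may_resolve_thread(
    commit_sha: Optional[str],
    reply: str,
    min_reply_chars: int = 20,
) -> bool:
    """Address-don't-dismiss: require a linked commit or a substantive reply."""
    # Reply length is a crude stand-in for "substantive" (an assumption).
    return bool(commit_sha) or len(reply.strip()) >= min_reply_chars
```

A loop would call `protected_violations` and `coverage_ok` before each push and `may_resolve_thread` before closing a thread, refusing the action and escalating when a check fails.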
How do you run it?
The /skills/pr-babysitter recipe ships the full loop: a goal (drive PR #N to merge-ready), an exit condition (green CI plus resolved threads), a max-iterations cap, a check command for reading PR state, and a paste-ready kickoff prompt. Run it with Claude Code's /loop primitive, or put it on a schedule to poll the PR every few minutes. It sits alongside 150+ other recipes in the Loops channel. GitHub's own pull request review docs describe the review states the loop reads.
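The recipe's own check command is not reproduced here. As a rough idea of what reading PR state could look like, the sketch below builds a GitHub CLI call and summarizes its JSON output. It assumes `gh pr view --json` exposes the `mergeable`, `reviewDecision`, and `statusCheckRollup` fields with the value names used below; verify them against `gh pr view --help` before relying on this. The function names are hypothetical. Note that `reviewDecision` is only a coarse proxy: per-thread resolution state is not among these fields and would need a separate query.

```python
import json
from typing import Any, Dict, List

# Outcomes treated as green (assumed value names from the GitHub CLI).
PASSING = {"SUCCESS", "NEUTRAL", "SKIPPED"}


def gh_check_command(pr_number: int) -> List[str]:
    """Build the argv for reading PR state with the GitHub CLI."""
    fields = "mergeable,reviewDecision,statusCheckRollup"
    return ["gh", "pr", "view", str(pr_number), "--json", fields]


def summarize_pr(raw_json: str) -> Dict[str, Any]:
    """Reduce the CLI's JSON output to the signals the loop cares about."""
    data = json.loads(raw_json)
    not_green = []
    for check in data.get("statusCheckRollup") or []:
        # Check runs carry "conclusion"; commit statuses carry "state".
        outcome = check.get("conclusion") or check.get("state") or check.get("status")
        if outcome not in PASSING:
            not_green.append(check.get("name") or check.get("context") or "unknown")
    return {
        "not_green": not_green,  # failing or still-pending checks
        "changes_requested": data.get("reviewDecision") == "CHANGES_REQUESTED",
        "has_conflicts": data.get("mergeable") == "CONFLICTING",
    }
```

To use it, run the command with `subprocess.run(gh_check_command(n), capture_output=True, text=True, check=True)` and pass the resulting `stdout` to `summarize_pr`.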
Frequently Asked Questions
Does PR Babysitter merge the PR itself?
It should not. The safe pattern is for the loop to make the PR merge-ready and leave the actual merge to a human, backed by a branch-protection rule that enforces the gate, keeping final judgment out of the loop.
How does it know what a reviewer wants?
It reads the review comments on the PR each pass and interprets them as change requests. Vague comments produce weaker results, so clear, specific review feedback makes the loop far more effective.
What stops it from gaming a reviewer's request?
Guardrails: protected paths and a coverage floor stop it from weakening tests, and an address-don't-dismiss policy means resolving a thread requires a real change or reply rather than a silent close.
Can it handle a PR with many reviewers?
Yes — it watches all review threads and works the open ones. A max-iterations cap ensures that a PR caught in genuinely conflicting feedback eventually escalates to a human instead of looping indefinitely.
Is this overkill for a solo project with no reviewers?
For solo work with no human review, a ship-PR-until-green loop usually suffices. PR Babysitter earns its keep when there are reviewers whose comments need answering.