UltraPlan and Plan Mode V2: Skills in the Loop
UltraPlan and Plan Mode V2 turn AI planning into a dedicated compute layer. Skills that participate in the planning phase become dramatically more valuable.
The most interesting shift happening in AI developer tooling is not about faster responses or bigger context windows. It is about planning becoming a first-class product layer. Recent CCLeaks analysis has surfaced two features -- UltraPlan and Plan Mode V2 -- that fundamentally change the relationship between skills and the agent execution lifecycle.
If you build skills today that only handle execution, you are building for yesterday's architecture. The next wave of valuable skills will participate in planning, execution, and verification as distinct phases.
A Note on Sources
The features described in this article come from CCLeaks community analysis of publicly available Claude Code artifacts. This content is AI-generated, may contain errors or misinterpretations, and is not affiliated with or endorsed by Anthropic. None of this should be treated as confirmed product features or roadmap items. We are reading signals and making educated inferences about architectural direction. The patterns described here are useful for skill design regardless of whether these specific features ship exactly as described.
Key Takeaways
- UltraPlan dedicates up to 30 minutes of separate cloud compute to exploration and planning before any code is written
- Plan Mode V2 structures work into 5 phases: Interview, Planning, Execution, Verification, and Review
- Interview skills -- those that surface the right domain-specific questions before work begins -- are the highest-leverage opportunity in this architecture
- Verification skills are the most underserved category; almost no one builds them, yet Plan Mode V2 makes verification a first-class phase
- Skills that output structured, machine-readable data are directly useful to planning agents; prose-only output is not
UltraPlan: Planning as a Separate Compute Layer
The most striking finding is UltraPlan, which appears to work like this: Claude spins up a separate cloud instance that explores and plans for up to 30 minutes before returning a structured execution plan.
Read that again. Thirty minutes of dedicated planning compute before a single line of code is written.
This is a fundamental departure from the prompt-response paradigm. Today, when you ask Claude Code to implement a feature, it starts reading files and writing code almost immediately. Planning and execution are interleaved in a single stream of consciousness. UltraPlan separates them into distinct phases with dedicated resources.
Parallel Exploration Agents
The leaked details suggest UltraPlan uses parallel exploration agents -- three for Max/Enterprise tiers, one for standard users. These agents independently explore the codebase, investigate dependencies, map architectural patterns, and identify potential conflicts. Their findings are then synthesized into a unified plan. This pattern of multi-agent coordination -- multiple specialized agents working toward a shared goal -- is becoming a recurring theme in the ecosystem.
This is the pattern that experienced engineers follow instinctively. Before implementing a feature, they read the relevant code, check for similar patterns in the codebase, identify files that will need changes, and think about edge cases. UltraPlan automates this process and makes it systematic rather than ad hoc.
The Scratchpad System
A cross-worker knowledge exchange system called the scratchpad allows exploration agents to share findings with each other. This is related to the broader challenge of context management across long-running sessions. If Agent 1 discovers that the authentication module uses a specific pattern, Agent 2 can incorporate that knowledge when exploring the API layer.
This is significant for skill design because it means skills that produce structured, machine-readable analysis become inputs to the planning process. A skill that outputs "here are the 7 files you will need to modify and why" is directly useful to a planning agent in a way that a skill producing prose explanations is not.
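To make this concrete, here is a minimal sketch of what structured, machine-readable skill output could look like. The schema, field names, and skill name are illustrative assumptions, not a documented format -- the point is that a planning agent can consume this directly, where it could not consume a prose paragraph.

```python
import json

# Hypothetical skill output: analysis emitted as structured data rather than
# prose. Every field name here is an illustrative assumption, not a spec.
analysis = {
    "skill": "auth-impact-analysis",
    "files_to_modify": [
        {"path": "src/auth/session.py", "reason": "session token format changes"},
        {"path": "src/api/middleware.py", "reason": "validates the old token format"},
    ],
    "patterns_found": ["JWT with 15-minute expiry", "refresh tokens stored in Redis"],
    "risks": ["active sessions invalidated on deploy"],
}

# Serialized, this can be written to a shared location (such as a scratchpad)
# for other agents to read and build on.
print(json.dumps(analysis, indent=2))
```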
Plan Mode V2: The Five-Phase Workflow
While UltraPlan handles the compute-intensive exploration phase, Plan Mode V2 structures the entire workflow into five distinct phases. Each phase has a clear purpose and clear boundaries.
| Phase | Purpose | Skill Opportunity |
|---|---|---|
| Interview | Clarify requirements, surface ambiguities, establish constraints | Domain-specific question skills that surface the right constraints before work begins |
| Planning | Synthesize exploration into a concrete execution sequence | Architecture analysis and dependency mapping skills that provide structured input |
| Execution | Implement the plan step by step within established constraints | Constrained execution skills that produce verifiable, measurable outputs |
| Verification | Check whether implementation satisfies original requirements | Quality evaluation skills with clear criteria (compilation, structure, completeness, style) |
| Review | Present results for human review with decision audit trail | Summary and diff skills that map outputs back to original requirements |
Phase 1: Interview
This is new and significant. Before planning begins, the system conducts an interview to clarify requirements, surface ambiguities, and establish constraints. This phase is feature-gated behind a flag called tengu_plan_mode_interview_phase, suggesting it is still being tested and refined.
The interview phase means the system asks you questions before it starts working. What is the scope? Are there constraints? What does success look like? This mirrors how a capable contractor operates -- they scope the work before they estimate it.
For skill builders: Skills that generate good interview questions for specific domains become extremely valuable in this architecture. A skill that knows the right questions to ask about database migration, or API design, or performance optimization, produces better plans because it surfaces constraints that would otherwise be discovered mid-execution.
Phase 2: Planning
With interview data in hand, the system produces a structured execution plan. This is where UltraPlan's exploration results are synthesized into a concrete sequence of steps.
The leaked details mention A/B testing of plan size variants -- trim, cut, and cap -- feature flag experimentation that is a recurring pattern in how Anthropic iterates on Claude Code. It suggests active tuning of how detailed plans should be. Too detailed and the plan becomes rigid. Too abstract and it provides insufficient guidance during execution.
Phase 3: Execution
The plan is executed step by step. This is the phase that most current skills are designed for -- reading files, writing code, running tests. But in Plan Mode V2, execution happens within the constraints established by the planning phase. The agent follows the plan rather than improvising.
Phase 4: Verification
After execution completes, a separate verification phase checks whether the implementation actually satisfies the original requirements. This is not "did the tests pass" (though that is part of it). It is "does this implementation match what we planned, and does the plan match what was requested."
This phase is where skills that know how to evaluate quality become critical. A verification skill for API design might check for consistent error handling, proper authentication, and documentation coverage. A verification skill for frontend work might check for accessibility, responsive behavior, and design system compliance.
Phase 5: Review
The final phase presents results for human review. This is the handoff point where the system says "here is what I did, here is how it maps to what you asked for, and here are the decisions I made along the way."
UltraThink: Enhanced Reasoning Modes
Alongside UltraPlan, the analysis surfaced UltraThink Enhanced Reasoning with three modes: adaptive, enabled, and disabled. This appears to control how deeply the model reasons about individual steps within the execution phase.
In adaptive mode, the system decides how much reasoning each step requires. Simple file renames get minimal reasoning. Complex architectural decisions get deep analysis. This is resource-efficient -- not every operation needs extended thinking.
For skill builders, this means your skill's instructions can influence how much reasoning the model applies. A skill that clearly specifies when deep analysis is needed ("always evaluate security implications of database schema changes") versus when it is not ("use standard naming conventions without deliberation") helps the system allocate reasoning resources effectively.
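One way a skill could express this is a per-operation reasoning hint table. This is a hypothetical sketch -- the key names, depth labels, and the idea that Claude Code reads such a structure are all assumptions -- but it shows the shape of the guidance: mechanical operations marked for minimal deliberation, high-risk ones marked for deep analysis.

```python
# Hypothetical reasoning hints a skill might declare. The operation names and
# depth labels ("minimal", "moderate", "deep") are illustrative assumptions,
# not a documented Claude Code format.
REASONING_HINTS = {
    "rename_file": "minimal",        # mechanical change, no deliberation needed
    "apply_naming_convention": "minimal",  # follow standard conventions
    "schema_migration": "deep",      # always evaluate security and data-loss implications
    "api_contract_change": "deep",   # downstream consumers may break
}

def reasoning_for(operation: str) -> str:
    """Return the suggested reasoning depth, defaulting to moderate."""
    return REASONING_HINTS.get(operation, "moderate")
```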
Visual Feedback: The Rainbow Shimmer
A detail that might seem superficial but reveals product thinking: UltraPlan reportedly uses a 7-color rainbow shimmer animation during the planning phase. This is not decoration. When a system runs for up to 30 minutes before producing output, visual feedback is essential. Users need to know the system is working, not stuck.
Skill builders should internalize this principle. If your skill takes more than a few seconds to produce output, provide intermediate feedback. Progress indicators, status messages, even simple "analyzing file 3 of 12" updates prevent users from canceling a skill that is working correctly but silently.
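A minimal sketch of that principle, assuming a long-running skill that processes files one at a time: status lines go to stderr as work proceeds, so the final structured result on stdout stays clean.

```python
import sys

# Minimal progress-feedback sketch for a long-running skill. The analysis
# itself is a placeholder; the point is emitting intermediate status so the
# user knows the skill is working, not stuck.
def analyze_files(paths):
    results = []
    for i, path in enumerate(paths, start=1):
        print(f"analyzing file {i} of {len(paths)}: {path}", file=sys.stderr)
        results.append({"path": path, "ok": True})  # stand-in for real analysis
    return results

analyze_files(["a.py", "b.py", "c.py"])
```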
What This Means for Skill Design
The five-phase architecture changes the value calculus for skills. Here is how to think about it.
Interview Skills: The Highest Leverage Opportunity
Most skills today are execution skills. They take an input and produce an output. But the interview phase creates a new category: skills that help the system ask better questions before it starts working.
Consider a skill designed for database migration planning. In the interview phase, it would surface questions like:
- What is the current data volume in the affected tables?
- Are there running jobs or cron processes that depend on the current schema?
- What is the acceptable downtime window?
- Are there downstream consumers of this data that need migration coordination?
These questions are domain-specific. A generic planning system would not know to ask them. A domain-specific skill would. And asking them before planning begins produces dramatically better plans.
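Such a skill could be as simple as a keyword-triggered question bank. This is a hypothetical sketch -- the trigger keywords and the interface are assumptions -- but it shows how domain knowledge can be packaged as questions rather than answers.

```python
# Hypothetical interview skill for database migrations: given a task
# description, return the clarifying questions to ask before planning begins.
# The trigger keywords and function signature are illustrative assumptions.
MIGRATION_QUESTIONS = [
    "What is the current data volume in the affected tables?",
    "Are there running jobs or cron processes that depend on the current schema?",
    "What is the acceptable downtime window?",
    "Are there downstream consumers of this data that need migration coordination?",
]

def interview_questions(task):
    """Return migration questions if the task looks like a schema change."""
    keywords = ("migration", "schema", "alter table")
    if any(k in task.lower() for k in keywords):
        return MIGRATION_QUESTIONS
    return []
```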
Planning Skills: Research and Analysis
Skills that perform structured analysis become planning-phase inputs. A codebase architecture skill that maps dependencies, identifies patterns, and documents conventions gives the planning phase better raw material to work with.
The key design principle is structured output. Planning agents need machine-readable analysis, not prose. A skill that outputs a JSON structure of file dependencies is more useful in the planning loop than one that writes a paragraph explaining them.
Execution Skills: Constrained and Verifiable
Execution skills remain important, but they change character in a plan-first architecture. Instead of autonomous decision-making, execution skills work within plan constraints. This actually makes them easier to build -- the skill does not need to figure out what to do, only how to do it well.
The best execution skills in a plan-first world are the ones that produce verifiable outputs. If the plan says "add input validation to the user registration endpoint," the execution skill should produce code that a verification skill can objectively evaluate.
Verification Skills: The Missing Layer
Verification is the most underserved phase in the current skill ecosystem. Almost no one builds verification skills. But in Plan Mode V2, verification is a first-class phase, and skills that can evaluate whether an implementation meets its specification are extremely valuable.
Verification skills need clear evaluation criteria:
- Does the code compile and pass tests? (Basic)
- Does the implementation match the planned approach? (Structural)
- Does it handle the edge cases identified during planning? (Completeness)
- Does it maintain consistency with existing codebase patterns? (Style)
- Does it satisfy the original user requirement? (Semantic)
Each of those evaluation levels is a potential skill. The market has barely begun to explore this space.
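The five levels above can be sketched as a single verification pass over structured artifacts. The check implementations here are placeholders for real tooling (test runners, AST diffs, lint rules); the field names are assumptions. What matters is the structure: each level produces an objective boolean, and the overall verdict is their conjunction.

```python
# Sketch of a verification skill walking the five evaluation levels, from
# basic to semantic. Each check is a placeholder; field names are assumptions.
def verify(implementation, plan):
    checks = {
        "basic": implementation.get("tests_passed", False),
        "structural": implementation.get("approach") == plan.get("approach"),
        "completeness": set(plan.get("edge_cases", []))
                        <= set(implementation.get("edge_cases", [])),
        "style": implementation.get("follows_conventions", False),
        "semantic": implementation.get("satisfies_requirement", False),
    }
    return {"passed": all(checks.values()), "checks": checks}

result = verify(
    {"tests_passed": True, "approach": "middleware",
     "edge_cases": ["empty input"], "follows_conventions": True,
     "satisfies_requirement": True},
    {"approach": "middleware", "edge_cases": ["empty input"]},
)
```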
Building for the Planning Loop
If you want to build skills that participate in the planning loop rather than just the execution phase, here is the practical approach.
Design for Phase Awareness
Your skill should know which phase it is operating in. An architecture analysis skill behaves differently during planning (broad survey, dependency mapping) versus during verification (focused check against plan specifications). Build your skill with phase-appropriate behavior.
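A sketch of phase-appropriate behavior, assuming the skill receives the current phase as an input (the phase names follow the five-phase workflow described above; the dispatch mechanism itself is an assumption):

```python
# Hypothetical phase-aware skill: the same architecture skill behaves
# differently per phase. Phase names match the five-phase workflow; the
# calling convention and plan structure are illustrative assumptions.
def run_architecture_skill(phase, codebase, plan=None):
    if phase == "planning":
        # Broad survey: enumerate every module for the planner.
        return {"mode": "survey", "modules": sorted(codebase)}
    if phase == "verification" and plan is not None:
        # Focused check: did execution touch only the modules the plan named?
        touched = set(codebase) - set(plan.get("unchanged", []))
        unplanned = touched - set(plan["modules"])
        return {"mode": "check", "unplanned_changes": sorted(unplanned)}
    return {"mode": "noop"}
```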
Output Structured Data
Planning agents consume structured data. Verification agents compare structured expectations against structured results. Free-form prose is useful for the review phase but not for planning or verification. Design your skill to output structured data by default with optional prose summaries.
Declare Your Phase Affinity
Make it clear in your skill's description and documentation which phases it supports. "This skill is designed for the planning phase -- it analyzes codebase architecture and outputs a dependency map." Users and planning systems both benefit from explicit phase declarations.
Make Verification Possible
If you build an execution skill, design its output to be verifiable. This means clear success criteria, measurable outcomes, and structured artifacts that a verification skill can evaluate programmatically.
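As a sketch, a verifiable execution artifact might pair each success criterion with a concrete check, so a verification skill can evaluate the result without re-deriving intent. Every field and test name here is an illustrative assumption.

```python
# Sketch of a verifiable execution artifact: the skill reports what it did
# plus measurable success criteria. All names here are illustrative.
artifact = {
    "task": "add input validation to the user registration endpoint",
    "changed_files": ["api/register.py", "api/schemas.py"],
    "success_criteria": [
        {"criterion": "rejects empty email",
         "check": "test_register_empty_email", "met": True},
        {"criterion": "rejects weak password",
         "check": "test_register_weak_password", "met": True},
    ],
}

# A verification skill can evaluate this programmatically.
all_met = all(c["met"] for c in artifact["success_criteria"])
```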
The Bigger Picture
UltraPlan and Plan Mode V2 represent a maturation of AI developer tooling from "smart autocomplete" to "autonomous engineering partner." The five-phase workflow mirrors how experienced engineering teams operate: scope, plan, execute, verify, review.
Skills are not just "do this" anymore. They are "help me understand this, plan this, do this, and check this." The skill builders who recognize this shift and build for all five phases will capture disproportionate value in the marketplace.
The planning loop is the next frontier for AI skills. The tools to participate in it are emerging now. The question is whether you will build for it.
Frequently Asked Questions
What is UltraPlan in Claude Code?
UltraPlan is an unreleased feature discovered in CCLeaks source analysis where Claude spins up a separate cloud instance running Opus 4.6 that can explore and plan for up to 30 minutes. Users review the plan before execution begins locally.
What is Plan Mode V2?
Plan Mode V2 is a five-phase workflow: Interview (scope the problem), Planning (design the approach with parallel exploration agents), Execution (implement changes), Verification (validate results), and Review (assess outcomes). Max/Enterprise subscribers get 3 parallel exploration agents; standard users get 1.
How can skills participate in the planning phase?
Skills can participate by outputting structured data (dependency maps, architecture analyses, risk assessments) that planning agents consume. Declare your skill's phase affinity in its description and design phase-appropriate behavior -- broad surveys during planning, focused checks during verification.
What are verification skills and why do they matter?
Verification skills evaluate whether an implementation meets its specification. They check criteria like compilation success, structural match to the plan, edge case handling, codebase pattern consistency, and semantic satisfaction of user requirements. They are the most underserved skill category today, yet Plan Mode V2 makes verification a first-class phase.
What is the scratchpad system?
The scratchpad is a session-specific directory at /tmp/claude-{uid}/{cwd}/{sessionId}/scratchpad/ with owner-only permissions (0o700). It enables cross-worker knowledge exchange during multi-agent planning sessions and is automatically cleaned up when the session ends.
Explore production-ready AI skills at aiskill.market/browse or submit your own skill to the marketplace.