# Guardrails and Constraints: Keeping AI Skills on Track
Learn how to implement effective guardrails and constraints in AI skills to ensure reliable, safe, and predictable behavior in production environments.
AI skills are powerful tools that can automate complex workflows, generate content, and make decisions at scale. But with great power comes great responsibility. Without proper guardrails and constraints, skills can produce unexpected outputs, consume excessive resources, or even cause harm to your systems and data.
In this comprehensive guide, we will explore the art and science of implementing effective guardrails in your AI skills. You will learn how to define boundaries that keep skills focused, prevent runaway behavior, and ensure consistent, reliable outputs in production environments.
## Why Guardrails Matter
Before diving into implementation details, let us understand why guardrails are essential for production-ready AI skills.
### The Cost of Unconstrained AI
Consider a documentation generator skill without proper constraints. Given a simple request to "document this codebase," it might:
- Attempt to process millions of lines of code simultaneously
- Generate verbose documentation that overwhelms users
- Access sensitive files it should not touch
- Run for hours consuming API credits
- Output inconsistent formats across different runs
Each of these scenarios represents a failure mode that guardrails can prevent. The goal is not to limit the skill's capabilities but to channel them productively.
## Types of Guardrails
Guardrails fall into several categories, each addressing different aspects of skill behavior:
- Input Validation: Ensuring inputs meet expected formats and constraints
- Output Constraints: Limiting the scope and format of generated content
- Resource Limits: Controlling token usage, API calls, and execution time
- Safety Boundaries: Preventing access to sensitive data or dangerous operations
- Behavioral Constraints: Guiding the skill's decision-making process
## Input Validation Guardrails
The first line of defense is validating inputs before processing begins. Well-designed input validation catches problems early and provides clear feedback to users.
### Schema-Based Validation
Define explicit schemas for expected inputs. This approach makes requirements clear and enables automatic validation.
```yaml
# skill.md frontmatter
---
name: code-analyzer
description: Analyzes code for quality issues
arguments:
  - name: file_path
    type: string
    required: true
    pattern: "^[a-zA-Z0-9_/.-]+$"
    description: Path to the file to analyze
  - name: depth
    type: integer
    required: false
    default: 3
    min: 1
    max: 10
    description: Analysis depth level
  - name: categories
    type: array
    items:
      type: string
      enum: ["security", "performance", "style", "bugs"]
    description: Categories to check
---
```
The schema above enforces several constraints:
- File paths are restricted to a safe character set (letters, digits, `_ / . -`); pair the pattern with an explicit check that rejects `..` segments, since the allowed characters alone do not rule out path traversal
- Depth is bounded between 1 and 10
- Categories must be from a predefined list
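A schema like this is only useful if something enforces it before the skill does any work. The Python sketch below is illustrative rather than tied to any particular skill framework: `ARG_RULES` and `validate_args` are hypothetical names that mirror the frontmatter above, plus an explicit `..` check to cover the traversal case the character pattern alone cannot.

```python
import re

# Hypothetical runner-side mirror of the frontmatter schema above.
ARG_RULES = {
    "file_path": {"type": str, "required": True, "pattern": r"^[a-zA-Z0-9_/.-]+$"},
    "depth": {"type": int, "required": False, "default": 3, "min": 1, "max": 10},
    "categories": {"type": list, "required": False,
                   "enum": {"security", "performance", "style", "bugs"}},
}

def validate_args(args: dict) -> dict:
    """Return validated arguments with defaults applied, or raise ValueError."""
    validated = {}
    for name, rule in ARG_RULES.items():
        if name not in args:
            if rule["required"]:
                raise ValueError(f"Missing required argument: {name}")
            if "default" in rule:
                validated[name] = rule["default"]
            continue
        value = args[name]
        if not isinstance(value, rule["type"]):
            raise ValueError(f"{name} must be of type {rule['type'].__name__}")
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            raise ValueError(f"{name} contains disallowed characters")
        if name == "file_path" and ".." in value:
            raise ValueError("Path traversal segments ('..') are not allowed")
        if "min" in rule and not rule["min"] <= value <= rule["max"]:
            raise ValueError(f"{name} must be between {rule['min']} and {rule['max']}")
        if "enum" in rule and not set(value) <= rule["enum"]:
            raise ValueError(f"{name} contains values outside the allowed set")
        validated[name] = value
    return validated
```

With a check like this in place, `validate_args({"file_path": "../secrets.txt"})` fails fast with a clear message instead of reaching the analysis step.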
### Semantic Validation
Beyond format checking, validate that inputs make semantic sense for the task at hand.
```markdown
## Input Validation
Before processing, verify:
1. **File Existence**: Confirm the specified file exists and is readable
2. **File Type**: Ensure the file extension matches expected types (.py, .js, .ts, etc.)
3. **File Size**: Reject files larger than 100KB to prevent overwhelming analysis
4. **Content Check**: Verify the file contains actual code, not binary data
If validation fails, respond with a clear error message explaining:
- What was expected
- What was received
- How to fix the issue
```
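Those semantic checks translate naturally into a small pre-flight helper. This is a minimal sketch under the assumptions in the example (100KB limit, a handful of source-code extensions); `check_source_file` and `ALLOWED_EXTENSIONS` are names invented for illustration.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".py", ".js", ".ts", ".jsx", ".tsx"}
MAX_SIZE_BYTES = 100 * 1024  # 100KB, matching the example above

def check_source_file(path_str: str) -> Path:
    """Apply the semantic checks above, raising ValueError with an actionable message."""
    path = Path(path_str)
    if not path.is_file():
        raise ValueError(f"Expected an existing, readable file; got '{path_str}'. "
                         "Check the path and try again.")
    if path.suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Expected a file ending in {sorted(ALLOWED_EXTENSIONS)}; "
                         f"got '{path.suffix or 'no extension'}'.")
    if path.stat().st_size > MAX_SIZE_BYTES:
        raise ValueError(f"File is {path.stat().st_size} bytes; the limit is {MAX_SIZE_BYTES}. "
                         "Split the file or analyze a smaller portion.")
    # Cheap binary check: source code should not contain NUL bytes.
    if b"\x00" in path.read_bytes()[:1024]:
        raise ValueError("File appears to be binary, not source code.")
    return path
```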
### Defensive Parsing
When skills accept structured input like JSON or YAML, implement defensive parsing that handles malformed data gracefully.
```markdown
## Parsing Instructions
When parsing user-provided configuration:
1. Use strict parsing mode - reject unknown fields
2. Provide default values for optional fields
3. If parsing fails, explain the specific syntax error
4. Never execute or evaluate user-provided strings as code
5. Sanitize all string inputs before using in prompts
```
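For structured configuration, defensive parsing mostly comes down to strict field checking, explicit defaults, and never evaluating user-provided strings. Below is a minimal sketch for a JSON config; the field names and the `parse_config` helper are made up for illustration.

```python
import json

KNOWN_FIELDS = {"depth", "categories", "output_dir"}         # hypothetical config fields
DEFAULTS = {"depth": 3, "categories": ["security", "bugs"]}  # defaults for optional fields

def parse_config(raw: str) -> dict:
    """Parse a user-provided JSON config string defensively."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Explain the specific syntax error rather than failing generically.
        raise ValueError(f"Config is not valid JSON (line {exc.lineno}, column {exc.colno}): {exc.msg}")
    if not isinstance(data, dict):
        raise ValueError("Config must be a JSON object at the top level.")
    unknown = set(data) - KNOWN_FIELDS
    if unknown:
        # Strict mode: unknown fields are rejected, not silently ignored.
        raise ValueError(f"Unknown config fields: {sorted(unknown)}")
    # Never eval() or exec() user strings; treat every value as inert data.
    return {**DEFAULTS, **data}
```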
## Output Constraints
Controlling what skills produce is just as important as controlling what they accept. Output constraints ensure consistency and prevent runaway generation.
### Length Limits
Specify explicit length constraints for generated content.
```markdown
## Output Requirements
Generate documentation following these constraints:
- **Title**: Maximum 60 characters
- **Description**: 1-3 sentences, maximum 200 characters
- **Each Section**: 100-500 words
- **Total Document**: Maximum 2000 words
- **Code Examples**: Maximum 30 lines each
If content would exceed limits, prioritize the most important information and note that additional details are available upon request.
```
### Format Specifications
Define exact output formats to ensure consistency across runs.
````markdown
## Output Format
Structure your response as follows:
```json
{
  "summary": "One-sentence summary",
  "severity": "low" | "medium" | "high" | "critical",
  "issues": [
    {
      "line": number,
      "type": "string from: security|performance|style|bug",
      "message": "Description under 100 chars",
      "suggestion": "How to fix, under 200 chars"
    }
  ],
  "metrics": {
    "linesAnalyzed": number,
    "issuesFound": number,
    "estimatedFixTime": "string in format: Xh Ym"
  }
}
```
Do not include additional fields. Do not wrap in markdown code blocks.
````
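The contract only helps if the skill's output is actually checked against it before anything downstream consumes it. Here is a minimal post-processing sketch; `validate_review_output` is a hypothetical helper, and a real pipeline might repair or regenerate instead of raising.

```python
import json

ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}
ALLOWED_ISSUE_TYPES = {"security", "performance", "style", "bug"}
TOP_LEVEL_FIELDS = {"summary", "severity", "issues", "metrics"}

def validate_review_output(raw: str) -> dict:
    """Check a generated reply against the output contract above before using it."""
    report = json.loads(raw)                       # raises on malformed JSON
    if set(report) != TOP_LEVEL_FIELDS:
        raise ValueError(f"Expected exactly {sorted(TOP_LEVEL_FIELDS)}, got {sorted(report)}")
    if report["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"Invalid severity: {report['severity']!r}")
    for issue in report["issues"]:
        if issue["type"] not in ALLOWED_ISSUE_TYPES:
            raise ValueError(f"Invalid issue type: {issue['type']!r}")
        if len(issue["message"]) > 100 or len(issue["suggestion"]) > 200:
            raise ValueError("Issue message or suggestion exceeds its length limit")
    return report
```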
### Content Policies
Define what content is and is not appropriate for the skill to generate.
```markdown
## Content Guidelines
When generating content:
**Do:**
- Focus on technical accuracy
- Use professional language
- Cite sources when making claims
- Acknowledge uncertainty
**Do Not:**
- Generate personal opinions on controversial topics
- Include placeholder or lorem ipsum text
- Make claims about performance without data
- Reference external URLs that might be broken
```
## Resource Limits
AI skills can consume significant resources. Implementing resource limits protects both your budget and system stability.
### Token Budgets
Control how many tokens skills can use for input and output.
```markdown
## Resource Constraints
This skill operates under the following token budgets:
- **Input Context**: Maximum 50,000 tokens
- **Output Generation**: Maximum 4,000 tokens
- **Total Conversation**: Maximum 100,000 tokens
If the input exceeds the context limit:
1. Summarize or chunk the input
2. Process in multiple passes
3. Inform the user that full context was not possible
If approaching output limits:
1. Prioritize essential information
2. Use concise language
3. Offer to continue in a follow-up
```
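Enforcing a token budget requires some way to estimate token counts. The sketch below uses a deliberately crude characters-per-token heuristic (roughly four characters per token for English text); a real implementation would use the model's own tokenizer, and all names here are illustrative.

```python
MAX_INPUT_TOKENS = 50_000   # budgets mirror the example above
MAX_OUTPUT_TOKENS = 4_000

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token; swap in a real tokenizer in practice.
    return max(1, len(text) // 4)

def plan_input(text: str) -> list[str]:
    """Return the input whole if it fits the budget, otherwise split it into chunks
    for multiple passes (and the skill should tell the user the full context did not fit)."""
    if estimate_tokens(text) <= MAX_INPUT_TOKENS:
        return [text]
    chunk_chars = MAX_INPUT_TOKENS * 4
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
```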
### Execution Timeouts
Prevent skills from running indefinitely.
```markdown
## Timing Constraints
- **Maximum execution time**: 2 minutes
- **API call timeout**: 30 seconds per call
- **Maximum retries**: 3 attempts per operation
If timeout approaches:
1. Save progress to state
2. Provide partial results
3. Offer to resume in next invocation
Never run background processes or polling loops.
```
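A simple way to honor a wall-clock limit is to track a deadline and stop with partial results before it passes. The sketch below uses the two-minute limit from the example; `run_with_deadline` and `process_item` are hypothetical names.

```python
import time

MAX_RUNTIME_SECONDS = 120   # 2 minutes, as in the constraints above

def run_with_deadline(work_items, process_item):
    """Process items until done or the deadline nears, then return partial results."""
    deadline = time.monotonic() + MAX_RUNTIME_SECONDS
    results, remaining = [], list(work_items)
    while remaining:
        # Leave a small buffer so there is time to save state and report progress.
        if time.monotonic() > deadline - 5:
            return {"complete": False, "results": results, "remaining": remaining}
        results.append(process_item(remaining.pop(0)))
    return {"complete": True, "results": results, "remaining": []}
```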
### Rate Limiting
Control how frequently skills can perform certain operations.
```markdown
## Rate Limits
To prevent abuse and ensure fair resource usage:
- **File reads**: Maximum 50 files per invocation
- **External API calls**: Maximum 10 per invocation
- **Database queries**: Maximum 20 per invocation
- **Generated files**: Maximum 5 per invocation
If limits are reached, prioritize the most important operations and report what was skipped.
```
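Per-invocation caps can be enforced with nothing more than counters at the point where operations are performed. The `InvocationLimiter` class below is a hypothetical helper that mirrors the limits above and records what was skipped so it can be reported at the end.

```python
from collections import Counter

# Per-invocation caps from the example above.
LIMITS = {"file_read": 50, "api_call": 10, "db_query": 20, "file_write": 5}

class InvocationLimiter:
    def __init__(self):
        self.counts = Counter()
        self.skipped = Counter()

    def allow(self, op: str) -> bool:
        """Record an operation; return False (and count it as skipped) once its cap is hit."""
        if self.counts[op] >= LIMITS[op]:
            self.skipped[op] += 1
            return False
        self.counts[op] += 1
        return True

    def skipped_report(self) -> str:
        return ", ".join(f"{op}: {n} skipped" for op, n in self.skipped.items()) or "nothing skipped"
```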
## Safety Boundaries
Safety guardrails prevent skills from accessing or modifying things they should not touch.
### File System Boundaries
Define what parts of the file system skills can access.
```markdown
## File Access Rules
**Allowed Paths:**
- Current working directory and subdirectories
- /tmp for temporary files
- Explicitly specified paths in configuration
**Forbidden Paths:**
- Home directory hidden files (.*rc, .ssh, .aws)
- System directories (/etc, /usr, /bin)
- Other users' directories
- Paths containing ".." (no directory traversal)
**File Operations:**
- Read: Allowed for source code, config, and doc files
- Write: Only to explicitly specified output paths
- Delete: Never, unless explicitly confirmed by user
- Execute: Never execute files or run shell commands
```
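Rules like these are easiest to enforce at a single choke point that every read and write goes through. Here is a minimal sketch, assuming Python 3.9+ for `Path.is_relative_to`; the forbidden-name list is illustrative, not exhaustive.

```python
from pathlib import Path

FORBIDDEN_NAMES = {".ssh", ".aws", ".env"}   # illustrative deny-list of sensitive names

def is_path_allowed(candidate: str) -> bool:
    """Resolve the path (collapsing any '..') and check it against the rules above."""
    path = Path(candidate).resolve()
    if any(part in FORBIDDEN_NAMES for part in path.parts):
        return False
    # Only the current project directory and /tmp are permitted roots.
    allowed_roots = [Path.cwd().resolve(), Path("/tmp").resolve()]
    return any(path.is_relative_to(root) for root in allowed_roots)
```

Because `resolve()` collapses `..` segments before the check runs, a path like `src/../../etc/passwd` is judged by where it actually points rather than by how it is written.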
### Data Sensitivity
Handle sensitive data appropriately.
```markdown
## Sensitive Data Handling
**Never include in output:**
- API keys, tokens, or secrets
- Passwords or credentials
- Personal identifying information
- Private keys or certificates
- Database connection strings
**When sensitive data is detected:**
1. Replace with placeholder: [REDACTED]
2. Note that sensitive data was detected
3. Suggest secure handling alternatives
**In logs and error messages:**
- Truncate file paths to basenames
- Hash or omit sensitive values
- Never log full request/response bodies
```
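Redaction is usually implemented as a scrub pass over anything that leaves the skill. The patterns below are deliberately simple examples; real secret scanning should lean on a maintained rule set and still assume some secrets will slip through.

```python
import re

# Illustrative patterns only; production scanning needs a maintained rule set.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|secret|password)\s*[:=]\s*\S+"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)(postgres|mysql|mongodb)://\S+"),
]

def redact(text: str) -> tuple[str, bool]:
    """Replace anything that looks like a secret with [REDACTED]; report whether anything matched."""
    found = False
    for pattern in SECRET_PATTERNS:
        text, count = pattern.subn("[REDACTED]", text)
        found = found or count > 0
    return text, found
```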
### Permission Escalation Prevention
Ensure skills cannot gain elevated permissions.
```markdown
## Permission Boundaries
This skill operates with the permissions of the invoking user.
**Prohibited Actions:**
- Requesting additional permissions
- Modifying permission settings
- Accessing resources requiring elevated privileges
- Acting on behalf of other users
- Modifying authentication configurations
If an operation would require elevated permissions:
1. Stop the operation
2. Explain what permission is needed
3. Provide instructions for the user to perform it manually
```
## Behavioral Constraints
Beyond technical limits, behavioral constraints guide how skills make decisions and handle edge cases.
### Decision Boundaries
Define the scope of decisions skills are allowed to make.
```markdown
## Decision Authority
**Skill May Decide:**
- Code formatting and style
- Documentation structure
- Test organization
- Naming conventions within established patterns
**Skill Must Ask User:**
- Architectural changes
- Dependency additions
- Breaking API changes
- Deletion of any code
- Changes to security-related code
**Skill Must Refuse:**
- Bypassing tests or linting
- Ignoring type errors
- Suppressing warnings without explanation
- Implementing known anti-patterns
```
### Uncertainty Handling
Guide skills on what to do when uncertain.
```markdown
## Handling Uncertainty
When confidence is low:
1. **Express uncertainty clearly**: "I am not certain, but..."
2. **Provide alternatives**: Offer 2-3 possible interpretations
3. **Ask for clarification**: Request specific information needed
4. **Default to safe options**: Choose the more conservative approach
5. **Document assumptions**: Clearly state what was assumed
Never present uncertain information as fact. Use hedge words appropriately: "likely," "possibly," "appears to," "might be."
```
### Failure Modes
Define how skills should behave when things go wrong.
````markdown
## Failure Handling
When an error occurs:
1. **Catch and classify**: Identify if the error is recoverable
2. **Preserve state**: Save any partial progress
3. **Provide context**: Explain what was being attempted
4. **Suggest recovery**: Offer specific next steps
5. **Fail gracefully**: Never crash or hang
**Error Response Format:**
```json
{
  "success": false,
  "error": {
    "type": "validation|resource|permission|external|internal",
    "message": "Human-readable explanation",
    "recoverable": boolean,
    "suggestions": ["List of recovery options"]
  },
  "partialResults": {} // Any usable output
}
```
````
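One way to guarantee a consistent failure shape is to run each operation through a wrapper that converts exceptions into this response format. The `run_safely` and `classify` helpers below are hypothetical, and the mapping from exception class to error type is a simplification.

```python
def classify(exc: Exception) -> str:
    """Map an exception to one of the error types in the format above (simplified)."""
    if isinstance(exc, ValueError):
        return "validation"
    if isinstance(exc, PermissionError):
        return "permission"
    if isinstance(exc, TimeoutError):
        return "resource"
    return "internal"

def run_safely(operation, *, partial_results=None) -> dict:
    """Run an operation and convert any failure into the structured error response."""
    try:
        return {"success": True, "result": operation()}
    except Exception as exc:                        # fail gracefully, never crash
        kind = classify(exc)
        return {
            "success": False,
            "error": {
                "type": kind,
                "message": str(exc) or exc.__class__.__name__,
                "recoverable": kind in {"validation", "resource"},
                "suggestions": ["Fix the reported issue and re-run the skill"],
            },
            "partialResults": partial_results or {},
        }
```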
## Implementing Guardrails in Practice
Now let us look at how to implement these concepts in a real skill.
### Example: Constrained Code Reviewer
Here is a complete example of a code review skill with comprehensive guardrails:
```markdown
---
name: safe-code-reviewer
description: Reviews code for quality issues with safety constraints
version: 1.0.0
---
# Safe Code Reviewer
You are a code review assistant with strict operational boundaries.
## Input Constraints
Accept only:
- File paths matching: ^[a-zA-Z0-9_/.-]+\.(py|js|ts|jsx|tsx|go|rs|java)$
- Maximum 10 files per review
- Maximum 500 lines per file
- Files must be in the current project directory
Reject requests that:
- Target files outside the project
- Include binary files
- Exceed size limits
- Use path traversal patterns
## Review Scope
Focus ONLY on:
1. Obvious bugs and errors
2. Security vulnerabilities (OWASP Top 10)
3. Performance anti-patterns
4. Missing error handling
5. Code style inconsistencies
Do NOT comment on:
- Architectural decisions (suggest separate discussion)
- Personal style preferences
- Hypothetical future problems
- Unrelated files or systems
## Output Constraints
For each issue found:
- Severity: low, medium, high, critical
- Location: file:line
- Description: Maximum 100 characters
- Suggestion: Maximum 200 characters
Maximum issues to report: 20 per file
If more issues exist, note the count and suggest focusing on high-severity first.
## Safety Rules
NEVER:
- Suggest changes to authentication code without security review flag
- Recommend removing validation or sanitization
- Propose disabling security features
- Generate code that executes user input
ALWAYS:
- Flag potential security issues for human review
- Recommend security best practices
- Preserve existing safety checks
- Note when uncertain about security implications
## Resource Limits
- Maximum execution: 60 seconds
- Maximum output tokens: 8,000
- Maximum file reads: 10
If limits are approached, prioritize by severity and report partial results.
```
## Testing Guardrails
Guardrails should be tested just like any other code. Create test cases for each constraint.
```markdown
## Guardrail Test Cases
### Input Validation Tests
- [ ] Reject path traversal: "../../../etc/passwd"
- [ ] Reject absolute paths outside project: "/home/user/other"
- [ ] Reject binary files: "image.png"
- [ ] Accept valid project paths: "src/utils/helpers.ts"
- [ ] Enforce file count limit: 11 files should fail
- [ ] Enforce line limit: 501 line file should fail
### Output Constraint Tests
- [ ] Verify JSON format is valid
- [ ] Check issue descriptions under 100 chars
- [ ] Confirm maximum 20 issues per file
- [ ] Validate severity values are from allowed set
### Safety Boundary Tests
- [ ] Verify secrets are redacted in output
- [ ] Confirm no access to dotfiles
- [ ] Check security issues get flagged
- [ ] Ensure no code execution suggestions
### Resource Limit Tests
- [ ] Timeout after 60 seconds
- [ ] Stop at token limit with partial results
- [ ] Respect file read count limit
```
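Checklists like these turn into executable tests almost directly. The pytest sketch below assumes the earlier path and redaction sketches live in a hypothetical `guardrails` module; adjust the imports to wherever your own checks live.

```python
import pytest

# Assumes the earlier sketches are collected in a hypothetical `guardrails` module.
from guardrails import is_path_allowed, redact

@pytest.mark.parametrize("path", [
    "../../../etc/passwd",     # path traversal
    "/home/user/other",        # absolute path outside the project
    "/etc/shadow",             # system directory
])
def test_rejects_forbidden_paths(path):
    assert not is_path_allowed(path)

def test_accepts_project_paths(tmp_path, monkeypatch):
    monkeypatch.chdir(tmp_path)          # treat the temp dir as the project root
    assert is_path_allowed(str(tmp_path / "src" / "helpers.ts"))

def test_secrets_are_redacted():
    cleaned, found = redact("api_key = sk-123456")
    assert found and "[REDACTED]" in cleaned
```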
## Common Guardrail Patterns
Over time, certain guardrail patterns emerge as particularly useful. Here are patterns you can adapt for your skills.
### The Allowlist Pattern
Instead of blocking bad inputs, explicitly define what is allowed.
```markdown
## Allowed Operations
This skill may only perform the following operations:
1. Read files with extensions: .py, .js, .ts, .md, .json, .yaml
2. Write files to: ./output/, ./docs/, ./tests/
3. Call APIs: GitHub (read-only), npm (read-only)
Any operation not explicitly listed above is prohibited.
```
### The Budget Pattern
Allocate resources in a budget that depletes with use.
```markdown
## Resource Budget
Starting budget: 100 units
Costs:
- File read: 1 unit
- File write: 5 units
- API call: 10 units
- Complex analysis: 20 units
When budget reaches 10 units:
- Warn user about remaining capacity
- Prioritize essential operations
- Offer to stop and report progress
When budget reaches 0:
- Stop all operations
- Report what was completed
- Provide summary of remaining work
```
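The budget pattern is a thin layer over a counter that charges a different cost per operation type. Here is a sketch using the numbers from the example; the class and return values are illustrative.

```python
COSTS = {"file_read": 1, "file_write": 5, "api_call": 10, "complex_analysis": 20}

class Budget:
    def __init__(self, units: int = 100, low_water_mark: int = 10):
        self.remaining = units
        self.low_water_mark = low_water_mark

    def spend(self, operation: str) -> str:
        """Charge the operation and return 'ok', 'low' (warn the user), or 'exhausted' (stop)."""
        cost = COSTS[operation]
        if self.remaining < cost:
            return "exhausted"           # stop, report completed and remaining work
        self.remaining -= cost
        return "low" if self.remaining <= self.low_water_mark else "ok"
```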
### The Circuit Breaker Pattern
Stop operating when too many consecutive errors occur.
```markdown
## Circuit Breaker
Track consecutive failures:
- After 3 failures: Enter cautious mode, slow down operations
- After 5 failures: Stop and diagnose
- After any success: Reset counter
In cautious mode:
- Add extra validation
- Double-check assumptions
- Reduce batch sizes
- Increase verbosity for debugging
```
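A circuit breaker needs only a consecutive-failure counter and the two thresholds described above. A minimal sketch; the class and mode names are illustrative.

```python
class CircuitBreaker:
    """Track consecutive failures and report which mode the skill should be in."""

    def __init__(self, cautious_after: int = 3, stop_after: int = 5):
        self.failures = 0
        self.cautious_after = cautious_after
        self.stop_after = stop_after

    def record(self, success: bool) -> str:
        """Return the current mode: 'normal', 'cautious', or 'stopped'."""
        self.failures = 0 if success else self.failures + 1
        if self.failures >= self.stop_after:
            return "stopped"      # stop and diagnose
        if self.failures >= self.cautious_after:
            return "cautious"     # extra validation, smaller batches, more logging
        return "normal"
```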
### The Audit Trail Pattern
Log decisions for review and accountability.
```markdown
## Decision Logging
For each significant decision, record:
1. What was decided
2. Why (reasoning)
3. What alternatives existed
4. Confidence level
5. Timestamp
Format:
[TIMESTAMP] DECISION: {what} REASON: {why} CONFIDENCE: {level}
This log should be included in final output for transparency.
```
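The audit trail can be an append-only list of formatted entries that gets attached to the skill's final output. Here is a sketch using the format above; the optional `alternatives` argument is only an illustration of how the other recorded items could be folded into the same line, not part of the stated format.

```python
from datetime import datetime, timezone

def log_decision(log: list, what: str, why: str, confidence: str, alternatives=None) -> None:
    """Append an audit entry in the [TIMESTAMP] DECISION/REASON/CONFIDENCE format above."""
    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = f"[{timestamp}] DECISION: {what} REASON: {why} CONFIDENCE: {confidence}"
    if alternatives:
        entry += f" ALTERNATIVES: {', '.join(alternatives)}"
    log.append(entry)

# Example: the accumulated list is appended to the skill's final output.
audit_log: list[str] = []
log_decision(audit_log, "skip generated files", "matched .gitignore patterns", "high",
             alternatives=["analyze everything", "ask the user"])
```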
## Guardrails for Different Skill Types
Different types of skills require different guardrail emphases.
### Generation Skills
Skills that create content need strong output constraints.
```markdown
## Generation Guardrails
- Define exact output format with schema
- Set strict length limits
- Specify content policies
- Require source attribution
- Validate generated code syntax
- Check for common generation errors (repetition, hallucination)
```
### Analysis Skills
Skills that examine data need input protection.
```markdown
## Analysis Guardrails
- Validate input format and size
- Set processing time limits
- Define scope boundaries clearly
- Protect sensitive information in output
- Handle malformed input gracefully
- Limit scope creep during analysis
```
### Integration Skills
Skills that connect to external systems need robust safety measures.
```markdown
## Integration Guardrails
- Allowlist permitted endpoints
- Validate all external data
- Set strict timeouts
- Implement retry limits
- Log all external calls
- Never pass credentials in URLs
- Sanitize data before sending
```
### Modification Skills
Skills that change files or state need the most careful guardrails.
```markdown
## Modification Guardrails
- Require explicit confirmation for changes
- Create backups before modifying
- Limit scope of changes per run
- Validate changes before applying
- Provide rollback instructions
- Never delete without confirmation
- Log all modifications with before/after
```
## Conclusion
Guardrails and constraints are not limitations on your skills—they are features that make skills reliable, trustworthy, and production-ready. Well-designed guardrails:
- Prevent accidents before they happen
- Build trust with users who rely on consistent behavior
- Enable automation by making skills predictable
- Reduce support burden through clear error messages
- Protect resources from runaway consumption
Start with conservative guardrails and loosen them as you gain confidence. It is always easier to relax constraints than to add them after something goes wrong.
Remember: the best guardrail is one that users never notice because it prevents problems silently in the background. Design your guardrails to be invisible during normal operation but informative when triggered.
As you build more skills, develop a library of guardrail patterns that work for your use cases. Document what has worked and what has not. Share your learnings with the community so we can all build safer, more reliable AI systems together.