MCP Best Practices for Skill Builders
Design reliable Model Context Protocol tools that integrate cleanly with AI skills. Practical patterns for error handling, schema design, and testing MCP servers.
Model Context Protocol servers are the connective tissue between AI agents and external services. When they work well, they're invisible. When they fail, everything built on top of them breaks. The difference between reliable MCP tools and fragile ones comes down to a handful of design decisions made early in development.
Review hundreds of MCP implementations across skill registries and open-source repositories and clear patterns emerge. The best MCP servers share common traits. The broken ones share common mistakes.
Key Takeaways
- Schema design determines usability -- AI agents can only use tools they understand, and understanding comes from clear parameter schemas
- Error handling must be agent-friendly, returning structured messages that AI can interpret and act on, not stack traces
- Idempotent operations prevent cascading failures when AI agents retry tools after timeouts or partial failures
- Testing MCP servers requires testing with actual AI agents, not just unit tests against the function signatures
- Resource management is critical because AI agents don't manage connections, file handles, or rate limits the way human developers do
Why MCP Design Matters for Skills
A skill tells an AI agent what to do. An MCP server gives it the tools to do it. When you build a skill that depends on MCP tools, the quality of those tools directly determines the quality of the skill experience.
Consider a debugging skill that uses an MCP server to connect to a running application's logs. If the MCP tool returns unstructured log text, the AI has to parse it heuristically. If it returns structured log entries with severity levels, timestamps, and stack frames, the AI can reason about them precisely. Same skill, dramatically different outcomes based on tool design.
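To make that concrete, here is what a structured log entry might look like and the kind of precise filtering it enables. The field names are illustrative, not part of any MCP specification:

```python
# A hypothetical structured log entry an MCP tool might return.
# Field names here are illustrative, not mandated by MCP.
entry = {
    "timestamp": "2024-05-01T12:03:44Z",
    "severity": "error",
    "message": "Connection refused",
    "stack": ["app/db.py:42 in connect", "app/main.py:10 in startup"],
}

def is_actionable(e: dict) -> bool:
    # The agent can filter on fields directly instead of
    # regex-matching raw log text.
    return e["severity"] in ("error", "fatal")

print(is_actionable(entry))  # True
```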
Schema Design: Make Tools Self-Documenting
Write Descriptions Like You're Teaching
Every MCP tool has a name, description, and parameter schema. The description isn't documentation for humans reading your README. It's the primary source of information the AI agent uses to decide when and how to call your tool.
Bad descriptions tell the AI what the tool does in developer shorthand:
```json
{
  "name": "query_db",
  "description": "Query the database"
}
```
Good descriptions tell the AI when to use the tool, what it needs, and what to expect back:
```json
{
  "name": "query_database",
  "description": "Execute a read-only SQL query against the application database. Use this when the user needs to inspect data, check record counts, or verify database state. Returns rows as JSON arrays. Maximum 1000 rows per query. Does not support INSERT, UPDATE, or DELETE."
}
```
The second description gives the AI enough context to make correct decisions about tool selection without additional prompting.
Use Enums and Constraints Aggressively
AI agents work best with constrained parameter spaces. Instead of accepting a freeform string for a log level, define an enum:
```json
{
  "level": {
    "type": "string",
    "enum": ["debug", "info", "warn", "error", "fatal"],
    "description": "Minimum log severity to include in results"
  }
}
```
Enums prevent the AI from inventing values like "critical" or "WARNING" that your server doesn't handle. Every invalid parameter value is a wasted round trip and a degraded user experience.
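On the server side, it helps to enforce the same vocabulary so a stray value fails fast with an actionable message rather than silently misbehaving. A minimal guard sketch, assuming the enum above:

```python
# Mirror the schema's enum in a server-side check so out-of-vocabulary
# values are rejected immediately with a clear message.
ALLOWED_LEVELS = ("debug", "info", "warn", "error", "fatal")

def validate_level(level: str) -> str:
    if level not in ALLOWED_LEVELS:
        raise ValueError(
            f"invalid level {level!r}; expected one of {list(ALLOWED_LEVELS)}"
        )
    return level

print(validate_level("warn"))  # warn
```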
Provide Defaults That Make Sense
Not every parameter needs to be required. Provide sensible defaults for parameters that have obvious common values:
```json
{
  "limit": {
    "type": "integer",
    "default": 50,
    "description": "Maximum number of results to return. Default 50, maximum 500."
  }
}
```
Defaults reduce the cognitive load on the AI agent and make tools usable without extensive configuration in the skill instructions.
Error Handling: Think Like an Agent
Return Structured Errors
When an MCP tool fails, the AI agent needs to understand what went wrong and whether to retry, try a different approach, or report the failure to the user. Stack traces don't help with this. Structured error responses do.
```json
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "API rate limit exceeded. 429 response from GitHub API.",
    "retryable": true,
    "retryAfterSeconds": 60,
    "suggestion": "Wait 60 seconds before retrying, or reduce the scope of the query."
  }
}
```
The retryable and suggestion fields are particularly valuable. They give the AI agent actionable information instead of forcing it to guess whether a retry is appropriate.
Distinguish Between User Errors and System Errors
An AI agent should handle a missing required parameter differently than a database connection failure. Your error responses should make this distinction clear:
- Validation errors (4xx equivalent): The request was malformed. The AI should fix the parameters and retry.
- Authorization errors: The user needs to provide credentials. The AI should ask the user.
- System errors (5xx equivalent): Something broke on the server side. The AI should report the problem.
- Rate limit errors: The request was valid but throttled. The AI should wait and retry.
Each category requires a different agent response. If your error messages don't distinguish between them, the AI has to guess, and it often guesses wrong.
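The mapping from error category to agent action can be made explicit. A minimal sketch, with illustrative error codes rather than a standard vocabulary:

```python
# Category-aware error handling: each structured error code maps to a
# distinct agent action. Codes and action names are illustrative.
AGENT_ACTION = {
    "VALIDATION_ERROR": "fix_parameters_and_retry",
    "AUTH_ERROR": "ask_user_for_credentials",
    "SYSTEM_ERROR": "report_to_user",
    "RATE_LIMITED": "wait_and_retry",
}

def next_step(error: dict) -> str:
    # Unknown codes fall back to reporting, the safest default.
    return AGENT_ACTION.get(error.get("code"), "report_to_user")

print(next_step({"code": "RATE_LIMITED", "retryAfterSeconds": 60}))
# wait_and_retry
```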
Idempotency: Design for Retries
AI agents retry operations. Sometimes the agent retries because it received a timeout but the operation actually succeeded. Sometimes it retries because the user asked for the same thing again. Sometimes it retries because a previous step in a multi-step workflow failed and the agent is re-executing from the beginning.
Every write operation in your MCP server should be idempotent or should clearly document when it isn't. Use idempotency keys, check-before-write patterns, or upsert semantics to ensure that retrying a tool call doesn't create duplicate records, send duplicate messages, or trigger duplicate side effects.
For tools that genuinely cannot be idempotent -- sending an email, for instance -- document this clearly in the tool description so the AI agent can confirm with the user before execution.
Resource Management
Connection Pooling
AI agents don't manage database connections. They call your tool, get a result, and move on. If every tool call opens a new database connection, you'll exhaust your database's connection limit within minutes during an active AI session.
Use connection pooling in your MCP server. Initialize connections when the server starts, reuse them across calls, and handle reconnection gracefully when connections drop.
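A minimal pooling sketch, using sqlite3 as a stand-in for your real database driver: connections are created once at startup and borrowed per call.

```python
import queue
import sqlite3

class Pool:
    """Create connections once at server startup; borrow per tool call."""

    def __init__(self, size: int = 5):
        self._q: queue.Queue = queue.Queue()
        for _ in range(size):
            self._q.put(sqlite3.connect(":memory:", check_same_thread=False))

    def run(self, sql: str):
        conn = self._q.get()      # borrow a connection
        try:
            return conn.execute(sql).fetchall()
        finally:
            self._q.put(conn)     # always return it, even on error

pool = Pool(size=2)
# Every tool call reuses pooled connections instead of opening new ones.
print(pool.run("SELECT 1"))  # [(1,)]
```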
Rate Limiting
If your MCP server wraps an external API with rate limits, implement client-side rate limiting in your server rather than relying on the AI agent to pace its requests. The AI doesn't know your API's rate limit, and even if you tell it, it can't reliably time its requests.
Implement a token bucket or sliding window rate limiter that queues requests when the limit is approached and returns a structured error when the queue is full.
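A token bucket fits in a dozen lines. The rate and capacity below are illustrative, not recommendations:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens replenished per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(12)]
# Roughly the burst capacity (10) succeeds; the rest are throttled.
print(results.count(True))
```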
Cleanup and Timeouts
Set timeouts on all external calls. An MCP tool that hangs indefinitely blocks the entire AI session. A tool that times out after 30 seconds and returns a structured error allows the AI to try a different approach.
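One way to sketch that pattern is to run the external call on a worker thread and convert a deadline miss into a structured error. The error shape here is an assumption, not mandated by MCP:

```python
import concurrent.futures
import time

_workers = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn, timeout_s: float) -> dict:
    """Run an external call with a hard deadline; a hang becomes an error."""
    future = _workers.submit(fn)
    try:
        return {"ok": True, "result": future.result(timeout=timeout_s)}
    except concurrent.futures.TimeoutError:
        return {
            "ok": False,
            "error": {
                "code": "TIMEOUT",
                "retryable": True,
                "suggestion": "Retry with a narrower query.",
            },
        }

def slow_call():
    time.sleep(0.5)  # simulate a hung upstream service
    return "done"

result = call_with_timeout(slow_call, timeout_s=0.1)
print(result["error"]["code"])  # TIMEOUT
```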
Testing MCP Servers
Unit Tests Are Necessary But Insufficient
Standard unit tests verify that your MCP tool functions produce correct results given correct inputs. That's the minimum. The harder question is whether an AI agent can use your tools effectively.
Test With Real AI Agents
The only reliable way to test MCP tool usability is to have an AI agent use the tools. Set up test scenarios where the agent needs to accomplish a task using your MCP server. Watch for:
- Tool selection errors: Does the agent pick the right tool for the task?
- Parameter construction errors: Does the agent provide valid parameters?
- Error recovery: Does the agent handle error responses appropriately?
- Multi-step coordination: Can the agent chain multiple tool calls to accomplish complex tasks?
If the agent consistently struggles with your tools, the problem is usually in your schema descriptions, not in the agent's reasoning.
Load Testing for AI Workloads
AI agents generate different load patterns than human users. A human might call your API once per minute. An AI agent might call it twenty times in ten seconds while working through a problem. Test your MCP server under burst conditions, not just sustained load.
Integration With Skills
When building AI skills that depend on MCP tools, follow these patterns:
Declare dependencies explicitly. Your skill documentation should list every MCP server it requires, with installation instructions. Don't assume the user has anything installed beyond the base AI assistant.
Provide fallback behavior. When an MCP tool isn't available, the skill should degrade gracefully rather than failing entirely. A code review skill that can't connect to GitHub through MCP should still be able to review code from the local filesystem.
Document the interaction model. Explain in your skill instructions how the AI should use the MCP tools. Don't just list the tools -- describe the workflow: "First query the database for recent errors, then fetch the relevant log entries, then analyze the pattern."
Common Mistakes
Returning Too Much Data
AI agents have context limits. An MCP tool that returns a 50,000-line log file is worse than useless -- it overwhelms the agent's context and degrades performance for the rest of the session. Always paginate large results and let the AI request more if needed.
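Cursor-based pagination keeps each response small while letting the agent walk forward on demand. A sketch with illustrative field names:

```python
# Stand-in for a large log store.
LOG_LINES = [f"line {i}" for i in range(50_000)]

def get_logs(cursor: int = 0, limit: int = 100) -> dict:
    page = LOG_LINES[cursor:cursor + limit]
    # A null nextCursor signals the agent that it has reached the end.
    next_cursor = cursor + limit if cursor + limit < len(LOG_LINES) else None
    return {"entries": page, "nextCursor": next_cursor, "total": len(LOG_LINES)}

first = get_logs()
print(len(first["entries"]), first["nextCursor"])  # 100 100
```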
Mixing Read and Write Operations
A single tool that both reads data and modifies it is dangerous in AI workflows. The agent might call it intending to read, not realizing it's also writing. Separate read operations from write operations into distinct tools.
Ignoring Authentication Lifecycle
MCP servers that require authentication need to handle token refresh, expired sessions, and re-authentication prompts gracefully. An AI agent can't click a browser login button. Design your auth flow so the agent can detect auth failures and instruct the user to re-authenticate.
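A sketch of surfacing an auth failure as an error the agent can act on. The token check is simulated; a real server would inspect the upstream API's 401/403 response instead:

```python
def call_api(token) -> dict:
    # Simulated credential check standing in for a real 401/403 from upstream.
    if not token:
        return {
            "error": {
                "code": "AUTH_REQUIRED",
                "retryable": False,
                "suggestion": "Ask the user to re-run the login flow, then call this tool again.",
            }
        }
    return {"data": {"status": "ok"}}

resp = call_api(token=None)
print(resp["error"]["code"])  # AUTH_REQUIRED
```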
FAQ
How many tools should a single MCP server expose?
Keep it focused. Five to fifteen tools covering a single domain is ideal. Beyond twenty tools, AI agents struggle with tool selection. If you need more, split into multiple focused servers.
Should MCP tools return raw data or formatted text?
Return structured data (JSON) whenever possible. Let the AI agent format the data for the user. Raw text is harder for agents to parse and reason about.
How do I version MCP tools without breaking existing skills?
Add new tools rather than modifying existing ones. Use semantic naming like query_database_v2 when behavior changes significantly. Deprecate old tools by updating their descriptions to point to the replacement.
Can MCP tools call other MCP tools?
Technically yes, but avoid it. Tool-to-tool calls create hidden dependencies that are hard to debug. If two operations need to happen together, create a composite tool that handles both, or let the AI agent orchestrate the sequence.
What's the best language for building MCP servers?
TypeScript and Python have the most mature MCP SDKs. Choose whichever your team is more comfortable maintaining. The protocol is language-agnostic, so the choice doesn't affect compatibility.
Explore production-ready AI skills at aiskill.market/browse or submit your own skill to the marketplace.