Headless Agents: Deploy Skills as MCP Servers — Full Guide from Concept to Three Clients
Build a Python MCP server, add auth and Docker, and register it in Claude Desktop, Cursor, and VS Code — deploy once, call from everywhere.
Abstract Algorithms
TLDR: Build an MCP server once and call it from Cursor, Claude Desktop, and VS Code without rewrites — this guide takes you from a single Python function to a containerized, authenticated, three-client deployment in 11 concrete steps.
📖 The Trapped Skill Problem: When a Great LLM Tool Can't Leave the IDE It Was Born In
You spent an afternoon building a beautiful skill inside GitHub Copilot CLI: given a repository URL, it summarises the codebase, identifies the top changed files, and drafts a pull-request description. It works every time you run it.
Then your teammate on Cursor asks if they can use it. Another colleague on Claude Desktop wants access too. You look at your implementation — a tightly coupled async function registered directly inside Copilot's extension API — and realise there is no clean way to share it. You would have to rewrite it for Cursor's tool format, then rewrite it again for Claude's function-calling schema, and maintain three versions forever.
This is the trapped skill problem: a useful LLM capability locked inside one tool's runtime. A developer pastes their summarize_pr_diff function into a Slack message because their teammate uses Cursor and can't call a Copilot skill. The function works perfectly. The sharing mechanism is broken.
The Model Context Protocol (MCP) is the solution. MCP is an open standard — originally developed by Anthropic and now implemented by Cursor, Claude Desktop, GitHub Copilot, and VS Code agent mode — that defines a single wire format for exposing tools, resources, and prompts from a server process. Write your skill as an MCP server once, and any MCP-aware client can discover and invoke it. No rewrites. No per-client adapters.
This post is the complete guide: understanding MCP's three-layer model, building a server step by step with the Python SDK, adding auth and Docker, and registering the same skill across three different clients — every command, every config file, every failure mode.
🔍 MCP Fundamentals: Protocol, Transports, and the Server Lifecycle
MCP has three moving parts: a client (the AI assistant — Cursor, Claude Desktop, Copilot), a server (your Python process exposing tools), and a transport (the channel that connects them).
The protocol is intentionally thin. At its core, MCP defines:
- Tool registration — the server advertises a list of callable tools with JSON Schema-typed parameters.
- Resource registration — the server can expose read-only data sources (files, database rows, API responses) the client can fetch.
- Prompt templates — reusable prompt fragments the client can request.
MCP transports come in two flavours:
| Transport | Mechanism | Best for |
| stdio | Client spawns the server as a child process; messages flow over stdin/stdout | Local, single-client, same machine |
| HTTP + SSE | Server runs as an HTTP daemon; client POSTs requests and receives Server-Sent Events back | Remote, multi-client, Docker/cloud |
The server lifecycle is predictable: during the initialization handshake, the server returns a capabilities object declaring which feature groups (tools, resources, prompts) it supports. The client then calls tools/list to fetch the tool manifest, stores it, and routes user intent to the right tool function. On each tool call, the server receives a JSON-RPC 2.0 request, executes the handler, and streams or returns the result.
What "registered" means in practice: Each MCP client maintains a config file — a JSON file in a platform-specific directory — that maps a server name to either a command to spawn (for stdio) or a URL to connect to (for HTTP+SSE). When you open the client, it reads this config, attempts to start or connect to every listed server, and calls tools/list to discover available tools. If your server is running and returns a valid schema, the tools appear in the client's tool picker within seconds. If anything in the chain fails silently, the tools simply don't appear — no error dialog, no log by default. That silence is why testing with the MCP Inspector (Step 4) and the debugging checklist (Step 11) are so important.
⚙️ Building Your MCP Server: From Empty Directory to Locally Tested Tool
These four steps take you from nothing to a fully verified local MCP server. Every subsequent section builds on this foundation.
Step 1 — Set Up the Project Structure
Create the directory layout and install dependencies:
mkdir mcp-pr-summarizer && cd mcp-pr-summarizer
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install "mcp>=1.0" fastmcp openai
Your directory should look like this:
mcp-pr-summarizer/
├── server.py
├── pyproject.toml
└── Dockerfile
The pyproject.toml declares the package and its runtime dependencies:
# pyproject.toml
[project]
name = "pr-summarizer-mcp"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = ["mcp>=1.0", "fastmcp", "openai"]
Step 2 — Write the Tool Function Using the Bare MCP SDK
The Server class from the MCP SDK is the core registry. You declare tools using two decorators: @server.list_tools() for the capability announcement and @server.call_tool() for the dispatcher. This section uses the bare SDK deliberately, because it makes the wire format explicit. The FastMCP package installed in Step 1 offers a shorthand that collapses the same registration into a single decorator.
# server.py
import asyncio
import os

from mcp.server import Server
from mcp.server.stdio import stdio_server
import mcp.types as types
from openai import AsyncOpenAI

server = Server("pr-summarizer")
client = AsyncOpenAI(api_key=os.environ.get("OPENAI_API_KEY"))


@server.list_tools()
async def list_tools() -> list[types.Tool]:
    return [
        types.Tool(
            name="summarize_pr_diff",
            description=(
                "Summarize a GitHub PR diff into a human-readable description. "
                "Returns a structured summary with an overview, list of key changes, "
                "and suggested testing notes."
            ),
            inputSchema={
                "type": "object",
                "properties": {
                    "diff": {
                        "type": "string",
                        "description": "The raw git diff content from the PR"
                    },
                    "target_audience": {
                        "type": "string",
                        "description": "Who will read this summary",
                        "default": "engineering team"
                    }
                },
                "required": ["diff"]
            }
        )
    ]


@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name == "summarize_pr_diff":
        return await _summarize_pr_diff(
            diff=arguments["diff"],
            target_audience=arguments.get("target_audience", "engineering team")
        )
    raise ValueError(f"Unknown tool: {name}")


async def _summarize_pr_diff(diff: str, target_audience: str) -> list[types.TextContent]:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"You write PR descriptions for a {target_audience}."},
            {"role": "user", "content": f"Summarize this diff:\n\n{diff}"}
        ]
    )
    return [types.TextContent(type="text", text=response.choices[0].message.content)]


async def main():
    async with stdio_server() as streams:
        await server.run(*streams, server.create_initialization_options())


if __name__ == "__main__":
    asyncio.run(main())
Three things to notice. First, @server.list_tools() is the capability announcement — it tells every client exactly what parameters to expect. Second, @server.call_tool() is the dispatcher — every tool call routes through this single handler. Third, the transport is wired in main() — swapping stdio_server for the HTTP+SSE transport leaves all tool logic untouched.
Step 3 — Add Input Validation with McpError
Raw exceptions must never propagate out of an MCP handler. Clients interpret unhandled exceptions as protocol errors and may silently drop the tool from their registry for the rest of the session. Always raise McpError with a structured ErrorData payload:
# server.py (updated call_tool and helper)
from mcp.shared.exceptions import McpError
from mcp.types import ErrorData, INTERNAL_ERROR, INVALID_PARAMS


@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name == "summarize_pr_diff":
        return await _summarize_pr_diff(
            diff=arguments.get("diff", ""),
            target_audience=arguments.get("target_audience", "engineering team")
        )
    raise McpError(ErrorData(code=INVALID_PARAMS, message=f"Unknown tool: {name}"))


async def _summarize_pr_diff(diff: str, target_audience: str) -> list[types.TextContent]:
    if not diff.strip():
        raise McpError(ErrorData(code=INVALID_PARAMS, message="diff cannot be empty"))
    if len(diff) > 100_000:
        raise McpError(ErrorData(code=INVALID_PARAMS, message="diff exceeds 100 KB limit"))
    try:
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": f"You write PR descriptions for a {target_audience}."},
                {"role": "user", "content": f"Summarize this diff:\n\n{diff}"}
            ]
        )
        return [types.TextContent(type="text", text=response.choices[0].message.content)]
    except Exception as exc:
        raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"LLM call failed: {exc}")) from exc
The pattern is deliberate: validate inputs first, wrap the LLM call in try/except, and always raise McpError — never a bare ValueError or RuntimeError. The error code constants (INVALID_PARAMS, INTERNAL_ERROR) are standard JSON-RPC 2.0 error codes that clients know how to surface cleanly in their UI.
Step 4 — Test Locally with MCP Inspector
The MCP Inspector is a browser-based debugging UI that connects directly to your server over stdio. It shows your tool schema, lets you send test calls, and displays raw request/response JSON — all before you configure a single client.
# Install MCP CLI tools if not already present
pip install "mcp[cli]"
# Launch Inspector against your server
mcp dev server.py
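The command spawns your server as a child process, then opens the Inspector UI in your browser (http://localhost:5173 here; the exact URL is printed in the terminal and may differ between Inspector versions). You will see four key panels: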
- Tools tab: your summarize_pr_diff tool listed with its full input schema rendered as a human-readable form
- Call panel: a form pre-populated from the schema; fill in diff and click Run to send a real tools/call JSON-RPC request
- Messages tab: full JSON-RPC traffic log; paste raw request/response here when a mysterious client failure needs debugging
- Resources and Prompts tabs: if your server exposes resources or prompt templates, they appear here for interactive testing
If the tool does not appear in the Tools tab, the schema is malformed. The most common cause is a missing "type": "object" at the top level of inputSchema. Confirm the tool works with valid and invalid inputs in Inspector before configuring any client — if it fails here, it will fail everywhere.
Workflow tip: Keep Inspector open while you edit server.py. The mcp dev process supports hot-reload (added in MCP SDK 1.2), so schema description changes take effect without restarting the command. Iterate on descriptions and error messages here first.
🧠 Deep Dive: How MCP Routes Messages Under the Hood
The Internals: JSON-RPC 2.0, Capability Negotiation, and Message Framing
MCP's wire format is JSON-RPC 2.0, the same protocol that powers the Language Server Protocol (LSP). Every message carries jsonrpc: "2.0". A request adds an id, a method string, and optional params; a response echoes the request's id and carries either result or error; a notification has a method but no id and expects no reply.
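The initialization handshake is a three-message exchange:
- The client sends initialize with its own capabilities (protocol version, supported features).
- The server replies with InitializeResult containing its capabilities (which of tools, resources, and prompt templates it can serve).
- The client confirms with a notifications/initialized notification, after which normal requests such as tools/list can flow.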
This capability negotiation means clients never need to hard-code what a server can do — they discover it at runtime. If a server is updated to expose a new tool, any connected client sees it on the next session without configuration changes. It is a deliberate design choice: the schema is live documentation.
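As a rough illustration of the handshake, the sketch below builds the messages as plain Python dicts, including the notifications/initialized message the client sends once the server has replied. The helper names, the client name, and the "2024-11-05" protocol revision string are assumptions chosen for the example; the SDK builds these messages for you in practice.

```python
import json

# Assumption: an example protocol revision string; use whatever your SDK negotiates.
PROTOCOL_VERSION = "2024-11-05"


def initialize_request(request_id: int) -> dict:
    """Client -> server: open the session and state client capabilities."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }


def initialize_result(request_id: int) -> dict:
    """Server -> client: echo the id and declare supported feature groups."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "pr-summarizer", "version": "0.1.0"},
        },
    }


def initialized_notification() -> dict:
    """Client -> server: a notification has no id and expects no reply."""
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


handshake = [initialize_request(1), initialize_result(1), initialized_notification()]
for message in handshake:
    print(json.dumps(message))
```

Note that the capabilities object only says that tools are supported; the actual tool descriptors arrive later through tools/list.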
Message framing on stdio uses newline-delimited JSON (NDJSON): each message is a single JSON object terminated by \n. The SDK handles framing automatically, but understanding it helps when debugging — you can attach a simple pipe logger between client and server to inspect raw traffic.
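To make the stdio framing concrete, here is a minimal sketch of newline-delimited JSON encoding and decoding using only the standard library. The write_message and read_messages helpers are hypothetical names for this example, and an in-memory StringIO stands in for the real stdin/stdout pipe.

```python
import io
import json


def write_message(stream, message: dict) -> None:
    # json.dumps escapes newlines inside strings, so each message stays on one line.
    stream.write(json.dumps(message) + "\n")


def read_messages(stream):
    # One JSON object per line; blank lines are skipped.
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


pipe = io.StringIO()  # stand-in for the process pipe
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
write_message(pipe, {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "summarize_pr_diff",
        "arguments": {"diff": "line one\nline two"},
    },
})
pipe.seek(0)
decoded = list(read_messages(pipe))
```

A multi-line diff survives the round trip because the newline inside the string is escaped on the wire, which is what keeps the one-message-per-line framing unambiguous.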
Message framing on HTTP+SSE works differently. The client opens the SSE stream, receives an endpoint event naming the POST URL for the session (/messages/ in Step 5), and POSTs each JSON-RPC request there. The server writes replies as data: <json>\n\n chunks on the SSE stream. The connection stays open for the life of the session, which means long-running tool calls can stream progress updates back incrementally rather than blocking until completion.
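When debugging SSE traffic by hand, it can help to split a captured stream into events. The sketch below is a simplified parser, not a full implementation of the SSE specification: parse_sse is a hypothetical helper, the sample stream is invented for illustration, and comment lines, id fields, and retry fields are ignored.

```python
import json


def parse_sse(raw: str) -> list[dict]:
    """Split a captured SSE stream into events with 'event' and 'data' keys."""
    events = []
    for block in raw.split("\n\n"):  # a blank line terminates each event
        event_type, data_lines = "message", []  # "message" is the SSE default type
        for line in block.splitlines():
            if line.startswith("event:"):
                event_type = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            events.append({"event": event_type, "data": "\n".join(data_lines)})
    return events


# Invented sample: an endpoint announcement followed by one JSON-RPC reply.
sample = (
    "event: endpoint\n"
    "data: /messages/?session_id=abc123\n"
    "\n"
    "event: message\n"
    'data: {"jsonrpc": "2.0", "id": 1, "result": {"tools": []}}\n'
    "\n"
)
events = parse_sse(sample)
reply = json.loads(events[1]["data"])
```

Piping the output of curl -N into a file and feeding it to a helper like this makes it easier to see which JSON-RPC reply belongs to which request id.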
Tool Schema Design: The Four Fields That Determine Whether Your Tool Gets Called
When a client calls tools/list, your server returns a JSON array of tool descriptors. Each descriptor has three fields: name, description, and inputSchema. The inputSchema field is a standard JSON Schema object. Clients use it to: generate the invocation form in their UI; validate arguments before sending; and — critically — feed the description to the LLM that decides which tool to call for a user's intent.
The four fields that matter most, in order of impact:
| Field | Where it matters | What breaks without it |
| inputSchema.type: "object" | All clients | Tool is rejected or silently skipped by strict parsers |
| description (top-level) | LLM tool selection | LLM cannot match user intent to tool; tool is never invoked |
| properties[x].description | Claude Desktop form, LLM prompt context | User sees blank form fields; LLM uses wrong arguments |
| required array | All clients | Optional fields treated as required; calls fail with missing param errors |
Why missing descriptions cause silent failures: When the LLM decides which tool to invoke, it reads the tool's description field as part of its context. A blank or vague description like "summarize" competes poorly against tools with rich descriptions. The tool exists in the registry but is functionally invisible to the model. The fix is always a concrete sentence that describes input, output, and use case — exactly what the Step 2 example shows.
The required array is your API contract: If you omit required, some clients assume all fields are required. Others assume none are. Behavior is unpredictable across clients. Always declare exactly which fields must be present, even if it is a single field. Treat additions to required as breaking changes with the same discipline you would apply to a public REST API.
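The four checks above are easy to automate before a descriptor ever reaches a client. The following is a minimal sketch of such a lint pass; lint_tool is a hypothetical helper and the rules simply mirror the table, so treat it as a starting point instead of a complete JSON Schema validator.

```python
def lint_tool(tool: dict) -> list[str]:
    """Return a list of human-readable problems found in one tool descriptor."""
    problems = []
    schema = tool.get("inputSchema") or {}
    props = schema.get("properties") or {}

    if not (tool.get("description") or "").strip():
        problems.append("top-level description is empty")
    if schema.get("type") != "object":
        problems.append('inputSchema.type must be "object"')
    for name, spec in props.items():
        if not (spec.get("description") or "").strip():
            problems.append(f"property '{name}' has no description")
    if "required" not in schema:
        problems.append("required array is not declared")
    else:
        for name in schema["required"]:
            if name not in props:
                problems.append(f"required field '{name}' is not in properties")
    return problems


good_tool = {
    "name": "summarize_pr_diff",
    "description": "Summarize a GitHub PR diff into a human-readable description.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "diff": {"type": "string", "description": "The raw git diff content from the PR"},
        },
        "required": ["diff"],
    },
}
bad_tool = {
    "name": "summarize",
    "description": "",
    "inputSchema": {"properties": {"diff": {"type": "string"}}},
}
```

Calling lint_tool(good_tool) returns an empty list, while lint_tool(bad_tool) reports all four problems, which makes it a reasonable candidate for a unit test or pre-commit check.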
Performance Analysis: Cold Starts, Schema Overhead, and SSE Connection Pooling
Understanding transport performance characteristics before you hit load helps you choose correctly from the start.
| Transport | Cold start latency | Per-call overhead | Max concurrent clients |
| stdio | 150–400 ms (Python process spawn + import) | ~0.1 ms (IPC) | 1 per spawning client |
| HTTP + SSE | 5–30 ms (HTTP connect to running process) | ~1–5 ms (TCP + headers) | Hundreds (asyncio event loop) |
| Docker + SSE | 500–2000 ms (first container start) | Same as HTTP + SSE | Same as HTTP + SSE |
Schema parsing overhead is negligible: the tools/list response is typically under 2 KB even for servers with ten tools. Schema parsing completes in under 1 ms on every client. Optimize your tool handler, not your schema.
SSE connection pooling: HTTP+SSE maintains a persistent connection per client session. If you run behind a reverse proxy (nginx, Caddy), raise its idle read timeout (proxy_read_timeout in nginx) to at least 300 seconds to prevent the proxy from closing idle SSE connections during long LLM calls. A closed SSE connection looks like a normal disconnect to the client: it silently retries, but mid-call reconnects lose in-flight responses.
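The dominant cost in any MCP server is always the tool handler itself. A gpt-4o-mini call takes 1–4 seconds. Optimizing transport overhead is like optimizing the envelope while the postal system takes three days. Focus on caching repeated LLM calls (an in-memory cache keyed by the inputs for deterministic calls, Redis for shared state) and using async HTTP clients everywhere. Note that functools.lru_cache applied directly to an async def caches the coroutine object, not its result, so async handlers need a cache that stores the awaited value.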
For HTTP+SSE servers under real load, the Python asyncio event loop is single-threaded. CPU-bound work inside a handler will block other concurrent requests. The fix is asyncio.to_thread() for synchronous blocking calls, or splitting compute-heavy work into a background task queue (Celery, ARQ) that the handler awaits.
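A small sketch of the asyncio.to_thread() pattern: a blocking function runs in a worker thread while a second coroutine keeps ticking on the event loop. The blocking_parse and heartbeat functions are hypothetical stand-ins, with time.sleep simulating a synchronous library call.

```python
import asyncio
import time


def blocking_parse(diff: str) -> int:
    """Hypothetical synchronous step, e.g. a slow parser or blocking client."""
    time.sleep(0.2)
    return diff.count("\n") + 1


async def heartbeat(ticks: list[float]) -> None:
    # Stands in for other requests that need the event loop meanwhile.
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def demo():
    ticks: list[float] = []
    beat = asyncio.create_task(heartbeat(ticks))
    # The blocking call runs in a worker thread, so heartbeat keeps running.
    lines = await asyncio.to_thread(blocking_parse, "a\nb\nc")
    await beat
    return lines, ticks


lines, ticks = asyncio.run(demo())
```

Threads help most when the blocking call waits on I/O or releases the GIL; pure-Python number crunching still competes for the interpreter, which is where a separate worker process or task queue becomes the better fit.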
📊 From Local to Headless: HTTP+SSE Transport, Auth Middleware, and Docker
With a locally tested server in hand, these three steps lift it into a shared deployment that any client on any machine can reach. The call flow diagram at the end of this section shows how both transports converge at the same handler.
Step 5 — Switch to HTTP+SSE Transport
Replace the stdio_server entrypoint with SSE transport. The tool handlers are unchanged — only main() changes:
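# server.py: updated entrypoint for SSE
import uvicorn
from mcp.server.sse import SseServerTransport
from starlette.applications import Starlette
from starlette.responses import Response
from starlette.routing import Mount, Route

# Clients POST JSON-RPC requests to this path; the server advertises it over /sse.
sse = SseServerTransport("/messages/")

async def handle_sse(request):
    async with sse.connect_sse(request.scope, request.receive, request._send) as streams:
        await server.run(*streams, server.create_initialization_options())
    return Response()  # sent once the client disconnects

starlette_app = Starlette(routes=[
    Route("/sse", endpoint=handle_sse),
    # Without this mount, POSTs to /messages/ return 404 and no tool call arrives.
    Mount("/messages/", app=sse.handle_post_message),
])

if __name__ == "__main__":
    uvicorn.run(starlette_app, host="0.0.0.0", port=8080)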
Test it immediately with curl before adding auth:
python server.py &
curl -N http://localhost:8080/sse
# Should emit an endpoint event such as:
#   event: endpoint
#   data: /messages/?session_id=...
Step 6 — Add Bearer Token Authentication
Wrap the SSE route with a Starlette middleware that checks the Authorization header. Never expose a running MCP server on any network interface without auth — an unauthenticated HTTP MCP server is effectively an open code-execution endpoint.
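# server.py: auth middleware
import os
import secrets
from starlette.applications import Starlette
from starlette.middleware import Middleware
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse
from starlette.routing import Mount, Route

MCP_AUTH_TOKEN = os.environ.get("MCP_AUTH_TOKEN", "")


class BearerTokenMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # An empty token disables auth; acceptable for local testing only.
        if not MCP_AUTH_TOKEN:
            return await call_next(request)
        auth = request.headers.get("Authorization", "")
        expected = f"Bearer {MCP_AUTH_TOKEN}"
        # compare_digest avoids leaking token contents through timing differences.
        if not secrets.compare_digest(auth.encode(), expected.encode()):
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
        return await call_next(request)


starlette_app = Starlette(
    routes=[
        Route("/sse", endpoint=handle_sse),
        # The POST endpoint sits behind the same middleware as the stream.
        Mount("/messages/", app=sse.handle_post_message),
    ],
    middleware=[Middleware(BearerTokenMiddleware)],
)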
Set MCP_AUTH_TOKEN in your environment before starting the server:
export MCP_AUTH_TOKEN="my-secret-token"
python server.py
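A placeholder such as "my-secret-token" is fine for a walkthrough, but a deployed server needs a token that cannot be guessed. The sketch below shows one way to generate a token and check an Authorization header with a constant-time comparison, using only the standard library. The generate_token and is_authorized helpers are hypothetical, and unlike the middleware above this sketch rejects every request when no token is configured, which is a deliberate design choice for the example.

```python
import hmac
import secrets


def generate_token() -> str:
    """Create a URL-safe random token from 32 bytes of entropy."""
    return secrets.token_urlsafe(32)


def is_authorized(header_value: str, expected_token: str) -> bool:
    """Check an Authorization header value against the configured token."""
    if not expected_token:
        return False  # fail closed when no token is configured
    expected = f"Bearer {expected_token}"
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(header_value.encode(), expected.encode())


token = generate_token()
```

Printing the generated token once and exporting it as MCP_AUTH_TOKEN keeps the secret out of source control; the same value then goes into each client's headers entry.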
Step 7 — Package as a Multi-Stage Docker Container
Use a multi-stage build to keep the image small. The first stage installs all dependencies; the second stage copies only the runtime artifacts and application code:
# Dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app
COPY pyproject.toml .
RUN pip install --no-cache-dir "mcp>=1.0" fastmcp openai uvicorn starlette
FROM python:3.12-slim AS runtime
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
COPY server.py .
EXPOSE 8080
CMD ["python", "server.py"]
Build and run:
docker build -t pr-summarizer-mcp .
docker run -d -p 8080:8080 \
-e OPENAI_API_KEY=sk-... \
-e MCP_AUTH_TOKEN=secret \
pr-summarizer-mcp
Verify the container is responding before configuring any clients:
curl -N -H "Authorization: Bearer secret" http://localhost:8080/sse
The MCP Call Flow: Both Transports to One Handler
The diagram below shows the complete round-trip from a user's intent in Cursor or Claude Desktop to the tool result returned by your Python server. Both transport paths converge at the same call_tool() handler — this is the architectural guarantee that makes switching transports zero-cost in terms of business logic.
flowchart TD
A[User types intent in Cursor / Claude Desktop] --> B[Client LLM resolves tool name + params]
B --> C{Transport?}
C -->|stdio local| D[Client spawns server as child process]
C -->|HTTP+SSE remote| E[Client POSTs to /messages/ endpoint]
D --> F[JSON-RPC request over stdin]
E --> F
F --> G[MCP Server: capability check + dispatch]
G --> H["Tool handler: call_tool()"]
H --> I[Skill logic: LLM call / API / file I/O]
I --> J[TextContent result]
J --> K[JSON-RPC response over stdout / SSE stream]
K --> L[Client renders result to user]
Reading the diagram: the left branch (stdio) means the client manages the server's entire process lifecycle. The right branch (HTTP+SSE) means your server runs independently as a daemon; the client calls it over HTTP. Tool logic is identical in both cases.
🌍 Registering in Three Clients: Claude Desktop, Cursor, and VS Code
With the server running — either locally via stdio or in Docker via SSE — these three steps write the exact config entries needed for each client. The diagram below maps the full journey from Steps 1–11 so you can see where you are in the process.
flowchart TD
A[Write tool function - Steps 1 and 2] --> B[Add error handling - Step 3]
B --> C[Test with MCP Inspector - Step 4]
C --> D{Which transport?}
D -->|"Local / single dev"| E[Register via stdio - Steps 8 to 10]
D -->|"Shared / remote"| F[Switch to HTTP+SSE - Step 5]
F --> G[Add bearer token auth - Step 6]
G --> H[Package as Docker container - Step 7]
H --> I[docker run -p 8080:8080]
I --> J[Register via SSE URL - Steps 8 to 10]
E --> K[Claude Desktop sees tool]
J --> K
K --> L[Cursor sees tool]
L --> M[VS Code Copilot sees tool]
M --> N[Debug failures - Step 11]
The decision diamond is the key branch: stdio registration skips the Docker steps entirely and goes straight to config files. SSE registration requires the running server before the config files can point anywhere useful.
Step 8 — Register in Claude Desktop
Claude Desktop reads its server registry from a JSON config file in the OS application support directory. The file location is platform-specific:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
For stdio (local Python script):
{
"mcpServers": {
"pr-summarizer": {
"command": "python",
"args": ["/path/to/mcp-pr-summarizer/server.py"],
"env": {
"OPENAI_API_KEY": "sk-..."
}
}
}
}
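For SSE (Docker container or remote host). Support for url entries in this file depends on the Claude Desktop version; if yours ignores the entry, a stdio bridge such as the mcp-remote npm package is a common workaround: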
{
"mcpServers": {
"pr-summarizer-remote": {
"url": "http://localhost:8080/sse",
"headers": {
"Authorization": "Bearer secret"
}
}
}
}
After saving the file, fully quit and reopen Claude Desktop (Cmd+Q on Mac, not just close the window). The tool should appear in the tool picker within the first new conversation. Claude Desktop caches the tools/list manifest aggressively — a window reload is not enough.
Step 9 — Register in Cursor
Cursor reads MCP config from ~/.cursor/mcp.json in your home directory (global, applies to all projects) or from .cursor/mcp.json in your project root (workspace-scoped, only that project). The workspace-scoped file takes precedence when both exist.
For stdio (global or workspace):
{
"mcpServers": {
"pr-summarizer": {
"command": "python",
"args": ["server.py"],
"cwd": "/path/to/mcp-pr-summarizer",
"env": {
"OPENAI_API_KEY": "sk-..."
}
}
}
}
The cwd field is important for stdio: Cursor spawns the process with this as the working directory, so relative paths in args resolve correctly. For the SSE variant, use the same url / headers format as Claude Desktop — Cursor respects both formats. Reload Cursor's window after saving the file (Cmd+Shift+P → "Developer: Reload Window").
Step 10 — Register in VS Code and GitHub Copilot Agent Mode
VS Code reads MCP config from .vscode/mcp.json in your workspace root. Note the slightly different schema: VS Code uses "type": "stdio" as an explicit discriminator field, and supports ${workspaceFolder} variable substitution in paths — useful for portable configs that work on any developer's machine regardless of their absolute path setup.
For stdio:
{
"servers": {
"pr-summarizer": {
"type": "stdio",
"command": "python",
"args": ["server.py"],
"cwd": "${workspaceFolder}/mcp-pr-summarizer",
"env": {
"OPENAI_API_KEY": "sk-..."
}
}
}
}
For SSE:
{
"servers": {
"pr-summarizer-remote": {
"type": "sse",
"url": "http://localhost:8080/sse",
"headers": {
"Authorization": "Bearer secret"
}
}
}
}
The tool becomes available in Copilot's agent mode. In the VS Code chat panel, switch to agent mode and type a request involving your tool; Copilot lists it among its available tools and invokes it when the intent matches.
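Since the three SSE entries differ only in their top-level key and the VS Code type field, they can be generated from one definition to avoid drift. The sketch below follows the config shapes shown in Steps 8 to 10; client_configs is a hypothetical helper, and the file names are used only as dictionary labels, so nothing is written to disk.

```python
import json


def client_configs(name: str, url: str, token: str) -> dict[str, dict]:
    """Build the SSE registration entry for each client from one definition."""
    headers = {"Authorization": f"Bearer {token}"}
    # Claude Desktop and Cursor share the "mcpServers" shape in this guide.
    shared = {"mcpServers": {name: {"url": url, "headers": headers}}}
    # VS Code uses "servers" plus an explicit "type" discriminator.
    vscode = {"servers": {name: {"type": "sse", "url": url, "headers": headers}}}
    return {
        "claude_desktop_config.json": shared,
        ".cursor/mcp.json": shared,
        ".vscode/mcp.json": vscode,
    }


configs = client_configs("pr-summarizer-remote", "http://localhost:8080/sse", "secret")
for filename, body in configs.items():
    print(filename)
    print(json.dumps(body, indent=2))
```

Generating the entries this way means a changed URL or rotated token is edited in one place and the three files stay in step.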
How Teams Are Using This Today
Once your server is registered in all three clients, you unlock patterns that were previously impossible without per-client rewrites:
The platform-agnostic code review assistant. A mid-size engineering team with developers split across Cursor, Claude Desktop, and VS Code built a single MCP server wrapping three tools: summarize_pr_diff, lint_findings_summary, and test_coverage_report. The server runs as a Railway-hosted Docker container. Each developer's config file points to the same SSE URL. Whether someone uses Cursor's inline chat or Claude Desktop's sidebar, they call the same tools against the same backend — no configuration drift, no "works on my machine" breakdowns.
The private codebase search skill. A fintech team cannot send their internal codebase to external LLM APIs for semantic search. They run a local MCP server with a search_codebase tool that queries an internal Elasticsearch index over stdio transport. The tool runs entirely on the developer's machine and never touches an external network — OS process isolation is the security boundary.
The CI/CD summary bot. A DevOps team registered their MCP server in a GitHub Actions environment. The same summarize_deployment_diff tool that developers use interactively from their IDEs is also called headlessly in CI, generating a plain-English deployment summary posted as a PR comment. One registration, two usage modes.
Step 11 — Debug Common Registration Failures
When a tool does not appear after registration, work through this checklist in order — skipping steps wastes time:
- Check server startup: Run python server.py manually in a terminal. Any import error or missing environment variable will be visible immediately.
- Run MCP Inspector: mcp dev server.py. Confirm the tool appears in the Tools tab and the Call panel returns valid results before touching any client config.
- Check the config file path: Claude Desktop will silently ignore a misplaced config file. Use the exact OS-specific path listed in Step 8.
- Check JSON syntax: A single misplaced comma in the config JSON causes the entire registry to fail silently. Validate the JSON with a linter before blaming anything else.
- Restart the client fully: Not reload. Fully quit and reopen. Claude Desktop especially caches manifests across sessions.
- Check SSE reachability: If using SSE, run curl -N <url> from the same machine as the client before blaming the config. A connection refused here means the server is not running or the port is wrong.
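The JSON syntax check in the list above can be scripted. The following sketch parses a config file's text and reports either the syntax error location or entries that lack both a command and a url. check_config is a hypothetical helper, and it only knows the two top-level keys used in this guide ("mcpServers" and "servers").

```python
import json


def check_config(text: str) -> list[str]:
    """Return problems found in an MCP client config; an empty list means none."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    servers = config.get("mcpServers") or config.get("servers")
    if not isinstance(servers, dict) or not servers:
        return ['no "mcpServers" or "servers" object found']
    problems = []
    for name, entry in servers.items():
        if "command" not in entry and "url" not in entry:
            problems.append(f"{name}: needs either 'command' or 'url'")
    return problems


valid = '{"mcpServers": {"pr-summarizer": {"command": "python", "args": ["server.py"]}}}'
trailing_comma = '{"mcpServers": {"pr-summarizer": {"command": "python",}}}'
```

Running check_config on the file contents before restarting a client turns a silent registry failure into an explicit line and column number.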
⚖️ Trade-offs and Failure Modes: What Breaks When You Register Across Three Clients
Every cross-client MCP deployment surfaces failure modes that do not appear in local testing. The table below covers the six most common, with the exact symptom you will see, the underlying cause, and the fix:
| Failure | Symptom | Root Cause | Fix |
| Tool not appearing | Tool absent from client UI after restart | Malformed inputSchema (missing "type": "object") or server crash on startup | Run mcp dev server.py; check Tools tab |
| Schema mismatch | "Missing required parameter" error on every call | required array lists a field your handler treats as optional | Align required array with handler defaults; test with Inspector |
| Connection refused | "Failed to connect to MCP server" in Cursor/Claude | SSE server not running when client starts, or wrong port in config | Confirm docker run is active; verify port matches config URL |
| Auth 401 | Tool call returns "Unauthorized" or silent empty response | Bearer token in config does not match MCP_AUTH_TOKEN env var | Re-check token in headers vs. server env; tokens are case-sensitive |
| Silent schema truncation | Tool appears but description is empty in UI | description field was null or omitted in list_tools return | Add a non-empty string to description in the Tool constructor |
| Stale tool list after update | Old tool signature showing after server update | Client cached the capability manifest from the previous session | Restart the client fully; some clients cache tools/list aggressively |
The most dangerous failure is the last one. Claude Desktop and Cursor both cache tool manifests across sessions. If you update your tool's inputSchema — add a parameter, change a description — restart the entire client application, not just the MCP connection. A running server with a new schema next to a cached old manifest produces unpredictable argument-passing behaviour that is very hard to trace.
Beyond these specific failure modes, the broader trade-offs between transports deserve explicit framing:
stdio — simplicity at the cost of scale. stdio is the easiest transport and the safest from a security perspective (no network surface). The failure mode is isolation: each client spawns its own copy of the server process. Five developers opening Claude Desktop simultaneously means five Python processes, five cold-start LLM calls, five independent caches. If your tool has warm-up cost (model loading, database connection pooling), stdio amplifies it linearly with users.
HTTP+SSE — power at the cost of operational complexity. The SSE connection is persistent, which means network interruptions (firewalls closing idle connections, load balancer timeouts) will silently drop the stream. Misconfigured reverse proxies with short idle timeouts are a common production surprise — set proxy_read_timeout to at least 300 seconds.
Versioning and schema drift. Because tools are discovered at runtime, a server upgrade that removes or renames a tool will silently break any client that cached the old capability manifest. Use semantic versioning in your server name (pr-summarizer-v2) and maintain backwards-compatible parameter aliases during transition windows: arguments.get("repo_url") or arguments.get("repo") as a migration shim costs nothing.
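The migration shim mentioned above could be wrapped in a small helper so every handler resolves the parameter the same way. This is a sketch only: resolve_repo is a hypothetical name, and which of the two keys is the legacy one is an assumption.

```python
def resolve_repo(arguments: dict) -> str:
    """Accept either parameter name during a transition window."""
    # Same lookup order as the shim quoted in the text above.
    value = arguments.get("repo_url") or arguments.get("repo") or ""
    value = str(value).strip()
    if not value:
        raise ValueError("provide 'repo' (or the alias 'repo_url')")
    return value


print(resolve_repo({"repo": "owner/repo"}))
print(resolve_repo({"repo_url": " owner/repo "}))
```

Once no client sends the old key any more, the alias can be removed in a single place.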
🧭 Decision Guide: stdio, SSE, or Container — Choosing Once and Choosing Right
| Situation | Recommendation |
| Use stdio when | The tool is for your own local use, runs on the same machine as the client, and you need zero infrastructure. Configuration is a single JSON entry; no ports, no auth. |
| Use HTTP+SSE when | Multiple developers need the same tool, or the server must run on a remote host. SSE supports hundreds of concurrent clients and persistent streaming responses. |
| Containerize when | The server needs to be available outside business hours, deployed to a shared environment, or reproduced identically across dev/staging/prod. Docker eliminates Python version and dependency drift. |
| Avoid SSE without auth when | The server is exposed on any network interface beyond localhost. An unauthenticated MCP server on a shared LAN is an open code-execution endpoint. |
| Avoid serverless (Lambda/Cloud Run) when | The tool has significant warm-up cost (model loading, connection pool establishment), your SSE sessions last more than 15 minutes, or latency SLOs are under 200 ms. |
| Use both transports in parallel when | You want local stdio for fast personal iteration and a shared SSE container for the team. The same server.py supports both — switch at startup via an environment variable: if os.environ.get("USE_SSE"): ... |
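The environment-variable switch in the last row might look like the sketch below. It deliberately differs from a bare os.environ.get("USE_SSE") check, which would treat USE_SSE=0 as enabled because any non-empty string is truthy. The function name select_transport and the accepted values are assumptions.

```python
import os

# Values treated as "on"; anything else (including "0" or unset) means stdio.
_TRUTHY = {"1", "true", "yes", "on"}


def select_transport(environ=None):
    """Return "sse" when USE_SSE is switched on, otherwise "stdio"."""
    env = os.environ if environ is None else environ
    flag = env.get("USE_SSE", "").strip().lower()
    return "sse" if flag in _TRUTHY else "stdio"


print(select_transport({}))
print(select_transport({"USE_SSE": "1"}))
print(select_transport({"USE_SSE": "0"}))
```

The server's startup code would then branch on the returned string and start whichever transport wiring it already has.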
The simplest production pattern for a small team: one Railway or Fly.io container running SSE on port 8080, bearer token authentication via environment variable, and a shared config snippet that each developer pastes into their client config file. Total infrastructure cost: one small container at roughly $5/month.
🧪 Complete Worked Example: The Repo Summarizer From One Python File to Three Live Clients
Here is the complete repo summarizer skill as a single deployable file, callable from Cursor, Claude Desktop, and GitHub Copilot simultaneously. This is the production-ready version combining everything from the preceding steps.
The Complete Server File
# repo_summarizer_server.py
import asyncio
import os
from datetime import datetime, timedelta, timezone

import httpx
from mcp import types
from mcp.server import Server
from mcp.server.sse import SseServerTransport
from mcp.shared.exceptions import McpError
from mcp.types import ErrorData, INTERNAL_ERROR, INVALID_PARAMS

app = Server("repo-summarizer")


@app.list_tools()
async def list_tools() -> list[types.Tool]:
    # Advertise one tool; inputSchema must be a JSON Schema object.
    return [
        types.Tool(
            name="summarize_repo",
            description="Fetch recent commits and open PRs for a GitHub repo and return a structured summary.",
            inputSchema={
                "type": "object",
                "properties": {
                    "repo": {"type": "string", "description": "owner/repo (e.g. anthropics/mcp)"},
                    "days": {"type": "integer", "default": 7, "description": "Lookback window in days"},
                },
                "required": ["repo"],
            },
        )
    ]


@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name != "summarize_repo":
        raise McpError(ErrorData(code=INVALID_PARAMS, message=f"Unknown tool: {name}"))
    repo = arguments.get("repo", "").strip()
    if not repo or "/" not in repo:
        raise McpError(ErrorData(code=INVALID_PARAMS, message="repo must be in owner/repo format"))
    days = arguments.get("days", 7)
    if not isinstance(days, int) or days < 1:
        raise McpError(ErrorData(code=INVALID_PARAMS, message="days must be a positive integer"))

    # Apply the lookback window through the GitHub commits API "since" filter.
    since = (datetime.now(timezone.utc) - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    token = os.environ.get("GITHUB_TOKEN", "")
    headers = {"Authorization": f"Bearer {token}"} if token else {}

    try:
        async with httpx.AsyncClient(headers=headers) as client:
            commits_resp = await client.get(
                f"https://api.github.com/repos/{repo}/commits",
                params={"per_page": 20, "since": since},
            )
            commits_resp.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
            prs_resp = await client.get(
                f"https://api.github.com/repos/{repo}/pulls",
                params={"state": "open", "per_page": 10},
            )
            prs_resp.raise_for_status()

        # Keep only the first line of each commit message.
        commits = [c["commit"]["message"].split("\n")[0] for c in commits_resp.json()[:10]]
        prs = [f"#{p['number']}: {p['title']}" for p in prs_resp.json()[:5]]
        summary = (
            f"## {repo} (last {days} days)\n\n"
            f"**Recent commits ({len(commits)}):**\n"
            + "\n".join(f"- {c}" for c in commits)
            + f"\n\n**Open PRs ({len(prs)}):**\n"
            + "\n".join(f"- {p}" for p in prs)
        )
        return [types.TextContent(type="text", text=summary)]
    except Exception as exc:
        # Never let a raw exception leave the handler; wrap it in McpError.
        raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"GitHub API call failed: {exc}")) from exc

# Transport startup (stdio or SSE) from the earlier steps goes here; it is what
# uses the asyncio and SseServerTransport imports and is not repeated in this listing.
Dockerfile for Headless Deployment
FROM python:3.12-slim AS builder
WORKDIR /app
COPY pyproject.toml .
RUN pip install --no-cache-dir "mcp>=1.0" httpx uvicorn starlette
FROM python:3.12-slim AS runtime
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
COPY repo_summarizer_server.py .
EXPOSE 8080
CMD ["python", "repo_summarizer_server.py"]
Deploy to Railway with railway up. The resulting URL (https://repo-summarizer.railway.app) is the endpoint you register in each client config.
Registering in Three Clients
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "repo-summarizer": {
      "url": "https://repo-summarizer.railway.app/sse",
      "headers": { "Authorization": "Bearer YOUR_TOKEN" }
    }
  }
}
Cursor (.cursor/mcp.json in the project root):
{
  "mcpServers": {
    "repo-summarizer": {
      "url": "https://repo-summarizer.railway.app/sse",
      "headers": { "Authorization": "Bearer YOUR_TOKEN" }
    }
  }
}
VS Code / GitHub Copilot (.vscode/mcp.json):
{
  "servers": {
    "repo-summarizer": {
      "type": "sse",
      "url": "https://repo-summarizer.railway.app/sse",
      "headers": { "Authorization": "Bearer YOUR_TOKEN" }
    }
  }
}
All three now call the same Docker container. One deploy, three clients, zero duplicate skill code.
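Because the three snippets differ only in their top-level key and the extra "type" field, they can be generated from one URL and one token. The sketch below assumes the token lives in the MCP_AUTH_TOKEN environment variable; client_configs is a hypothetical helper, not part of any client.

```python
import json
import os


def client_configs(url, token):
    """Build the three config bodies shown above from a single endpoint."""
    def entry():
        return {"url": url, "headers": {"Authorization": f"Bearer {token}"}}

    return {
        "claude_desktop_config.json": {"mcpServers": {"repo-summarizer": entry()}},
        ".cursor/mcp.json": {"mcpServers": {"repo-summarizer": entry()}},
        # VS Code uses a "servers" key and an explicit transport type.
        ".vscode/mcp.json": {"servers": {"repo-summarizer": {"type": "sse", **entry()}}},
    }


token = os.environ.get("MCP_AUTH_TOKEN", "YOUR_TOKEN")
configs = client_configs("https://repo-summarizer.railway.app/sse", token)
for path, config in configs.items():
    print(f"# {path}")
    print(json.dumps(config, indent=2))
```

The printed output contains the real token when the variable is set, so paste it into local config files rather than committing it.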
🛠️ FastMCP and MCP Inspector: Two Tools That Save Hours Every Week
FastMCP: Type Hints as JSON Schema
The raw mcp SDK is explicit and flexible, but its decorator pattern can feel ceremonial for small servers. FastMCP (jlowin/fastmcp) provides a @mcp.tool() decorator that mirrors FastAPI's ergonomics — Python type hints become the JSON Schema automatically, and you skip the list_tools / call_tool split entirely.
The same repo summarizer in FastMCP:
# fast_server.py
import os
from datetime import datetime, timedelta, timezone

import httpx
from fastmcp import FastMCP

mcp = FastMCP("repo-summarizer")


@mcp.tool()
async def summarize_repo(repo: str, days: int = 7) -> str:
    """Fetch recent commits and open PRs for a GitHub repository and return a structured summary."""
    token = os.environ.get("GITHUB_TOKEN", "")
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    # Apply the lookback window through the GitHub commits API "since" filter.
    since = (datetime.now(timezone.utc) - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Headers are set once on the client, so the individual requests do not repeat them.
    async with httpx.AsyncClient(headers=headers) as client:
        commits_resp = await client.get(
            f"https://api.github.com/repos/{repo}/commits",
            params={"per_page": 20, "since": since},
        )
        commits_resp.raise_for_status()
        prs_resp = await client.get(
            f"https://api.github.com/repos/{repo}/pulls",
            params={"state": "open", "per_page": 10},
        )
        prs_resp.raise_for_status()
    commits = commits_resp.json()
    prs = prs_resp.json()
    lines = [f"## {repo}", "", "**Commits:**"]
    lines += [f"- {c['commit']['message'].split(chr(10))[0]}" for c in commits[:10]]
    lines += ["", "**Open PRs:**"]
    lines += [f"- #{p['number']}: {p['title']}" for p in prs[:5]]
    return "\n".join(lines)


if __name__ == "__main__":
    mcp.run()
FastMCP extracts the function signature, docstring, and type hints to build the full JSON Schema automatically. The mcp.run() call defaults to stdio but accepts a transport="sse" argument for HTTP deployment. Both approaches produce identical wire output; the bare SDK makes the schema structure explicit, which is useful when you need fine-grained control over descriptions and defaults. FastMCP is the right default for new servers.
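To make the type-hints-to-schema idea concrete, here is a stdlib-only sketch of how a function signature can be turned into an inputSchema. It is an illustration of the concept, not FastMCP's actual implementation; schema_from_signature and the small type map are hypothetical.

```python
import inspect
from typing import get_type_hints

# Minimal Python-to-JSON-Schema type map for this sketch.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_signature(func):
    """Derive a JSON Schema object from a function's parameters."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unmapped or missing annotations fall back to "string" in this sketch.
        prop = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the caller must supply it
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": required}


def summarize_repo(repo: str, days: int = 7) -> str:
    """Stand-in with the same signature as the tool above."""
    return ""


print(schema_from_signature(summarize_repo))
```

The result has the same shape as the hand-written inputSchema in the raw SDK version: repo is required, days is optional with a default of 7.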
Also worth knowing: Anthropic maintains modelcontextprotocol/servers, a reference implementations repo with production-ready servers for Postgres, filesystem, GitHub, Slack, and more — useful both as callable tools and as code templates.
MCP Inspector: Your First Line of Defence Against Silent Failures
The MCP Inspector (mcp dev) is the single most useful tool in the MCP development workflow. It ships with the mcp[cli] package, runs entirely locally, and connects directly to your server over stdio — giving you a canonical client view of your tool before any real client sees it.
What each tab shows and why it matters:
- Tools tab: Every tool your server advertises via list_tools, with the full JSON Schema rendered as a form. If a tool is missing here, the real client will never see it. Debug schema issues here before touching any config file.
- Call panel: A form pre-filled from the schema. Submitting it sends a real tools/call JSON-RPC request and displays the raw response. Use this to confirm your McpError handling works: send an empty diff and verify you get a structured error, not a stack trace.
- Messages tab: Full JSON-RPC traffic log for the session. When a client call fails mysteriously in production, the raw request/response from this tab gives you the ground truth to debug from.
- Resources and Prompts tabs: If your server exposes resources or prompt templates, they appear here for interactive testing using the same workflow.
The Inspector does not test client-specific behaviour — it uses a canonical MCP client implementation. If a tool works in Inspector but fails in Claude Desktop, the issue is almost always the config file (wrong path, wrong JSON key, wrong transport type), not the server itself. This distinction saves hours of misdirected debugging.
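For orientation when reading the Messages tab, the sketch below builds the shape of a tools/call request as defined by JSON-RPC 2.0 and the MCP specification. The helper name and the argument values are illustrative.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that invokes one MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = tools_call_request(1, "summarize_repo", {"repo": "owner/repo", "days": 7})
print(json.dumps(request, indent=2))
```

Comparing a failing client's request against this shape quickly shows whether the problem is a missing argument, a wrong tool name, or something in the transport.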
📚 Lessons Learned: Six Things That Break in Production and How to Prevent Them
1. Write the tool description before the implementation. The description field is the single most important field in your schema, not for humans but for the LLM that routes calls to your tool. A vague description like "summarizes diffs" competes poorly against "Summarize a GitHub PR diff into a human-readable description with overview, key changes, and testing notes". Write the description first, confirm in Inspector's Tools tab that it is advertised exactly as written, then implement the handler.
2. Never let raw exceptions leave a handler. An unhandled exception in call_tool() causes some clients to mark the tool as failed and stop invoking it for the rest of the session. Always wrap in McpError. This is not defensive programming — it is the MCP contract. Every production handler should have a try/except Exception as exc: raise McpError(ErrorData(code=INTERNAL_ERROR, ...)) from exc at its outermost level.
3. Test both transports before shipping. A tool that works perfectly over stdio may fail over SSE if it reads from environment variables that the Docker container does not have, or if it relies on a file path that exists locally but not in the container. Run mcp dev server.py for stdio, then docker run for SSE, before registering in any client config.
4. Restart clients fully after every schema change. Claude Desktop and Cursor both cache the tools/list response across sessions. Adding a parameter to a tool and only restarting the server means the client calls the old schema for the rest of the session. Always do a full client restart after any inputSchema change — this is non-obvious and responsible for a surprising number of "it's not working" reports on teams.
5. The required array is your public API contract. Treat additions to required with the same discipline as breaking changes to a REST API. Once a client caches your schema, removing a field from required is backwards-compatible. Adding a field to required will break any client that cached the old schema and does not provide the new field. Maintain backwards-compatible shims during transition: arguments.get("repo_url") or arguments.get("repo").
6. Use environment variables for all secrets — never hardcode in config files. The stdio config files (claude_desktop_config.json, .cursor/mcp.json, .vscode/mcp.json) are sometimes checked into version control by teams who want portable configs. An OPENAI_API_KEY or MCP_AUTH_TOKEN hardcoded in the env block of a JSON config that lands in a public repo is an expensive mistake. Use a .env file loaded by the server process, and document the required variables in your README instead.
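Lesson 5's rule about the required array can be checked mechanically before a release. The sketch below compares two versions of an inputSchema and separates breaking additions from compatible removals. The function name required_changes and the "branch" parameter are hypothetical.

```python
def required_changes(old_schema, new_schema):
    """Classify changes to the required array between two schema versions."""
    old = set(old_schema.get("required", []))
    new = set(new_schema.get("required", []))
    return {
        # Newly required fields break clients that cached the old schema.
        "breaking": sorted(new - old),
        # Fields that stopped being required are safe for old clients.
        "compatible": sorted(old - new),
    }


v1 = {"type": "object", "required": ["repo"]}
v2 = {"type": "object", "required": ["repo", "branch"]}  # "branch" is a made-up field

print(required_changes(v1, v2))
print(required_changes(v2, v1))
```

A check like this could run in CI against the previously released schema, failing the build whenever the "breaking" list is non-empty.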
📌 TLDR: What You Can Build After Reading This Post
- MCP is a universal adapter: any Python function exposed as an MCP server is automatically callable from Cursor, Claude Desktop, GitHub Copilot, and VS Code agent mode — without per-client rewrites.
- Two transports, two use cases: stdio for local/personal use (simpler, more secure, zero infrastructure); HTTP+SSE for remote/multi-client deployment (hundreds of concurrent clients, persistent streaming, shareable via URL).
- The Python SDK wires tools in three parts: list_tools() for schema advertisement, call_tool() for dispatch, and a transport context manager for the wire layer. Only the transport context changes between stdio and SSE; the handlers are identical.
- FastMCP removes boilerplate: type hints become JSON Schema automatically; the docstring becomes the description; @mcp.tool() is the only decorator you need for most servers.
- Steps 1–4 build the server; Steps 5–7 make it headless; Steps 8–10 register it in all three clients. Step 11 is the fallback when something goes wrong.
- MCP Inspector (mcp dev) is always Step 1 when something breaks. If the tool works in Inspector, config file issues explain 90% of client-specific failures.
- Production pitfalls: schema drift breaks clients silently; SSE connections drop under idle reverse-proxy timeouts; uninstrumented tool handlers are impossible to debug at scale; secrets in config JSON files leak when repos go public.
Written by
Abstract Algorithms
@abstractalgorithms