
Headless Agents: How to Deploy Your Skills as an MCP Server

Build a Python MCP server once and call it from Cursor, Claude Desktop, and GitHub Copilot with no code changes.

Abstract Algorithms · 17 min read

TLDR: Deploy once, call everywhere: MCP turns Python skills into headless servers any AI client can call.


📖 The Trapped Skill Problem: When Your Best LLM Tool Works Everywhere But Here

You spent an afternoon building a beautiful skill inside GitHub Copilot CLI: given a repository URL, it summarises the codebase, identifies the top changed files, and drafts a pull-request description. It works every time you run it. You feel good.

Then your teammate on Cursor asks if they can use it. Another colleague on Claude Desktop wants access too. You look at your implementation — a tightly coupled async function registered directly inside Copilot's extension API — and realise there is no clean way to share it. You would have to rewrite it for Cursor's tool format, then rewrite it again for Claude's function-calling schema, and maintain three versions forever.

This is the trapped skill problem: a useful LLM capability locked inside one tool's runtime.

The Model Context Protocol (MCP) is the solution. MCP is an open standard — originally developed by Anthropic and now implemented by Cursor, Claude Desktop, GitHub Copilot, and VS Code agent mode — that defines a single wire format for exposing tools, resources, and prompts from a server process. Write your skill as an MCP server once, and any MCP-aware client can discover and invoke it. No rewrites. No per-client adapters.

This post walks you through exactly how to do that: understanding MCP's three-layer model, building a server with the Python SDK, choosing the right transport for local vs. remote deployment, and running a real "repo summarizer" skill simultaneously from two different clients.


πŸ” MCP Fundamentals: Protocol, Transport Types, and the Server Lifecycle

MCP has three moving parts: a client (the AI assistant — Cursor, Claude Desktop, Copilot), a server (your Python process exposing tools), and a transport (the channel that connects them).

The protocol is intentionally thin. At its core, MCP defines:

  • Tool registration — the server advertises a list of callable tools with JSON Schema-typed parameters.
  • Resource registration — the server can expose read-only data sources (files, database rows, API responses) the client can fetch.
  • Prompt templates — reusable prompt fragments the client can request.

MCP transports come in two flavours:

| Transport | Mechanism | Best for |
| --- | --- | --- |
| stdio | Client spawns the server as a child process; messages flow over stdin/stdout | Local, single-client, same machine |
| HTTP + SSE | Server runs as an HTTP daemon; client POSTs requests and receives Server-Sent Events back | Remote, multi-client, Docker/cloud |

The server lifecycle is predictable: on startup, the server sends a capabilities object declaring which tools it supports. The client stores this manifest and routes user-intent to the right tool function. On each tool call, the server receives a JSON-RPC 2.0 request, executes the handler, and streams or returns the result.
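To make that round trip concrete, here is a sketch of one tool call expressed as Python dicts. The `tools/call` method name follows the MCP specification; the tool name, arguments, and result text are placeholder values for illustration.

```python
import json

# A client -> server tool invocation (JSON-RPC 2.0 request).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "summarize_repo",
        "arguments": {"repo_url": "https://github.com/octocat/hello-world"},
    },
}

# The server -> client response echoes the request id so the client
# can correlate it with the pending call.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "Example summary."}]},
}

# On stdio, each message travels as one newline-terminated JSON object.
frame = json.dumps(request) + "\n"
```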


βš™οΈ Building Your First MCP Server with the Python SDK

Install the SDK:

pip install mcp

A minimal server with one tool looks like this:

# server.py
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp import types

app = Server("repo-summarizer")

@app.list_tools()
async def list_tools() -> list[types.Tool]:
    return [
        types.Tool(
            name="summarize_repo",
            description="Summarise a GitHub repository and draft a PR description.",
            inputSchema={
                "type": "object",
                "properties": {
                    "repo_url": {
                        "type": "string",
                        "description": "Full HTTPS URL of the repository."
                    },
                    "max_files": {
                        "type": "integer",
                        "description": "Maximum changed files to include.",
                        "default": 10
                    }
                },
                "required": ["repo_url"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name == "summarize_repo":
        repo_url = arguments["repo_url"]
        max_files = arguments.get("max_files", 10)
        # --- your skill logic goes here ---
        summary = await _summarize(repo_url, max_files)
        return [types.TextContent(type="text", text=summary)]
    raise ValueError(f"Unknown tool: {name}")

async def main():
    async with stdio_server() as streams:
        await app.run(*streams, app.create_initialization_options())

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

Three things to notice here. First, @app.list_tools() is the capability announcement — it tells the client exactly what parameters to expect. Second, @app.call_tool() is the dispatcher — every tool call routes through this single handler. Third, the transport is wired in main() — swap stdio_server for the HTTP+SSE transport and the tool logic is untouched.

Registering the Server with a Client

For stdio transport (local use with Cursor or Claude Desktop), add an entry to the client's mcp.json or settings file:

{
  "mcpServers": {
    "repo-summarizer": {
      "command": "python",
      "args": ["/path/to/server.py"]
    }
  }
}

The client will spawn the process on demand and terminate it when the session ends.


🧠 Deep Dive: How MCP Actually Routes Messages

The Internals: JSON-RPC 2.0, Capability Negotiation, and Message Framing

MCP's wire format is JSON-RPC 2.0 — the same protocol powering the Language Server Protocol (LSP). Every message has three fields: jsonrpc: "2.0", a method string, and either params (for requests) or result/error (for responses).

The initialization handshake is a two-step exchange:

  1. Client sends initialize with its own capabilities (protocol version, supported features).
  2. Server replies with InitializeResult containing its capabilities (tools, resources, prompts it can serve).

This capability negotiation means clients never need to hard-code what a server can do. They discover it at runtime. If a server is updated to expose a new tool, any connected client sees it on the next session without configuration changes.

Message framing on stdio uses newline-delimited JSON (NDJSON): each message is a single JSON object terminated by \n. The SDK handles framing automatically, but understanding it matters when debugging: you can attach a simple pipe logger between client and server to inspect raw traffic.
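A minimal version of such a pipe logger might look like the sketch below. The `pump` helper and log format are illustrative, not part of the SDK; in a real tap you would spawn the server with `subprocess.Popen` and run `pump` on two threads, one per direction (client stdin to server, server stdout back to client).

```python
import io

def pump(src, dst, log, label):
    """Relay newline-delimited JSON frames from src to dst, logging each one."""
    for line in src:
        log.write(f"{label} {line.decode(errors='replace')}")
        dst.write(line)
        dst.flush()

# Demonstration with in-memory streams standing in for the real pipes:
raw = io.BytesIO(b'{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}\n')
relayed, log = io.BytesIO(), io.StringIO()
pump(raw, relayed, log, ">>")
```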

Message framing on HTTP+SSE is slightly different. The client POSTs a JSON-RPC request to /message. The server writes data: <json>\n\n chunks to the SSE stream. The connection stays open for the life of the session, which means long-running tool calls stream progress updates back incrementally rather than blocking until completion.
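A few lines of Python are enough to decode that framing, which is handy when inspecting a raw SSE stream captured with curl or in tests. This parser is a sketch of the framing described above, not an SDK API:

```python
import json

def parse_sse(stream_text: str) -> list:
    """Decode JSON payloads from raw SSE text ('data: <json>' blocks)."""
    events = []
    for block in stream_text.split("\n\n"):
        data_lines = [
            line[len("data: "):]
            for line in block.splitlines()
            if line.startswith("data: ")
        ]
        if data_lines:
            # Multi-line data fields are joined with newlines per the SSE spec.
            events.append(json.loads("\n".join(data_lines)))
    return events
```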

Performance Analysis: stdio vs HTTP Latency and Concurrency Limits

The performance characteristics of the two transports are meaningfully different:

| Metric | stdio | HTTP + SSE |
| --- | --- | --- |
| Cold-start latency | ~100–300 ms (process spawn) | ~5–20 ms (HTTP connect to running process) |
| Per-call overhead | Negligible (IPC) | ~1–5 ms (TCP + HTTP headers) |
| Max concurrent clients | 1 (the spawning process) | Limited by your server's asyncio event loop (hundreds) |
| Long-running tool support | ✅ (streaming supported) | ✅ (SSE chunking) |
| Auth surface | None (OS process isolation) | Requires token/mTLS (exposed over network) |

The main bottleneck in any MCP server is the tool handler itself, not the transport. An LLM API call inside summarize_repo that takes 2 seconds dominates a sub-millisecond transport overhead by a factor of 1000. Optimize the tool logic — use async HTTP clients, cache repeated LLM calls — before worrying about transport tuning.

For HTTP+SSE servers under real load, the Python asyncio event loop is single-threaded. CPU-bound work inside a handler will block other concurrent requests. The fix is either asyncio.to_thread() for synchronous blocking calls, or splitting compute-heavy work into a background task queue (Celery, ARQ) that the handler awaits.
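A sketch of the `asyncio.to_thread` pattern, with a hypothetical `parse_large_diff` standing in for the CPU-bound work:

```python
import asyncio
import time

def parse_large_diff(diff_text: str) -> int:
    """Stand-in for CPU-bound work that would otherwise block the event loop."""
    time.sleep(0.05)  # simulate heavy computation
    return len(diff_text.splitlines())

async def handle_tool_call(diff_text: str) -> int:
    # Runs in a worker thread; the loop keeps serving other requests meanwhile.
    return await asyncio.to_thread(parse_large_diff, diff_text)
```

The trade-off is thread-pool pressure: `to_thread` uses the default executor, so very high concurrency of CPU-bound calls still warrants a proper task queue.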


📊 From Client Request to Tool Result: The MCP Call Flow

The diagram below shows the complete round-trip from a user's intent in Cursor to the tool result returned by your Python server. Note the two possible transport paths — stdio for local use and HTTP+SSE for headless remote deployment — and how they converge at the same tool handler.

flowchart TD
    A["User types intent in Cursor / Claude Desktop"] --> B["Client LLM resolves tool name + params"]
    B --> C{Transport?}
    C -->|stdio local| D["Client spawns server as child process"]
    C -->|HTTP+SSE remote| E["Client POSTs to /message endpoint"]
    D --> F["JSON-RPC request over stdin"]
    E --> F
    F --> G["MCP Server: capability check + dispatch"]
    G --> H["Tool handler: call_tool()"]
    H --> I["Skill logic: LLM call / API / file I/O"]
    I --> J["TextContent result"]
    J --> K["JSON-RPC response over stdout / SSE stream"]
    K --> L["Client renders result to user"]

Reading the diagram: the left branch (stdio) means the client manages the server's entire process lifecycle — it starts and stops your Python script. The right branch (HTTP+SSE) means your server runs independently as a daemon; the client simply calls it over HTTP. Both paths deliver results through the same call_tool() handler, which is why switching transports requires zero changes to your business logic.


🌍 Real-World Applications: How Cursor, Claude Desktop, and GitHub Copilot Use MCP Today

MCP has moved from experimental to default in most major AI coding tools. Here is how each client uses it in practice.

Cursor uses MCP to give its inline chat access to custom project tools. A common pattern: a team deploys an MCP server that wraps their internal code-search index (not exposed to GitHub Copilot's cloud) and registers it in each developer's Cursor config. The LLM can now answer "where is the payment retry logic?" against private code without sending the full codebase to an external API.

Claude Desktop ships with an mcp.json registry in its application config directory. When you add an MCP server entry, Claude immediately sees the new tools. Anthropic's own reference implementations include servers for filesystem access, PostgreSQL, Brave Search, and Google Maps — all wired via the same protocol.

GitHub Copilot (VS Code agent mode, 2025+) added MCP server discovery as a first-class feature. The extensions.json or workspace-level mcp.json file registers servers that become available to Copilot's /agent commands. A CI/CD team at a mid-size fintech uses this to expose a "lint findings summarizer" server: Copilot's agent calls it as a tool during code review, and the result appears inline as a Copilot suggestion — without any GitHub App installation or cloud infrastructure beyond a single Railway-hosted container.


βš–οΈ Trade-offs and Failure Modes: stdio vs SSE, Auth Complexity, and Cold Starts

Every architectural choice in MCP deployment involves real trade-offs. Knowing the failure modes before you hit them saves debugging time.

stdio — simplicity at the cost of scale. stdio is the easiest transport to get running and the safest from a security perspective (no network surface). The failure mode is isolation: each client spawns its own copy of the server process. If five developers on a team each open Claude Desktop simultaneously, you get five Python processes, five sets of cold-start LLM calls, five cached states. If your tool has warm-up cost (model loading, database connection pooling), stdio amplifies it linearly.

HTTP+SSE — power at the cost of operational complexity. The SSE connection is persistent, which means network interruptions (firewalls closing idle connections, load balancer timeouts) will silently drop the stream. Clients must implement reconnect logic. The MCP SDK handles basic reconnection, but misconfigured reverse proxies with short idle timeouts are a common production surprise.

Authentication. stdio has no auth — OS process isolation is the security boundary. HTTP+SSE needs explicit auth. The three practical options are: bearer tokens (simple, rotate via secrets manager), mutual TLS (strong, complex to set up), or OAuth 2.0 with a short-lived access token exchanged at session start. A common mistake is exposing an HTTP MCP server on 0.0.0.0 without any auth during local testing and then forgetting to add it before deploying to a shared cloud environment.
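A minimal bearer-token check might look like the sketch below. It is framework-agnostic and assumes header names arrive lower-cased (as ASGI normalises them); wire it into your HTTP layer's middleware and load the expected token from a secrets manager, not from source.

```python
import hmac

def is_authorized(headers: dict, expected_token: str) -> bool:
    """Validate 'Authorization: Bearer <token>' with a constant-time compare."""
    auth = headers.get("authorization", "")
    scheme, _, supplied = auth.partition(" ")
    if scheme != "Bearer" or not supplied:
        return False
    # hmac.compare_digest avoids leaking token contents via timing differences.
    return hmac.compare_digest(supplied, expected_token)
```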

Versioning and schema drift. Because tools are discovered at runtime, a server upgrade that removes or renames a tool will silently break any client that cached the old capability manifest. Use semantic versioning in your server name (repo-summarizer-v2) and maintain backwards-compatible aliases during transition windows.

Cold starts on serverless. AWS Lambda and Google Cloud Run both support HTTP-based MCP servers, but the first request after a cold start will include the initialization handshake latency on top of the function cold-start time. For latency-sensitive tools, prefer a container-based always-on deployment (Railway, Fly.io) or set a minimum instance count.


🧭 Decision Guide: Choosing Your Transport and Deployment Pattern

| Recommendation | When |
| --- | --- |
| Use stdio | Your tool is for local personal use, runs on the same machine as the client, and multi-client access is not required |
| Use HTTP+SSE | Multiple developers need simultaneous access, the server lives on a remote host, or you need persistent state between calls |
| Go headless (container/serverless) | The skill must be available 24/7 without an engineer running it manually, or it serves production workflows (CI, review automation) |
| Stay stdio + local | Security requirements prohibit external network exposure, or the tool uses local filesystem or internal-only APIs |
| Consider serverless (Lambda/Cloud Run) | Call volume is low and sporadic, you want zero-infrastructure idle cost, and cold-start latency of ~500 ms is acceptable |
| Avoid serverless | The tool has long warm-up time (model loading), requires a persistent TCP connection to a database, or latency SLOs are under 200 ms |

🧪 Practical Example: The Repo Summarizer — One Python File, Three Clients

Here is the complete path from a single Python file to a skill callable from Cursor, Claude Desktop, and GitHub Copilot simultaneously.

The Server File

# repo_summarizer_server.py
import os
from datetime import datetime, timedelta, timezone

import httpx
from starlette.applications import Starlette
from starlette.routing import Mount, Route

from mcp import types
from mcp.server import Server
from mcp.server.sse import SseServerTransport

app = Server("repo-summarizer")

@app.list_tools()
async def list_tools() -> list[types.Tool]:
    return [
        types.Tool(
            name="summarize_repo",
            description="Fetch recent commits and open PRs for a GitHub repo, then return a structured summary.",
            inputSchema={
                "type": "object",
                "properties": {
                    "repo": {"type": "string", "description": "owner/repo (e.g. anthropics/mcp)"},
                    "days": {"type": "integer", "default": 7}
                },
                "required": ["repo"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name != "summarize_repo":
        raise ValueError(f"Unknown tool: {name}")

    repo = arguments["repo"]
    days = arguments.get("days", 7)
    # Honour the advertised `days` window instead of silently ignoring it.
    since = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
    token = os.environ.get("GITHUB_TOKEN", "")
    headers = {"Authorization": f"Bearer {token}"} if token else {}

    async with httpx.AsyncClient(headers=headers) as client:
        commits_resp = await client.get(
            f"https://api.github.com/repos/{repo}/commits",
            params={"per_page": 20, "since": since}
        )
        prs_resp = await client.get(
            f"https://api.github.com/repos/{repo}/pulls",
            params={"state": "open", "per_page": 10}
        )

    commits = [c["commit"]["message"].splitlines()[0] for c in commits_resp.json()[:10]]
    prs = [f"#{p['number']}: {p['title']}" for p in prs_resp.json()[:5]]

    summary = (
        f"## {repo} — last {days} days\n\n"
        f"**Recent commits ({len(commits)}):**\n"
        + "\n".join(f"- {c}" for c in commits)
        + f"\n\n**Open PRs ({len(prs)}):**\n"
        + "\n".join(f"- {p}" for p in prs)
    )
    return [types.TextContent(type="text", text=summary)]

# Wire the SSE transport into an ASGI app that uvicorn can serve.
# (This follows the SDK's SSE example; adjust if your SDK version differs.)
sse = SseServerTransport("/messages/")

async def handle_sse(request):
    async with sse.connect_sse(request.scope, request.receive, request._send) as streams:
        await app.run(*streams, app.create_initialization_options())

asgi_app = Starlette(routes=[
    Route("/sse", endpoint=handle_sse),
    Mount("/messages/", app=sse.handle_post_message),
])

Dockerfile for Headless Deployment

FROM python:3.12-slim
WORKDIR /app
RUN pip install mcp httpx starlette uvicorn
COPY repo_summarizer_server.py .
ENV PORT=8080
CMD ["sh", "-c", "uvicorn repo_summarizer_server:asgi_app --host 0.0.0.0 --port ${PORT}"]

Deploy to Railway with railway up. The resulting URL (https://repo-summarizer.railway.app) is the endpoint you register in each client.

Registering in Three Clients

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "repo-summarizer": {
      "url": "https://repo-summarizer.railway.app/sse",
      "headers": { "Authorization": "Bearer YOUR_TOKEN" }
    }
  }
}

Cursor (.cursor/mcp.json in the project root):

{
  "mcpServers": {
    "repo-summarizer": {
      "url": "https://repo-summarizer.railway.app/sse"
    }
  }
}

VS Code / GitHub Copilot (.vscode/mcp.json):

{
  "servers": {
    "repo-summarizer": {
      "type": "sse",
      "url": "https://repo-summarizer.railway.app/sse"
    }
  }
}

All three now call the same Docker container. One deploy, three clients, zero duplicate skill code.


πŸ› οΈ FastMCP: How It Simplifies MCP Server Development

The raw mcp SDK is explicit and flexible, but its decorator pattern can feel ceremonial for small servers. FastMCP (jlowin/fastmcp) provides a @mcp.tool() decorator that mirrors FastAPI's ergonomics — Python type hints become the JSON Schema automatically, and you skip the manual list_tools / call_tool split.

The same repo summarizer in FastMCP:

# fast_server.py
import os
from datetime import datetime, timedelta, timezone

import httpx
from fastmcp import FastMCP

mcp = FastMCP("repo-summarizer")

@mcp.tool()
async def summarize_repo(repo: str, days: int = 7) -> str:
    """Summarise recent commits and open PRs for a GitHub repository."""
    token = os.environ.get("GITHUB_TOKEN", "")
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    since = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
    async with httpx.AsyncClient(headers=headers) as client:
        commits = (await client.get(
            f"https://api.github.com/repos/{repo}/commits",
            params={"per_page": 20, "since": since}
        )).json()
        prs = (await client.get(
            f"https://api.github.com/repos/{repo}/pulls",
            params={"state": "open", "per_page": 10}
        )).json()
    lines = [f"## {repo} (last {days} days)", "", "**Commits:**"]
    lines += [f"- {c['commit']['message'].splitlines()[0]}" for c in commits[:10]]
    lines += ["", "**Open PRs:**"]
    lines += [f"- #{p['number']}: {p['title']}" for p in prs[:5]]
    return "\n".join(lines)

if __name__ == "__main__":
    mcp.run()

FastMCP extracts the function signature, docstring, and type hints to build the full JSON Schema automatically. The mcp.run() call defaults to stdio but accepts a transport="sse" argument for HTTP deployment. For a full deep-dive on FastMCP's advanced features (context injection, middleware, resource endpoints), see the FastMCP documentation.

Also worth knowing: Anthropic maintains modelcontextprotocol/servers, a reference implementations repo with production-ready servers for Postgres, filesystem, GitHub, Slack, and more — useful both as callable tools and as code templates for building your own.


📚 Lessons Learned: What Actually Breaks When You Deploy MCP in Production

Don't treat stdio as "just for testing." For a single-developer workflow, stdio is the correct long-term choice. It is simpler, more secure, and has lower operational overhead than an SSE server. Reach for HTTP+SSE only when multi-client access or remote deployment is actually needed.

The JSON Schema is your API contract. Clients cache the capability manifest. If you change a parameter name between deploys, existing client sessions will send the old parameter name and your handler will receive None. Always validate inputs defensively: arguments.get("repo_url") or arguments.get("repo") as a migration shim costs nothing and prevents silent failures.
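That shim can live in a small helper so every tool handler shares the same migration logic. A sketch, using the parameter names from the repo summarizer example:

```python
def extract_repo(arguments: dict) -> str:
    """Accept the new 'repo' name and the legacy 'repo_url' during migration."""
    repo = arguments.get("repo") or arguments.get("repo_url")
    if repo is None:
        # Fail loudly rather than passing None into the skill logic.
        raise ValueError("Missing required argument: 'repo' (or legacy 'repo_url')")
    return repo
```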

Instrument your tool handlers like microservices. Add structured logging, duration metrics, and error tracking (Sentry, OpenTelemetry) from day one. When a tool is called from three different clients, correlating which client triggered a failure becomes much harder without request IDs.
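One lightweight way to get there is a wrapper around the dispatcher that attaches a request id and duration to every call. A sketch (the log field names are illustrative, not a standard):

```python
import logging
import time
import uuid

logger = logging.getLogger("mcp.tools")

async def instrumented_call(handler, name: str, arguments: dict):
    """Wrap a tool dispatch with a request id and a duration metric."""
    request_id = uuid.uuid4().hex[:8]
    start = time.perf_counter()
    try:
        result = await handler(name, arguments)
        logger.info("tool=%s request_id=%s status=ok duration_ms=%.1f",
                    name, request_id, (time.perf_counter() - start) * 1000)
        return result
    except Exception:
        # logger.exception records the traceback alongside the request id.
        logger.exception("tool=%s request_id=%s status=error", name, request_id)
        raise
```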

Test with two clients simultaneously before declaring success. It is easy to make a server work from one client by accident (global state, un-thread-safe caches). Running Cursor and Claude Desktop against the same HTTP+SSE server in parallel during local testing catches concurrency bugs before they reach production.

Cold-start surprises are transport-agnostic. Even with HTTP+SSE, if your handler initialises a heavy ML model on first call, the first user will wait. Move expensive initialisation to server startup — outside the handler — so it runs once at boot, not once per request.
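The fix is structural: run expensive setup at import time (server boot) and keep handlers thin. A sketch with a hypothetical `load_model` standing in for the heavy step:

```python
import time

def load_model() -> dict:
    """Stand-in for slow warm-up work (model weights, connection pools)."""
    time.sleep(0.05)  # simulated one-time cost
    return {"ready": True}

# Paid once when the server process starts, not once per tool call.
MODEL = load_model()

async def handle_request(arguments: dict) -> str:
    # The handler only touches the already-warm singleton.
    return "model ready" if MODEL["ready"] else "warming"
```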


📌 TLDR: Summary and Key Takeaways

  • MCP is a universal adapter: any skill exposed as an MCP server is automatically callable from any MCP-aware client — Cursor, Claude Desktop, GitHub Copilot, VS Code agent mode — without per-client rewrites.
  • Two transports, two use cases: stdio for local/personal use (simpler, more secure); HTTP+SSE for remote/multi-client deployment (more powerful, more operational surface).
  • The Python SDK wires tools in three parts: list_tools() for schema advertisement, call_tool() for dispatch, and a transport context for the wire layer.
  • FastMCP removes boilerplate: type hints become JSON Schema automatically; @mcp.tool() is the only decorator you need for simple servers.
  • Deploy headlessly on Railway or Fly.io with a five-line Dockerfile; register the same SSE URL in all three client config files.
  • Production pitfalls: schema drift breaks clients silently, SSE connections drop under idle load-balancer timeouts, and uninstrumented tool handlers are impossible to debug at scale.
  • One-liner to remember: Write the skill once, expose it over MCP, and let every AI assistant in your team call it — that is what "deploy once, call everywhere" means in practice.

πŸ“ Practice Quiz

  1. Which MCP transport is most appropriate for a shared team server that multiple developers access simultaneously from different machines?

    • A) stdio, because it has lower per-call overhead
    • B) HTTP + SSE, because it supports multiple concurrent clients over the network
    • C) WebSocket, because MCP requires a bidirectional connection
    • D) gRPC, because JSON-RPC is too slow for production

    Correct Answer: B

  2. A developer updates their MCP server to rename the repo_url parameter to repo in the tool schema, but forgets to update the corresponding client config file. What is the most likely result?

    • A) The client will automatically detect the schema change and update its config
    • B) The client will throw a JSON parse error on every call
    • C) The server will receive None for repo_url because the client still sends the old parameter name
    • D) The MCP handshake will fail and the tool will disappear from the client

    Correct Answer: C

  3. In the FastMCP framework, how does the server know the JSON Schema for a registered tool's input parameters?

    • A) The developer writes a separate schema YAML file and registers it at startup
    • B) FastMCP reads the Python function's type hints and docstring to generate the schema automatically
    • C) The client sends its expected schema to the server during initialization and the server adapts
    • D) FastMCP requires a Pydantic model class for every tool β€” plain function signatures are not supported

    Correct Answer: B

  4. Open-ended challenge: You are deploying a repo summarizer MCP server that calls the GitHub API and an LLM for each request. After going live, you notice that the first call after each deploy takes 8 seconds while subsequent calls take 1.5 seconds. Describe at least two strategies to reduce that first-call latency without switching to a different transport, and explain any trade-offs each strategy introduces.


Written by Abstract Algorithms (@abstractalgorithms)