
Step-by-Step: How to Expose a Skill as an MCP Server

Step-by-step: annotate a Python function, test with MCP Inspector, containerize it, and register in Claude Desktop, Cursor, and VS Code.

Abstract Algorithms · 26 min read

TLDR: Turn any Python function into a multi-client MCP server in 11 steps — from annotation to Docker.


📖 The Copy-Paste Problem: Why Skills Die at IDE Boundaries

A developer pastes their summarize_pr_diff function into a Slack message because their teammate uses Cursor and can't call a Copilot skill. The function works perfectly. The sharing mechanism is broken. By the end of this post, that same function runs as an MCP server — callable from Cursor, Claude Desktop, and VS Code Copilot simultaneously, with no copy-paste required.

If you have ever written an LLM-powered function that worked exactly as intended in one tool and then had to manually explain it, copy-paste it, or rewrite it for a colleague on a different IDE, you already understand the problem this post solves. The tool is not broken. The distribution model is.

The Model Context Protocol (MCP) is an open standard that defines a single wire format for exposing Python functions as callable tools to any MCP-aware AI client. Once your function is wrapped as an MCP server, it becomes simultaneously available to Cursor, Claude Desktop, GitHub Copilot in VS Code agent mode, and any other compliant client — without any per-client rewriting.

This post is the practical companion to Headless Agents: How to Deploy Your Skills as an MCP Server. That post explains the why and what of MCP: the three-layer architecture, the stdio vs. HTTP transport decision guide, and the conceptual model for headless skill deployment. This post covers the how in full numbered detail — every command, every config file, every failure mode — starting from a single Python function and ending with a server visible across three clients.


🔍 Before You Start: What You Need and How MCP Registration Works

Before running a single command, it helps to understand what the end state looks like so each step has a clear purpose.

What you need installed:

  • Python 3.11+ with pip
  • Docker Desktop (for Step 7)
  • Claude Desktop, Cursor, or VS Code with GitHub Copilot (at least one to test registration)
  • curl for HTTP transport testing

What "registered" means in practice: Each MCP client maintains a config file — a JSON file in a platform-specific directory — that maps a server name to either a command to spawn (for stdio transport) or a URL to connect to (for HTTP+SSE transport). When you open the client, it reads this config, attempts to start or connect to every listed server, and calls tools/list to discover available tools. If your server is running and returns a valid schema, the tools appear in the client's tool picker within seconds. If anything in the chain fails silently, the tools simply don't appear — no error dialog, no log by default. That silence is why Step 4 (MCP Inspector) and Step 11 (debugging) are so important.
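That discovery call is plain JSON-RPC 2.0 on the wire. As a minimal sketch (the helper name is ours, and real clients send an initialize handshake before this), the message a client writes to a stdio server's stdin looks like:

```python
import json

# Sketch: the JSON-RPC 2.0 discovery message a client sends over stdio.
# Real clients send an `initialize` handshake first; the id is arbitrary.
def make_tools_list_request(request_id: int = 1) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})

print(make_tools_list_request())
# The server answers with a `tools` array of name/description/inputSchema descriptors.
```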

The two transports at a glance:

Transport  | How the client connects               | Ideal for
stdio      | Spawns your script as a child process | Local dev, single developer, same machine
HTTP + SSE | Connects to a running HTTP server     | Shared team use, Docker, cloud deployment

For a full transport decision guide, see Headless Agents. This post shows you how to implement both and choose at registration time.


⚙️ Steps 1–4: From Python Function to Locally Tested MCP Server

These four steps take you from an empty directory to a fully verified local MCP server. Every subsequent step builds on this foundation.

Step 1 — Set Up the Project Structure

Create the directory layout and install dependencies:

mkdir mcp-pr-summarizer && cd mcp-pr-summarizer
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install "mcp>=1.0" fastmcp openai

Your directory should look like this:

mcp-pr-summarizer/
β”œβ”€β”€ server.py
β”œβ”€β”€ pyproject.toml
└── Dockerfile

The pyproject.toml declares the package and its runtime dependencies:

# pyproject.toml
[project]
name = "pr-summarizer-mcp"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = ["mcp>=1.0", "fastmcp", "openai"]

FastMCP vs. bare mcp.server: FastMCP is a thin decorator layer over the official MCP Python SDK. With FastMCP, you write @app.tool() and the schema is inferred from your function signature. With the bare SDK, you write @server.list_tools() and @server.call_tool() separately and provide the JSON Schema manually. This post uses the bare SDK for Steps 2–3 so you see exactly what the wire format looks like, then shows the FastMCP shorthand for reference. See the previous post for a full FastMCP conceptual overview.

Step 2 — Write the Tool Function with the @server.list_tools() and @server.call_tool() Decorators

The Server class from the MCP SDK is the core registry. You declare tools using two decorators: @server.list_tools() for the capability announcement and @server.call_tool() for the dispatcher.

# server.py
import asyncio
import os
from mcp.server import Server
from mcp.server.stdio import stdio_server
import mcp.types as types
from openai import AsyncOpenAI

server = Server("pr-summarizer")
client = AsyncOpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

@server.list_tools()
async def list_tools() -> list[types.Tool]:
    return [
        types.Tool(
            name="summarize_pr_diff",
            description=(
                "Summarize a GitHub PR diff into a human-readable description. "
                "Returns a structured summary with an overview, list of key changes, "
                "and suggested testing notes."
            ),
            inputSchema={
                "type": "object",
                "properties": {
                    "diff": {
                        "type": "string",
                        "description": "The raw git diff content from the PR"
                    },
                    "target_audience": {
                        "type": "string",
                        "description": "Who will read this summary",
                        "default": "engineering team"
                    }
                },
                "required": ["diff"]
            }
        )
    ]

@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name == "summarize_pr_diff":
        return await _summarize_pr_diff(
            diff=arguments["diff"],
            target_audience=arguments.get("target_audience", "engineering team")
        )
    raise ValueError(f"Unknown tool: {name}")

async def _summarize_pr_diff(diff: str, target_audience: str) -> list[types.TextContent]:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"You write PR descriptions for a {target_audience}."},
            {"role": "user", "content": f"Summarize this diff:\n\n{diff}"}
        ]
    )
    return [types.TextContent(type="text", text=response.choices[0].message.content)]

async def main():
    async with stdio_server() as streams:
        await server.run(*streams, server.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())

FastMCP shorthand (for comparison): Using FastMCP, the same tool looks like @app.tool() async def summarize_pr_diff(diff: str, target_audience: str = "engineering team") -> str: ... — the schema is inferred automatically from type hints, and there is no separate list_tools / call_tool split. Both approaches produce identical wire output; the bare SDK makes the schema structure explicit, which is useful when you need fine-grained control over descriptions and defaults.

Step 3 — Add Input Validation and Error Handling

Raw exceptions must never propagate out of an MCP handler. Clients interpret unhandled exceptions as protocol errors and may silently drop the tool from their registry. Always raise McpError with a structured ErrorData payload.

# server.py (updated call_tool and helper)
from mcp.shared.exceptions import McpError
from mcp.types import ErrorData, INTERNAL_ERROR, INVALID_PARAMS

@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name == "summarize_pr_diff":
        return await _summarize_pr_diff(
            diff=arguments.get("diff", ""),
            target_audience=arguments.get("target_audience", "engineering team")
        )
    raise McpError(ErrorData(code=INVALID_PARAMS, message=f"Unknown tool: {name}"))

async def _summarize_pr_diff(diff: str, target_audience: str) -> list[types.TextContent]:
    if not diff.strip():
        raise McpError(ErrorData(code=INVALID_PARAMS, message="diff cannot be empty"))
    if len(diff) > 100_000:
        raise McpError(ErrorData(code=INVALID_PARAMS, message="diff exceeds 100 KB limit"))
    try:
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": f"You write PR descriptions for a {target_audience}."},
                {"role": "user", "content": f"Summarize this diff:\n\n{diff}"}
            ]
        )
        return [types.TextContent(type="text", text=response.choices[0].message.content)]
    except Exception as exc:
        raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"LLM call failed: {exc}")) from exc

The pattern here is deliberate: validate inputs first, wrap the LLM call in a try/except, and always raise McpError — never a bare ValueError or RuntimeError. The error code constants (INVALID_PARAMS, INTERNAL_ERROR) are standard JSON-RPC 2.0 error codes that clients know how to surface cleanly.

Step 4 — Test Locally with MCP Inspector

The MCP Inspector is a browser-based debugging UI that connects directly to your server over stdio. It shows your tool schema, lets you send test calls, and displays raw request/response JSON.

# Install the MCP CLI tools if not already present
pip install "mcp[cli]"

# Launch Inspector against your server
mcp dev server.py

The command spawns your server as a child process, then opens http://localhost:5173 in your browser. You will see:

  • Tools tab β€” your summarize_pr_diff tool listed with its full input schema
  • Call panel β€” a form pre-populated from the schema; fill in diff and click Run
  • Messages tab β€” raw JSON-RPC traffic between Inspector and your server

If the tool does not appear in the Tools tab, the schema is malformed. The most common cause is a missing "type": "object" at the top level of inputSchema — covered in the Deep Dive section next.


🧠 Deep Dive: Tool Schema Design and Why It Breaks Half the Integrations

The MCP Inspector passing your test call does not guarantee every client will work. Claude Desktop, Cursor, and VS Code each have slightly different schema validation behavior. Understanding what the client reads — and what it ignores — prevents silent failures in production.

The Internals: How MCP Clients Read Your Tool Schema

When a client calls tools/list, your server returns a JSON array of tool descriptors. Each descriptor has three fields: name, description, and inputSchema. The inputSchema field is a standard JSON Schema object. Clients use it to:

  1. Generate the invocation form in the UI (Claude Desktop renders a form from the schema)
  2. Validate arguments before sending them to your server
  3. Choose the tool — the LLM reads description when deciding which tool to invoke for a user's intent

The four fields that matter most, in order of impact:

Field                      | Where it matters                          | What breaks without it
inputSchema.type: "object" | All clients                               | Tool is rejected or silently skipped by strict parsers
description (top-level)    | LLM tool selection                        | LLM cannot match user intent to tool; tool is never invoked
properties[x].description  | Claude Desktop form, LLM prompt injection | User sees blank form fields; LLM uses wrong arguments
required array             | All clients                               | Optional fields treated as required; calls fail with missing param errors

Why missing descriptions cause silent failures: When the LLM decides which tool to invoke, it reads the tool's description field as part of its context. A blank or vague description (like "summarize") competes poorly against tools with rich descriptions. The tool exists in the registry but is functionally invisible to the model. The fix is always a concrete sentence that describes input, output, and use case — exactly what was shown in Step 2.

The required array is not optional: If you omit required, some clients assume all fields are required. Others assume none are. The resulting behavior is unpredictable across clients. Always declare exactly which fields must be present, even if it is a single field.
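To make the failure mode concrete, here is a sketch of how a strict client might validate a call against your schema. validate_arguments is a hypothetical stdlib-only helper, not part of any client; it illustrates why the top-level "type": "object" and the required array both matter:

```python
# The summarize_pr_diff schema from Step 2, reduced to the relevant fields.
schema = {
    "type": "object",
    "properties": {
        "diff": {"type": "string"},
        "target_audience": {"type": "string", "default": "engineering team"},
    },
    "required": ["diff"],
}

def validate_arguments(schema: dict, arguments: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the call is valid."""
    if schema.get("type") != "object":
        # Strict parsers stop here — the tool is dropped, often silently.
        return ['schema rejected: top-level "type" must be "object"']
    errors = []
    for field in schema.get("required", []):
        if field not in arguments:
            errors.append(f"missing required parameter: {field}")
    return errors

print(validate_arguments(schema, {"diff": "@@ -1 +1 @@"}))       # → []
print(validate_arguments(schema, {"target_audience": "execs"}))
# → ['missing required parameter: diff']
```

Omitting required from the schema means this check runs against an empty list, and each client then invents its own rule for which fields are mandatory.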

Performance Analysis: Cold Starts, Schema Overhead, and SSE Connection Pooling

Understanding the performance characteristics of each transport helps you choose the right one before you hit problems under real load.

Cold start times by transport:

Transport    | Cold start latency                         | Per-call overhead       | Max concurrent clients
stdio        | 150–400 ms (Python process spawn + import) | ~0.1 ms (IPC)           | 1 per spawning client
HTTP + SSE   | 5–30 ms (HTTP connect to running process)  | ~1–5 ms (TCP + headers) | Hundreds (asyncio event loop)
Docker + SSE | 500–2000 ms (first container start)        | Same as HTTP + SSE      | Same as HTTP + SSE

Schema parsing overhead is negligible in practice: the tools/list response is typically under 2 KB even for servers with ten tools. Schema parsing completes in under 1 ms on every client tested. Optimize your tool handler, not your schema.

SSE connection pooling: HTTP+SSE maintains a persistent connection per client session. If you run behind a reverse proxy (nginx, Caddy), configure proxy_read_timeout to at least 300 seconds to prevent the proxy from closing idle SSE connections during long LLM calls. A closed SSE connection looks like a normal disconnect to the client — it will silently retry, but mid-call reconnects lose in-flight responses.

The dominant cost in any MCP server is always the tool handler itself. A gpt-4o-mini call takes 1–4 seconds. Optimizing transport overhead is like optimizing the envelope on a letter while the postal system takes three days. Focus on caching repeated LLM calls (functools.lru_cache for deterministic inputs, Redis for shared state) and using async HTTP clients everywhere.


📊 The Registration Journey: From Local Dev to Three Live Clients

The diagram below shows the complete path from a working Python function to a tool registered across all three clients. Each arrow represents a concrete step in this post.

flowchart TD
    A["✏️ Write tool function\n(Steps 1–2)"] --> B["🛡️ Add error handling\n(Step 3)"]
    B --> C["🔍 Test with MCP Inspector\n(Step 4)"]
    C --> D{Which transport?}
    D -->|"Local / single dev"| E["📝 Register via stdio\n(Step 8, 9, 10)"]
    D -->|"Shared / remote"| F["🌐 Switch to HTTP+SSE\n(Step 5)"]
    F --> G["🔑 Add bearer token auth\n(Step 6)"]
    G --> H["🐳 Package as Docker container\n(Step 7)"]
    H --> I["🚀 docker run -p 8080:8080"]
    I --> J["📝 Register via SSE URL\n(Step 8, 9, 10)"]
    E --> K["✅ Claude Desktop sees tool"]
    J --> K
    K --> L["✅ Cursor sees tool"]
    L --> M["✅ VS Code Copilot sees tool"]
    M --> N["🐛 Debug failures\n(Step 11)"]

The decision diamond at the centre is the key branch: stdio registration skips the Docker steps entirely and goes straight to client config files. SSE registration requires the running server (Steps 5–7) before the config files can point anywhere useful.


🌍 Real-World Applications: How Teams Are Sharing Skills Across IDEs Today

The platform-agnostic code review assistant. A mid-size engineering team has developers split across Cursor, Claude Desktop, and VS Code. They built a single MCP server that wraps three tools: summarize_pr_diff (this post's example), lint_findings_summary (summarizes ESLint/Flake8 output), and test_coverage_report (describes coverage gaps in plain English). The server runs as a Railway-hosted Docker container. Each developer's config file points to the same SSE URL. Whether a developer uses Cursor's inline chat or Claude Desktop's sidebar, they call the same tools against the same backend — no per-tool configuration drift, no "works on my machine" breakdowns.

The private codebase search skill. A fintech team cannot send their internal codebase to external LLM APIs for semantic search. They run a local MCP server with a search_codebase tool that queries an internal Elasticsearch index. The server uses stdio transport so it never listens on a network port — OS process isolation is the security boundary. Each developer has the stdio config entry in their Cursor workspace config, and the same tool is available in Copilot's agent mode in VS Code. The skill runs entirely on the developer's machine and never touches an external network.

The CI/CD summary bot. A DevOps team registered an MCP server in their GitHub Actions environment that calls summarize_deployment_diff to generate a plain-English deployment summary at the end of each pipeline run. The server is invoked by a Copilot-powered step in the workflow YAML. The output is posted as a PR comment. There is no human in the loop — the same MCP server that developers use interactively from their IDEs is also called headlessly in CI. One registration, two usage modes.


⚖️ Trade-offs and Failure Modes: What Breaks When You Register Across Clients

Every cross-client MCP deployment surfaces failure modes that do not appear in local testing. The table below covers the six most common, with the exact symptom you will see in each client, the underlying cause, and the fix.

Failure | Symptom | Root Cause | Fix
Tool not appearing | Tool absent from client UI after restart | Malformed inputSchema (missing "type": "object") or server crash on startup | Run mcp dev server.py in Inspector; check Tools tab
Schema mismatch | "Missing required parameter" error on every call | required array lists a field that your handler treats as optional | Align required array with handler defaults; test with Inspector
Connection refused | "Failed to connect to MCP server" in Cursor/Claude | SSE server not running when client starts, or wrong port in config | Confirm docker run is active; verify port matches config URL
Auth 401 | Tool call returns "Unauthorized" or silent empty response | Bearer token in config does not match MCP_AUTH_TOKEN env var | Re-check token in config headers vs. server env; tokens are case-sensitive
Silent schema truncation | Tool appears but description is empty in UI | description field was null or omitted in list_tools return | Add a non-empty string to description in the Tool constructor
Stale tool list after update | Old tool signature still showing after server update | Client cached the capability manifest from the previous session | Restart the client (not just reload); some clients cache tools/list aggressively

The most dangerous failure is the last one. Claude Desktop and Cursor both cache tool manifests across sessions. If you update your tool's inputSchema — add a parameter, change a description — restart the entire client application, not just the MCP connection. A running server with a new schema next to a cached old manifest causes unpredictable argument-passing behaviour that is very hard to trace.


🧭 Decision Guide: stdio for Local, SSE for Shared, Container for Team

Situation | Recommendation
Use stdio when | The tool is for your own local use, runs on the same machine as the client, and you need zero infrastructure setup. Configuration is a single JSON entry; no ports, no auth.
Use HTTP+SSE when | Multiple developers need the same tool, or the server must run on a remote host. SSE supports hundreds of concurrent clients and persistent streaming responses.
Containerize when | The server needs to be available outside business hours, deployed to a shared environment, or reproduced identically across dev/staging/prod. Docker eliminates Python version and dependency drift.
Avoid SSE without auth when | The server is exposed on any network interface beyond localhost. An unauthenticated MCP server on a shared LAN is a code-execution endpoint.
Avoid serverless (Lambda/Cloud Run) when | The tool has significant warm-up cost (model loading, connection pool establishment), or your SSE sessions last more than 15 minutes (Lambda's max execution timeout).
Use both transports in parallel when | You want local stdio for personal fast iteration and a shared SSE container for the team. The same server.py supports both — switch at startup via an environment variable.

The simplest production pattern for a small team: one Railway or Fly.io container running SSE on port 8080, bearer token authentication via environment variable, and a single shared config snippet that each developer pastes into their client config file. Total infrastructure cost: one small container at ~$5/month.
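The parallel-transport row above fits in a few lines of startup code. MCP_TRANSPORT is an assumed variable name (not part of the MCP spec), and the commented lines refer to the stdio and SSE entrypoints from Steps 2 and 5:

```python
import os

# Sketch: one server.py, transport chosen at startup via an environment variable.
# MCP_TRANSPORT is a name we made up for this pattern; pick whatever fits your repo.
def choose_transport() -> str:
    transport = os.environ.get("MCP_TRANSPORT", "stdio").lower()
    if transport not in ("stdio", "sse"):
        raise ValueError(f"unsupported MCP_TRANSPORT: {transport!r}")
    return transport

# In the entrypoint this selects between the Step 2 and Step 5 code paths:
# if choose_transport() == "sse":
#     uvicorn.run(starlette_app, host="0.0.0.0", port=8080)   # Step 5
# else:
#     asyncio.run(main())                                      # Step 2 stdio_server
```

Defaulting to stdio keeps the zero-config local path working when the variable is unset; the Docker image sets MCP_TRANSPORT=sse explicitly.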


🧪 Practical Walkthrough: Registering the PR Summarizer in Claude Desktop, Cursor, and VS Code

This section covers Steps 5–11: switching transports, adding auth, packaging in Docker, and writing the exact config file entries for each client.

Step 5 — Switch to HTTP+SSE Transport

Replace the stdio_server entrypoint with SSE transport. The tool handlers are unchanged — only the main() function changes:

# server.py — updated main() for SSE
import uvicorn
from mcp.server.sse import SseServerTransport
from starlette.applications import Starlette
from starlette.routing import Mount, Route

sse = SseServerTransport("/messages/")

async def handle_sse(request):
    async with sse.connect_sse(request.scope, request.receive, request._send) as streams:
        await server.run(*streams, server.create_initialization_options())

starlette_app = Starlette(routes=[
    Route("/sse", endpoint=handle_sse),
    # Clients POST their JSON-RPC messages back to this endpoint
    Mount("/messages/", app=sse.handle_post_message),
])

if __name__ == "__main__":
    uvicorn.run(starlette_app, host="0.0.0.0", port=8080)

Test it immediately with curl before adding auth:

python server.py &
curl -N http://localhost:8080/sse
# Should emit: data: {"type":"endpoint","uri":"/messages/?session_id=..."}

Step 6 — Add Bearer Token Authentication

Wrap the SSE route with a simple Starlette middleware that checks the Authorization header:

# server.py — auth middleware
import os
from starlette.middleware import Middleware
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse
from starlette.routing import Mount, Route

MCP_AUTH_TOKEN = os.environ.get("MCP_AUTH_TOKEN", "")

class BearerTokenMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        if not MCP_AUTH_TOKEN:
            return await call_next(request)  # empty token disables auth (dev only)
        auth = request.headers.get("Authorization", "")
        if auth != f"Bearer {MCP_AUTH_TOKEN}":
            return JSONResponse({"error": "Unauthorized"}, status_code=401)
        return await call_next(request)

starlette_app = Starlette(
    routes=[
        Route("/sse", endpoint=handle_sse),
        Mount("/messages/", app=sse.handle_post_message),
    ],
    middleware=[Middleware(BearerTokenMiddleware)]
)

Set MCP_AUTH_TOKEN in your environment before starting the server:

export MCP_AUTH_TOKEN="my-secret-token"
python server.py

Step 7 — Package as a Docker Container

Use a multi-stage build to keep the image small. The first stage installs dependencies; the second stage copies only the runtime artifacts:

# Dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app
COPY pyproject.toml .
RUN pip install --no-cache-dir "mcp>=1.0" fastmcp openai uvicorn starlette

FROM python:3.12-slim AS runtime
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
COPY server.py .
EXPOSE 8080
CMD ["python", "server.py"]

Build and run:

docker build -t pr-summarizer-mcp .
docker run -d -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -e MCP_AUTH_TOKEN=secret \
  pr-summarizer-mcp

Verify the container is responding before configuring clients:

curl -N -H "Authorization: Bearer secret" http://localhost:8080/sse

Step 8 — Register in Claude Desktop

Claude Desktop reads its server registry from a JSON file in the OS application support directory.

File location:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

For stdio (local script):

{
  "mcpServers": {
    "pr-summarizer": {
      "command": "python",
      "args": ["/path/to/mcp-pr-summarizer/server.py"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

For SSE (Docker container or remote):

{
  "mcpServers": {
    "pr-summarizer-remote": {
      "url": "http://localhost:8080/sse",
      "headers": {
        "Authorization": "Bearer secret"
      }
    }
  }
}

After saving the file, fully quit and reopen Claude Desktop (Cmd+Q on Mac, not just close the window). The tool should appear in the tool picker within the first new conversation.

Step 9 — Register in Cursor

Cursor reads MCP config from .cursor/mcp.json in your home directory (global) or in your project root (workspace-scoped):

{
  "mcpServers": {
    "pr-summarizer": {
      "command": "python",
      "args": ["server.py"],
      "cwd": "/path/to/mcp-pr-summarizer",
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

For the SSE variant, use the same url / headers format as Claude Desktop. Cursor respects both formats. Reload Cursor's window after saving (Cmd+Shift+P → "Developer: Reload Window").

Step 10 — Register in VS Code / GitHub Copilot Agent Mode

VS Code reads MCP config from .vscode/mcp.json in your workspace root. Note the slightly different schema: VS Code uses "type": "stdio" as an explicit discriminator field, and supports ${workspaceFolder} variable substitution in paths:

{
  "servers": {
    "pr-summarizer": {
      "type": "stdio",
      "command": "python",
      "args": ["server.py"],
      "cwd": "${workspaceFolder}/mcp-pr-summarizer",
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

For SSE:

{
  "servers": {
    "pr-summarizer-remote": {
      "type": "sse",
      "url": "http://localhost:8080/sse",
      "headers": {
        "Authorization": "Bearer secret"
      }
    }
  }
}

The tool becomes available in Copilot's agent mode. In the VS Code chat panel, switch to agent mode and open the tools picker — summarize_pr_diff should appear in the list of available tools.

Step 11 — Debug Common Failures

When a tool does not appear after registration, work through this checklist in order:

  1. Check server startup: Run python server.py manually in a terminal. Any import error or missing env variable will be visible immediately.
  2. Run MCP Inspector: mcp dev server.py. Confirm the tool appears in the Tools tab before touching client configs.
  3. Check the config file path: Claude Desktop will silently ignore a misplaced config file. Use the exact path for your OS.
  4. Check JSON syntax: A single misplaced comma in the config JSON will cause the entire registry to fail silently. Use a JSON linter.
  5. Restart the client fully: Not reload β€” fully quit and reopen. Claude Desktop especially caches manifests.
  6. Check SSE reachability: If using SSE, run curl -N <url> from the same machine as the client before blaming the config.

🛠️ MCP Inspector: The Debugging Tool You'll Use Every Day

The MCP Inspector (mcp dev) is the single most useful tool in the MCP development workflow. It is open-source, ships with the mcp[cli] package, and runs entirely locally — no cloud account required.

What the Inspector UI shows:

  • Tools tab: Every tool your server advertises via list_tools, with the full JSON Schema rendered as a human-readable form. If a tool is missing here, the client will never see it.
  • Call panel: A form pre-filled from the schema. Submitting it sends a real tools/call JSON-RPC request to your server and displays the raw response. This is the fastest way to confirm your error handling works correctly — send an empty diff and verify that an McpError comes back, not a stack trace.
  • Messages tab: Full JSON-RPC traffic log for the session. When a client call fails mysteriously, paste the raw request/response from this tab into your debugging notes.
  • Resources tab and Prompts tab: If your server exposes resources or prompt templates, they appear here for the same interactive testing.

Workflow tip: Keep Inspector open in one browser tab while you edit server.py. The mcp dev process auto-reloads on file save (hot-reload support was added in MCP SDK 1.2). You can iterate on schema descriptions and error messages without restarting the command.

The Inspector does not test client-specific behaviour — it uses a canonical MCP client implementation. If a tool works in Inspector but fails in Claude Desktop, the issue is almost always the client config file (wrong path, wrong JSON key, wrong transport type) rather than the server itself.


📚 Lessons Learned

After walking through eleven steps across three clients, here are the non-obvious lessons that save the most debugging time:

1. Write the docstring before the implementation. The tool description is the single most important field in your schema — not for humans, but for the LLM that routes calls to your tool. A vague description like "summarizes diffs" competes poorly against "Summarize a GitHub PR diff into a human-readable description with overview, key changes, and testing notes". Write the description first, check how it reads in the Inspector's Tools tab, then implement the handler.

2. Never let raw exceptions leave a handler. An unhandled exception in call_tool() causes some clients to mark the tool as failed and stop calling it for the session. Always wrap in McpError. This is not defensive programming — it is the MCP contract.

3. Test both transports before shipping. A tool that works perfectly over stdio may fail over SSE if it reads from stdin or relies on environment variables that the Docker container does not have. Run mcp dev server.py for stdio, then docker run for SSE, before registering in client configs.

4. Restart clients after schema changes, not just servers. Claude Desktop and Cursor cache the tools/list response. If you add a parameter to a tool and only restart the server, the client will call the old schema for the rest of the session. Always do a full client restart after any schema change.

5. The required array is your API contract. Treat it with the same discipline as a public REST API. Once a client caches your schema, removing a field from required is backwards-compatible. Adding a field to required is a breaking change that will break any client that cached the old schema.

6. Use environment variables for all secrets — never hardcode. The stdio config files (claude_desktop_config.json, .cursor/mcp.json) are checked into version control by some teams. An OPENAI_API_KEY hardcoded in the env block of a JSON config that lands in a public repo is a costly mistake. Use a .env file loaded by the server process, and document the required variables in your README.
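A minimal sketch of lesson 6, assuming a simple KEY=value .env format (python-dotenv handles quoting and edge cases far more robustly; this stdlib version just shows the idea):

```python
import os
from pathlib import Path

# Sketch: load KEY=value pairs from a .env file so secrets never live in
# client config files. Existing environment variables always win.
def load_dotenv(path: str = ".env") -> None:
    env_file = Path(path)
    if not env_file.exists():
        return
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"'))

load_dotenv()  # call before reading OPENAI_API_KEY / MCP_AUTH_TOKEN
```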


📌 TLDR: Summary and Key Takeaways

TLDR: Turn any Python function into a multi-client MCP server in 11 steps — from annotation to Docker.

  • The pattern is always the same: annotate → validate → test with Inspector → transport → auth → Docker → register. Every MCP server follows these eleven steps regardless of what the tool does.
  • Tool schema is your public API. The description and inputSchema fields are what MCP clients and LLMs read to discover and invoke your tool. Incomplete schemas cause silent failures, not loud errors.
  • stdio and SSE are two faces of the same server. The tool handlers are identical. Only main() changes. Choose the transport at deployment time, not at development time.
  • MCP Inspector (mcp dev) is the first line of defense. If a tool works in Inspector, client-specific failures are almost always config file issues β€” wrong path, wrong JSON key, wrong URL.
  • Always McpError, never bare exceptions. Unhandled exceptions in tool handlers cause clients to silently blacklist tools for the session.
  • Restart clients fully after schema changes. Claude Desktop and Cursor both cache tools/list responses across sessions.
  • One Docker container, three clients. The same SSE container registered via url + headers in claude_desktop_config.json, .cursor/mcp.json, and .vscode/mcp.json gives every developer on your team access to the same skill simultaneously.

πŸ“ Practice Quiz

Test your understanding of the MCP server deployment workflow.

  1. You run mcp dev server.py and your tool appears in the Inspector's Tools tab. You then add it to Claude Desktop's config file, fully restart the app, and the tool does not appear. What is the most likely cause?

    a) The MCP Inspector cached the tool schema
    b) The inputSchema has a top-level "type": "object" field
    c) The config file path is wrong for your OS
    d) The tool description is too long

    Correct Answer: c β€” If the config file is at the wrong path for your OS, Claude Desktop silently never reads it. The Inspector passing confirms the server is valid; the client not seeing it almost always points to a configuration file issue.

  2. Your summarize_pr_diff tool works perfectly over stdio but returns a 401 error when called over SSE from Cursor. What should you check first?

    a) Whether OPENAI_API_KEY is set in the Docker container
    b) Whether the Authorization header in .cursor/mcp.json matches MCP_AUTH_TOKEN in the server
    c) Whether the SSE port is correct
    d) Whether the tool description matches the call intent

    Correct Answer: b β€” A 401 specifically means the auth header was sent but did not match the server's expected token. Port issues produce "Connection refused", not 401.

  3. You update your tool to add a new required parameter and restart only the server (not the client). A teammate reports the tool is behaving strangely with unexpected argument errors. Why?

    a) The new parameter conflicts with a reserved MCP field name
    b) The McpError code for missing params is wrong
    c) The client cached the old tool schema and is still calling with the old argument set
    d) FastMCP does not support required parameter additions

    Correct Answer: c β€” MCP clients cache the tools/list response. A schema change on the server is not visible to the client until the client is fully restarted and performs a fresh tools/list call.

  4. You want your MCP server to be available to ten developers simultaneously, with persistent state between calls and a shared LLM call cache. Which deployment approach should you use?

    a) stdio, one instance per developer
    b) HTTP+SSE in a Docker container with Redis-backed caching
    c) stdio, with a shared Unix socket
    d) Serverless (AWS Lambda) with SSE transport

    Correct Answer: b β€” stdio spawns one process per client with no shared state. Lambda does not support persistent SSE connections for multi-minute LLM calls. HTTP+SSE in a container with Redis is the correct pattern for shared, stateful, multi-client access.

  5. (Open-ended β€” no single correct answer) You are building a review_pull_request MCP tool that calls three LLM APIs sequentially: one for diff summarization, one for security analysis, and one for test coverage review. The combined latency is 8–12 seconds. How would you design the tool's error handling and response strategy to give the client the best experience during that wait? Consider McpError codes, streaming vs. batch responses, partial results, and what happens if the second LLM call fails after the first succeeds. This is a design challenge β€” describe your approach.



Written by Abstract Algorithms (@abstractalgorithms)