Debugging MCP Protocol Issues

Henry edited this page Jul 20, 2025 · 1 revision

This guide provides a systematic approach to troubleshooting MCP (Model Context Protocol) issues, based on real debugging experiences with handler registration, tool execution, and protocol routing problems.

Overview

MCP protocol debugging can be challenging due to the asynchronous nature of the communication and the multiple layers involved. This guide documents proven strategies for identifying and resolving common issues.

Common Issues

1. Symptoms You Might Encounter

  • Tool execution hanging for 15+ seconds before timing out
  • "Tool not found" errors despite proper registration
  • Handler functions never being called
  • Protocol handshake succeeding but tools failing
  • TaskGroup asyncio errors

2. The Debugging Journey

Real example from June 6, 2025:

23:00 - Tool execution hanging
23:10 - Database operations suspected
23:15 - Handler registration errors found
23:20 - Protocol communication verified working
23:25 - MCP library validation attempted
23:30 - Root cause: message routing issue

Debugging Methodology

1. Systematic Elimination Approach

Create a debugging matrix of progressively more complete servers, so that each test isolates one layer:

# Imports assumed by the test servers below
import sys
from typing import Any, List

from mcp import types
from mcp.server import Server

# Test 1: Minimal MCP Server
# Purpose: Verify basic MCP functionality
class MinimalServer(Server):
    @server.list_tools()
    async def handle_list_tools(self) -> List[types.Tool]:
        return [
            types.Tool(
                name="test_tool",
                description="Simple test tool",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "message": {"type": "string"}
                    }
                }
            )
        ]

    @server.call_tool()
    async def handle_call_tool(self, name: str, arguments: dict) -> Any:
        # Log to stderr: stdout carries the protocol messages
        print(f"TOOL CALLED: {name}", file=sys.stderr, flush=True)
        if name == "test_tool":
            return {"result": f"Received: {arguments.get('message')}"}
        raise ValueError(f"Unknown tool: {name}")

# Test 2: Simplified Memory Server
# Purpose: Test without complex initialization
class SimplifiedMemoryServer(Server):
    def __init__(self):
        super().__init__("simplified-memory")
        self.storage = None  # No ChromaDB initialization

    @server.list_tools()
    async def handle_list_tools(self) -> List[types.Tool]:
        # Return tools without complex setup
        return self.get_tool_definitions()

# Test 3: Full Memory Service
# Purpose: Test complete implementation

2. Debug Logging Strategy

Add comprehensive logging at key points:

import logging
import sys
from typing import Any

from mcp.server import Server

# Configure detailed logging (stderr only, so stdout stays free for protocol messages)
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(sys.stderr)
    ]
)

class DebugMemoryServer(Server):
    async def initialize(self, params):
        print("=== SERVER INITIALIZATION STARTED ===", file=sys.stderr, flush=True)
        result = await super().initialize(params)
        print("=== SERVER INITIALIZATION COMPLETE ===", file=sys.stderr, flush=True)
        return result

    @server.call_tool()
    async def handle_call_tool(self, name: str, arguments: dict) -> Any:
        print(f"=== TOOL CALL INTERCEPTED: {name} ===", file=sys.stderr, flush=True)
        print(f"Arguments: {arguments}", file=sys.stderr, flush=True)

        try:
            result = await self._execute_tool(name, arguments)
            print("=== TOOL EXECUTION SUCCESS ===", file=sys.stderr, flush=True)
            return result
        except Exception as e:
            print(f"=== TOOL EXECUTION FAILED: {e} ===", file=sys.stderr, flush=True)
            raise
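
The entry and exit logging above can also be factored into a decorator so that each handler does not repeat the print calls. The following is a minimal sketch using only the standard library; `log_tool_call` and `echo_handler` are hypothetical names introduced for illustration and are not part of the MCP SDK.

```python
import asyncio
import functools
import logging
import sys
import time

logging.basicConfig(level=logging.DEBUG, stream=sys.stderr)
logger = logging.getLogger("mcp.debug")


def log_tool_call(handler):
    """Wrap an async (name, arguments) handler with entry/exit logging."""

    @functools.wraps(handler)
    async def wrapper(name, arguments):
        start = time.perf_counter()
        logger.debug("tool call started: %s args=%r", name, arguments)
        try:
            result = await handler(name, arguments)
        except Exception:
            # logger.exception records the traceback along with the message
            logger.exception("tool call failed: %s", name)
            raise
        elapsed = time.perf_counter() - start
        logger.debug("tool call finished: %s in %.3fs", name, elapsed)
        return result

    return wrapper


@log_tool_call
async def echo_handler(name, arguments):
    # Hypothetical handler used only to exercise the decorator
    return {"result": f"Received: {arguments.get('message')}"}


if __name__ == "__main__":
    print(asyncio.run(echo_handler("test_tool", {"message": "hi"})))
```

Because the elapsed time is logged, a handler that is slow but not hung shows up as a large duration rather than as silence.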

Handler Registration Problems

Issue: "Object of type 'ToolsCapability' has no len()"

This error occurs when code calls len() on capabilities.tools. That attribute is a ToolsCapability object describing the tools capability, not a list of registered tools.

Problem Code:

# WRONG - ToolsCapability is not a list
capabilities = server.get_capabilities()
print(f"Registered {len(capabilities.tools)} tools")  # ERROR!

Solution:

# CORRECT - Just log the capability object
capabilities = server.get_capabilities(
    notification_options=NotificationOptions(),
    experimental_capabilities={}
)
print(f"Server capabilities: {capabilities}")

Issue: Missing Required Arguments

The get_capabilities() method requires both notification_options and experimental_capabilities; calling it with no arguments raises a TypeError:

# WRONG - missing required arguments
capabilities = server.get_capabilities()

# CORRECT
from mcp.server import NotificationOptions
from mcp.server.models import InitializationOptions

capabilities = server.get_capabilities(
    notification_options=NotificationOptions(),
    experimental_capabilities={
        "hardware_info": {
            "architecture": "x86_64",
            "memory_gb": 16,
            "cpu_count": 8
        }
    }
)

Tool Execution Timeouts

Issue: Tools Hang During Initialization

Heavy initialization during server startup can block the process long enough for the client to time out.

Problem Pattern:

class MemoryServer(Server):
    def __init__(self):
        super().__init__("memory")
        # Heavy initialization causing hanging
        self.storage = ChromaMemoryStorage()  # This might download models!

Solution: Lazy Initialization

class MemoryServer(Server):
    def __init__(self):
        super().__init__("memory")
        self.storage = None
        self._storage_initialized = False

    def _ensure_storage_initialized(self):
        """Initialize storage only when needed"""
        if not self._storage_initialized:
            print("Initializing ChromaDB storage...", file=sys.stderr, flush=True)
            self.storage = ChromaMemoryStorage()
            self._storage_initialized = True

    @server.call_tool()
    async def handle_call_tool(self, name: str, arguments: dict) -> Any:
        # Initialize only when actually needed
        if name != "dashboard_check_health":  # Health check doesn't need storage
            self._ensure_storage_initialized()

        return await self._execute_tool(name, arguments)
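
One caveat with lazy initialization is that the first tool call still runs the heavy constructor, and a synchronous constructor blocks the event loop while it runs. The sketch below shows one possible way to handle that, using `asyncio.to_thread` and an `asyncio.Lock`; `SlowStorage` and `LazyStorageHolder` are hypothetical stand-ins, not the project's classes.

```python
import asyncio
import time


class SlowStorage:
    """Hypothetical stand-in for a storage backend with a slow constructor."""

    instances = 0

    def __init__(self):
        time.sleep(0.1)  # Simulates a model download or database open
        SlowStorage.instances += 1


class LazyStorageHolder:
    def __init__(self):
        self._storage = None
        self._lock = None  # Created on first use, inside the running loop

    async def get_storage(self):
        if self._storage is None:
            if self._lock is None:
                self._lock = asyncio.Lock()
            async with self._lock:
                # Re-check inside the lock: another task may have finished first
                if self._storage is None:
                    # Run the blocking constructor in a worker thread
                    self._storage = await asyncio.to_thread(SlowStorage)
        return self._storage


async def main():
    before = SlowStorage.instances
    holder = LazyStorageHolder()
    # Two concurrent first calls should still construct the storage once
    first, second = await asyncio.gather(holder.get_storage(), holder.get_storage())
    return first is second, SlowStorage.instances - before


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The lock matters only when two tool calls arrive before initialization finishes; without it, both could start the expensive constructor.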

Database Validation Hanging

Database health checks that run during startup can hang the server before it answers its first request:

# PROBLEM: This runs during server initialization
async def initialize(self, params):
    await super().initialize(params)
    await validate_database_health()  # Can hang here!

# SOLUTION: Skip validation during startup
async def initialize(self, params):
    await super().initialize(params)
    print("Skipping database validation during startup", file=sys.stderr, flush=True)
    # Validate later when actually using the database
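
"Validate later" can be sketched as a check that runs once, on first database use, with a time limit. This is an illustration under assumptions: `DeferredValidator` is a hypothetical helper, and `validate_database_health` here is a stub rather than the real check.

```python
import asyncio


async def validate_database_health():
    """Hypothetical stub; a real check would query the database."""
    await asyncio.sleep(0.01)
    return True


class DeferredValidator:
    """Runs the health check once, on first use, with a time limit."""

    def __init__(self, check, timeout_seconds=5.0):
        self._check = check
        self._timeout = timeout_seconds
        self._healthy = None  # None means "not validated yet"

    async def ensure_valid(self):
        if self._healthy is None:
            try:
                self._healthy = await asyncio.wait_for(self._check(), self._timeout)
            except asyncio.TimeoutError:
                # Treat a hung check as unhealthy instead of blocking the tool call
                self._healthy = False
        if not self._healthy:
            raise RuntimeError("database validation failed or timed out")


async def main():
    validator = DeferredValidator(validate_database_health)
    await validator.ensure_valid()  # First call runs the check
    await validator.ensure_valid()  # Later calls reuse the cached result
    return validator._healthy


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This sketch caches a failed result permanently; whether to retry after a failure is a design choice that depends on how the database is expected to recover.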

Protocol Message Routing

Issue: Tool Calls Not Reaching Handlers

Even with proper registration, tool calls might not reach your handlers.

Debugging Steps:

  1. Verify Registration:
@server.list_tools()
async def handle_list_tools(self) -> List[types.Tool]:
    tools = self._get_tool_definitions()
    print(f"Returning {len(tools)} tools", file=sys.stderr, flush=True)
    for tool in tools:
        print(f"  - {tool.name}", file=sys.stderr, flush=True)
    return tools
  2. Test Protocol Communication:
// test_protocol.js
const { spawn } = require('child_process');

async function testProtocol() {
    const server = spawn('python', ['server.py']);

    // Send initialization
    const initRequest = {
        jsonrpc: "2.0",
        id: 1,
        method: "initialize",
        params: {
            protocolVersion: "2024-11-05",
            capabilities: {},
            // clientInfo is a required field of the initialize request
            clientInfo: { name: "test-protocol", version: "0.0.1" }
        }
    };

    // Messages on the stdio transport are newline-delimited JSON
    server.stdin.write(JSON.stringify(initRequest) + '\n');

    // Listen for response
    server.stdout.on('data', (data) => {
        console.log('Server response:', data.toString());
    });
}

testProtocol();
  3. Check Message Format:
# Add raw message logging
async def handle_raw_message(self, message: dict):
    print(f"RAW MESSAGE: {json.dumps(message)}", file=sys.stderr, flush=True)
    return await super().handle_raw_message(message)
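
Once raw messages are being logged, their shape can be checked mechanically. MCP messages are JSON-RPC 2.0, so a small validator can flag malformed lines. The sketch below is a hypothetical helper (`check_jsonrpc_message` is not an SDK function) and covers only the basic envelope, not MCP-specific parameters.

```python
import json


def check_jsonrpc_message(raw: str) -> list:
    """Return a list of problems found in one JSON-RPC 2.0 message line."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(message, dict):
        return ["message is not a JSON object"]

    problems = []
    if message.get("jsonrpc") != "2.0":
        problems.append('missing or wrong "jsonrpc" field (expected "2.0")')

    has_method = "method" in message
    has_outcome = "result" in message or "error" in message
    if has_method and has_outcome:
        problems.append("message mixes request and response fields")
    elif has_method:
        if not isinstance(message["method"], str):
            problems.append('"method" must be a string')
        # A request without "id" is a notification and gets no response
    elif has_outcome:
        if "id" not in message:
            problems.append('response is missing "id"')
        if "result" in message and "error" in message:
            problems.append('response has both "result" and "error"')
    else:
        problems.append('message has neither "method" nor "result"/"error"')
    return problems


if __name__ == "__main__":
    print(check_jsonrpc_message('{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'))
    print(check_jsonrpc_message('{"id": 1, "method": "tools/call"}'))
```

An empty list means the envelope looks well formed; any entries point at the field to inspect first.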

Testing Strategies

1. Create Test Scripts for Each Layer

Layer 1: Direct Python Test

# test_direct.py
import asyncio
from server import MemoryServer

async def test_direct():
    server = MemoryServer()

    # Test tool listing
    tools = await server.handle_list_tools()
    print(f"Found {len(tools)} tools")

    # Test tool execution
    result = await server.handle_call_tool(
        "dashboard_check_health",
        {}
    )
    print(f"Result: {result}")

asyncio.run(test_direct())

Layer 2: MCP Protocol Test

# test_mcp_protocol.py
import asyncio
import json
from mcp.server.stdio import stdio_server
from server import MemoryServer

async def test_with_protocol():
    async with stdio_server() as (read_stream, write_stream):
        # Create server
        server = MemoryServer()

        # Initialize
        init_msg = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",
                "capabilities": {},
                "clientInfo": {"name": "test-mcp-protocol", "version": "0.0.1"}
            }
        }

        # Process message
        response = await server.handle_message(init_msg)
        print(f"Init response: {response}")

asyncio.run(test_with_protocol())

Layer 3: Full Integration Test

#!/bin/bash
# test_integration.sh

echo "Starting MCP server..."
python server.py &
SERVER_PID=$!
sleep 2

echo "Running client test..."
node test_client.js

echo "Killing server..."
kill "$SERVER_PID"
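
For a stdio server, a round trip can also be driven from Python by owning the child process's stdin and stdout and putting a deadline on the read. The sketch below assumes newline-delimited JSON-RPC messages; `send_request` is a hypothetical helper, and `FAKE_SERVER` is a stand-in that only echoes an empty result so the example runs without a real MCP server.

```python
import asyncio
import json
import sys

# Hypothetical stand-in server: answers every request line with an empty result.
FAKE_SERVER = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"jsonrpc": "2.0", "id": request["id"], "result": {}}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


async def send_request(command, request, timeout_seconds=5.0):
    """Start `command`, send one JSON-RPC request line, return the parsed reply."""
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )
    try:
        proc.stdin.write((json.dumps(request) + "\n").encode())
        await proc.stdin.drain()
        # A timeout here means the server never answered: the symptom under test
        line = await asyncio.wait_for(proc.stdout.readline(), timeout_seconds)
        return json.loads(line)
    finally:
        if proc.returncode is None:
            proc.kill()
        await proc.wait()


init_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "debug-client", "version": "0.0.1"},
    },
}

if __name__ == "__main__":
    command = [sys.executable, "-c", FAKE_SERVER]
    print(asyncio.run(send_request(command, init_request)))
```

To point this at a real server, replace the command with something like `[sys.executable, "server.py"]`; a timeout then distinguishes a hung server from a slow one.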

2. Progressive Complexity Testing

Start simple and add complexity:

  1. Minimal tool - Just returns a string
  2. Database read - Reads but doesn't write
  3. Full operation - Complete functionality
# Progressive test tools
tools = [
    # Level 1: No dependencies
    {
        "name": "echo_test",
        "handler": lambda args: {"echo": args.get("message")}
    },

    # Level 2: Read-only database
    {
        "name": "count_test",
        "handler": lambda args: {"count": storage.count()}
    },

    # Level 3: Full functionality
    {
        "name": "store_test",
        "handler": lambda args: storage.store(args["content"])
    }
]
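
A small runner can execute these levels in order and stop at the first failure, which identifies the lowest layer that is broken. The following is a sketch under assumptions: `run_progressive` is a hypothetical helper and `InMemoryStorage` is a stand-in for the real storage backend.

```python
class InMemoryStorage:
    """Hypothetical stand-in for the real storage backend."""

    def __init__(self):
        self.items = []

    def count(self):
        return len(self.items)

    def store(self, content):
        self.items.append(content)
        return {"stored": True}


def run_progressive(levels):
    """Run (name, callable) pairs in order; stop at the first failure."""
    passed = []
    for name, test in levels:
        try:
            test()
        except Exception as exc:
            return passed, f"{name}: {type(exc).__name__}: {exc}"
        passed.append(name)
    return passed, None


storage = InMemoryStorage()
levels = [
    ("echo_test", lambda: {"echo": "hello"}),                # Level 1
    ("count_test", lambda: {"count": storage.count()}),      # Level 2
    ("store_test", lambda: storage.store("sample memory")),  # Level 3
]

passed, failure = run_progressive(levels)
print(f"passed={passed} first_failure={failure}")
```

If level 1 passes and level 2 fails, the problem is in the database layer rather than in MCP handling, which is the distinction this section is after.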

Solutions and Patterns

1. Lazy Initialization Pattern

class LazyServer(Server):
    def __init__(self):
        super().__init__("lazy-server")
        self._resources = {}

    def _get_resource(self, name: str):
        # Create each resource on first access, then reuse it
        if name not in self._resources:
            if name == "storage":
                self._resources[name] = ChromaMemoryStorage()
            elif name == "embedder":
                self._resources[name] = EmbeddingModel()
            else:
                raise KeyError(f"Unknown resource: {name}")
        return self._resources[name]

    @property
    def storage(self):
        return self._get_resource("storage")

2. Timeout Handling

import asyncio
import sys

async def with_timeout(coro, timeout_seconds=30):
    try:
        return await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        print(f"Operation timed out after {timeout_seconds}s", file=sys.stderr)
        raise
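
A variation on this wrapper returns an error payload instead of re-raising, so the caller can report the timeout to the client. The sketch below is illustrative only: `call_with_deadline` and `slow_operation` are hypothetical names, and the payload shape is an assumption.

```python
import asyncio
import sys


async def call_with_deadline(coro, timeout_seconds, label):
    """Await `coro`; on timeout, log to stderr and return an error payload."""
    try:
        result = await asyncio.wait_for(coro, timeout=timeout_seconds)
        return {"ok": True, "result": result}
    except asyncio.TimeoutError:
        print(f"{label} timed out after {timeout_seconds}s", file=sys.stderr, flush=True)
        return {"ok": False, "error": f"{label} timed out"}


async def slow_operation(delay):
    # Hypothetical stand-in for a slow database or embedding call
    await asyncio.sleep(delay)
    return "done"


async def main():
    fast = await call_with_deadline(slow_operation(0.01), 1.0, "fast_op")
    slow = await call_with_deadline(slow_operation(1.0), 0.05, "slow_op")
    return fast, slow


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Note that `asyncio.wait_for` cancels the awaited coroutine, but it cannot interrupt blocking synchronous code running on the event loop; such code needs to be moved to a thread before a timeout can take effect.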

3. Health Check Without Dependencies

from datetime import datetime
from typing import Any

@server.call_tool()
async def handle_call_tool(self, name: str, arguments: dict) -> Any:
    # Health check bypasses all initialization
    if name == "health_check":
        return {
            "status": "healthy",
            "server_running": True,
            "timestamp": datetime.now().isoformat()
        }

    # Other tools initialize resources
    self._ensure_initialized()
    return await self._route_tool(name, arguments)

4. Debugging Checklist

When debugging MCP issues:

  • Server starts without errors
  • Initialization completes successfully
  • Tools are listed in handle_list_tools
  • Debug logs show tool interception
  • No heavy operations in __init__
  • Database operations are deferred
  • Error handling includes logging
  • Test with minimal example first
  • Check protocol version compatibility
  • Verify message format compliance

Conclusion

Debugging MCP protocol issues requires:

  1. Systematic approach - Test each layer independently
  2. Comprehensive logging - Log at every decision point
  3. Progressive testing - Start simple, add complexity
  4. Lazy initialization - Defer heavy operations
  5. Timeout awareness - Handle long operations gracefully

The key is isolating whether the issue is in your implementation, the MCP framework, or the communication protocol. By following this guide's strategies, you can efficiently identify and resolve MCP protocol issues.
