MCP Code Analysis Server

License: MIT · Python 3.11+ · Code style: black · Ruff · pre-commit

An intelligent MCP (Model Context Protocol) server that provides advanced code analysis and search capabilities for large codebases. It uses TreeSitter for parsing, PostgreSQL with pgvector for storage, and OpenAI embeddings for semantic search.
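
Under the hood, semantic search boils down to ranking stored embedding vectors by their similarity to a query embedding. In production that ranking happens in-database via pgvector's distance operators; the pure-Python sketch below only illustrates the idea:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    k: int = 10,
) -> list[str]:
    """Return the ids of the k chunks whose embeddings best match the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

The helper names here are illustrative, not part of the server's API; the server delegates the equivalent ranking to PostgreSQL.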

Features

  • 🔍 Semantic Code Search: Natural language queries to find relevant code
  • 🏛️ Domain-Driven Analysis: Extract business entities and bounded contexts using LLM
  • 📊 Code Structure Analysis: Hierarchical understanding of modules, classes, and functions
  • 🔄 Incremental Updates: Git-based change tracking for efficient re-indexing
  • 🎯 Smart Code Explanations: AI-powered explanations with context aggregation
  • 🔗 Dependency Analysis: Understand code relationships and dependencies
  • 🌐 Knowledge Graph: Build semantic graphs with community detection (Leiden algorithm)
  • 💡 DDD Refactoring: Domain-Driven Design suggestions and improvements
  • 🚀 High Performance: Handles codebases with millions of lines of code
  • 🐍 Python Support: Full support for Python with more languages coming
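
The incremental-update idea can be approximated with plain git: diff two revisions and re-index only the files that changed and are not excluded. A simplified sketch (helper names are illustrative, not the server's actual API):

```python
import fnmatch
import subprocess

EXCLUDE_PATTERNS = ["__pycache__", "*.pyc", ".git", "venv", "node_modules"]

def is_excluded(path: str, patterns: list[str] = EXCLUDE_PATTERNS) -> bool:
    """Return True if any path component matches an exclude pattern."""
    parts = path.split("/")
    return any(fnmatch.fnmatch(part, pat) for part in parts for pat in patterns)

def changed_files(repo_dir: str, old_rev: str, new_rev: str) -> list[str]:
    """List files changed between two commits, honoring exclude patterns."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", old_rev, new_rev],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.splitlines() if p and not is_excluded(p)]
```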

MCP Tools Available

Code Search Tools

  • semantic_search - Search for code using natural language queries (with optional domain enhancement)
  • find_similar_code - Find code similar to a given entity
  • search_by_code_snippet - Search for code similar to a snippet
  • search_by_business_capability - Find code implementing business capabilities
  • keyword_search - Search for code using keywords

Code Analysis Tools

  • get_code - Get code content for a specific entity
  • analyze_file - Analyze file structure and metrics
  • get_dependencies - Get dependencies for a code entity
  • find_usages - Find where a function or class is used

Domain-Driven Design Tools

  • extract_domain_model - Extract domain entities and relationships using LLM
  • find_aggregate_roots - Find aggregate roots in the codebase
  • analyze_bounded_context - Analyze bounded contexts and their relationships
  • suggest_ddd_refactoring - Get DDD-based refactoring suggestions
  • find_bounded_contexts - Discover all bounded contexts
  • generate_context_map - Generate context maps (JSON, Mermaid, PlantUML)
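
The LLM-backed extraction tools work by prompting a model with source code and asking for structured output. A toy illustration of the prompt-assembly step only; the server's actual prompts and response schema will differ:

```python
def domain_model_prompt(code: str, path: str) -> str:
    """Assemble an extraction prompt (illustrative, not the server's prompt)."""
    return (
        "Identify the domain entities, value objects, and aggregate roots "
        f"in the following code from {path}. "
        "Answer as JSON with keys 'entities' and 'relationships'.\n\n"
        + code
    )
```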

Advanced Analysis Tools

  • analyze_coupling - Analyze coupling between bounded contexts with metrics
  • detect_anti_patterns - Detect DDD anti-patterns (anemic models, god objects, etc.)
  • suggest_context_splits - Suggest how to split large bounded contexts
  • analyze_domain_evolution - Track domain model changes over time
  • get_domain_metrics - Get comprehensive domain health metrics

Repository Management Tools

  • add_repository - Add a new GitHub repository to track
  • list_repositories - List all tracked repositories
  • scan_repository - Scan or rescan a repository
  • update_embeddings - Update embeddings for a repository
  • get_repository_stats - Get detailed statistics
  • delete_repository - Delete a repository and its data

Quick Start

Prerequisites

  • Docker and Docker Compose
  • OpenAI API key

Recommended: Docker with MCP over HTTP

The preferred way to use this tool is via Docker with MCP over HTTP, which provides a fully isolated and reproducible environment.

  1. Clone the repository:
git clone https://github.com/johannhartmann/mcp-code-analysis-server.git
cd mcp-code-analysis-server
  2. Set up environment variables:
export OPENAI_API_KEY="your-api-key-here"
# Or add to .env file
  3. Configure repositories: Create a config.yaml file to specify which repositories to track:
repositories:
  - url: https://github.com/owner/repo1
    branch: main
  - url: https://github.com/owner/repo2
    branch: develop
  - url: https://github.com/owner/private-repo
    access_token: "github_pat_..."  # For private repos

# Scanner configuration
scanner:
  storage_path: ./repositories
  exclude_patterns:
    - "__pycache__"
    - "*.pyc"
    - ".git"
    - "venv"
    - "node_modules"
  4. Start the server with Docker Compose:
docker-compose up

This will automatically:

  • Start PostgreSQL with pgvector
  • Initialize the database
  • Start the MCP server on port 8080
  • Scan configured repositories on startup

The server will be available at http://localhost:8080 and can be used with any MCP client that supports HTTP.
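
MCP over HTTP carries JSON-RPC 2.0 messages; invoking a tool such as semantic_search is a `tools/call` request. The sketch below shows the message shape only — transport headers and the exact endpoint path are client-specific:

```python
import json

def tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request body as used by MCP."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })
```

An MCP client library normally builds and sends these messages for you; constructing them by hand is only useful for debugging.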

Managing Multiple Repositories

After initial setup, you can dynamically add more repositories using the MCP tools:

# Add a new repository
await mcp.call_tool("add_repository", {
    "url": "https://github.com/owner/new-repo",
    "scan_immediately": True,
    "generate_embeddings": True
})

# List all tracked repositories
await mcp.call_tool("list_repositories", {})

# Update a repository
await mcp.call_tool("scan_repository", {
    "repository_id": 1,
    "full_scan": False  # Incremental scan
})

Alternative: Local Development

If you need to run locally for development:

  1. Prerequisites:

    • Nix with flakes enabled (recommended) OR Python 3.11+
    • Docker for PostgreSQL
  2. Enter the development environment:

nix develop  # Recommended
# OR
python -m venv venv && source venv/bin/activate
  3. Install dependencies:
uv sync  # If using nix
# OR
pip install -e ".[dev]"  # If using regular Python
  4. Create configuration file:
python -m src.mcp_server create-config
# Edit config.yaml with your settings
  5. Start PostgreSQL:
docker-compose up -d postgres
  6. Initialize the database:
python -m src.mcp_server init-db
  7. Start the MCP server:
python -m src.mcp_server serve --port 8080

Configuration

Edit config.yaml to customize:

# OpenAI API key (can also use OPENAI_API_KEY env var)
openai_api_key: "sk-..."

# Repositories to track
repositories:
  - url: https://github.com/owner/repo
    branch: main  # Optional, uses default branch if not specified
  - url: https://github.com/owner/private-repo
    access_token: "github_pat_..."  # For private repos

# Scanner configuration
scanner:
  storage_path: ./repositories
  exclude_patterns: 
    - "__pycache__"
    - "*.pyc"
    - ".git"
    - "venv"
    - "node_modules"
    
# Embeddings configuration
embeddings:
  model: "text-embedding-ada-002"
  batch_size: 100
  max_tokens: 8000
  
# MCP server configuration
mcp:
  host: "0.0.0.0"
  port: 8080

# Database configuration
database:
  host: localhost
  port: 5432
  database: code_analysis
  user: codeanalyzer
  password: your-secure-password
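
The batch_size setting above controls how many code chunks go into each embeddings request. The chunking itself is simple; the API-call sketch assumes the official openai Python client (v1+) with OPENAI_API_KEY set, and the helper names are illustrative:

```python
from typing import Iterator

def batches(items: list[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Split texts into request-sized batches, mirroring embeddings.batch_size."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def embed_all(texts: list[str], batch_size: int = 100) -> list[list[float]]:
    """Illustrative only: send each batch to the OpenAI embeddings endpoint."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    vectors: list[list[float]] = []
    for chunk in batches(texts, batch_size):
        resp = client.embeddings.create(
            model="text-embedding-ada-002", input=chunk
        )
        vectors.extend(d.embedding for d in resp.data)
    return vectors
```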

Usage Examples

Command Line

# Add and scan a repository
python -m src.mcp_server scan https://github.com/owner/repo

# Search for code
python -m src.mcp_server search "authentication handler"

# Start the server
python -m src.mcp_server serve

Using the MCP Tools

Once the server is running, you can use the tools via any MCP client:

# Semantic search
await mcp.call_tool("semantic_search", {
    "query": "functions that handle user authentication",
    "scope": "functions",
    "limit": 10
})

# Get code content
await mcp.call_tool("get_code", {
    "entity_type": "function",
    "entity_id": 123,
    "include_context": True
})

# Add a repository
await mcp.call_tool("add_repository", {
    "url": "https://github.com/owner/repo",
    "scan_immediately": True,
    "generate_embeddings": True
})

With Claude Desktop (HTTP)

For Claude Desktop or other MCP clients that support HTTP, configure the server URL:

{
  "mcpServers": {
    "code-analysis": {
      "url": "http://localhost:8080"
    }
  }
}

With Claude Desktop (Stdio)

If you need to use stdio mode:

{
  "mcpServers": {
    "code-analysis": {
      "command": "python",
      "args": ["-m", "src.mcp_server", "serve"],
      "cwd": "/path/to/mcp-code-analysis-server"
    }
  }
}

Then in Claude Desktop:

  • "Search for functions that handle authentication"
  • "Show me the implementation of the UserService class"
  • "Find all usages of the database connection pool"
  • "What files import the utils module?"

Development

Running Tests

make test-all  # Run all tests with coverage
make test-unit  # Run unit tests only
make test-integration  # Run integration tests

Code Quality

make qa  # Run all quality checks
make format  # Format code
make lint  # Run linters
make type-check  # Type checking

Building Documentation

make docs  # Build docs
make docs-serve  # Serve docs locally

Architecture

The server consists of several key components:

  • Scanner Module: Monitors filesystem changes using Git
  • Parser Module: Extracts code structure using TreeSitter
  • Embeddings Module: Generates semantic embeddings via OpenAI
  • Query Module: Processes natural language queries
  • MCP Server: Exposes tools via FastMCP

Performance

  • Initial indexing: ~1000 files/minute
  • Incremental updates: <10 seconds for 100 changed files
  • Query response time: <2 seconds
  • Supports codebases up to 10M lines of code

Contributing

We welcome contributions! Please see the contributing guidelines in the repository.

License

This project is licensed under the MIT License; see the license file in the repository for details.

Author

Johann-Peter Hartmann
GitHub: @johannhartmann

Acknowledgments