MCP Code Analysis Server

License: MIT · Python 3.11+ · Code style: black · Ruff · pre-commit

An intelligent MCP (Model Context Protocol) server that provides advanced code analysis and search capabilities for large codebases. It uses TreeSitter for parsing, PostgreSQL with pgvector for storage, and OpenAI embeddings for semantic search.
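
Under the hood, semantic search boils down to ranking stored embedding vectors by their similarity to a query embedding. In production that ranking happens in-database via pgvector's distance operators; the pure-Python sketch below only illustrates the idea:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    k: int = 10,
) -> list[str]:
    """Return the ids of the k chunks whose embeddings best match the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

The helper names here are illustrative, not part of the server's API; the server delegates the equivalent ranking to PostgreSQL.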

Features

  • 🔍 Semantic Code Search: Natural language queries to find relevant code
  • 🏛️ Domain-Driven Analysis: Extract business entities and bounded contexts using LLM
  • 📊 Code Structure Analysis: Hierarchical understanding of modules, classes, and functions
  • 🔄 Incremental Updates: Git-based change tracking for efficient re-indexing
  • 🎯 Smart Code Explanations: AI-powered explanations with context aggregation
  • 🔗 Dependency Analysis: Understand code relationships and dependencies
  • 🌐 Knowledge Graph: Build semantic graphs with community detection (Leiden algorithm)
  • 💡 DDD Refactoring: Domain-Driven Design suggestions and improvements
  • 🚀 High Performance: Handles codebases with millions of lines of code
  • 🐍 Python Support: Full support for Python with more languages coming
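
The incremental-update idea can be approximated with plain git: diff two revisions and re-index only the files that changed and are not excluded. A simplified sketch (helper names are illustrative, not the server's actual API):

```python
import fnmatch
import subprocess

EXCLUDE_PATTERNS = ["__pycache__", "*.pyc", ".git", "venv", "node_modules"]

def is_excluded(path: str, patterns: list[str] = EXCLUDE_PATTERNS) -> bool:
    """Return True if any path component matches an exclude pattern."""
    parts = path.split("/")
    return any(fnmatch.fnmatch(part, pat) for part in parts for pat in patterns)

def changed_files(repo_dir: str, old_rev: str, new_rev: str) -> list[str]:
    """List files changed between two commits, honoring exclude patterns."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", old_rev, new_rev],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.splitlines() if p and not is_excluded(p)]
```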

MCP Tools Available

Code Search Tools

  • semantic_search - Search for code using natural language queries (with optional domain enhancement)
  • find_similar_code - Find code similar to a given entity
  • search_by_code_snippet - Search for code similar to a snippet
  • search_by_business_capability - Find code implementing business capabilities
  • keyword_search - Search for code using keywords

Code Analysis Tools

  • get_code - Get code content for a specific entity
  • analyze_file - Analyze file structure and metrics
  • get_dependencies - Get dependencies for a code entity
  • find_usages - Find where a function or class is used

Domain-Driven Design Tools

  • extract_domain_model - Extract domain entities and relationships using LLM
  • find_aggregate_roots - Find aggregate roots in the codebase
  • analyze_bounded_context - Analyze bounded contexts and their relationships
  • suggest_ddd_refactoring - Get DDD-based refactoring suggestions
  • find_bounded_contexts - Discover all bounded contexts
  • generate_context_map - Generate context maps (JSON, Mermaid, PlantUML)
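
The LLM-backed extraction tools work by prompting a model with source code and asking for structured output. A toy illustration of the prompt-assembly step only; the server's actual prompts and response schema will differ:

```python
def domain_model_prompt(code: str, path: str) -> str:
    """Assemble an extraction prompt (illustrative, not the server's prompt)."""
    return (
        "Identify the domain entities, value objects, and aggregate roots "
        f"in the following code from {path}. "
        "Answer as JSON with keys 'entities' and 'relationships'.\n\n"
        + code
    )
```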

Advanced Analysis Tools

  • analyze_coupling - Analyze coupling between bounded contexts with metrics
  • detect_anti_patterns - Detect DDD anti-patterns (anemic models, god objects, etc.)
  • suggest_context_splits - Suggest how to split large bounded contexts
  • analyze_domain_evolution - Track domain model changes over time
  • get_domain_metrics - Get comprehensive domain health metrics

Repository Management Tools

  • add_repository - Add a new GitHub repository to track
  • list_repositories - List all tracked repositories
  • scan_repository - Scan or rescan a repository
  • update_embeddings - Update embeddings for a repository
  • get_repository_stats - Get detailed statistics
  • delete_repository - Delete a repository and its data

Quick Start

Prerequisites

  • Docker and Docker Compose
  • OpenAI API key

Recommended: Docker with MCP over HTTP

The preferred way to use this tool is via Docker with MCP over HTTP, which provides a fully isolated and reproducible environment.

  1. Clone the repository:
git clone https://github.com/johannhartmann/mcp-code-analysis-server.git
cd mcp-code-analysis-server
  2. Set up environment variables:
export OPENAI_API_KEY="your-api-key-here"
# Or add to .env file
  3. Configure repositories: Create a config.yaml file to specify which repositories to track:
repositories:
  - url: https://github.com/owner/repo1
    branch: main
  - url: https://github.com/owner/repo2
    branch: develop
  - url: https://github.com/owner/private-repo
    access_token: "github_pat_..."  # For private repos

# Scanner configuration
scanner:
  storage_path: ./repositories
  exclude_patterns:
    - "__pycache__"
    - "*.pyc"
    - ".git"
    - "venv"
    - "node_modules"
  4. Start the server with Docker Compose:
docker-compose up

This will automatically:

  • Start PostgreSQL with pgvector
  • Initialize the database
  • Start the MCP server on port 8080
  • Scan configured repositories on startup

The server will be available at http://localhost:8080 and can be used with any MCP client that supports HTTP.
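
MCP over HTTP carries JSON-RPC 2.0 messages; invoking a tool such as semantic_search is a `tools/call` request. The sketch below shows the message shape only — transport headers and the exact endpoint path are client-specific:

```python
import json

def tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request body as used by MCP."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })
```

An MCP client library normally builds and sends these messages for you; constructing them by hand is only useful for debugging.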

Managing Multiple Repositories

After initial setup, you can dynamically add more repositories using the MCP tools:

# Add a new repository
await mcp.call_tool("add_repository", {
    "url": "https://github.com/owner/new-repo",
    "scan_immediately": True,
    "generate_embeddings": True
})

# List all tracked repositories
await mcp.call_tool("list_repositories", {})

# Update a repository
await mcp.call_tool("scan_repository", {
    "repository_id": 1,
    "full_scan": False  # Incremental scan
})

Alternative: Local Development

If you need to run locally for development:

  1. Prerequisites:

    • Nix with flakes enabled (recommended) OR Python 3.11+
    • Docker for PostgreSQL
  2. Enter the development environment:

nix develop  # Recommended
# OR
python -m venv venv && source venv/bin/activate
  3. Install dependencies:
uv sync  # If using nix
# OR
pip install -e ".[dev]"  # If using regular Python
  4. Create configuration file:
python -m src.mcp_server create-config
# Edit config.yaml with your settings
  5. Start PostgreSQL:
docker-compose up -d postgres
  6. Initialize the database:
python -m src.mcp_server init-db
  7. Start the MCP server:
python -m src.mcp_server serve --port 8080

Configuration

Edit config.yaml to customize:

# OpenAI API key (can also use OPENAI_API_KEY env var)
openai_api_key: "sk-..."

# Repositories to track
repositories:
  - url: https://github.com/owner/repo
    branch: main  # Optional, uses default branch if not specified
  - url: https://github.com/owner/private-repo
    access_token: "github_pat_..."  # For private repos

# Scanner configuration
scanner:
  storage_path: ./repositories
  exclude_patterns: 
    - "__pycache__"
    - "*.pyc"
    - ".git"
    - "venv"
    - "node_modules"
    
# Embeddings configuration
embeddings:
  model: "text-embedding-ada-002"
  batch_size: 100
  max_tokens: 8000
  
# MCP server configuration
mcp:
  host: "0.0.0.0"
  port: 8080

# Database configuration
database:
  host: localhost
  port: 5432
  database: code_analysis
  user: codeanalyzer
  password: your-secure-password
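
The batch_size setting above controls how many code chunks go into each embeddings request. The chunking itself is simple; the API-call sketch assumes the official openai Python client (v1+) with OPENAI_API_KEY set, and the helper names are illustrative:

```python
from typing import Iterator

def batches(items: list[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Split texts into request-sized batches, mirroring embeddings.batch_size."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def embed_all(texts: list[str], batch_size: int = 100) -> list[list[float]]:
    """Illustrative only: send each batch to the OpenAI embeddings endpoint."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    vectors: list[list[float]] = []
    for chunk in batches(texts, batch_size):
        resp = client.embeddings.create(
            model="text-embedding-ada-002", input=chunk
        )
        vectors.extend(d.embedding for d in resp.data)
    return vectors
```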

Usage Examples

Command Line

# Add and scan a repository
python -m src.mcp_server scan https://github.com/owner/repo

# Search for code
python -m src.mcp_server search "authentication handler"

# Start the server
python -m src.mcp_server serve

Using the MCP Tools

Once the server is running, you can use the tools via any MCP client:

# Semantic search
await mcp.call_tool("semantic_search", {
    "query": "functions that handle user authentication",
    "scope": "functions",
    "limit": 10
})

# Get code content
await mcp.call_tool("get_code", {
    "entity_type": "function",
    "entity_id": 123,
    "include_context": True
})

# Add a repository
await mcp.call_tool("add_repository", {
    "url": "https://github.com/owner/repo",
    "scan_immediately": True,
    "generate_embeddings": True
})

With Claude Desktop (HTTP)

For Claude Desktop or other MCP clients that support HTTP, configure the server URL:

{
  "mcpServers": {
    "code-analysis": {
      "url": "http://localhost:8080"
    }
  }
}

With Claude Desktop (Stdio)

If you need to use stdio mode:

{
  "mcpServers": {
    "code-analysis": {
      "command": "python",
      "args": ["-m", "src.mcp_server", "serve"],
      "cwd": "/path/to/mcp-code-analysis-server"
    }
  }
}

Then in Claude Desktop:

  • "Search for functions that handle authentication"
  • "Show me the implementation of the UserService class"
  • "Find all usages of the database connection pool"
  • "What files import the utils module?"

Development

Running Tests

make test-all  # Run all tests with coverage
make test-unit  # Run unit tests only
make test-integration  # Run integration tests

Code Quality

make qa  # Run all quality checks
make format  # Format code
make lint  # Run linters
make type-check  # Type checking

Building Documentation

make docs  # Build docs
make docs-serve  # Serve docs locally

Architecture

The server consists of several key components:

  • Scanner Module: Monitors filesystem changes using Git
  • Parser Module: Extracts code structure using TreeSitter
  • Embeddings Module: Generates semantic embeddings via OpenAI
  • Query Module: Processes natural language queries
  • MCP Server: Exposes tools via FastMCP

Performance

  • Initial indexing: ~1000 files/minute
  • Incremental updates: <10 seconds for 100 changed files
  • Query response time: <2 seconds
  • Supports codebases up to 10M lines of code

Contributing

We welcome contributions! Please see the contributing guidelines in the repository.

License

This project is licensed under the MIT License; see the license file in the repository for details.

Author

Johann-Peter Hartmann
GitHub: @johannhartmann

Acknowledgments