johannhartmann/mcpcodeanalysis
MCP Code Analysis Server
An intelligent MCP (Model Context Protocol) server that provides advanced code analysis and search capabilities for large codebases. It uses TreeSitter for parsing, PostgreSQL with pgvector for storage, and OpenAI embeddings for semantic search.
Features
- 🔍 Semantic Code Search: Natural language queries to find relevant code
- 🏛️ Domain-Driven Analysis: Extract business entities and bounded contexts using an LLM
- 📊 Code Structure Analysis: Hierarchical understanding of modules, classes, and functions
- 🔄 Incremental Updates: Git-based change tracking for efficient re-indexing
- 🎯 Smart Code Explanations: AI-powered explanations with context aggregation
- 🔗 Dependency Analysis: Understand code relationships and dependencies
- 🌐 Knowledge Graph: Build semantic graphs with community detection (Leiden algorithm)
- 💡 DDD Refactoring: Domain-Driven Design suggestions and improvements
- 🚀 High Performance: Handles codebases with millions of lines of code
- 🐍 Python Support: Full support for Python with more languages coming
MCP Tools Available
Code Search Tools
- semantic_search: Search for code using natural language queries (with optional domain enhancement)
- find_similar_code: Find code similar to a given entity
- search_by_code_snippet: Search for code similar to a snippet
- search_by_business_capability: Find code implementing business capabilities
- keyword_search: Search for code using keywords
Code Analysis Tools
- get_code: Get code content for a specific entity
- analyze_file: Analyze file structure and metrics
- get_dependencies: Get dependencies for a code entity
- find_usages: Find where a function or class is used
Domain-Driven Design Tools
- extract_domain_model: Extract domain entities and relationships using an LLM
- find_aggregate_roots: Find aggregate roots in the codebase
- analyze_bounded_context: Analyze bounded contexts and their relationships
- suggest_ddd_refactoring: Get DDD-based refactoring suggestions
- find_bounded_contexts: Discover all bounded contexts
- generate_context_map: Generate context maps (JSON, Mermaid, PlantUML)
Advanced Analysis Tools
- analyze_coupling: Analyze coupling between bounded contexts with metrics
- detect_anti_patterns: Detect DDD anti-patterns (anemic models, god objects, etc.)
- suggest_context_splits: Suggest how to split large bounded contexts
- analyze_domain_evolution: Track domain model changes over time
- get_domain_metrics: Get comprehensive domain health metrics
Repository Management Tools
- add_repository: Add a new GitHub repository to track
- list_repositories: List all tracked repositories
- scan_repository: Scan or rescan a repository
- update_embeddings: Update embeddings for a repository
- get_repository_stats: Get detailed statistics
- delete_repository: Delete a repository and its data
Quick Start
Prerequisites
- Docker and Docker Compose
- OpenAI API key
Recommended: Docker with MCP over HTTP
The preferred way to use this tool is via Docker with MCP over HTTP, which provides a fully isolated and reproducible environment.
- Clone the repository:
git clone https://github.com/johannhartmann/mcp-code-analysis-server.git
cd mcp-code-analysis-server
- Set up environment variables:
export OPENAI_API_KEY="your-api-key-here"
# Or add to .env file
- Configure repositories:
Create a config.yaml file to specify which repositories to track:
repositories:
  - url: https://github.com/owner/repo1
    branch: main
  - url: https://github.com/owner/repo2
    branch: develop
  - url: https://github.com/owner/private-repo
    access_token: "github_pat_..." # For private repos
# Scanner configuration
scanner:
  storage_path: ./repositories
  exclude_patterns:
    - "__pycache__"
    - "*.pyc"
    - ".git"
    - "venv"
    - "node_modules"
- Start the server with Docker Compose:
docker-compose up
This will automatically:
- Start PostgreSQL with pgvector
- Initialize the database
- Start the MCP server on port 8080
- Scan configured repositories on startup
The server will be available at http://localhost:8080 and can be used with any MCP client that supports HTTP.
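Before pointing a client at the server, it can be useful to confirm that something is actually listening on the port. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is hypothetical and not part of this project. It only checks that a TCP connection succeeds, not that the MCP endpoint itself is healthy.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `is_port_open()` with the defaults checks `localhost:8080`, the address used by the Docker Compose setup above.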
Managing Multiple Repositories
After initial setup, you can dynamically add more repositories using the MCP tools:
# Add a new repository
await mcp.call_tool("add_repository", {
    "url": "https://github.com/owner/new-repo",
    "scan_immediately": True,
    "generate_embeddings": True
})
# List all tracked repositories
await mcp.call_tool("list_repositories", {})
# Update a repository
await mcp.call_tool("scan_repository", {
    "repository_id": 1,
    "full_scan": False  # Incremental scan
})
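If you call these tools from scripts, building the argument dictionaries through small typed helpers can catch typos before a request is sent. This is a sketch only: the class names are hypothetical, the field names are the keys from the examples above, and the defaults simply mirror those examples rather than the server's actual defaults. The GitHub URL check is likewise an assumption based on the tool description.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AddRepositoryArgs:
    """Arguments for add_repository, using the keys from the example above."""
    url: str
    scan_immediately: bool = True      # mirrors the example, not a documented default
    generate_embeddings: bool = True   # mirrors the example, not a documented default

    def __post_init__(self) -> None:
        # Assumption: only https GitHub URLs are meaningful here.
        if not self.url.startswith("https://github.com/"):
            raise ValueError(f"not a GitHub https URL: {self.url!r}")


@dataclass(frozen=True)
class ScanRepositoryArgs:
    """Arguments for scan_repository, using the keys from the example above."""
    repository_id: int
    full_scan: bool = False  # False requests an incremental scan


# asdict() turns either helper into the plain dict that call_tool expects.
add_args = asdict(AddRepositoryArgs(url="https://github.com/owner/new-repo"))
scan_args = asdict(ScanRepositoryArgs(repository_id=1))
```

The resulting dictionaries can be passed directly, for example `await mcp.call_tool("add_repository", add_args)`.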
Alternative: Local Development
If you need to run locally for development:
- Prerequisites:
  - Nix with flakes enabled (recommended) OR Python 3.11+
  - Docker for PostgreSQL
- Enter the development environment:
nix develop # Recommended
# OR
python -m venv venv && source venv/bin/activate
- Install dependencies:
uv sync # If using nix
# OR
pip install -e ".[dev]" # If using regular Python
- Create configuration file:
python -m src.mcp_server create-config
# Edit config.yaml with your settings
- Start PostgreSQL:
docker-compose up -d postgres
- Initialize the database:
python -m src.mcp_server init-db
- Start the MCP server:
python -m src.mcp_server serve --port 8080
Configuration
Edit config.yaml to customize:
# OpenAI API key (can also use OPENAI_API_KEY env var)
openai_api_key: "sk-..."
# Repositories to track
repositories:
  - url: https://github.com/owner/repo
    branch: main # Optional, uses default branch if not specified
  - url: https://github.com/owner/private-repo
    access_token: "github_pat_..." # For private repos
# Scanner configuration
scanner:
  storage_path: ./repositories
  exclude_patterns:
    - "__pycache__"
    - "*.pyc"
    - ".git"
    - "venv"
    - "node_modules"
# Embeddings configuration
embeddings:
  model: "text-embedding-ada-002"
  batch_size: 100
  max_tokens: 8000
# MCP server configuration
mcp:
  host: "0.0.0.0"
  port: 8080
# Database configuration
database:
  host: localhost
  port: 5432
  database: code_analysis
  user: codeanalyzer
  password: your-secure-password
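When debugging connection problems, it can help to see the database settings as a single PostgreSQL connection URL, for example to paste into `psql`. The sketch below assumes the `database` keys shown above have already been loaded into a dict; the function name `postgres_url` is hypothetical, and whether the server builds exactly this URL internally is not confirmed by this README.

```python
from urllib.parse import quote


def postgres_url(db: dict) -> str:
    """Assemble a postgresql:// URL from the `database` section of config.yaml."""
    # Percent-encode credentials so characters such as '@' or '/' stay unambiguous.
    user = quote(str(db["user"]), safe="")
    password = quote(str(db["password"]), safe="")
    return f"postgresql://{user}:{password}@{db['host']}:{db['port']}/{db['database']}"


# Values copied from the example configuration above.
db_settings = {
    "host": "localhost",
    "port": 5432,
    "database": "code_analysis",
    "user": "codeanalyzer",
    "password": "your-secure-password",
}
print(postgres_url(db_settings))
# postgresql://codeanalyzer:your-secure-password@localhost:5432/code_analysis
```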
Usage Examples
Command Line
# Add and scan a repository
python -m src.mcp_server scan https://github.com/owner/repo
# Search for code
python -m src.mcp_server search "authentication handler"
# Start the server
python -m src.mcp_server serve
Using the MCP Tools
Once the server is running, you can use the tools via any MCP client:
# Semantic search
await mcp.call_tool("semantic_search", {
    "query": "functions that handle user authentication",
    "scope": "functions",
    "limit": 10
})
# Get code content
await mcp.call_tool("get_code", {
    "entity_type": "function",
    "entity_id": 123,
    "include_context": True
})
# Add a repository
await mcp.call_tool("add_repository", {
    "url": "https://github.com/owner/repo",
    "scan_immediately": True,
    "generate_embeddings": True
})
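For orientation, MCP is built on JSON-RPC 2.0, and a client helper such as `call_tool` typically wraps a `tools/call` request. The sketch below shows roughly what such a request body looks like for the semantic search example; the helper name `tool_call_request` is hypothetical, and a real session also involves an initialization handshake and transport details that are omitted here.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so responses can be matched to them.
_request_ids = count(1)


def tool_call_request(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request body for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call_request("semantic_search", {
    "query": "functions that handle user authentication",
    "scope": "functions",
    "limit": 10,
})
print(json.dumps(request, indent=2))
```

In practice an MCP client library handles this framing for you; the sketch is only meant to show what travels over the wire.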
With Claude Desktop (HTTP)
For Claude Desktop or other MCP clients that support HTTP, configure the server URL:
{
  "mcpServers": {
    "code-analysis": {
      "url": "http://localhost:8080"
    }
  }
}
With Claude Desktop (Stdio)
If you need to use stdio mode:
{
  "mcpServers": {
    "code-analysis": {
      "command": "python",
      "args": ["-m", "src.mcp_server", "serve"],
      "cwd": "/path/to/mcp-code-analysis-server"
    }
  }
}
Then, in Claude Desktop, you can ask questions such as:
- "Search for functions that handle authentication"
- "Show me the implementation of the UserService class"
- "Find all usages of the database connection pool"
- "What files import the utils module?"
Development
Running Tests
make test-all # Run all tests with coverage
make test-unit # Run unit tests only
make test-integration # Run integration tests
Code Quality
make qa # Run all quality checks
make format # Format code
make lint # Run linters
make type-check # Type checking
Building Documentation
make docs # Build docs
make docs-serve # Serve docs locally
Architecture
The server consists of several key components:
- Scanner Module: Monitors filesystem changes using Git
- Parser Module: Extracts code structure using TreeSitter
- Embeddings Module: Generates semantic embeddings via OpenAI
- Query Module: Processes natural language queries
- MCP Server: Exposes tools via FastMCP
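To illustrate how the Embeddings Module might respect the `batch_size: 100` setting from the configuration, here is a minimal batching sketch. The helper `batched` is hypothetical and the actual embeddings request is deliberately left out; this only shows how a list of code snippets could be split so that each batch goes into one request.

```python
from typing import Iterator, Sequence


def batched(items: Sequence[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 250 placeholder snippets split with the configured batch size.
snippets = [f"def f{i}(): pass" for i in range(250)]
sizes = [len(batch) for batch in batched(snippets, batch_size=100)]
print(sizes)  # [100, 100, 50]
```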
Performance
- Initial indexing: up to 1000 files/minute
- Incremental updates: under 10 seconds for 100 changed files
- Query response time: under 2 seconds
- Supports codebases of up to 10M lines of code
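Incremental updates stay fast because only files that changed between two commits need to be re-parsed. The following sketch shows one way this could work, using `git diff --name-only` and the exclude patterns from the example configuration. All function names are hypothetical, and the matching rule (a path is skipped if any of its components matches a pattern) is an assumption, not the project's documented behavior.

```python
import fnmatch
import subprocess
from pathlib import PurePosixPath

# Patterns copied from the example config.yaml.
EXCLUDE_PATTERNS = ["__pycache__", "*.pyc", ".git", "venv", "node_modules"]


def changed_paths(repo_dir: str, old_rev: str, new_rev: str) -> list[str]:
    """List files that differ between two commits via `git diff --name-only`."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", old_rev, new_rev],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def is_excluded(path: str, patterns: list[str] = EXCLUDE_PATTERNS) -> bool:
    """Assumed rule: exclude a path if any path component matches any pattern."""
    parts = PurePosixPath(path).parts
    return any(fnmatch.fnmatch(part, pat) for part in parts for pat in patterns)


def python_files_to_rescan(paths: list[str]) -> list[str]:
    """Keep only Python source files that are not excluded."""
    return [p for p in paths if p.endswith(".py") and not is_excluded(p)]
```

A scan would then call `python_files_to_rescan(changed_paths(repo_dir, last_indexed_commit, "HEAD"))`, where `last_indexed_commit` stands for whatever revision was recorded at the previous scan.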
Contributing
We welcome contributions! Please see the contributing guidelines in the repository.
License
This project is licensed under the MIT License. See the license file in the repository for details.
Author
Johann-Peter Hartmann
GitHub: @johannhartmann
Acknowledgments
- Built with FastMCP for MCP protocol support
- Uses TreeSitter for code parsing
- Powered by LangChain and LangGraph
- Vector search via pgvector