mcp-code-indexer

fluffypony/mcp-code-indexer



MCP Code Indexer is a production-ready Model Context Protocol server designed to enhance AI agents' ability to navigate and understand large codebases efficiently.


MCP Code Indexer 🚀


A production-ready Model Context Protocol (MCP) server that revolutionizes how AI agents navigate and understand codebases. Built for high-concurrency environments with advanced database resilience, the server provides instant access to intelligent descriptions, semantic search, and context-aware recommendations while maintaining 800+ writes/sec throughput.

🎯 What It Does

The MCP Code Indexer solves a critical problem for AI agents working with large codebases: understanding code structure without repeatedly scanning files. Instead of reading every file, agents can:

  • Query file purposes instantly with natural language descriptions
  • Search across codebases using full-text search
  • Get intelligent recommendations based on codebase size (overview vs search)
  • Generate condensed overviews for project understanding

Perfect for AI-powered code review, refactoring tools, documentation generation, and codebase analysis workflows.

⚡ Quick Start

👨‍💻 For Developers

Get started integrating MCP Code Indexer into your AI agent workflow:

# Install with Poetry
poetry add mcp-code-indexer

# Or with pip
pip install mcp-code-indexer

# Start the MCP server
mcp-code-indexer

# Connect your MCP client and start using tools
# See API Reference for complete tool documentation

🌐 For Web Applications

Enable HTTP/REST API access for browser-based applications:

# Start HTTP server with authentication
mcp-code-indexer --http --auth-token "your-secret-token"

# Custom host and port
mcp-code-indexer --http --host 0.0.0.0 --port 8080

# CORS configuration for web apps
mcp-code-indexer --http --cors-origins "https://localhost:3000" "https://myapp.com"


🤖 For AI-Powered Q&A

Ask questions about your codebase using natural language:

# Set OpenRouter API key for Claude access
export OPENROUTER_API_KEY="your-openrouter-api-key"

# Simple questions about project architecture
mcp-code-indexer --ask "What does this project do?" my-project

# Enhanced analysis with file search
mcp-code-indexer --deepask "How is authentication implemented?" web-app

# JSON output for programmatic use
mcp-code-indexer --ask "List the main components" my-project --json


🔧 For System Administrators

Deploy and configure the server for your team:

# Production deployment with custom settings
mcp-code-indexer \
  --token-limit 64000 \
  --db-path /data/mcp-index.db \
  --cache-dir /var/cache/mcp \
  --log-level INFO

# Check installation
mcp-code-indexer --version

🎯 For Everyone

New to MCP Code Indexer? Start here:

  1. Install: poetry add mcp-code-indexer (or pip install mcp-code-indexer)
  2. Run: mcp-code-indexer --token-limit 32000
  3. Connect: Use your favorite MCP client
  4. Explore: Try the check_codebase_size tool first

Development Setup:

# Clone and setup for contributing
git clone https://github.com/fluffypony/mcp-code-indexer.git
cd mcp-code-indexer

# Install with Poetry (recommended)
poetry install

# Or install in development mode with pip
pip install -e .

# Run the server
mcp-code-indexer --token-limit 32000

🔗 Git Hook Integration

🚀 NEW Feature: Automated code indexing with AI-powered analysis! Keep your file descriptions synchronized automatically as your codebase evolves.

👤 For Users: Quick Setup

# Set your OpenRouter API key
export OPENROUTER_API_KEY="sk-or-v1-your-api-key-here"

# Test git hook functionality
mcp-code-indexer --githook

# Install post-commit hook
cp examples/git-hooks/post-commit .git/hooks/
chmod +x .git/hooks/post-commit

👨‍💻 For Developers: How It Works

The git hook integration provides intelligent automation:

  • 📊 Git Analysis: Automatically analyzes git diffs after commits/merges
  • 🤖 AI Processing: Uses OpenRouter API with Anthropic's Claude Sonnet 4
  • ⚡ Smart Updates: Only processes files that actually changed
  • 🔄 Overview Maintenance: Updates project overview when structure changes
  • 🛡️ Error Isolation: Git operations continue even if indexing fails
  • ⏱️ Rate Limiting: Built-in retry logic with exponential backoff
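A minimal sketch of the retry pattern named in the last bullet follows. It illustrates exponential backoff with jitter under assumed defaults and is not the project's implementation; `retry_with_backoff`, its parameters, and the simulated rate-limited call are all hypothetical. The `sleep` parameter is injectable so the computed delays can be inspected without actually waiting.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying failures with exponentially growing delays.

    Hypothetical helper: retry number n (counting from 0) waits
    base_delay * 2**n seconds, capped at max_delay, plus up to 10% random
    jitter so that many clients hitting the same rate limit do not retry
    in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * (1 + random.uniform(0, 0.1)))
    raise AssertionError("unreachable")


# Usage: a simulated API call that is rate-limited twice, then succeeds
calls: List[int] = []


def flaky_api_call() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("429 Too Many Requests")
    return "ok"


delays: List[float] = []
result = retry_with_backoff(flaky_api_call, sleep=delays.append)
print(result, len(delays))  # ok 2
```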

🎯 Key Benefits

  • 💡 Zero Manual Work: Descriptions stay current without any effort
  • ⚡ Performance: Only analyzes changed files, not the entire codebase
  • 🔒 Reliability: Robust error handling ensures git operations never fail
  • 🎛️ Configurable: Supports custom models and timeout settings

Learn More: See the git hook documentation for complete configuration options and troubleshooting.

🧠 Vector Mode (BETA)

🚀 NEW Feature: Semantic code search with vector embeddings! Experience AI-powered code discovery that understands context and meaning, not just keywords.

🎯 What is Vector Mode?

Vector Mode transforms how you search and understand codebases by using AI embeddings:

  • 🔍 Semantic Search: Find code by meaning, not just text matching
  • ⚡ Real-time Indexing: Automatic embedding generation as code changes
  • 🛡️ Secure by Default: Comprehensive secret redaction before API calls
  • 🌐 Multi-language: Python, JavaScript, TypeScript with AST-based chunking
  • 📊 Smart Chunking: Context-aware code segmentation for optimal embeddings
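To illustrate what AST-based chunking means for Python sources, here is a minimal sketch using the standard-library `ast` module: it splits a file into one chunk per top-level function or class, keeping decorators and line ranges. `Chunk` and `chunk_python_source` are hypothetical names, and the indexer's real chunker is likely more context-aware (JavaScript and TypeScript would also need a different parser).

```python
from __future__ import annotations

import ast
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    name: str        # function or class name
    kind: str        # "function" or "class"
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str        # source text that would be sent for embedding


def chunk_python_source(source: str) -> List[Chunk]:
    """Split Python source into one chunk per top-level def/class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: List[Chunk] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator so the chunk keeps its full context
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno or node.lineno  # end_lineno exists on Python 3.8+
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            chunks.append(Chunk(node.name, kind, start, end, "\n".join(lines[start - 1:end])))
    return chunks


sample = """import os

def load(path):
    return open(path).read()

@register
class Config:
    debug: bool = False
"""

for chunk in chunk_python_source(sample):
    print(chunk.kind, chunk.name, chunk.start_line, chunk.end_line)
# function load 3 4
# class Config 6 8
```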

🚀 Quick Start

# Install MCP Code Indexer (includes vector mode)
pip install mcp-code-indexer

# Set required API keys
export VOYAGE_API_KEY="pa-your-voyage-api-key"
export TURBOPUFFER_API_KEY="your-turbopuffer-api-key"

# Optional: Configure region (default: gcp-europe-west3)
export TURBOPUFFER_REGION="gcp-europe-west3" 

# Start with vector mode enabled
mcp-code-indexer --vector

# The daemon automatically starts and begins indexing your projects

💡 Key Features

  • 🔐 Secret Redaction: 20+ pattern types automatically detected and redacted
  • 🌳 Merkle Trees: Efficient change detection without full directory scans
  • 🎛️ Circuit Breakers: Resilient API integration with automatic retry logic
  • 📈 Production Ready: Built for high-concurrency with comprehensive monitoring

🔧 Advanced Configuration

# Custom configuration
mcp-code-indexer --vector --vector-config /path/to/config.yaml

# HTTP mode with vector search
mcp-code-indexer --vector --http --port 8080

🛠️ Architecture

Vector Mode adds powerful new MCP tools:

  • vector_search - Semantic code search across projects
  • find_similar_code - Find code similar to a given snippet or file section
  • similarity_search - Find similar code patterns
  • dependency_search - Discover code relationships
  • vector_status - Monitor indexing progress
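Tools such as `vector_search` and `find_similar_code` ultimately rank stored chunk embeddings by their similarity to a query embedding. In Vector Mode that ranking is presumably handled by the Turbopuffer vector store configured above; the brute-force cosine-similarity version below only illustrates the underlying idea. The toy 3-dimensional vectors stand in for real embeddings, and `cosine` and `top_k_similar` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query: Sequence[float], index: Dict[str, Sequence[float]],
                  k: int = 2) -> List[Tuple[str, float]]:
    """Return the k chunk ids whose embeddings are most similar to `query`."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy embeddings; real ones come from an embedding model and have far more dimensions
index = {
    "auth.py:login": [0.9, 0.1, 0.0],
    "auth.py:logout": [0.8, 0.2, 0.1],
    "db.py:connect": [0.1, 0.9, 0.3],
}
results = top_k_similar([1.0, 0.0, 0.0], index)
print([chunk_id for chunk_id, _ in results])  # ['auth.py:login', 'auth.py:logout']
```

A linear scan like this is fine for a few thousand chunks; a vector database exists to answer the same query with approximate indexes at much larger scale.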

Status: Currently in BETA - foundations implemented, full pipeline in development.

🔧 Development Setup

👨‍💻 For Contributors

Contributing to MCP Code Indexer? Follow these steps for a proper development environment:

# Setup development environment
git clone https://github.com/fluffypony/mcp-code-indexer.git
cd mcp-code-indexer

# Install with Poetry (recommended)
poetry install

# Or use pip with virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e ".[dev]"  # quotes keep shells like zsh from expanding the brackets

# Verify installation
python main.py --help
mcp-code-indexer --version

⚠️ Important: The editable install (pip install -e .) is required for development. The project uses proper PyPI package structure with absolute imports like from mcp_code_indexer.database.database import DatabaseManager. Without editable installation, you'll get ModuleNotFoundError exceptions.

🎯 Development Workflow

# Activate virtual environment
source venv/bin/activate

# Run the server directly
python main.py --token-limit 32000

# Or use the installed CLI command
mcp-code-indexer --token-limit 32000

# Run tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=src --cov-report=html

# Format code
black src/ tests/
isort src/ tests/

# Type checking
mypy src/

🛠️ MCP Tools Available

The server provides 13 powerful MCP tools for intelligent codebase management. Whether you're an AI agent or human developer, these tools make navigating code effortless.

🎯 Essential Tools (Start Here)

| Tool | Purpose | When to Use |
|---|---|---|
| check_codebase_size | Get navigation recommendations | First tool to call for any project |
| search_descriptions | Find files by functionality | When you need specific files |
| get_codebase_overview | Project architectural summary | Understanding system design |

🔧 Core Operations

| Tool | Purpose | Best For |
|---|---|---|
| get_file_description | Retrieve file summaries | Quick file understanding |
| update_file_description | Store detailed file analysis | AI agents updating descriptions |
| find_missing_descriptions | Scan for undocumented files | Maintenance and coverage |

🔍 Advanced Features

| Tool | Purpose | Use Case |
|---|---|---|
| get_all_descriptions | Complete project structure | Small-to-medium codebases |
| get_word_frequency | Technical vocabulary analysis | Domain understanding |
| update_codebase_overview | Create project documentation | Architecture documentation |
| search_codebase_overview | Search in project overviews | Finding specific topics |
| find_similar_code | Find code similar to a snippet or section | Code pattern discovery (Vector Mode) |

🏥 System Health

| Tool | Purpose | For |
|---|---|---|
| check_database_health | Real-time performance monitoring | Production deployments |

💡 Pro Tip: Always start with check_codebase_size to get personalized recommendations for navigating your specific codebase.
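The kind of decision `check_codebase_size` makes can be sketched as comparing the estimated token count of all file descriptions with the `--token-limit` threshold (32000 by default). This is an assumption about the logic, not the tool's source: the ~4-characters-per-token estimate is a common rough heuristic, `recommend_navigation` and its `"use_overview"`/`"use_search"` values are hypothetical, and only the `isLarge` key mirrors the field read in the integration example later in this README.

```python
from typing import Dict

DEFAULT_TOKEN_LIMIT = 32000  # same value as the --token-limit default


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token, rounded up."""
    return (len(text) + 3) // 4


def recommend_navigation(descriptions: Dict[str, str],
                         token_limit: int = DEFAULT_TOKEN_LIMIT) -> dict:
    """Hypothetical check: do all file descriptions fit in the token budget?"""
    total = sum(estimate_tokens(f"{path}: {text}") for path, text in descriptions.items())
    is_large = total > token_limit
    return {
        "totalTokens": total,
        "isLarge": is_large,
        # Small projects: read everything at once; large ones: search instead
        "recommendation": "use_search" if is_large else "use_overview",
    }


small = {"src/main.py": "CLI entry point", "src/db.py": "SQLite access layer"}
print(recommend_navigation(small)["recommendation"])  # use_overview

large = {f"src/module_{i}.py": "x" * 400 for i in range(400)}
print(recommend_navigation(large)["recommendation"])  # use_search
```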

📖 Complete API Documentation: see the API Reference.

🔗 Git Hook Integration

Keep your codebase documentation automatically synchronized with automated analysis on every commit:

# Analyze current staged changes
mcp-code-indexer --githook

# Analyze a specific commit
mcp-code-indexer --githook abc123def

# Analyze using HEAD syntax
mcp-code-indexer --githook HEAD
mcp-code-indexer --githook HEAD~1
mcp-code-indexer --githook HEAD~3

# Analyze a commit range (perfect for rebases)
mcp-code-indexer --githook abc123 def456
mcp-code-indexer --githook HEAD~5 HEAD
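The invocation forms above map naturally onto git commands. The sketch below shows one way a tool could collect changed files for each form with `git diff`/`git diff-tree` and parse the `--name-status` output. `diff_command`, `parse_name_status` and `changed_files` are hypothetical names, and the indexer's own git handling may differ (for example for merge commits, for which `git diff-tree` reports no changes by default, or for unusual file names, which git quotes unless `-z` is used).

```python
import subprocess
from typing import Dict, List, Optional


def diff_command(start: Optional[str] = None, end: Optional[str] = None) -> List[str]:
    """Build a git command that lists changed files with their status letters."""
    if start is None:
        # No arguments: look at the staged changes
        return ["git", "diff", "--cached", "--name-status"]
    if end is None:
        # One commit: its changes relative to its parent (--root covers the first commit)
        return ["git", "diff-tree", "--root", "--no-commit-id", "--name-status", "-r", start]
    # Two revisions: everything that changed between them
    return ["git", "diff", "--name-status", start, end]


def parse_name_status(output: str) -> Dict[str, List[str]]:
    """Group paths by change type; other statuses (copies, type changes) are ignored here."""
    kinds = {"A": "added", "M": "modified", "D": "deleted", "R": "renamed"}
    changes: Dict[str, List[str]] = {kind: [] for kind in kinds.values()}
    for line in output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        kind = kinds.get(status[0])
        if kind == "renamed":
            changes[kind].append(f"{paths[0]} -> {paths[1]}")  # e.g. "R100\told\tnew"
        elif kind is not None:
            changes[kind].append(paths[0])
    return changes


def changed_files(start: Optional[str] = None, end: Optional[str] = None,
                  repo: str = ".") -> Dict[str, List[str]]:
    """Run git inside `repo` and return the grouped changes."""
    result = subprocess.run(diff_command(start, end), cwd=repo,
                            capture_output=True, text=True, check=True)
    return parse_name_status(result.stdout)


sample = "M\tsrc/app.py\nA\tdocs/guide.md\nR100\told.py\tnew.py\nD\tlegacy.py\n"
print(parse_name_status(sample)["renamed"])  # ['old.py -> new.py']
```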

🎯 Perfect for:

  • Automated documentation that never goes stale
  • Rebase-aware analysis that handles complex git operations
  • Zero-effort maintenance with background processing

See the git hook documentation for complete installation instructions, including post-commit, post-merge, and post-rewrite hooks.

🏗️ Architecture Highlights

🚀 Performance Optimized

  • SQLite with WAL mode for high-concurrency access (800+ writes/sec)
  • Smart connection pooling with optimized pool size (3 connections default)
  • FTS5 full-text search with prefix indexing for sub-100ms queries
  • Token-aware caching to minimize expensive operations
  • Write operation serialization to eliminate database lock conflicts
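The WAL and FTS5 bullets can be made concrete with the standard-library `sqlite3` module. This sketch enables WAL mode on an on-disk database, creates an FTS5 table with prefix indexes, and runs a ranked, parameterized prefix query. The `file_descriptions` table and its columns are hypothetical (not the indexer's real schema), and the code assumes Python's SQLite was compiled with FTS5, as most current builds are.

```python
import os
import sqlite3
import tempfile

# WAL requires an on-disk database; in-memory databases cannot use it
db_path = os.path.join(tempfile.mkdtemp(), "index.db")
conn = sqlite3.connect(db_path)

# WAL lets readers keep reading while a single writer appends to the log
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # wal

# prefix='2 3 4' builds extra indexes for 2-, 3- and 4-character prefixes, so
# prefix queries of those lengths (such as 'auth*') are served from them directly
conn.execute(
    "CREATE VIRTUAL TABLE file_descriptions "
    "USING fts5(file_path, description, prefix='2 3 4')"
)
conn.executemany(
    "INSERT INTO file_descriptions (file_path, description) VALUES (?, ?)",
    [
        ("src/auth.py", "Authentication middleware and session handling"),
        ("src/db.py", "SQLite connection pool and retry logic"),
        ("src/cli.py", "Command-line entry point"),
    ],
)
conn.commit()

# bm25() returns lower scores for better matches, so ascending order ranks best first
rows = conn.execute(
    "SELECT file_path FROM file_descriptions WHERE file_descriptions MATCH ? "
    "ORDER BY bm25(file_descriptions)",
    ("auth*",),
).fetchall()
print(rows)  # [('src/auth.py',)]
conn.close()
```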

🛡️ Production Ready

  • Database resilience features with <2% error rate under high load
  • Exponential backoff retry logic with intelligent failure recovery
  • Comprehensive health monitoring with automatic pool refresh
  • Structured JSON logging with performance metrics tracking
  • Async-first design with proper resource cleanup
  • MCP protocol compliant with clean stdio streams
  • Upstream inheritance for fork workflows
  • Git integration with .gitignore support

👨‍💻 Developer Friendly

  • 95%+ test coverage with async support and concurrent access tests
  • Integration tests for complete workflows including database stress testing
  • Performance benchmarks for large codebases with resilience validation
  • Clear error messages with MCP protocol compliance
  • Comprehensive configuration options for production tuning

📖 Documentation

Comprehensive documentation organized by user journey and expertise level.

🚀 Getting Started (New Users)

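| Guide | Purpose | Time Investment |
|---|---|---|
| Quick Start | Install and run your first server | 2 minutes |
| API Reference | Master all 13 MCP tools | 15 minutes |
| HTTP API Reference | REST API for web applications | 10 minutes |
| Q&A Interface | AI-powered codebase analysis | 8 minutes |
| Git Hooks | Automate your workflow | 5 minutes |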

🏗️ Production Deployment (Teams & Admins)

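| Guide | Focus | Best For |
|---|---|---|
| CLI Reference | Complete command documentation | All users |
| Administrative Commands | Project & database management | System administrators |
| Configuration | Production setup & tuning | System administrators |
| | High-concurrency optimization | DevOps teams |
| Monitoring | Production monitoring | Operations teams |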

🔧 Advanced Topics (Power Users)

| Guide | Depth | For |
|---|---|---|
| Architecture | System design deep dive | Developers & architects |
| | Advanced error handling | Senior developers |
| Contributing | Development workflow | Contributors |

📋 Quick References

📚 Reading Paths:

  • New to MCP Code Indexer? Quick Start → API Reference → HTTP API → Q&A Interface
  • Web developers? Quick Start → HTTP API Reference → Q&A Interface → Git Hooks
  • AI/ML engineers? Quick Start → Q&A Interface → API Reference → Git Hooks
  • Setting up for a team? CLI Reference → Configuration → Administrative Commands → Monitoring
  • Contributing to the project? Architecture → Contributing → API Reference

🚦 System Requirements

  • Python 3.8+ with asyncio support
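  • SQLite 3.35+ (used through Python's built-in sqlite3 module; check the linked version with python -c "import sqlite3; print(sqlite3.sqlite_version)")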
  • 4GB+ RAM for large codebases (1000+ files)
  • SSD storage recommended for optimal performance

📊 Performance

Tested with codebases up to 10,000 files:

  • File description retrieval: < 10ms
  • Full-text search: < 100ms
  • Codebase overview generation: < 2s
  • Merge conflict detection: < 5s

🔧 Advanced Configuration

👨‍💻 For Developers: Basic Configuration

# Production setup with custom limits
mcp-code-indexer \
  --token-limit 50000 \
  --db-path /data/mcp-index.db \
  --cache-dir /tmp/mcp-cache \
  --log-level INFO

# Enable structured logging
export MCP_LOG_FORMAT=json
mcp-code-indexer

🔧 For System Administrators: Database Resilience Tuning

Configure advanced database resilience features for high-concurrency environments:

# High-performance production deployment
mcp-code-indexer \
  --token-limit 64000 \
  --db-path /data/mcp-index.db \
  --cache-dir /var/cache/mcp \
  --log-level INFO \
  --db-pool-size 5 \
  --db-retry-count 7 \
  --db-timeout 15.0 \
  --enable-wal-mode \
  --health-check-interval 20.0

# Environment variable configuration
export DB_POOL_SIZE=5
export DB_RETRY_COUNT=7
export DB_TIMEOUT=15.0
export DB_WAL_MODE=true
export DB_HEALTH_CHECK_INTERVAL=20.0
mcp-code-indexer --token-limit 64000

Configuration Options

| Parameter | Default | Description | Use Case |
|---|---|---|---|
| --db-pool-size | 3 | Database connection pool size | Higher for more concurrent clients |
| --db-retry-count | 5 | Max retry attempts for failed operations | Increase for unstable environments |
| --db-timeout | 10.0 | Transaction timeout (seconds) | Increase for large operations |
| --enable-wal-mode | true | Enable WAL mode for concurrency | Always enable for production |
| --health-check-interval | 30.0 | Health monitoring interval (seconds) | Lower for faster issue detection |

💡 Performance Tip: For environments with 10+ concurrent clients, use --db-pool-size 5 and --health-check-interval 15.0 for optimal throughput.
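The environment variables shown earlier map one-to-one onto the flags in the table. A minimal sketch of how such settings could be resolved follows, with a command-line value taking precedence over the environment and the table's defaults as the fallback. That precedence order and the `DatabaseConfig`/`load_db_config` names are assumptions for illustration, not a description of the server's actual configuration loader.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping, Optional, Tuple


@dataclass
class DatabaseConfig:
    # Defaults taken from the configuration table
    pool_size: int = 3
    retry_count: int = 5
    timeout: float = 10.0
    wal_mode: bool = True
    health_check_interval: float = 30.0


def _flag(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Field name -> (environment variable, parser for its string value)
ENV_SOURCES: Dict[str, Tuple[str, Callable[[str], Any]]] = {
    "pool_size": ("DB_POOL_SIZE", int),
    "retry_count": ("DB_RETRY_COUNT", int),
    "timeout": ("DB_TIMEOUT", float),
    "wal_mode": ("DB_WAL_MODE", _flag),
    "health_check_interval": ("DB_HEALTH_CHECK_INTERVAL", float),
}


def load_db_config(cli: Optional[Mapping[str, Any]] = None,
                   env: Mapping[str, str] = os.environ) -> DatabaseConfig:
    """Resolve each setting: CLI value, else environment variable, else default."""
    cli = cli or {}
    config = DatabaseConfig()
    for field, (var, parse) in ENV_SOURCES.items():
        if field in cli:
            setattr(config, field, cli[field])
        elif var in env:
            setattr(config, field, parse(env[var]))
    return config


cfg = load_db_config(cli={"pool_size": 5}, env={"DB_RETRY_COUNT": "7", "DB_WAL_MODE": "false"})
print(cfg)
# DatabaseConfig(pool_size=5, retry_count=7, timeout=10.0, wal_mode=False, health_check_interval=30.0)
```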

🤝 Integration Examples

With AI Agents

# Example: AI agent using MCP tools
# Assumes `mcp_client` is an MCP client whose call_tool() coroutine returns
# the tool result parsed into a dict (adapt this to your client library).
async def analyze_codebase(mcp_client, project_path, project_name="my-project"):
    args = {"projectName": project_name, "folderPath": project_path}

    # Check whether the codebase is too large to load in one go
    size_info = await mcp_client.call_tool("check_codebase_size", args)

    if size_info["isLarge"]:
        # Use search for large codebases
        return await mcp_client.call_tool("search_descriptions", {
            **args,
            "query": "authentication logic",
        })

    # Get the full overview for smaller projects
    return await mcp_client.call_tool("get_codebase_overview", args)

With CI/CD Pipelines

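# Example: GitHub Actions integration
# Assumes the official MCP Python SDK (pip install mcp) is installed in the job
- name: Update Code Descriptions
  run: |
    python - <<'EOF'
    import asyncio
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def update_descriptions():
        # Launch the indexer as a stdio MCP server and open a session
        server = StdioServerParameters(command="mcp-code-indexer")
        async with stdio_client(server) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()

                # Find files without descriptions
                missing = await session.call_tool("find_missing_descriptions", {
                    "projectName": "${{ github.repository }}",
                    "folderPath": "${{ github.workspace }}",
                })

                # Process with AI and update...

    asyncio.run(update_descriptions())
    EOF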

🧪 Testing

# Install with test dependencies using Poetry
poetry install --with test

# Or with pip
pip install "mcp-code-indexer[test]"

# Run full test suite
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=src --cov-report=html

# Run performance tests
python -m pytest tests/ -m performance

# Run integration tests only
python -m pytest tests/integration/ -v

📈 Monitoring

The server provides structured JSON logs for monitoring:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "message": "Tool search_descriptions completed",
  "tool_usage": {
    "tool_name": "search_descriptions",
    "success": true,
    "duration_seconds": 0.045,
    "result_size": 1247
  }
}
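Because each tool call is logged as a structured record, basic monitoring needs no extra infrastructure. As a hedged example, this sketch assumes one JSON object per log line (JSON Lines) with the `tool_usage` shape shown above and summarizes calls, failures and mean duration per tool. `summarize_tool_usage` is a hypothetical helper, and the failed `search_descriptions` record in the sample data is invented for illustration.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def summarize_tool_usage(lines: Iterable[str]) -> Dict[str, dict]:
    """Aggregate `tool_usage` records from JSON log lines, keyed by tool name."""
    stats: Dict[str, dict] = defaultdict(
        lambda: {"calls": 0, "failures": 0, "total_seconds": 0.0})
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        usage = record.get("tool_usage") if isinstance(record, dict) else None
        if not usage:
            continue  # not a tool-call record
        entry = stats[usage["tool_name"]]
        entry["calls"] += 1
        entry["failures"] += 0 if usage.get("success") else 1
        entry["total_seconds"] += usage.get("duration_seconds", 0.0)
    # Derive the mean duration once all records are counted
    return {name: {**entry, "mean_seconds": entry["total_seconds"] / entry["calls"]}
            for name, entry in stats.items()}


log_lines = [
    '{"level": "INFO", "tool_usage": {"tool_name": "search_descriptions", '
    '"success": true, "duration_seconds": 0.045}}',
    '{"level": "ERROR", "tool_usage": {"tool_name": "search_descriptions", '
    '"success": false, "duration_seconds": 0.055}}',
    '{"level": "INFO", "message": "Server started"}',
]
summary = summarize_tool_usage(log_lines)
stats = summary["search_descriptions"]
print(stats["calls"], stats["failures"])  # 2 1
```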

📋 Command Line Options

Server Mode (Default)

mcp-code-indexer [OPTIONS]

Options:
  --token-limit INT     Maximum tokens before recommending search (default: 32000)
  --db-path PATH        SQLite database path (default: ~/.mcp-code-index/tracker.db)
  --cache-dir PATH      Cache directory path (default: ~/.mcp-code-index/cache)
  --log-level LEVEL     Logging level: DEBUG|INFO|WARNING|ERROR|CRITICAL (default: INFO)

Git Hook Mode

mcp-code-indexer --githook [OPTIONS]

# Automated analysis of git changes using OpenRouter API
# Requires: OPENROUTER_API_KEY environment variable

HTTP Server Mode

# Start HTTP/REST API server
mcp-code-indexer --http [OPTIONS]

# HTTP server with authentication
mcp-code-indexer --http --auth-token "your-secret-token"

# Custom host and port configuration
mcp-code-indexer --http --host 0.0.0.0 --port 8080

Q&A Commands

# Simple AI-powered questions (requires OPENROUTER_API_KEY)
mcp-code-indexer --ask "What does this project do?" PROJECT_NAME

# Enhanced analysis with file search
mcp-code-indexer --deepask "How is authentication implemented?" PROJECT_NAME

# JSON output for programmatic use
mcp-code-indexer --ask "Question" PROJECT_NAME --json

Administrative Commands

# List all projects
mcp-code-indexer --getprojects

# Execute MCP tool directly
mcp-code-indexer --runcommand '{"method": "tools/call", "params": {...}}'

# Export descriptions for a project
mcp-code-indexer --dumpdescriptions PROJECT_ID

# Create local database for a project
mcp-code-indexer --makelocal /path/to/project

# Generate project documentation map
mcp-code-indexer --map PROJECT_NAME
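When scripting `--runcommand`, building the JSON with a serializer instead of by hand keeps quoting correct. This sketch assumes the `params` object follows the MCP `tools/call` request shape (a tool `name` plus an `arguments` object); `runcommand_argv` is a hypothetical helper, and the tool and argument values are only illustrative.

```python
import json
import shlex
from typing import Any, Dict, List


def runcommand_argv(tool: str, arguments: Dict[str, Any]) -> List[str]:
    """Build an argv list for `mcp-code-indexer --runcommand` (assumed tools/call shape)."""
    payload = {"method": "tools/call", "params": {"name": tool, "arguments": arguments}}
    return ["mcp-code-indexer", "--runcommand", json.dumps(payload)]


argv = runcommand_argv("check_codebase_size",
                       {"projectName": "my-project", "folderPath": "/home/me/my-project"})
# Print a shell-safe command line; alternatively pass argv straight to subprocess.run(argv)
print(shlex.join(argv))
```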

🛡️ Security Features

  • Input validation on all MCP tool parameters
  • SQL injection protection via parameterized queries
  • File system sandboxing with .gitignore respect
  • Error sanitization to prevent information leakage
  • Async resource cleanup to prevent memory leaks
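As an illustration of the file-system sandboxing idea (not the server's actual code), a client-supplied path can be resolved and checked against the project root before any file is read, which rejects `..` traversal, absolute paths, and symlinks pointing outside the root. `resolve_inside` is a hypothetical helper; `Path.is_relative_to` would be shorter but needs Python 3.9+, so `relative_to` is used to stay compatible with Python 3.8.

```python
import tempfile
from pathlib import Path


def resolve_inside(root: str, requested: str) -> Path:
    """Resolve `requested` against `root`, refusing anything that escapes it."""
    root_path = Path(root).resolve()
    # resolve() collapses '..' segments and follows symlinks; an absolute
    # `requested` replaces root_path entirely and is then rejected below
    candidate = (root_path / requested).resolve()
    try:
        candidate.relative_to(root_path)  # raises ValueError if outside the root
    except ValueError:
        raise PermissionError(f"{requested!r} is outside the project root") from None
    return candidate


with tempfile.TemporaryDirectory() as project:
    print(resolve_inside(project, "src/app.py").name)  # app.py
    for attempt in ("../../etc/passwd", "/etc/passwd"):
        try:
            resolve_inside(project, attempt)
        except PermissionError as exc:
            print("blocked:", exc)
```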

🚨 Quick Troubleshooting

Common issues and instant solutions:

| Issue | Quick Fix |
|---|---|
| "No module named 'mcp_code_indexer'" | pip install -e . (for development) |
| "OPENROUTER_API_KEY not found" | export OPENROUTER_API_KEY="your-key" |
| "Database is locked" | Enable WAL mode: --enable-wal-mode |
| "Large codebase - use search" | Normal for 200+ files; use search_descriptions |
| HTTP authentication failed | Check the --auth-token configuration |
| Q&A commands not working | Set the OPENROUTER_API_KEY environment variable |
| High memory usage | Reduce the token limit: --token-limit 10000 |

💡 Not finding your issue? Check the troubleshooting guide in our documentation.

🚀 Next Steps

Ready to supercharge your AI agents with intelligent codebase navigation?

🎯 Choose Your Path

🆕 New to MCP Code Indexer?

  1. Install and run your first server - Get up and running in 2 minutes
  2. API Reference - Learn all the MCP tools with examples
  3. HTTP API Reference - REST API for web applications
  4. Q&A Interface - Ask questions about your code
  5. Git Hooks - Automate your workflow

👥 Setting up for a team?

  1. CLI Reference - Complete command reference
  2. Configuration - Production deployment guide
  3. Administrative Commands - Project & database management
  4. High-concurrency setup
  5. Monitoring - Production monitoring

🔧 Want to contribute?

  1. Architecture - Technical deep dive
  2. Contributing - Contribution workflow
  3. Report issues - Share feedback and suggestions

📚 Learning Resources:

🤝 Contributing

We welcome contributions! See our contributing guide for:

  • Development setup
  • Code style guidelines
  • Testing requirements
  • Pull request process

📄 License

MIT License - see the LICENSE file for details.

🙏 Built With


Transform how your AI agents understand code! 🚀

🎯 New User? Get started in 2 minutes 👨‍💻 Developer? 🔧 Production?