Codebase MCP Server
A production-grade MCP (Model Context Protocol) server that indexes code repositories into PostgreSQL with pgvector for semantic search, designed specifically for AI coding assistants.
What's New in v2.0
Version 2.0 represents a major architectural refactoring focused exclusively on semantic code search capabilities. This release removes project management, entity tracking, and work item features to maintain single-responsibility focus.
Breaking Changes:
- 14 tools removed (project management, entity tracking, and work item features extracted to workflow-mcp)
- 3 tools remaining: `start_indexing_background`, `get_indexing_status`, and `search_code` with multi-project support
- Foreground `index_repository` removed (all indexing now uses background jobs to prevent timeouts)
- Database schema simplified (9 tables dropped, `project_id` parameter added)
- New environment variables for optional workflow-mcp integration
Migration Required: Existing v1.x users must follow the migration guide to upgrade safely; it documents the complete upgrade and rollback procedures.
What's Preserved: All indexed repositories and code embeddings remain searchable after migration.
What's Discarded: All v1.x project management data, entities, and work items are permanently removed.
Features
The Codebase MCP Server provides exactly 3 MCP tools for semantic code search with multi-project workspace support:
- `start_indexing_background`: Start a background indexing job for a repository
  - Returns job_id immediately to prevent MCP client timeouts
  - Accepts optional `project_id` parameter for workspace isolation
  - Default behavior: indexes to the default project workspace if `project_id` is not specified
  - Performance target: 60-second indexing for 10,000 files
- `get_indexing_status`: Poll the status of a background indexing job
  - Query job progress using the job_id from `start_indexing_background`
  - Returns files_indexed, chunks_created, and completion status
  - Enables responsive UIs with progress indicators
- `search_code`: Semantic code search with natural language queries
  - Accepts optional `project_id` parameter to restrict search scope
  - Default behavior: searches the default project workspace if `project_id` is not specified
  - Performance target: 500ms p95 search latency
Multi-Project Support
The v2.0 architecture supports isolated project workspaces through the optional project_id parameter:
Single Project Workflow (default):
```python
# Start background indexing job - uses default workspace
job = await start_indexing_background(repo_path="/path/to/repo")
job_id = job["job_id"]

# Poll for completion
while True:
    status = await get_indexing_status(job_id=job_id)
    if status["status"] in ["completed", "failed"]:
        break
    await asyncio.sleep(2)

# Search without project_id - searches default workspace
results = await search_code(query="authentication logic")
```
Multi-Project Workflow:
```python
# Index to specific project workspace
job = await start_indexing_background(
    repo_path="/path/to/client-a-repo",
    project_id="client-a",
)
job_id = job["job_id"]

# Poll for completion
while True:
    status = await get_indexing_status(job_id=job_id, project_id="client-a")
    if status["status"] in ["completed", "failed"]:
        break
    await asyncio.sleep(2)

# Search specific project workspace
results = await search_code(query="authentication logic", project_id="client-a")
```
Use Cases:
- Single Project: Individual developers or small teams working on one codebase
- Multi-Project: Consultants managing multiple client codebases, organizations with separate product lines, or multi-tenant deployments requiring workspace isolation
Optional Integration: The project_id can be automatically resolved from Git repository context when the optional workflow-mcp server is configured. Without workflow-mcp, all operations default to a single shared workspace.
Quick Start
1. Database Setup
# Create database
createdb codebase_mcp
# Initialize schema
psql -d codebase_mcp -f db/init_tables.sql
2. Install Dependencies
# Install dependencies including FastMCP framework
uv sync
# Or with pip
pip install -r requirements.txt
Key Dependencies:
- `fastmcp>=0.1.0` - Modern MCP framework with decorator-based tools
- `anthropic-mcp` - MCP protocol implementation
- `sqlalchemy>=2.0` - Async ORM
- `pgvector` - PostgreSQL vector extension
- `ollama` - Embedding generation
3. Configure Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"codebase-mcp": {
"command": "uv",
"args": [
"run",
"--with",
"fastmcp",
"python",
"/absolute/path/to/codebase-mcp/server_fastmcp.py"
]
}
}
}
Important:
- Use absolute paths!
- Server uses FastMCP framework with decorator-based tool definitions
- All logs go to `/tmp/codebase-mcp.log` (no stdout/stderr pollution)
4. Start Ollama
ollama serve
ollama pull nomic-embed-text
5. Test
# Test database and tools
uv run python tests/test_tool_handlers.py
# Test repository indexing
uv run python tests/test_embeddings.py
Current Status
Working Tools (3/3) ✅
| Tool | Status | Description |
|---|---|---|
| `start_indexing_background` | ✅ Working | Start background indexing job, returns job_id immediately |
| `get_indexing_status` | ✅ Working | Poll indexing job status with files_indexed/chunks_created |
| `search_code` | ✅ Working | Semantic code search with pgvector similarity |
Recent Fixes (Oct 6, 2025)
- ✅ Parameter passing architecture (Pydantic models)
- ✅ MCP schema mismatches (status enums, missing parameters)
- ✅ Timezone/datetime compatibility (PostgreSQL)
- ✅ Binary file filtering (images, cache dirs)
Test Results
✅ Task Management: 7/7 tests passed
✅ Repository Indexing: 2 files indexed, 6 chunks created
✅ Embeddings: 100% coverage (768-dim vectors)
✅ Database: Connection pool, async operations working
Tool Usage Examples
Index a Repository (Background Job)
In Claude Desktop:
Index the repository at /Users/username/projects/myapp
Initial Response (immediate):
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "pending",
"message": "Indexing job started",
"project_id": "default",
"database_name": "cb_proj_default_00000000"
}
Poll for Status:
Check the status of indexing job 550e8400-e29b-41d4-a716-446655440000
Completed Response:
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"repo_path": "/Users/username/projects/myapp",
"files_indexed": 234,
"chunks_created": 1456,
"error_message": null,
"created_at": "2025-10-18T10:30:00Z",
"started_at": "2025-10-18T10:30:01Z",
"completed_at": "2025-10-18T10:30:15Z"
}
Search Code
Search for "authentication middleware" in Python files
Response:
{
"results": [
{
"file_path": "src/middleware/auth.py",
"content": "def authenticate_request(request):\n ...",
"start_line": 45,
"similarity_score": 0.92
}
],
"total_count": 5,
"latency_ms": 250
}
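Responses like the one above can be post-processed on the client side. A hedged sketch of thresholding results by `similarity_score` (the helper name is illustrative, not part of the server API):

```python
def filter_by_score(response: dict, min_score: float = 0.7) -> list[dict]:
    """Keep only result chunks at or above a similarity threshold."""
    return [r for r in response["results"] if r["similarity_score"] >= min_score]
```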
Architecture
Claude Desktop ↔ FastMCP Server ↔ Tool Handlers ↔ Services ↔ PostgreSQL
↓
Ollama (embeddings)
MCP Framework: Built with FastMCP - a modern, decorator-based framework for building MCP servers with:
- Type-safe tool definitions via `@mcp.tool()` decorators
- Automatic JSON Schema generation from Pydantic models
- Dual logging (file + MCP protocol) without stdout pollution
- Async/await support throughout

See the architecture documentation for detailed component diagrams.
Documentation
- System architecture and data flow
- Config-based project switching internals
- Production deployment and tuning
- Complete MCP tool documentation
- Specify workflow for AI-assisted development
Database Schema
11 tables with pgvector for semantic search:
Core Tables:
- `repositories` - Indexed repositories
- `code_files` - Source files with metadata
- `code_chunks` - Semantic chunks with embeddings (vector(768))
- `tasks` - Development tasks with git tracking
- `task_status_history` - Audit trail

See the schema documentation for complete details.
Technology Stack
- MCP Framework: FastMCP 0.1+ (decorator-based tool definitions)
- Server: Python 3.13+, FastAPI patterns, async/await
- Database: PostgreSQL 14+ with pgvector extension
- Embeddings: Ollama (nomic-embed-text, 768 dimensions)
- ORM: SQLAlchemy 2.0 (async), Pydantic V2 for validation
- Type Safety: Full mypy --strict compliance
Development
Running Tests
# Tool handlers
uv run python tests/test_tool_handlers.py
# Repository indexing
uv run python tests/test_embeddings.py
# Unit tests
uv run pytest tests/ -v
Code Structure
codebase-mcp/
├── server_fastmcp.py # FastMCP server entry point (NEW)
├── src/
│ ├── mcp/
│ │ └── tools/ # Tool handlers with service integration
│ │ ├── tasks.py # Task management
│ │ ├── indexing.py # Repository indexing
│ │ └── search.py # Semantic search
│ ├── services/ # Business logic layer
│ │ ├── tasks.py # Task CRUD + git tracking
│ │ ├── indexer.py # Indexing orchestration
│ │ ├── scanner.py # File discovery
│ │ ├── chunker.py # AST-based chunking
│ │ ├── embedder.py # Ollama integration
│ │ └── searcher.py # pgvector similarity search
│ └── models/ # Database models + Pydantic schemas
│ ├── task.py # Task, TaskCreate, TaskUpdate
│ ├── code_chunk.py # CodeChunk
│ └── ...
└── tests/
├── test_tool_handlers.py # Integration tests
└── test_embeddings.py # Embedding validation
FastMCP Server Architecture:
- `server_fastmcp.py` - Main entry point using `@mcp.tool()` decorators
- Tool handlers in `src/mcp/tools/` provide service integration
- Services in `src/services/` contain all business logic
- Dual logging: file (`/tmp/codebase-mcp.log`) + MCP protocol
Installation
Prerequisites
Before installing Codebase MCP Server v2.0, ensure the following requirements are met:
Required Software:
- PostgreSQL 14+ - Database with pgvector extension for vector similarity search
- Python 3.11+ - Runtime environment (Python 3.13 compatible)
- Ollama - Local embedding model server with nomic-embed-text model
System Requirements:
- 4GB+ RAM recommended for typical workloads
- SSD storage for optimal performance (database and embedding operations are I/O intensive)
- Network access to Ollama server (default: localhost:11434)
Installation Commands
Install Codebase MCP Server v2.0 using pip:
# Install latest v2.0 release
pip install codebase-mcp
Alternative Installation Methods:
# Install specific v2.0 version
pip install codebase-mcp==2.0.0
# Install from source (for development)
git clone https://github.com/cliffclarke/codebase-mcp.git
cd codebase-mcp
pip install -e .
Key Dependencies Installed Automatically:
- `fastmcp>=0.1.0` - Modern MCP framework
- `sqlalchemy>=2.0` - Async database ORM
- `pgvector` - PostgreSQL vector extension Python bindings
- `ollama` - Embedding generation client
- `pydantic>=2.0` - Data validation and settings
Verification Steps
After installation, verify the setup is correct:
# Verify codebase-mcp is installed
codebase-mcp --version
# Expected output: codebase-mcp 2.0.0
# Check PostgreSQL is accessible
psql --version
# Expected output: psql (PostgreSQL) 14.x or higher
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Expected output: JSON response with available models
# Confirm embedding model is available
ollama list | grep nomic-embed-text
# Expected output: nomic-embed-text model listed
Setup Complete: If all verification steps pass, Codebase MCP Server v2.0 is ready for use. Proceed to the Quick Start section for first-time indexing and search operations.
Multi-Project Configuration
The Codebase MCP server supports automatic project switching based on your working directory using .codebase-mcp/config.json files.
Quick Start
- Create a config file in your project root:
mkdir -p .codebase-mcp
cat > .codebase-mcp/config.json <<EOF
{
"version": "1.0",
"project": {
"name": "my-project",
"id": "optional-uuid-here"
},
"auto_switch": true
}
EOF
- Set your working directory (via MCP client):
```javascript
await mcpClient.callTool("set_working_directory", {
  directory: "/absolute/path/to/your/project"
});
```
- Use tools normally - they'll automatically use your project:
```javascript
// Automatically uses "my-project" workspace
const result = await mcpClient.callTool("start_indexing_background", {
  repo_path: "/path/to/repo"
});
const jobId = result.job_id;

// Poll for completion
while (true) {
  const status = await mcpClient.callTool("get_indexing_status", {
    job_id: jobId
  });
  if (status.status === "completed" || status.status === "failed") {
    break;
  }
  await sleep(2000);
}
```
Config File Format
{
"version": "1.0",
"project": {
"name": "my-project-name",
"id": "optional-project-uuid",
"database_name": "optional-database-override"
},
"auto_switch": true,
"strict_mode": false,
"dry_run": false,
"description": "Optional project description"
}
Fields:
- `version` (required): Config version (currently "1.0")
- `project.name` (required): Project identifier (used if no ID provided)
- `project.id` (optional): Explicit project UUID (takes priority over name)
- `project.database_name` (optional): Override the computed database name (see Database Name Resolution below)
- `auto_switch` (optional, default true): Enable automatic project switching
- `strict_mode` (optional, default false): Reject operations if the project mismatches
- `dry_run` (optional, default false): Log intended switches without executing them
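The required fields and documented defaults can be enforced by a small loader. A sketch under the assumption that validation works roughly this way; the server's actual loader may differ:

```python
import json
from pathlib import Path

def load_config(path: Path) -> dict:
    """Parse a .codebase-mcp/config.json and apply documented defaults."""
    cfg = json.loads(path.read_text())
    if cfg.get("version") != "1.0":
        raise ValueError(f"unsupported config version: {cfg.get('version')!r}")
    if not cfg.get("project", {}).get("name"):
        raise ValueError("project.name is required")
    # Documented defaults
    cfg.setdefault("auto_switch", True)
    cfg.setdefault("strict_mode", False)
    cfg.setdefault("dry_run", False)
    return cfg
```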
Database Name Resolution:
The server determines which database to use in this order:
1. Explicit `database_name` in config - uses the exact database name specified:
   `{"project": {"database_name": "cb_proj_my_project_550e8400"}}`
2. Computed from `name` + `id` - automatically generates the database name.
   Format: `cb_proj_{sanitized_name}_{id_prefix}`
   Example: `cb_proj_my_project_550e8400`
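The computed form can be reproduced as follows. A sketch, assuming sanitization lowercases the name and collapses non-alphanumeric runs to underscores, and that the ID prefix is the first 8 hex characters of the UUID; the server's exact rules may differ:

```python
import re

def compute_database_name(name: str, project_id: str) -> str:
    """Derive cb_proj_{sanitized_name}_{id_prefix} from project name and UUID."""
    # Assumed sanitization: lowercase, non-alphanumeric runs become underscores
    sanitized = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    # Assumed prefix: first 8 hex chars of the UUID, dashes removed
    id_prefix = project_id.replace("-", "")[:8]
    return f"cb_proj_{sanitized}_{id_prefix}"
```

For example, `compute_database_name("my-project", "550e8400-e29b-41d4-a716-446655440000")` yields the `cb_proj_my_project_550e8400` shown in the examples below.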
Use Cases for database_name Override:
- Recovering from database name mismatches
- Migrating from old database naming schemes
- Explicit control over database selection
- Debugging and troubleshooting
Example - Auto-generated (default):
{
"version": "1.0",
"project": {
"name": "my-project",
"id": "550e8400-e29b-41d4-a716-446655440000"
}
}
Database used: cb_proj_my_project_550e8400 (auto-computed)
Example - Explicit override:
{
"version": "1.0",
"project": {
"name": "my-project",
"id": "550e8400-e29b-41d4-a716-446655440000",
"database_name": "cb_proj_legacy_database_12345678"
}
}
Database used: cb_proj_legacy_database_12345678 (explicit override)
Project Resolution Priority
When you call MCP tools, the server resolves the project workspace using this 4-tier priority system:
1. Explicit `project_id` parameter (highest priority):

   await mcpClient.callTool("start_indexing_background", {
     repo_path: "/path/to/repo",
     project_id: "explicit-project-id"  // Always takes priority
   });

2. Session-based config file (via `set_working_directory`):
   - Server searches up to 20 directory levels for `.codebase-mcp/config.json`
   - Cached with mtime-based invalidation for performance
   - Isolated per MCP session (multiple clients stay independent)
3. workflow-mcp integration (external project tracking):
   - Queries the workflow-mcp server for the active project context
   - Configurable timeout and caching
4. Default workspace (fallback):
   - Uses the `project_default` schema when no other resolution succeeds
Multi-Session Isolation
The server maintains separate working directories for each MCP session (client connection):
```javascript
// Session 1 (Claude Code instance A)
await mcpClient1.callTool("set_working_directory", {
  directory: "/Users/alice/project-a"
});

// Session 2 (Claude Code instance B)
await mcpClient2.callTool("set_working_directory", {
  directory: "/Users/bob/project-b"
});

// Each session independently resolves its own project
// No cross-contamination between sessions
```
Config File Discovery
The server searches for .codebase-mcp/config.json by:
- Starting from your working directory
- Searching up to 20 parent directories
- Stopping at the first config file found
- Caching the result (with automatic invalidation on file modification)
Example directory structure:
/Users/alice/projects/my-app/ <- .codebase-mcp/config.json here
├── .codebase-mcp/
│ └── config.json
├── src/
│ └── components/ <- Working directory
│ └── Button.tsx
If you set working directory to /Users/alice/projects/my-app/src/components/, the server will find the config at /Users/alice/projects/my-app/.codebase-mcp/config.json.
Performance
- Config discovery: <50ms (with upward traversal)
- Cache hit: <5ms
- Session lookup: <1ms
- Background cleanup: Hourly (removes sessions inactive >24h)
Database Setup
1. Create Database
# Connect to PostgreSQL
psql -U postgres
# Create database
CREATE DATABASE codebase_mcp;
# Enable pgvector extension
\c codebase_mcp
CREATE EXTENSION IF NOT EXISTS vector;
\q
2. Initialize Schema
# Run database initialization script
python scripts/init_db.py
# Verify schema creation
alembic current
The initialization script will:
- Create all required tables (repositories, files, chunks, tasks)
- Set up vector indexes for similarity search
- Configure connection pooling
- Apply all database migrations
3. Verify Setup
# Check database connectivity
python -c "from src.database import Database; import asyncio; asyncio.run(Database.create_pool())"
# Run migration status check
alembic current
4. Database Reset & Cleanup
During development, you may need to reset your database using the following reset options:
- scripts/clear_data.sh - Clear all data, keep schema (fastest, no restart needed)
- scripts/reset_database.sh - Drop and recreate all tables (recommended for schema changes)
- scripts/nuclear_reset.sh - Drop entire database (requires Claude Desktop restart)
# Quick data wipe (keeps schema)
./scripts/clear_data.sh
# Full table reset (recommended)
./scripts/reset_database.sh
# Nuclear option (drops database)
./scripts/nuclear_reset.sh
Running the Server
FastMCP Server (Recommended)
The primary way to run the server is via Claude Desktop or other MCP clients:
# Via Claude Desktop (configured in claude_desktop_config.json)
# Server starts automatically when Claude Desktop launches
# Manual testing with FastMCP CLI
uv run --with fastmcp python server_fastmcp.py
# With custom log level
LOG_LEVEL=DEBUG uv run --with fastmcp python server_fastmcp.py
Server Entry Point: server_fastmcp.py in repository root
Logging: All output goes to /tmp/codebase-mcp.log (configurable via LOG_FILE env var)
Development Mode (Legacy FastAPI)
# Start with auto-reload (if FastAPI server exists)
uvicorn src.main:app --reload --host 127.0.0.1 --port 3000
# With custom log level
LOG_LEVEL=DEBUG uvicorn src.main:app --reload
Production Mode (Legacy)
# Start production server
uvicorn src.main:app --host 0.0.0.0 --port 3000 --workers 4
# With gunicorn (recommended for production)
gunicorn src.main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:3000
stdio Transport (Legacy CLI Mode)
The legacy MCP server supports stdio transport for CLI clients via JSON-RPC 2.0 over stdin/stdout.
# Start stdio server (reads JSON-RPC from stdin)
python -m src.mcp.stdio_server
# Echo a single request
echo '{"jsonrpc":"2.0","id":1,"method":"list_tasks","params":{"limit":5}}' | python -m src.mcp.stdio_server
# Pipe requests from a file (one JSON-RPC request per line)
cat requests.jsonl | python -m src.mcp.stdio_server
# Interactive mode (type JSON-RPC requests manually)
python -m src.mcp.stdio_server
{"jsonrpc":"2.0","id":1,"method":"get_task","params":{"task_id":"..."}}
JSON-RPC 2.0 Request Format:
{
"jsonrpc": "2.0",
"id": 1,
"method": "search_code",
"params": {
"query": "async def",
"limit": 10
}
}
JSON-RPC 2.0 Response Format:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"results": [...],
"total_count": 42,
"latency_ms": 250
}
}
Available Methods:
- `search_code` - Semantic code search
- `start_indexing_background` - Start background indexing job
- `get_indexing_status` - Poll indexing job status
Logging:
All logs go to /tmp/codebase-mcp.log (configurable via LOG_FILE env var). No stdout/stderr pollution - only JSON-RPC protocol messages on stdout.
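Requests to the stdio server are newline-delimited JSON-RPC 2.0 objects, so they are easy to build programmatically. A small illustrative helper (not part of the server):

```python
import json

def make_request(method: str, params: dict, req_id: int = 1) -> str:
    """Build one newline-delimited JSON-RPC 2.0 request line."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params}) + "\n"
```

The returned line can be piped into `python -m src.mcp.stdio_server` on stdin, matching the echo examples above.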
Health Check
# Check server health
curl http://localhost:3000/health
# Expected response:
{
"status": "healthy",
"database": "connected",
"ollama": "connected",
"version": "0.1.0"
}
Usage Examples
1. Index a Repository (Background Job)
# Start indexing job via MCP protocol
{
"tool": "start_indexing_background",
"arguments": {
"repo_path": "/path/to/your/repo"
}
}
# Immediate response
{
"job_id": "uuid-here",
"status": "pending",
"message": "Indexing job started",
"project_id": "default",
"database_name": "cb_proj_default_00000000"
}
# Poll for status
{
"tool": "get_indexing_status",
"arguments": {
"job_id": "uuid-here"
}
}
# Completed response
{
"job_id": "uuid-here",
"status": "completed",
"repo_path": "/path/to/your/repo",
"files_indexed": 150,
"chunks_created": 1200,
"error_message": null,
"created_at": "2025-10-18T10:30:00Z",
"started_at": "2025-10-18T10:30:01Z",
"completed_at": "2025-10-18T10:30:45Z"
}
2. Search Code
# Search for authentication logic
{
"tool": "search_code",
"arguments": {
"query": "user authentication password validation",
"limit": 10,
"file_type": "py"
}
}
# Response includes ranked code chunks with context
{
"results": [...],
"total_count": 25,
"latency_ms": 230
}
Architecture
┌─────────────────────────────────────────────────┐
│ MCP Client (AI) │
└─────────────────┬───────────────────────────────┘
│ SSE Protocol
┌─────────────────▼───────────────────────────────┐
│ MCP Server Layer │
│ ┌──────────────────────────────────────────┐ │
│ │ Tool Registration & Routing │ │
│ └──────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────┐ │
│ │ Request/Response Handling │ │
│ └──────────────────────────────────────────┘ │
└─────────────────┬───────────────────────────────┘
│
┌─────────────────▼───────────────────────────────┐
│ Service Layer │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Indexer │ │ Searcher │ │Task Manager│ │
│ └──────┬─────┘ └──────┬─────┘ └──────┬─────┘ │
│ │ │ │ │
│ ┌──────▼──────────────▼──────────────▼──────┐ │
│ │ Repository Service │ │
│ └──────┬─────────────────────────────────────┘ │
│ │ │
│ ┌──────▼─────────────────────────────────────┐ │
│ │ Embedding Service (Ollama) │ │
│ └─────────────────────────────────────────────┘│
└─────────────────┬───────────────────────────────┘
│
┌─────────────────▼───────────────────────────────┐
│ Data Layer │
│ ┌──────────────────────────────────────────┐ │
│ │ PostgreSQL with pgvector │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │Repository│ │ Files │ │ Chunks │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ ┌──────────┐ ┌──────────────────────┐ │ │
│ │ │ Tasks │ │ Vector Embeddings │ │ │
│ │ └──────────┘ └──────────────────────┘ │ │
│ └──────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
Component Overview
- MCP Layer: Handles protocol compliance, tool registration, SSE transport
- Service Layer: Business logic for indexing, searching, task management
- Repository Service: File system operations, git integration, .gitignore handling
- Embedding Service: Ollama integration for generating text embeddings
- Data Layer: PostgreSQL with pgvector for storage and similarity search
Data Flow
- Indexing: Repository → Parse → Chunk → Embed → Store
- Searching: Query → Embed → Vector Search → Rank → Return
- Task Tracking: Create → Update → Git Integration → Query
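The search path (Query → Embed → Vector Search → Rank) ultimately reduces to cosine similarity over embedding vectors. In production this runs inside pgvector, but a toy in-memory version illustrates the ranking step:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_chunks(query_vec: list[float],
                chunks: list[tuple[str, list[float]]]) -> list[tuple[str, float]]:
    """Score (file_path, embedding) pairs against the query and sort best-first."""
    scored = [(path, cosine_similarity(query_vec, emb)) for path, emb in chunks]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```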
Testing
Run All Tests
# Run all tests with coverage
pytest tests/ -v --cov=src --cov-report=term-missing
# Run specific test categories
pytest tests/unit/ -v # Unit tests only
pytest tests/integration/ -v # Integration tests
pytest tests/contract/ -v # Contract tests
Test Categories
- Unit Tests: Fast, isolated component tests
- Integration Tests: Database and service integration
- Contract Tests: MCP protocol compliance validation
- Performance Tests: Latency and throughput benchmarks
Coverage Requirements
- Minimum coverage: 95%
- Critical paths: 100%
- View HTML report:
open htmlcov/index.html
Performance Tuning
Database Optimization
-- Optimize vector searches
CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Adjust work_mem for large result sets
ALTER SYSTEM SET work_mem = '256MB';
SELECT pg_reload_conf();
Connection Pool Settings
# In .env
DATABASE_POOL_SIZE=20 # Connection pool size
DATABASE_MAX_OVERFLOW=10 # Max overflow connections
DATABASE_POOL_TIMEOUT=30 # Connection timeout in seconds
Embedding Batch Size
# Adjust based on available memory
EMBEDDING_BATCH_SIZE=100 # For systems with 8GB+ RAM
EMBEDDING_BATCH_SIZE=50 # Default for 4GB RAM
EMBEDDING_BATCH_SIZE=25 # For constrained environments
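`EMBEDDING_BATCH_SIZE` controls how many chunks are sent to the embedding model per request; the splitting itself is simple slicing. An illustrative sketch (the function name is not part of the codebase):

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield successive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```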
Troubleshooting
Common Issues
1. Database Connection Failed
   - Check PostgreSQL is running: `pg_ctl status`
   - Verify DATABASE_URL in .env
   - Ensure the database exists: `psql -U postgres -l`
2. Ollama Connection Error
   - Check Ollama is running: `curl http://localhost:11434/api/tags`
   - Verify the model is installed: `ollama list`
   - Check OLLAMA_BASE_URL in .env
3. Slow Performance
   - Check database indexes: `\di` in psql
   - Monitor query performance: see logs at the LOG_FILE path
   - Adjust batch sizes and the connection pool
For detailed troubleshooting, see the Configuration Guide troubleshooting section.
Contributing
We follow a specification-driven development workflow using the Specify framework.
Development Workflow
1. Feature Specification: Use the `/specify` command to create feature specs
2. Planning: Generate the implementation plan with `/plan`
3. Task Breakdown: Create tasks with `/tasks`
4. Implementation: Execute tasks with `/implement`
Git Workflow
# Create feature branch
git checkout -b 001-feature-name
# Make atomic commits
git add .
git commit -m "feat(component): add specific feature"
# Push and create PR
git push origin 001-feature-name
Code Quality Standards
- Type Safety: `mypy --strict` must pass
- Linting: `ruff check` with no errors
- Testing: All tests must pass with 95%+ coverage
- Documentation: Update relevant docs with changes
Constitutional Principles
- Simplicity Over Features: Focus on core semantic search
- Local-First Architecture: No cloud dependencies
- Protocol Compliance: Strict MCP adherence
- Performance Guarantees: Meet stated benchmarks
- Production Quality: Comprehensive error handling
See the project constitution for the full principles.
FastMCP Migration (Oct 2025)
Migration Complete: The server has been successfully migrated from the legacy MCP SDK to the modern FastMCP framework.
What Changed
Before (MCP SDK):
```python
# Old: Manual tool registration with JSON schemas
class MCPServer:
    def __init__(self):
        self.tools = {
            "search_code": {
                "name": "search_code",
                "description": "...",
                "inputSchema": {...}
            }
        }
```
After (FastMCP):
```python
# New: Decorator-based tool definitions
@mcp.tool()
async def search_code(query: str, limit: int = 10) -> dict[str, Any]:
    """Semantic code search with natural language queries."""
    # Implementation
    ...
```
Key Benefits
- Simpler Tool Definitions: Decorators replace manual JSON schema creation
- Type Safety: Automatic schema generation from Pydantic models
- Dual Logging: File logging + MCP protocol without stdout pollution
- Better Error Handling: Structured error responses with context
- Cleaner Architecture: Separation of tool interface from business logic
Server Files
- New Entry Point: `server_fastmcp.py` (root directory)
- Legacy Server: `src/mcp/mcp_stdio_server_v3.py` (deprecated, will be removed)
- Tool Handlers: `src/mcp/tools/*.py` (unchanged, reused by FastMCP)
- Services: `src/services/*.py` (unchanged, business logic intact)
Configuration Update Required
Update your Claude Desktop config to use the new server:
{
"mcpServers": {
"codebase-mcp": {
"command": "uv",
"args": ["run", "--with", "fastmcp", "python", "/path/to/server_fastmcp.py"]
}
}
}
Migration Notes
- All 6 MCP tools remain functional (100% backward compatible)
- No database schema changes required
- Tool signatures and responses unchanged
- Logging now goes exclusively to `/tmp/codebase-mcp.log`
- All tests pass with the FastMCP implementation
Performance
FastMCP maintains performance targets:
- Repository indexing: <60 seconds for 10K files
- Code search: <500ms p95 latency
- Async/await throughout for optimal concurrency
License
MIT License (LICENSE file pending).
Support
- Issues: GitHub Issues
- Documentation:
- Logs: Check `/tmp/codebase-mcp.log` for detailed debugging
Quick Start
Basic Usage (Default Project)
For most users, the default project workspace is sufficient. All indexing now uses background jobs to prevent MCP client timeouts:
```python
# Start background indexing job (returns immediately)
job = await start_indexing_background(repo_path="/path/to/your/repo")
job_id = job["job_id"]

# Poll for completion
while True:
    status = await get_indexing_status(job_id=job_id)
    if status["status"] in ["completed", "failed"]:
        break
    await asyncio.sleep(2)

# Check result
if status["status"] == "completed":
    print(f"✅ Indexed {status['files_indexed']} files, {status['chunks_created']} chunks")
else:
    print(f"❌ Indexing failed: {status['error_message']}")

# Search code
results = await search_code(query="function to handle authentication")

# Search with filters
results = await search_code(
    query="database query",
    file_type="py",
    limit=20,
)
```
The server automatically uses a default project workspace (project_default) if no project ID is specified.
Multi-Project Usage
For users managing multiple codebases or client projects, use the project_id parameter to isolate repositories:
```python
# Index repositories with project_id
job_a = await start_indexing_background(
    repo_path="/path/to/client-a-repo",
    project_id="client-a",
)
job_b = await start_indexing_background(
    repo_path="/path/to/client-b-repo",
    project_id="client-b",
)

# Poll both jobs
for job in [job_a, job_b]:
    while True:
        status = await get_indexing_status(job_id=job["job_id"])
        if status["status"] in ["completed", "failed"]:
            break
        await asyncio.sleep(2)

# Search within specific projects
results_a = await search_code(
    query="authentication logic",
    project_id="client-a",
)
results_b = await search_code(
    query="payment processing",
    project_id="client-b",
)
```
Each project has its own isolated database schema, ensuring repositories and embeddings are completely separated.
workflow-mcp Integration (Optional)
The Codebase MCP Server can optionally integrate with workflow-mcp for automatic project context resolution. This is an advanced feature and not required for basic usage.
Standalone Usage (Default)
By default, Codebase MCP operates independently:
# Works out of the box without workflow-mcp
job = await start_indexing_background(repo_path="/path/to/repo")
results = await search_code(query="search query")
Integration with workflow-mcp
If you're using workflow-mcp to manage development projects, Codebase MCP can automatically resolve project context:
# Set workflow-mcp URL in environment
export WORKFLOW_MCP_URL=http://localhost:8001
# Now project_id is automatically resolved from workflow-mcp's active project
job = await start_indexing_background(repo_path="/path/to/repo") # Uses active project
results = await search_code(query="search query") # Searches in active project's context
How It Works:
1. Codebase MCP queries workflow-mcp for the active project
2. If an active project exists, it is used as the `project_id`
3. If no active project exists or workflow-mcp is unavailable, it falls back to the default project
4. You can still override with the `--project-id` flag
Configuration:
# In .env file
WORKFLOW_MCP_URL=http://localhost:8001 # Optional, enables integration
See Also: workflow-mcp repository for details on project workspace management.
Documentation
Comprehensive documentation is available for different use cases:
- Upgrading from v1.x to v2.x with multi-project support
- Production deployment and tuning
- System design and multi-project isolation
- Complete MCP tool documentation
- Canonical terminology definitions
For quick setup, refer to the Installation section above.
Contributing
We welcome contributions to the Codebase MCP Server. This project follows a specification-driven development workflow.
Getting Started
- Read the Architecture: Start with the architecture documentation to understand the system design
- Review the Constitution: See the project constitution for the project principles
- Follow the Workflow: Use the documented Specify workflow
Development Process
1. Create a feature specification using the `/specify` command
2. Plan the implementation with `/plan`
3. Generate tasks using `/tasks`
4. Implement incrementally with atomic commits
Code Standards
- Type Safety: Full mypy --strict compliance
- Testing: 95%+ test coverage, contract tests for MCP protocol
- Performance: Meet benchmarks (60s indexing, 500ms search p95)
- Documentation: Update docs with all changes
Code of Conduct
This project adheres to a code of conduct that promotes a welcoming, inclusive environment. We expect:
- Respectful communication in issues and PRs
- Constructive feedback focused on code and ideas
- Recognition that contributors volunteer their time
- Patience with maintainers and fellow contributors
By participating, you agree to uphold these standards.