Codebase MCP Server
A production-grade MCP (Model Context Protocol) server that indexes code repositories into PostgreSQL with pgvector for semantic search, designed specifically for AI coding assistants.
What's New in v2.0
Version 2.0 represents a major architectural refactoring focused exclusively on semantic code search capabilities. This release removes project management, entity tracking, and work item features to maintain single-responsibility focus.
Breaking Changes:
- 14 tools removed (project management, entity tracking, and work item features extracted to workflow-mcp)
- 3 tools remaining: `start_indexing_background`, `get_indexing_status`, and `search_code` with multi-project support
- Foreground `index_repository` removed (all indexing now uses background jobs to prevent timeouts)
- Database schema simplified (9 tables dropped, `project_id` parameter added)
- New environment variables for optional workflow-mcp integration
Migration Required: Existing v1.x users must follow the migration guide to upgrade safely; it documents the complete upgrade and rollback procedures.
What's Preserved: All indexed repositories and code embeddings remain searchable after migration.
What's Discarded: All v1.x project management data, entities, and work items are permanently removed.
Features
The Codebase MCP Server provides exactly 3 MCP tools for semantic code search with multi-project workspace support:
- `start_indexing_background`: Start a background indexing job for a repository
  - Returns job_id immediately to prevent MCP client timeouts
  - Accepts optional `project_id` parameter for workspace isolation
  - Default behavior: indexes to the default project workspace if `project_id` is not specified
  - Performance target: 60-second indexing for 10,000 files
- `get_indexing_status`: Poll the status of a background indexing job
  - Query job progress using the job_id from `start_indexing_background`
  - Returns files_indexed, chunks_created, and completion status
  - Enables responsive UIs with progress indicators
- `search_code`: Semantic code search with natural language queries
  - Accepts optional `project_id` parameter to restrict search scope
  - Default behavior: searches the default project workspace if `project_id` is not specified
  - Performance target: 500ms p95 search latency
Multi-Project Support
The v2.0 architecture supports isolated project workspaces through the optional project_id parameter:
Single Project Workflow (default):
```python
# Start background indexing job - uses default workspace
job = await start_indexing_background(repo_path="/path/to/repo")
job_id = job["job_id"]

# Poll for completion
while True:
    status = await get_indexing_status(job_id=job_id)
    if status["status"] in ["completed", "failed"]:
        break
    await asyncio.sleep(2)

# Search without project_id - searches default workspace
results = await search_code(query="authentication logic")
```
Multi-Project Workflow:
```python
# Index to specific project workspace
job = await start_indexing_background(
    repo_path="/path/to/client-a-repo",
    project_id="client-a",
)
job_id = job["job_id"]

# Poll for completion
while True:
    status = await get_indexing_status(job_id=job_id, project_id="client-a")
    if status["status"] in ["completed", "failed"]:
        break
    await asyncio.sleep(2)

# Search specific project workspace
results = await search_code(query="authentication logic", project_id="client-a")
```
Use Cases:
- Single Project: Individual developers or small teams working on one codebase
- Multi-Project: Consultants managing multiple client codebases, organizations with separate product lines, or multi-tenant deployments requiring workspace isolation
Optional Integration: The project_id can be automatically resolved from Git repository context when the optional workflow-mcp server is configured. Without workflow-mcp, all operations default to a single shared workspace.
Quick Start
1. Database Setup
# Create database
createdb codebase_mcp
# Initialize schema
psql -d codebase_mcp -f db/init_tables.sql
2. Install Dependencies
# Install dependencies including FastMCP framework
uv sync
# Or with pip
pip install -r requirements.txt
Key Dependencies:
- `fastmcp>=0.1.0` - Modern MCP framework with decorator-based tools
- `anthropic-mcp` - MCP protocol implementation
- `sqlalchemy>=2.0` - Async ORM
- `pgvector` - PostgreSQL vector extension
- `ollama` - Embedding generation
3. Configure Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"codebase-mcp": {
"command": "uv",
"args": [
"run",
"--with",
"fastmcp",
"python",
"/absolute/path/to/codebase-mcp/server_fastmcp.py"
]
}
}
}
Important:
- Use absolute paths!
- Server uses FastMCP framework with decorator-based tool definitions
- All logs go to `/tmp/codebase-mcp.log` (no stdout/stderr pollution)
4. Start Ollama
ollama serve
ollama pull nomic-embed-text
5. Test
# Test database and tools
uv run python tests/test_tool_handlers.py
# Test repository indexing
uv run python tests/test_embeddings.py
Current Status
Working Tools (3/3) ✅
| Tool | Status | Description |
|---|---|---|
| `start_indexing_background` | ✅ Working | Start background indexing job, returns job_id immediately |
| `get_indexing_status` | ✅ Working | Poll indexing job status with files_indexed/chunks_created |
| `search_code` | ✅ Working | Semantic code search with pgvector similarity |
Recent Fixes (Oct 6, 2025)
- ✅ Parameter passing architecture (Pydantic models)
- ✅ MCP schema mismatches (status enums, missing parameters)
- ✅ Timezone/datetime compatibility (PostgreSQL)
- ✅ Binary file filtering (images, cache dirs)
Test Results
✅ Task Management: 7/7 tests passed
✅ Repository Indexing: 2 files indexed, 6 chunks created
✅ Embeddings: 100% coverage (768-dim vectors)
✅ Database: Connection pool, async operations working
Tool Usage Examples
Index a Repository (Background Job)
In Claude Desktop:
Index the repository at /Users/username/projects/myapp
Initial Response (immediate):
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "pending",
"message": "Indexing job started",
"project_id": "default",
"database_name": "cb_proj_default_00000000"
}
Poll for Status:
Check the status of indexing job 550e8400-e29b-41d4-a716-446655440000
Completed Response:
{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"repo_path": "/Users/username/projects/myapp",
"files_indexed": 234,
"chunks_created": 1456,
"error_message": null,
"created_at": "2025-10-18T10:30:00Z",
"started_at": "2025-10-18T10:30:01Z",
"completed_at": "2025-10-18T10:30:15Z"
}
Search Code
Search for "authentication middleware" in Python files
Response:
{
"results": [
{
"file_path": "src/middleware/auth.py",
"content": "def authenticate_request(request):\n ...",
"start_line": 45,
"similarity_score": 0.92
}
],
"total_count": 5,
"latency_ms": 250
}
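Responses like the one above can be post-processed on the client side. A hedged sketch of thresholding results by `similarity_score` (the helper name is illustrative, not part of the server API):

```python
def filter_by_score(response: dict, min_score: float = 0.7) -> list[dict]:
    """Keep only result chunks at or above a similarity threshold."""
    return [r for r in response["results"] if r["similarity_score"] >= min_score]
```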
Architecture
Claude Desktop ↔ FastMCP Server ↔ Tool Handlers ↔ Services ↔ PostgreSQL
↓
Ollama (embeddings)
MCP Framework: Built with FastMCP - a modern, decorator-based framework for building MCP servers with:
- Type-safe tool definitions via `@mcp.tool()` decorators
- Automatic JSON Schema generation from Pydantic models
- Dual logging (file + MCP protocol) without stdout pollution
- Async/await support throughout

See the architecture documentation for detailed component diagrams.
Documentation
- System architecture and data flow
- Config-based project switching internals
- Production deployment and tuning
- Complete MCP tool documentation
- Specify workflow for AI-assisted development
Database Schema
11 tables with pgvector for semantic search:
Core Tables:
- `repositories` - Indexed repositories
- `code_files` - Source files with metadata
- `code_chunks` - Semantic chunks with embeddings (vector(768))
- `tasks` - Development tasks with git tracking
- `task_status_history` - Audit trail

See the schema documentation for complete details.
Technology Stack
- MCP Framework: FastMCP 0.1+ (decorator-based tool definitions)
- Server: Python 3.13+, FastAPI patterns, async/await
- Database: PostgreSQL 14+ with pgvector extension
- Embeddings: Ollama (nomic-embed-text, 768 dimensions)
- ORM: SQLAlchemy 2.0 (async), Pydantic V2 for validation
- Type Safety: Full mypy --strict compliance
Development
Running Tests
# Tool handlers
uv run python tests/test_tool_handlers.py
# Repository indexing
uv run python tests/test_embeddings.py
# Unit tests
uv run pytest tests/ -v
Code Structure
codebase-mcp/
├── server_fastmcp.py # FastMCP server entry point (NEW)
├── src/
│ ├── mcp/
│ │ └── tools/ # Tool handlers with service integration
│ │ ├── tasks.py # Task management
│ │ ├── indexing.py # Repository indexing
│ │ └── search.py # Semantic search
│ ├── services/ # Business logic layer
│ │ ├── tasks.py # Task CRUD + git tracking
│ │ ├── indexer.py # Indexing orchestration
│ │ ├── scanner.py # File discovery
│ │ ├── chunker.py # AST-based chunking
│ │ ├── embedder.py # Ollama integration
│ │ └── searcher.py # pgvector similarity search
│ └── models/ # Database models + Pydantic schemas
│ ├── task.py # Task, TaskCreate, TaskUpdate
│ ├── code_chunk.py # CodeChunk
│ └── ...
└── tests/
├── test_tool_handlers.py # Integration tests
└── test_embeddings.py # Embedding validation
FastMCP Server Architecture:
- `server_fastmcp.py` - Main entry point using `@mcp.tool()` decorators
- Tool handlers in `src/mcp/tools/` provide service integration
- Services in `src/services/` contain all business logic
- Dual logging: file (`/tmp/codebase-mcp.log`) + MCP protocol
Installation
Prerequisites
Before installing Codebase MCP Server v2.0, ensure the following requirements are met:
Required Software:
- PostgreSQL 14+ - Database with pgvector extension for vector similarity search
- Python 3.11+ - Runtime environment (Python 3.13 compatible)
- Ollama - Local embedding model server with nomic-embed-text model
System Requirements:
- 4GB+ RAM recommended for typical workloads
- SSD storage for optimal performance (database and embedding operations are I/O intensive)
- Network access to Ollama server (default: localhost:11434)
Installation Commands
Install Codebase MCP Server v2.0 using pip:
# Install latest v2.0 release
pip install codebase-mcp
Alternative Installation Methods:
# Install specific v2.0 version
pip install codebase-mcp==2.0.0
# Install from source (for development)
git clone https://github.com/cliffclarke/codebase-mcp.git
cd codebase-mcp
pip install -e .
Key Dependencies Installed Automatically:
- `fastmcp>=0.1.0` - Modern MCP framework
- `sqlalchemy>=2.0` - Async database ORM
- `pgvector` - PostgreSQL vector extension Python bindings
- `ollama` - Embedding generation client
- `pydantic>=2.0` - Data validation and settings
Verification Steps
After installation, verify the setup is correct:
# Verify codebase-mcp is installed
codebase-mcp --version
# Expected output: codebase-mcp 2.0.0
# Check PostgreSQL is accessible
psql --version
# Expected output: psql (PostgreSQL) 14.x or higher
# Verify Ollama is running
curl http://localhost:11434/api/tags
# Expected output: JSON response with available models
# Confirm embedding model is available
ollama list | grep nomic-embed-text
# Expected output: nomic-embed-text model listed
Setup Complete: If all verification steps pass, Codebase MCP Server v2.0 is ready for use. Proceed to the Quick Start section for first-time indexing and search operations.
Multi-Project Configuration
The Codebase MCP server supports automatic project switching based on your working directory using .codebase-mcp/config.json files.
Quick Start
- Create a config file in your project root:
mkdir -p .codebase-mcp
cat > .codebase-mcp/config.json <<EOF
{
"version": "1.0",
"project": {
"name": "my-project",
"id": "optional-uuid-here"
},
"auto_switch": true
}
EOF
- Set your working directory (via MCP client):
```javascript
await mcpClient.callTool("set_working_directory", {
  directory: "/absolute/path/to/your/project"
});
```
- Use tools normally - they'll automatically use your project:
```javascript
// Automatically uses "my-project" workspace
const result = await mcpClient.callTool("start_indexing_background", {
  repo_path: "/path/to/repo"
});
const jobId = result.job_id;

// Poll for completion
while (true) {
  const status = await mcpClient.callTool("get_indexing_status", {
    job_id: jobId
  });
  if (status.status === "completed" || status.status === "failed") {
    break;
  }
  await sleep(2000);
}
```
Config File Format
{
"version": "1.0",
"project": {
"name": "my-project-name",
"id": "optional-project-uuid",
"database_name": "optional-database-override"
},
"auto_switch": true,
"strict_mode": false,
"dry_run": false,
"description": "Optional project description"
}
Fields:
- `version` (required): Config version (currently "1.0")
- `project.name` (required): Project identifier (used if no ID provided)
- `project.id` (optional): Explicit project UUID (takes priority over name)
- `project.database_name` (optional): Override the computed database name (see Database Name Resolution below)
- `auto_switch` (optional, default true): Enable automatic project switching
- `strict_mode` (optional, default false): Reject operations if the project mismatches
- `dry_run` (optional, default false): Log intended switches without executing them
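The required fields and documented defaults can be enforced by a small loader. A sketch under the assumption that validation works roughly this way; the server's actual loader may differ:

```python
import json
from pathlib import Path

def load_config(path: Path) -> dict:
    """Parse a .codebase-mcp/config.json and apply documented defaults."""
    cfg = json.loads(path.read_text())
    if cfg.get("version") != "1.0":
        raise ValueError(f"unsupported config version: {cfg.get('version')!r}")
    if not cfg.get("project", {}).get("name"):
        raise ValueError("project.name is required")
    # Documented defaults
    cfg.setdefault("auto_switch", True)
    cfg.setdefault("strict_mode", False)
    cfg.setdefault("dry_run", False)
    return cfg
```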
Database Name Resolution:
The server determines which database to use in this order:
1. Explicit `database_name` in config - uses the exact database name specified:
   `{"project": {"database_name": "cb_proj_my_project_550e8400"}}`
2. Computed from `name` + `id` - automatically generates the database name.
   Format: `cb_proj_{sanitized_name}_{id_prefix}`
   Example: `cb_proj_my_project_550e8400`
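The computed form can be reproduced as follows. A sketch, assuming sanitization lowercases the name and collapses non-alphanumeric runs to underscores, and that the ID prefix is the first 8 hex characters of the UUID; the server's exact rules may differ:

```python
import re

def compute_database_name(name: str, project_id: str) -> str:
    """Derive cb_proj_{sanitized_name}_{id_prefix} from project name and UUID."""
    # Assumed sanitization: lowercase, non-alphanumeric runs become underscores
    sanitized = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    # Assumed prefix: first 8 hex chars of the UUID, dashes removed
    id_prefix = project_id.replace("-", "")[:8]
    return f"cb_proj_{sanitized}_{id_prefix}"
```

For example, `compute_database_name("my-project", "550e8400-e29b-41d4-a716-446655440000")` yields the `cb_proj_my_project_550e8400` shown in the examples below.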
Use Cases for database_name Override:
- Recovering from database name mismatches
- Migrating from old database naming schemes
- Explicit control over database selection
- Debugging and troubleshooting
Example - Auto-generated (default):
{
"version": "1.0",
"project": {
"name": "my-project",
"id": "550e8400-e29b-41d4-a716-446655440000"
}
}
Database used: cb_proj_my_project_550e8400 (auto-computed)
Example - Explicit override:
{
"version": "1.0",
"project": {
"name": "my-project",
"id": "550e8400-e29b-41d4-a716-446655440000",
"database_name": "cb_proj_legacy_database_12345678"
}
}
Database used: cb_proj_legacy_database_12345678 (explicit override)
Project Resolution Priority
When you call MCP tools, the server resolves the project workspace using this 4-tier priority system:
1. Explicit `project_id` parameter (highest priority):

   await mcpClient.callTool("start_indexing_background", {
     repo_path: "/path/to/repo",
     project_id: "explicit-project-id"  // Always takes priority
   });

2. Session-based config file (via `set_working_directory`):
   - Server searches up to 20 directory levels for `.codebase-mcp/config.json`
   - Cached with mtime-based invalidation for performance
   - Isolated per MCP session (multiple clients stay independent)
3. workflow-mcp integration (external project tracking):
   - Queries the workflow-mcp server for the active project context
   - Configurable timeout and caching
4. Default workspace (fallback):
   - Uses the `project_default` schema when no other resolution succeeds
Multi-Session Isolation
The server maintains separate working directories for each MCP session (client connection):
```javascript
// Session 1 (Claude Code instance A)
await mcpClient1.callTool("set_working_directory", {
  directory: "/Users/alice/project-a"
});

// Session 2 (Claude Code instance B)
await mcpClient2.callTool("set_working_directory", {
  directory: "/Users/bob/project-b"
});

// Each session independently resolves its own project
// No cross-contamination between sessions
```
Config File Discovery
The server searches for .codebase-mcp/config.json by:
- Starting from your working directory
- Searching up to 20 parent directories
- Stopping at the first config file found
- Caching the result (with automatic invalidation on file modification)
Example directory structure:
/Users/alice/projects/my-app/ <- .codebase-mcp/config.json here
├── .codebase-mcp/
│ └── config.json
├── src/
│ └── components/ <- Working directory
│ └── Button.tsx
If you set working directory to /Users/alice/projects/my-app/src/components/, the server will find the config at /Users/alice/projects/my-app/.codebase-mcp/config.json.
Performance
- Config discovery: <50ms (with upward traversal)
- Cache hit: <5ms
- Session lookup: <1ms
- Background cleanup: Hourly (removes sessions inactive >24h)
Database Setup
1. Create Database
# Connect to PostgreSQL
psql -U postgres
# Create database
CREATE DATABASE codebase_mcp;
# Enable pgvector extension
\c codebase_mcp
CREATE EXTENSION IF NOT EXISTS vector;
\q
2. Initialize Schema
# Run database initialization script
python scripts/init_db.py
# Verify schema creation
alembic current
The initialization script will:
- Create all required tables (repositories, files, chunks, tasks)
- Set up vector indexes for similarity search
- Configure connection pooling
- Apply all database migrations
3. Verify Setup
# Check database connectivity
python -c "from src.database import Database; import asyncio; asyncio.run(Database.create_pool())"
# Run migration status check
alembic current
4. Database Reset & Cleanup
During development, you may need to reset your database using the following reset options:
- scripts/clear_data.sh - Clear all data, keep schema (fastest, no restart needed)
- scripts/reset_database.sh - Drop and recreate all tables (recommended for schema changes)
- scripts/nuclear_reset.sh - Drop entire database (requires Claude Desktop restart)
# Quick data wipe (keeps schema)
./scripts/clear_data.sh
# Full table reset (recommended)
./scripts/reset_database.sh
# Nuclear option (drops database)
./scripts/nuclear_reset.sh
Running the Server
FastMCP Server (Recommended)
The primary way to run the server is via Claude Desktop or other MCP clients:
# Via Claude Desktop (configured in claude_desktop_config.json)
# Server starts automatically when Claude Desktop launches
# Manual testing with FastMCP CLI
uv run --with fastmcp python server_fastmcp.py
# With custom log level
LOG_LEVEL=DEBUG uv run --with fastmcp python server_fastmcp.py
Server Entry Point: server_fastmcp.py in repository root
Logging: All output goes to /tmp/codebase-mcp.log (configurable via LOG_FILE env var)
Development Mode (Legacy FastAPI)
# Start with auto-reload (if FastAPI server exists)
uvicorn src.main:app --reload --host 127.0.0.1 --port 3000
# With custom log level
LOG_LEVEL=DEBUG uvicorn src.main:app --reload
Production Mode (Legacy)
# Start production server
uvicorn src.main:app --host 0.0.0.0 --port 3000 --workers 4
# With gunicorn (recommended for production)
gunicorn src.main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:3000
stdio Transport (Legacy CLI Mode)
The legacy MCP server supports stdio transport for CLI clients via JSON-RPC 2.0 over stdin/stdout.
# Start stdio server (reads JSON-RPC from stdin)
python -m src.mcp.stdio_server
# Echo a single request
echo '{"jsonrpc":"2.0","id":1,"method":"list_tasks","params":{"limit":5}}' | python -m src.mcp.stdio_server
# Pipe requests from a file (one JSON-RPC request per line)
cat requests.jsonl | python -m src.mcp.stdio_server
# Interactive mode (type JSON-RPC requests manually)
python -m src.mcp.stdio_server
{"jsonrpc":"2.0","id":1,"method":"get_task","params":{"task_id":"..."}}
JSON-RPC 2.0 Request Format:
{
"jsonrpc": "2.0",
"id": 1,
"method": "search_code",
"params": {
"query": "async def",
"limit": 10
}
}
JSON-RPC 2.0 Response Format:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"results": [...],
"total_count": 42,
"latency_ms": 250
}
}
Available Methods:
- `search_code` - Semantic code search
- `start_indexing_background` - Start background indexing job
- `get_indexing_status` - Poll indexing job status
Logging:
All logs go to /tmp/codebase-mcp.log (configurable via LOG_FILE env var). No stdout/stderr pollution - only JSON-RPC protocol messages on stdout.
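Requests to the stdio server are newline-delimited JSON-RPC 2.0 objects, so they are easy to build programmatically. A small illustrative helper (not part of the server):

```python
import json

def make_request(method: str, params: dict, req_id: int = 1) -> str:
    """Build one newline-delimited JSON-RPC 2.0 request line."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params}) + "\n"
```

The returned line can be piped into `python -m src.mcp.stdio_server` on stdin, matching the echo examples above.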
Health Check
# Check server health
curl http://localhost:3000/health
# Expected response:
{
"status": "healthy",
"database": "connected",
"ollama": "connected",
"version": "0.1.0"
}
Usage Examples
1. Index a Repository (Background Job)
# Start indexing job via MCP protocol
{
"tool": "start_indexing_background",
"arguments": {
"repo_path": "/path/to/your/repo"
}
}
# Immediate response
{
"job_id": "uuid-here",
"status": "pending",
"message": "Indexing job started",
"project_id": "default",
"database_name": "cb_proj_default_00000000"
}
# Poll for status
{
"tool": "get_indexing_status",
"arguments": {
"job_id": "uuid-here"
}
}
# Completed response
{
"job_id": "uuid-here",
"status": "completed",
"repo_path": "/path/to/your/repo",
"files_indexed": 150,
"chunks_created": 1200,
"error_message": null,
"created_at": "2025-10-18T10:30:00Z",
"started_at": "2025-10-18T10:30:01Z",
"completed_at": "2025-10-18T10:30:45Z"
}
2. Search Code
# Search for authentication logic
{
"tool": "search_code",
"arguments": {
"query": "user authentication password validation",
"limit": 10,
"file_type": "py"
}
}
# Response includes ranked code chunks with context
{
"results": [...],
"total_count": 25,
"latency_ms": 230
}
Architecture
┌─────────────────────────────────────────────────┐
│ MCP Client (AI) │
└─────────────────┬───────────────────────────────┘
│ SSE Protocol
┌─────────────────▼───────────────────────────────┐
│ MCP Server Layer │
│ ┌──────────────────────────────────────────┐ │
│ │ Tool Registration & Routing │ │
│ └──────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────┐ │
│ │ Request/Response Handling │ │
│ └──────────────────────────────────────────┘ │
└─────────────────┬───────────────────────────────┘
│
┌─────────────────▼───────────────────────────────┐
│ Service Layer │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Indexer │ │ Searcher │ │Task Manager│ │
│ └──────┬─────┘ └──────┬─────┘ └──────┬─────┘ │
│ │ │ │ │
│ ┌──────▼──────────────▼──────────────▼──────┐ │
│ │ Repository Service │ │
│ └──────┬─────────────────────────────────────┘ │
│ │ │
│ ┌──────▼─────────────────────────────────────┐ │
│ │ Embedding Service (Ollama) │ │
│ └─────────────────────────────────────────────┘│
└─────────────────┬───────────────────────────────┘
│
┌─────────────────▼───────────────────────────────┐
│ Data Layer │
│ ┌──────────────────────────────────────────┐ │
│ │ PostgreSQL with pgvector │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │Repository│ │ Files │ │ Chunks │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ ┌──────────┐ ┌──────────────────────┐ │ │
│ │ │ Tasks │ │ Vector Embeddings │ │ │
│ │ └──────────┘ └──────────────────────┘ │ │
│ └──────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
Component Overview
- MCP Layer: Handles protocol compliance, tool registration, SSE transport
- Service Layer: Business logic for indexing, searching, task management
- Repository Service: File system operations, git integration, .gitignore handling
- Embedding Service: Ollama integration for generating text embeddings
- Data Layer: PostgreSQL with pgvector for storage and similarity search
Data Flow
- Indexing: Repository → Parse → Chunk → Embed → Store
- Searching: Query → Embed → Vector Search → Rank → Return
- Task Tracking: Create → Update → Git Integration → Query
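The search path (Query → Embed → Vector Search → Rank) ultimately reduces to cosine similarity over embedding vectors. In production this runs inside pgvector, but a toy in-memory version illustrates the ranking step:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_chunks(query_vec: list[float],
                chunks: list[tuple[str, list[float]]]) -> list[tuple[str, float]]:
    """Score (file_path, embedding) pairs against the query and sort best-first."""
    scored = [(path, cosine_similarity(query_vec, emb)) for path, emb in chunks]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```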
Testing
Run All Tests
# Run all tests with coverage
pytest tests/ -v --cov=src --cov-report=term-missing
# Run specific test categories
pytest tests/unit/ -v # Unit tests only
pytest tests/integration/ -v # Integration tests
pytest tests/contract/ -v # Contract tests
Test Categories
- Unit Tests: Fast, isolated component tests
- Integration Tests: Database and service integration
- Contract Tests: MCP protocol compliance validation
- Performance Tests: Latency and throughput benchmarks
Coverage Requirements
- Minimum coverage: 95%
- Critical paths: 100%
- View HTML report:
open htmlcov/index.html
Performance Tuning
Database Optimization
-- Optimize vector searches
CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Adjust work_mem for large result sets
ALTER SYSTEM SET work_mem = '256MB';
SELECT pg_reload_conf();
Connection Pool Settings
# In .env
DATABASE_POOL_SIZE=20 # Connection pool size
DATABASE_MAX_OVERFLOW=10 # Max overflow connections
DATABASE_POOL_TIMEOUT=30 # Connection timeout in seconds
Embedding Batch Size
# Adjust based on available memory
EMBEDDING_BATCH_SIZE=100 # For systems with 8GB+ RAM
EMBEDDING_BATCH_SIZE=50 # Default for 4GB RAM
EMBEDDING_BATCH_SIZE=25 # For constrained environments
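`EMBEDDING_BATCH_SIZE` controls how many chunks are sent to the embedding model per request; the splitting itself is simple slicing. An illustrative sketch (the function name is not part of the codebase):

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield successive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```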
Troubleshooting
Common Issues
1. Database Connection Failed
   - Check PostgreSQL is running: `pg_ctl status`
   - Verify DATABASE_URL in .env
   - Ensure the database exists: `psql -U postgres -l`
2. Ollama Connection Error
   - Check Ollama is running: `curl http://localhost:11434/api/tags`
   - Verify the model is installed: `ollama list`
   - Check OLLAMA_BASE_URL in .env
3. Slow Performance
   - Check database indexes: `\di` in psql
   - Monitor query performance: see logs at the LOG_FILE path
   - Adjust batch sizes and the connection pool
For detailed troubleshooting, see the Configuration Guide troubleshooting section.
Contributing
We follow a specification-driven development workflow using the Specify framework.
Development Workflow
1. Feature Specification: Use the `/specify` command to create feature specs
2. Planning: Generate the implementation plan with `/plan`
3. Task Breakdown: Create tasks with `/tasks`
4. Implementation: Execute tasks with `/implement`
Git Workflow
# Create feature branch
git checkout -b 001-feature-name
# Make atomic commits
git add .
git commit -m "feat(component): add specific feature"
# Push and create PR
git push origin 001-feature-name
Code Quality Standards
- Type Safety: `mypy --strict` must pass
- Linting: `ruff check` with no errors
- Testing: All tests must pass with 95%+ coverage
- Documentation: Update relevant docs with changes
Constitutional Principles
- Simplicity Over Features: Focus on core semantic search
- Local-First Architecture: No cloud dependencies
- Protocol Compliance: Strict MCP adherence
- Performance Guarantees: Meet stated benchmarks
- Production Quality: Comprehensive error handling
See the project constitution for the full principles.
FastMCP Migration (Oct 2025)
Migration Complete: The server has been successfully migrated from the legacy MCP SDK to the modern FastMCP framework.
What Changed
Before (MCP SDK):
```python
# Old: Manual tool registration with JSON schemas
class MCPServer:
    def __init__(self):
        self.tools = {
            "search_code": {
                "name": "search_code",
                "description": "...",
                "inputSchema": {...}
            }
        }
```
After (FastMCP):
```python
# New: Decorator-based tool definitions
@mcp.tool()
async def search_code(query: str, limit: int = 10) -> dict[str, Any]:
    """Semantic code search with natural language queries."""
    # Implementation
    ...
```
Key Benefits
- Simpler Tool Definitions: Decorators replace manual JSON schema creation
- Type Safety: Automatic schema generation from Pydantic models
- Dual Logging: File logging + MCP protocol without stdout pollution
- Better Error Handling: Structured error responses with context
- Cleaner Architecture: Separation of tool interface from business logic
Server Files
- New Entry Point: `server_fastmcp.py` (root directory)
- Legacy Server: `src/mcp/mcp_stdio_server_v3.py` (deprecated, will be removed)
- Tool Handlers: `src/mcp/tools/*.py` (unchanged, reused by FastMCP)
- Services: `src/services/*.py` (unchanged, business logic intact)
Configuration Update Required
Update your Claude Desktop config to use the new server:
{
"mcpServers": {
"codebase-mcp": {
"command": "uv",
"args": ["run", "--with", "fastmcp", "python", "/path/to/server_fastmcp.py"]
}
}
}
Migration Notes
- All 6 MCP tools remain functional (100% backward compatible)
- No database schema changes required
- Tool signatures and responses unchanged
- Logging now goes exclusively to `/tmp/codebase-mcp.log`
- All tests pass with the FastMCP implementation
Performance
FastMCP maintains performance targets:
- Repository indexing: <60 seconds for 10K files
- Code search: <500ms p95 latency
- Async/await throughout for optimal concurrency
License
MIT License (LICENSE file pending).
Support
- Issues: GitHub Issues
- Documentation:
- Logs: Check `/tmp/codebase-mcp.log` for detailed debugging
Quick Start
Basic Usage (Default Project)
For most users, the default project workspace is sufficient. All indexing now uses background jobs to prevent MCP client timeouts:
```python
# Start background indexing job (returns immediately)
job = await start_indexing_background(repo_path="/path/to/your/repo")
job_id = job["job_id"]

# Poll for completion
while True:
    status = await get_indexing_status(job_id=job_id)
    if status["status"] in ["completed", "failed"]:
        break
    await asyncio.sleep(2)

# Check result
if status["status"] == "completed":
    print(f"✅ Indexed {status['files_indexed']} files, {status['chunks_created']} chunks")
else:
    print(f"❌ Indexing failed: {status['error_message']}")

# Search code
results = await search_code(query="function to handle authentication")

# Search with filters
results = await search_code(
    query="database query",
    file_type="py",
    limit=20,
)
```
The server automatically uses a default project workspace (project_default) if no project ID is specified.
Multi-Project Usage
For users managing multiple codebases or client projects, use the project_id parameter to isolate repositories:
```python
# Index repositories with project_id
job_a = await start_indexing_background(
    repo_path="/path/to/client-a-repo",
    project_id="client-a",
)
job_b = await start_indexing_background(
    repo_path="/path/to/client-b-repo",
    project_id="client-b",
)

# Poll both jobs
for job in [job_a, job_b]:
    while True:
        status = await get_indexing_status(job_id=job["job_id"])
        if status["status"] in ["completed", "failed"]:
            break
        await asyncio.sleep(2)

# Search within specific projects
results_a = await search_code(
    query="authentication logic",
    project_id="client-a",
)
results_b = await search_code(
    query="payment processing",
    project_id="client-b",
)
```
Each project has its own isolated database schema, ensuring repositories and embeddings are completely separated.
workflow-mcp Integration (Optional)
The Codebase MCP Server can optionally integrate with workflow-mcp for automatic project context resolution. This is an advanced feature and not required for basic usage.
Standalone Usage (Default)
By default, Codebase MCP operates independently:
# Works out of the box without workflow-mcp
job = await start_indexing_background(repo_path="/path/to/repo")
results = await search_code(query="search query")
Integration with workflow-mcp
If you're using workflow-mcp to manage development projects, Codebase MCP can automatically resolve project context:
# Set workflow-mcp URL in environment
export WORKFLOW_MCP_URL=http://localhost:8001
# Now project_id is automatically resolved from workflow-mcp's active project
job = await start_indexing_background(repo_path="/path/to/repo") # Uses active project
results = await search_code(query="search query") # Searches in active project's context
How It Works:
1. Codebase MCP queries workflow-mcp for the active project
2. If an active project exists, it is used as the `project_id`
3. If no active project exists or workflow-mcp is unavailable, it falls back to the default project
4. You can still override with the `--project-id` flag
Configuration:
# In .env file
WORKFLOW_MCP_URL=http://localhost:8001 # Optional, enables integration
See Also: workflow-mcp repository for details on project workspace management.
Documentation
Comprehensive documentation is available for different use cases:
- Upgrading from v1.x to v2.x with multi-project support
- Production deployment and tuning
- System design and multi-project isolation
- Complete MCP tool documentation
- Canonical terminology definitions
For quick setup, refer to the Installation section above.
Contributing
We welcome contributions to the Codebase MCP Server. This project follows a specification-driven development workflow.
Getting Started
- Read the Architecture: Start with the architecture documentation to understand the system design
- Review the Constitution: See the project constitution for the project principles
- Follow the Workflow: Use the documented Specify workflow
Development Process
1. Create a feature specification using the `/specify` command
2. Plan the implementation with `/plan`
3. Generate tasks using `/tasks`
4. Implement incrementally with atomic commits
Code Standards
- Type Safety: Full mypy --strict compliance
- Testing: 95%+ test coverage, contract tests for MCP protocol
- Performance: Meet benchmarks (60s indexing, 500ms search p95)
- Documentation: Update docs with all changes
Code of Conduct
This project adheres to a code of conduct that promotes a welcoming, inclusive environment. We expect:
- Respectful communication in issues and PRs
- Constructive feedback focused on code and ideas
- Recognition that contributors volunteer their time
- Patience with maintainers and fellow contributors
By participating, you agree to uphold these standards.