code-search-mcp by doozMen - MCP Server

code-search-mcp

An MCP server for pure vector-based semantic code search across multiple projects using 384-dimensional BERT embeddings.

Features

Semantic Search: Find code with similar meaning using 384-dimensional BERT vector embeddings
File Context Extraction: Get code snippets with surrounding context
Dependency Analysis: Find files that import or depend on a given file
Multi-Project Support: Index and search across multiple codebases
Language Support: Swift, Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, C#, Ruby, PHP, Kotlin
Legacy File Encoding: Automatic fallback for non-UTF-8 files (ISO-8859-1, Windows-1252, ASCII)
Smart Caching: Cache embeddings to avoid recomputation
Pure Vector Search: No regex patterns or keyword matching; results are ranked by embedding similarity alone
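
The legacy-encoding fallback listed above can be illustrated with a short Python sketch. It is an assumption about how such a fallback might work, not the server's actual implementation: the README names the encodings but not the order in which they are tried, and the function name decode_with_fallback is hypothetical.

```python
from typing import Tuple

# Assumed order: strict UTF-8 first (which also covers plain ASCII), then the
# legacy encodings named in the README. The real server may use another order.
FALLBACK_ENCODINGS = ("utf-8", "windows-1252", "iso-8859-1")


def decode_with_fallback(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used) for the first encoding that decodes cleanly."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Not expected to happen: ISO-8859-1 maps every byte to a code point.
    raise ValueError("input could not be decoded with any fallback encoding")
```

Because ISO-8859-1 accepts any byte sequence, it only makes sense as the last entry; placing it earlier would hide the other encodings.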

Requirements

macOS 15.0+
Swift 6.0+
Python 3.8+ with pip
Xcode 16.0+ (for development)

Installation

Option 1: From PromptPing Marketplace (Recommended)

# Add marketplace (replace the path with the location of your own promptping-marketplace checkout)
/plugin marketplace add /Users/stijnwillems/Developer/promptping-marketplace

# Install plugin
/plugin install code-search-mcp

# Restart Claude Code

Option 2: From Source

git clone https://github.com/doozMen/code-search-mcp.git
cd code-search-mcp

# Install Python dependencies (required for BERT embeddings)
./Scripts/install_python_deps.sh

# Build and install
./install.sh

Option 3: Manual Build

# Install Python dependencies first
./Scripts/install_python_deps.sh

# Build and install
swift build -c release
swift package experimental-install

Configuration

For Marketplace Installation (Option 1)

The plugin is configured automatically when installed from the marketplace. Make sure the PATH in your ~/.claude/settings.json includes ~/.swiftpm/bin so the code-search-mcp binary can be found:

{
  "env": {
    "PATH": "/Users/<YOUR_USERNAME>/.swiftpm/bin:/usr/local/bin:/usr/bin:/bin"
  }
}

For Manual Installation (Options 2 & 3)

Add to Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "code-search-mcp": {
      "command": "code-search-mcp",
      "args": ["--log-level", "info"],
      "env": {
        "PATH": "/Users/<YOUR_USERNAME>/.swiftpm/bin:/usr/local/bin:/usr/bin:/bin"
      }
    }
  }
}

Automatic Re-Indexing Setup

Set up automatic re-indexing using git hooks and direnv:

# Navigate to your project
cd ~/Developer/MyApp

# Run setup command (creates .envrc and .githooks/)
code-search-mcp setup-hooks --install-hooks

# Allow direnv (if you have it installed)
direnv allow

What it creates:

.envrc - Triggers re-indexing when entering the directory
.githooks/ - Re-indexes automatically on commit, merge, and branch switch
Git config for hooks - Configures core.hooksPath to use .githooks/

Options:

# Set up without direnv
code-search-mcp setup-hooks --no-direnv

# Set up without git hooks
code-search-mcp setup-hooks --no-git-hooks

# Set up a different project
code-search-mcp setup-hooks --project-path ~/my-other-project

Environment Variables:

CODE_SEARCH_PROJECT_NAME: Auto-filter searches to this project
CODE_SEARCH_PROJECTS: Colon-separated list of project paths to index (see Multi-Project Configuration below)
An explicit projectFilter parameter on a tool call overrides CODE_SEARCH_PROJECT_NAME
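
The precedence between the explicit parameter and the environment variable can be sketched in Python as follows. This is a minimal illustration of the rule stated above, not the server's code; the function name resolve_project_filter is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_project_filter(
    explicit: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the project to search: explicit parameter first, then the environment."""
    if explicit:
        # An explicit projectFilter always wins.
        return explicit
    # Fall back to the environment default; empty or missing means "no default".
    return env.get("CODE_SEARCH_PROJECT_NAME") or None
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.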

Benefits:

Index stays up-to-date automatically
No manual re-indexing needed
Works seamlessly with multi-project workflows
Background re-indexing won't block your work

Multi-Project Configuration

For indexing multiple projects simultaneously, configure CODE_SEARCH_PROJECTS in your MCP server config:

{
  "mcpServers": {
    "code-search-mcp": {
      "command": "code-search-mcp",
      "args": ["--log-level", "info"],
      "env": {
        "CODE_SEARCH_PROJECTS": "/Users/you/MyApp:/Users/you/MyLibrary:/Users/you/MyService"
      }
    }
  }
}

Important: Use a colon (:) to separate paths, not a comma or semicolon.

Recommended: Index each project separately rather than a parent directory containing multiple projects.

❌ Bad: /Users/you/Developer (indexes everything, 87k+ files)
✅ Good: /Users/you/Developer/MyApp:/Users/you/Developer/MyLib
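
As a rough illustration of the colon-separated format, the following Python sketch splits a CODE_SEARCH_PROJECTS value and flags entries that look like they used the wrong separator. Both helper names are hypothetical and the check is a heuristic, not something the server is known to perform.

```python
from typing import List


def parse_project_paths(value: str) -> List[str]:
    """Split a colon-separated path list, dropping empty entries."""
    return [part.strip() for part in value.split(":") if part.strip()]


def suspicious_entries(paths: List[str]) -> List[str]:
    """Return entries containing ',' or ';', which suggests the wrong separator."""
    return [path for path in paths if "," in path or ";" in path]
```

For example, parse_project_paths("/Users/you/MyApp,/Users/you/MyLib") yields a single entry, which suspicious_entries then reports.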

Common Pitfalls

1. Indexing Parent Directories

Problem: Indexing /Users/you/Developer creates a massive single project with irrelevant search results.

Solution:

Use setup-hooks in each project directory
Or configure CODE_SEARCH_PROJECTS with individual project paths

2. Poor Search Relevance

Problem: Search returns results from unrelated projects in your Developer folder.

Diagnosis: Run /index_status and check if you have one large project (>10k files).

Solution:

# Clear the oversized index
rm -rf ~/.cache/code-search-mcp

# Re-index individual projects
cd ~/Developer/MyApp
code-search-mcp setup-hooks --install-hooks

3. Slow Performance

Problem: Searches take more than 1 second and memory usage is high.

Cause: Searching 100k+ chunks from parent directory indexing.

Solution: Limit indexing to projects you actually work with.

Usage

Tool: semantic_search

Search for code with similar meaning to your query.

Parameters:

query (required): Natural language query or code snippet
maxResults (optional): Maximum results to return (default: 10)
projectFilter (required unless CODE_SEARCH_PROJECT_NAME is set): Name of the project to search. Use list_projects to see available projects, or set the CODE_SEARCH_PROJECT_NAME environment variable to provide a default.

Example:

{
  "name": "semantic_search",
  "arguments": {
    "query": "function that validates email addresses",
    "projectFilter": "my-project",
    "maxResults": 5
  }
}
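
MCP clients normally send this payload for you. For readers scripting against the server directly, the Python sketch below shows how the example might be wrapped in a JSON-RPC 2.0 tools/call request, which is the envelope MCP uses for tool invocation. The helper name build_tool_call is hypothetical, and the transport (for example, writing the string to the server's stdin) is left out.

```python
import json
from typing import Any, Dict


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> str:
    """Serialize an MCP tool invocation as a JSON-RPC 2.0 request string."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# The semantic_search example from above, wrapped in the envelope.
EXAMPLE_REQUEST = build_tool_call(
    "semantic_search",
    {
        "query": "function that validates email addresses",
        "projectFilter": "my-project",
        "maxResults": 5,
    },
)
```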

Tool: file_context

Extract code from a file with optional line range. Supports multi-project workflows.

Parameters:

filePath (required): Path to file (relative to project root OR absolute)
projectName (optional): Project name for disambiguation when file exists in multiple projects
startLine (optional): Start line number (1-indexed)
endLine (optional): End line number (1-indexed)
contextLines (optional): Context lines around range (default: 3)

Example:

{
  "name": "file_context",
  "arguments": {
    "filePath": "Sources/Core/Utils.swift",
    "projectName": "my-project",
    "startLine": 42,
    "endLine": 50,
    "contextLines": 5
  }
}
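
How startLine, endLine, and contextLines might combine can be sketched in Python. The README does not say how the server handles ranges near the start or end of a file, so clamping to the file bounds is an assumption here, and both function names are hypothetical.

```python
from typing import List, Optional, Tuple


def context_window(
    total_lines: int,
    start_line: Optional[int] = None,
    end_line: Optional[int] = None,
    context_lines: int = 3,
) -> Tuple[int, int]:
    """Return the 1-indexed, inclusive (first, last) lines to extract."""
    start = start_line if start_line is not None else 1
    end = end_line if end_line is not None else total_lines
    # Widen the range by the context, clamped to the file (assumed behavior).
    first = max(1, start - context_lines)
    last = min(total_lines, end + context_lines)
    return first, last


def extract_lines(lines: List[str], **kwargs) -> List[str]:
    """Slice a list of file lines using the window computed above."""
    first, last = context_window(len(lines), **kwargs)
    return lines[first - 1:last]
```

Under these assumptions, the request above (lines 42 to 50 with 5 context lines) on a 100-line file would cover lines 37 through 55.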

Multi-Project Usage: When several indexed projects contain a file with the same relative path, pass projectName to select the one you want:

{
  "name": "file_context",
  "arguments": {
    "filePath": "src/Models/User.swift",
    "projectName": "ios-app"
  }
}

Tool: find_related

Find files that import, depend on, or are related to a file.

Parameters:

filePath (required): Path to file (relative to project root)
direction (optional): "imports", "imports_from", or "both" (default: "both")

Example:

{
  "name": "find_related",
  "arguments": {
    "filePath": "Sources/Core/Database.swift",
    "direction": "both"
  }
}
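
The README does not define the three direction values precisely. The Python sketch below shows one plausible reading, stated as an assumption: "imports" returns the files the given file imports, "imports_from" returns the files that import it, and "both" returns the union. The graph shape (a mapping from each file to the set of files it imports) and this find_related function are illustrative only.

```python
from typing import Dict, List, Set

VALID_DIRECTIONS = ("imports", "imports_from", "both")


def find_related(
    graph: Dict[str, Set[str]],
    file_path: str,
    direction: str = "both",
) -> List[str]:
    """Look up neighbours of file_path in a file-level import graph."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    related: Set[str] = set()
    if direction in ("imports", "both"):
        # Assumed meaning: files that file_path itself imports.
        related |= graph.get(file_path, set())
    if direction in ("imports_from", "both"):
        # Assumed meaning: files whose import set contains file_path.
        related |= {src for src, targets in graph.items() if file_path in targets}
    related.discard(file_path)
    return sorted(related)
```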

Tool: index_status

Get metadata and statistics about indexed projects.

Parameters: None

Example:

{
  "name": "index_status"
}

Architecture

Services

ProjectIndexer: Crawls directories and extracts code chunks
EmbeddingService: Generates and caches 384-d BERT embeddings (via Python bridge)
VectorSearchService: Performs cosine similarity search on vector embeddings
CodeMetadataExtractor: Builds dependency graphs and extracts metadata
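
To make the VectorSearchService description concrete, here is a minimal pure-Python sketch of cosine-similarity ranking by linear scan. It is not the project's Swift implementation; the function names and the dictionary-based index are assumptions for illustration, and real embeddings would have 384 components rather than the handful used in a toy example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every chunk against the query and return the k best matches."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Every query touches every stored vector, so the cost grows linearly with the number of indexed chunks.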

Models

CodeChunk: Represents indexed code with location and embedding
SearchResult: Unified search result type
ProjectMetadata: Project information and statistics
DependencyGraph: Inter-file dependency relationships

Storage

Index data stored in ~/.cache/code-search-mcp/:

~/.cache/code-search-mcp/
├── embeddings/          # Cached BERT embeddings (one file per unique text hash)
└── dependencies/        # Dependency graphs (one file per project)
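
The "one file per unique text hash" layout can be illustrated with a short Python sketch. The hash function (SHA-256) and the .json extension are assumptions chosen for the example; the README does not state which hash or file format the server uses, and the helper name cache_path_for is hypothetical.

```python
import hashlib
from pathlib import Path

# Directory taken from the layout above.
CACHE_ROOT = Path.home() / ".cache" / "code-search-mcp" / "embeddings"


def cache_path_for(text: str, root: Path = CACHE_ROOT) -> Path:
    """Map a chunk of text to a cache file named after its content hash."""
    # Assumed: SHA-256 of the UTF-8 bytes, stored with a .json suffix.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return root / f"{digest}.json"
```

Keying on the text itself means identical chunks in different files or projects share one cached embedding, and any edit to a chunk produces a new key.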

Development

Building

swift build

Testing

swift test

Running with Debug Logging

swift run code-search-mcp --log-level debug

Code Formatting

# Check formatting
swift format lint -s -p -r Sources Tests Package.swift

# Auto-fix formatting
swift format format -p -r -i Sources Tests Package.swift

Implementation Roadmap

Current state: Scaffold complete with services and models defined

Phase 1: Core Infrastructure (In Progress)

Phase 2: Search Capabilities

Phase 3: Optimization

Batch embedding generation
Index compression
Incremental indexing
Performance benchmarking

Phase 4: Enterprise Features

Project-level access control
Search result caching
Custom embedding models
Search analytics

Troubleshooting

Cache Issues

Clear embedding cache:

rm -rf ~/.cache/code-search-mcp/embeddings

Logging

Enable debug logging to troubleshoot:

swift run code-search-mcp --log-level debug

Check Claude Desktop logs:

macOS: ~/Library/Logs/Claude/mcp-server-code-search-mcp.log

Index Not Updating

Rebuild the index:

rm -rf ~/.cache/code-search-mcp
# Re-index projects by restarting Claude

Performance Notes

First embedding generation takes longer (BERT model loading)
Subsequent queries are fast due to embedding caching
Large projects (10k+ files) may take 1-2 minutes to index
Vector search is O(n) but fast for typical project sizes

Limitations

Dependency tracking works for explicit imports only (implicit dependencies not tracked)
No support for cross-language dependency tracking
Vector search is O(n) - performance scales linearly with index size

License

MIT

Contributing

Contributions welcome! Please ensure:

Swift 6 strict concurrency compliance
All types conform to Sendable
Comprehensive error handling
Swift Testing framework for tests
Code formatted with swift-format

Support

For issues or questions:

Check the troubleshooting section
Enable debug logging and check logs
Open an issue on GitHub with:
- Log output (with debug logging enabled)
- Steps to reproduce
- Expected vs actual behavior