# Codex7: Code Retrieval System (MCP Server)
A comprehensive code retrieval system built in Python that combines vector search with graph-based relationships for intelligent code analysis and search, implemented as a Model Context Protocol (MCP) server.
## Features

- Local Codebase Analysis: Scan and index your local projects for intelligent search
- Multi-Content Support: Analyze code, documentation, configuration files, and more
- Semantic Search: Hybrid search combining vector similarity and BM25 keyword matching
- MCP Integration: Model Context Protocol server for seamless AI IDE integration
- Multi-language Support: AST-based splitting for JavaScript, TypeScript, Python, Go, Rust, Java, C++, and more
- Knowledge Graph: Build code dependency graphs for impact analysis
- Milvus Vector Database: High-performance vector similarity search
- Neo4j Graph Database: Rich relationship modeling and querying
## Architecture

The system is built entirely in Python with a clean, modular architecture.
### Core Components

- Scanner: Local codebase file system scanner (`src/scanner/`)
- Processor: Content chunking and embedding generation (`src/processor/`)
- Vector Database: Milvus client for fast similarity search (`src/query/`)
- Graph Database: Neo4j client for code relationships (`src/graph/`)
- Embedding Service: OpenAI embedding generation optimized for Milvus (`src/embedding/`)
- Search Engine: Hybrid search with BM25 and reranking (`src/search/`)
- MCP Server: FastMCP-based server for AI tool integration (`src/mcp/`)
### Processing Pipeline

- Scan → Discover and categorize local project files
- Extract → Parse code and documentation content
- Chunk → Intelligent text segmentation with context preservation
- Embed → Generate semantic embeddings using OpenAI
- Index → Store in Milvus vector database and Neo4j graph database
- Search → Hybrid search with reranking for optimal results
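The sketch below shows roughly how these stages fit together. It is a minimal illustration only: the helper names (`chunk_text`, `embed_texts`, `vector_store`, `graph_store`) are placeholders, not the project's actual API under `src/`.

```python
# Illustrative sketch of the Scan -> Extract -> Chunk -> Embed -> Index flow.
# Helper names are placeholders for the real components under src/.
from pathlib import Path

CODE_EXTENSIONS = {".py", ".js", ".ts", ".go", ".rs", ".java", ".cpp"}

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Naive fixed-size chunking with overlap; the real processor is AST-aware."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def index_codebase(root_path: str, embed_texts, vector_store, graph_store) -> int:
    """Walk a project tree and push chunk embeddings into the two databases."""
    indexed = 0
    for path in Path(root_path).rglob("*"):              # 1. Scan
        if not path.is_file() or path.suffix not in CODE_EXTENSIONS:
            continue
        text = path.read_text(errors="ignore")           # 2. Extract
        chunks = chunk_text(text)                         # 3. Chunk
        vectors = embed_texts(chunks)                     # 4. Embed
        vector_store.insert(str(path), chunks, vectors)   # 5. Index (Milvus side)
        graph_store.upsert_file(str(path), chunks)        # 5. Index (Neo4j side)
        indexed += len(chunks)
    return indexed                                        # ready for 6. Search
```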
## Quick Start

### Prerequisites
- Python 3.10+
- Docker (for Milvus and Neo4j)
- OpenAI API key for embeddings
### Installation

- Clone the repository:

  ```bash
  git clone <repository-url>
  cd code-retrieval-system
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  cp .env.example .env
  # Edit .env with your OpenAI API key and database settings
  ```

- Start databases with Docker:

  ```bash
  docker-compose up -d
  ```

- Run the MCP server:

  ```bash
  # Using stdio transport (default)
  python main.py --stdio

  # Using SSE transport
  python main.py --sse --port 8000
  ```
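For orientation, here is a minimal sketch of how a tool such as `search_code` could be exposed through FastMCP with both transports. It assumes the `FastMCP` class from the official MCP Python SDK; the project's actual wiring in `main.py` and `src/mcp/` may differ, and the tool body here is a placeholder.

```python
# Minimal FastMCP sketch (assumed API from the official MCP Python SDK).
# The tool body is a stub; the real server delegates to the engine in src/search/.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("code-retrieval-system")

@mcp.tool()
async def search_code(query: str, top_k: int = 10,
                      use_graph: bool = True, use_reranking: bool = True) -> list[dict]:
    """Hybrid vector + BM25 search over the indexed codebase (stubbed here)."""
    return [{"query": query, "top_k": top_k, "results": []}]

if __name__ == "__main__":
    mcp.run(transport="stdio")  # use transport="sse" for the SSE variant
```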
### Configuration

Edit the `.env` file to configure:
- Database Settings: Milvus and Neo4j connection parameters
- OpenAI Settings: API key and embedding model
- Search Parameters: BM25 weights, reranking thresholds
- File Processing: Chunk sizes, supported extensions
## MCP Tools

The system provides the following MCP tools:

### Core Tools

- `index_codebase` - Index a codebase for search
- `search_code` - Hybrid search with reranking
- `search_in_file` - Search within specific files
- `clear_database` - Clear all data from databases

### Graph Analysis Tools

- `get_function_dependencies` - Get function dependency graph
- `get_class_hierarchy` - Get class inheritance hierarchy
- `get_file_structure` - Get file structure analysis

### System Tools

- `get_system_stats` - Get system statistics and health
### Example Usage

#### Index a Codebase

```python
# Using MCP client
result = await client.call_tool("index_codebase", {
    "root_path": "/path/to/your/code",
    "max_workers": 4
})
```

#### Search Code

```python
# Using MCP client
result = await client.call_tool("search_code", {
    "query": "how to implement authentication",
    "top_k": 10,
    "use_graph": True,
    "use_reranking": True
})
```

#### Get Function Dependencies

```python
# Using MCP client
result = await client.call_tool("get_function_dependencies", {
    "function_name": "user.authenticate"
})
```

#### Search in File

```python
# Using MCP client
result = await client.call_tool("search_in_file", {
    "file_path": "src/main.py",
    "query": "database connection"
})
```
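The examples above assume an already-connected `client`. A minimal way to obtain one, assuming the official MCP Python SDK's stdio client (adjust to whatever MCP client your IDE or agent framework provides):

```python
# Sketch: connect to the server over stdio and call a tool.
# Assumes the official MCP Python SDK; other client libraries will differ.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    server = StdioServerParameters(command="python", args=["main.py", "--stdio"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "search_code",
                arguments={"query": "database connection", "top_k": 5},
            )
            print(result)

asyncio.run(main())
```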
## Advanced Configuration

### OpenAI Embeddings

```env
# Required for vector search
OPENAI_API_KEY=your_openai_api_key
OPENAI_MODEL=text-embedding-ada-002
```
### Search Parameters

- `BM25_K1`: BM25 parameter for term frequency saturation (default: 1.2)
- `BM25_B`: BM25 parameter for document length normalization (default: 0.75)
- `TOP_K_RESULTS`: Number of results to return (default: 10)
- `RERANK_THRESHOLD`: Threshold for graph-based reranking (default: 0.5)
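For context, the sketch below shows where `BM25_K1` and `BM25_B` enter the standard Okapi BM25 score for a single document. It is illustrative only, not the project's exact implementation.

```python
# How k1 (term-frequency saturation) and b (length normalization) appear in BM25.
import math

def bm25_score(query_terms: list[str], doc_terms: list[str],
               doc_freqs: dict[str, int], num_docs: int, avg_doc_len: float,
               k1: float = 1.2, b: float = 0.75) -> float:
    """Score one document against a query; doc_freqs maps term -> document frequency."""
    score = 0.0
    doc_len = len(doc_terms)
    for term in set(query_terms):
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = doc_freqs.get(term, 0)
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1)
        # k1 caps how much repeated terms help; b scales the document-length penalty.
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score
```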
## Development

### Project Structure

```
code-retrieval-system/
├── src/
│   ├── config.py          # Configuration management
│   ├── types.py           # Data models and types
│   ├── scanner/           # File system scanning
│   ├── processor/         # Content processing and chunking
│   ├── query/             # Vector database (Milvus)
│   ├── graph/             # Graph database (Neo4j)
│   ├── embedding/         # OpenAI embedding service
│   ├── search/            # Search and reranking
│   ├── mcp/               # FastMCP server
│   └── utils/             # Utilities and logging
├── main.py                # MCP server entry point
├── requirements.txt       # Python dependencies
├── docker-compose.yml     # Database setup
├── .env.example           # Environment template
└── README.md              # This file
```
### Testing

Run tests with:

```bash
pytest tests/
```

### Code Quality

```bash
# Format code
black src/

# Lint code
flake8 src/

# Type checking
mypy src/
```
## Performance
The system is designed for performance with:
- Parallel Processing: Multi-threaded file scanning and processing
- Efficient Indexing: Optimized chunking and OpenAI embedding generation
- Fast Search: Milvus vector similarity search with BM25 fallback
- Graph Acceleration: Neo4j for relationship queries
- Caching: Intelligent caching for frequently accessed data
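As an illustration of the parallel-processing point above, a worker pool over the scanned files might look like the sketch below; `process_file` is a placeholder for the real extract/chunk/embed/index step, and `max_workers` mirrors the `index_codebase` tool parameter.

```python
# Sketch: multi-threaded file processing with a bounded worker pool.
# process_file is a placeholder, not the project's actual function.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def process_file(path: Path) -> int:
    """Placeholder for reading, chunking, embedding, and indexing one file."""
    return len(path.read_text(errors="ignore"))

def process_codebase(root: str, max_workers: int = 4) -> int:
    files = [p for p in Path(root).rglob("*.py") if p.is_file()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(process_file, files))
```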
## Search Flow

The search process follows these steps:
- Vector Search: Semantic similarity using OpenAI embeddings in Milvus
- BM25 Search: Keyword-based exact matching
- Hybrid Combination: Weighted combination of both approaches
- Graph Enhancement: Enrich results with code relationships
- Reranking: Reorder results based on graph context
- Final Ranking: Produce optimally ordered results
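A rough sketch of the hybrid combination and graph-based reranking steps described above; the 0.7/0.3 weighting and the 1.1 boost are illustrative values, not the project's tuned defaults.

```python
# Sketch: weighted vector + BM25 combination followed by a graph-context boost.
def hybrid_rank(vector_hits: dict[str, float], bm25_hits: dict[str, float],
                graph_neighbors: set[str], vector_weight: float = 0.7,
                rerank_threshold: float = 0.5) -> list[str]:
    """Scores are assumed normalized to [0, 1]; returns chunk ids, best first."""
    combined: dict[str, float] = {}
    for chunk_id in set(vector_hits) | set(bm25_hits):
        score = (vector_weight * vector_hits.get(chunk_id, 0.0)
                 + (1 - vector_weight) * bm25_hits.get(chunk_id, 0.0))
        # Graph enhancement: lift chunks related to already-strong candidates.
        if score >= rerank_threshold and chunk_id in graph_neighbors:
            score *= 1.1
        combined[chunk_id] = score
    return sorted(combined, key=combined.get, reverse=True)
```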
## Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- Milvus for vector database capabilities
- Neo4j for graph database functionality
- OpenAI for embedding services
- FastMCP for MCP server framework
- The open-source community for various tools and libraries