Codex7

Shawnzheng011019/Codex7


Code Retrieval System - MCP Server

A code retrieval system built in Python that combines vector similarity search with graph-based relationship analysis for intelligent code search, implemented as a Model Context Protocol (MCP) server.

🌟 Features

  • πŸ” Local Codebase Analysis: Scan and index your local projects for intelligent search
  • πŸ“š Multi-Content Support: Analyze code, documentation, configuration files, and more
  • ⚑ Semantic Search: Advanced hybrid search combining vector similarity and BM25
  • πŸ€– MCP Integration: Model Context Protocol server for seamless AI IDE integration
  • 🎯 Multi-language Support: AST Splitter for JavaScript, TypeScript, Python, Go, Rust, Java, C++, and more
  • πŸ“Š Knowledge Graph: Build code dependency graphs for impact analysis
  • πŸ—„οΈ Milvus Vector Database: High-performance vector similarity search
  • πŸ•ΈοΈ Neo4j Graph Database: Rich relationship modeling and querying

πŸ—οΈ Architecture

The system is built entirely in Python with a clean, modular architecture:

Core Components

  • Scanner: Local codebase file system scanner (src/scanner/)
  • Processor: Content chunking and embedding generation (src/processor/)
  • Vector Database: Milvus client for fast similarity search (src/query/)
  • Graph Database: Neo4j client for code relationships (src/graph/)
  • Embedding Service: OpenAI embedding generation optimized for Milvus (src/embedding/)
  • Search Engine: Hybrid search with BM25 and reranking (src/search/)
  • MCP Server: FastMCP-based server for AI tool integration (src/mcp/)

Processing Pipeline

  1. Scan → Discover and categorize local project files
  2. Extract → Parse code and documentation content
  3. Chunk → Intelligent text segmentation with context preservation
  4. Embed → Generate semantic embeddings using OpenAI
  5. Index → Store in Milvus vector database and Neo4j graph database
  6. Search → Hybrid search with reranking for optimal results
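The chunking stage can be sketched in miniature. The line-based splitter below is illustrative only (the real system uses AST-aware splitting per language), and the chunk size and overlap values are assumptions, not the project's defaults:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    file_path: str
    text: str
    start_line: int

def chunk_file(path: str, text: str, max_lines: int = 50, overlap: int = 5) -> list[Chunk]:
    """Split a file into overlapping line windows so context is preserved
    across chunk boundaries. Illustrative stand-in for AST-aware splitting."""
    lines = text.splitlines()
    chunks = []
    step = max_lines - overlap
    for start in range(0, max(len(lines), 1), step):
        window = lines[start:start + max_lines]
        if not window:
            break
        chunks.append(Chunk(path, "\n".join(window), start + 1))
        if start + max_lines >= len(lines):
            break
    return chunks
```

Each chunk would then be embedded and written to Milvus, with its structural relationships recorded in Neo4j.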

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Docker (for Milvus and Neo4j)
  • OpenAI API key for embeddings

Installation

  1. Clone the repository:
git clone <repository-url>
cd code-retrieval-system
  2. Install dependencies:
pip install -r requirements.txt
  3. Set up environment variables:
cp .env.example .env
# Edit .env with your OpenAI API key and database settings
  4. Start databases with Docker:
docker-compose up -d
  5. Run the MCP server:
# Using stdio transport (default)
python main.py --stdio

# Using SSE transport
python main.py --sse --port 8000

Configuration

Edit the .env file to configure:

  • Database Settings: Milvus and Neo4j connection parameters
  • OpenAI Settings: API key and embedding model
  • Search Parameters: BM25 weights, reranking thresholds
  • File Processing: Chunk sizes, supported extensions

📖 MCP Tools

The system provides the following MCP tools:

Core Tools

  • index_codebase - Index a codebase for search
  • search_code - Hybrid search with reranking
  • search_in_file - Search within specific files
  • clear_database - Clear all data from databases

Graph Analysis Tools

  • get_function_dependencies - Get function dependency graph
  • get_class_hierarchy - Get class inheritance hierarchy
  • get_file_structure - Get file structure analysis

System Tools

  • get_system_stats - Get system statistics and health

Example Usage

Index a Codebase
# Using MCP client
result = await client.call_tool("index_codebase", {
    "root_path": "/path/to/your/code",
    "max_workers": 4
})
Search Code
# Using MCP client
result = await client.call_tool("search_code", {
    "query": "how to implement authentication",
    "top_k": 10,
    "use_graph": True,
    "use_reranking": True
})
Get Function Dependencies
# Using MCP client
result = await client.call_tool("get_function_dependencies", {
    "function_name": "user.authenticate"
})
Search in File
# Using MCP client
result = await client.call_tool("search_in_file", {
    "file_path": "src/main.py",
    "query": "database connection"
})

🔧 Advanced Configuration

OpenAI Embeddings

# Required for vector search
OPENAI_API_KEY=your_openai_api_key
OPENAI_MODEL=text-embedding-ada-002

Search Parameters

  • BM25_K1: BM25 parameter for term frequency saturation (default: 1.2)
  • BM25_B: BM25 parameter for document length normalization (default: 0.75)
  • TOP_K_RESULTS: Number of results to return (default: 10)
  • RERANK_THRESHOLD: Threshold for graph-based reranking (default: 0.5)
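The effect of BM25_K1 and BM25_B can be seen in the standard Okapi BM25 term formula, sketched here with one common IDF variant (the system's exact scoring function may differ):

```python
import math

def bm25_term_score(tf: int, df: int, n_docs: int, doc_len: int,
                    avg_doc_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 contribution of a single query term.

    k1 controls term-frequency saturation (BM25_K1); b controls
    document-length normalization (BM25_B).
    """
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * norm)
```

With k1 = 1.2, repeating a term yields diminishing returns, and with b = 0.75, matches in longer-than-average documents are discounted.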

πŸ› οΈ Development

Project Structure

code-retrieval-system/
├── src/
│   ├── config.py              # Configuration management
│   ├── types.py               # Data models and types
│   ├── scanner/               # File system scanning
│   ├── processor/             # Content processing and chunking
│   ├── query/                 # Vector database (Milvus)
│   ├── graph/                 # Graph database (Neo4j)
│   ├── embedding/             # OpenAI embedding service
│   ├── search/                # Search and reranking
│   ├── mcp/                   # FastMCP server
│   └── utils/                 # Utilities and logging
├── main.py                    # MCP server entry point
├── requirements.txt           # Python dependencies
├── docker-compose.yml         # Database setup
├── .env.example               # Environment template
└── README.md                  # This file

Testing

Run tests with:

pytest tests/

Code Quality

# Format code
black src/

# Lint code
flake8 src/

# Type checking
mypy src/

📊 Performance

The system is designed for performance with:

  • Parallel Processing: Multi-threaded file scanning and processing
  • Efficient Indexing: Optimized chunking and OpenAI embedding generation
  • Fast Search: Milvus vector similarity search with BM25 fallback
  • Graph Acceleration: Neo4j for relationship queries
  • Caching: Intelligent caching for frequently accessed data
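One way such caching can work is content-addressed: hash each chunk's text so re-indexing unchanged files skips repeat embedding calls. This sketch is an assumption about the caching layer, not its actual implementation:

```python
import hashlib

class EmbeddingCache:
    """Content-addressed embedding cache: identical chunk text is embedded
    once. Illustrative sketch; the system's real caching may differ."""

    def __init__(self, embed_fn):
        self._embed = embed_fn      # e.g. a call to the OpenAI embedding API
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._embed(text)
        return self._store[key]
```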

πŸ” Search Flow

The search process proceeds in six stages:

  1. Vector Search: Semantic similarity using OpenAI embeddings in Milvus
  2. BM25 Search: Keyword-based exact matching
  3. Hybrid Combination: Weighted combination of both approaches
  4. Graph Enhancement: Enrich results with code relationships
  5. Reranking: Reorder results based on graph context
  6. Final Ranking: Produce optimally ordered results
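Stages 3 through 6 can be sketched as score fusion plus a graph-aware boost. The min-max normalization, the fusion weight alpha, and the flat boost factor here are illustrative assumptions, not the system's actual defaults:

```python
def hybrid_rank(vec_scores: dict, bm25_scores: dict,
                related: set, alpha: float = 0.6, boost: float = 1.2) -> list:
    """Fuse min-max-normalized vector and BM25 scores, then boost chunks the
    graph marks as related, and return chunk ids in final rank order."""
    def norm(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    v, b = norm(vec_scores), norm(bm25_scores)
    fused = {k: alpha * v.get(k, 0.0) + (1 - alpha) * b.get(k, 0.0)
             for k in set(v) | set(b)}
    reranked = {k: s * (boost if k in related else 1.0) for k, s in fused.items()}
    return sorted(reranked, key=reranked.get, reverse=True)
```

Graph enhancement can change the final order: a chunk that scores slightly lower on text alone may outrank others once its code relationships are taken into account.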

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Milvus for vector database capabilities
  • Neo4j for graph database functionality
  • OpenAI for embedding services
  • FastMCP for MCP server framework
  • The open-source community for various tools and libraries