🔍 MCP Codebase Analyser
An intelligent Model Context Protocol (MCP) server that analyzes GitHub repositories and provides AI-powered recommendations for finding the right code sections for your features.
🎯 What Does It Do?
This MCP server allows you to ask questions like:
"I need to create a product fetch popup modal. Find the best section from jolly-commerce/jolly-sections."
The server will:
- Fetch & analyze the repository (clones and caches locally)
- Parse source code files (JS/TS only, skips minified bundles)
- Extract functions, classes, and components with full code
- Index sections using semantic embeddings (OpenAI or Ollama)
- Search semantically for relevant code
- Provide AI recommendations with reasoning and implementation advice
✨ Features
- 🔎 Semantic Code Search - Find code by meaning, not just keywords (relevance scores around 0.7)
- 🤖 AI-Powered Recommendations - Get intelligent suggestions with high confidence
- 📁 Smart Filtering - Automatically skips minified files (.min.js, .chunk.js, bundles)
- 💻 JavaScript/TypeScript Focus - Optimized for modern web development
- 💾 Persistent Cache - Repositories and embeddings cached for instant repeated queries
- 🐳 Docker Ready - Production-ready containerized deployment
- 🔌 MCP Compatible - Works seamlessly with Cursor IDE and other MCP clients
- ⚡ Production Tested - Successfully indexes jolly-commerce/jolly-sections (152 sections, 100% success rate)
🚀 Quick Start
Prerequisites: Docker Desktop, OpenAI API Key
1. Setup Environment
cp env.example .env
# Edit .env and add your OpenAI API key:
# LLM_API_KEY=sk-your-key-here
2. Start Server
docker-compose up -d
docker-compose logs -f # Wait for "All components initialized"
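To confirm the server is reachable before wiring up Cursor, you can open the SSE endpoint directly (it should hold the connection open and stream events):
curl -N http://localhost:8050/sse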
3. Configure Cursor
Add to Cursor MCP settings (Cmd/Ctrl + , → search "MCP"):
{
  "mcpServers": {
    "codebase-analyser": {
      "url": "http://localhost:8050/sse",
      "transport": {"type": "sse"}
    }
  }
}
Restart Cursor.
4. Test It
In Cursor, try:
Analyze the repository jolly-commerce/jolly-sections
Wait 2-3 minutes, then:
I need to create a product fetch popup modal. Find the best section.
✅ Expected: FetchModalProductPopUp class with code snippet
Alternative: Local Development (without Docker)
pip install uv
uv pip install -e .
# Edit .env with your API key
uv run src/main.py
📖 Usage Examples
1. Analyze a Repository
First, analyze the repository you want to search:
User: Analyze the repository jolly-commerce/jolly-sections
AI uses: analyze_repository("jolly-commerce/jolly-sections")
Result:
{
  "success": true,
  "message": "Successfully analyzed jolly-commerce/jolly-sections",
  "indexing": {
    "indexed": 152,
    "errors": 0,
    "total": 152
  },
  "files_analyzed": 43,
  "section_types": {
    "functions": 37,
    "classes": 74,
    "react_component": 32
  }
}
2. Find the Best Section for Your Feature
User: I need to create a product fetch popup modal.
Find the best section from jolly-commerce/jolly-sections.
AI uses: find_best_section(
  feature_description="product fetch popup modal",
  repo_identifier="jolly-commerce/jolly-sections"
)
Real Response:
{
  "found_match": true,
  "best_match": "src/templates/components/g-modal.js",
  "best_match_name": "FetchModalProductPopUp",
  "confidence": "high",
  "score": 0.69,
  "reasoning": "The FetchModalProductPopUp class is specifically designed to handle product fetching in a modal context, which aligns perfectly with the user's request...",
  "usage_advice": "To use this section, instantiate the FetchModalProductPopUp class and call the onTrigger method with the appropriate event and trigger element...",
  "alternatives": ["DynamicModal", "GModal"],
  "code_snippet": "class FetchModalProductPopUp extends FetchModal { ... }"
}
3. Search for Specific Code Patterns
User: Find cart drawer components in jolly-commerce/jolly-sections
AI uses: search_code_sections(
  query="cart drawer shopping cart sidebar",
  repo_identifier="jolly-commerce/jolly-sections",
  limit=10
)
Real Response:
{
  "success": true,
  "results_count": 3,
  "results": [
    {
      "file": "src/templates/components/g-cart.js",
      "name": "CartDrawer",
      "type": "class",
      "score": 0.72,
      "code": "class CartDrawer extends Cart { ... }"
    },
    {
      "file": "src/templates/components/g-cart.js",
      "name": "CartDrawerItem",
      "type": "class",
      "score": 0.70
    }
  ]
}
4. Get Repository Structure
User: What sections are available in jolly-sections?
Tool: get_repository_structure("jolly-commerce/jolly-sections")
5. List All Analyzed Repositories
User: What repositories have been analyzed?
Tool: list_analyzed_repositories()
🛠️ Available MCP Tools
| Tool | Description | Example |
|---|---|---|
| analyze_repository | Index a GitHub repository | analyze_repository("owner/repo") |
| find_best_section | Get an AI recommendation for a feature | find_best_section("hero section", "owner/repo") |
| search_code_sections | Semantic code search | search_code_sections("button component", "owner/repo") |
| get_repository_structure | View indexed sections | get_repository_structure("owner/repo") |
| list_analyzed_repositories | List all indexed repos | list_analyzed_repositories() |
| clear_repository_cache | Clear cached data | clear_repository_cache("owner/repo") |
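Under the hood, each tool is a function registered on a FastMCP server. A minimal sketch of how analyze_repository could be wired up (the body here is illustrative; the real tools live in src/main.py):
# Sketch only - see src/main.py for the actual implementation.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("codebase-analyser")

@mcp.tool()
async def analyze_repository(repo_identifier: str) -> dict:
    """Clone (or reuse the cached copy of) a repo, parse its sections, and index embeddings."""
    # The real implementation delegates to the fetcher, parser, and embedding
    # system shown in the architecture diagram below.
    return {"success": True, "message": f"Successfully analyzed {repo_identifier}"}

if __name__ == "__main__":
    mcp.run(transport="sse")  # or "stdio", per the TRANSPORT env var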
🏗️ Architecture
┌─────────────────────────────────────────────────────────────┐
│ Cursor IDE │
│ (MCP Client) │
└────────────────────────┬────────────────────────────────────┘
│ MCP Protocol (SSE/STDIO)
↓
┌─────────────────────────────────────────────────────────────┐
│ MCP Server (FastMCP) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Repository Fetcher │ │
│ │ • Clone GitHub repos │ │
│ │ • Cache locally │ │
│ │ • Handle authentication │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Code Parser │ │
│ │ • Parse JS/TS/Python/etc. │ │
│ │ • Extract functions/classes/components │ │
│ │ • Generate metadata │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Embedding System │ │
│ │ • Generate embeddings (OpenAI/Ollama) │ │
│ │ • ChromaDB vector database │ │
│ │ • Semantic search │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Recommendation System │ │
│ │ • LLM analysis (GPT-4/Ollama) │ │
│ │ • Context-aware suggestions │ │
│ │ • Implementation advice │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│ │ │
↓ ↓ ↓
repo_cache/ chroma_db/ GitHub API
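The indexing and search core boils down to a standard embed-and-query loop over ChromaDB. A minimal sketch of the idea, assuming the OpenAI provider (collection and variable names are illustrative, not the project's actual API):
import os
import chromadb
from openai import OpenAI

openai_client = OpenAI(api_key=os.environ["LLM_API_KEY"])
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("code_sections")

def embed(text: str) -> list[float]:
    resp = openai_client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

# Indexing: one vector per extracted section, with code and metadata stored alongside.
code = "class FetchModalProductPopUp extends FetchModal { ... }"
collection.add(
    ids=["g-modal.js::FetchModalProductPopUp"],
    embeddings=[embed(code)],
    documents=[code],
    metadatas=[{"file": "src/templates/components/g-modal.js", "type": "class"}],
)

# Search: embed the query and take the nearest sections.
hits = collection.query(query_embeddings=[embed("product fetch popup modal")], n_results=5)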
📁 Project Structure
mcp-codebase-analyser/
├── src/
│ ├── main.py # MCP server & tools
│ ├── repo_fetcher.py # GitHub repository fetching
│ ├── code_parser.py # Code parsing & extraction
│ ├── embedding_system.py # Vector embeddings & search
│ ├── recommendation_system.py # AI recommendations
│ └── utils.py # Utility functions
├── repo_cache/ # Cached repositories
├── chroma_db/ # Vector database
├── Dockerfile # Docker image
├── docker-compose.yml # Docker compose config
├── pyproject.toml # Python dependencies
├── env.example # Environment variables template
└── README.md # This file
🔑 Environment Variables
| Variable | Description | Default | Required |
|---|---|---|---|
| HOST | Server host | 0.0.0.0 | No |
| PORT | Server port | 8050 | No |
| TRANSPORT | Transport type (sse or stdio) | sse | No |
| LLM_PROVIDER | LLM provider (openai, ollama) | openai | Yes |
| LLM_API_KEY | API key for LLM | - | Yes (OpenAI) |
| LLM_MODEL | Model name | gpt-4o-mini | No |
| EMBEDDING_MODEL | Embedding model | text-embedding-3-small | No |
| GITHUB_TOKEN | GitHub personal access token | - | No |
| REPO_CACHE_DIR | Repository cache directory | ./repo_cache | No |
| CHROMA_PERSIST_DIR | ChromaDB directory | ./chroma_db | No |
| MAX_FILE_SIZE_KB | Max file size to parse | 500 | No |
| SUPPORTED_EXTENSIONS | File extensions to parse | .js,.jsx,.ts,.tsx | No |
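A sample .env for the default OpenAI setup, matching the table above (only LLM_API_KEY is strictly required):
LLM_PROVIDER=openai
LLM_API_KEY=sk-your-key-here
LLM_MODEL=gpt-4o-mini
EMBEDDING_MODEL=text-embedding-3-small
TRANSPORT=sse
PORT=8050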
Note: The system automatically skips:
- Minified files (.min.js)
- Bundle files (.chunk.js, bundle-*.js)
- Compiled code in assets directories
This ensures only source code is indexed, for better-quality results.
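The filter amounts to a simple filename check applied before parsing. A sketch of the heuristic, mirroring the rules above (names here are illustrative; the actual rules live in the parser code):
from pathlib import Path

SKIP_SUFFIXES = (".min.js", ".chunk.js")  # assumed rule set, mirroring the list above

def should_index(path: Path, max_size_kb: int = 500) -> bool:
    if path.name.endswith(SKIP_SUFFIXES):
        return False
    if path.name.startswith("bundle-"):
        return False
    if "assets" in path.parts:  # compiled output directories
        return False
    return path.stat().st_size <= max_size_kb * 1024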
🧪 Testing
Test the server with the production-tested repository:
# 1. Start the server
docker-compose up -d
# 2. Check logs for success messages
docker-compose logs -f
# Look for: "Initialized embedding system with openai provider"
# Look for: "All components initialized successfully"
# 3. In Cursor, try these verified examples:
Test 1: Analyze Repository (2-3 minutes)
Analyze the repository jolly-commerce/jolly-sections
Expected: 152 sections indexed, 0 errors
Test 2: Find Specific Component (10-15 seconds)
I need to create a product fetch popup modal. Find the best section.
Expected: FetchModalProductPopUp class with high confidence (0.69+)
Test 3: Search Components (5-10 seconds)
Find cart drawer components in jolly-commerce/jolly-sections
Expected: CartDrawer, CartDrawerItem with relevance scores
🐛 Troubleshooting
Server won't start
- Check that Docker Desktop is running
- Verify the .env file exists and has correct values
- Check the logs: docker-compose logs -f
Can't clone private repositories
- Add GITHUB_TOKEN to .env
- Ensure the token has the repo scope
Out of memory / slow indexing
- Reduce MAX_FILE_SIZE_KB in .env
- Limit SUPPORTED_EXTENSIONS to only the types you need
- Increase Docker memory limits
Embeddings API errors
- Verify LLM_API_KEY is correct
- Check that your OpenAI account has credits
- Try Ollama for local embeddings
🔄 Using with Ollama (Local LLM)
For a completely local setup:
1. Install Ollama: https://ollama.ai
2. Pull the models:
ollama pull qwen2.5-coder:7b
ollama pull nomic-embed-text
3. Update .env:
LLM_PROVIDER=ollama
LLM_MODEL=qwen2.5-coder:7b
LLM_BASE_URL=http://host.docker.internal:11434
EMBEDDING_MODEL=nomic-embed-text
4. Restart the server:
docker-compose restart
📝 Development
Adding New File Type Support
Edit src/code_parser.py and add patterns for your language:
PATTERNS = {
    "your_language": [
        r"pattern_to_match_functions",
        r"pattern_to_match_classes",
    ]
}
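For example, a hypothetical entry for Python, assuming the parser applies each regex line by line and takes the section name from capture group 1:
PATTERNS = {
    "python": [
        r"^\s*(?:async\s+)?def\s+(\w+)\s*\(",  # function definitions
        r"^\s*class\s+(\w+)\s*[(:]",           # class definitions
    ]
}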
Customizing Recommendations
Edit the system prompt in src/recommendation_system.py:
def _get_system_prompt(self) -> str:
    return """Your custom prompt here..."""
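For context, a sketch of how that prompt might feed the LLM call, assuming the OpenAI provider (the actual call lives in src/recommendation_system.py):
# Sketch only - illustrates where the system prompt is consumed.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["LLM_API_KEY"])

def recommend(system_prompt: str, feature: str, candidate_sections: str) -> str:
    resp = client.chat.completions.create(
        model=os.getenv("LLM_MODEL", "gpt-4o-mini"),
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Feature: {feature}\n\nCandidates:\n{candidate_sections}"},
        ],
    )
    return resp.choices[0].message.content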
🤝 Contributing
Contributions welcome! Please feel free to submit issues and pull requests.
📊 Performance Stats
Production Tested on jolly-commerce/jolly-sections:
- ✅ 152 sections indexed (100% success rate)
- ✅ 0 errors during indexing
- ✅ 43 source files analyzed
- ✅ Average search score: 0.7+ (high relevance)
- ✅ No minified files indexed (smart filtering)
- ✅ Query time: 5-15 seconds
- ✅ Initial indexing: 2-3 minutes
Code Quality:
- Only source files indexed (src/templates/)
- Full code content extracted (not just names)
- Semantic search with high accuracy
- AI recommendations with reasoning
📄 License
MIT License - see LICENSE file for details
🙏 Acknowledgments
- Built with FastMCP
- Powered by ChromaDB
- Uses OpenAI Embeddings or Ollama
- Production tested on jolly-commerce/jolly-sections
📞 Support
For issues and questions, please open a GitHub issue.
Happy Coding! 🚀