mat1312/mcp-server-datapizza
DataPizza MCP Server 🍕
A Model Context Protocol (MCP) server that provides intelligent access to datapizza-ai documentation through vector similarity search and retrieval-augmented generation.
Overview
This MCP server lets AI assistants and applications query the datapizza-ai documentation in natural language. It indexes documentation from the datapizza-ai repository and returns contextual, relevant responses through a RAG (Retrieval-Augmented Generation) pipeline.
Features
- Intelligent Documentation Search: Natural language queries across datapizza-ai documentation
- Vector-Based Retrieval: Uses OpenAI embeddings and Qdrant vector database for semantic search
- MCP Protocol Compliance: Standard Model Context Protocol implementation for broad compatibility
- Automatic Indexing: Downloads and indexes documentation from GitHub automatically
- Cloud-Ready: Supports Qdrant Cloud for scalable vector storage
- Configurable: Environment-based configuration for flexible deployment
Architecture
The server consists of four main components:
- MCP Server: FastMCP-based server exposing the query_datapizza tool
- Indexer: Downloads and processes datapizza-ai documentation into searchable chunks
- Retriever: RAG engine for semantic search and response generation
- Configuration: Environment-based settings management with validation
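To make the Retriever's role concrete, here is a minimal sketch of vector similarity search using only the standard library. It is an in-memory stand-in, not the project's code: the real server delegates this step to Qdrant, and the choice of cosine similarity, the function names `cosine_similarity` and `top_k`, and the toy 3-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=5):
    """Return the k (score, text) pairs whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings" (assumption); real vectors would come from the embedding model.
TOY_INDEX = [
    ("How to create an agent", [0.9, 0.1, 0.0]),
    ("Configuring a vector store", [0.1, 0.9, 0.0]),
    ("Release notes", [0.0, 0.1, 0.9]),
]

if __name__ == "__main__":
    for score, text in top_k([1.0, 0.0, 0.0], TOY_INDEX, k=2):
        print(f"{score:.3f}  {text}")
```

In a RAG pipeline, the texts returned by a search like this are what gets passed on as context for response generation.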
Prerequisites
- Python 3.10 or higher
- OpenAI API key
- Qdrant Cloud account and API key
- Internet connection for documentation indexing
Installation
- Clone the repository:
git clone https://github.com/datapizza-labs/mcp_server_datapizza.git
cd mcp_server_datapizza
- Navigate to the package directory:
cd datapizza-mcp-server
- Install the package with development dependencies:
pip install -e ".[dev]"
Configuration
Create a .env file in the datapizza-mcp-server directory with the following variables:
# Required Configuration
OPENAI_API_KEY=your_openai_api_key_here
QDRANT_URL=your_qdrant_cloud_url
QDRANT_API_KEY=your_qdrant_api_key
# Optional Configuration
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
COLLECTION_NAME=datapizza_docs
MAX_RESULTS=5
CHUNK_SIZE=1024
CHUNK_OVERLAP=200
LOG_LEVEL=INFO
Required Environment Variables
| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key for generating embeddings |
| QDRANT_URL | Qdrant Cloud instance URL |
| QDRANT_API_KEY | Qdrant Cloud API key |
Optional Environment Variables
| Variable | Default | Description |
|---|---|---|
| EMBEDDING_MODEL | text-embedding-3-small | OpenAI embedding model |
| EMBEDDING_DIMENSIONS | 1536 | Embedding vector dimensions |
| COLLECTION_NAME | datapizza_docs | Qdrant collection name |
| MAX_RESULTS | 5 | Maximum search results returned |
| CHUNK_SIZE | 1024 | Document chunk size for indexing |
| CHUNK_OVERLAP | 200 | Overlap between document chunks |
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
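As an illustration of environment-based configuration with validation, the following sketch reads the variables listed above, applies the documented defaults, and fails fast when a required value is missing. The `Settings` class, the `load_settings` function, and the specific sanity checks are hypothetical; the project's own config.py may be structured differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")
VALID_LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; fields mirror the documented variables."""
    openai_api_key: str
    qdrant_url: str
    qdrant_api_key: str
    embedding_model: str
    embedding_dimensions: int
    collection_name: str
    max_results: int
    chunk_size: int
    chunk_overlap: int
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default), raising ValueError on bad input."""
    env = os.environ if env is None else env

    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing required environment variables: " + ", ".join(missing))

    settings = Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        qdrant_url=env["QDRANT_URL"],
        qdrant_api_key=env["QDRANT_API_KEY"],
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        embedding_dimensions=int(env.get("EMBEDDING_DIMENSIONS", "1536")),
        collection_name=env.get("COLLECTION_NAME", "datapizza_docs"),
        max_results=int(env.get("MAX_RESULTS", "5")),
        chunk_size=int(env.get("CHUNK_SIZE", "1024")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )

    # Assumed sanity checks; the real validation rules are not documented here.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    if settings.log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {VALID_LOG_LEVELS}")
    return settings
```

Passing a plain dict instead of relying on `os.environ` keeps a loader like this easy to test without touching the real environment.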
Usage
1. Index Documentation
Before using the server, index the datapizza-ai documentation:
python -m datapizza_mcp.indexer
To force re-indexing (clears existing data):
python -m datapizza_mcp.indexer --force
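To show what CHUNK_SIZE and CHUNK_OVERLAP control during indexing, here is a sketch of a sliding-window splitter. It assumes both values are measured in characters and uses a hypothetical `chunk_text` function; the actual indexer may count tokens or split on document structure instead.

```python
def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2000))
    for i, chunk in enumerate(chunk_text(sample)):
        print(i, len(chunk))
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing some text twice.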
2. Start the MCP Server
python -m datapizza_mcp.server
Or use the provided Windows batch script:
../run_datapizza.bat
3. Query the Documentation
The server exposes a query_datapizza tool that can be called by MCP clients:
# Example query (assumes `client` is an already-connected MCP client session)
result = await client.call_tool("query_datapizza", {
    "query": "how to create an agent with OpenAI",
    "max_results": 5
})
MCP Tools and Resources
Tools
- query_datapizza: Search datapizza-ai documentation
  - query (string): Natural language search query
  - max_results (int, optional): Maximum number of results (default: 5)
Resources
- datapizza://status: System status and configuration information
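Under the hood, MCP clients reach tools and resources through JSON-RPC 2.0 messages: `tools/call` for a tool and `resources/read` for a resource. The sketch below builds the two request payloads for this server using only the standard library. The helper names are hypothetical, and a real session also needs the MCP initialization handshake, which client SDKs normally handle for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per session


def jsonrpc_request(method: str, params: dict) -> dict:
    """Wrap a method and its params in a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}


def query_datapizza_request(query: str, max_results: int = 5) -> dict:
    """Request payload for calling the query_datapizza tool."""
    return jsonrpc_request(
        "tools/call",
        {"name": "query_datapizza", "arguments": {"query": query, "max_results": max_results}},
    )


def status_request() -> dict:
    """Request payload for reading the datapizza://status resource."""
    return jsonrpc_request("resources/read", {"uri": "datapizza://status"})


if __name__ == "__main__":
    print(json.dumps(query_datapizza_request("how to create an agent"), indent=2))
    print(json.dumps(status_request(), indent=2))
```

Seeing the raw payloads can help when inspecting traffic with DEBUG logging or when testing the server with a generic JSON-RPC tool.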
Development
Code Quality Tools
# Format code
black src/
# Lint code
ruff check src/
ruff check src/ --fix # Auto-fix issues
# Type checking
mypy src/
# Run tests
pytest
Project Structure
datapizza-mcp-server/
├── src/datapizza_mcp/
│   ├── __init__.py      # Package exports
│   ├── config.py        # Configuration management
│   ├── server.py        # MCP server implementation
│   ├── indexer.py       # Documentation indexing
│   └── retriever.py     # RAG retrieval engine
├── pyproject.toml       # Package configuration
├── .env                 # Environment variables
└── README.md            # This file
Dependencies
Core Dependencies
- mcp: Model Context Protocol framework
- datapizza-ai-core: Core datapizza-ai functionality
- datapizza-ai-embedders-openai: OpenAI embedding integration
- datapizza-ai-vectorstores-qdrant: Qdrant vector store integration
- openai: OpenAI API client
- qdrant-client: Qdrant database client
- requests: HTTP client for GitHub API
- python-dotenv: Environment variable management
Development Dependencies
- pytest: Testing framework
- black: Code formatter
- ruff: Linter and code style checker
- mypy: Static type checker
Troubleshooting
Common Issues
- Authentication Errors
  - Verify OPENAI_API_KEY is set correctly
  - Check Qdrant Cloud credentials (QDRANT_URL and QDRANT_API_KEY)
- Empty Search Results
  - Ensure documentation is indexed: python -m datapizza_mcp.indexer
  - Check system status: query the datapizza://status resource
- Connection Issues
  - Verify internet connectivity for GitHub and Qdrant Cloud access
  - Check firewall settings for outbound HTTPS connections
Debugging
Enable debug logging by setting LOG_LEVEL=DEBUG in your .env file.
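For readers wondering how a LOG_LEVEL setting typically takes effect, this sketch maps the variable onto Python's standard logging module. The function names and the fallback to INFO for unrecognized values are assumptions, not the server's documented behavior.

```python
import logging
import os
from typing import Mapping, Optional

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


def resolve_log_level(raw: Optional[str], default: str = "INFO") -> int:
    """Translate a LOG_LEVEL string into a logging constant, falling back to default."""
    name = (raw or default).strip().upper()
    if name not in VALID_LEVELS:
        name = default  # assumed behavior: ignore unknown values
    return getattr(logging, name)


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Configure the root logger from LOG_LEVEL and return the numeric level used."""
    env = os.environ if env is None else env
    level = resolve_log_level(env.get("LOG_LEVEL"))
    # basicConfig only takes effect if the root logger has no handlers yet.
    logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    return level


if __name__ == "__main__":
    configure_logging({"LOG_LEVEL": "DEBUG"})
    logging.getLogger("datapizza_mcp.example").debug("debug logging is enabled")
```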
Contributing
- Fork the repository
- Create a feature branch
- Make your changes following the code style guidelines
- Run the full test suite and code quality checks
- Submit a pull request
License
This project is licensed under the MIT License. See the LICENSE file for details.
Support
For issues and questions:
- GitHub Issues: datapizza-mcp-server/issues
- DataPizza AI Documentation: datapizza-ai