DataPizza MCP Server 🍕

A Model Context Protocol (MCP) server that provides intelligent access to datapizza-ai documentation through vector similarity search and retrieval-augmented generation.

Overview

This MCP server enables AI assistants and applications to query the comprehensive datapizza-ai documentation using natural language queries. It indexes documentation from the datapizza-ai repository and provides contextual, relevant responses through a RAG (Retrieval-Augmented Generation) pipeline.

Features

  • Intelligent Documentation Search: Natural language queries across datapizza-ai documentation
  • Vector-Based Retrieval: Uses OpenAI embeddings and Qdrant vector database for semantic search
  • MCP Protocol Compliance: Standard Model Context Protocol implementation for broad compatibility
  • Automatic Indexing: Downloads and indexes documentation from GitHub automatically
  • Cloud-Ready: Supports Qdrant Cloud for scalable vector storage
  • Configurable: Environment-based configuration for flexible deployment

Architecture

The server consists of four main components:

  • MCP Server: FastMCP-based server exposing the query_datapizza tool
  • Indexer: Downloads and processes datapizza-ai documentation into searchable chunks
  • Retriever: RAG engine for semantic search and response generation
  • Configuration: Environment-based settings management with validation

Prerequisites

  • Python 3.10 or higher
  • OpenAI API key
  • Qdrant Cloud account and API key
  • Internet connection for documentation indexing

Installation

  1. Clone the repository:
git clone https://github.com/datapizza-labs/mcp_server_datapizza.git
cd mcp_server_datapizza
  2. Navigate to the package directory:
cd datapizza-mcp-server
  3. Install the package with development dependencies:
pip install -e ".[dev]"

Configuration

Create a .env file in the datapizza-mcp-server directory with the following variables:

# Required Configuration
OPENAI_API_KEY=your_openai_api_key_here
QDRANT_URL=your_qdrant_cloud_url
QDRANT_API_KEY=your_qdrant_api_key

# Optional Configuration
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
COLLECTION_NAME=datapizza_docs
MAX_RESULTS=5
CHUNK_SIZE=1024
CHUNK_OVERLAP=200
LOG_LEVEL=INFO

Required Environment Variables

Variable          Description
OPENAI_API_KEY    OpenAI API key for generating embeddings
QDRANT_URL        Qdrant Cloud instance URL
QDRANT_API_KEY    Qdrant Cloud API key

Optional Environment Variables

Variable              Default                  Description
EMBEDDING_MODEL       text-embedding-3-small   OpenAI embedding model
EMBEDDING_DIMENSIONS  1536                     Embedding vector dimensions
COLLECTION_NAME       datapizza_docs           Qdrant collection name
MAX_RESULTS           5                        Maximum number of search results returned
CHUNK_SIZE            1024                     Document chunk size for indexing
CHUNK_OVERLAP         200                      Overlap between consecutive document chunks
LOG_LEVEL             INFO                     Logging level (DEBUG, INFO, WARNING, ERROR)
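A sketch of how this environment-based configuration might be loaded and validated. The attribute and function names below are illustrative, not the actual `config.py` API, and it reads `os.environ` directly; the package itself uses python-dotenv to populate the environment from the .env file first.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    qdrant_url: str
    qdrant_api_key: str
    embedding_model: str
    embedding_dimensions: int
    collection_name: str
    max_results: int
    chunk_size: int
    chunk_overlap: int
    log_level: str

def load_settings(env=os.environ) -> Settings:
    """Read required variables (raising if absent) and optional ones with defaults."""
    try:
        required = {k: env[k] for k in ("OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")}
    except KeyError as missing:
        raise RuntimeError(f"Missing required environment variable: {missing}") from None
    return Settings(
        openai_api_key=required["OPENAI_API_KEY"],
        qdrant_url=required["QDRANT_URL"],
        qdrant_api_key=required["QDRANT_API_KEY"],
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        embedding_dimensions=int(env.get("EMBEDDING_DIMENSIONS", "1536")),
        collection_name=env.get("COLLECTION_NAME", "datapizza_docs"),
        max_results=int(env.get("MAX_RESULTS", "5")),
        chunk_size=int(env.get("CHUNK_SIZE", "1024")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Failing fast on missing required variables surfaces misconfiguration at startup rather than as an authentication error deep inside a query.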

Usage

1. Index Documentation

Before using the server, index the datapizza-ai documentation:

python -m datapizza_mcp.indexer

To force re-indexing (clears existing data):

python -m datapizza_mcp.indexer --force
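During indexing, documents are split into overlapping chunks governed by CHUNK_SIZE and CHUNK_OVERLAP; the overlap keeps sentences that straddle a boundary retrievable from either side. A minimal sliding-window sketch of that idea (the function name is hypothetical; the real indexer may split on tokens or markdown structure rather than raw characters):

```python
def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    # Stop once the remaining tail is already covered by the previous window's overlap.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

For example, `chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)` yields `["abcd", "defg", "ghij"]`: each chunk repeats the last character of its predecessor.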

2. Start the MCP Server

python -m datapizza_mcp.server

Or use the provided Windows batch script:

../run_datapizza.bat

3. Query the Documentation

The server exposes a query_datapizza tool that can be called by MCP clients:

# Example query (client is an initialized MCP client session)
result = await client.call_tool("query_datapizza", {
    "query": "how to create an agent with OpenAI",
    "max_results": 5
})

MCP Tools and Resources

Tools

  • query_datapizza: Search datapizza-ai documentation
    • query (string): Natural language search query
    • max_results (int, optional): Maximum number of results (default: 5)

Resources

  • datapizza://status: System status and configuration information

Development

Code Quality Tools

# Format code
black src/

# Lint code
ruff check src/
ruff check src/ --fix  # Auto-fix issues

# Type checking
mypy src/

# Run tests
pytest

Project Structure

datapizza-mcp-server/
├── src/datapizza_mcp/
│   ├── __init__.py          # Package exports
│   ├── config.py            # Configuration management
│   ├── server.py            # MCP server implementation
│   ├── indexer.py           # Documentation indexing
│   └── retriever.py         # RAG retrieval engine
├── pyproject.toml           # Package configuration
├── .env                     # Environment variables
└── README.md               # This file

Dependencies

Core Dependencies

  • mcp: Model Context Protocol framework
  • datapizza-ai-core: Core datapizza-ai functionality
  • datapizza-ai-embedders-openai: OpenAI embedding integration
  • datapizza-ai-vectorstores-qdrant: Qdrant vector store integration
  • openai: OpenAI API client
  • qdrant-client: Qdrant database client
  • requests: HTTP client for GitHub API
  • python-dotenv: Environment variable management

Development Dependencies

  • pytest: Testing framework
  • black: Code formatter
  • ruff: Linter and code style checker
  • mypy: Static type checker

Troubleshooting

Common Issues

  1. Authentication Errors

    • Verify OPENAI_API_KEY is set correctly
    • Check Qdrant Cloud credentials (QDRANT_URL and QDRANT_API_KEY)
  2. Empty Search Results

    • Ensure documentation is indexed: python -m datapizza_mcp.indexer
    • Check system status: query the datapizza://status resource
  3. Connection Issues

    • Verify internet connectivity for GitHub and Qdrant Cloud access
    • Check firewall settings for outbound HTTPS connections

Debugging

Enable debug logging by setting LOG_LEVEL=DEBUG in your .env file.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes following the code style guidelines
  4. Run the full test suite and code quality checks
  5. Submit a pull request

License

This project is licensed under the MIT License. See the LICENSE file for details.

Support

For issues and questions: