owaisnaveed00-hue/dcp-mcp-server
If you are the rightful owner of dcp-mcp-server and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
The DCP MCP Server is a Model Context Protocol server implementation with Retrieval-Augmented Generation capabilities designed for intelligent document processing and question-answering.
DCP MCP Server - RAG System
A Model Context Protocol (MCP) server implementation with Retrieval-Augmented Generation (RAG) capabilities for intelligent document processing and question-answering.
🚀 Features
- RAG System: Advanced retrieval-augmented generation for accurate document-based responses
- Document Processing: Support for multiple document formats (PDF, TXT, DOCX, etc.)
- Vector Search: Efficient semantic search using embeddings
- Context-Aware Responses: Generate responses based on retrieved document context
- MCP Protocol: Standardized Model Context Protocol implementation
- Web Interface: User-friendly web interface for document upload and querying
📋 Table of Contents
🛠 Installation
Prerequisites
- Python 3.8+
- pip or conda
- Git
Setup
-
Clone the repository
git clone https://github.com/owaisnaveed00-hue/dcp-mcp-server.git cd dcp-mcp-server -
Create virtual environment
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate -
Install dependencies
pip install -r requirements.txt -
Set up environment variables
cp .env.example .env # Edit .env with your configuration
🚀 Quick Start
-
Start the server
python app.py -
Access the web interface
- Open your browser to
http://localhost:5000 - Upload documents to build your knowledge base
- Ask questions about your documents
- Open your browser to
-
Use the API
curl -X POST http://localhost:5000/api/query \ -H "Content-Type: application/json" \ -d '{"query": "What is the main topic of the document?"}'
💡 Usage
Web Interface
-
Upload Documents
- Navigate to the upload page
- Select your documents (PDF, TXT, DOCX)
- Documents are automatically processed and indexed
-
Query Documents
- Use the search interface to ask questions
- Get context-aware responses based on your documents
- View source citations and confidence scores
API Usage
Upload Document
import requests
files = {'file': open('document.pdf', 'rb')}
response = requests.post('http://localhost:5000/api/upload', files=files)
print(response.json())
Query Documents
import requests
query = {
"query": "What are the key findings?",
"top_k": 5,
"temperature": 0.7
}
response = requests.post('http://localhost:5000/api/query', json=query)
result = response.json()
print(result['answer'])
📚 API Documentation
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/upload | Upload and process documents |
| POST | /api/query | Query the RAG system |
| GET | /api/documents | List uploaded documents |
| DELETE | /api/documents/<id> | Delete a document |
| GET | /api/health | Health check |
Request/Response Examples
Query Request
{
"query": "What is machine learning?",
"top_k": 3,
"temperature": 0.7,
"max_tokens": 500
}
Query Response
{
"answer": "Machine learning is a subset of artificial intelligence...",
"sources": [
{
"document": "ml_guide.pdf",
"page": 5,
"content": "Machine learning algorithms...",
"score": 0.95
}
],
"confidence": 0.92,
"processing_time": 1.23
}
⚙️ Configuration
Environment Variables
# Database
DATABASE_URL=sqlite:///rag_system.db
# Vector Store
VECTOR_STORE_TYPE=chroma # or faiss, pinecone
CHROMA_PERSIST_DIRECTORY=./chroma_db
# Model Configuration
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
LLM_MODEL=gpt-3.5-turbo
LLM_API_KEY=your_openai_api_key
# Server Configuration
HOST=0.0.0.0
PORT=5000
DEBUG=False
Model Options
- Embedding Models: sentence-transformers, OpenAI embeddings
- LLM Models: OpenAI GPT, Anthropic Claude, local models
- Vector Stores: Chroma, FAISS, Pinecone, Weaviate
🏗 Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Web Client │ │ API Server │ │ RAG Engine │
│ │◄──►│ │◄──►│ │
│ - Upload UI │ │ - Flask/FastAPI │ │ - Document Parser│
│ - Query UI │ │ - Authentication│ │ - Embedding Gen │
│ - Results UI │ │ - Rate Limiting │ │ - Vector Search │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Database │ │ Vector Store │
│ │ │ │
│ - Document Meta │ │ - Embeddings │
│ - User Sessions │ │ - Similarity │
│ - Query History │ │ - Indexing │
└─────────────────┘ └─────────────────┘
🔧 Development
Project Structure
dcp-mcp-server/
├── app.py # Main application entry point
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
├── src/
│ ├── rag/ # RAG system implementation
│ │ ├── __init__.py
│ │ ├── document_parser.py
│ │ ├── embedding_generator.py
│ │ ├── vector_store.py
│ │ └── query_engine.py
│ ├── api/ # API endpoints
│ │ ├── __init__.py
│ │ ├── routes.py
│ │ └── middleware.py
│ └── models/ # Data models
│ ├── __init__.py
│ ├── document.py
│ └── query.py
├── templates/ # HTML templates
├── static/ # Static files (CSS, JS)
└── tests/ # Test files
Running Tests
# Install test dependencies
pip install -r requirements-test.txt
# Run tests
pytest tests/
# Run with coverage
pytest --cov=src tests/
Code Quality
# Format code
black src/ tests/
# Lint code
flake8 src/ tests/
# Type checking
mypy src/
🤝 Contributing
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
Development Guidelines
- Follow PEP 8 style guidelines
- Write comprehensive tests for new features
- Update documentation for API changes
- Ensure backward compatibility
📊 Performance
Benchmarks
| Metric | Value |
|---|---|
| Document Processing | ~100 pages/second |
| Query Response Time | ~2-5 seconds |
| Vector Search Speed | ~1000 queries/second |
| Memory Usage | ~2GB for 10k documents |
Optimization Tips
- Use GPU acceleration for embedding generation
- Implement document chunking strategies
- Cache frequently accessed embeddings
- Use efficient vector store configurations
🐛 Troubleshooting
Common Issues
-
Out of Memory Errors
- Reduce batch size for document processing
- Use smaller embedding models
- Implement document chunking
-
Slow Query Performance
- Optimize vector store configuration
- Use approximate nearest neighbor search
- Implement result caching
-
Poor Response Quality
- Adjust top_k parameter
- Fine-tune embedding model
- Improve document preprocessing
📄 License
This project is licensed under the MIT License - see the file for details.
🙏 Acknowledgments
- LangChain for RAG framework inspiration
- Chroma for vector storage
- Sentence Transformers for embeddings
📞 Support
- 📧 Email: support@dcp-mcp-server.com
- 💬 Discord: Join our community
- 📖 Documentation: docs.dcp-mcp-server.com
- 🐛 Issues: GitHub Issues
Made with ❤️ by the DCP MCP Server team