tomas-rampas/refdata-mcp
If you are the rightful owner of refdata-mcp and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.
MCP-RAG Reference Data System
A .NET-based Model Context Protocol server with Retrieval-Augmented Generation capabilities designed for reference data teams. This system provides natural language query capabilities over policies, procedures, and reference data while maintaining compliance and audit requirements.
System Requirements
- Docker & Docker Compose: Docker Compose v2 (latest version)
- RAM: Minimum 16GB (32GB recommended)
- Storage: At least 20GB free space for Docker volumes
- CPU: Multi-core processor recommended for concurrent services
Architecture Overview
The system consists of the following services:
- PostgreSQL: Shared database for Jira and Confluence
- MongoDB: Chat history and metadata storage
- Elasticsearch: Vector search engine
- Ollama: Local LLM service (Phi-3.5-mini model)
- Jira: Document source and issue tracking
- Confluence: Document source and knowledge base
Quick Start
1. Clone the Repository
git clone <repository-url>
cd refdata-mcp
2. Start All Services
# Start all Docker services in detached mode
docker compose -f docker/docker-compose.yml up -d
3. Verify Services are Running
# Check service status
docker compose -f docker/docker-compose.yml ps
# View service logs
docker compose -f docker/docker-compose.yml logs -f
4. Install Ollama Model
After the Ollama service is running, install the Phi-3.5-mini model:
# Pull the Phi-3.5-mini model
docker exec ollama_llm ollama pull phi3.5:3.8b-mini-instruct-q4_K_M
# Verify model installation
docker exec ollama_llm ollama list
5. Verify Service Endpoints
Once all services are running, verify they are accessible:
Core Services
- Ollama API: http://localhost:11434
- MongoDB: localhost:27017
- Elasticsearch: http://localhost:9200
- PostgreSQL: localhost:5432
External Applications
- Jira: http://localhost:8080 (requires initial setup)
- Confluence: http://localhost:8090 (requires initial setup)
Service Configuration
Default Credentials
Service | Username | Password | Database |
---|---|---|---|
PostgreSQL | myuser | mysecretpassword | postgres, jiradb, confluencedb |
MongoDB | myuser | mysecretpassword | - |
Elasticsearch | N/A | No authentication (dev only) | - |
Connecting to Services
PostgreSQL Connection
# Connect via Docker (recommended)
docker exec -it postgres_db psql -U myuser -d postgres
# Connect from host (requires psql client)
psql -h localhost -p 5432 -U myuser -d postgres
# List all databases
docker exec postgres_db psql -U myuser -d postgres -c "\l"
MongoDB Connection
# Connect via Docker
docker exec -it mongo_db mongosh --username myuser --password mysecretpassword
# Connect from host (requires mongosh client)
mongosh "mongodb://myuser:mysecretpassword@localhost:27017"
Resource Allocation
- Elasticsearch: 1GB JVM heap
- Ollama: 6GB memory limit
- Other services: Default Docker limits
Installation Steps
Step 1: Prerequisites
Ensure Docker and Docker Compose are installed:
# Check Docker version
docker --version
# Check Docker Compose version
docker compose version
Step 2: Environment Setup
# Create necessary directories (if not already present)
mkdir -p docker
# Ensure docker-compose.yml is in place
ls docker/docker-compose.yml
Step 3: Start Infrastructure Services
# Start database and search services first
docker compose -f docker/docker-compose.yml up -d postgres mongodb elasticsearch
# Wait for services to be ready (check logs)
docker compose -f docker/docker-compose.yml logs postgres mongodb elasticsearch
Step 4: Start Application Services
# Start Jira, Confluence, and Ollama
docker compose -f docker/docker-compose.yml up -d jira confluence ollama
# Monitor startup progress
docker compose -f docker/docker-compose.yml logs -f jira confluence ollama
Step 5: Configure Ollama Models
# Wait for Ollama to be ready
docker exec ollama_llm ollama --version
# Pull the required model (this may take several minutes)
docker exec ollama_llm ollama pull phi3.5:3.8b-mini-instruct-q4_K_M
# Optional: Pull additional models for different use cases
docker exec ollama_llm ollama pull llama3.2:3b-instruct-q4_K_M
Verification Commands
Check Service Health
# Check all container status
docker compose -f docker/docker-compose.yml ps
# Test Ollama API
curl http://localhost:11434/api/tags
# Test Elasticsearch
curl http://localhost:9200/_cluster/health
# Test MongoDB connection
docker exec mongo_db mongosh --username myuser --password mysecretpassword --eval "db.adminCommand('ismaster')"
# Test PostgreSQL connection
docker exec postgres_db psql -U myuser -d postgres -c "\l"
Test Model Inference
# Test Ollama model with a simple query
curl -X POST http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d '{
"model": "phi3.5:3.8b-mini-instruct-q4_K_M",
"prompt": "What is artificial intelligence?",
"stream": false
}'
Service Management
Starting Services
# Start all services
docker compose -f docker/docker-compose.yml up -d
# Start specific service
docker compose -f docker/docker-compose.yml up -d ollama
Stopping Services
# Stop all services
docker compose -f docker/docker-compose.yml down
# Stop and remove volumes (WARNING: This deletes all data)
docker compose -f docker/docker-compose.yml down -v
Viewing Logs
# View all service logs
docker compose -f docker/docker-compose.yml logs -f
# View specific service logs
docker compose -f docker/docker-compose.yml logs -f ollama
Initial Setup Requirements
Jira Setup (First Time)
-
Navigate to http://localhost:8080
-
Follow the setup wizard
-
When prompted for database configuration, use these settings:
IMPORTANT: Use
postgres_db
as hostname (NOT localhost)Setting Value Database Type PostgreSQL Hostname postgres_db
Port 5432
Database jiradb
Username myuser
Password mysecretpassword
JDBC URL jdbc:postgresql://postgres_db:5432/jiradb
-
Complete the license and administrator account setup
Confluence Setup (First Time)
-
Navigate to http://localhost:8090
-
Follow the setup wizard
-
When prompted for database configuration, use these settings:
IMPORTANT: Use
postgres_db
as hostname (NOT localhost)Setting Value Database Type PostgreSQL Hostname postgres_db
Port 5432
Database confluencedb
Username myuser
Password mysecretpassword
JDBC URL jdbc:postgresql://postgres_db:5432/confluencedb
-
Complete the license and administrator account setup
Container Networking
Important: Container-to-Container Communication
- From your host machine: Use
localhost:5432
to connect to PostgreSQL - From containers (Jira/Confluence): Use
postgres_db:5432
to connect to PostgreSQL
Container Service Names
All containers can reach each other using these hostnames:
postgres_db
- PostgreSQL databasemongo_db
- MongoDBelasticsearch_node
- Elasticsearchollama_llm
- Ollama LLM servicejira_instance
- Jiraconfluence_instance
- Confluence
Troubleshooting
Common Issues
Out of Memory Errors
# Increase Docker memory allocation
# Go to Docker Desktop Settings > Resources > Memory
# Allocate at least 16GB RAM
Ollama Model Download Fails
# Check Ollama service status
docker logs ollama_llm
# Try pulling model manually
docker exec -it ollama_llm bash
ollama pull phi3.5:3.8b-mini-instruct-q4_K_M
Service Connection Issues
# Check if ports are available
netstat -tulpn | grep -E ':(5432|27017|9200|11434|8080|8090)'
# Restart problematic service
docker compose -f docker/docker-compose.yml restart <service-name>
PostgreSQL Connection Issues
# Connect to PostgreSQL (correct way)
docker exec -it postgres_db psql -U myuser -d postgres
# If you get "No such file or directory" error when using psql directly:
# This means you're trying to connect via Unix socket instead of TCP
# Use the Docker exec method above, or install PostgreSQL client:
sudo apt-get install postgresql-client-common postgresql-client
# Then connect via TCP:
psql -h localhost -p 5432 -U myuser -d postgres
# Check PostgreSQL logs
docker logs postgres_db
# Verify databases were created
docker exec postgres_db psql -U myuser -d postgres -c "SELECT datname FROM pg_database;"
Jira/Confluence Database Connection Issues
If Jira or Confluence shows "Connection refused" errors:
# 1. Ensure PostgreSQL is running and healthy
docker logs postgres_db
# 2. Verify databases were created correctly
docker exec postgres_db psql -U myuser -d postgres -c "\l"
# 3. Check if containers are on the same network
docker network ls
docker network inspect refdata-mcp_refdata_network
# 4. Test connectivity between containers
docker exec jira_instance ping postgres_db
docker exec confluence_instance ping postgres_db
# 5. If databases weren't created correctly, reset PostgreSQL:
docker compose -f docker/docker-compose.yml stop postgres jira confluence
docker compose -f docker/docker-compose.yml rm postgres
docker volume ls | grep postgres # Find the volume name
docker volume rm docker_postgres_data # Replace with actual volume name
docker compose -f docker/docker-compose.yml up -d postgres
# Wait 10-15 seconds for PostgreSQL to initialize
docker compose -f docker/docker-compose.yml up -d jira confluence
# 6. Verify database creation
docker exec postgres_db psql -U myuser -d postgres -c "SELECT datname FROM pg_database WHERE datname IN ('jiradb', 'confluencedb');"
Common Issue: If you see a database named "jiradb,confluencedb"
instead of two separate databases, this indicates the initialization script didn't run properly. Follow step 5 above to reset and recreate the PostgreSQL container.
Logs and Debugging
# Check system resources
docker system df
docker system prune # Clean up unused resources
# Monitor resource usage
docker stats
# Access service containers
docker exec -it ollama_llm bash
docker exec -it mongo_db mongosh
Development Workflow
For development of the .NET application:
- Start infrastructure services
- Build and run .NET projects locally
- Configure connection strings to use localhost ports
- Test integration with Docker services
See CLAUDE.md
for detailed development guidance.
Scripts
Sync-ConfluenceDoc.ps1
A cross-platform PowerShell script for scraping Oracle documentation and creating Confluence pages.
Prerequisites
- PowerShell 7+ (recommended for full cross-platform support)
- .NET SDK (for package management on Linux/macOS)
Usage
Windows:
# Test cross-platform compatibility
.\scripts\Test-CrossPlatform.ps1 -Verbose
# Scrape and save JSON files
.\scripts\Sync-ConfluenceDoc.ps1 -Mode Save -Verbose
# Apply directly to Confluence
.\scripts\Sync-ConfluenceDoc.ps1 -Mode Apply -ConfluenceBaseUrl "http://localhost:8090" -ConfluenceSpaceKey "DOCS" -PersonalAccessToken "YOUR_TOKEN_HERE" -Verbose
# Apply and overwrite existing pages
.\scripts\Sync-ConfluenceDoc.ps1 -Mode Apply -ConfluenceBaseUrl "http://localhost:8090" -ConfluenceSpaceKey "DOCS" -PersonalAccessToken "YOUR_TOKEN_HERE" -OverwriteExisting -Verbose
Linux/macOS:
# Test cross-platform compatibility
pwsh scripts/Test-CrossPlatform.ps1 -Verbose
# Scrape and save JSON files
pwsh scripts/Sync-ConfluenceDoc.ps1 -Mode Save -Verbose
# Apply directly to Confluence
pwsh scripts/Sync-ConfluenceDoc.ps1 -Mode Apply -ConfluenceBaseUrl "http://localhost:8090" -ConfluenceSpaceKey "DOCS" -PersonalAccessToken "YOUR_TOKEN_HERE" -Verbose
# Apply and overwrite existing pages
pwsh scripts/Sync-ConfluenceDoc.ps1 -Mode Apply -ConfluenceBaseUrl "http://localhost:8090" -ConfluenceSpaceKey "DOCS" -PersonalAccessToken "YOUR_TOKEN_HERE" -OverwriteExisting -Verbose
Features
- Cross-platform compatibility (Windows, Linux, macOS)
- Automatic dependency management (HtmlAgilityPack)
- Confluence REST API integration
- Graceful rate limiting
- Robust error handling
Populate-ElasticsearchVectors.ps1 ā”
A comprehensive PowerShell script that creates a vector database from Confluence content for Retrieval-Augmented Generation (RAG) operations. This script reads pages directly from Confluence via API, processes them using semantic structure-aware chunking, generates embeddings with Ollama, and indexes them in Elasticsearch.
Prerequisites
- PowerShell 7+ (cross-platform support)
- All Docker services running (Confluence, Elasticsearch, Ollama)
- Confluence space with content (use Sync-ConfluenceDoc.ps1 first)
- Personal Access Token for Confluence API
Key Features
- š§ Semantic Chunking: Structure-aware processing that preserves document integrity
- Uses HTML headers (H1-H6) as natural section boundaries
- Keeps complete procedures, tables, and figures intact
- No arbitrary token limits - prioritizes content completeness
- ā” Parallel Processing: Batch embedding generation with configurable concurrency
- š Vector Search Ready: Creates 384-dimensional embeddings using all-minilm model
- š Comprehensive Monitoring: Progress tracking, statistics, and detailed logging
- š”ļø Robust Error Handling: Graceful failure recovery and service validation
Usage
Basic Usage:
# Dry run to test configuration (recommended first)
pwsh scripts/Populate-ElasticsearchVectors.ps1 -SpaceKey "REF" -PersonalAccessToken "YOUR_TOKEN" -DryRun -Verbose
# Full processing with default settings
pwsh scripts/Populate-ElasticsearchVectors.ps1 -SpaceKey "REF" -PersonalAccessToken "YOUR_TOKEN" -Verbose
# Custom configuration
pwsh scripts/Populate-ElasticsearchVectors.ps1 \
-SpaceKey "ORACLE_DOCS" \
-PersonalAccessToken "YOUR_TOKEN" \
-IndexName "banking-knowledge-base" \
-BatchSize 5 \
-RequestDelay 2000 \
-Verbose
Advanced Options:
# Force recreation of Elasticsearch index
pwsh scripts/Populate-ElasticsearchVectors.ps1 -SpaceKey "REF" -PersonalAccessToken "YOUR_TOKEN" -Force -Verbose
# Custom service endpoints
pwsh scripts/Populate-ElasticsearchVectors.ps1 \
-SpaceKey "REF" \
-PersonalAccessToken "YOUR_TOKEN" \
-ConfluenceBaseUrl "http://localhost:8090" \
-ElasticsearchUrl "http://localhost:9200" \
-OllamaUrl "http://localhost:11434" \
-Verbose
Parameters
Parameter | Description | Default |
---|---|---|
SpaceKey | Confluence space to process (required) | - |
PersonalAccessToken | Confluence API token (required) | - |
ConfluenceBaseUrl | Confluence instance URL | http://localhost:8090 |
ElasticsearchUrl | Elasticsearch instance URL | http://localhost:9200 |
OllamaUrl | Ollama service URL | http://localhost:11434 |
IndexName | Elasticsearch index name | confluence-vectors |
EmbeddingModel | Ollama embedding model | all-minilm |
MaxChunkTokens | Maximum tokens per chunk (fallback only) | 2000 |
BatchSize | Parallel processing batch size | 10 |
RequestDelay | Delay between API requests (ms) | 1000 |
DryRun | Test run without indexing | false |
Force | Recreate index if exists | false |
Processing Pipeline
- Service Validation: Tests connectivity to Confluence, Elasticsearch, and Ollama
- Page Retrieval: Fetches all pages from specified Confluence space with pagination
- Semantic Chunking: Analyzes HTML structure and creates logical content chunks
- Vectorization: Generates embeddings using Ollama's all-minilm model
- Elasticsearch Indexing: Bulk indexes documents with metadata for search
- Statistics & Reporting: Provides comprehensive processing metrics
Output Example
š PROCESSING STATISTICS
========================
Confluence Space: REF
Total Pages Found: 45
Pages Processed: 43
Pages Skipped: 2
Total Chunks Created: 127
Chunks with Embeddings: 125
Average Chunks per Page: 2.95
Average Tokens per Chunk: 456
Total Estimated Tokens: 57,912
Elasticsearch Index: confluence-vectors
Processing Time: 0:12:34
ā
PROCESSING COMPLETED SUCCESSFULLY
Vector database is ready for semantic search and RAG operations!
š” Example Elasticsearch Queries:
Search by content: GET http://localhost:9200/confluence-vectors/_search?q=content:"network code"
Get all chunks from a page: GET http://localhost:9200/confluence-vectors/_search?q=page_title:"Purpose"
Vector similarity search: POST http://localhost:9200/confluence-vectors/_search (with kNN query)
Integration with RAG System
The created vector database integrates seamlessly with the MCP-RAG Reference Data System:
- Elasticsearch Schema: Optimized for vector similarity search with cosine similarity
- Metadata Fields: Includes page IDs, section titles, chunk types for filtering
- Document Structure: Preserves hierarchical relationships for context
- Search Ready: Supports both keyword and semantic vector queries
Troubleshooting
Common Issues:
- No Confluence content: Run
Sync-ConfluenceDoc.ps1
first to populate Confluence - Missing Ollama models: Run
docker exec ollama_llm ollama pull all-minilm
- Memory issues: Reduce
BatchSize
parameter for lower memory usage - API rate limits: Increase
RequestDelay
for more conservative API usage
Service Dependencies:
# Ensure all required services are running
docker compose -f docker/docker-compose.yml ps
# Test individual services
curl http://localhost:8090/rest/api/space
curl http://localhost:9200/_cluster/health
curl http://localhost:11434/api/tags
Support
For issues and questions:
- Check service logs first
- Ensure all prerequisites are met
- Verify port availability
- Check Docker resource allocation
- Run
Test-CrossPlatform.ps1
for PowerShell script issues