refdata-mcp

tomas-rampas/refdata-mcp


MCP-RAG Reference Data System

A .NET-based Model Context Protocol server with Retrieval-Augmented Generation capabilities designed for reference data teams. This system provides natural language query capabilities over policies, procedures, and reference data while maintaining compliance and audit requirements.

System Requirements

  • Docker & Docker Compose: Docker Engine with the Compose v2 plugin
  • RAM: Minimum 16GB (32GB recommended)
  • Storage: At least 20GB free space for Docker volumes
  • CPU: Multi-core processor recommended for concurrent services

Architecture Overview

The system consists of the following services:

  • PostgreSQL: Shared database for Jira and Confluence
  • MongoDB: Chat history and metadata storage
  • Elasticsearch: Vector search engine
  • Ollama: Local LLM service (Phi-3.5-mini model)
  • Jira: Document source and issue tracking
  • Confluence: Document source and knowledge base
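For orientation, here is a heavily abbreviated sketch of how docker/docker-compose.yml might wire two of these services together (the image tags and exact compose layout are assumptions; the container names, host ports, and network name match those used later in this guide):

```yaml
services:
  postgres:
    image: postgres              # actual tag per the repository's compose file
    container_name: postgres_db
    ports: ["5432:5432"]
    networks: [refdata_network]
  ollama:
    image: ollama/ollama
    container_name: ollama_llm
    ports: ["11434:11434"]
    networks: [refdata_network]

networks:
  refdata_network:
```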

Quick Start

1. Clone the Repository

git clone <repository-url>
cd refdata-mcp

2. Start All Services

# Start all Docker services in detached mode
docker compose -f docker/docker-compose.yml up -d

3. Verify Services Are Running

# Check service status
docker compose -f docker/docker-compose.yml ps

# View service logs
docker compose -f docker/docker-compose.yml logs -f

4. Install Ollama Model

After the Ollama service is running, install the Phi-3.5-mini model:

# Pull the Phi-3.5-mini model
docker exec ollama_llm ollama pull phi3.5:3.8b-mini-instruct-q4_K_M

# Verify model installation
docker exec ollama_llm ollama list

5. Verify Service Endpoints

Once all services are running, verify they are accessible:

Core Services

  • PostgreSQL: localhost:5432
  • MongoDB: localhost:27017
  • Elasticsearch: http://localhost:9200
  • Ollama: http://localhost:11434

External Applications

  • Jira: http://localhost:8080
  • Confluence: http://localhost:8090
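The HTTP endpoints can be probed in one pass with a small loop (this assumes the default host ports and that curl is installed):

```shell
# quick reachability check for the HTTP endpoints (assumes default host ports)
endpoints='http://localhost:9200 http://localhost:11434 http://localhost:8080 http://localhost:8090'
for url in $endpoints; do
  # print the HTTP status code, or 000 if the service is unreachable
  code=$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' "$url" || true)
  echo "$url -> HTTP $code"
done
```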

Service Configuration

Default Credentials

Service        Username  Password                      Database(s)
-------        --------  --------                      -----------
PostgreSQL     myuser    mysecretpassword              postgres, jiradb, confluencedb
MongoDB        myuser    mysecretpassword              -
Elasticsearch  N/A       No authentication (dev only)  -

Connecting to Services

PostgreSQL Connection
# Connect via Docker (recommended)
docker exec -it postgres_db psql -U myuser -d postgres

# Connect from host (requires psql client)
psql -h localhost -p 5432 -U myuser -d postgres

# List all databases
docker exec postgres_db psql -U myuser -d postgres -c "\l"
MongoDB Connection
# Connect via Docker
docker exec -it mongo_db mongosh --username myuser --password mysecretpassword

# Connect from host (requires mongosh client)
mongosh "mongodb://myuser:mysecretpassword@localhost:27017"

Resource Allocation

  • Elasticsearch: 1GB JVM heap
  • Ollama: 6GB memory limit
  • Other services: Default Docker limits
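The 1GB heap is conventionally pinned through an environment variable on the Elasticsearch service (a sketch of the relevant compose fragment; the exact keys in docker/docker-compose.yml may differ):

```yaml
  elasticsearch:
    environment:
      - ES_JAVA_OPTS=-Xms1g -Xmx1g   # fixed 1GB JVM heap, per the allocation above
```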

Installation Steps

Step 1: Prerequisites

Ensure Docker and Docker Compose are installed:

# Check Docker version
docker --version

# Check Docker Compose version
docker compose version

Step 2: Environment Setup

# Create necessary directories (if not already present)
mkdir -p docker

# Ensure docker-compose.yml is in place
ls docker/docker-compose.yml

Step 3: Start Infrastructure Services

# Start database and search services first
docker compose -f docker/docker-compose.yml up -d postgres mongodb elasticsearch

# Wait for services to be ready (check logs)
docker compose -f docker/docker-compose.yml logs postgres mongodb elasticsearch

Step 4: Start Application Services

# Start Jira, Confluence, and Ollama
docker compose -f docker/docker-compose.yml up -d jira confluence ollama

# Monitor startup progress
docker compose -f docker/docker-compose.yml logs -f jira confluence ollama

Step 5: Configure Ollama Models

# Wait for Ollama to be ready
docker exec ollama_llm ollama --version

# Pull the required model (this may take several minutes)
docker exec ollama_llm ollama pull phi3.5:3.8b-mini-instruct-q4_K_M

# Optional: Pull additional models for different use cases
docker exec ollama_llm ollama pull llama3.2:3b-instruct-q4_K_M

Verification Commands

Check Service Health

# Check all container status
docker compose -f docker/docker-compose.yml ps

# Test Ollama API
curl http://localhost:11434/api/tags

# Test Elasticsearch
curl http://localhost:9200/_cluster/health

# Test MongoDB connection
docker exec mongo_db mongosh --username myuser --password mysecretpassword --eval "db.adminCommand('ismaster')"

# Test PostgreSQL connection
docker exec postgres_db psql -U myuser -d postgres -c "\l"

Test Model Inference

# Test Ollama model with a simple query
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi3.5:3.8b-mini-instruct-q4_K_M",
    "prompt": "What is artificial intelligence?",
    "stream": false
  }'
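Ollama also exposes an embeddings endpoint, which the Populate-ElasticsearchVectors.ps1 script described later depends on (pull the all-minilm model first; the request shape follows Ollama's /api/embeddings API):

```shell
# request an embedding from the all-minilm model
curl -s -X POST http://localhost:11434/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "all-minilm", "prompt": "reference data policy"}'
```

The response should contain an "embedding" array of 384 floats when all-minilm is used.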

Service Management

Starting Services

# Start all services
docker compose -f docker/docker-compose.yml up -d

# Start specific service
docker compose -f docker/docker-compose.yml up -d ollama

Stopping Services

# Stop all services
docker compose -f docker/docker-compose.yml down

# Stop and remove volumes (WARNING: This deletes all data)
docker compose -f docker/docker-compose.yml down -v

Viewing Logs

# View all service logs
docker compose -f docker/docker-compose.yml logs -f

# View specific service logs
docker compose -f docker/docker-compose.yml logs -f ollama

Initial Setup Requirements

Jira Setup (First Time)

  1. Navigate to http://localhost:8080

  2. Follow the setup wizard

  3. When prompted for database configuration, use these settings:

    IMPORTANT: Use postgres_db as hostname (NOT localhost)

    Setting        Value
    -------        -----
    Database Type  PostgreSQL
    Hostname       postgres_db
    Port           5432
    Database       jiradb
    Username       myuser
    Password       mysecretpassword
    JDBC URL       jdbc:postgresql://postgres_db:5432/jiradb
  4. Complete the license and administrator account setup

Confluence Setup (First Time)

  1. Navigate to http://localhost:8090

  2. Follow the setup wizard

  3. When prompted for database configuration, use these settings:

    IMPORTANT: Use postgres_db as hostname (NOT localhost)

    Setting        Value
    -------        -----
    Database Type  PostgreSQL
    Hostname       postgres_db
    Port           5432
    Database       confluencedb
    Username       myuser
    Password       mysecretpassword
    JDBC URL       jdbc:postgresql://postgres_db:5432/confluencedb
  4. Complete the license and administrator account setup

Container Networking

Important: Container-to-Container Communication

  • From your host machine: Use localhost:5432 to connect to PostgreSQL
  • From containers (Jira/Confluence): Use postgres_db:5432 to connect to PostgreSQL

Container Service Names

All containers can reach each other using these hostnames:

  • postgres_db - PostgreSQL database
  • mongo_db - MongoDB
  • elasticsearch_node - Elasticsearch
  • ollama_llm - Ollama LLM service
  • jira_instance - Jira
  • confluence_instance - Confluence

Troubleshooting

Common Issues

Out of Memory Errors
# Increase Docker memory allocation
# Go to Docker Desktop Settings > Resources > Memory
# Allocate at least 16GB RAM
Ollama Model Download Fails
# Check Ollama service status
docker logs ollama_llm

# Try pulling model manually
docker exec -it ollama_llm bash
ollama pull phi3.5:3.8b-mini-instruct-q4_K_M
Service Connection Issues
# Check if ports are available
netstat -tulpn | grep -E ':(5432|27017|9200|11434|8080|8090)'

# Restart problematic service
docker compose -f docker/docker-compose.yml restart <service-name>
PostgreSQL Connection Issues
# Connect to PostgreSQL (correct way)
docker exec -it postgres_db psql -U myuser -d postgres

# If you get "No such file or directory" error when using psql directly:
# This means you're trying to connect via Unix socket instead of TCP
# Use the Docker exec method above, or install PostgreSQL client:
sudo apt-get install postgresql-client-common postgresql-client

# Then connect via TCP:
psql -h localhost -p 5432 -U myuser -d postgres

# Check PostgreSQL logs
docker logs postgres_db

# Verify databases were created
docker exec postgres_db psql -U myuser -d postgres -c "SELECT datname FROM pg_database;"
Jira/Confluence Database Connection Issues

If Jira or Confluence shows "Connection refused" errors:

# 1. Ensure PostgreSQL is running and healthy
docker logs postgres_db

# 2. Verify databases were created correctly
docker exec postgres_db psql -U myuser -d postgres -c "\l"

# 3. Check if containers are on the same network
docker network ls
docker network inspect refdata-mcp_refdata_network

# 4. Test connectivity between containers
docker exec jira_instance ping postgres_db
docker exec confluence_instance ping postgres_db

# 5. If databases weren't created correctly, reset PostgreSQL:
docker compose -f docker/docker-compose.yml stop postgres jira confluence
docker compose -f docker/docker-compose.yml rm postgres
docker volume ls | grep postgres  # Find the volume name
docker volume rm docker_postgres_data  # Replace with actual volume name
docker compose -f docker/docker-compose.yml up -d postgres
# Wait 10-15 seconds for PostgreSQL to initialize
docker compose -f docker/docker-compose.yml up -d jira confluence

# 6. Verify database creation
docker exec postgres_db psql -U myuser -d postgres -c "SELECT datname FROM pg_database WHERE datname IN ('jiradb', 'confluencedb');"

Common Issue: If you see a database named "jiradb,confluencedb" instead of two separate databases, this indicates the initialization script didn't run properly. Follow step 5 above to reset and recreate the PostgreSQL container.

Logs and Debugging

# Check system resources
docker system df
docker system prune # Clean up unused resources

# Monitor resource usage
docker stats

# Access service containers
docker exec -it ollama_llm bash
docker exec -it mongo_db mongosh

Development Workflow

For development of the .NET application:

  1. Start infrastructure services
  2. Build and run .NET projects locally
  3. Configure connection strings to use localhost ports
  4. Test integration with Docker services
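For local development against the published Docker ports, the connection settings might look like this (an illustrative appsettings.json fragment; the section and key names are assumptions about the .NET projects, while the hosts, ports, and credentials match the defaults documented above):

```json
{
  "ConnectionStrings": {
    "Postgres": "Host=localhost;Port=5432;Username=myuser;Password=mysecretpassword;Database=postgres",
    "MongoDb": "mongodb://myuser:mysecretpassword@localhost:27017"
  },
  "Elasticsearch": { "Url": "http://localhost:9200" },
  "Ollama": { "Url": "http://localhost:11434" }
}
```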

See CLAUDE.md for detailed development guidance.

Scripts

Sync-ConfluenceDoc.ps1

A cross-platform PowerShell script for scraping Oracle documentation and creating Confluence pages.

Prerequisites
  • PowerShell 7+ (recommended for full cross-platform support)
  • .NET SDK (for package management on Linux/macOS)
Usage

Windows:

# Test cross-platform compatibility
.\scripts\Test-CrossPlatform.ps1 -Verbose

# Scrape and save JSON files
.\scripts\Sync-ConfluenceDoc.ps1 -Mode Save -Verbose

# Apply directly to Confluence
.\scripts\Sync-ConfluenceDoc.ps1 -Mode Apply -ConfluenceBaseUrl "http://localhost:8090" -ConfluenceSpaceKey "DOCS" -PersonalAccessToken "YOUR_TOKEN_HERE" -Verbose

# Apply and overwrite existing pages
.\scripts\Sync-ConfluenceDoc.ps1 -Mode Apply -ConfluenceBaseUrl "http://localhost:8090" -ConfluenceSpaceKey "DOCS" -PersonalAccessToken "YOUR_TOKEN_HERE" -OverwriteExisting -Verbose

Linux/macOS:

# Test cross-platform compatibility
pwsh scripts/Test-CrossPlatform.ps1 -Verbose

# Scrape and save JSON files
pwsh scripts/Sync-ConfluenceDoc.ps1 -Mode Save -Verbose

# Apply directly to Confluence
pwsh scripts/Sync-ConfluenceDoc.ps1 -Mode Apply -ConfluenceBaseUrl "http://localhost:8090" -ConfluenceSpaceKey "DOCS" -PersonalAccessToken "YOUR_TOKEN_HERE" -Verbose

# Apply and overwrite existing pages
pwsh scripts/Sync-ConfluenceDoc.ps1 -Mode Apply -ConfluenceBaseUrl "http://localhost:8090" -ConfluenceSpaceKey "DOCS" -PersonalAccessToken "YOUR_TOKEN_HERE" -OverwriteExisting -Verbose
Features
  • Cross-platform compatibility (Windows, Linux, macOS)
  • Automatic dependency management (HtmlAgilityPack)
  • Confluence REST API integration
  • Graceful rate limiting
  • Robust error handling

Populate-ElasticsearchVectors.ps1 ⚡

A comprehensive PowerShell script that creates a vector database from Confluence content for Retrieval-Augmented Generation (RAG) operations. This script reads pages directly from Confluence via API, processes them using semantic structure-aware chunking, generates embeddings with Ollama, and indexes them in Elasticsearch.

Prerequisites
  • PowerShell 7+ (cross-platform support)
  • All Docker services running (Confluence, Elasticsearch, Ollama)
  • Confluence space with content (use Sync-ConfluenceDoc.ps1 first)
  • Personal Access Token for Confluence API
Key Features
  • 🧠 Semantic Chunking: Structure-aware processing that preserves document integrity
    • Uses HTML headers (H1-H6) as natural section boundaries
    • Keeps complete procedures, tables, and figures intact
    • No arbitrary token limits - prioritizes content completeness
  • ⚡ Parallel Processing: Batch embedding generation with configurable concurrency
  • 🔍 Vector Search Ready: Creates 384-dimensional embeddings using the all-minilm model
  • 📊 Comprehensive Monitoring: Progress tracking, statistics, and detailed logging
  • 🛡️ Robust Error Handling: Graceful failure recovery and service validation
Usage

Basic Usage:

# Dry run to test configuration (recommended first)
pwsh scripts/Populate-ElasticsearchVectors.ps1 -SpaceKey "REF" -PersonalAccessToken "YOUR_TOKEN" -DryRun -Verbose

# Full processing with default settings
pwsh scripts/Populate-ElasticsearchVectors.ps1 -SpaceKey "REF" -PersonalAccessToken "YOUR_TOKEN" -Verbose

# Custom configuration
pwsh scripts/Populate-ElasticsearchVectors.ps1 \
  -SpaceKey "ORACLE_DOCS" \
  -PersonalAccessToken "YOUR_TOKEN" \
  -IndexName "banking-knowledge-base" \
  -BatchSize 5 \
  -RequestDelay 2000 \
  -Verbose

Advanced Options:

# Force recreation of Elasticsearch index
pwsh scripts/Populate-ElasticsearchVectors.ps1 -SpaceKey "REF" -PersonalAccessToken "YOUR_TOKEN" -Force -Verbose

# Custom service endpoints
pwsh scripts/Populate-ElasticsearchVectors.ps1 \
  -SpaceKey "REF" \
  -PersonalAccessToken "YOUR_TOKEN" \
  -ConfluenceBaseUrl "http://localhost:8090" \
  -ElasticsearchUrl "http://localhost:9200" \
  -OllamaUrl "http://localhost:11434" \
  -Verbose
Parameters
Parameter            Description                               Default
---------            -----------                               -------
SpaceKey             Confluence space to process (required)    -
PersonalAccessToken  Confluence API token (required)           -
ConfluenceBaseUrl    Confluence instance URL                   http://localhost:8090
ElasticsearchUrl     Elasticsearch instance URL                http://localhost:9200
OllamaUrl            Ollama service URL                        http://localhost:11434
IndexName            Elasticsearch index name                  confluence-vectors
EmbeddingModel       Ollama embedding model                    all-minilm
MaxChunkTokens       Maximum tokens per chunk (fallback only)  2000
BatchSize            Parallel processing batch size            10
RequestDelay         Delay between API requests (ms)           1000
DryRun               Test run without indexing                 false
Force                Recreate the index if it exists           false
Processing Pipeline
  1. Service Validation: Tests connectivity to Confluence, Elasticsearch, and Ollama
  2. Page Retrieval: Fetches all pages from specified Confluence space with pagination
  3. Semantic Chunking: Analyzes HTML structure and creates logical content chunks
  4. Vectorization: Generates embeddings using Ollama's all-minilm model
  5. Elasticsearch Indexing: Bulk indexes documents with metadata for search
  6. Statistics & Reporting: Provides comprehensive processing metrics
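As a rough illustration of the token accounting in the statistics below, chunk sizes can be estimated from word counts (the 4/3 words-to-tokens ratio is a common heuristic, not necessarily the script's exact method):

```shell
# rough token estimate for a text chunk (~0.75 words per token heuristic)
estimate_tokens() {
  words=$(printf '%s' "$1" | wc -w)
  echo $(( words * 4 / 3 ))
}
estimate_tokens "complete procedures tables and figures kept intact"  # prints 9
```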
Output Example
📊 PROCESSING STATISTICS
========================
Confluence Space: REF
Total Pages Found: 45
Pages Processed: 43
Pages Skipped: 2
Total Chunks Created: 127
Chunks with Embeddings: 125
Average Chunks per Page: 2.95
Average Tokens per Chunk: 456
Total Estimated Tokens: 57,912
Elasticsearch Index: confluence-vectors
Processing Time: 0:12:34

✅ PROCESSING COMPLETED SUCCESSFULLY
Vector database is ready for semantic search and RAG operations!

💡 Example Elasticsearch Queries:
Search by content: GET http://localhost:9200/confluence-vectors/_search?q=content:"network code"
Get all chunks from a page: GET http://localhost:9200/confluence-vectors/_search?q=page_title:"Purpose"
Vector similarity search: POST http://localhost:9200/confluence-vectors/_search (with kNN query)
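The kNN request mentioned above might be shaped like this (the vector field name "embedding" is an assumption about the index mapping, and the three-element vector is a placeholder; a real request needs the full 384-dimensional embedding produced by all-minilm):

```shell
# vector similarity search sketch; substitute a real 384-dim query vector
curl -s -X POST http://localhost:9200/confluence-vectors/_search \
  -H "Content-Type: application/json" \
  -d '{
    "knn": {
      "field": "embedding",
      "query_vector": [0.12, -0.03, 0.51],
      "k": 5,
      "num_candidates": 50
    }
  }'
```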
Integration with RAG System

The created vector database integrates seamlessly with the MCP-RAG Reference Data System:

  • Elasticsearch Schema: Optimized for vector similarity search with cosine similarity
  • Metadata Fields: Includes page IDs, section titles, chunk types for filtering
  • Document Structure: Preserves hierarchical relationships for context
  • Search Ready: Supports both keyword and semantic vector queries
Troubleshooting

Common Issues:

  • No Confluence content: Run Sync-ConfluenceDoc.ps1 first to populate Confluence
  • Missing Ollama models: Run docker exec ollama_llm ollama pull all-minilm
  • Memory issues: Reduce BatchSize parameter for lower memory usage
  • API rate limits: Increase RequestDelay for more conservative API usage

Service Dependencies:

# Ensure all required services are running
docker compose -f docker/docker-compose.yml ps

# Test individual services
curl http://localhost:8090/rest/api/space
curl http://localhost:9200/_cluster/health
curl http://localhost:11434/api/tags

Support

For issues and questions:

  • Check service logs first
  • Ensure all prerequisites are met
  • Verify port availability
  • Check Docker resource allocation
  • Run Test-CrossPlatform.ps1 for PowerShell script issues