iceberg-mcp-server

ambaricloud/iceberg-mcp-server

Iceberg MCP Server

A Model Context Protocol (MCP) server for Apache Iceberg that exposes data lakehouse operations (catalog access, table management, data analysis, and maintenance) as MCP tools.

Features

🗂️ Catalog Operations

  • AWS Glue Catalog support with full credential management
  • Polaris Catalog integration with JWT authentication
  • Automatic catalog discovery and validation

📊 Table Management

  • Create, drop, and list tables and namespaces
  • Schema inspection and evolution
  • Partition management and analysis
  • Table statistics and metadata access

🔍 Data Analysis

  • File-level analysis and statistics
  • Snapshot and history tracking
  • Data quality profiling
  • Query optimization insights

🛠️ Maintenance Operations

  • Table compaction and optimization
  • Snapshot expiration and cleanup
  • Metadata management
  • Performance monitoring

Quick Start

1. Configuration Setup

# Copy configuration template
cp .env.example .env

# Edit with your credentials
vi .env

2. Run Locally

# Install dependencies
pip install -r requirements.txt

# Start server
python iceberg_server.py

3. Run with Docker

# Build and start
docker-compose up --build

# Access server
curl http://localhost:8077/health

4. Validate Configuration

# Check configuration
python config_check.py

# Show all defaults
python config_check.py --show-defaults

# Validate specific catalog
python config_check.py --catalog polaris

Configuration

The server is configured through environment variables. All settings are defined in the .env file.

Required Settings

For Polaris Catalog:

ICEBERG_POLARIS_URI=https://your-polaris-server.com/polaris/api/catalog
ICEBERG_POLARIS_CREDENTIAL=your_jwt_token
ICEBERG_POLARIS_WAREHOUSE=your_warehouse_name

For AWS Glue Catalog:

ICEBERG_AWS_ACCESS_KEY_ID=your_aws_access_key
ICEBERG_AWS_SECRET_ACCESS_KEY=your_aws_secret_key
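
As a rough illustration of the required settings above, the following sketch reports which variables are still unset for each catalog. The `REQUIRED_SETTINGS` mapping and the `missing_settings` helper are hypothetical names introduced here, and the `glue` key is an illustrative label; the bundled config_check.py may work differently.

```python
import os
from typing import Mapping

# Variable names come from the Required Settings above; grouping them under
# "polaris" and "glue" keys is an assumption made for this sketch.
REQUIRED_SETTINGS = {
    "polaris": (
        "ICEBERG_POLARIS_URI",
        "ICEBERG_POLARIS_CREDENTIAL",
        "ICEBERG_POLARIS_WAREHOUSE",
    ),
    "glue": (
        "ICEBERG_AWS_ACCESS_KEY_ID",
        "ICEBERG_AWS_SECRET_ACCESS_KEY",
    ),
}


def missing_settings(catalog: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank for a catalog."""
    return [
        name
        for name in REQUIRED_SETTINGS[catalog]
        if not env.get(name, "").strip()
    ]


if __name__ == "__main__":
    for catalog in REQUIRED_SETTINGS:
        missing = missing_settings(catalog)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{catalog}: {status}")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary instead of the real process environment.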

Optional Settings

# Server configuration
SERVER_HOST=0.0.0.0
SERVER_PORT=8077
SERVER_DEBUG=False
SERVER_TRANSPORT=stdio

# Logging
LOG_LEVEL=INFO

# Spark configuration (for advanced operations)
SPARK_APP_NAME=IcebergMCP
SPARK_PACKAGES=org.apache.hadoop:hadoop-aws:3.3.4,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0

See docs/CONFIG_ARCHITECTURE.md for complete details.
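
The optional server settings could be read into a typed object along these lines. This is a minimal sketch, not the project's config.py: the `ServerSettings` class and `load_server_settings` function are hypothetical, and treating the values shown above as the fallback defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServerSettings:
    host: str
    port: int
    debug: bool
    transport: str
    log_level: str


def load_server_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read the optional server settings, falling back to the example values."""
    return ServerSettings(
        host=env.get("SERVER_HOST", "0.0.0.0"),
        port=int(env.get("SERVER_PORT", "8077")),
        # Environment values are strings, so "False" must be parsed explicitly.
        debug=env.get("SERVER_DEBUG", "False").strip().lower() in {"1", "true", "yes"},
        transport=env.get("SERVER_TRANSPORT", "stdio"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Parsing `SERVER_DEBUG` explicitly matters because any non-empty string, including "False", is truthy in Python.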

Project Structure

iceberg_mcp/iceberg/
├── iceberg_server.py          # Main MCP server
├── config.py                  # Configuration management
├── config_check.py            # Configuration validation utility
├── requirements.txt           # Python dependencies
├── .env.example              # Configuration template
├── .gitignore                # Git ignore rules
├── dockerfile                # Docker build configuration  
├── docker-compose.yml        # Docker Compose setup
├── models/                   # Data models and utilities
│   ├── iceberg_BaseModels.py # Core Pydantic models
│   ├── iceberg_utils.py      # Utility functions
│   ├── iceberg_sampling_*.py # MCP sampling integration
│   └── __init__.py
├── tests/                    # Test suite
│   ├── test_mcp_server.py    # Main server tests
│   ├── test_docker_health.py # Docker health tests
│   ├── test_quick.py         # Quick smoke tests
│   └── run_all_tests.py      # Test runner
├── analysis/                 # Analysis tools and utilities
│   ├── table_rewrite_analysis.py # Table analysis tools
│   └── table_rewrite_streamlit.py # Streamlit dashboard
└── docs/                     # Documentation
    ├── README.md             # This file
    ├── CLAUDE.md             # Claude Code integration guide
    ├── CONFIG_ARCHITECTURE.md # Configuration system details
    └── DOCKER.md             # Docker deployment guide

MCP Tools Available

The server provides the following 16 MCP tools for Iceberg operations:

Table Operations

  • list_namespaces - List all namespaces
  • list_tables - List tables in namespace
  • create_namespace - Create new namespace
  • create_table - Create new table
  • drop_table - Drop existing table

Metadata Access

  • get_table_schema - Get table schema
  • get_table_partitions - Get partition information
  • get_table_snapshots - Get snapshot history
  • get_table_metadata - Get complete metadata

Data Analysis

  • analyze_table_files - Analyze data files
  • get_table_statistics - Get table statistics
  • calculate_table_size - Calculate storage usage
  • get_partition_sizes - Analyze partition sizes

Maintenance

  • expire_snapshots - Clean up old snapshots
  • optimize_table - Compact and optimize tables
  • vacuum_table - Remove orphaned files
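
MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for one of the tools listed above. The `build_tool_call` helper is hypothetical, and the `namespace` argument name is an assumption: the actual parameter names are defined by the server's tool schemas, which a client can discover with `tools/list`. A real session also performs the MCP `initialize` handshake before calling tools.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

# JSON-RPC requests carry an id so responses can be matched to them.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: Optional[Dict[str, Any]] = None) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # "namespace" is an assumed argument name, used only for illustration.
    print(build_tool_call("list_tables", {"namespace": "analytics"}))
```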

Development

Setup Development Environment

# Clone repository
git clone <repository-url>
cd iceberg_mcp/iceberg

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

# Set up configuration
cp .env.example .env
# Edit .env with your development credentials

Run Tests

# Run all tests
python tests/run_all_tests.py

# Run specific test
python -m pytest tests/test_mcp_server.py

# Run with coverage
python -m pytest --cov=. tests/

Code Quality

# Format code
black iceberg_server.py config.py

# Type checking
mypy iceberg_server.py

# Linting
pylint iceberg_server.py

Docker Deployment

Development

# Build and run
docker-compose up --build

# Run in background
docker-compose up -d

# View logs
docker-compose logs -f iceberg-mcp

Production

# Use production configuration
cp .env.production .env

# Deploy with restart policy
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d

# Health check
curl -f http://localhost:8077/health

See docs/DOCKER.md for detailed deployment instructions.
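
In a deployment script, the curl health check can be replaced by a small polling loop that waits for the container to come up. This is a sketch under assumptions: the `wait_for_health` and `http_status` helpers are hypothetical, and it only assumes that the /health endpoint answers with HTTP 200 when the server is ready.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def wait_for_health(
    url: str = "http://localhost:8077/health",
    attempts: int = 10,
    delay_seconds: float = 2.0,
    get_status: Callable[[str], int] = http_status,
) -> bool:
    """Poll the health endpoint until it returns 200 or attempts run out."""
    for attempt in range(attempts):
        try:
            if get_status(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet, or the server answered with an error status
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


if __name__ == "__main__":
    print("healthy" if wait_for_health() else "not healthy")
```

The `get_status` parameter exists so the retry logic can be tested without a running server.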

Security

Best Practices

  • ✅ Store all credentials in the .env file
  • ✅ Use different .env files per environment
  • ✅ Never commit .env files to version control
  • ✅ Use least-privilege IAM policies for AWS
  • ✅ Rotate credentials regularly
  • ✅ Monitor JWT token expiration for Polaris
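
One way to monitor token expiration is to read the `exp` claim from the token payload. The sketch below assumes that ICEBERG_POLARIS_CREDENTIAL holds a standard three-segment JWT; the `jwt_expiry` and `seconds_until_expiry` helpers are hypothetical. It does not verify the signature, so it is suitable only for expiry monitoring, not for authentication decisions.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[float]:
    """Return the exp claim (seconds since the epoch), or None if it is absent.

    The signature is NOT verified; only the payload segment is decoded.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload_b64 = parts[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = payload.get("exp")
    return float(exp) if exp is not None else None


def seconds_until_expiry(token: str, now: Optional[float] = None) -> Optional[float]:
    """Return the remaining lifetime in seconds (negative if already expired)."""
    exp = jwt_expiry(token)
    if exp is None:
        return None
    return exp - (time.time() if now is None else now)
```

A scheduled job could call `seconds_until_expiry` on the configured credential and raise an alert when the result drops below a chosen threshold.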

Credential Management

# Development
ICEBERG_POLARIS_CREDENTIAL=dev_jwt_token

# Production (load from a secret manager in the shell; .env files do not evaluate $(...))
export ICEBERG_POLARIS_CREDENTIAL="$(aws secretsmanager get-secret-value --secret-id polaris-prod-token --query SecretString --output text)"

Troubleshooting

Common Issues

Configuration Problems:

# Validate configuration
python config_check.py

# Check specific catalog
python config_check.py --catalog polaris

Connection Issues:

# Test catalog connectivity (run from the project root so that models/ is importable)
python -c "
from models.iceberg_BaseModels import get_iceberg_catalog
catalog = get_iceberg_catalog('polaris')
print('Connection successful')
"

Docker Issues:

# Check container logs
docker-compose logs iceberg-mcp

# Verify health
curl http://localhost:8077/health

# Check environment
docker exec -it iceberg-mcp-server env | grep ICEBERG
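
Printing the environment as above also prints secrets in clear text. When the output may end up in a log or a bug report, a masked listing is safer. The sketch below is one possible approach; the `masked_iceberg_env` helper and the `SENSITIVE_MARKERS` list are hypothetical and not part of the project.

```python
import os
from typing import Dict, Mapping

# Name fragments that suggest a value should not be displayed (an assumption).
SENSITIVE_MARKERS = ("SECRET", "CREDENTIAL", "KEY", "TOKEN", "PASSWORD")


def masked_iceberg_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the ICEBERG-related variables with sensitive values masked."""
    result: Dict[str, str] = {}
    for name in sorted(env):
        if "ICEBERG" not in name:
            continue
        value = env[name]
        if any(marker in name for marker in SENSITIVE_MARKERS):
            value = "****" if value else ""
        result[name] = value
    return result


if __name__ == "__main__":
    for name, value in masked_iceberg_env().items():
        print(f"{name}={value}")
```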

Performance Tuning

  • Adjust Docker memory limits in docker-compose.yml
  • Configure appropriate log levels (LOG_LEVEL=WARNING for production)
  • Use connection pooling for high-throughput scenarios
  • Monitor table file counts and sizes
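
To make the last point concrete, the sketch below summarizes a list of data file sizes and counts the files below a size threshold, since many small files are a common sign that a table needs compaction. It assumes the sizes in bytes have already been collected (for example from the output of analyze_table_files, whose format is not specified here). The `summarize_file_sizes` helper and the 32 MiB default threshold are illustrative choices, not project defaults.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FileSizeSummary:
    file_count: int
    total_bytes: int
    average_bytes: float
    small_file_count: int


def summarize_file_sizes(
    sizes_bytes: Iterable[int],
    small_file_threshold: int = 32 * 1024 * 1024,  # illustrative threshold
) -> FileSizeSummary:
    """Summarize data file sizes and count files below the threshold."""
    sizes = list(sizes_bytes)
    total = sum(sizes)
    return FileSizeSummary(
        file_count=len(sizes),
        total_bytes=total,
        average_bytes=total / len(sizes) if sizes else 0.0,
        small_file_count=sum(1 for size in sizes if size < small_file_threshold),
    )
```

Tracking `small_file_count` over time gives a simple signal for when to run optimize_table.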

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make changes following the existing patterns
  4. Add tests for new functionality
  5. Update documentation as needed
  6. Run tests and ensure they pass
  7. Submit a pull request

Development Guidelines

  • Follow existing code patterns and naming conventions
  • Add comprehensive tests for new features
  • Update documentation for any API changes
  • Use type hints and proper error handling
  • Follow security best practices

License

This project is part of the IcebergMCP suite. See the main project repository for license details.

Support

For issues and questions:

  • Check the Troubleshooting section above
  • Run configuration validation: python config_check.py
  • Review logs for error details
  • Open an issue in the main repository