ambaricloud/iceberg-mcp-server
Iceberg MCP Server
Apache Iceberg Model Context Protocol (MCP) server providing comprehensive data lakehouse operations.
Features
🗂️ Catalog Operations
- AWS Glue Catalog support with full credential management
- Polaris Catalog integration with JWT authentication
- Automatic catalog discovery and validation
📊 Table Management
- Create, drop, and list tables and namespaces
- Schema inspection and evolution
- Partition management and analysis
- Table statistics and metadata access
🔍 Data Analysis
- File-level analysis and statistics
- Snapshot and history tracking
- Data quality profiling
- Query optimization insights
🛠️ Maintenance Operations
- Table compaction and optimization
- Snapshot expiration and cleanup
- Metadata management
- Performance monitoring
Quick Start
1. Configuration Setup
# Copy configuration template
cp .env.example .env
# Edit with your credentials
vi .env
2. Run Locally
# Install dependencies
pip install -r requirements.txt
# Start server
python iceberg_server.py
3. Run with Docker
# Build and start
docker-compose up --build
# Access server
curl http://localhost:8077/health
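For scripted checks, the same probe can be made from Python. This is a minimal sketch, assuming the /health endpoint answers with HTTP 200 when the server is up; the response body format is not documented here, so it is ignored. The helper name is_healthy is hypothetical and not part of this project.

```python
import urllib.error
import urllib.request


def is_healthy(url: str = "http://localhost:8077/health", timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout.

    Assumption: a 200 status means the server is healthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling is_healthy() with no arguments targets the default port shown above; pass a different URL if SERVER_PORT has been changed.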
4. Validate Configuration
# Check configuration
python config_check.py
# Show all defaults
python config_check.py --show-defaults
# Validate specific catalog
python config_check.py --catalog polaris
Configuration
The server is configured entirely through environment variables. All settings are defined in the .env file.
Required Settings
For Polaris Catalog:
ICEBERG_POLARIS_URI=https://your-polaris-server.com/polaris/api/catalog
ICEBERG_POLARIS_CREDENTIAL=your_jwt_token
ICEBERG_POLARIS_WAREHOUSE=your_warehouse_name
For AWS Glue Catalog:
ICEBERG_AWS_ACCESS_KEY_ID=your_aws_access_key
ICEBERG_AWS_SECRET_ACCESS_KEY=your_aws_secret_key
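Before starting the server it can help to confirm that the required variables are present. The sketch below is illustrative only and is not how config_check.py is implemented. The variable names come from the lists above; the catalog key "glue" and the helper name missing_settings are assumptions.

```python
import os

# Variable names taken from the "Required Settings" lists above.
# The dictionary keys ("polaris", "glue") are illustrative labels.
REQUIRED_VARS = {
    "polaris": [
        "ICEBERG_POLARIS_URI",
        "ICEBERG_POLARIS_CREDENTIAL",
        "ICEBERG_POLARIS_WAREHOUSE",
    ],
    "glue": [
        "ICEBERG_AWS_ACCESS_KEY_ID",
        "ICEBERG_AWS_SECRET_ACCESS_KEY",
    ],
}


def missing_settings(catalog: str, env=None) -> list:
    """Return the required variables for `catalog` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[catalog] if not env.get(name)]
```

An empty list means every required variable for that catalog has a non-empty value. It does not mean the credentials are valid.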
Optional Settings
# Server configuration
SERVER_HOST=0.0.0.0
SERVER_PORT=8077
SERVER_DEBUG=False
SERVER_TRANSPORT=stdio
# Logging
LOG_LEVEL=INFO
# Spark configuration (for advanced operations)
SPARK_APP_NAME=IcebergMCP
SPARK_PACKAGES=org.apache.hadoop:hadoop-aws:3.3.4,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0
See the configuration documentation for complete details.
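Environment values are always strings, so optional settings need explicit parsing; SERVER_DEBUG=False, for example, is a non-empty string and would be truthy if used directly. The following sketch shows one way to read the server options with fallbacks. It assumes the values listed above are also the defaults, and the names ServerSettings and load_server_settings are hypothetical rather than taken from config.py.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    host: str
    port: int
    debug: bool
    transport: str
    log_level: str


def load_server_settings(env=None) -> ServerSettings:
    """Read optional server settings, falling back to the values shown above."""
    env = os.environ if env is None else env
    return ServerSettings(
        host=env.get("SERVER_HOST", "0.0.0.0"),
        port=int(env.get("SERVER_PORT", "8077")),
        # Parse the flag explicitly: the string "False" would otherwise be truthy.
        debug=env.get("SERVER_DEBUG", "False").strip().lower() in {"1", "true", "yes"},
        transport=env.get("SERVER_TRANSPORT", "stdio"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```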
Project Structure
iceberg_mcp/iceberg/
├── iceberg_server.py # Main MCP server
├── config.py # Configuration management
├── config_check.py # Configuration validation utility
├── requirements.txt # Python dependencies
├── .env.example # Configuration template
├── .gitignore # Git ignore rules
├── dockerfile # Docker build configuration
├── docker-compose.yml # Docker Compose setup
├── models/ # Data models and utilities
│ ├── iceberg_BaseModels.py # Core Pydantic models
│ ├── iceberg_utils.py # Utility functions
│ ├── iceberg_sampling_*.py # MCP sampling integration
│ └── __init__.py
├── tests/ # Test suite
│ ├── test_mcp_server.py # Main server tests
│ ├── test_docker_health.py # Docker health tests
│ ├── test_quick.py # Quick smoke tests
│ └── run_all_tests.py # Test runner
├── analysis/ # Analysis tools and utilities
│ ├── table_rewrite_analysis.py # Table analysis tools
│ └── table_rewrite_streamlit.py # Streamlit dashboard
└── docs/ # Documentation
    ├── README.md # This file
    ├── CLAUDE.md # Claude Code integration guide
    ├── CONFIG_ARCHITECTURE.md # Configuration system details
    └── DOCKER.md # Docker deployment guide
MCP Tools Available
The server provides 20+ MCP tools for Iceberg operations, including:
Table Operations
- list_namespaces - List all namespaces
- list_tables - List tables in namespace
- create_namespace - Create new namespace
- create_table - Create new table
- drop_table - Drop existing table
Metadata Access
- get_table_schema - Get table schema
- get_table_partitions - Get partition information
- get_table_snapshots - Get snapshot history
- get_table_metadata - Get complete metadata
Data Analysis
- analyze_table_files - Analyze data files
- get_table_statistics - Get table statistics
- calculate_table_size - Calculate storage usage
- get_partition_sizes - Analyze partition sizes
Maintenance
- expire_snapshots - Clean up old snapshots
- optimize_table - Compact and optimize tables
- vacuum_table - Remove orphaned files
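MCP clients invoke these tools with a JSON-RPC 2.0 request whose method is tools/call. The sketch below only builds that message to show its shape; in practice an MCP client library sends it after the initialization handshake. The argument name "namespace" and the value "analytics" are hypothetical, since the tools' input schemas are not listed here.

```python
import json
from typing import Optional


def build_tool_call(name: str, arguments: Optional[dict] = None, request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(message)


# Hypothetical arguments: check the tool's input schema for the real names.
request = build_tool_call("list_tables", {"namespace": "analytics"})
```

A client can discover the actual tool names and their input schemas with the tools/list method before calling them.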
Development
Setup Development Environment
# Clone repository
git clone <repository-url>
cd iceberg_mcp/iceberg
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Set up configuration
cp .env.example .env
# Edit .env with your development credentials
Run Tests
# Run all tests
python tests/run_all_tests.py
# Run specific test
python -m pytest tests/test_mcp_server.py
# Run with coverage
python -m pytest --cov=. tests/
Code Quality
# Format code
black iceberg_server.py config.py
# Type checking
mypy iceberg_server.py
# Linting
pylint iceberg_server.py
Docker Deployment
Development
# Build and run
docker-compose up --build
# Run in background
docker-compose up -d
# View logs
docker-compose logs -f iceberg-mcp
Production
# Use production configuration
cp .env.production .env
# Deploy with restart policy
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
# Health check
curl -f http://localhost:8077/health
See the Docker deployment guide for detailed deployment instructions.
Security
Best Practices
- ✅ Store all credentials in the .env file
- ✅ Use different .env files per environment
- ✅ Never commit .env files to version control
- ✅ Use least-privilege IAM policies for AWS
- ✅ Rotate credentials regularly
- ✅ Monitor JWT token expiration for Polaris
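Token expiration can be checked locally without contacting the catalog. The sketch below assumes the Polaris credential is a standard three-part JWT carrying an exp claim (seconds since the Unix epoch). It only decodes the payload and does not verify the signature, so it is suitable for monitoring, not for authentication decisions. The helper name jwt_seconds_remaining is hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_seconds_remaining(token: str, now: Optional[float] = None) -> Optional[float]:
    """Return seconds until the token's `exp` claim, or None if it has no `exp`.

    The signature is NOT verified; this only inspects the payload.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with header, payload, and signature")
    payload_b64 = parts[1]
    # JWTs strip base64 padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = claims.get("exp")
    if exp is None:
        return None
    current = time.time() if now is None else now
    return exp - current
```

A negative result means the token has already expired; a small positive result is a cue to rotate the credential.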
Credential Management
# Development
ICEBERG_POLARIS_CREDENTIAL=dev_jwt_token
# Production (use secret management; run this in a shell, since most .env loaders do not expand $(...))
export ICEBERG_POLARIS_CREDENTIAL=$(aws secretsmanager get-secret-value --secret-id polaris-prod-token --query SecretString --output text)
Troubleshooting
Common Issues
Configuration Problems:
# Validate configuration
python config_check.py
# Check specific catalog
python config_check.py --catalog polaris
Connection Issues:
# Test catalog connectivity
python -c "
from models.iceberg_BaseModels import *
catalog = get_iceberg_catalog('polaris')
print('Connection successful')
"
Docker Issues:
# Check container logs
docker-compose logs iceberg-mcp
# Verify health
curl http://localhost:8077/health
# Check environment
docker exec -it iceberg-mcp-server env | grep ICEBERG
Performance Tuning
- Adjust Docker memory limits in docker-compose.yml
- Configure appropriate log levels (LOG_LEVEL=WARNING for production)
- Use connection pooling for high-throughput scenarios
- Monitor table file counts and sizes
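For the last point, a short summary of data file sizes is often enough to spot tables with many small files. The sketch below works on a plain list of file sizes in bytes. How that list is obtained (for example from the output of analyze_table_files) is an assumption, because the tool's output format is not documented here. The 32 MiB threshold is an arbitrary illustrative value, not a project default.

```python
from statistics import median


def summarize_file_sizes(sizes_bytes, small_threshold=32 * 1024 * 1024):
    """Summarize data file sizes and the share of files below a size threshold."""
    if not sizes_bytes:
        return {"file_count": 0, "total_bytes": 0, "median_bytes": 0, "small_file_ratio": 0.0}
    small = sum(1 for size in sizes_bytes if size < small_threshold)
    return {
        "file_count": len(sizes_bytes),
        "total_bytes": sum(sizes_bytes),
        "median_bytes": median(sizes_bytes),
        "small_file_ratio": small / len(sizes_bytes),
    }
```

A high small_file_ratio suggests the table may benefit from compaction.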
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make changes following the existing patterns
- Add tests for new functionality
- Update documentation as needed
- Run tests and ensure they pass
- Submit a pull request
Development Guidelines
- Follow existing code patterns and naming conventions
- Add comprehensive tests for new features
- Update documentation for any API changes
- Use type hints and proper error handling
- Follow security best practices
License
This project is part of the IcebergMCP suite. See the main project repository for license details.
Support
For issues and questions:
- Check the documentation
- Run configuration validation: python config_check.py
- Review logs for error details
- Open an issue in the main repository