cbioportal-mcp

pickleton89/cbioportal-mcp


The cBioPortal MCP Server is a high-performance, production-ready Model Context Protocol (MCP) server designed to facilitate seamless interaction between AI assistants and cancer genomics data from cBioPortal.

🧬 cBioPortal MCP Server


A high-performance, production-ready Model Context Protocol (MCP) server that lets AI assistants query cancer genomics data from cBioPortal. It is built on an async Python architecture with a modular design, and in the project's own benchmark concurrent fetching is about 4.5x faster than sequential requests.

🌟 Overview & Key Features

🚀 Performance & Architecture

  • ⚡ 4.5x Performance Boost: Full async implementation with concurrent API operations
  • 🏗️ Modular Architecture: Professional structure with 71% code reduction (1,357 → 396 lines)
  • 📦 Modern Package Management: uv-based workflow with pyproject.toml
  • 🔄 Concurrent Operations: Bulk fetching of studies and genes with automatic batching

🔧 Enterprise Features

  • ⚙️ Multi-layer Configuration: CLI args → Environment variables → YAML config → Defaults
  • 📋 Comprehensive Testing: 92 tests across 8 organized test suites with full coverage
  • 🛡️ Input Validation: Robust parameter validation and error handling
  • 📊 Pagination Support: Efficient data retrieval with automatic pagination

🧬 Cancer Genomics Capabilities

  • 🔍 Study Management: Browse, search, and analyze cancer studies
  • 🧪 Molecular Data: Access mutations, clinical data, and molecular profiles
  • 📈 Bulk Operations: Concurrent fetching of multiple entities
  • 🔎 Advanced Search: Keyword-based discovery across studies and genes

🧠🤖 AI-Collaborative Development

This project demonstrates human-AI collaboration in bioinformatics software development:

  • 🧠 Domain Expertise: 20+ years of cancer research experience guided the architecture and feature requirements
  • 🤖 AI Implementation: Code generation, API design, and performance optimization through systematic LLM collaboration
  • 🔄 Quality Assurance: Iterative refinement to meet professional standards and production reliability
  • 📈 Innovation Approach: Shows how domain experts can use AI tools to build enterprise-grade bioinformatics platforms

Methodology: This collaborative approach combines deep biological domain knowledge with AI-powered development capabilities, accelerating innovation while maintaining rigorous code quality and scientific accuracy.

🚀 Quick Start

Prerequisites

  • Python 3.10+ 🐍
  • uv (modern package manager) - recommended 📦
  • Git (optional, for cloning)

⚡ Installation & Launch

# Install uv if needed
pipx install uv

# Clone and setup
git clone https://github.com/pickleton89/cbioportal-mcp.git
cd cbioportal-mcp
uv sync

# Launch server
uv run cbioportal-mcp

That's it! 🎉 Your server is running and ready for AI assistant connections.

📦 Installation Options

🔥 Option 1: uv (Recommended)

Modern, lightning-fast package management with automatic environment handling:

# Install uv
pipx install uv
# Or with Homebrew: brew install uv

# Clone repository
git clone https://github.com/pickleton89/cbioportal-mcp.git
cd cbioportal-mcp

# One-command setup (creates venv + installs dependencies)
uv sync

# Alternative: development mode with all dev dependencies
uv sync --group dev

🐍 Option 2: pip (Traditional)

Standard Python package management approach:

# Create virtual environment
python -m venv cbioportal-mcp-env

# Activate environment
# Windows: cbioportal-mcp-env\Scripts\activate
# macOS/Linux: source cbioportal-mcp-env/bin/activate

# Install dependencies
pip install -e .

⚙️ Configuration

🎛️ Multi-Layer Configuration System

The server supports flexible configuration with priority: CLI args > Environment variables > Config file > Defaults

YAML Configuration 📄

Create config.yaml for persistent settings:

# cBioPortal MCP Server Configuration
server:
  base_url: "https://www.cbioportal.org/api"
  transport: "stdio"
  
logging:
  level: "INFO"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

api:
  timeout: 480
  max_retries: 3
  rate_limit: 100

# Performance settings
performance:
  concurrent_batch_size: 10
  max_concurrent_requests: 20

Environment Variables 🌍

export CBIOPORTAL_BASE_URL="https://custom-instance.org/api"
export CBIOPORTAL_LOG_LEVEL="DEBUG"
export CBIOPORTAL_TIMEOUT=600

CLI Options 💻

# Basic usage
uv run cbioportal-mcp

# Custom configuration
uv run cbioportal-mcp --config config.yaml --log-level DEBUG

# Custom API endpoint
uv run cbioportal-mcp --base-url https://custom-instance.org/api

# Generate example config
uv run cbioportal-mcp --create-example-config
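
The precedence described in this section (CLI args > environment variables > config file > defaults) can be sketched in a few lines of Python. This is a minimal illustration and not the server's actual config.py: `resolve_settings`, `DEFAULTS`, and `ENV_VARS` are hypothetical names, the YAML file is assumed to have been parsed and flattened into a plain dict already, and only the three settings that appear in the environment variable examples are handled.

```python
import os
from typing import Any, Mapping

# Hypothetical defaults, mirroring values from the YAML example in this section.
DEFAULTS = {
    "base_url": "https://www.cbioportal.org/api",
    "log_level": "INFO",
    "timeout": 480,
}

# Environment variable names as shown in the examples in this section.
ENV_VARS = {
    "base_url": "CBIOPORTAL_BASE_URL",
    "log_level": "CBIOPORTAL_LOG_LEVEL",
    "timeout": "CBIOPORTAL_TIMEOUT",
}


def resolve_settings(
    cli_args: Mapping[str, Any],
    file_config: Mapping[str, Any],
    environ: Mapping[str, str] = os.environ,
) -> dict:
    """Merge the four layers, lowest priority first, so later layers win."""
    settings = dict(DEFAULTS)
    # Layer 2: values from the (already parsed and flattened) config file.
    settings.update({k: v for k, v in file_config.items() if v is not None})
    # Layer 3: environment variables override the config file.
    for key, var in ENV_VARS.items():
        if var in environ:
            settings[key] = environ[var]
    # Layer 4: CLI arguments that were actually supplied override everything.
    settings.update({k: v for k, v in cli_args.items() if v is not None})
    # Environment values arrive as strings, so normalize numeric settings.
    settings["timeout"] = int(settings["timeout"])
    return settings


settings = resolve_settings(
    cli_args={"log_level": "DEBUG"},
    file_config={"timeout": 300, "log_level": "WARNING"},
    environ={"CBIOPORTAL_TIMEOUT": "600"},
)
print(settings["log_level"], settings["timeout"])  # DEBUG 600
```

In this example the CLI value wins for the log level, the environment variable wins over the file for the timeout, and the base URL falls through to the default.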

🔌 Usage & Integration

🖥️ Claude Desktop Integration

Configure in your Claude Desktop MCP settings:

{
  "mcpServers": {
    "cbioportal": {
      "command": "uv",
      "args": ["run", "cbioportal-mcp"],
      "cwd": "/path/to/cbioportal-mcp",
      "env": {
        "CBIOPORTAL_LOG_LEVEL": "INFO"
      }
    }
  }
}

🔧 VS Code Integration

Add to your workspace settings:

{
  "mcp.servers": {
    "cbioportal": {
      "command": "uv",
      "args": ["run", "cbioportal-mcp"],
      "cwd": "/path/to/cbioportal-mcp"
    }
  }
}

🏃‍♂️ Command Line Usage

# Development server with debug logging
uv run python cbioportal_server.py --log-level DEBUG

# Production server with custom config
uv run cbioportal-mcp --config production.yaml

# Using custom cBioPortal instance
uv run cbioportal-mcp --base-url https://private-instance.org/api

🏗️ Architecture

📁 Modern Project Structure

cbioportal-mcp/
├── 📊 cbioportal_server.py      # Main MCP server (396 lines - 71% reduction!)
├── 🌐 api_client.py             # Dedicated HTTP client class
├── ⚙️ config.py                 # Multi-layer configuration system
├── 📋 constants.py              # Centralized constants
├── 📁 endpoints/                # Domain-specific API modules
│   ├── 🔬 studies.py           # Cancer studies & search
│   ├── 🧬 genes.py             # Gene operations & mutations
│   ├── 🧪 samples.py           # Sample data management
│   └── 📈 molecular_profiles.py # Molecular & clinical data
├── 📁 utils/                    # Shared utilities
│   ├── 📄 pagination.py        # Efficient pagination logic
│   ├── ✅ validation.py        # Input validation
│   └── 📝 logging.py           # Logging configuration
├── 📁 tests/                    # Comprehensive test suite (92 tests)
├── 📁 docs/                     # Documentation
├── 📁 scripts/                  # Development utilities
└── 📄 pyproject.toml           # Modern Python project config

🎯 Design Principles

  • 🔧 Modular: Clear separation of concerns with domain-specific modules
  • ⚡ Async-First: Full asynchronous implementation for maximum performance
  • 🛡️ Robust: Comprehensive input validation and error handling
  • 🧪 Testable: 92 tests ensuring reliability and preventing regressions
  • 🔄 Maintainable: Clean code architecture, with the main server file reduced by 71% (1,357 → 396 lines)

🛠️ Available Tools

The server provides 12 high-performance tools for AI assistants:

| 🔧 Tool | 📝 Description | ⚡ Features |
| --- | --- | --- |
| get_cancer_studies | List all available cancer studies | 📄 Pagination, 🔍 Filtering |
| search_studies | Search studies by keyword | 🔎 Full-text search, 📊 Sorting |
| get_study_details | Detailed study information | 📈 Comprehensive metadata |
| get_samples_in_study | Samples for specific studies | 📄 Paginated results |
| get_genes | Gene information by ID/symbol | 🏷️ Flexible identifiers |
| search_genes | Search genes by keyword | 🔍 Symbol & name search |
| get_mutations_in_gene | Gene mutations in studies | 🧬 Mutation details |
| get_clinical_data | Patient clinical information | 👥 Patient-centric data |
| get_molecular_profiles | Study molecular profiles | 📊 Profile metadata |
| get_multiple_studies | 🚀 Concurrent study fetching | ⚡ Bulk operations |
| get_multiple_genes | 🚀 Concurrent gene retrieval | 📦 Automatic batching |
| get_gene_panels_for_study | Gene panels in studies | 🧬 Panel information |

🌟 Performance Features

  • ⚡ Concurrent Operations: get_multiple_* methods use asyncio.gather for parallel processing
  • 📦 Smart Batching: Automatic batching for large gene lists
  • 📄 Efficient Pagination: Async generators for memory-efficient data streaming
  • ⏱️ Performance Metrics: Execution timing and batch count reporting
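
As a rough illustration of how batching and asyncio.gather fit together, the sketch below splits a gene list into batches, fetches each batch concurrently, and reports the batch count and execution time. It is not the server's implementation: `fetch_gene` is a stand-in that sleeps instead of calling the API, and `get_multiple_genes_sketch` and the shape of the returned metadata are assumptions made for this example.

```python
import asyncio
import time


async def fetch_gene(gene_id: str) -> dict:
    """Stand-in for a real API call; sleeps briefly to mimic network latency."""
    await asyncio.sleep(0.01)
    return {"gene_id": gene_id}


async def get_multiple_genes_sketch(gene_ids: list[str], batch_size: int = 10) -> dict:
    """Fetch genes batch by batch, running the requests in each batch concurrently."""
    start = time.perf_counter()
    batches = [gene_ids[i:i + batch_size] for i in range(0, len(gene_ids), batch_size)]
    genes = []
    for batch in batches:
        # Requests inside one batch run concurrently; batches run one after another,
        # which caps the number of in-flight requests at batch_size.
        genes.extend(await asyncio.gather(*(fetch_gene(g) for g in batch)))
    return {
        "genes": genes,
        "metadata": {
            "batches": len(batches),
            "execution_time": time.perf_counter() - start,
        },
    }


result = asyncio.run(get_multiple_genes_sketch([f"GENE{i}" for i in range(25)]))
print(result["metadata"]["batches"])  # 3
```

asyncio.gather returns results in the order the awaitables were passed, so the output order matches the input list even though requests within a batch finish in arbitrary order.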

🚀 Performance

📊 Benchmark Results

Our async implementation delivers significant performance improvements:

🏃‍♂️ Sequential Study Fetching:  1.31 seconds (10 studies)
⚡ Concurrent Study Fetching:   0.29 seconds (10 studies)
🎯 Performance Improvement:     4.57x faster!
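
A comparison of this kind can be reproduced in miniature with simulated latency. The sketch below is not the project's benchmark script: `fetch_study` only sleeps, `LATENCY` is an arbitrary assumed value, and the measured speedup depends entirely on those choices, so it will not match the figures above.

```python
import asyncio
import time

LATENCY = 0.02  # simulated per-request latency in seconds (assumption)


async def fetch_study(study_id: str) -> str:
    """Stand-in for one API request."""
    await asyncio.sleep(LATENCY)
    return study_id


async def sequential(ids: list[str]) -> list[str]:
    # Each request waits for the previous one to finish.
    return [await fetch_study(i) for i in ids]


async def concurrent(ids: list[str]) -> list[str]:
    # All requests are started together and awaited as a group.
    return list(await asyncio.gather(*(fetch_study(i) for i in ids)))


def timed(coro_fn, ids):
    """Run one strategy and return its result plus the elapsed wall-clock time."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(ids))
    return result, time.perf_counter() - start


ids = [f"study_{i}" for i in range(10)]
seq_result, seq_time = timed(sequential, ids)
con_result, con_time = timed(concurrent, ids)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s, "
      f"speedup: {seq_time / con_time:.1f}x")
```

Against a real API the gain is usually smaller than in this idealized setup, because server-side rate limits and connection limits bound how many requests actually run in parallel.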

🔥 Async Benefits

  • 🚀 4.5x Faster: Concurrent API requests vs sequential operations
  • 📦 Bulk Processing: Efficient batched operations for multiple entities
  • ⏱️ Non-blocking: Asynchronous I/O prevents request blocking
  • 🧮 Smart Batching: Automatic optimization for large datasets

💡 Performance Tips

  • Use get_multiple_studies for fetching multiple studies concurrently
  • Leverage get_multiple_genes with automatic batching for gene lists
  • Configure concurrent_batch_size in config for optimal performance
  • Monitor execution metrics included in response metadata

👨‍💻 Development

🔨 Development Workflow

# Setup development environment
uv sync --group dev

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=.

# Run specific test file
uv run pytest tests/test_server_lifecycle.py

# Update snapshots
uv run pytest --snapshot-update

# Lint code
uv run ruff check .

# Format code  
uv run ruff format .

🧪 Testing

Comprehensive test suite with 92 tests across 8 categories:

  • 🔄 test_server_lifecycle.py - Server startup/shutdown & tool registration
  • 📄 test_pagination.py - Pagination logic & edge cases
  • 🚀 test_multiple_entity_apis.py - Concurrent operations & bulk fetching
  • ✅ test_input_validation.py - Parameter validation & error handling
  • 📸 test_snapshot_responses.py - API response consistency (syrupy)
  • 💻 test_cli.py - Command-line interface & argument parsing
  • 🛡️ test_error_handling.py - Error scenarios & network issues
  • ⚙️ test_configuration.py - Configuration system validation
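
To give a feel for what a pagination edge-case test can look like, here is a self-contained sketch. It does not reproduce anything from tests/test_pagination.py: `paginate`, `make_fake_fetcher`, and `collect` are hypothetical helpers written for this example, and the async generator stops when it receives a page shorter than the requested size.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# A page fetcher takes (page_number, page_size) and returns one page of items.
FetchPage = Callable[[int, int], Awaitable[list]]


async def paginate(fetch_page: FetchPage, page_size: int = 50) -> AsyncIterator[list]:
    """Yield pages one at a time until a short (or empty) page signals the end."""
    page_number = 0
    while True:
        page = await fetch_page(page_number, page_size)
        if page:
            yield page
        if len(page) < page_size:
            break
        page_number += 1


def make_fake_fetcher(total_items: int) -> FetchPage:
    """Build an in-memory fetcher so the test needs no network access."""
    items = list(range(total_items))

    async def fetch_page(page_number: int, page_size: int) -> list:
        start = page_number * page_size
        return items[start:start + page_size]

    return fetch_page


async def collect(total_items: int, page_size: int) -> list:
    return [page async for page in paginate(make_fake_fetcher(total_items), page_size)]


def test_exact_multiple_of_page_size():
    # Edge case: the last full page is followed by an empty page, which is not yielded.
    assert asyncio.run(collect(total_items=4, page_size=2)) == [[0, 1], [2, 3]]


def test_empty_result():
    assert asyncio.run(collect(total_items=0, page_size=2)) == []
```

Because the generator yields one page at a time, a caller can stop early without fetching the remaining pages, which is the memory benefit of the async-generator approach.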

🛠️ Development Tools

  • 📦 uv: Modern package management (10-100x faster than pip)
  • 🧪 pytest: Testing framework with async support
  • 📸 syrupy: Snapshot testing for API responses
  • 🔍 ruff: Lightning-fast linting and formatting
  • 📊 pytest-cov: Code coverage reporting

🤝 Contributing

  1. 🍴 Fork the repository
  2. 🌿 Create a feature branch (git checkout -b feature/amazing-feature)
  3. ✅ Test your changes (uv run pytest)
  4. 📝 Commit with clear messages (git commit -m 'Add amazing feature')
  5. 🚀 Push to branch (git push origin feature/amazing-feature)
  6. 🔄 Create a Pull Request

🔧 Troubleshooting

🚨 Common Issues

Server Fails to Start

# Check Python version
python --version  # Should be 3.10+

# Verify dependencies
uv sync

# Check for conflicts
uv run python -c "import mcp, httpx, fastmcp; print('Dependencies OK')"

Claude Desktop Connection Issues

  • ✅ Verify paths in MCP configuration are absolute
  • ✅ Check that uv is in your system PATH
  • ✅ Ensure cwd points to project directory
  • ✅ Review Claude Desktop logs for detailed errors
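
The first three checks can be automated with a short script. This is a sketch under assumptions: `check_mcp_server_entry` is a hypothetical helper, and the JSON is assumed to have the same shape as the Claude Desktop example earlier in this document (an "mcpServers" object with "command" and "cwd" keys).

```python
import json
import os
import shutil


def check_mcp_server_entry(config_text: str, server_name: str = "cbioportal") -> list[str]:
    """Return a list of human-readable problems found in one MCP server entry."""
    problems = []
    entry = json.loads(config_text)["mcpServers"][server_name]

    # The working directory should be an absolute path to an existing directory.
    cwd = entry.get("cwd", "")
    if not os.path.isabs(cwd):
        problems.append(f"cwd is not an absolute path: {cwd!r}")
    elif not os.path.isdir(cwd):
        problems.append(f"cwd does not exist: {cwd!r}")

    # The launch command (uv in the example config) must be resolvable on PATH.
    command = entry.get("command", "")
    if shutil.which(command) is None:
        problems.append(f"command not found on PATH: {command!r}")

    return problems


example = """
{
  "mcpServers": {
    "cbioportal": {
      "command": "uv",
      "args": ["run", "cbioportal-mcp"],
      "cwd": "relative/path/to/cbioportal-mcp"
    }
  }
}
"""
for problem in check_mcp_server_entry(example):
    print(problem)
```

Note that a GUI application may see a different PATH than your shell, so a command that resolves here can still be missing when Claude Desktop launches it; the application logs remain the authoritative source.
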

Performance Issues

  • 🔧 Increase concurrent_batch_size in config
  • 🔧 Adjust max_concurrent_requests for your system
  • 🔧 Use get_multiple_* methods for bulk operations
  • 🔧 Monitor network latency to cBioPortal API

Configuration Problems

# Generate example config
uv run cbioportal-mcp --create-example-config

# Validate configuration
uv run cbioportal-mcp --config your-config.yaml --log-level DEBUG

# Check environment variables
env | grep CBIOPORTAL

🌐 API Connectivity

# Test cBioPortal API accessibility
curl https://www.cbioportal.org/api/cancer-types

# Test with custom instance
curl https://your-instance.org/api/studies
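
The same check can be done from Python with only the standard library, which is handy when curl is not available. This is a sketch, not part of the project: `build_url` and `fetch_json` are hypothetical helpers, and the `pageSize` query parameter is assumed to be accepted by the /studies endpoint of the instance you are testing.

```python
import json
import urllib.parse
import urllib.request


def build_url(base_url: str, path: str, **params) -> str:
    """Join the API base URL and an endpoint path, adding optional query parameters."""
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_json(url: str, timeout: float = 30.0):
    """GET a URL and decode the JSON body; raises on network or HTTP errors."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


print(build_url("https://www.cbioportal.org/api", "studies", pageSize=5))
# https://www.cbioportal.org/api/studies?pageSize=5

# Uncomment to perform the actual network request:
# studies = fetch_json(build_url("https://www.cbioportal.org/api", "studies", pageSize=5))
# print(f"Reachable: {len(studies)} studies returned")
```

If the request raises urllib.error.URLError, the problem is likely network or DNS related; an urllib.error.HTTPError instead points at the URL or the instance itself.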

💡 Examples & Use Cases

🔍 Research Queries

"What cancer studies are available for breast cancer research?"
"Search for melanoma studies with genomic data"
"Get mutation data for TP53 in lung cancer studies"
"Find clinical data for patients in the TCGA-BRCA study"
"What molecular profiles are available for pediatric brain tumors?"

🧬 Genomic Analysis

"Compare mutation frequencies between two cancer studies"
"Get all genes in the DNA repair pathway for ovarian cancer"
"Find studies with both RNA-seq and mutation data"
"What are the most frequently mutated genes in glioblastoma?"

📊 Bulk Operations

"Fetch data for multiple cancer studies concurrently"
"Get information for a list of cancer genes efficiently"
"Compare clinical characteristics across multiple studies"
"Retrieve molecular profiles for several cancer types"

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • 🧬 cBioPortal - Open-access cancer genomics data platform
  • 🔗 Model Context Protocol - Enabling seamless AI-tool interactions
  • ⚡ FastMCP - High-performance MCP server framework
  • 📦 uv - Modern Python package management
  • 🤖 AI Collaboration - Demonstrating the power of human-AI partnership in scientific software development

🌟 Built with passion for cancer research and cutting-edge technology! 🧬✨