bio-mcp-fastqc

bio-mcp/bio-mcp-fastqc

The Bio-MCP FastQC Server is a Model Context Protocol (MCP) server designed to facilitate quality control analysis of sequencing data using FastQC and MultiQC tools.

Bio-MCP FastQC Server 🔬

Quality Control Analysis via Model Context Protocol

An MCP server that enables AI assistants to run FastQC and MultiQC quality control analysis on sequencing data. Part of the Bio-MCP ecosystem.

🎯 Purpose

FastQC is essential for quality assessment of high-throughput sequencing data. This MCP server allows AI assistants to:

  • Analyze single files - Get detailed QC reports for individual FASTQ/FASTA files
  • Batch process - Run QC on multiple files simultaneously
  • Generate summary reports - Create MultiQC reports combining multiple analyses
  • Handle large datasets - Queue system support for computationally intensive jobs

🚀 Quick Start

Prerequisites

Install FastQC and MultiQC:

# Via conda (recommended)
conda install -c bioconda fastqc multiqc

# Via package managers
# Ubuntu/Debian
sudo apt-get install fastqc
pip install multiqc

# macOS
brew install fastqc
pip install multiqc
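
After installing, it can help to confirm that both executables are discoverable before starting the server. A minimal sketch using only the standard library; `missing_tools` is an illustrative helper name and is not part of this server:

```python
import shutil


def missing_tools(tools=("fastqc", "multiqc")):
    """Return the names of required executables that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("FastQC and MultiQC are both available.")
```

If either tool is reported missing, revisit the install commands above or point the server at the executable explicitly (see Environment Variables).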

Installation

# Clone and install
git clone https://github.com/bio-mcp/bio-mcp-fastqc.git
cd bio-mcp-fastqc
pip install -e .

# Or install directly
pip install git+https://github.com/bio-mcp/bio-mcp-fastqc.git

Claude Desktop Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "bio-fastqc": {
      "command": "python",
      "args": ["-m", "src.server"],
      "cwd": "/path/to/bio-mcp-fastqc"
    }
  }
}
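
If the config file already lists other MCP servers, the entry above has to be merged in rather than pasted over the file. The following is a sketch of one way to do that; `add_fastqc_server` is a hypothetical helper, and the location of `claude_desktop_config.json` depends on your platform, so the path is passed in rather than assumed:

```python
import json
from pathlib import Path


def add_fastqc_server(config_path, repo_dir):
    """Add (or overwrite) the "bio-fastqc" entry while keeping other servers intact."""
    path = Path(config_path)
    # Start from the existing config if there is one, otherwise from an empty dict.
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["bio-fastqc"] = {
        "command": "python",
        "args": ["-m", "src.server"],
        "cwd": str(repo_dir),
    }
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Restart Claude Desktop after editing the file so the new server entry is picked up.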

🔧 Available Tools

Core Analysis Tools

fastqc_single

Run FastQC on a single FASTQ/FASTA file.

Parameters:

  • input_file (required): Path to FASTQ or FASTA file
  • threads (optional): Number of threads (default: 1)
  • contaminants (optional): Path to custom contaminants file
  • adapters (optional): Path to custom adapters file
  • limits (optional): Path to custom limits file
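
These parameters correspond to options of the FastQC command line (`--threads`, `--contaminants`, `--adapters`, `--limits`). The sketch below shows how such a call could be assembled; it is an illustration, not the server's actual code, and `build_fastqc_command` and its `output_dir` argument are assumptions introduced here:

```python
def build_fastqc_command(input_file, output_dir, threads=1,
                         contaminants=None, adapters=None, limits=None):
    """Assemble a FastQC argument list from the tool parameters."""
    cmd = ["fastqc", "--outdir", str(output_dir), "--threads", str(threads)]
    optional = (
        ("--contaminants", contaminants),
        ("--adapters", adapters),
        ("--limits", limits),
    )
    # Only pass the optional files that were actually supplied.
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    cmd.append(str(input_file))
    return cmd


if __name__ == "__main__":
    print(build_fastqc_command("my_sample.fastq.gz", "qc_results"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. FastQC expects the output directory to exist already.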

Example:

User: "Run quality control on my_sample.fastq.gz"
AI: [calls fastqc_single] → Returns detailed QC report with pass/warn/fail status for each module

fastqc_batch

Run FastQC on multiple files in a directory.

Parameters:

  • input_dir (required): Directory containing FASTQ/FASTA files
  • file_pattern (optional): File pattern to match (default: ".fastq")
  • threads (optional): Number of threads (default: 4)
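
FastQC's `--threads` option sets how many files are processed at once, which is why a higher default makes sense for batch runs than for a single file. The sketch below illustrates file selection and a single multi-file invocation. How the server interprets `file_pattern` is not stated here, so a plain substring match on the file name is assumed, and both function names are hypothetical:

```python
from pathlib import Path


def find_inputs(input_dir, file_pattern=".fastq"):
    """Return sorted files whose names contain file_pattern (assumed substring match)."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and file_pattern in p.name
    )


def build_batch_command(files, output_dir, threads=4):
    """Assemble one FastQC call covering all files; threads = files processed in parallel."""
    return ["fastqc", "--outdir", str(output_dir), "--threads", str(threads),
            *[str(f) for f in files]]
```

Under this assumption the default pattern picks up both `sample.fastq` and `sample.fastq.gz`, but not files named `*.fq.gz`.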

Example:

User: "Analyze all fastq files in the data/ directory"
AI: [calls fastqc_batch] → Processes all files and returns summary statistics

multiqc_report

Generate MultiQC report from FastQC results.

Parameters:

  • input_dir (required): Directory containing FastQC and other analysis results
  • title (optional): Custom title for the report
  • comment (optional): Comment to add to the report
  • template (optional): Report template (default, simple, sections, gathered)
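
The parameters above line up with the MultiQC command-line options `--title`, `--comment`, and `--template`. The following is a rough sketch of how they could be turned into a command; `build_multiqc_command` and its `output_dir` argument are illustrative assumptions, not the server's API:

```python
def build_multiqc_command(input_dir, output_dir=None, title=None,
                          comment=None, template=None):
    """Assemble a MultiQC argument list from the tool parameters."""
    cmd = ["multiqc", str(input_dir)]
    if output_dir is not None:
        cmd += ["--outdir", str(output_dir)]
    if title is not None:
        cmd += ["--title", title]
    if comment is not None:
        cmd += ["--comment", comment]
    if template is not None:
        cmd += ["--template", template]
    return cmd


if __name__ == "__main__":
    print(build_multiqc_command("qc_results", title="Run 1", template="simple"))
```

Without `--outdir`, MultiQC writes the report into the current working directory.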

Example:

User: "Create a summary report from all the QC results"
AI: [calls multiqc_report] → Generates interactive HTML report combining all analyses

Queue System Tools (when queue enabled)

For large datasets or batch processing:

  • fastqc_single_async - Queue single file analysis
  • fastqc_batch_async - Queue batch analysis
  • multiqc_report_async - Queue report generation
  • get_job_status - Check job progress
  • get_job_result - Retrieve completed results
  • cancel_job - Cancel running jobs
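
A typical async flow is to submit a job, poll `get_job_status` until it finishes, then call `get_job_result`. The sketch below shows only the polling part. The status strings and the `get_status` callable are assumptions made for illustration; check the queue API for the values it really returns:

```python
import time

# Assumed terminal status names; the real queue may use different ones.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_job(job_id, get_status, poll_seconds=5.0, timeout_seconds=1800.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(job_id) until a terminal state is seen or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {status!r} after {timeout_seconds} s"
            )
        sleep(poll_seconds)
```

`sleep` and `clock` are injectable so the loop can be exercised in tests without real waiting.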

💡 Usage Examples

Basic Quality Assessment

User: "What's the quality of my sequencing data in sample1.fastq.gz?"

AI Response:
✅ FastQC Analysis Complete for sample1

Module Status Summary:
✅ Basic Statistics: PASS
✅ Per base sequence quality: PASS
⚠️ Per sequence quality scores: WARN
✅ Per base sequence content: PASS
✅ Per sequence GC content: PASS
✅ Per base N content: PASS
✅ Sequence Length Distribution: PASS
❌ Sequence Duplication Levels: FAIL
✅ Overrepresented sequences: PASS
✅ Adapter Content: PASS

Basic Statistics:
• Total Sequences: 2,450,000
• Sequences flagged as poor quality: 0
• Sequence length: 150
• %GC: 47
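
The "Basic Statistics" values come from the first module in FastQC's `fastqc_data.txt`, a tab-separated file in which each module starts with a `>>Name<TAB>status` line and ends with `>>END_MODULE`. The following is a minimal parsing sketch; `parse_basic_statistics` is an illustrative name, and the sample text mirrors the example numbers above rather than real output:

```python
def parse_basic_statistics(text):
    """Extract the Basic Statistics module of fastqc_data.txt as a dict of strings."""
    stats = {}
    in_module = False
    for line in text.splitlines():
        if line.startswith(">>Basic Statistics"):
            in_module = True
        elif line.startswith(">>END_MODULE"):
            if in_module:
                break  # only the first module is needed
        elif in_module and line.strip() and not line.startswith("#"):
            key, _, value = line.partition("\t")
            stats[key] = value
    return stats


SAMPLE = (
    ">>Basic Statistics\tpass\n"
    "#Measure\tValue\n"
    "Total Sequences\t2450000\n"
    "Sequences flagged as poor quality\t0\n"
    "Sequence length\t150\n"
    "%GC\t47\n"
    ">>END_MODULE\n"
)

if __name__ == "__main__":
    print(parse_basic_statistics(SAMPLE))
```

Values are kept as strings because some fields, such as a sequence length range, are not plain integers.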

Batch Analysis

User: "Analyze all FASTQ files in my project/raw_data/ folder"

AI Response:
FastQC Batch Analysis Complete
========================================

Processed 12 files:

✅ sample01_R1.fastq.gz: 8P/2W/0F
✅ sample01_R2.fastq.gz: 8P/2W/0F
⚠️ sample02_R1.fastq.gz: 7P/2W/1F
⚠️ sample02_R2.fastq.gz: 7P/2W/1F
...

Overall Summary:
• Total PASS: 94
• Total WARN: 24
• Total FAIL: 2

Tip: Run multiqc_report on this directory to generate a combined report!

Complete Workflow

User: "I have a directory of paired-end FASTQ files. Can you run quality control and create a summary report?"

AI: I'll run a complete QC workflow on your paired-end data:

1. First, let me analyze all FASTQ files in batch:
   [runs fastqc_batch on directory]

2. Now I'll generate a MultiQC summary report:
   [runs multiqc_report on results]

✅ Complete QC workflow finished!

Summary:
- 24 FASTQ files processed (12 samples, paired-end)
- Average quality score: 32.5
- 2 samples have adapter contamination warnings
- 1 sample shows high duplication levels
- Interactive HTML report generated: multiqc_report.html

The MultiQC report provides detailed visualizations of:
- Quality score distributions across all samples
- GC content comparison
- Sequence length distributions
- Adapter content analysis
- Sample correlation analysis

🐳 Docker Usage

Build and Run

# Build the image
docker build -t bio-mcp-fastqc .

# Run with data mounting
docker run -v /path/to/data:/data bio-mcp-fastqc

Docker Compose (with Queue System)

services:
  fastqc-server:
    build: .
    volumes:
      - ./data:/data
    environment:
      - BIO_MCP_QUEUE_URL=http://queue-api:8000
    depends_on:
      - queue-api

⚙️ Configuration

Environment Variables

  • BIO_MCP_FASTQC_PATH - Path to FastQC executable (default: "fastqc")
  • BIO_MCP_MULTIQC_PATH - Path to MultiQC executable (default: "multiqc")
  • BIO_MCP_MAX_FILE_SIZE - Maximum file size in bytes (default: 10GB)
  • BIO_MCP_TIMEOUT - Command timeout in seconds (default: 1800)
  • BIO_MCP_TEMP_DIR - Temporary directory for processing
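
As an illustration of how these variables and their defaults fit together, here is a sketch of a settings loader. It is not the server's actual code: the `Settings` class and `load_settings` are hypothetical, "10GB" is assumed to mean 10 * 1024^3 bytes, and the system temp directory is assumed as the fallback for `BIO_MCP_TEMP_DIR`:

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    fastqc_path: str
    multiqc_path: str
    max_file_size: int  # bytes
    timeout: int        # seconds
    temp_dir: str


def load_settings(env=None):
    """Read the BIO_MCP_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        fastqc_path=env.get("BIO_MCP_FASTQC_PATH", "fastqc"),
        multiqc_path=env.get("BIO_MCP_MULTIQC_PATH", "multiqc"),
        # Assumption: the 10GB default is binary gigabytes.
        max_file_size=int(env.get("BIO_MCP_MAX_FILE_SIZE", 10 * 1024**3)),
        timeout=int(env.get("BIO_MCP_TIMEOUT", 1800)),
        # Assumption: no documented default, so use the system temp directory.
        temp_dir=env.get("BIO_MCP_TEMP_DIR", tempfile.gettempdir()),
    )
```

For example, `load_settings({"BIO_MCP_TIMEOUT": "3600"})` raises the command timeout to one hour and leaves the other values at their defaults.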

Queue System Integration

To enable async processing for large datasets:

from src.server_with_queue import FastQCServerWithQueue

server = FastQCServerWithQueue(queue_url="http://localhost:8000")

📊 Output Files

FastQC generates several output files:

  • HTML Report (*_fastqc.html) - Interactive quality report
  • Data File (fastqc_data.txt) - Raw metrics and statistics
  • Summary File (summary.txt) - Pass/warn/fail status for each module
  • Plots - Various quality plots and charts
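
FastQC packs `summary.txt` and `fastqc_data.txt` into a `*_fastqc.zip` archive next to the HTML report. Each line of `summary.txt` holds a status, a module name, and the input file name, separated by tabs. The sketch below reads that format and produces a compact tally like the one shown under Batch Analysis; the function names and the sample text are illustrative:

```python
from collections import Counter


def parse_summary(text):
    """Map each module name in summary.txt to its PASS/WARN/FAIL status."""
    results = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            status, module = parts[0], parts[1]
            results[module] = status
    return results


def tally(results):
    """Format status counts in the compact form used above, e.g. 8P/2W/0F."""
    counts = Counter(results.values())
    return f"{counts['PASS']}P/{counts['WARN']}W/{counts['FAIL']}F"


SAMPLE = (
    "PASS\tBasic Statistics\tsample1.fastq.gz\n"
    "WARN\tPer sequence quality scores\tsample1.fastq.gz\n"
    "FAIL\tSequence Duplication Levels\tsample1.fastq.gz\n"
)

if __name__ == "__main__":
    print(tally(parse_summary(SAMPLE)))
```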

MultiQC combines these into:

  • MultiQC Report (multiqc_report.html) - Combined interactive report
  • Data Directory (multiqc_data/) - Processed data and statistics
  • General Stats (multiqc_general_stats.txt) - Summary table
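
The general stats table is a tab-separated file inside `multiqc_data/` with one row per sample and a `Sample` column first. Below is a small sketch for loading it with the standard library. The remaining column names differ between MultiQC versions and modules, so the ones in the sample text are made up for illustration:

```python
import csv
import io


def read_general_stats(text):
    """Return {sample name: {column: value}} from the general stats table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["Sample"]: {k: v for k, v in row.items() if k != "Sample"}
        for row in reader
    }


# Hypothetical column names, for illustration only.
SAMPLE = (
    "Sample\tpercent_gc\ttotal_sequences\n"
    "sample01_R1\t47.0\t2450000\n"
    "sample01_R2\t47.0\t2450000\n"
)

if __name__ == "__main__":
    for name, columns in read_general_stats(SAMPLE).items():
        print(name, columns)
```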

🔍 Quality Metrics Explained

FastQC analyzes multiple quality aspects:

Key Modules

  • Per base sequence quality - Quality scores across read positions
  • Per sequence quality scores - Distribution of mean quality scores
  • Per base sequence content - A/T/G/C content across positions
  • Per sequence GC content - GC% distribution vs expected
  • Sequence duplication levels - PCR duplication assessment
  • Adapter content - Contaminating adapter sequences

Status Interpretation

  • ✅ PASS - Analysis indicates no problems
  • ⚠️ WARN - Slightly unusual, may not be problematic
  • ❌ FAIL - Likely problematic, requires attention

🧬 Integration with Bio-MCP Ecosystem

The FastQC server can be combined with other Bio-MCP tools. For example, a preprocessing pipeline can run QC before and after trimming:

User: "Run the complete preprocessing pipeline on my samples"

AI Workflow:
1. fastqc_batch → Initial quality assessment
2. trimmomatic → Trim low-quality bases and adapters  
3. fastqc_batch → Post-trimming QC
4. multiqc_report → Combined before/after report

🤝 Contributing

We welcome contributions! See the Bio-MCP contributing guide.

Development Setup

git clone https://github.com/bio-mcp/bio-mcp-fastqc.git
cd bio-mcp-fastqc
pip install -e ".[dev]"
pytest

📄 License

MIT License - see the LICENSE file.

🙏 Acknowledgments

  • FastQC by Simon Andrews at Babraham Bioinformatics
  • MultiQC by Phil Ewels and the MultiQC community
  • Bio-MCP project and contributors

Part of the Bio-MCP ecosystem - Making bioinformatics accessible to AI assistants.

For more tools: Bio-MCP Organization