bio-mcp/bio-mcp-fastqc
The Bio-MCP FastQC Server is a Model Context Protocol (MCP) server designed to facilitate quality control analysis of sequencing data using FastQC and MultiQC tools.
- fastqc_single - Run FastQC on a single FASTQ/FASTA file.
- fastqc_batch - Run FastQC on multiple files in a directory.
- multiqc_report - Generate MultiQC report from FastQC results.
- fastqc_single_async - Queue single file analysis for large datasets.
- fastqc_batch_async - Queue batch analysis for large datasets.
Bio-MCP FastQC Server 🔬
Quality Control Analysis via Model Context Protocol
An MCP server that enables AI assistants to run FastQC and MultiQC quality control analysis on sequencing data. Part of the Bio-MCP ecosystem.
🎯 Purpose
FastQC is essential for quality assessment of high-throughput sequencing data. This MCP server allows AI assistants to:
- Analyze single files - Get detailed QC reports for individual FASTQ/FASTA files
- Batch process - Run QC on multiple files simultaneously
- Generate summary reports - Create MultiQC reports combining multiple analyses
- Handle large datasets - Queue system support for computationally intensive jobs
Quick Start
Prerequisites
Install FastQC and MultiQC:
# Via conda (recommended)
conda install -c bioconda fastqc multiqc
# Via package managers
# Ubuntu/Debian
sudo apt-get install fastqc
pip install multiqc
# macOS
brew install fastqc
pip install multiqc
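Before configuring the server, it can help to confirm that both executables are actually on PATH. The following is a minimal sketch using only the standard library; the helper name find_missing_tools is hypothetical and not part of this project.

```python
import shutil


def find_missing_tools(tools=("fastqc", "multiqc")):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("FastQC and MultiQC are both available.")
```

If a tool is installed in a non-standard location, the check will report it as missing even though it may still be usable through an explicit path.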
Installation
# Clone and install
git clone https://github.com/bio-mcp/bio-mcp-fastqc.git
cd bio-mcp-fastqc
pip install -e .
# Or install directly
pip install git+https://github.com/bio-mcp/bio-mcp-fastqc.git
Claude Desktop Configuration
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "bio-fastqc": {
      "command": "python",
      "args": ["-m", "src.server"],
      "cwd": "/path/to/bio-mcp-fastqc"
    }
  }
}
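If the config file already lists other MCP servers, the new entry should be merged in rather than pasted over them. A small sketch of that merge is shown here; the function name with_fastqc_server is hypothetical, and the location of claude_desktop_config.json on your system is left for you to supply.

```python
import copy
import json
from pathlib import Path


def with_fastqc_server(config, repo_dir):
    """Return a copy of a parsed config dict with the bio-fastqc entry added."""
    updated = copy.deepcopy(config)
    updated.setdefault("mcpServers", {})["bio-fastqc"] = {
        "command": "python",
        "args": ["-m", "src.server"],
        "cwd": str(repo_dir),
    }
    return updated


def update_config_file(config_path, repo_dir):
    """Read the config file (if present), add the entry, and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(with_fastqc_server(config, repo_dir), indent=2))
```

Working on a copy keeps the original dictionary untouched, which makes it easy to inspect the change before writing the file.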
🔧 Available Tools
Core Analysis Tools
fastqc_single
Run FastQC on a single FASTQ/FASTA file.
Parameters:
- input_file (required): Path to FASTQ or FASTA file
- threads (optional): Number of threads (default: 1)
- contaminants (optional): Path to custom contaminants file
- adapters (optional): Path to custom adapters file
- limits (optional): Path to custom limits file
Example:
User: "Run quality control on my_sample.fastq.gz"
AI: [calls fastqc_single] → Returns detailed QC report with pass/warn/fail status for each module
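The parameters above correspond closely to FastQC's own command-line options (--threads, --contaminants, --adapters, --limits). How the server actually builds its command is not documented here, so the following is only a sketch of one plausible mapping; build_fastqc_command is a hypothetical helper.

```python
def build_fastqc_command(input_file, threads=1, contaminants=None,
                         adapters=None, limits=None, executable="fastqc"):
    """Build a FastQC argument list from the fastqc_single parameters."""
    cmd = [executable, "--threads", str(threads)]
    optional = (
        ("--contaminants", contaminants),
        ("--adapters", adapters),
        ("--limits", limits),
    )
    for flag, value in optional:
        if value is not None:  # only pass options the caller supplied
            cmd += [flag, str(value)]
    cmd.append(str(input_file))
    return cmd
```

Returning a list rather than a shell string means the result can be handed to subprocess.run without shell quoting concerns.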
fastqc_batch
Run FastQC on multiple files in a directory.
Parameters:
- input_dir (required): Directory containing FASTQ/FASTA files
- file_pattern (optional): File pattern to match (default: ".fastq")
- threads (optional): Number of threads (default: 4)
Example:
User: "Analyze all fastq files in the data/ directory"
AI: [calls fastqc_batch] → Processes all files and returns summary statistics
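To illustrate how a batch run might select its inputs, here is a sketch that filters a directory listing by file_pattern and passes every match to a single FastQC invocation (FastQC accepts multiple input files). Treating the pattern as a plain substring of the file name is an assumption; the server may use glob matching instead. Both function names are hypothetical.

```python
from pathlib import Path


def select_inputs(file_names, file_pattern=".fastq"):
    """Keep names containing file_pattern (substring matching is an assumption)."""
    return sorted(name for name in file_names if file_pattern in name)


def build_batch_command(input_dir, file_pattern=".fastq", threads=4,
                        executable="fastqc"):
    """Build one FastQC command covering every matching file in input_dir."""
    directory = Path(input_dir)
    names = [p.name for p in directory.iterdir() if p.is_file()]
    files = [str(directory / name) for name in select_inputs(names, file_pattern)]
    return [executable, "--threads", str(threads), *files]
```

With substring matching, the default pattern also picks up compressed files such as sample01_R1.fastq.gz.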
multiqc_report
Generate MultiQC report from FastQC results.
Parameters:
- input_dir (required): Directory containing FastQC and other analysis results
- title (optional): Custom title for the report
- comment (optional): Comment to add to the report
- template (optional): Report template (default, simple, sections, gathered)
Example:
User: "Create a summary report from all the QC results"
AI: [calls multiqc_report] → Generates interactive HTML report combining all analyses
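These parameters line up with MultiQC's --title, --comment, and --template options. As a sketch of how they could be turned into a command (build_multiqc_command is hypothetical, and the template list is simply the one given above, which may differ between MultiQC versions):

```python
TEMPLATES = ("default", "simple", "sections", "gathered")


def build_multiqc_command(input_dir, title=None, comment=None,
                          template="default", executable="multiqc"):
    """Build a MultiQC argument list from the multiqc_report parameters."""
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    cmd = [executable, str(input_dir), "--template", template]
    if title:
        cmd += ["--title", title]
    if comment:
        cmd += ["--comment", comment]
    return cmd
```

Validating the template name up front gives a clearer error than letting the MultiQC process fail later.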
Queue System Tools (when queue enabled)
For large datasets or batch processing:
- fastqc_single_async - Queue single file analysis
- fastqc_batch_async - Queue batch analysis
- multiqc_report_async - Queue report generation
- get_job_status - Check job progress
- get_job_result - Retrieve completed results
- cancel_job - Cancel running jobs
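A typical client pattern for the queue tools is to submit a job, poll get_job_status until the job finishes, then call get_job_result. The polling step can be sketched as follows. The status strings ("completed", "failed", "cancelled") and the shape of the get_status callable are assumptions for illustration; check the queue API for the real values.

```python
import time


def wait_for_job(get_status, job_id, poll_seconds=5.0,
                 timeout_seconds=1800.0, sleep=time.sleep):
    """Poll get_status(job_id) until a terminal state or the timeout is reached."""
    terminal = {"completed", "failed", "cancelled"}  # hypothetical state names
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status in terminal:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The sleep function is injectable so the loop can be exercised in tests without real delays.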
💡 Usage Examples
Basic Quality Assessment
User: "What's the quality of my sequencing data in sample1.fastq.gz?"
AI Response:
✅ FastQC Analysis Complete for sample1
Module Status Summary:
✅ Basic Statistics: PASS
✅ Per base sequence quality: PASS
⚠️ Per sequence quality scores: WARN
✅ Per base sequence content: PASS
✅ Per sequence GC content: PASS
✅ Per base N content: PASS
✅ Sequence Length Distribution: PASS
❌ Sequence Duplication Levels: FAIL
✅ Overrepresented sequences: PASS
✅ Adapter Content: PASS
Basic Statistics:
• Total Sequences: 2,450,000
• Sequences flagged as poor quality: 0
• Sequence length: 150
• %GC: 47
Batch Analysis
User: "Analyze all FASTQ files in my project/raw_data/ folder"
AI Response:
FastQC Batch Analysis Complete
========================================
Processed 12 files:
✅ sample01_R1.fastq.gz: 8P/2W/0F
✅ sample01_R2.fastq.gz: 8P/2W/0F
⚠️ sample02_R1.fastq.gz: 7P/2W/1F
⚠️ sample02_R2.fastq.gz: 7P/2W/1F
...
Overall Summary:
• Total PASS: 94
• Total WARN: 24
• Total FAIL: 2
Tip: Run multiqc_report on this directory to generate a combined report!
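The compact per-file tallies in the example output (such as 8P/2W/0F) can be derived from a file's list of module statuses. A minimal sketch, with hypothetical helper names:

```python
from collections import Counter


def tally(statuses):
    """Format module statuses as a compact tally like '8P/2W/0F'."""
    counts = Counter(status.upper() for status in statuses)
    return f"{counts['PASS']}P/{counts['WARN']}W/{counts['FAIL']}F"


def overall(per_file_statuses):
    """Sum PASS/WARN/FAIL counts across all files."""
    totals = Counter()
    for statuses in per_file_statuses:
        totals.update(status.upper() for status in statuses)
    return {key: totals[key] for key in ("PASS", "WARN", "FAIL")}
```

Counter returns zero for statuses that never occur, so a file with no failures still reports 0F.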
Complete Workflow
User: "I have a directory of paired-end FASTQ files. Can you run quality control and create a summary report?"
AI: I'll run a complete QC workflow on your paired-end data:
1. First, let me analyze all FASTQ files in batch:
[runs fastqc_batch on directory]
2. Now I'll generate a MultiQC summary report:
[runs multiqc_report on results]
✅ Complete QC workflow finished!
Summary:
- 24 FASTQ files processed (12 samples, paired-end)
- Average quality score: 32.5
- 2 samples have adapter contamination warnings
- 1 sample shows high duplication levels
- Interactive HTML report generated: multiqc_report.html
The MultiQC report provides detailed visualizations of:
- Quality score distributions across all samples
- GC content comparison
- Sequence length distributions
- Adapter content analysis
- Sample correlation analysis
🐳 Docker Usage
Build and Run
# Build the image
docker build -t bio-mcp-fastqc .
# Run with data mounting
docker run -v /path/to/data:/data bio-mcp-fastqc
Docker Compose (with Queue System)
services:
  fastqc-server:
    build: .
    volumes:
      - ./data:/data
    environment:
      - BIO_MCP_QUEUE_URL=http://queue-api:8000
    depends_on:
      - queue-api
⚙️ Configuration
Environment Variables
- BIO_MCP_FASTQC_PATH - Path to FastQC executable (default: "fastqc")
- BIO_MCP_MULTIQC_PATH - Path to MultiQC executable (default: "multiqc")
- BIO_MCP_MAX_FILE_SIZE - Maximum file size in bytes (default: 10GB)
- BIO_MCP_TIMEOUT - Command timeout in seconds (default: 1800)
- BIO_MCP_TEMP_DIR - Temporary directory for processing
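As an illustration of how these variables and their defaults fit together, the sketch below reads them into a plain dictionary. It is not the server's actual settings code; load_settings is hypothetical, and interpreting 10GB as 10 * 1024**3 bytes is an assumption.

```python
import os


def load_settings(env=None):
    """Read the BIO_MCP_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "fastqc_path": env.get("BIO_MCP_FASTQC_PATH", "fastqc"),
        "multiqc_path": env.get("BIO_MCP_MULTIQC_PATH", "multiqc"),
        # Assumption: "10GB" means 10 GiB expressed in bytes.
        "max_file_size": int(env.get("BIO_MCP_MAX_FILE_SIZE", 10 * 1024**3)),
        "timeout": int(env.get("BIO_MCP_TIMEOUT", 1800)),
        "temp_dir": env.get("BIO_MCP_TEMP_DIR"),  # None when unset
    }
```

Passing a dictionary instead of relying on os.environ makes it simple to try out different configurations.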
Queue System Integration
To enable async processing for large datasets:
from src.server_with_queue import FastQCServerWithQueue
server = FastQCServerWithQueue(queue_url="http://localhost:8000")
Output Files
FastQC generates several output files:
- HTML Report (*_fastqc.html) - Interactive quality report
- Data File (fastqc_data.txt) - Raw metrics and statistics
- Summary File (summary.txt) - Pass/warn/fail status for each module
- Plots - Various quality plots and charts
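The data file is plain text organized into modules, each opened by a line starting with ">>" and closed by ">>END_MODULE", with tab-separated rows in between. As a sketch of how the Basic Statistics values could be pulled out (parse_basic_statistics is a hypothetical helper, not part of this server):

```python
def parse_basic_statistics(text):
    """Extract the Basic Statistics module of fastqc_data.txt as a dict."""
    stats = {}
    in_module = False
    for line in text.splitlines():
        if line.startswith(">>Basic Statistics"):
            in_module = True
        elif line.startswith(">>END_MODULE"):
            if in_module:
                break  # finished the module we care about
        elif in_module and line.strip() and not line.startswith("#"):
            key, _, value = line.partition("\t")
            stats[key] = value
    return stats
```

Values are returned as strings; convert fields such as Total Sequences to int where needed.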
MultiQC combines these into:
- MultiQC Report (multiqc_report.html) - Combined interactive report
- Data Directory (multiqc_data/) - Processed data and statistics
- General Stats (multiqc_general_stats.txt) - Summary table
Quality Metrics Explained
FastQC analyzes multiple quality aspects:
Key Modules
- Per base sequence quality - Quality scores across read positions
- Per sequence quality scores - Distribution of mean quality scores
- Per base sequence content - A/T/G/C content across positions
- Per sequence GC content - GC% distribution vs expected
- Sequence duplication levels - PCR duplication assessment
- Adapter content - Contaminating adapter sequences
Status Interpretation
- ✅ PASS - Analysis indicates no problems
- ⚠️ WARN - Slightly unusual, may not be problematic
- ❌ FAIL - Likely problematic, requires attention
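FastQC records these statuses in summary.txt, one tab-separated line per module in the form status, module name, file name. A short sketch of reading that file and listing the modules that need attention follows; both helper names are hypothetical.

```python
def parse_summary(text):
    """Map each module name to its PASS/WARN/FAIL status from summary.txt content."""
    result = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:  # skip blank or malformed lines
            status, module = parts[0], parts[1]
            result[module] = status
    return result


def failed_modules(summary):
    """Return the module names whose status is FAIL, sorted alphabetically."""
    return sorted(module for module, status in summary.items() if status == "FAIL")
```

A WARN or FAIL is a prompt to look closer rather than a verdict, since some library types are expected to trip certain modules.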
🧬 Integration with Bio-MCP Ecosystem
FastQC works seamlessly with other Bio-MCP tools:
User: "Run the complete preprocessing pipeline on my samples"
AI Workflow:
1. fastqc_batch → Initial quality assessment
2. trimmomatic → Trim low-quality bases and adapters
3. fastqc_batch → Post-trimming QC
4. multiqc_report → Combined before/after report
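One way to judge whether trimming helped is to compare module statuses from the first and second FastQC passes. The sketch below assumes each pass has been reduced to a dictionary of module name to status; status_changes is a hypothetical helper and not an existing Bio-MCP tool.

```python
def status_changes(before, after):
    """Return {module: (old, new)} for modules whose status changed after trimming."""
    return {
        module: (before[module], after[module])
        for module in before
        if module in after and before[module] != after[module]
    }


if __name__ == "__main__":
    pre = {"Adapter Content": "FAIL", "Per base sequence quality": "PASS"}
    post = {"Adapter Content": "PASS", "Per base sequence quality": "PASS"}
    for module, (old, new) in status_changes(pre, post).items():
        print(f"{module}: {old} -> {new}")
```

Modules present in only one of the two passes are ignored, so the comparison stays valid if the two runs used different module sets.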
Contributing
We welcome contributions! See the Bio-MCP contributing guide.
Development Setup
git clone https://github.com/bio-mcp/bio-mcp-fastqc.git
cd bio-mcp-fastqc
pip install -e ".[dev]"
pytest
License
MIT License - see the LICENSE file.
Acknowledgments
- FastQC by Simon Andrews at Babraham Bioinformatics
- MultiQC by Phil Ewels and the MultiQC community
- Bio-MCP project and contributors
Part of the Bio-MCP ecosystem - Making bioinformatics accessible to AI assistants.
For more tools: Bio-MCP Organization