bio-mcp/bio-mcp-blast
If you are the rightful owner of bio-mcp-blast and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.
Bio-MCP BLAST is an MCP server designed to facilitate AI assistants in performing BLAST sequence similarity searches using natural language.
blastn
Nucleotide-nucleotide BLAST search
blastp
Protein-protein BLAST search
makeblastdb
Create BLAST database from FASTA file
Bio-MCP BLAST
๐ MCP server for NCBI BLAST sequence similarity search
Enable AI assistants to perform BLAST searches through natural language. Search nucleotide and protein databases, create custom databases, and get formatted results instantly.
๐งฌ Features
- blastn - Nucleotide-nucleotide BLAST search
- blastp - Protein-protein BLAST search
- makeblastdb - Create custom BLAST databases
- Multiple output formats - JSON, XML, tabular, pairwise
- Flexible input - File paths or raw sequences
- Queue support - Async processing for large searches
๐ Quick Start
Installation
# Install BLAST+
conda install -c bioconda blast
# Or via package manager
# macOS: brew install blast
# Ubuntu: sudo apt-get install ncbi-blast+
# Install MCP server
git clone https://github.com/bio-mcp/bio-mcp-blast.git
cd bio-mcp-blast
pip install -e .
Basic Usage
# Start the server
python -m src.server
# Or with queue support
python -m src.main --mode queue
Configuration
Add to your MCP client config:
{
"mcpServers": {
"bio-blast": {
"command": "python",
"args": ["-m", "src.server"],
"cwd": "/path/to/bio-mcp-blast"
}
}
}
๐ก Usage Examples
Simple Sequence Search
User: "BLAST this sequence against nr: ATGCGATCGATCG"
AI: [calls blastn] โ Returns top hits with E-values and alignments
File-Based Search
User: "Search proteins.fasta against SwissProt database"
AI: [calls blastp] โ Processes file and returns similarity results
Database Creation
User: "Create a BLAST database from reference_genomes.fasta"
AI: [calls makeblastdb] โ Creates searchable database files
Long-Running Search
User: "BLAST large_dataset.fasta against nt database"
AI: [calls blastn_async] โ "Job submitted! ID: abc123, checking progress..."
๐ ๏ธ Available Tools
blastn
Nucleotide-nucleotide BLAST search
Parameters:
query
(required) - Path to FASTA file or sequence stringdatabase
(required) - Database name (e.g., "nt", "nr") or pathevalue
- E-value threshold (default: 10)max_hits
- Maximum hits to return (default: 50)output_format
- Output format: "tabular", "xml", "json", "pairwise"
blastp
Protein-protein BLAST search
Parameters:
- Same as blastn, but for protein sequences
makeblastdb
Create BLAST database from FASTA file
Parameters:
input_file
(required) - Path to FASTA filedatabase_name
(required) - Name for output databasedbtype
(required) - "nucl" or "prot"title
- Database title (optional)
Async Variants (Queue Mode)
blastn_async
- Submit nucleotide search to queueblastp_async
- Submit protein search to queueget_job_status
- Check job progressget_job_result
- Retrieve completed results
โ๏ธ Configuration
Environment Variables
# Basic settings
export BIO_MCP_MAX_FILE_SIZE=100000000 # 100MB max file size
export BIO_MCP_TIMEOUT=300 # 5 minute timeout
export BIO_MCP_BLAST_PATH="blastn" # BLAST executable path
# Queue mode settings
export BIO_MCP_QUEUE_URL="http://localhost:8000"
Database Setup
# Download common databases
mkdir -p ~/blast-databases
cd ~/blast-databases
# NCBI databases (large downloads!)
update_blastdb.pl --decompress nt
update_blastdb.pl --decompress nr
update_blastdb.pl --decompress swissprot
# Set environment variable
export BLASTDB=~/blast-databases
๐ณ Docker Deployment
Local Docker
# Build image
docker build -t bio-mcp-blast .
# Run container
docker run -p 5000:5000 \
-v ~/blast-databases:/data/blast-db:ro \
-e BLASTDB=/data/blast-db \
bio-mcp-blast
Docker Compose
services:
blast-server:
build: .
ports:
- "5000:5000"
volumes:
- ./databases:/data/blast-db:ro
environment:
- BLASTDB=/data/blast-db
- BIO_MCP_TIMEOUT=600
๐ Queue System
For long-running BLAST searches, use the queue system:
Setup
# Start queue infrastructure
cd ../bio-mcp-queue
./setup-local.sh
# Start BLAST server with queue support
python -m src.main --mode queue --queue-url http://localhost:8000
Usage
# Submit async job
job_info = await blast_server.submit_job(
job_type="blastn",
parameters={
"query": "large_sequences.fasta",
"database": "nt",
"evalue": 0.001
}
)
# Check status
status = await blast_server.get_job_status(job_info["job_id"])
# Get results when complete
results = await blast_server.get_job_result(job_info["job_id"])
๐ Output Formats
Tabular (Default)
# Fields: query_id, subject_id, percent_identity, alignment_length, ...
Query_1 gi|123456 98.5 500 7 0 1 500 1000 1499 1e-180 633
JSON
{
"BlastOutput2": [{
"report": {
"results": {
"search": {
"query_title": "Query_1",
"hits": [...]
}
}
}
}]
}
XML
Standard BLAST XML format for programmatic parsing.
๐งช Testing
# Run tests
pytest tests/ -v
# Test with real data
python tests/test_integration.py
# Performance testing
python tests/benchmark.py
๐ Performance Tips
Local Optimization
- Use SSD storage for databases
- Increase available RAM
- Use multiple CPU cores:
export BLAST_NUM_THREADS=8
Database Selection
- Use smaller, specific databases when possible
- Consider pre-filtering sequences
- Use appropriate E-value thresholds
Queue Optimization
- Scale workers based on CPU cores
- Use separate queues for different database sizes
- Monitor memory usage with large databases
๐ Security
Input Validation
- File size limits prevent resource exhaustion
- Path validation prevents directory traversal
- Command injection protection
Sandboxing
- Containers run as non-root user
- Temporary files isolated per job
- Network access restricted in production
๐ Troubleshooting
Common Issues
BLAST not found
# Check installation
which blastn
blastn -version
# Install via conda
conda install -c bioconda blast
Database not found
# Check BLASTDB environment variable
echo $BLASTDB
# List available databases
blastdbcmd -list /path/to/databases
Out of memory
# Reduce max_target_seqs
blastn -max_target_seqs 100
# Use streaming for large outputs
# Increase system swap space
Timeout errors
# Increase timeout
export BIO_MCP_TIMEOUT=3600 # 1 hour
# Or use queue mode for long searches
python -m src.main --mode queue
๐ Resources
๐ค Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
See for detailed guidelines.
๐ License
MIT License - see file.
๐ Support
- ๐ Bug Reports: GitHub Issues
- ๐ก Feature Requests: GitHub Issues
- ๐ Documentation: Bio-MCP Docs
- ๐ฌ Discussions: GitHub Discussions
Happy BLASTing! ๐งฌ๐