QuentinCody/uniprot-mcp-server
A comprehensive Model Context Protocol server for UniProt and EBI Proteins APIs, built on Cloudflare Workers with advanced data staging capabilities using Durable Objects and SQLite.
UniProt & Proteins API MCP Server
Overview
This MCP server provides unified access to:
- UniProtKB: Search and retrieve protein sequence and functional information
- EBI Proteins API: Detailed protein features, variations, and structural data
Key Features
- Unified Interface: Single tool for searching UniProt and fetching detailed protein data
- Advanced Data Staging: Large datasets are automatically staged in SQLite for complex queries
- Smart Query Generation: Automatic suggestions for exploring staged data
- Intelligent Bypassing: Small datasets are returned directly for efficiency
- Scalable Architecture: Built on Cloudflare Workers with Durable Objects
- Rate Limit Aware: Intelligent handling of API rate limits
Tools Available
UniProt Database Tools
uniprot_search
Advanced UniProtKB search with comprehensive filtering and pagination:
- Query: Complex search queries with UniProt syntax
- Formats: JSON, TSV, FASTA, XML
- Features: Sorting, facets, compression, isoforms
- Pagination: Up to 500 results per page with automatic staging for large datasets
{
"query": "organism_id:9606 AND reviewed:true",
"format": "json",
"fields": "accession,protein_name,gene_names,organism_name",
"size": 100,
"sort": "score desc",
"compressed": true
}
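These arguments map naturally onto the query parameters of the public UniProtKB REST `search` endpoint, which uses the same names. The sketch below shows that translation. `SearchArgs`, `buildSearchUrl`, and `nextPageUrl` are hypothetical names and are not the server's code. UniProt paginates with a cursor and advertises the next page in a `Link` response header marked `rel="next"`.

```typescript
// Hypothetical argument shape mirroring the uniprot_search example above.
interface SearchArgs {
  query: string;
  format?: "json" | "tsv" | "fasta" | "xml";
  fields?: string;
  size?: number;
  sort?: string;
  compressed?: boolean;
}

const UNIPROTKB_BASE = "https://rest.uniprot.org/uniprotkb";

// Build a UniProtKB search URL; the page size is clamped to 1..500.
function buildSearchUrl(args: SearchArgs): string {
  const url = new URL(`${UNIPROTKB_BASE}/search`);
  url.searchParams.set("query", args.query);
  url.searchParams.set("format", args.format ?? "json");
  if (args.fields) url.searchParams.set("fields", args.fields);
  if (args.size !== undefined) {
    url.searchParams.set("size", String(Math.min(Math.max(args.size, 1), 500)));
  }
  if (args.sort) url.searchParams.set("sort", args.sort);
  if (args.compressed) url.searchParams.set("compressed", "true");
  return url.toString();
}

// Extract the next-page URL from a header like: <https://...&cursor=...>; rel="next"
function nextPageUrl(linkHeader: string | null): string | null {
  if (!linkHeader) return null;
  const match = linkHeader.match(/<([^>]+)>;\s*rel="next"/);
  return match ? match[1] : null;
}
```

A paging loop keeps fetching `nextPageUrl(response.headers.get("Link"))` until it returns `null`. That is also a natural point for a tool to decide whether to stage the accumulated results.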
uniprot_stream
Bulk download tool for large datasets with automatic staging:
- Purpose: Stream large datasets efficiently
- Auto-staging: Always stages responses for SQL querying
- Compression: Built-in compression support
- Formats: JSON, TSV, FASTA, XML
{
"query": "organism_id:9606 AND reviewed:true",
"format": "fasta",
"compressed": true
}
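UniProt's `stream` endpoint returns every matching entry in a single response instead of paging, which is why a bulk tool pairs naturally with staging. The sketch assumes the tool calls that endpoint directly. `buildStreamUrl` and `countFastaRecords` are hypothetical helpers. The second one is a cheap sanity check on a decompressed FASTA payload before it is staged.

```typescript
const UNIPROTKB_STREAM = "https://rest.uniprot.org/uniprotkb/stream";

// Build a stream URL: no size or cursor, because all results come back at once.
function buildStreamUrl(query: string, format: string, compressed = false): string {
  const url = new URL(UNIPROTKB_STREAM);
  url.searchParams.set("query", query);
  url.searchParams.set("format", format);
  if (compressed) url.searchParams.set("compressed", "true");
  return url.toString();
}

// Count FASTA records: each record starts with a header line beginning with ">".
function countFastaRecords(fasta: string): number {
  let count = 0;
  for (const line of fasta.split(/\r?\n/)) {
    if (line.startsWith(">")) count++;
  }
  return count;
}
```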
uniprot_entry
Retrieve individual UniProtKB entries by accession:
- Direct Access: Get specific protein entries
- Multiple Formats: JSON, TSV, FASTA, XML
- Isoforms: Include protein isoforms
- Field Selection: Choose specific data fields
{
"accession": "P04637",
"format": "json",
"fields": "accession,protein_name,sequence,organism_name",
"include_isoforms": true
}
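Entry lookups expect an accession such as `P04637`, not an entry name such as `P53_HUMAN`. The sketch below validates input against the accession pattern UniProt publishes, with an optional isoform suffix like `-2`, before building an entry URL. `buildEntryUrl` is a hypothetical helper and only handles the `format` parameter.

```typescript
// UniProt's published accession pattern, anchored, plus an optional isoform suffix.
const ACCESSION_RE =
  /^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})(?:-\d+)?$/;

function isUniProtAccession(value: string): boolean {
  return ACCESSION_RE.test(value);
}

// Hypothetical helper: URL for one UniProtKB entry in the requested format.
function buildEntryUrl(accession: string, format = "json"): string {
  if (!isUniProtAccession(accession)) {
    throw new Error(`Not a UniProtKB accession: ${accession}`);
  }
  const url = new URL(`https://rest.uniprot.org/uniprotkb/${accession}`);
  url.searchParams.set("format", format);
  return url.toString();
}
```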
uniprot_id_mapping
Map IDs between different database systems:
- Batch Processing: Up to 100,000 IDs per job
- Cross-Database: Map between UniProt, Ensembl, PDB, etc.
- Job-Based: Asynchronous processing with status tracking
- Filtering: Taxonomy-based filtering
{
"from_db": "Gene_Name",
"to_db": "UniProtKB",
"ids": ["TP53", "BRCA1", "BRCA2"],
"taxon_id": "9606"
}
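UniProt's ID-mapping service is asynchronous. A job is submitted as a form POST to `https://rest.uniprot.org/idmapping/run` and then tracked at `https://rest.uniprot.org/idmapping/status/{jobId}`. This sketch only prepares the submission and the status URL. The function names are hypothetical. The taxonomy form field name `taxId` is an assumption that should be checked against UniProt's ID-mapping documentation.

```typescript
// Hypothetical argument shape mirroring the uniprot_id_mapping example above.
interface IdMappingArgs {
  from_db: string;
  to_db: string;
  ids: string[];
  taxon_id?: string;
}

const MAX_IDS_PER_JOB = 100_000; // per-job limit quoted in the tool description
const IDMAPPING_BASE = "https://rest.uniprot.org/idmapping";

// Form body for POST {IDMAPPING_BASE}/run: trimmed, de-duplicated, comma-separated IDs.
function buildIdMappingForm(args: IdMappingArgs): URLSearchParams {
  const ids = Array.from(new Set(args.ids.map((id) => id.trim()).filter((id) => id.length > 0)));
  if (ids.length === 0) throw new Error("No IDs to map");
  if (ids.length > MAX_IDS_PER_JOB) {
    throw new Error(`${ids.length} IDs exceed the ${MAX_IDS_PER_JOB}-ID job limit; split the request`);
  }
  const form = new URLSearchParams();
  form.set("from", args.from_db);
  form.set("to", args.to_db);
  form.set("ids", ids.join(","));
  if (args.taxon_id) form.set("taxId", args.taxon_id); // field name is an assumption
  return form;
}

// URL to poll for a submitted job's progress.
function idMappingStatusUrl(jobId: string): string {
  return `${IDMAPPING_BASE}/status/${encodeURIComponent(jobId)}`;
}
```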
uniprot_blast
Perform BLAST searches against UniProtKB:
- Programs: BLASTP, BLASTX, TBLASTN
- Databases: UniProtKB, UniRef90, UniRef50
- Parameters: E-value, matrix, hit limits
- Async Processing: Job-based with polling
{
"program": "blastp",
"sequence": "MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSSWRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFYLKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFYYEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGDAGEGEN",
"database": "uniprotkb",
"threshold": 0.001,
"hits": 50
}
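BLAST and ID mapping both follow a submit-then-poll pattern. The polling loop below is written against a caller-supplied status function, so it does not depend on any particular service. The `JobStatus` values, interval, and attempt limit are hypothetical. Real job services report their own status strings, and a wrapper would translate them.

```typescript
// Hypothetical normalized statuses; a wrapper maps each service's own strings onto these.
type JobStatus = "QUEUED" | "RUNNING" | "FINISHED" | "FAILED";

interface PollOptions {
  intervalMs?: number;
  maxAttempts?: number;
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call checkStatus until the job reaches a terminal state or attempts run out.
async function pollJob(
  checkStatus: () => Promise<JobStatus>,
  { intervalMs = 5000, maxAttempts = 60, sleep = defaultSleep }: PollOptions = {},
): Promise<JobStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status === "FINISHED" || status === "FAILED") return status;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Job not finished after ${maxAttempts} status checks`);
}
```

Injecting `sleep` keeps the loop testable and lets a Worker swap in its own scheduling.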
EBI Proteins API Tools
proteins_api_details
Detailed protein information from EBI Proteins API:
- Rich Data: Sequence, functional annotations, isoforms
- Formats: JSON, XML
- Isoforms: Include alternative protein isoforms
{
"accession": "P04637",
"format": "json",
"include_isoforms": true
}
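The EBI Proteins API selects the response format through the HTTP `Accept` header rather than a query parameter. It serves a canonical entry (`/proteins/{accession}`) and its isoforms (`/proteins/{accession}/isoforms`) as separate resources. The sketch shows one way `format` and `include_isoforms` could become requests. `buildDetailsRequests` is a hypothetical helper.

```typescript
const PROTEINS_API = "https://www.ebi.ac.uk/proteins/api";

// The response format is negotiated with the Accept header.
const ACCEPT_BY_FORMAT: Record<string, string> = {
  json: "application/json",
  xml: "application/xml",
};

// One request for the entry, plus one for its isoforms when requested.
function buildDetailsRequests(accession: string, format = "json", includeIsoforms = false): Request[] {
  const accept = ACCEPT_BY_FORMAT[format];
  if (!accept) throw new Error(`Unsupported format: ${format}`);
  const paths = [`/proteins/${encodeURIComponent(accession)}`];
  if (includeIsoforms) paths.push(`/proteins/${encodeURIComponent(accession)}/isoforms`);
  return paths.map((path) => new Request(`${PROTEINS_API}${path}`, { headers: { Accept: accept } }));
}
```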
proteins_api_features
Protein sequence features and annotations:
- Categories: Domains, sites, regions, PTMs
- Formats: JSON, XML, GFF
- Filtering: Specific feature categories
{
"accession": "P04637",
"categories": ["DOMAINS_AND_SITES", "PTM"],
"format": "json"
}
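Feature responses are lists of typed annotations grouped into categories such as `DOMAINS_AND_SITES` and `PTM`. The sketch below summarizes such a list and selects features that overlap a residue range. The `ProteinFeature` interface declares only an assumed subset of the API's fields. Positions are parsed from strings because the API may return them that way, and non-numeric positions are skipped.

```typescript
// Assumed minimal shape of one feature; only the fields used below are declared.
interface ProteinFeature {
  type: string; // e.g. "DOMAIN", "MOD_RES"
  category: string; // e.g. "DOMAINS_AND_SITES", "PTM"
  begin: string;
  end: string;
  description?: string;
}

// Count features per category and type, e.g. { PTM: { MOD_RES: 2 } }.
function summarizeFeatures(features: ProteinFeature[]): Record<string, Record<string, number>> {
  const summary: Record<string, Record<string, number>> = {};
  for (const f of features) {
    const byType = (summary[f.category] ??= {});
    byType[f.type] = (byType[f.type] ?? 0) + 1;
  }
  return summary;
}

// Features overlapping [start, end] (1-based, inclusive); non-numeric positions are skipped.
function featuresInRange(features: ProteinFeature[], start: number, end: number): ProteinFeature[] {
  return features.filter((f) => {
    const b = Number(f.begin);
    const e = Number(f.end);
    return Number.isFinite(b) && Number.isFinite(e) && b <= end && e >= start;
  });
}
```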
proteins_api_variation
Protein sequence variations and disease variants:
- Sources: UniProt, large-scale studies
- Consequences: Missense, nonsense, synonymous
- Disease Filter: Disease-associated variants only
- Clinical Data: ClinVar, COSMIC integration
{
"accession": "P04637",
"sources": ["uniprot", "large_scale_studies"],
"consequences": ["missense", "nonsense"],
"disease_filter": true
}
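Disease filtering can be treated as a predicate over variant records. This sketch assumes each record carries a `consequenceType` string and an optional `association` list whose entries may be flagged `disease: true`. That shape is an assumption, not documented behavior. So is the way the tool maps arguments like `nonsense` onto the API's own consequence vocabulary. `filterVariants` and `variantLabel` are hypothetical names.

```typescript
// Assumed minimal shape of one variation record.
interface VariantFeature {
  begin: string;
  wildType?: string;
  alternativeSequence?: string;
  consequenceType?: string;
  association?: { name?: string; disease?: boolean }[];
}

// Keep variants whose consequence is in `consequences` (all if empty),
// and, when diseaseOnly is set, that have at least one disease association.
function filterVariants(variants: VariantFeature[], consequences: string[], diseaseOnly: boolean): VariantFeature[] {
  const wanted = new Set(consequences.map((c) => c.toLowerCase()));
  return variants.filter((v) => {
    const consequence = (v.consequenceType ?? "").toLowerCase();
    if (wanted.size > 0 && !wanted.has(consequence)) return false;
    if (diseaseOnly && !(v.association ?? []).some((a) => a.disease === true)) return false;
    return true;
  });
}

// Compact one-letter label such as "R175H".
function variantLabel(v: VariantFeature): string {
  return `${v.wildType ?? "?"}${v.begin}${v.alternativeSequence ?? "?"}`;
}
```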
proteins_api_proteomics
Proteomics data from various studies:
- Studies: PeptideAtlas, MaxQB, ProteomicsDB
- Tissues: Brain, liver, heart, etc.
- Quantitative: Expression levels and modifications
{
"accession": "P04637",
"tissues": ["brain", "liver"],
"format": "json"
}
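Sequence coverage is a common way to summarize proteomics evidence for a single protein. It is the fraction of residues observed in at least one peptide. The sketch merges overlapping peptide spans before counting. `PeptideSpan` is an assumed minimal shape with numeric, 1-based inclusive positions. Convert the API's values first if they arrive as strings.

```typescript
// Assumed shape: one observed peptide's position on the protein (1-based, inclusive).
interface PeptideSpan {
  begin: number;
  end: number;
}

// Fraction of residues covered by at least one peptide, merging overlaps first.
function sequenceCoverage(peptides: PeptideSpan[], proteinLength: number): number {
  if (proteinLength <= 0) return 0;
  const spans = peptides
    .map((p) => ({ begin: Math.max(1, p.begin), end: Math.min(proteinLength, p.end) }))
    .filter((p) => p.begin <= p.end)
    .sort((a, b) => a.begin - b.begin);
  let covered = 0;
  let curBegin = 0;
  let curEnd = -1; // empty merged span
  for (const s of spans) {
    if (s.begin > curEnd + 1) {
      // Gap: close the current merged span (if any) and start a new one.
      if (curEnd >= curBegin) covered += curEnd - curBegin + 1;
      curBegin = s.begin;
      curEnd = s.end;
    } else {
      curEnd = Math.max(curEnd, s.end); // overlapping or adjacent: extend
    }
  }
  if (curEnd >= curBegin) covered += curEnd - curBegin + 1;
  return covered / proteinLength;
}
```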
proteins_api_genome
Genome coordinate mappings:
- Assemblies: GRCh38, GRCh37
- Coordinates: Protein to genomic position mapping
- Exon Structure: Gene structure information
{
"accession": "P04637",
"assembly": "GRCh38",
"format": "json"
}
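Protein-to-genome mapping comes down to walking the coding exons in transcript order. The sketch assumes the coding part of each exon is available as 1-based inclusive genomic coordinates. `CodingSegment` is a hypothetical shape, not the API's response format. With that input, the first base of a residue's codon can be located as shown. On the minus strand, positions count down from each segment's end.

```typescript
// Hypothetical input: coding part of one exon, genomic 1-based inclusive, in transcript order.
interface CodingSegment {
  start: number;
  end: number;
}

// Genomic position of the first base of the codon for `residue` (1-based).
function residueToGenome(residue: number, segments: CodingSegment[], strand: "+" | "-"): number {
  if (!Number.isInteger(residue) || residue < 1) throw new RangeError("residue must be a positive integer");
  let offset = (residue - 1) * 3; // 0-based offset into the concatenated coding sequence
  for (const seg of segments) {
    const segLength = seg.end - seg.start + 1;
    if (offset < segLength) {
      // Plus strand reads left to right; minus strand reads from the segment's end.
      return strand === "+" ? seg.start + offset : seg.end - offset;
    }
    offset -= segLength;
  }
  throw new RangeError(`residue ${residue} lies beyond the coding sequence`);
}
```

Only the codon's first base is returned. A codon that spans an exon boundary has its remaining bases in the next segment.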
Data Management Tools
data_manager
Query, analyze, and manage staged datasets:
- Operations: Query, schema, cleanup, export
- SQL Interface: Full SQLite support
- Export: JSON, CSV, TSV formats
- Analytics: Built-in query suggestions
{
"operation": "query",
"data_access_id": "uniprot_1234567890_abc123",
"sql": "SELECT * FROM protein WHERE JSON_EXTRACT(data, '$.organism.scientificName') = 'Homo sapiens' LIMIT 10"
}
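Staged data lives in a SQLite-backed Durable Object, so a `query` operation ultimately runs SQL through the object's storage API (`ctx.storage.sql.exec` in Cloudflare Workers). The sketch codes against a small structural interface so it can be exercised outside Workers. The row cap and the SELECT/WITH-only check are illustrative choices, not documented behavior of `data_manager`. `runStagedQuery` is a hypothetical name.

```typescript
// Structural stand-in for Cloudflare's SqlStorage (ctx.storage.sql).
interface SqlStorageLike {
  exec(query: string, ...bindings: unknown[]): { toArray(): Record<string, unknown>[] };
}

// Run a read-only query over staged data, capping the rows returned.
function runStagedQuery(sql: SqlStorageLike, query: string, maxRows = 100): Record<string, unknown>[] {
  const trimmed = query.trim().replace(/;+\s*$/, ""); // drop trailing semicolons
  if (!/^(select|with)\b/i.test(trimmed)) {
    throw new Error("Only SELECT/WITH queries are accepted by this sketch");
  }
  // Newlines keep a trailing "-- comment" in the user query from swallowing the ")".
  return sql.exec(`SELECT * FROM (\n${trimmed}\n) LIMIT ?`, maxRows).toArray();
}
```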
Quick Start
1. Setup
npm install
2. Development
npm run dev
The server will be available at:
- MCP Endpoint: http://localhost:8787/mcp
- SSE Endpoint: http://localhost:8787/sse
3. Testing Examples
Search for Human Proteins
{
"jsonrpc": "2.0",
"id": 1,
"method": "tools/call",
"params": {
"name": "uniprot_query",
"arguments": {
"operation": "search",
"query": "organism_id:9606 AND reviewed:true",
"limit": 10
}
}
}
Get Protein Details
{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/call",
"params": {
"name": "uniprot_query",
"arguments": {
"operation": "protein_details",
"accession": "P04637"
}
}
}
Stage and Query Multiple Proteins
{
"jsonrpc": "2.0",
"id": 3,
"method": "tools/call",
"params": {
"name": "data_manager",
"arguments": {
"operation": "fetch_and_stage",
"accessions": "P04637,Q92793",
"fields": "accession,protein_name,gene_names,organism_name"
}
}
}
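These examples show only the `tools/call` message. Over MCP's Streamable HTTP transport, a client first sends an `initialize` request. It then includes the `Mcp-Session-Id` header if the server assigned one, and it must advertise that it accepts both JSON and server-sent-event responses. The sketch below builds such requests for the local `/mcp` endpoint. `toolCall` and `mcpRequestInit` are hypothetical helpers, and the initialization handshake itself is omitted.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Wrap a tool invocation in a JSON-RPC 2.0 envelope with a fresh id.
function toolCall(toolName: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name: toolName, arguments: args } };
}

// fetch() options for a Streamable HTTP POST; the session header is added only once known.
function mcpRequestInit(body: JsonRpcRequest, sessionId?: string): RequestInit {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return { method: "POST", headers, body: JSON.stringify(body) };
}

// Usage (not executed here):
//   const res = await fetch("http://localhost:8787/mcp",
//     mcpRequestInit(toolCall("data_manager", { operation: "query" }), sessionId));
```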
Data Staging & SQL Querying
For large datasets, the server automatically stages data in SQLite tables within Durable Objects, enabling complex analytical queries:
Automatic Table Creation
Data is normalized into tables like:
- proteins: Core protein information
- gene_names: Gene names and synonyms
- features: Protein sequence features
- keywords: Functional keywords
- references: Literature references
Example SQL Queries
-- Query staged JSON using SQLite JSON1
SELECT
json_extract(data, '$.primaryAccession') as accession,
json_extract(data, '$.genes[0].geneName.value') as gene_name,
json_extract(data, '$.sequence.length') as length
FROM protein
WHERE json_extract(data, '$.organism.scientificName') = 'Homo sapiens'
LIMIT 10;
API Endpoints and Rate Limits
UniProtKB REST API
- Base URL: https://rest.uniprot.org/uniprotkb/
- Rate Limits: IP-based, ~3 requests/second recommended
- Formats: JSON, TSV, FASTA, GFF, XML
EBI Proteins API
- Base URL: https://www.ebi.ac.uk/proteins/api/
- Rate Limits: ~10 requests/second per IP
- Authentication: None required for public data
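A simple way to stay under per-host limits like these is to space requests at a fixed minimum interval. `IntervalThrottle` is a hypothetical helper, not part of the server. The rates plugged in below reuse the approximate figures listed above. The injectable clock keeps it testable.

```typescript
// Hypothetical minimum-interval throttle: each caller waits until its reserved slot.
class IntervalThrottle {
  private nextSlot = 0;
  private readonly intervalMs: number;
  private readonly now: () => number;

  constructor(requestsPerSecond: number, now: () => number = Date.now) {
    if (!(requestsPerSecond > 0)) throw new RangeError("requestsPerSecond must be positive");
    this.intervalMs = 1000 / requestsPerSecond;
    this.now = now;
  }

  // Reserve the next free slot and return how many ms to wait before sending.
  reserve(): number {
    const current = this.now();
    const slot = Math.max(current, this.nextSlot);
    this.nextSlot = slot + this.intervalMs;
    return slot - current;
  }
}

// One throttle per upstream host, using the approximate rates listed above.
const uniprotThrottle = new IntervalThrottle(3);
const proteinsApiThrottle = new IntervalThrottle(10);

// Usage inside an async request path (not executed here):
//   await new Promise((resolve) => setTimeout(resolve, uniprotThrottle.reserve()));
```

In Workers, module-level state is per isolate. A throttle like this therefore spaces requests only within one isolate, not across all instances.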
Architecture
Components
- UniProtMCP: Main MCP agent implementing ToolContext interface
- ToolRegistry: Manages and registers all available tools
- JsonToSqlDO: Durable Object for data staging and SQL operations
- ChunkingEngine: Handles large dataset chunking for efficient processing
- DataInsertionEngine: Optimized bulk data insertion with conflict resolution
- SchemaInferenceEngine: Automatic schema discovery and documentation
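This README does not show how ChunkingEngine and DataInsertionEngine split work. The sketch below illustrates one common approach: batching rows so each multi-row `INSERT` stays under a bound-parameter limit. The default of 100 parameters is an assumption, so check the limit of the SQLite host you deploy to. `chunkForInsert` and `insertSql` are hypothetical names.

```typescript
// Split rows so that rows-per-chunk × columns stays within maxParams bound parameters.
function chunkForInsert<T>(rows: T[], columnsPerRow: number, maxParams = 100): T[][] {
  if (columnsPerRow < 1 || columnsPerRow > maxParams) throw new RangeError("columnsPerRow out of range");
  const rowsPerChunk = Math.floor(maxParams / columnsPerRow);
  const chunks: T[][] = [];
  for (let i = 0; i < rows.length; i += rowsPerChunk) chunks.push(rows.slice(i, i + rowsPerChunk));
  return chunks;
}

// Multi-row INSERT with one "(?, ?, ...)" group per row; identifiers are double-quoted.
function insertSql(table: string, columns: string[], rowCount: number): string {
  const quote = (id: string) => `"${id.replace(/"/g, '""')}"`;
  const group = `(${columns.map(() => "?").join(", ")})`;
  return `INSERT INTO ${quote(table)} (${columns.map(quote).join(", ")}) VALUES ${Array(rowCount).fill(group).join(", ")}`;
}
```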
Data Flow
- Request: Tool receives search/fetch request
- API Call: Fetches data from UniProt/Proteins APIs
- Parsing: Normalizes JSON responses into structured entities
- Staging Decision: Determines if staging is beneficial
- Storage: Creates optimized SQLite tables in Durable Objects
- Querying: Enables complex SQL analysis of staged data
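The Staging Decision step can be modeled as a small policy function over result size. The thresholds and the `StagingPolicy` shape below are hypothetical, because the server's actual cut-offs are not documented here. The `forceStage` flag models tools such as `uniprot_stream` that always stage.

```typescript
// Hypothetical policy knobs; not the server's documented thresholds.
interface StagingPolicy {
  maxInlineEntries: number;
  maxInlineBytes: number;
}

type StagingDecision = { mode: "inline" } | { mode: "stage"; reason: string };

// UTF-8 size of a value once serialized as JSON.
function byteSize(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).byteLength;
}

function decideStaging(entryCount: number, payloadBytes: number, policy: StagingPolicy, forceStage = false): StagingDecision {
  if (forceStage) return { mode: "stage", reason: "tool always stages" };
  if (entryCount > policy.maxInlineEntries) {
    return { mode: "stage", reason: `${entryCount} entries exceed ${policy.maxInlineEntries}` };
  }
  if (payloadBytes > policy.maxInlineBytes) {
    return { mode: "stage", reason: `${payloadBytes} bytes exceed ${policy.maxInlineBytes}` };
  }
  return { mode: "inline" }; // small result: return it directly
}
```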
Deployment
Cloudflare Workers
npm run deploy
Configuration
Ensure wrangler.jsonc includes:
- Durable Object bindings for UniProtMCP and JsonToSqlDO
- Node.js compatibility flags
- Proper migration configuration
Environment Variables
No API keys are required: both UniProt and EBI Proteins APIs are open access.
Connect to Claude Desktop
You can connect to your remote MCP server from Claude Desktop using the mcp-remote proxy.
Update your Claude Desktop configuration (for a deployed server, replace the local URL with https://your-uniprot-server.workers.dev/sse):
{
"mcpServers": {
"uniprot": {
"command": "npx",
"args": [
"mcp-remote",
"http://localhost:8787/sse"
]
}
}
}
Rate Limiting Strategy
See the repository's rate-limiting documentation for detailed information about:
- API-specific rate limits and best practices
- Intelligent request throttling and retry logic
- Monitoring and optimization strategies
- Bulk operation handling
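Retry logic for rate-limited responses usually combines two rules. Honor a `Retry-After` header when the server sends one (either seconds or an HTTP date). Otherwise, back off exponentially with jitter. The sketch below combines both rules. `retryDelayMs`, `isRetryable`, and the default base and cap values are hypothetical choices, not the server's documented strategy.

```typescript
// Statuses treated as "slow down and retry" in this sketch.
function isRetryable(status: number): boolean {
  return status === 429 || status === 503;
}

// Delay in ms before retry number `attempt` (1-based).
// Retry-After wins when present (delta-seconds or HTTP date); otherwise use
// "full jitter" backoff: random in [0, min(cap, base * 2^(attempt - 1))).
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  { baseMs = 500, capMs = 30_000, random = Math.random } = {},
): number {
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
    const date = Date.parse(retryAfter);
    if (!Number.isNaN(date)) return Math.max(0, date - Date.now());
  }
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.floor(random() * ceiling);
}
```

Full jitter spreads retries from concurrent callers so they do not all hit the API again at the same instant.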
Examples
Research Workflow: Cancer-related Proteins
- Search for cancer-related proteins:
{
"operation": "search",
"query": "keyword:Cancer AND organism_id:9606",
"limit": 100
}
- Stage for analysis:
{
"operation": "fetch_and_stage",
"accessions": "P04637,P38398,P51587"
}
- Analyze with SQL:
SELECT
p.accession,
p.protein_name,
COUNT(DISTINCT f.feature_id) as feature_count,
GROUP_CONCAT(DISTINCT k.keyword) as keywords
FROM proteins p
LEFT JOIN features f ON p.accession = f.accession
LEFT JOIN keywords k ON p.accession = k.accession
-- Filter via subquery so the LEFT JOINs still return every keyword of matching proteins
WHERE p.accession IN (SELECT accession FROM keywords WHERE keyword LIKE '%cancer%')
GROUP BY p.accession
ORDER BY feature_count DESC;
Protein Family Analysis
- Search protein family:
{
"operation": "search",
"query": "family:\"protein kinase\" AND reviewed:true",
"fields": "accession,protein_name,gene_names,ec"
}
- Get detailed features:
{
"operation": "protein_features",
"accession": "P06493",
"features": "DOMAIN,BINDING,ACT_SITE"
}
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly with both APIs
- Submit a pull request
License
MIT License - see LICENSE file for details.