muwenbo/phenotype-analysis-toolset-mcp
Phenotype Analysis Toolset
An AI-powered phenotype analysis system that provides semantic search and structured workflows for mapping clinical symptoms to HPO (Human Phenotype Ontology) terms. It is built on the FastMCP framework and supports both Chinese and English medical terminology through RAG-based analysis.
Key Features
🔍 Advanced Semantic Search
- Vector-based HPO term matching using FAISS + VoyageAI embeddings
- Similarity search in which clinically relevant matches typically score 0.5-0.6
- Support for both Chinese and English medical terminology
🧬 Comprehensive Database Queries
- Bidirectional mappings between phenotypes, genes, and diseases
- 6 core relationship query functions with >100K annotations
- Real-time database health monitoring
🌐 Multilingual Clinical Analysis
- Chinese phenotype analysis with translation and standardization
- English phenotype analysis with streamlined processing
- Structured 4-step workflows with confidence scoring
⚙️ Robust Architecture
- FastMCP-based server with path-independent operations
- SQLite database with comprehensive test suite
- Graceful error handling and API key management
Project Structure
.
├── mcp_server.py # Main FastMCP server with 11 tools
├── api_server.py # Alternative FastAPI REST interface
├── hpo_annotations.db # SQLite database with phenotype data
├── embeddings/voyage_3/ # FAISS vector store for semantic search
├── data/ # Raw HPO and gene-disease data files
│ ├── phenotype.hpoa # HPO annotations (100K+ entries)
│ ├── hp.json # HPO ontology structure
│ ├── genes_to_disease.txt # Gene-disease mappings
│ └── phenotype_to_genes.txt # Phenotype-gene associations
├── src/ # Core analysis modules
│ ├── phenotype_analysis_rag.py # RAG-based phenotype processing
│ ├── embedding.py # Vector embedding utilities
│ └── utils.py # Common utilities
├── scripts/ # Database and embedding setup
│ ├── create_db.py # Initialize SQLite database
│ ├── populate_db.py # Load data into database
│ ├── generate_embeddings.py # Create FAISS vector store
│ └── update_gene_ids.py # Update gene identifiers
├── tests/ # Comprehensive test suite
│ ├── test_search_hpo_terms.py # Vector search validation
│ ├── test_voyage_api_key_config.py # API key configuration
│ └── test_path_independence.py # Cross-directory functionality
└── test_search_hpo_terms.py # Main HPO search testing script
Available MCP Tools
🔍 Core Database Queries
- get_genes_by_hpo(hpo_id) - Get genes associated with HPO term
- get_hpo_by_gene(gene_id) - Get HPO terms for specific gene
- get_diseases_by_gene(gene_id) - Get diseases associated with gene
- get_genes_by_disease(disease_id) - Get genes linked to disease
- get_diseases_by_hpo(hpo_id) - Get diseases for HPO term
- get_hpo_by_disease(disease_id) - Get HPO terms for disease
- get_hpo_name_by_id(hpo_id) - Resolve HPO ID to human-readable name
🧠 Semantic Search & Analysis
- search_hpo_for_symptom(english_symptom, k=5) - Vector-based HPO term matching
- chinese_phenotype_analysis_workflow() - 4-step Chinese clinical text analysis
- english_phenotype_analysis_workflow() - Streamlined English phenotype analysis
⚙️ System Management
- get_server_status() - Database health, embeddings status, table statistics
- get_api_key_configuration() - VOYAGE_API_KEY setup and validation
Quick Start
1. Installation
git clone <repository-url>
cd phenotype_analysis_toolset
pip install -r requirements.txt
2. API Key Configuration
Get your VoyageAI API key from voyage.ai and configure it:
Option A: Environment Variable
export VOYAGE_API_KEY='your_voyage_api_key_here'
Option B: .env File
echo "VOYAGE_API_KEY=your_voyage_api_key_here" > .env
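The two options above can be combined in a small loader. The following is a minimal sketch, not the project's actual code: the function name `load_voyage_api_key` is hypothetical, and letting the environment variable take precedence over the `.env` file is an assumption.

```python
import os
from pathlib import Path


def load_voyage_api_key(env_file=".env"):
    """Return VOYAGE_API_KEY from the environment, else from a .env file, else None.

    Hypothetical helper; the precedence (environment first) is an assumption.
    """
    key = os.environ.get("VOYAGE_API_KEY")
    if key:
        return key
    path = Path(env_file)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "VOYAGE_API_KEY":
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None
```

Calling `load_voyage_api_key()` from the project root would return the key if either option has been set up, and `None` otherwise.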
3. Database Setup
# Create and populate database (one-time setup)
python3 scripts/create_db.py
python3 scripts/populate_db.py
python3 scripts/update_gene_ids.py
# Generate semantic search embeddings (requires API key)
python3 scripts/generate_embeddings.py
4. Run the Server
# Start FastMCP server (primary interface)
python3 mcp_server.py
# OR start FastAPI server (REST interface)
uvicorn api_server:app --reload --port 8000
5. Test the Setup
# Validate vector search functionality
python3 test_search_hpo_terms.py
# Test from different directory (path independence)
cd /tmp && python3 /path/to/phenotype_analysis_toolset/test_search_hpo_terms.py
Usage Examples
Basic Database Queries
# Get genes associated with developmental delay
result = get_genes_by_hpo("HP:0001263")
# Returns: {"hpo_id": "HP:0001263", "hpo_name": "Global developmental delay", "genes": [...]}
# Get HPO terms for a specific gene
result = get_hpo_by_gene("2260")
# Returns: {"ncbi_gene_id": "2260", "gene_symbol": "FGFR1", "hpo_terms": [...]}
Semantic Search
# Search for HPO terms matching English symptoms
result = search_hpo_for_symptom("developmental delay", k=5)
# Returns top 5 HPO candidates with similarity scores
# Example result:
{
"symptom": "developmental delay",
"candidates": [
{
"hpo_id": "http://purl.obolibrary.org/obo/HP_0001263",
"hpo_name": "Global developmental delay",
"similarity_score": 0.534,
"description": "..."
}
]
}
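Note that the example result reports `hpo_id` as a full PURL, while the database query examples take the short form `HP:0001263`. Whether the query tools also accept the PURL form is not stated here, so a caller may want to normalize first. A minimal sketch, where `normalize_hpo_id` is a hypothetical helper and not part of the toolset:

```python
import re

# HPO identifiers are "HP" followed by seven digits, joined by ":" or "_".
_HPO_PATTERN = re.compile(r"HP[_:](\d{7})$")


def normalize_hpo_id(value):
    """Convert a PURL or 'HP_0001263' style identifier to 'HP:0001263'."""
    match = _HPO_PATTERN.search(value.strip())
    if match is None:
        raise ValueError(f"not an HPO identifier: {value!r}")
    return f"HP:{match.group(1)}"


if __name__ == "__main__":
    print(normalize_hpo_id("http://purl.obolibrary.org/obo/HP_0001263"))
```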
Clinical Text Analysis Workflows
Chinese Phenotype Analysis:
workflow = chinese_phenotype_analysis_workflow()
# Returns 4-step workflow for Chinese clinical text:
# 1. Extract symptoms with Chinese → English translation
# 2. Vector search for each symptom
# 3. LLM-based HPO term selection
# 4. Compile results with confidence scoring
English Phenotype Analysis:
workflow = english_phenotype_analysis_workflow()
# Returns streamlined 4-step workflow for English text:
# 1. Extract and standardize symptoms
# 2. Vector search for each symptom
# 3. Select best HPO matches
# 4. Compile results
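Steps 2 to 4 of either workflow can be pictured as a loop over extracted symptoms. The sketch below is illustrative only: `map_symptoms_to_hpo` and `fake_search` are hypothetical names, `fake_search` stands in for `search_hpo_for_symptom` so the example runs without an API key, the second candidate's score is made up, and picking the highest similarity score replaces the LLM-based selection described for step 3.

```python
def map_symptoms_to_hpo(symptoms, search, k=5):
    """Search each symptom, keep the best-scoring candidate, compile the results."""
    results = []
    for symptom in symptoms:
        candidates = search(symptom, k)["candidates"]
        # Simplification: highest similarity wins (the real workflow uses an LLM here).
        best = max(candidates, key=lambda c: c["similarity_score"], default=None)
        results.append({"symptom": symptom, "best_match": best})
    return results


def fake_search(symptom, k=5):
    """Stand-in for search_hpo_for_symptom, returning the same result shape."""
    catalog = {
        "developmental delay": [
            {"hpo_id": "HP:0001270", "hpo_name": "Motor delay",
             "similarity_score": 0.41},  # illustrative score, not real output
            {"hpo_id": "HP:0001263", "hpo_name": "Global developmental delay",
             "similarity_score": 0.534},
        ],
    }
    return {"symptom": symptom, "candidates": catalog.get(symptom, [])[:k]}


if __name__ == "__main__":
    for row in map_symptoms_to_hpo(["developmental delay", "unlisted symptom"], fake_search):
        print(row)
```

Passing the search function as a parameter keeps the loop testable without the vector store.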
System Status Monitoring
status = get_server_status()
# Returns comprehensive system health check:
{
"status": "healthy",
"database": {"exists": true, "size_mb": 45.2, "tables": {...}},
"embeddings": {"vector_store": "loaded successfully", "api_key_status": "configured"},
"server_info": {"framework": "FastMCP", "python_version": "3.12.7"}
}
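The `database` portion of such a status report could be assembled with the standard library alone. This is a sketch under assumptions, not the server's implementation: `database_status` is a hypothetical name, and only the `exists`, `size_mb`, and `tables` keys from the example above are reproduced.

```python
import os
import sqlite3


def database_status(db_path):
    """Report whether the SQLite file exists, its size in MB, and row counts per table."""
    if not os.path.isfile(db_path):
        # Check first: sqlite3.connect() would silently create a missing file.
        return {"exists": False, "size_mb": 0.0, "tables": {}}
    size_mb = round(os.path.getsize(db_path) / (1024 * 1024), 1)
    tables = {}
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )]
        for name in names:
            # Names come from sqlite_master itself, so quoting them is sufficient.
            tables[name] = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
    finally:
        conn.close()
    return {"exists": True, "size_mb": size_mb, "tables": tables}
```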
Data Sources & Statistics
- HPO Annotations: 100,000+ phenotype-disease associations from phenotype.hpoa
- Gene-Disease Mappings: Comprehensive OMIM and database cross-references
- Phenotype-Gene Links: Curated associations with NCBI gene identifiers
- Vector Embeddings: 10,000+ HPO terms with VoyageAI semantic embeddings
- Supported Languages: English and Chinese medical terminology
Architecture Details
Vector Search Engine
- Model: VoyageAI voyage-3 embeddings (1024 dimensions)
- Index: FAISS for high-performance similarity search
- Performance: Sub-second search across 10K+ HPO terms
- Typical scores: clinically relevant matches usually score 0.5-0.6 in similarity
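To make the similarity scores concrete, here is a pure-Python stand-in for what a vector index does at query time. It is not the FAISS code path used by the project, and the choice of cosine similarity is an assumption (the metric the index actually uses is not stated here). The vectors are toy two-dimensional values, not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k=5):
    """Return the k (term_id, score) pairs most similar to the query vector."""
    scored = [(term_id, cosine_similarity(query, vec)) for term_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    toy_index = {"term_a": [1.0, 0.0], "term_b": [0.0, 1.0], "term_c": [1.0, 1.0]}
    print(top_k([1.0, 0.0], toy_index, k=2))
```

This brute-force scan is linear in the number of terms; a FAISS index serves the same role with much better performance at scale.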
Database Schema
-- Core tables with relationship mappings
hpo_annotations -- HPO_ID → Disease mappings (100K+ rows)
genes_to_disease -- Gene → Disease relationships
genes_to_phenotype -- Gene → HPO bidirectional mappings
phenotype_to_genes -- HPO → Gene reverse mappings
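The bidirectional lookups can be illustrated against an in-memory database. Everything below is a sketch: the column names (`hpo_id`, `hpo_name`, `ncbi_gene_id`, `gene_symbol`) are assumed from the result shapes in the usage examples, the helper names are hypothetical, the single row is toy data, and one table serves both directions for brevity.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one toy row (column names are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE phenotype_to_genes ("
        "hpo_id TEXT, hpo_name TEXT, ncbi_gene_id TEXT, gene_symbol TEXT)"
    )
    conn.execute(
        "INSERT INTO phenotype_to_genes VALUES (?, ?, ?, ?)",
        ("HP:0001263", "Global developmental delay", "2260", "FGFR1"),
    )
    return conn


def genes_for_hpo(conn, hpo_id):
    """HPO -> genes direction, using a parameterized query."""
    rows = conn.execute(
        "SELECT DISTINCT ncbi_gene_id, gene_symbol FROM phenotype_to_genes "
        "WHERE hpo_id = ? ORDER BY gene_symbol",
        (hpo_id,),
    ).fetchall()
    return {"hpo_id": hpo_id,
            "genes": [{"ncbi_gene_id": g, "gene_symbol": s} for g, s in rows]}


def hpo_for_gene(conn, ncbi_gene_id):
    """Gene -> HPO terms direction."""
    rows = conn.execute(
        "SELECT DISTINCT hpo_id, hpo_name FROM phenotype_to_genes "
        "WHERE ncbi_gene_id = ? ORDER BY hpo_id",
        (ncbi_gene_id,),
    ).fetchall()
    return {"ncbi_gene_id": ncbi_gene_id,
            "hpo_terms": [{"hpo_id": h, "hpo_name": n} for h, n in rows]}


if __name__ == "__main__":
    db = build_demo_db()
    print(genes_for_hpo(db, "HP:0001263"))
    print(hpo_for_gene(db, "2260"))
```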
Workflow Processing
- Chinese Pipeline: Extract → Translate → Standardize → Search → Select → Map
- English Pipeline: Extract → Standardize → Search → Select → Map
- Confidence Thresholds: 0.7+ on the workflow's confidence score for reliable clinical mappings
- Error Handling: Graceful degradation with detailed error reporting
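The threshold and graceful-degradation points can be combined in one compile step. This is a hedged sketch rather than the toolset's logic: `compile_mappings` and the `analyze` callable are hypothetical, and each mapping is assumed to carry a `confidence` field on the same scale as the 0.7 threshold.

```python
def compile_mappings(symptoms, analyze, threshold=0.7):
    """Split per-symptom mappings into accepted, needs_review, and errors.

    `analyze` is a hypothetical callable returning a dict with a "confidence" key.
    """
    accepted, needs_review, errors = [], [], []
    for symptom in symptoms:
        try:
            mapping = analyze(symptom)
        except Exception as exc:
            # Graceful degradation: record the failure and keep processing the rest.
            errors.append({"symptom": symptom, "error": str(exc)})
            continue
        bucket = accepted if mapping["confidence"] >= threshold else needs_review
        bucket.append(mapping)
    return {"accepted": accepted, "needs_review": needs_review, "errors": errors}


def demo_analyze(symptom):
    """Toy analyzer with made-up confidence values, for illustration only."""
    table = {"symptom one": 0.82, "symptom two": 0.55}
    if symptom not in table:
        raise KeyError(f"no mapping for {symptom}")
    return {"symptom": symptom, "confidence": table[symptom]}


if __name__ == "__main__":
    print(compile_mappings(["symptom one", "symptom two", "symptom three"], demo_analyze))
```

One failed symptom therefore ends up in `errors` without discarding the mappings that succeeded.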