pubmed-search-mcp

u9401066/pubmed-search-mcp

PubMed Search MCP is a Domain-Driven Design (DDD) based MCP server that acts as an intelligent research assistant for AI agents, providing task-oriented literature search and analysis capabilities.

PubMed Search MCP

PyPI version Python 3.10+ License: Apache 2.0 MCP Test Coverage

Professional Literature Research Assistant for AI Agents - More than just an API wrapper

A Domain-Driven Design (DDD) based MCP server that serves as an intelligent research assistant for AI agents, providing task-oriented literature search and analysis capabilities.

✨ What's Included:

  • 🔧 40 MCP Tools - Streamlined PubMed, Europe PMC, CORE, NCBI database access, and Research Timeline
  • 📚 22 Claude Skills - Ready-to-use workflow guides for AI agents (Claude Code-specific)
  • 📖 Copilot Instructions - VS Code GitHub Copilot integration guide


🚀 Quick Install

Prerequisites

  • Python 3.10+ (Download)
  • uv (recommended) — Install uv
    # macOS / Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
    # Windows
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    
  • NCBI Email — Required by NCBI API policy. Any valid email address.
  • NCBI API Key (optional): raises the rate limit from 3 req/s to 10 req/s. Get one from https://www.ncbi.nlm.nih.gov/account/settings/

Install & Run

# Option 1: Zero-install with uvx (recommended for trying out)
uvx pubmed-search-mcp

# Option 2: Add as project dependency
uv add pubmed-search-mcp

# Option 3: pip install
pip install pubmed-search-mcp

⚙️ Configuration

This MCP server works with any MCP-compatible AI tool. Choose your preferred client:

VS Code / Cursor (.vscode/mcp.json)

{
  "servers": {
    "pubmed-search": {
      "type": "stdio",
      "command": "uvx",
      "args": ["pubmed-search-mcp"],
      "env": {
        "NCBI_EMAIL": "your@email.com"
      }
    }
  }
}

Claude Desktop (claude_desktop_config.json)

{
  "mcpServers": {
    "pubmed-search": {
      "command": "uvx",
      "args": ["pubmed-search-mcp"],
      "env": {
        "NCBI_EMAIL": "your@email.com"
      }
    }
  }
}

Config file location:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Claude Code

claude mcp add pubmed-search -- uvx pubmed-search-mcp

Or add to .mcp.json in your project root:

{
  "mcpServers": {
    "pubmed-search": {
      "command": "uvx",
      "args": ["pubmed-search-mcp"],
      "env": {
        "NCBI_EMAIL": "your@email.com"
      }
    }
  }
}

Zed AI (settings.json)

Zed editor (zed.dev) supports MCP servers natively. Add to your Zed settings.json:

{
  "context_servers": {
    "pubmed-search": {
      "command": "uvx",
      "args": ["pubmed-search-mcp"],
      "env": {
        "NCBI_EMAIL": "your@email.com"
      }
    }
  }
}

Tip: Open Command Palette → zed: open settings to edit, or go to Agent Panel → Settings → "Add Custom Server".

OpenClaw 🦞 (~/.openclaw/openclaw.json)

OpenClaw uses MCP servers via the mcp-adapter plugin. Install the adapter first:

openclaw plugins install mcp-adapter

Then add to ~/.openclaw/openclaw.json:

{
  "plugins": {
    "entries": {
      "mcp-adapter": {
        "enabled": true,
        "config": {
          "servers": [
            {
              "name": "pubmed-search",
              "transport": "stdio",
              "command": "uvx",
              "args": ["pubmed-search-mcp"],
              "env": {
                "NCBI_EMAIL": "your@email.com"
              }
            }
          ]
        }
      }
    }
  }
}

Restart the gateway after configuration:

openclaw gateway restart
openclaw plugins list  # Should show: mcp-adapter | loaded

Cline (cline_mcp_settings.json)

{
  "mcpServers": {
    "pubmed-search": {
      "command": "uvx",
      "args": ["pubmed-search-mcp"],
      "env": {
        "NCBI_EMAIL": "your@email.com"
      },
      "alwaysAllow": [],
      "disabled": false
    }
  }
}

Other MCP Clients

Any MCP-compatible client can use this server via stdio transport:

# Command
uvx pubmed-search-mcp

# With environment variable
NCBI_EMAIL=your@email.com uvx pubmed-search-mcp

Note: NCBI_EMAIL is required by NCBI API policy. Optionally set NCBI_API_KEY for higher rate limits (10 req/s vs 3 req/s).

📖 Detailed Integration Guides: the integration guides cover all environment variables, Copilot Studio setup, Docker deployment, proxy configuration, and troubleshooting.


🎯 Design Philosophy

Core Positioning: The intelligent middleware between AI Agents and academic search engines.

Why This Server?

Other tools give you raw API access. We give you vocabulary translation + intelligent routing + research analysis:

| Challenge | Our Solution |
| --- | --- |
| Agent uses ICD codes, PubMed needs MeSH | Auto ICD→MeSH conversion |
| Multiple databases, different APIs | Unified Search single entry point |
| Clinical questions need structured search | PICO toolkit (parse_pico + generate_search_queries for Agent-driven workflow) |
| Typos in medical terms | ESpell auto-correction |
| Too many results from one source | Parallel multi-source with dedup |
| Need to trace research evolution | Research Timeline & Tree with landmark detection and sub-topic branching |
| Citation context is unclear | Citation Tree forward/backward/network |
| Can't access full text | Multi-source fulltext (Europe PMC, CORE, CrossRef) |
| Gene/drug info scattered across DBs | NCBI Extended (Gene, PubChem, ClinVar) |
| Need cutting-edge preprints | Preprint search (arXiv, medRxiv, bioRxiv) with peer-review filtering |
| Export to reference managers | One-click export (RIS, BibTeX, CSV, MEDLINE) |

Key Differentiators

  1. Vocabulary Translation Layer - Agent speaks naturally, we translate to each database's terminology (MeSH, ICD-10, text-mined entities)
  2. Unified Search Gateway - One unified_search() call, auto-dispatch to PubMed/Europe PMC/CORE/OpenAlex
  3. PICO Toolkit - parse_pico() decomposes clinical questions into P/I/C/O elements; Agent then calls generate_search_queries() per element and builds Boolean query
  4. Research Timeline & Lineage Tree - Detect milestones (FDA approvals, Phase 3, guidelines), identify landmark papers via multi-signal scoring, and visualize research evolution as branching trees by sub-topic
  5. Citation Network Analysis - Build multi-level citation trees to map an entire research landscape from a single paper
  6. Full Research Lifecycle - From search → discovery → full text → analysis → export, all in one server
  7. Agent-First Design - Output optimized for machine decision-making, not human reading

📡 External APIs & Data Sources

This MCP server integrates with multiple academic databases and APIs:

Core Data Sources

| Source | Coverage | Vocabulary | Auto-Convert | Description |
| --- | --- | --- | --- | --- |
| NCBI PubMed | 36M+ articles | MeSH | ✅ Native | Primary biomedical literature |
| NCBI Entrez | Multi-DB | MeSH | ✅ Native | Gene, PubChem, ClinVar |
| Europe PMC | 33M+ | Text-mined | ✅ Extraction | Full text XML access |
| CORE | 200M+ | None | ➡️ Free-text | Open access aggregator |
| Semantic Scholar | 200M+ | S2 Fields | ➡️ Free-text | AI-powered recommendations |
| OpenAlex | 250M+ | Concepts | ➡️ Free-text | Open scholarly metadata |
| NIH iCite | PubMed | N/A | N/A | Citation metrics (RCR) |

🔑 Key: ✅ = Full vocabulary support | ➡️ = Query pass-through (no controlled vocabulary)

ICD Codes: Auto-detected and converted to MeSH before PubMed search
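The detect-and-convert step can be pictured with a small sketch. This is not the server's implementation: the two-entry mapping, the simplified ICD-10 regular expression, and the function name `expand_icd_codes` are all illustrative assumptions.

```python
import re

# Illustrative two-entry mapping; a real table would be far larger.
ICD10_TO_MESH_SAMPLE = {
    "I10": "Hypertension",
    "E11.9": "Diabetes Mellitus, Type 2",
}

# Simplified ICD-10 shape: one letter, two digits, optional decimal part.
ICD10_PATTERN = re.compile(r"\b[A-Z]\d{2}(?:\.\d{1,4})?\b")


def expand_icd_codes(query: str) -> str:
    """Replace each known ICD-10 code with '(code OR term[MeSH])'."""

    def replace(match: re.Match) -> str:
        code = match.group(0)
        mesh = ICD10_TO_MESH_SAMPLE.get(code)
        if mesh is None:
            return code  # unknown codes (or look-alikes such as "B12") stay as-is
        # Quote MeSH headings that contain spaces or commas.
        term = f'"{mesh}"' if (" " in mesh or "," in mesh) else mesh
        return f"({code} OR {term}[MeSH])"

    return ICD10_PATTERN.sub(replace, query)


print(expand_icd_codes("I10 treatment outcomes"))
# (I10 OR Hypertension[MeSH]) treatment outcomes
```

Keeping the original code in the OR group means articles that mention the code literally are still matched.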

Environment Variables

# Required
NCBI_EMAIL=your@email.com          # Required by NCBI policy

# Optional - For higher rate limits
NCBI_API_KEY=your_ncbi_api_key     # Get from: https://www.ncbi.nlm.nih.gov/account/settings/
CORE_API_KEY=your_core_api_key     # Get from: https://core.ac.uk/services/api
S2_API_KEY=your_s2_api_key         # Get from: https://www.semanticscholar.org/product/api

# Optional - Network settings
HTTP_PROXY=http://proxy:8080       # HTTP proxy for API requests
HTTPS_PROXY=https://proxy:8080     # HTTPS proxy for API requests
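
As a rough illustration of how these variables could be consumed, the sketch below reads them from a mapping and derives the request rate. The `NcbiSettings` class and `load_ncbi_settings` helper are hypothetical names, not part of this package; the 3 req/s and 10 req/s figures are the ones quoted in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class NcbiSettings:
    email: str
    api_key: Optional[str]
    requests_per_second: int


def load_ncbi_settings(env: Mapping[str, str] = os.environ) -> NcbiSettings:
    """Read NCBI settings from an environment mapping (hypothetical helper)."""
    email = env.get("NCBI_EMAIL", "").strip()
    if not email:
        raise ValueError("NCBI_EMAIL is required by NCBI API policy")
    api_key = env.get("NCBI_API_KEY", "").strip() or None
    # 10 req/s with an API key, 3 req/s without (figures from this README).
    return NcbiSettings(email, api_key, 10 if api_key else 3)


settings = load_ncbi_settings({"NCBI_EMAIL": "your@email.com"})
print(settings.requests_per_second)  # 3
```

Failing fast on a missing email surfaces the misconfiguration at startup rather than on the first API call.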

🔄 How It Works: The Middleware Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              AI AGENT                                        │
│                                                                              │
│   "Find papers about I10 hypertension treatment in diabetic patients"       │
│                                                                              │
└─────────────────────────────────┬───────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                     🔄 PUBMED SEARCH MCP (MIDDLEWARE)                        │
│  ┌─────────────────────────────────────────────────────────────────────────┐│
│  │  1️⃣ VOCABULARY TRANSLATION                                              ││
│  │     • ICD-10 "I10" → MeSH "Hypertension"                                ││
│  │     • "diabetic" → MeSH "Diabetes Mellitus"                             ││
│  │     • ESpell: "hypertention" → "hypertension"                           ││
│  └─────────────────────────────────────────────────────────────────────────┘│
│  ┌─────────────────────────────────────────────────────────────────────────┐│
│  │  2️⃣ INTELLIGENT ROUTING                                                 ││
│  │     ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐             ││
│  │     │ PubMed   │  │Europe PMC│  │   CORE   │  │ OpenAlex │             ││
│  │     │  36M+    │  │   33M+   │  │  200M+   │  │  250M+   │             ││
│  │     │  (MeSH)  │  │(fulltext)│  │  (OA)    │  │(metadata)│             ││
│  │     └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘             ││
│  │          └──────────────┴──────────────┴──────────────┘                 ││
│  │                              ▼                                          ││
│  │  3️⃣ RESULT AGGREGATION: Dedupe + Rank + Enrich                         ││
│  └─────────────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────┬───────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         UNIFIED RESULTS                                      │
│   • 150 unique papers (deduplicated from 4 sources)                          │
│   • Ranked by relevance + citation impact (RCR)                              │
│   • Full text links enriched from Europe PMC                                 │
└─────────────────────────────────────────────────────────────────────────────┘
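
The deduplication step in the diagram can be approximated as follows. This is a minimal sketch that assumes each source returns plain dicts with optional `doi` and `pmid` fields; the field names, the placeholder records, and the first-source-wins merge rule are assumptions, not the server's actual logic.

```python
from typing import Dict, Iterable, List, Optional

Article = Dict[str, Optional[str]]


def dedup_key(article: Article) -> Optional[str]:
    """Prefer a case-normalized DOI, fall back to PMID, else no key."""
    doi = (article.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    pmid = (article.get("pmid") or "").strip()
    if pmid:
        return "pmid:" + pmid
    return None


def deduplicate(articles: Iterable[Article]) -> List[Article]:
    """Keep the first record per key and fill its empty fields from later duplicates."""
    seen: Dict[str, Article] = {}
    unique: List[Article] = []
    for article in articles:
        key = dedup_key(article)
        if key is None:
            unique.append(dict(article))  # cannot be matched, keep as-is
        elif key in seen:
            merged = seen[key]
            for field, value in article.items():
                if value and not merged.get(field):
                    merged[field] = value
        else:
            seen[key] = dict(article)
            unique.append(seen[key])
    return unique


# Placeholder records for illustration only.
records = [
    {"source": "pubmed", "doi": "10.1000/EXAMPLE", "pmid": "1"},
    {"source": "europe_pmc", "doi": "10.1000/example", "fulltext_url": "https://example.org/ft"},
]
print(len(deduplicate(records)))  # 1
```

A record that carries only a PMID will not match a duplicate that carries only a DOI, so a fuller version would need an identifier cross-reference step.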

🛠️ MCP Tools Overview

🔍 Search & Query Intelligence

┌─────────────────────────────────────────────────────────────────┐
│                      SEARCH ENTRY POINT                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   unified_search()          ← 🌟 Single entry for all sources    │
│        │                                                         │
│        ├── Quick search     → Direct multi-source query          │
│        ├── PICO hints       → Detects comparison, shows P/I/C/O  │
│        └── ICD expansion    → Auto ICD→MeSH conversion           │
│                                                                  │
│   Sources: PubMed · Europe PMC · CORE · OpenAlex                 │
│   Auto: Deduplicate → Rank → Enrich full-text links              │
│                                                                  │
├─────────────────────────────────────────────────────────────────┤
│   QUERY INTELLIGENCE                                             │
│                                                                  │
│   generate_search_queries() → MeSH expansion + synonym discovery │
│   parse_pico()              → PICO element decomposition         │
│   analyze_search_query()    → Query analysis without execution   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

🔬 Discovery Tools (After Finding Key Papers)

                        Found important paper (PMID)
                                   │
           ┌───────────────────────┼───────────────────────┐
           │                       │                       │
           ▼                       ▼                       ▼
    ┌─────────────┐        ┌─────────────┐        ┌─────────────┐
    │  BACKWARD   │        │  SIMILAR    │        │  FORWARD    │
    │  ◀──────    │        │  ≈≈≈≈≈≈     │        │  ──────▶    │
    │             │        │             │        │             │
    │ get_article │        │find_related │        │find_citing  │
    │ _references │        │ _articles   │        │ _articles   │
    │             │        │             │        │             │
    │ Foundation  │        │  Similar    │        │ Follow-up   │
    │  papers     │        │   topic     │        │  research   │
    └─────────────┘        └─────────────┘        └─────────────┘

    fetch_article_details()   → Detailed article metadata
    get_citation_metrics()    → iCite RCR, citation percentile
    build_citation_tree()     → Full network visualization (6 formats)

📚 Full Text & Export

| Category | Tools |
| --- | --- |
| Full Text | get_fulltext → Multi-source retrieval (Europe PMC, CORE, PubMed, CrossRef) |
| Text Mining | get_text_mined_terms → Extract genes, diseases, chemicals |
| Export | prepare_export → RIS, BibTeX, CSV, MEDLINE, JSON |

🧬 NCBI Extended Databases

| Tool | Description |
| --- | --- |
| search_gene | Search NCBI Gene database |
| get_gene_details | Gene details by NCBI Gene ID |
| get_gene_literature | PubMed articles linked to a gene |
| search_compound | Search PubChem compounds |
| get_compound_details | Compound details by PubChem CID |
| get_compound_literature | PubMed articles linked to a compound |
| search_clinvar | Search ClinVar clinical variants |

🕰️ Research Timeline & Lineage Tree

| Tool | Description |
| --- | --- |
| build_research_timeline | Build timeline/tree with landmark detection. Output: text, tree, mermaid, mindmap, json |
| analyze_timeline_milestones | Analyze milestone distribution |
| compare_timelines | Compare multiple topic timelines |

🏥 Institutional Access & ICD Conversion

| Tool | Description |
| --- | --- |
| configure_institutional_access | Configure institution's link resolver |
| get_institutional_link | Generate OpenURL access link |
| list_resolver_presets | List resolver presets |
| test_institutional_access | Test resolver configuration |
| convert_icd_mesh | Convert between ICD codes and MeSH terms (bidirectional) |
| search_by_icd | Search PubMed using ICD code (auto-converts to MeSH) |

💾 Session Management

| Tool | Description |
| --- | --- |
| get_session_pmids | Retrieve cached PMID lists |
| get_cached_article | Get article from session cache (no API cost) |
| get_session_summary | Session status overview |

Pipeline Management

| Tool | Description |
| --- | --- |
| save_pipeline | Save a pipeline config for later reuse (YAML/JSON, auto-validated) |
| list_pipelines | List saved pipelines (filter by tag/scope) |
| load_pipeline | Load pipeline from name or file for review/editing |
| delete_pipeline | Delete pipeline and its execution history |
| get_pipeline_history | View execution history with article diff analysis |
| schedule_pipeline | Schedule periodic execution (Phase 4) |

👁️ Vision & Image Search

| Tool | Description |
| --- | --- |
| analyze_figure_for_search | Analyze scientific figure for search |
| search_biomedical_images | Search biomedical images across Open-i (X-ray, microscopy, photos, diagrams) |

📄 Preprint Search

Search arXiv, medRxiv, and bioRxiv preprint servers via unified_search:

| Parameter | Default | Description |
| --- | --- | --- |
| include_preprints | False | Enable preprint search (arXiv, medRxiv, bioRxiv). Results shown in a separate section |
| peer_reviewed_only | True | Filter out preprints from main results (OpenAlex, CrossRef, Semantic Scholar may return preprints) |

How they work together:

| include_preprints | peer_reviewed_only | Behavior |
| --- | --- | --- |
| False (default) | True (default) | No preprints: standard peer-reviewed results only |
| True | True | Preprints in separate section + main results peer-reviewed only |
| True | False | Preprints everywhere: separate section + mixed into main results |
| False | False | No dedicated preprint search, but preprints from other sources kept in results |
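
Read this way, the two flags are independent: one controls whether the preprint servers are searched, the other whether preprints survive in the main results. A minimal sketch of that reading, with a hypothetical `PreprintPolicy` type and `preprint_policy` function that are not part of the server's API:

```python
from typing import NamedTuple


class PreprintPolicy(NamedTuple):
    search_preprint_servers: bool  # show the separate preprint section
    keep_preprints_in_main: bool   # leave preprints in the main result list


def preprint_policy(include_preprints: bool = False,
                    peer_reviewed_only: bool = True) -> PreprintPolicy:
    """Map the two unified_search flags onto the behaviors in the table above."""
    return PreprintPolicy(
        search_preprint_servers=include_preprints,
        keep_preprints_in_main=not peer_reviewed_only,
    )


for flags in [(False, True), (True, True), (True, False), (False, False)]:
    print(flags, preprint_policy(*flags))
```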

Preprint detection — articles are identified as preprints by:

  • Article type from source API (OpenAlex, CrossRef, Semantic Scholar)
  • arXiv ID present without PubMed ID
  • Known preprint server source or journal name
  • DOI prefix matching preprint servers (e.g., 10.1101/ → bioRxiv/medRxiv, 10.48550/ → arXiv)
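
The four signals above could be combined roughly as follows. The dict field names (`article_type`, `arxiv_id`, `pmid`, `source`, `journal`, `doi`) are assumptions made for this sketch, and the real detector may weigh the signals differently.

```python
PREPRINT_DOI_PREFIXES = ("10.1101/", "10.48550/")  # bioRxiv/medRxiv, arXiv
PREPRINT_SERVERS = {"arxiv", "medrxiv", "biorxiv"}


def is_preprint(article: dict) -> bool:
    """Return True if any of the four preprint signals fires."""
    # 1. Article type reported by the source API.
    if (article.get("article_type") or "").lower() == "preprint":
        return True
    # 2. arXiv ID present without a PubMed ID.
    if article.get("arxiv_id") and not article.get("pmid"):
        return True
    # 3. Known preprint server as source or journal name.
    for field in ("source", "journal"):
        if (article.get(field) or "").lower() in PREPRINT_SERVERS:
            return True
    # 4. DOI prefix registered to a preprint server.
    doi = (article.get("doi") or "").lower()
    return doi.startswith(PREPRINT_DOI_PREFIXES)


print(is_preprint({"journal": "bioRxiv"}))  # True
print(is_preprint({"pmid": "1", "journal": "Example Journal"}))  # False
```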

📋 Agent Usage Examples

1️⃣ Quick Search (Simplest)

# Agent just asks naturally - middleware handles everything
unified_search(query="remimazolam ICU sedation", limit=20)

# Or with clinical codes - auto-converted to MeSH
unified_search(query="I10 treatment in E11.9 patients")
#                     ↑ ICD-10           ↑ ICD-10
#                     Hypertension       Type 2 Diabetes

2️⃣ PICO Clinical Question

Simple path: unified_search can search directly (no PICO decomposition):

# unified_search searches as-is; detects "A vs B" pattern and shows PICO hints in metadata
unified_search(query="Is remimazolam better than propofol for ICU sedation?")
# → Multi-source keyword search + PICO hint metadata in output
# ⚠️ This does NOT auto-decompose PICO or expand MeSH!
# For structured PICO search, use the Agent workflow below

Agent workflow — PICO decomposition + MeSH expansion (recommended for clinical questions):

┌─────────────────────────────────────────────────────────────────────────┐
│  "Is remimazolam better than propofol for ICU sedation?"                │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         parse_pico()                                     │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐                     │
│  │    P    │  │    I    │  │    C    │  │    O    │                     │
│  │  ICU    │  │remimaz- │  │propofol │  │sedation │                     │
│  │patients │  │  olam   │  │         │  │outcomes │                     │
│  └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘                     │
└───────┼────────────┼────────────┼────────────┼──────────────────────────┘
        │            │            │            │
        ▼            ▼            ▼            ▼
┌─────────────────────────────────────────────────────────────────────────┐
│              generate_search_queries() × 4 (parallel)                    │
│                                                                          │
│  P → "Intensive Care Units"[MeSH]                                        │
│  I → "remimazolam" [Supplementary Concept], "CNS 7056"                   │
│  C → "Propofol"[MeSH], "Diprivan"                                        │
│  O → "Conscious Sedation"[MeSH], "Deep Sedation"[MeSH]                   │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│              Agent combines with Boolean logic                           │
│                                                                          │
│  (P) AND (I) AND (C) AND (O)  ← High precision                           │
│  (P) AND (I OR C) AND (O)     ← High recall                              │
└─────────────────────────────────┬───────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│              unified_search() (auto multi-source + dedup)                │
│                                                                          │
│  PubMed + Europe PMC + CORE + OpenAlex → Auto deduplicate & rank         │
└─────────────────────────────────────────────────────────────────────────┘

# Step 1: Parse clinical question
parse_pico("Is remimazolam better than propofol for ICU sedation?")
# Returns: P=ICU patients, I=remimazolam, C=propofol, O=sedation outcomes

# Step 2: Get MeSH for each element (parallel!)
generate_search_queries(topic="ICU patients")   # P
generate_search_queries(topic="remimazolam")    # I
generate_search_queries(topic="propofol")       # C
generate_search_queries(topic="sedation")       # O

# Step 3: Agent combines with Boolean
query = '("Intensive Care Units"[MeSH]) AND (remimazolam OR "CNS 7056") AND propofol AND sedation'

# Step 4: Search (auto multi-source, dedup, rank)
unified_search(query=query)
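
Step 3 is left to the agent. One way to assemble the two Boolean variants shown in the diagram is sketched below; `or_group` and `build_pico_queries` are hypothetical helpers, and the term lists are the ones from the diagram.

```python
from typing import Dict, List


def or_group(terms: List[str]) -> str:
    """Join the synonyms for one PICO element with OR."""
    terms = [t for t in terms if t]
    if not terms:
        raise ValueError("each PICO element needs at least one term")
    return terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"


def build_pico_queries(p: List[str], i: List[str],
                       c: List[str], o: List[str]) -> Dict[str, str]:
    """Build the high-precision and high-recall combinations."""
    P, I, C, O = or_group(p), or_group(i), or_group(c), or_group(o)
    return {
        "high_precision": f"{P} AND {I} AND {C} AND {O}",
        "high_recall": f"{P} AND ({I} OR {C}) AND {O}",
    }


queries = build_pico_queries(
    p=['"Intensive Care Units"[MeSH]'],
    i=["remimazolam", '"CNS 7056"'],
    c=['"Propofol"[MeSH]', "Diprivan"],
    o=['"Conscious Sedation"[MeSH]', '"Deep Sedation"[MeSH]'],
)
print(queries["high_precision"])
print(queries["high_recall"])
```

Either string can then be passed to unified_search(query=...).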

3️⃣ Explore from Key Paper

# Found landmark paper PMID: 33475315
find_related_articles(pmid="33475315")   # Similar methodology
find_citing_articles(pmid="33475315")    # Who built on this?
get_article_references(pmid="33475315")  # What's the foundation?

# Build complete research map
build_citation_tree(pmid="33475315", depth=2, output_format="mermaid")

4️⃣ Gene/Drug Research

# Research a gene
search_gene(query="BRCA1", organism="human")
get_gene_literature(gene_id="672", limit=20)

# Research a drug compound
search_compound(query="propofol")
get_compound_literature(cid="4943", limit=20)

5️⃣ Export Results

# Export last search results
prepare_export(pmids="last", format="ris")      # → EndNote/Zotero
prepare_export(pmids="last", format="bibtex")   # → LaTeX

# Check open access availability
analyze_fulltext_access(pmids="last")

6️⃣ Preprint Search

# Include preprints alongside peer-reviewed results
unified_search("COVID-19 vaccine efficacy", include_preprints=True)
# → Main results (peer-reviewed) + Separate preprint section (arXiv, medRxiv, bioRxiv)

# Include preprints mixed into main results
unified_search("CRISPR gene therapy", include_preprints=True, peer_reviewed_only=False)
# → All results mixed together, preprints marked as such

# Only peer-reviewed (default behavior)
unified_search("diabetes treatment")
# → Preprints from any source automatically filtered out

7️⃣ Pipeline (Reusable Search Plans)

# Save a template-based pipeline
save_pipeline(
    name="icu_sedation_weekly",
    config="template: pico\nparams:\n  P: ICU patients\n  I: remimazolam\n  C: propofol\n  O: delirium",
    tags="anesthesia,sedation",
    description="Weekly ICU sedation monitoring"
)

# Save a custom DAG pipeline
save_pipeline(
    name="brca1_comprehensive",
    config="""
steps:
  - id: expand
    action: expand
    params: { topic: BRCA1 breast cancer }
  - id: pubmed
    action: search
    params: { query: BRCA1, sources: pubmed, limit: 50 }
  - id: expanded
    action: search
    inputs: [expand]
    params: { strategy: mesh, sources: "pubmed,openalex", limit: 50 }
  - id: merged
    action: merge
    inputs: [pubmed, expanded]
    params: { method: rrf }
  - id: enriched
    action: metrics
    inputs: [merged]
output:
  limit: 30
  ranking: quality
"""
)

# Execute a saved pipeline
unified_search(pipeline="saved:icu_sedation_weekly")

# List & manage
list_pipelines(tag="anesthesia")
load_pipeline(source="brca1_comprehensive")  # Review YAML
get_pipeline_history(name="icu_sedation_weekly")  # View past runs
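
Because each step in a custom pipeline names its inputs, the steps form a dependency graph that must be free of cycles. The sketch below shows how a valid execution order could be derived with the standard library's `graphlib`; it assumes the YAML has already been parsed into a list of dicts, and `execution_order` is a hypothetical helper rather than the server's validator.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def execution_order(steps: List[dict]) -> List[str]:
    """Return step ids in an order where every input runs before its consumer."""
    ids = [step["id"] for step in steps]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate step id")
    graph: Dict[str, Set[str]] = {}
    for step in steps:
        inputs = step.get("inputs", [])
        unknown = [name for name in inputs if name not in ids]
        if unknown:
            raise ValueError(f"step {step['id']!r} has unknown inputs: {unknown}")
        graph[step["id"]] = set(inputs)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("pipeline steps contain a cycle") from exc


# Step ids and inputs taken from the brca1_comprehensive example above.
steps = [
    {"id": "expand"},
    {"id": "pubmed"},
    {"id": "expanded", "inputs": ["expand"]},
    {"id": "merged", "inputs": ["pubmed", "expanded"]},
    {"id": "enriched", "inputs": ["merged"]},
]
print(execution_order(steps))
```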

🔍 Search Mode Comparison

┌─────────────────────────────────────────────────────────────────────────┐
│                        SEARCH MODE DECISION TREE                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   "What kind of search do I need?"                                       │
│         │                                                                │
│         ├── Know exactly what to search?                                 │
│         │   └── unified_search(query="topic keywords")                   │
│         │       → Quick, auto-routing to best sources                    │
│         │                                                                │
│         ├── Have a clinical question (A vs B)?                           │
│         │   └── parse_pico() → generate_search_queries() × N             │
│         │       → Agent builds Boolean → unified_search()                │
│         │                                                                │
│         ├── Need comprehensive systematic coverage?                      │
│         │   └── generate_search_queries() → parallel search              │
│         │       → MeSH expansion, multiple strategies, merge             │
│         │                                                                │
│         └── Exploring from a key paper?                                  │
│             └── find_related/citing/references → build_citation_tree     │
│                 → Citation network, research context                     │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

| Mode | Entry Point | Best For | Auto-Features |
| --- | --- | --- | --- |
| Quick | unified_search() | Fast topic search | ICD→MeSH, multi-source, dedup |
| PICO | parse_pico() → Agent | Clinical questions | Agent: decompose → MeSH expand → Boolean |
| Systematic | generate_search_queries() | Literature reviews | MeSH expansion, synonyms |
| Exploration | find_*_articles() | From key paper | Citation network, related |

🤖 Claude Skills (AI Agent Workflows)

Pre-built workflow guides in .claude/skills/, divided into Usage Skills (for using the MCP server) and Development Skills (for maintaining the project):

📚 Usage Skills (9) — For AI Agents Using This MCP Server

| Skill | Description |
| --- | --- |
| pubmed-quick-search | Basic search with filters |
| pubmed-systematic-search | MeSH expansion, comprehensive |
| pubmed-pico-search | Clinical question decomposition |
| pubmed-paper-exploration | Citation tree, related articles |
| pubmed-gene-drug-research | Gene/PubChem/ClinVar |
| pubmed-fulltext-access | Europe PMC, CORE full text |
| pubmed-export-citations | RIS/BibTeX/CSV export |
| pubmed-multi-source-search | Cross-database unified search |
| pubmed-mcp-tools-reference | Complete tool reference guide |
| pipeline-persistence | Save, load, reuse search plans |

🔧 Development Skills (13) — For Project Contributors

| Skill | Description |
| --- | --- |
| changelog-updater | Auto-update CHANGELOG.md |
| code-refactor | DDD architecture refactoring |
| code-reviewer | Code quality & security review |
| ddd-architect | DDD scaffold for new features |
| git-doc-updater | Sync docs before commits |
| git-precommit | Pre-commit workflow orchestration |
| memory-checkpoint | Save context to Memory Bank |
| memory-updater | Update Memory Bank files |
| project-init | Initialize new projects |
| readme-i18n | Multilingual README sync |
| readme-updater | Sync README with code changes |
| roadmap-updater | Update ROADMAP.md status |
| test-generator | Generate test suites |

📁 Location: .claude/skills/*/SKILL.md (Claude Code-specific)


🏗️ Architecture (DDD)

This project uses Domain-Driven Design (DDD) architecture, with literature research domain knowledge as the core model.

src/pubmed_search/
├── domain/                     # Core business logic
│   └── entities/article.py     # UnifiedArticle, Author, etc.
├── application/                # Use cases
│   ├── search/                 # QueryAnalyzer, ResultAggregator
│   ├── export/                 # Citation export (RIS, BibTeX...)
│   └── session/                # SessionManager
├── infrastructure/             # External systems
│   ├── ncbi/                   # Entrez, iCite, Citation Exporter
│   ├── sources/                # Europe PMC, CORE, CrossRef...
│   └── http/                   # HTTP clients
├── presentation/               # User interfaces
│   ├── mcp_server/             # MCP tools, prompts, resources
│   │   └── tools/              # discovery, strategy, pico, export...
│   └── api/                    # REST API (Copilot Studio)
└── shared/                     # Cross-cutting concerns
    ├── exceptions.py           # Unified error handling
    └── async_utils.py          # Rate limiter, retry, circuit breaker

Internal Mechanisms (Transparent to Agent)

| Mechanism | Description |
| --- | --- |
| Session | Auto-create, auto-switch |
| Cache | Auto-cache search results, avoid duplicate API calls |
| Rate Limit | Auto-comply with NCBI API limits (0.34 s between requests without an API key, 0.1 s with one) |
| MeSH Lookup | generate_search_queries() auto-queries NCBI MeSH database |
| ESpell | Auto spelling correction (remifentanyl → remifentanil) |
| Query Analysis | Each suggested query shows how PubMed actually interprets it |
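
The rate-limit row amounts to enforcing a minimum gap between consecutive requests. A minimal synchronous sketch is shown below; the `MinIntervalLimiter` class is hypothetical (the project's own limiter lives in shared/async_utils.py and is not reproduced here), and the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Block until at least `min_interval` seconds separate consecutive calls."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous call was released

    def wait(self) -> float:
        """Sleep if needed and return the delay that was applied."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


# 0.34 s without an API key, 0.1 s with one (figures from this README).
limiter = MinIntervalLimiter(0.34, sleep=lambda seconds: None)  # no real sleep in the demo
limiter.wait()
print(limiter.wait() > 0)  # True: the second call arrived too early
```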

Vocabulary Translation Layer (Key Feature)

Our Core Value: We are the intelligent middleware between Agent and Search Engines, automatically handling vocabulary standardization so Agent doesn't need to know each database's terminology.

Different data sources use different controlled vocabulary systems. This server provides automatic conversion:

| API / Database | Vocabulary System | Auto-Conversion |
| --- | --- | --- |
| PubMed / NCBI | MeSH (Medical Subject Headings) | ✅ Full support via expand_with_mesh() |
| ICD Codes | ICD-10-CM / ICD-9-CM | ✅ Auto-detect & convert to MeSH |
| Europe PMC | Text-mined entities (Gene, Disease, Chemical) | get_text_mined_terms() extraction |
| OpenAlex | OpenAlex Concepts (deprecated) | ❌ Free-text only |
| Semantic Scholar | S2 Field of Study | ❌ Free-text only |
| CORE | None | ❌ Free-text only |
| CrossRef | None | ❌ Free-text only |

Automatic ICD → MeSH Conversion

When searching with ICD codes (e.g., I10 for Hypertension), unified_search() automatically:

  1. Detects ICD-10/ICD-9 patterns via detect_and_expand_icd_codes()
  2. Looks up corresponding MeSH terms from internal mapping (ICD10_TO_MESH, ICD9_TO_MESH)
  3. Expands query with MeSH synonyms for comprehensive search

# Agent calls unified_search with clinical terminology
unified_search(query="I10 treatment outcomes")

# Server auto-expands to PubMed-compatible query
"(I10 OR Hypertension[MeSH]) treatment outcomes"

MeSH Auto-Expansion + Query Analysis

When calling generate_search_queries("remimazolam sedation"), internally it:

  1. ESpell Correction - Fix spelling errors
  2. MeSH Query - Entrez.esearch(db="mesh") to get standard vocabulary
  3. Synonym Extraction - Get synonyms from MeSH Entry Terms
  4. Query Analysis - Analyze how PubMed interprets each query

{
  "mesh_terms": [
    {
      "input": "remimazolam",
      "preferred": "remimazolam [Supplementary Concept]",
      "synonyms": ["CNS 7056", "ONO 2745"]
    }
  ],
  "all_synonyms": ["CNS 7056", "ONO 2745", ...],
  "suggested_queries": [
    {
      "id": "q1_title",
      "query": "(remimazolam sedation)[Title]",
      "purpose": "Exact title match - highest precision",
      "estimated_count": 8,
      "pubmed_translation": "\"remimazolam sedation\"[Title]"
    },
    {
      "id": "q3_and",
      "query": "(remimazolam AND sedation)",
      "purpose": "All keywords required",
      "estimated_count": 561,
      "pubmed_translation": "(\"remimazolam\"[Supplementary Concept] OR \"remimazolam\"[All Fields]) AND (\"sedate\"[All Fields] OR ...)"
    }
  ]
}

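Value of Query Analysis: An agent may assume that remimazolam AND sedation searches only for those two words, but PubMed actually expands the query to the Supplementary Concept plus synonyms. In the example above, the AND query has an estimated count of 561, versus 8 for the exact title match. Seeing the translation helps the agent understand the gap between its intent and the search that actually runs.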
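
An agent could use the estimated counts to choose among the suggested queries. The sketch below picks the broadest query that stays under a result budget; `pick_query` is a hypothetical helper, and the input mirrors the `suggested_queries` entries shown above.

```python
from typing import List, Optional


def pick_query(suggested: List[dict], max_results: int) -> Optional[dict]:
    """Return the suggestion with the largest estimated_count within budget."""
    candidates = [q for q in suggested if q["estimated_count"] <= max_results]
    if not candidates:
        return None
    return max(candidates, key=lambda q: q["estimated_count"])


suggested_queries = [
    {"id": "q1_title", "query": "(remimazolam sedation)[Title]", "estimated_count": 8},
    {"id": "q3_and", "query": "(remimazolam AND sedation)", "estimated_count": 561},
]
print(pick_query(suggested_queries, max_results=100)["id"])   # q1_title
print(pick_query(suggested_queries, max_results=1000)["id"])  # q3_and
```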


🔒 HTTPS Deployment

Enable HTTPS secure communication for production environments.

Quick Start

# Step 1: Generate SSL certificates
./scripts/generate-ssl-certs.sh

# Step 2: Start HTTPS service (Docker)
./scripts/start-https-docker.sh up

# Verify deployment
curl -k https://localhost/

HTTPS Endpoints

| Service | URL | Description |
| --- | --- | --- |
| MCP SSE | https://localhost/sse | SSE connection (MCP) |
| Messages | https://localhost/messages | MCP POST |
| Health | https://localhost/health | Health check |

Claude Desktop Configuration

{
  "mcpServers": {
    "pubmed-search": {
      "url": "https://localhost/sse"
    }
  }
}

🏢 Microsoft Copilot Studio Integration

Integrate PubMed Search MCP with Microsoft 365 Copilot (Word, Teams, Outlook)!

Quick Start

# Start with Streamable HTTP transport (required by Copilot Studio)
python run_server.py --transport streamable-http --port 8765

# Or use the dedicated script with ngrok
./scripts/start-copilot-studio.sh --with-ngrok

Copilot Studio Configuration

| Field | Value |
| --- | --- |
| Server name | PubMed Search |
| Server URL | https://your-server.com/mcp |
| Authentication | None (or API Key) |

⚠️ Note: SSE transport has been deprecated since Aug 2025. Use streamable-http instead.


🔐 Security

Security Features

| Layer | Feature | Description |
| --- | --- | --- |
| HTTPS | TLS 1.2/1.3 encryption | All traffic encrypted via Nginx |
| Rate Limiting | 30 req/s | Nginx level protection |
| Security Headers | XSS/CSRF protection | X-Frame-Options, X-Content-Type-Options |
| SSE Optimization | 24h timeout | Long-lived connections for real-time |
| No Database | Stateless | No SQL injection risk |
| No Secrets | In-memory only | No credentials stored |


📤 Export Formats

Export your search results in formats compatible with major reference managers:

| Format | Compatible With | Use Case |
| --- | --- | --- |
| RIS | EndNote, Zotero, Mendeley | Universal import |
| BibTeX | LaTeX, Overleaf, JabRef | Academic writing |
| CSV | Excel, Google Sheets | Data analysis |
| MEDLINE | PubMed native format | Archiving |
| JSON | Programmatic access | Custom processing |

Exported Fields

  • Core: PMID, Title, Authors, Journal, Year, Volume, Issue, Pages
  • Identifiers: DOI, PMC ID, ISSN
  • Content: Abstract (HTML tags cleaned)
  • Metadata: Language, Publication Type, Keywords
  • Access: DOI URL, PMC URL, Full-text availability
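
To make the field list concrete, here is a rough sketch of how a few of these fields map onto RIS tags. It is not the server's exporter: the `to_ris` function, the dict field names, and the placeholder record are assumptions, and only a subset of the fields is covered.

```python
from typing import List

# (RIS tag, article field) pairs for single-valued fields.
RIS_FIELDS = [
    ("TI", "title"), ("JO", "journal"), ("PY", "year"),
    ("VL", "volume"), ("IS", "issue"), ("DO", "doi"),
    ("AN", "pmid"), ("AB", "abstract"),
]


def to_ris(article: dict) -> str:
    """Render one journal article as an RIS record."""
    lines: List[str] = ["TY  - JOUR"]          # record type: journal article
    for author in article.get("authors", []):  # one AU line per author
        lines.append(f"AU  - {author}")
    for tag, field in RIS_FIELDS:
        value = article.get(field)
        if value:
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")                     # end-of-record marker
    return "\n".join(lines) + "\n"


# Placeholder record for illustration only.
print(to_ris({
    "title": "An example title",
    "authors": ["Hansen, Søren"],
    "journal": "Example Journal",
    "year": "2024",
    "doi": "10.1000/example",
}))
```

RIS is plain text, so non-ASCII names such as Søren can be written directly as long as the file is saved as UTF-8.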

Special Character Handling

  • BibTeX exports use pylatexenc for proper LaTeX encoding
  • Nordic characters (ø, æ, å), umlauts (ü, ö, ä), and accents are correctly converted
  • Example: Søren Hansen → S{\o}ren Hansen

📄 License

Apache License 2.0

