zotmcp

nicsuzor/zotmcp

ZotMCP is an MCP server designed for semantic search and literature review across a shared Zotero academic library.

ZotMCP

MCP server for semantic search and literature review across a shared Zotero academic library.

Features

  • Semantic Search - Vector-based search across library items
  • Citation Retrieval - Get properly formatted academic citations
  • Similar Items - Find related works by similarity
  • Author Search - Find all works by specific authors
  • 6 Specialized Tools - Search, retrieve, and analyze academic literature

MCP Client Configuration

{
  "mcpServers": {
    "zotero": {
      "command": "docker",
      "args": ["run", "--rm", "-i", "us-central1-docker.pkg.dev/prosocial-443205/reg/zotmcp:latest"]
    }
  }
}
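
Colleagues who already have other MCP servers configured will need to merge the entry above into their existing file rather than paste over it. The following is a minimal Python sketch of that merge. The function name `add_zotmcp_server` is hypothetical, and where the config file lives depends on the MCP client, so the sketch works on an in-memory dictionary only.

```python
import json

# Image reference copied from the configuration above.
IMAGE = "us-central1-docker.pkg.dev/prosocial-443205/reg/zotmcp:latest"


def add_zotmcp_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the "zotero" entry added.

    Hypothetical helper: existing entries under "mcpServers" are preserved,
    and the input dictionary is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["zotero"] = {
        "command": "docker",
        "args": ["run", "--rm", "-i", IMAGE],
    }
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # "other" stands in for any server that is already configured.
    existing = {"mcpServers": {"other": {"command": "example"}}}
    print(json.dumps(add_zotmcp_server(existing), indent=2))
```

Returning a new dictionary instead of editing in place makes it easy to compare the result with the original before writing anything to disk.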

Available Tools

  1. search - Semantic search across library (primary tool)
  2. get_item - Retrieve full text and metadata by Zotero key
  3. get_similar_items - Find works similar to a given item
  4. search_by_author - Find all works by an author
  5. get_collection_info - Library statistics and metadata
  6. assisted_search - LLM-assisted literature review (experimental)
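
MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request by hand to show what travels over the wire. The helper name `tool_call_request` is hypothetical, and the argument name `query` is an assumption: the real input schema for each tool is whatever the server reports in response to `tools/list`. In practice an MCP client library also performs the initialization handshake before any tool call.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # "query" is an assumed parameter name, not confirmed by the source.
    print(tool_call_request(1, "search", {"query": "platform governance"}))
```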

Distribution

The Docker image includes:

  • ✅ All dependencies (FastMCP, ChromaDB, Pydantic)
  • ✅ ChromaDB vectors (baked in, ~3GB)
  • ✅ No local Python setup required

Colleagues just need to:

  1. Install Docker Desktop
  2. Pull image: docker pull us-central1-docker.pkg.dev/prosocial-443205/reg/zotmcp:latest
  3. Configure MCP client (see above)
  4. Restart client

Architecture

ZotMCP is an MCP (Model Context Protocol) server that provides semantic search and literature review capabilities for a shared Zotero academic library.

  • Zotero Library: prosocial group library
  • ChromaDB Collection: prosocial_zot
  • Embedding Model: Google gemini-embedding-001 (3072 dimensions)
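
Semantic search and similar-item lookup both come down to comparing embedding vectors. The source does not state which distance metric the collection uses, so the sketch below assumes cosine similarity, one common choice, and uses tiny 3-dimensional vectors in place of the 3072-dimensional embeddings. The function names and item keys are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, items, top_k=3):
    """Return the top_k (key, score) pairs, most similar first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy vectors keyed by made-up item keys.
    library = {"ITEM_A": [1.0, 0.0, 0.0], "ITEM_B": [0.7, 0.7, 0.0], "ITEM_C": [0.0, 0.0, 1.0]}
    print(rank_by_similarity([1.0, 0.1, 0.0], library, top_k=2))
```

A vector database such as ChromaDB performs this ranking with an approximate nearest-neighbour index rather than the exhaustive comparison shown here.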

The ChromaDB collection is built by the Buttermilk vectorization pipeline:

  • Full text is taken from the Zotero group library where available.
  • Where it is not available, full text is extracted from the source PDF.
  • Each source is split by a semantic splitter into chunks of approximately 1000 tokens, with an overlap of 250 tokens.
  • Citations are generated by an LLM from the first page of full text, in the format: Authors (Year). Title. Outlet
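
To make the chunk size and overlap concrete, the sketch below splits a token list with a plain fixed-size sliding window. This is a simplification and not the pipeline's method: the pipeline uses a semantic splitter, which chooses boundaries by content, whereas this version cuts strictly by position. The function name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens, chunk_size=1000, overlap=250):
    """Split tokens into windows of chunk_size that share `overlap` tokens.

    Simplified stand-in for a semantic splitter: boundaries are positional.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    tokens = [f"t{i}" for i in range(2000)]
    for chunk in chunk_tokens(tokens):
        print(chunk[0], chunk[-1], len(chunk))
```

With the defaults, each new chunk begins 750 tokens after the previous one, so the final 250 tokens of one chunk reappear at the start of the next.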