arxiv-mcp-server

r-uben/arxiv-mcp-server


A Model Context Protocol (MCP) server for searching and retrieving academic papers from arXiv.


arXiv MCP Server

License: MIT | Python 3.11+ | MCP

I built this MCP server to access 2.4M+ arXiv papers directly in Claude Desktop. It uses GROBID for academic PDF extraction and builds citation networks to track research connections.

What It Does

  • Search arXiv by keywords, authors, categories, and dates
  • Extract full text from PDFs using GROBID (handles equations and references)
  • Build citation networks using Semantic Scholar integration
  • Manage a local library with collections and tags
  • Generate summaries and compare papers side-by-side
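To make the search feature concrete, here is a minimal sketch of how keyword, author, category, and date filters can be combined into one query against the public arXiv API. The function name `build_search_url` and its parameters are illustrative assumptions, not this server's actual interface; the field prefixes (`all:`, `au:`, `cat:`, `submittedDate:`) are the ones the arXiv API documents.

```python
from datetime import date
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint


def build_search_url(keywords=None, author=None, category=None,
                     start_date=None, end_date=None, max_results=10):
    """Build an arXiv API query URL from optional search filters (hypothetical helper)."""
    clauses = []
    if keywords:
        clauses.append(f'all:"{keywords}"')
    if author:
        clauses.append(f'au:"{author}"')
    if category:
        clauses.append(f"cat:{category}")
    if start_date and end_date:
        # submittedDate ranges use the YYYYMMDDHHMM format (GMT).
        clauses.append(
            f"submittedDate:[{start_date:%Y%m%d}0000 TO {end_date:%Y%m%d}2359]"
        )
    if not clauses:
        raise ValueError("at least one search filter is required")
    params = {
        "search_query": " AND ".join(clauses),
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url(keywords="large language models",
                           category="cs.CL",
                           start_date=date(2024, 1, 1),
                           end_date=date(2024, 6, 30)))
```

The response is an Atom feed, so a real client would still need to fetch and parse it.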

PDF Extraction

I implemented three extraction tiers that adapt to document complexity:

  • FAST: pdfplumber for simple documents (~1s)
  • SMART: GROBID for academic papers (~5s) - preserves equations and references
  • PREMIUM: Mistral OCR for complex layouts (~2s) - requires API key
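The tier choice can be pictured roughly as below. This is a sketch under stated assumptions: `PdfProfile`, its fields, and the decision rules in `select_tier` are hypothetical and may not match the heuristics the server actually uses. Only the three tier names and their backends come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    FAST = "pdfplumber"
    SMART = "grobid"
    PREMIUM = "mistral-ocr"


@dataclass
class PdfProfile:
    """Hypothetical complexity signals gathered before extraction."""
    is_academic: bool      # e.g. abstract and references sections detected
    has_text_layer: bool   # False for scanned, image-only PDFs
    multi_column: bool


def select_tier(profile, mistral_key=None, force_smart=True):
    """Pick an extraction tier from the profile (assumed rules, not the real ones)."""
    # Image-only PDFs need OCR, which is only possible with an API key.
    if not profile.has_text_layer and mistral_key:
        return Tier.PREMIUM
    # Academic papers go to GROBID when forced or when the layout is complex.
    if profile.is_academic and (force_smart or profile.multi_column):
        return Tier.SMART
    return Tier.FAST


if __name__ == "__main__":
    paper = PdfProfile(is_academic=True, has_text_layer=True, multi_column=True)
    print(select_tier(paper))
```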

🚀 Quick Start

Installation

Option 1: Install via npm (Recommended)

# Install globally
npm install -g arxiv-mcp-server

# Or install locally in a project
npm install arxiv-mcp-server

Option 2: Install from source

# Clone the repository
git clone https://github.com/r-uben/arxiv-mcp-server.git
cd arxiv-mcp-server

# Install dependencies with Poetry
poetry install

# Test the server
poetry run arxiv-mcp-server

Claude Desktop Integration

For npm installation:

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "arxiv": {
      "command": "npx",
      "args": ["arxiv-mcp-server"],
      "cwd": "/path/to/your/project"
    }
  }
}

Or for global installation:

{
  "mcpServers": {
    "arxiv": {
      "command": "arxiv-mcp-server"
    }
  }
}

For Poetry installation:

{
  "mcpServers": {
    "arxiv": {
      "command": "poetry",
      "args": ["run", "arxiv-mcp-server"],
      "cwd": "/path/to/arxiv-mcp-server"
    }
  }
}

Restart Claude Desktop and you're ready to go!
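If you would rather not edit the JSON by hand, the entry can be merged in with a short script. This is a sketch, not something shipped with the server: `add_arxiv_server` and `update_config_file` are hypothetical helpers, and you have to supply the path to your own claude_desktop_config.json.

```python
import json
from pathlib import Path


def add_arxiv_server(config, command="arxiv-mcp-server", args=None, cwd=None):
    """Return a copy of the config with an "arxiv" entry under mcpServers."""
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    if cwd:
        entry["cwd"] = cwd
    # Copy the existing servers so other entries are preserved untouched.
    servers = dict(config.get("mcpServers", {}))
    servers["arxiv"] = entry
    return {**config, "mcpServers": servers}


def update_config_file(path, **kwargs):
    """Read, update, and rewrite a claude_desktop_config.json file."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_arxiv_server(config, **kwargs), indent=2))


if __name__ == "__main__":
    # Prints the Poetry variant shown above without touching any file.
    print(json.dumps(
        add_arxiv_server({}, command="poetry",
                         args=["run", "arxiv-mcp-server"],
                         cwd="/path/to/arxiv-mcp-server"),
        indent=2,
    ))
```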

Examples

"Search for recent papers on large language models in the last 6 months"
"Find all papers by Geoffrey Hinton on deep learning"
"Build a citation network around paper 2301.00001"
"Save paper 2301.00001 to my 'Transformers' collection"
"Summarize the key findings from paper 2301.00001"

⚙️ Configuration

API Keys (Optional)

For enhanced features, set these environment variables:

# For premium PDF extraction (Mistral OCR)
export MISTRAL_API_KEY="your-mistral-api-key"

# For faster citation lookups (Semantic Scholar)
export SEMANTIC_SCHOLAR_API_KEY="your-semantic-scholar-api-key"

External Services (Optional)

GROBID Server - For enhanced academic paper processing:

docker run --rm -it --init -p 8070:8070 lfoppiano/grobid:0.8.0
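
Once the container is up, you can confirm it is reachable before relying on the SMART tier. The sketch below calls GROBID's `/api/isalive` endpoint using only the standard library. The function name `grobid_is_alive` and the injectable `opener` parameter are assumptions added for illustration and testing.

```python
from urllib.request import urlopen


def grobid_is_alive(base_url="http://localhost:8070", timeout=3.0, opener=urlopen):
    """Return True if GROBID's /api/isalive endpoint answers "true"."""
    url = base_url.rstrip("/") + "/api/isalive"
    try:
        with opener(url, timeout=timeout) as response:
            return response.read().decode().strip().lower() == "true"
    except OSError:
        # Covers URLError, connection refused, and timeouts.
        return False


if __name__ == "__main__":
    print("GROBID reachable:", grobid_is_alive())
```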

Configuration Options

| Variable | Purpose | Default |
| --- | --- | --- |
| MISTRAL_API_KEY | Premium OCR extraction | None |
| SEMANTIC_SCHOLAR_API_KEY | Citation discovery API | None |
| GROBID_SERVER | GROBID server URL | http://localhost:8070 |
| FORCE_SMART | Always use SMART tier for academic papers | true |
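
As an illustration of how these variables and their defaults fit together, here is a small settings loader. It is a sketch only: the `Settings` class is hypothetical, and the set of strings accepted as "true" for FORCE_SMART is my assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    mistral_api_key: Optional[str]
    semantic_scholar_api_key: Optional[str]
    grobid_server: str
    force_smart: bool

    @classmethod
    def from_env(cls, env=None):
        """Load settings from a mapping, defaulting to os.environ."""
        env = os.environ if env is None else env
        return cls(
            # Empty strings are treated the same as unset keys.
            mistral_api_key=env.get("MISTRAL_API_KEY") or None,
            semantic_scholar_api_key=env.get("SEMANTIC_SCHOLAR_API_KEY") or None,
            grobid_server=env.get("GROBID_SERVER", "http://localhost:8070"),
            # Assumed truthy spellings; anything else disables the flag.
            force_smart=env.get("FORCE_SMART", "true").strip().lower()
            in {"1", "true", "yes", "on"},
        )


if __name__ == "__main__":
    print(Settings.from_env())
```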

Available Tools

I've implemented 25 tools across four categories:

  • Search & Discovery: search papers, find by author, get recent papers, find similar papers
  • Library Management: save papers, manage collections, track reading status, search library
  • Citation Analysis: extract references, find citing papers, build citation networks
  • Content Analysis: extract PDFs, summarize papers, compare papers, extract key findings
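
For the citation analysis category, the sketch below shows one way a citation network could be assembled from Semantic Scholar Graph API responses. The helper names (`citation_url`, `edges_from_response`, `build_network`) are hypothetical and the server's real tools may work differently. The endpoint paths and the `citingPaper` / `citedPaper` response keys follow the public Graph API.

```python
from urllib.parse import quote, urlencode

S2_GRAPH_API = "https://api.semanticscholar.org/graph/v1"


def citation_url(arxiv_id, direction="citations", limit=100):
    """URL listing papers that cite (or are referenced by) an arXiv paper."""
    if direction not in ("citations", "references"):
        raise ValueError("direction must be 'citations' or 'references'")
    query = urlencode({"fields": "title,externalIds", "limit": limit})
    return f"{S2_GRAPH_API}/paper/ARXIV:{quote(arxiv_id)}/{direction}?{query}"


def edges_from_response(seed_id, payload, direction="citations"):
    """Turn one API response into directed (citing, cited) edges."""
    key = "citingPaper" if direction == "citations" else "citedPaper"
    edges = []
    for item in payload.get("data", []):
        other_id = (item.get(key) or {}).get("paperId")
        if other_id is None:
            continue  # some records carry no resolvable paper ID
        if direction == "citations":
            edges.append((other_id, seed_id))
        else:
            edges.append((seed_id, other_id))
    return edges


def build_network(edge_lists):
    """Merge edge lists into an adjacency map: citing paper -> set of cited papers."""
    graph = {}
    for edges in edge_lists:
        for citing, cited in edges:
            graph.setdefault(citing, set()).add(cited)
            graph.setdefault(cited, set())
    return graph


if __name__ == "__main__":
    print(citation_url("2301.00001"))
```

Fetching the URLs (optionally with an `x-api-key` header when SEMANTIC_SCHOLAR_API_KEY is set) is left out so the example stays offline.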

How It Works

The server automatically:

  1. Analyzes PDF complexity and selects the best extraction method
  2. Caches papers locally to reduce API calls
  3. Respects rate limits (arXiv: 1 request every 3 seconds, Semantic Scholar: 1-4 req/s)
  4. Falls back gracefully when services are unavailable
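
Two of the behaviors listed above, rate limiting and graceful fallback, can be sketched in a few lines. Both `MinIntervalLimiter` and `extract_with_fallback` are hypothetical names for illustration, not the server's internals. The interval is a parameter, so set it to whatever limit the upstream service publishes.

```python
import time


class MinIntervalLimiter:
    """Allow at most one call per `interval` seconds (blocking)."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        self._next_allowed = now + self.interval


def extract_with_fallback(pdf_path, extractors):
    """Try (name, callable) extractors in order; return the first success."""
    errors = []
    for name, extractor in extractors:
        try:
            return name, extractor(pdf_path)
        except Exception as exc:  # a real server would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


if __name__ == "__main__":
    limiter = MinIntervalLimiter(interval=1.0)
    for _ in range(2):
        limiter.wait()  # the second call pauses for about one second
    print("done")
```

Passing the clock and sleep functions in makes the limiter easy to test without real delays.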

Development

# Development setup
poetry install
poetry run pytest                    # Run tests
poetry run black .                   # Format code  
poetry run ruff check .              # Lint code

# Testing individual components
poetry run python -m pytest tests/  # Full test suite
poetry run arxiv-mcp-server          # Start server manually

arXiv Categories

| Field | Popular Categories |
| --- | --- |
| Computer Science | cs.AI, cs.LG, cs.CV, cs.CL, cs.RO |
| Mathematics | math.CO, math.NT, math.AG, math.ST |
| Physics | astro-ph, cond-mat, hep-ph, quant-ph |
| Biology | q-bio.BM, q-bio.CB, q-bio.GN |

Complete arXiv taxonomy →

License

MIT License © 2025 Ruben Fernández-Fuertes