MaksPyn/arxiv-mcp-server
A Model Context Protocol (MCP) server that provides access to the arXiv API for searching and retrieving scientific papers.
arXiv MCP Server
An enhanced Model Context Protocol (MCP) server that provides comprehensive access to arXiv for searching, downloading, reading, and analyzing scientific papers with AI-powered research prompts.
Overview
This MCP server lets AI assistants search and retrieve scientific papers from arXiv, download them locally, extract their content, and generate research-oriented analysis prompts. It is intended for researchers, students, and anyone working with academic literature.
Features
Core Features
- Search papers by title, author, abstract, category, or any combination
- Retrieve papers by their arXiv ID
- Get recent papers in specific categories
- Find all papers by a specific author
- Rate limiting compliance with arXiv API guidelines
Enhanced Features (v1.1.0)
- Paper Download & Storage: Download and store papers locally with metadata tracking
- PDF Content Reading: Extract and read text content from downloaded papers
- In-Paper Search: Search for specific terms within downloaded papers
- AI Research Prompts: Pre-configured analysis prompts for paper summarization, methodology analysis, and more
- Storage Management: Track storage usage and manage downloaded papers
- Enhanced Error Handling: Robust error handling with retry logic
- Structured Logging: Winston-based logging with configurable levels
Installation
Prerequisites
- Node.js (v16 or higher)
- npm or yarn
Option 1: Install from npm
npm install -g @arxiv/mcp-server
Then run:
arxiv-mcp-server
Option 2: Install from Source
- Clone this repository:
git clone https://github.com/MaksPyn/arxiv-mcp-server.git
cd arxiv-mcp-server
- Install dependencies:
npm install
- Build the TypeScript code:
npm run build
Option 3: Docker
- Using Docker Compose:
cd docker
docker-compose up -d
- Using Docker directly:
docker build -t arxiv-mcp-server -f docker/Dockerfile .
docker run -it --name arxiv-mcp-server arxiv-mcp-server
Configuration
Copy the example environment file and customize as needed:
cp config/.env.example .env
Available configuration options:
- LOG_LEVEL: Logging level (debug, info, warn, error)
- STORAGE_DIR: Directory for storing downloaded papers
- MAX_STORAGE_SIZE_MB: Maximum storage size for papers
- RATE_LIMIT_DELAY_MS: Delay between API requests (default: 3000ms)
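As a rough illustration of how these options could be read at startup, the sketch below parses them from an environment map. The `loadConfig` function, the `ServerConfig` shape, and every default other than the documented 3000 ms delay are assumptions made for this example, not the server's actual code.

```typescript
// Hypothetical config loader. Names and defaults (except 3000 ms) are assumptions.
type LogLevel = "debug" | "info" | "warn" | "error";

interface ServerConfig {
  logLevel: LogLevel;
  storageDir: string;
  maxStorageSizeMb: number;
  rateLimitDelayMs: number;
}

const LOG_LEVELS: readonly LogLevel[] = ["debug", "info", "warn", "error"];

// Parse a positive integer, falling back when the value is missing or malformed.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const n = parseInt(raw ?? "", 10);
  return !isNaN(n) && n > 0 ? n : fallback;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const level = (env["LOG_LEVEL"] ?? "").toLowerCase() as LogLevel;
  return {
    // Unknown or missing levels fall back to "info" (assumed default).
    logLevel: LOG_LEVELS.indexOf(level) !== -1 ? level : "info",
    storageDir: env["STORAGE_DIR"] ?? "./storage",
    maxStorageSizeMb: parsePositiveInt(env["MAX_STORAGE_SIZE_MB"], 1024),
    rateLimitDelayMs: parsePositiveInt(env["RATE_LIMIT_DELAY_MS"], 3000),
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` once the `.env` file has been loaded.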
Usage
Running the Server
Start the server using:
npm start
Or for development (build + run):
npm run dev
Integrating with Claude Desktop
Add the following to your Claude Desktop configuration file:
Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/claude/claude_desktop_config.json
{
"mcpServers": {
"arxiv": {
"command": "node",
"args": ["C:/path/to/arxiv-server/build/index.js"]
}
}
}
Replace C:/path/to/arxiv-server with the actual path to your arxiv-server directory.
Available Tools
1. search_papers
Search for papers using various criteria.
Parameters:
- query (string, optional): Raw search query
- title (string, optional): Search in paper titles
- author (string, optional): Search by author name
- abstract (string, optional): Search in abstracts
- category (string, optional): Filter by category (e.g., "cs.AI", "math.CO")
- all (string, optional): Search in all fields
- start (number, optional): Starting index (default: 0)
- maxResults (number, optional): Maximum results to return, max 2000 (default: 10)
- sortBy (string, optional): Sort by "relevance", "lastUpdatedDate", or "submittedDate" (default: "relevance")
- sortOrder (string, optional): "ascending" or "descending" (default: "descending")
Example:
{
"title": "neural networks",
"category": "cs.AI",
"maxResults": 5
}
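The arXiv API takes field-prefixed terms (`ti:`, `au:`, `abs:`, `cat:`, `all:`) in its `search_query` parameter. The sketch below shows one way the parameters above could be translated into a request URL. The `buildSearchUrl` helper, the choice to join fields with `AND`, and the phrase quoting are assumptions for illustration; the server's own mapping may differ.

```typescript
// Hypothetical mapping from search_papers parameters to an arXiv API URL.
interface SearchParams {
  query?: string;
  title?: string;
  author?: string;
  abstract?: string;
  category?: string;
  all?: string;
  start?: number;
  maxResults?: number;
  sortBy?: "relevance" | "lastUpdatedDate" | "submittedDate";
  sortOrder?: "ascending" | "descending";
}

// Quote multi-word values so arXiv treats them as a phrase.
function term(prefix: string, value: string): string {
  const v = value.trim();
  return /\s/.test(v) ? `${prefix}:"${v}"` : `${prefix}:${v}`;
}

function buildSearchUrl(p: SearchParams): string {
  const parts: string[] = [];
  if (p.query) parts.push(p.query);
  if (p.title) parts.push(term("ti", p.title));
  if (p.author) parts.push(term("au", p.author));
  if (p.abstract) parts.push(term("abs", p.abstract));
  if (p.category) parts.push(term("cat", p.category));
  if (p.all) parts.push(term("all", p.all));
  if (parts.length === 0) throw new Error("At least one search field is required");

  // Clamp to the documented bounds: default 10, maximum 2000.
  const maxResults = Math.min(Math.max(p.maxResults ?? 10, 1), 2000);
  const qs = [
    `search_query=${encodeURIComponent(parts.join(" AND "))}`,
    `start=${Math.max(p.start ?? 0, 0)}`,
    `max_results=${maxResults}`,
    `sortBy=${p.sortBy ?? "relevance"}`,
    `sortOrder=${p.sortOrder ?? "descending"}`,
  ];
  return `http://export.arxiv.org/api/query?${qs.join("&")}`;
}
```

With the example input above, this sketch produces a query for `ti:"neural networks" AND cat:cs.AI` limited to five results.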
2. get_paper_by_id
Retrieve specific papers by their arXiv IDs.
Parameters:
- ids (array of strings, required): Array of arXiv IDs
Example:
{
"ids": ["2301.00001", "2312.12345"]
}
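arXiv identifiers come in two forms: the current `YYMM.NNNNN` scheme (four or five digits after the dot, optionally followed by a version such as `v2`) and the older `archive/YYMMNNN` scheme. A minimal sketch of validating IDs before passing them to the API's `id_list` parameter follows. The helper names are hypothetical, and whether the server validates IDs this way is an assumption.

```typescript
// Hypothetical ID validation; the server may accept or reject IDs differently.
const NEW_STYLE = /^\d{4}\.\d{4,5}(v\d+)?$/;                  // e.g. 2301.00001, 2312.12345v2
const OLD_STYLE = /^[a-z-]+(\.[A-Z]{2})?\/\d{7}(v\d+)?$/;     // e.g. hep-th/9901001

function isValidArxivId(id: string): boolean {
  return NEW_STYLE.test(id) || OLD_STYLE.test(id);
}

function buildIdListUrl(ids: string[]): string {
  if (ids.length === 0) throw new Error("At least one arXiv ID is required");
  const invalid = ids.filter((id) => !isValidArxivId(id));
  if (invalid.length > 0) throw new Error(`Invalid arXiv IDs: ${invalid.join(", ")}`);
  // Validated IDs contain only URL-safe characters, so they are joined as-is.
  return `http://export.arxiv.org/api/query?id_list=${ids.join(",")}`;
}
```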
3. get_recent_papers
Get the most recent papers in a specific category.
Parameters:
- category (string, required): Category to filter by (e.g., "cs.AI", "physics.quant-ph")
- maxResults (number, optional): Maximum results (default: 10)
Example:
{
"category": "cs.LG",
"maxResults": 20
}
4. search_author
Find all papers by a specific author.
Parameters:
- author (string, required): Author name to search for
- maxResults (number, optional): Maximum results (default: 20)
- sortBy (string, optional): "submittedDate" or "lastUpdatedDate" (default: "submittedDate")
Example:
{
"author": "Yann LeCun",
"maxResults": 10
}
5. download_paper
Download a paper PDF and store it locally.
Parameters:
- arxivId (string, required): arXiv ID of the paper to download
Example:
{
"arxivId": "2301.00001"
}
6. list_downloaded_papers
List all locally downloaded papers with metadata.
Parameters: None
7. delete_paper
Delete a downloaded paper from local storage.
Parameters:
- arxivId (string, required): arXiv ID of the paper to delete
8. get_storage_stats
Get storage statistics for downloaded papers.
Parameters: None
Returns:
- Total number of papers
- Total storage size
- Formatted size string
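The three returned values could be derived from a list of file sizes roughly as follows. This is a sketch only: the `StorageStats` field names, the binary (1024-based) units, and the two-decimal formatting are assumptions, not the server's documented output.

```typescript
// Hypothetical shape of the get_storage_stats result.
interface StorageStats {
  totalPapers: number;
  totalSizeBytes: number;
  formattedSize: string;
}

// Convert a byte count to a human-readable string using 1024-based units.
function formatSize(bytes: number): string {
  const units = ["B", "KB", "MB", "GB", "TB"];
  let value = Math.max(bytes, 0);
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i++;
  }
  // Whole bytes are printed as-is; larger units get two decimals.
  return `${i === 0 ? value : value.toFixed(2)} ${units[i]}`;
}

function computeStats(fileSizes: number[]): StorageStats {
  const total = fileSizes.reduce((sum, size) => sum + size, 0);
  return {
    totalPapers: fileSizes.length,
    totalSizeBytes: total,
    formattedSize: formatSize(total),
  };
}
```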
9. read_paper_content
Read and extract text content from a downloaded paper.
Parameters:
- arxivId (string, required): arXiv ID of the paper to read
Returns:
- Extracted sections (abstract, introduction, methodology, results, discussion, conclusion, references)
- Full text length
10. search_in_paper
Search for text within a downloaded paper.
Parameters:
- arxivId (string, required): arXiv ID of the paper to search in
- searchTerm (string, required): Text to search for
- caseSensitive (boolean, optional): Whether the search should be case sensitive (default: false)
Returns:
- Total matches found
- Matches organized by section with context
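To make the return shape concrete, here is a minimal sketch of a per-section search that records each hit with some surrounding context. It assumes the extracted paper is available as a map from section name to text; `searchSections`, the `Match` shape, and the 40-character context window are hypothetical choices for this example.

```typescript
// Hypothetical in-paper search over already-extracted sections.
interface Match {
  section: string;
  index: number;   // character offset of the hit within the section
  context: string; // the hit plus surrounding characters
}

// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function searchSections(
  sections: Record<string, string>,
  searchTerm: string,
  caseSensitive = false,
  contextChars = 40,
): { totalMatches: number; matches: Match[] } {
  const matches: Match[] = [];
  if (searchTerm.length === 0) return { totalMatches: 0, matches };

  for (const section of Object.keys(sections)) {
    const text = sections[section] ?? "";
    const re = new RegExp(escapeRegExp(searchTerm), caseSensitive ? "g" : "gi");
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      const from = Math.max(0, m.index - contextChars);
      const to = Math.min(text.length, m.index + m[0].length + contextChars);
      matches.push({ section, index: m.index, context: text.slice(from, to) });
    }
  }
  return { totalMatches: matches.length, matches };
}
```

Escaping the term keeps characters such as `(` or `.` in a query from being interpreted as regex syntax.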
11. get_analysis_prompts
Get available research analysis prompts.
Parameters: None
Returns:
- List of available prompts with IDs, names, descriptions, and required variables
12. analyze_paper
Generate an analysis prompt for a paper.
Parameters:
- arxivId (string, required): arXiv ID of the paper to analyze
- promptId (string, required): ID of the analysis prompt to use
Available Prompt IDs:
- summary: Comprehensive paper summary
- key_findings: Extract key findings and results
- methodology_analysis: Analyze research methodology
- literature_review: Create literature review entry
- research_gaps: Identify research gaps and future directions
- technical_deep_dive: Technical analysis with algorithms and implementation details
- comparison: Compare with related work
- practical_applications: Identify real-world applications
Example:
{
"arxivId": "2301.00001",
"promptId": "summary"
}
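Since each prompt declares required variables, generating an analysis prompt presumably amounts to filling a template with values taken from the paper. The sketch below shows one possible approach. The `{{name}}` placeholder syntax, the `PromptTemplate` shape, and the `renderPrompt` helper are all assumptions; the server's template format is not documented here.

```typescript
// Hypothetical prompt template and renderer; placeholder syntax is assumed.
interface PromptTemplate {
  id: string;
  name: string;
  template: string;
  requiredVariables: string[];
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

function renderPrompt(prompt: PromptTemplate, vars: Record<string, string>): string {
  // Fail early if any declared variable has no value.
  const missing = prompt.requiredVariables.filter((v) => !hasOwn(vars, v));
  if (missing.length > 0) {
    throw new Error(`Missing variables for "${prompt.id}": ${missing.join(", ")}`);
  }
  // Substitute known placeholders; leave unknown ones untouched.
  return prompt.template.replace(/\{\{(\w+)\}\}/g, (whole: string, name: string) =>
    hasOwn(vars, name) ? (vars[name] ?? whole) : whole,
  );
}
```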
arXiv Categories
Common arXiv category codes include:
Computer Science
- cs.AI - Artificial Intelligence
- cs.LG - Machine Learning
- cs.CV - Computer Vision and Pattern Recognition
- cs.CL - Computation and Language
- cs.NE - Neural and Evolutionary Computing
Physics
- physics.gen-ph - General Physics
- physics.optics - Optics
- quant-ph - Quantum Physics
Mathematics
- math.CO - Combinatorics
- math.PR - Probability
- math.ST - Statistics Theory
For a complete list, visit arXiv Category Taxonomy.
Response Format
Search and retrieval tools return structured JSON responses containing:
- Paper IDs
- Titles
- Authors (with affiliations when available)
- Publication/update dates
- Categories
- Summaries (truncated for search results, full for specific paper retrieval)
- Direct PDF links
- Additional metadata (DOI, journal references, comments)
Rate Limiting
This server respects arXiv's API usage guidelines by waiting 3 seconds between requests by default (configurable through RATE_LIMIT_DELAY_MS). The delay is applied automatically, so clients do not need to throttle their own calls.
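A fixed delay between requests can be enforced with a small scheduler that hands out time slots. The following is a minimal sketch under that assumption, not the server's actual implementation. The `RateLimiter` class and its injectable clock are hypothetical and exist mainly to make the logic easy to test.

```typescript
// Hypothetical fixed-delay rate limiter.
class RateLimiter {
  private nextSlot = 0; // earliest timestamp (ms) at which the next request may start
  private readonly delayMs: number;
  private readonly now: () => number;

  constructor(delayMs: number = 3000, now: () => number = () => Date.now()) {
    this.delayMs = delayMs;
    this.now = now;
  }

  // Reserve the next slot and return how long the caller must wait, in ms.
  reserve(): number {
    const current = this.now();
    const start = Math.max(current, this.nextSlot);
    this.nextSlot = start + this.delayMs;
    return start - current;
  }

  // Run a task once its reserved slot arrives.
  async schedule<T>(task: () => Promise<T>): Promise<T> {
    const wait = this.reserve();
    if (wait > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, wait));
    }
    return task();
  }
}
```

Because each call reserves its slot immediately, concurrent callers queue up at 3-second intervals rather than all firing after a single shared wait.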
Error Handling
The server includes comprehensive error handling for:
- Network errors
- Invalid parameters
- Rate limiting
- arXiv API errors
Errors are returned in a structured format that can be easily parsed by the MCP client.
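The retry logic and structured errors mentioned above could look roughly like the sketch below. The exponential backoff schedule, the attempt count, and the `{ error: { type, message } }` payload shape are assumptions chosen for illustration; the server's real error format may differ.

```typescript
// Hypothetical retry helper and error payload.

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Run a task, retrying on failure up to maxAttempts times in total.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelay(attempt, baseMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Convert any thrown value into a JSON-serializable payload.
function toErrorPayload(err: unknown): { error: { type: string; message: string } } {
  if (err instanceof Error) {
    return { error: { type: err.name, message: err.message } };
  }
  return { error: { type: "UnknownError", message: String(err) } };
}
```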
Development
Project Structure
arxiv-server/
├── src/
│   ├── index.ts # Main server implementation
│   ├── types/ # TypeScript type definitions
│   ├── utils/ # Utility functions (logger, errors, file operations)
│   ├── storage/ # Storage management for downloaded papers
│   └── prompts/ # Research prompt management
├── build/ # Compiled JavaScript files
├── storage/ # Downloaded papers storage
├── prompts/ # Prompt templates
├── logs/ # Application logs
├── config/ # Configuration files
├── docker/ # Docker configuration
│   ├── Dockerfile
│   └── docker-compose.yml
├── package.json # Project dependencies
├── tsconfig.json # TypeScript configuration
├── .npmignore # NPM publish ignore rules
└── README.md # This file
Building from Source
npm run build
Running in Development Mode
npm run dev
License
MIT License - see package.json for details.
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
Acknowledgments
This server uses the arXiv API to access scientific papers. Please respect arXiv's terms of service and usage guidelines when using this server.