arXiv CLI & MCP Server
A Python toolkit for searching and downloading papers from arXiv.org, with both a command-line interface and a Model Context Protocol (MCP) server for LLM integration.
Coding agents work well with well-documented CLI tools and MCP servers; this project provides both.
Features
- Search arXiv papers by title, author, abstract, category, and more
- Download PDFs automatically with local caching
- MCP Server for integration with LLM assistants (Claude Desktop, etc.)
- Typed responses using Pydantic models for clean data handling
- Rate limiting built-in to respect arXiv API guidelines
- Comprehensive test suite: 26 integration tests against the real API (no mocking)
Installation
Option 1: Install from GitHub (Recommended)
Install directly from the GitHub repository:
# Install the latest version
uv pip install git+https://github.com/LiamConnell/arxiv_for_agents.git
# Or with pip
pip install git+https://github.com/LiamConnell/arxiv_for_agents.git
# Now you can use the arxiv command
arxiv --help
Option 2: Install from Source
Clone the repository and install locally:
# Clone the repository
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
# Install in editable mode
uv pip install -e .
# Now you can use the arxiv command
arxiv --help
Option 3: Development Installation
For development with all dependencies:
# Clone and install with dev dependencies
git clone https://github.com/LiamConnell/arxiv_for_agents.git
cd arxiv_for_agents
uv pip install -e ".[dev]"
# Run tests
uv run pytest
Verify Installation
# If installed as package
arxiv --help
# Or if using as module
uv run python -m arxiv --help
Usage
Note: If you installed as a package, use arxiv directly. Otherwise, use uv run python -m arxiv.
Search Papers
Search by title:
# Using installed package
arxiv search "ti:attention is all you need"
# Or using as module
uv run python -m arxiv search "ti:attention is all you need"
Search by author:
arxiv search "au:Hinton" --max-results 20
Search by category:
arxiv search "cat:cs.AI" --max-results 10
Combined search:
arxiv search "ti:transformer AND au:Vaswani"
Get Specific Paper
Get paper metadata and download PDF:
arxiv get 1706.03762
Get metadata only (no download):
arxiv get 1706.03762 --no-download
Force re-download:
arxiv get 1706.03762 --force
Download PDF
Download just the PDF:
arxiv download 1706.03762
List Downloaded PDFs
arxiv list-downloads
JSON Output
Get results as JSON for scripting:
arxiv search "ti:neural" --json
arxiv get 1706.03762 --json --no-download
Search Query Syntax
The arXiv API supports field-specific searches:
- ti: - Title
- au: - Author
- abs: - Abstract
- cat: - Category (e.g., cs.AI, cs.LG)
- all: - All fields (default)
You can combine searches with AND, OR, and ANDNOT:
arxiv search "ti:neural AND cat:cs.LG"
arxiv search "au:Hinton OR au:Bengio"
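Under the hood, these query strings are passed to the arXiv API's search_query parameter. As a rough illustration (the helper name is mine, not part of this project), a request URL could be built like this:

```python
from urllib.parse import urlencode

# Illustrative helper: build an arXiv API query URL from a search string.
# Endpoint and parameter names follow the public arXiv API documentation.
def build_query_url(query: str, start: int = 0, max_results: int = 10) -> str:
    base = "https://export.arxiv.org/api/query"
    params = {
        "search_query": query,
        "start": start,
        "max_results": max_results,
    }
    return f"{base}?{urlencode(params)}"

url = build_query_url("ti:neural AND cat:cs.LG", max_results=5)
print(url)
```

Note that urlencode percent-encodes the colons in field prefixes, which the API accepts.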
Download Directory
PDFs are downloaded to ./.arxiv by default. Change this with:
arxiv --download-dir ./papers search "ti:transformer"
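The caching behavior amounts to checking the download directory before fetching. A minimal sketch of that pattern (function and parameter names here are assumptions, not the project's actual code; `fetch` stands in for the real HTTP download):

```python
from pathlib import Path

# Sketch of download caching: fetch a PDF only if it is not already
# present in the download directory, unless force=True.
def cached_pdf_path(arxiv_id: str, download_dir: str = "./.arxiv",
                    force: bool = False,
                    fetch=lambda _id: b"%PDF-stub") -> Path:
    target = Path(download_dir) / f"{arxiv_id}.pdf"
    if force or not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(arxiv_id))  # placeholder for the real download
    return target

path = cached_pdf_path("1706.03762", download_dir="/tmp/arxiv-demo")
print(path)
```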
MCP Server (Model Context Protocol)
The arXiv CLI includes a Model Context Protocol (MCP) server that allows LLM assistants (like Claude Desktop) to search and download arXiv papers programmatically.
Running the MCP Server
# Option 1: Using the script entry point (recommended)
uv run arxiv-mcp
# Option 2: Using the module
uv run python -m arxiv.mcp
The server runs in stdio mode and communicates via JSON-RPC over stdin/stdout.
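Because the transport is stdio, each message a client sends is a single JSON-RPC 2.0 object on one line. A hedged sketch of the kind of request a client might send to invoke the search_papers tool (field values here are illustrative, not captured from this server):

```python
import json

# Example JSON-RPC 2.0 message of the shape an MCP client writes to the
# server's stdin to call a tool; id and argument values are made up.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_papers",
        "arguments": {"query": "ti:transformer", "max_results": 5},
    },
}
line = json.dumps(request)
print(line)
```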
MCP Tools
The server provides 4 tools for paper discovery and management:
1. search_papers - Search arXiv with advanced query syntax
   - Supports field prefixes (ti:, au:, abs:, cat:)
   - Boolean operators (AND, OR, ANDNOT)
   - Pagination and sorting options
   - Returns paper metadata including title, authors, abstract, categories
2. get_paper - Get detailed information about a specific paper
   - Accepts flexible ID formats (1706.03762, arXiv:1706.03762, 1706.03762v1)
   - Optionally downloads the PDF automatically
   - Returns complete metadata including DOI, journal references, comments
3. download_paper - Download the PDF for a specific paper
   - Downloads to the local .arxiv directory
   - Returns file path and size information
   - Supports force re-download option
4. list_downloaded_papers - List all locally downloaded PDFs
   - Shows arXiv IDs, file sizes, and paths
   - Useful for managing a local paper collection
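The flexible ID handling in get_paper presumably boils down to stripping an optional arXiv: prefix and version suffix. A guess at that normalization logic (this is not the project's actual code):

```python
import re

# Illustrative normalizer for the flexible ID formats listed above.
def normalize_arxiv_id(raw: str) -> str:
    raw = raw.strip()
    if raw.lower().startswith("arxiv:"):
        raw = raw[len("arxiv:"):]
    # Drop a trailing version suffix like "v1" or "v12".
    return re.sub(r"v\d+$", "", raw)

for raw in ("1706.03762", "arXiv:1706.03762", "1706.03762v1"):
    print(normalize_arxiv_id(raw))  # each prints 1706.03762
```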
MCP Resources
The server exposes 2 resources for direct access:
- paper://{arxiv_id} - Get formatted paper metadata in markdown
- downloads://list - Get markdown table of all downloaded papers
MCP Prompts
Pre-built prompt templates to guide usage:
- search_arxiv_prompt - Guide for searching arXiv papers
- download_paper_prompt - Guide for downloading and managing papers
Claude Desktop Configuration
Add to your Claude Desktop config file (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
If installed from GitHub/pip:
{
"mcpServers": {
"arxiv": {
"command": "arxiv-mcp"
}
}
}
If running from source/development:
{
"mcpServers": {
"arxiv": {
"command": "uv",
"args": ["run", "arxiv-mcp"],
"cwd": "/path/to/arxiv_for_agents"
}
}
}
Or use --directory to avoid needing cwd:
{
"mcpServers": {
"arxiv": {
"command": "uv",
"args": ["--directory", "/path/to/arxiv_for_agents", "run", "arxiv-mcp"]
}
}
}
MCP Use Cases
Once configured, you can ask Claude to:
- "Search arXiv for recent papers on transformer architectures"
- "Find papers by Geoffrey Hinton in the cs.AI category"
- "Download the 'Attention is All You Need' paper"
- "Show me papers about neural networks from 2023"
- "List all the papers I've downloaded"
- "Get the abstract for arXiv:1706.03762"
The MCP integration allows Claude to autonomously search, retrieve, and manage academic papers from arXiv.
Architecture
Module Structure
arxiv/
├── __init__.py # Package exports
├── __main__.py # CLI entry point
├── cli.py # Click commands
├── models.py # Pydantic models
├── services.py # API client service
└── mcp/ # MCP server
├── __init__.py # MCP package exports
├── __main__.py # MCP server entry point
└── server.py # FastMCP server with tools, resources, prompts
tests/
└── test_services.py # Integration tests (26 tests)
Pydantic Models
All API responses are typed using Pydantic:
from arxiv import ArxivService

service = ArxivService()
result = service.search("ti:neural", max_results=5)

# result is typed as ArxivSearchResult
print(f"Total: {result.total_results}")
for entry in result.entries:
    # entry is typed as ArxivEntry
    print(f"{entry.arxiv_id}: {entry.title}")
    print(f"Authors: {', '.join(a.name for a in entry.authors)}")
Key Models
- ArxivSearchResult: Search results with metadata
  - total_results: Total matching papers
  - entries: List of ArxivEntry objects
- ArxivEntry: Individual paper
  - arxiv_id: Clean ID (e.g., "1706.03762")
  - title, summary: Paper metadata
  - authors: List of Author objects
  - categories: Subject categories
  - pdf_url: Direct PDF link
  - published, updated: Datetime objects
- Author: Paper author
  - name: Author name
  - affiliation: Optional affiliation
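As a rough, dependency-free approximation of these model shapes (the real project defines them as Pydantic models; field names follow the list above, and the sample values are made up):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Stdlib dataclass approximation of the project's Pydantic models.
@dataclass
class Author:
    name: str
    affiliation: Optional[str] = None

@dataclass
class ArxivEntry:
    arxiv_id: str
    title: str
    summary: str
    authors: list[Author]
    categories: list[str]
    pdf_url: str
    published: datetime
    updated: datetime

@dataclass
class ArxivSearchResult:
    total_results: int
    entries: list[ArxivEntry] = field(default_factory=list)

entry = ArxivEntry(
    arxiv_id="1706.03762",
    title="Attention Is All You Need",
    summary="...",
    authors=[Author(name="Ashish Vaswani")],
    categories=["cs.CL"],
    pdf_url="https://arxiv.org/pdf/1706.03762",
    published=datetime(2017, 6, 12),
    updated=datetime(2017, 6, 12),
)
result = ArxivSearchResult(total_results=1, entries=[entry])
print(result.entries[0].arxiv_id)
```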
Testing
Run all 26 integration tests (makes real API calls):
uv run pytest tests/test_services.py -v
Run specific test class:
uv run pytest tests/test_services.py::TestArxivServiceSearch -v
The tests are integration tests that hit the real arXiv API, ensuring the service works with actual data.
API Rate Limiting
The service enforces a 3-second delay between API requests by default (arXiv's recommendation). You can adjust this:
from arxiv import ArxivService
service = ArxivService(rate_limit_delay=5.0) # 5 seconds
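The enforced delay is a simple wait-between-calls pattern. A minimal sketch of how such a limiter can work (the class below is illustrative, not the project's implementation; the demo uses a short delay so it runs quickly):

```python
import time

# Sketch of a delay-between-requests rate limiter: each wait() sleeps
# just long enough that consecutive calls are at least `delay` apart.
class RateLimiter:
    def __init__(self, delay: float = 3.0):
        self.delay = delay
        self._last = 0.0

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last
        if elapsed < self.delay:
            time.sleep(self.delay - elapsed)
        self._last = time.monotonic()

limiter = RateLimiter(delay=0.1)
start = time.monotonic()
for _ in range(3):
    limiter.wait()  # would precede each API request
elapsed = time.monotonic() - start
print(f"3 calls took {elapsed:.2f}s")
```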
Examples
Python API
from arxiv import ArxivService
# Initialize service
service = ArxivService(download_dir="./papers")
# Search
results = service.search(
    query="ti:attention is all you need",
    max_results=5,
    sort_by="relevance",
)
print(f"Found {results.total_results} papers")
for entry in results.entries:
    print(f"- {entry.title}")
# Get specific paper
entry = service.get("1706.03762", download_pdf=True)
print(f"Downloaded: {entry.title}")
# Just download PDF
pdf_path = service.download_pdf("1706.03762")
print(f"PDF saved to: {pdf_path}")
CLI Examples
# Find recent papers in a category
arxiv search "cat:cs.AI" \
--max-results 10 \
--sort-by submittedDate \
--sort-order descending
# Search and output as JSON for processing
arxiv search "ti:transformer" --json | jq '.entries[].title'
# Batch download multiple papers
for id in 1706.03762 1810.04805 2010.11929; do
arxiv download $id
done
Development
The codebase follows these principles:
- Type safety: Pydantic models for all API responses
- Clean architecture: Separation of CLI, service, and models
- Real tests: Integration tests with actual API calls (no mocks)
- Rate limiting: Respects arXiv API guidelines
- Caching: Automatic local caching to avoid re-downloads
arXiv API Reference
- Base URL: https://export.arxiv.org/api/query
- Format: Atom XML
- Rate limit: 3 seconds between requests (recommended)
- Documentation: https://info.arxiv.org/help/api/user-manual.html
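Since the API returns Atom XML, responses can be parsed with the standard library alone. A minimal sketch (the feed snippet below is hand-written to mimic the response shape, not a real API response):

```python
import xml.etree.ElementTree as ET

# Hand-written Atom snippet mimicking the arXiv API response format.
ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/1706.03762v1</id>
    <title>Attention Is All You Need</title>
    <author><name>Ashish Vaswani</name></author>
  </entry>
</feed>"""

NS = {"atom": "http://www.w3.org/2005/Atom"}
root = ET.fromstring(ATOM)
# Extract each entry's title, qualifying tags with the Atom namespace.
titles = [e.findtext("atom:title", namespaces=NS)
          for e in root.findall("atom:entry", NS)]
print(titles)
```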
License
This is a personal project for interacting with arXiv's public API.