align-sec/arxiv-mcp
If you are the rightful owner of arxiv-mcp and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
A complete Model Context Protocol (MCP) implementation for searching arXiv papers using natural language queries powered by Claude.
arXiv MCP Client & Server
A complete Model Context Protocol (MCP) implementation for searching arXiv papers using natural language queries powered by Claude.
Overview
This project includes a client, server, and web interface:
- Client (
arxiv_client.py): Accepts natural language queries, uses Claude to parse them into structured parameters, calls the server, and scores results by relevance - Server (
arxiv_server.py): Implements thefind_paperstool that queries the arXiv API and returns matching papers - Web App (
app.py): Beautiful Streamlit interface for easy paper searching
Features
- Natural Language Processing: Uses Claude to understand queries like "find me 10 recent papers about LLM red teaming as it applies to security of AI agents"
- Intelligent Parameter Extraction: Automatically extracts:
- Search terms (keywords for arXiv search)
- Date ranges (e.g., "recent" → last 6 months, "past year" → 1 year ago)
- Number of results (defaults to 10 if not specified)
- Smart Relevance Scoring: After retrieving papers from the server:
- Calculates relevance score based on search term matches in title and summary
- Scores are computed client-side (no additional API calls)
- Results sorted by relevance score (highest first)
- MCP Integration: Ready to connect to an MCP server implementing the
find_paperstool
Installation
- Clone repo & create new environment with python 3.12
git clone git@github.com:align-sec/arxiv-mcp.git
conda create -n mcp_env python=3.12
conda activate mcp_env
conda install pip
pip install --upgrade pip
pip install -r requirements.txt
- Install dependencies:
pip install -r requirements.txt
- Set your Anthropic API key (for command-line usage):
export ANTHROPIC_API_KEY="your-api-key-here"
Quick Start - Web Interface 🌟
The easiest way to use the system is through the Streamlit web app:
streamlit run app.py
Then:
- Enter your Anthropic API key in the sidebar
- Type your search query (e.g., "find me 10 recent papers about transformers")
- Click "Search Papers"
- View beautifully formatted, relevance-scored results!
The web app features:
- 🎨 Beautiful gradient tiles for each paper (color-coded by relevance)
- ⭐ Visual relevance scores (0.00 to 1.00)
- 📱 Responsive design that works on all devices
- 🔗 Direct links to arXiv papers
- 💡 Example queries and help text
- 🔒 Secure API key input (not stored)
- ⚡ FAST: Only ONE API call (query parsing), everything else is instant!
Note: The Streamlit app uses arxiv_client_simple.py which calls the server functions directly (in-process) for better performance and reliability in a web environment. For true MCP client/server architecture over stdio, see the examples below.
Alternative launcher:
./run_app.sh
Usage
As a Library
import asyncio
from arxiv_client import ArxivMCPClient
async def search_papers():
# Initialize client with your API key
client = ArxivMCPClient(anthropic_api_key="your-api-key")
# Parse a natural language query
query = "find me 10 recent papers about LLM red teaming"
params = client.parse_query_with_claude(query)
print(params)
# Output: {
# "search_terms": ["LLM", "red teaming", "language models", "security"],
# "min_date": "2024-04-21",
# "max_results": 10
# }
# When server is available, connect and search:
# from mcp import StdioServerParameters
# server_params = StdioServerParameters(
# command="python",
# args=["arxiv_server.py"]
# )
# await client.connect_to_server(server_params)
# results = await client.search_papers(query)
# # Results are automatically scored and sorted by relevance
# for paper in results:
# print(f"Score: {paper['relevance_score']:.2f} - {paper['title']}")
# await client.disconnect()
asyncio.run(search_papers())
Command Line Demo
Run the included demo (query parsing only, no server):
python arxiv_client.py
Integration Test
Run the full client-server integration test:
export ANTHROPIC_API_KEY="your-api-key"
python test_integration.py
This will:
- Start the server
- Parse a natural language query
- Query arXiv API
- Score papers for relevance
- Display sorted results
Query Examples
The client can understand various natural language queries:
"find me 10 recent papers about LLM red teaming as it applies to security of AI agents""search for papers on quantum computing from the last year""get 5 papers about neural networks published in the last 3 months""find papers on transformer architectures"
Parsed Parameters Schema
The client extracts queries into the following JSON structure:
{
"search_terms": ["list", "of", "keywords"],
"min_date": "YYYY-MM-DD", // optional
"max_results": 10 // defaults to 10
}
Parameters
- search_terms (required): List of relevant keywords extracted from the query
- min_date (optional): Minimum publication date in YYYY-MM-DD format
- "recent" without specifics → 6 months ago
- "last year" → 1 year ago
- "past 3 months" → 3 months ago
- max_results (optional): Number of papers to return (default: 10)
Relevance Scoring
After retrieving papers from the server, the client automatically:
- Calculates relevance by matching search terms against paper titles and summaries
- Title matches are weighted more heavily (60%)
- Summary matches contribute 40%
- All done client-side with no additional API calls
- Assigns a relevance score from 0.0 to 1.0 based on match percentage
- Sorts results by relevance score (highest first)
This ensures that the most relevant papers appear at the top of the results, even if the arXiv search returned them in a different order.
Performance: Only ONE API call is made (to parse the query). Everything else is fast client-side processing.
Server Implementation
The included arxiv_server.py implements an MCP server with the following features:
find_papers Tool
Accepts parameters:
search_terms(required): List of keywords to search formin_date(optional): Minimum publication date (YYYY-MM-DD)max_results(optional): Maximum number of results (default: 10)
Returns papers with:
title: Paper titlesummary: Paper abstractauthors: List of author namespublished: Publication dateupdated: Last update datearxiv_id: arXiv identifierurl: Full arXiv URLcategories: arXiv categories
arXiv API Integration
The server queries the arXiv API using:
- Search query built from search terms (searches all fields)
- Results sorted by submission date (most recent first)
- XML response parsing with proper namespace handling
- Date filtering applied after retrieval
Running the Complete System
Start the Server
In one terminal:
python arxiv_server.py
Run the Client
Update arxiv_client.py to uncomment the server connection code in the main() function, then:
export ANTHROPIC_API_KEY="your-api-key"
python arxiv_client.py
Full Integration Example
import asyncio
from arxiv_client import ArxivMCPClient
from mcp import StdioServerParameters
async def search_with_server():
# Initialize client
client = ArxivMCPClient(anthropic_api_key="your-key")
# Configure server connection
server_params = StdioServerParameters(
command="python",
args=["arxiv_server.py"]
)
# Connect and search
await client.connect_to_server(server_params)
query = "find me 10 recent papers about LLM red teaming"
results = await client.search_papers(query)
# Results are automatically scored and sorted by relevance
for i, paper in enumerate(results, 1):
print(f"\n{i}. [{paper['relevance_score']:.2f}] {paper['title']}")
print(f" Authors: {', '.join(paper['authors'][:3])}")
print(f" Published: {paper['published'][:10]}")
print(f" URL: {paper['url']}")
await client.disconnect()
asyncio.run(search_with_server())
Requirements
- Python 3.8+
- Anthropic API key
- MCP-compatible server (for full functionality)
License
MIT