borderlessboy/arxiv-research-mcp
If you are the rightful owner of arxiv-research-mcp and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
The arXiv Research MCP Server is a comprehensive tool for searching and analyzing academic papers from arXiv, utilizing AI-powered relevance ranking and full-text extraction.
arXiv Research MCP Server
A comprehensive Model Context Protocol (MCP) server for searching and analyzing academic papers from arXiv with AI-powered relevance ranking and full-text extraction.
Features
- Smart Search: Search arXiv with date filtering and relevance ranking
- Full Text Extraction: Download and extract complete paper content
- Caching: Intelligent caching to reduce API calls
- Multiple Integrations: Works with Claude, LangChain, Streamlit, and more
- Batch Processing: Process multiple research topics efficiently
- API Wrapper: REST API for easy integration
- Jupyter Integration: Interactive analysis and visualization tools
- Relevance Ranking: TF-IDF based ranking for better results
- PDF Processing: Multi-method text extraction from PDFs
Quick Start
Installation
# Clone the repository
git clone https://github.com/borderlessboy/arxiv-research-mcp
cd arxiv-research-mcp
# Install dependencies
pip install -r requirements.txt
# Create environment configuration
# cp .env.example .env # Create .env file with your configuration
Basic Usage
# Run the MCP server
python scripts/run_server.py
# Or use the Streamlit dashboard
streamlit run integrations/streamlit_app.py
Docker Usage
The project includes a Dockerfile for easy containerized deployment.
Quick Start with Docker
# Build the Docker image
docker build -t arxiv-research-mcp .
# Run the container
docker run -p 8090:8090 arxiv-research-mcp
Docker with Custom Configuration
# Build with custom tag
docker build -t arxiv-research-mcp:latest .
# Run with custom port mapping
docker run -p 8080:8090 arxiv-research-mcp
# Run with volume for persistent cache
docker run -p 8090:8090 -v $(pwd)/cache:/app/cache arxiv-research-mcp
# Run with environment variables
docker run -p 8090:8090 \
-e CACHE_ENABLED=true \
-e CACHE_TTL_HOURS=24 \
-e LOG_LEVEL=INFO \
arxiv-research-mcp
Docker Compose (Recommended)
The project includes a docker-compose.yml file for easy deployment:
# Start the service
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the service
docker-compose down
Or create a custom docker-compose.yml:
services:
arxiv-research-mcp:
build: .
ports:
- "8090:8090"
volumes:
- ./cache:/app/cache
environment:
- CACHE_ENABLED=true
- CACHE_TTL_HOURS=24
- LOG_LEVEL=INFO
restart: unless-stopped
# Start the service
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the service
docker-compose down
Docker Development
# Build for development with all dependencies
docker build -t arxiv-research-mcp:dev .
# Run with mounted source code for development
docker run -p 8090:8090 \
-v $(pwd)/src:/app/src \
-v $(pwd)/config:/app/config \
-v $(pwd)/cache:/app/cache \
arxiv-research-mcp:dev
Installation Options
Docker Installation (Recommended)
# Quick start with Docker
docker build -t arxiv-research-mcp .
docker run -p 8090:8090 arxiv-research-mcp
Full Installation
pip install "arxiv-research-mcp[all]"
Specific Components
# API server only
pip install "arxiv-research-mcp[api]"
# Jupyter integration
pip install "arxiv-research-mcp[jupyter]"
# Dashboard
pip install "arxiv-research-mcp[dashboard]"
# LangChain integration
pip install "arxiv-research-mcp[langchain]"
Usage Examples
1. Basic MCP Server Usage
from src.server import search_arxiv_papers_tool
# Search for papers
result = await search_arxiv_papers_tool({
"query": "transformer models",
"max_results": 10,
"years_back": 4,
"include_full_text": True
})
2. LangChain Integration
from integrations.langchain_tool import ResearchAgent
agent = ResearchAgent()
result = agent.research_topic("quantum machine learning")
3. Jupyter Analysis
from integrations.jupyter_helper import search_papers
# Search and analyze
helper = await search_papers("machine learning", max_results=20)
# Create visualizations
fig = helper.create_publication_timeline()
plt.show()
4. Streamlit Dashboard
streamlit run integrations/streamlit_app.py
Configuration
Create a .env file with your settings:
# Server Configuration
SERVER_NAME=arxiv-research-server
LOG_LEVEL=INFO
# arXiv API Configuration
ARXIV_REQUEST_TIMEOUT=30
ARXIV_MAX_RETRIES=3
# Caching
CACHE_ENABLED=true
CACHE_TTL_HOURS=24
# Content Processing
MAX_FULL_TEXT_LENGTH=50000
DEFAULT_MAX_RESULTS=10
DEFAULT_YEARS_BACK=4
API Reference
MCP Tools
search_arxiv_papers
Search for academic papers with relevance ranking.
Parameters:
query(string): Search querymax_results(integer, default: 10): Maximum papers to returnyears_back(integer, default: 4): Years to search backinclude_full_text(boolean, default: true): Include full paper text
clear_cache
Clear all cached search results.
get_cache_stats
Get cache statistics and information.
LangChain Tools
ArxivResearchTool
Search arXiv papers with LangChain integration.
ArxivCacheManagementTool
Manage cache with LangChain integration.
Advanced Features
Relevance Ranking
The server uses TF-IDF vectorization and cosine similarity to rank papers by relevance to your query.
PDF Processing
Multiple extraction methods (PyPDF2, pdfplumber) ensure robust text extraction from PDFs.
Caching System
Intelligent caching reduces API calls and improves response times.
Batch Processing
Process multiple research topics efficiently with the batch processor.
Docker Deployment
The project includes a production-ready Dockerfile with:
- Lightweight Python 3.11-slim base image
- Optimized layer caching for faster builds
- Pre-configured HTTP server on port 8090
- Volume support for persistent caching
- Environment variable configuration
Development
Running Tests
pytest tests/
Code Quality
black src/ tests/
flake8 src/ tests/
mypy src/
Building
python setup.py build
Docker Development
# Build development image
docker build -t arxiv-research-mcp:dev .
# Run with source code mounted for development
docker run -p 8090:8090 \
-v $(pwd)/src:/app/src \
-v $(pwd)/config:/app/config \
-v $(pwd)/cache:/app/cache \
arxiv-research-mcp:dev
# Run tests in Docker
docker run arxiv-research-mcp:dev pytest tests/
Architecture
arxiv-research-mcp/
├── src/
│ ├── server.py # Main MCP server
│ ├── models/ # Data models
│ ├── services/ # Core services
│ └── utils/ # Utility functions
├── integrations/ # External integrations
├── scripts/ # Utility scripts
├── tests/ # Test suite
└── examples/ # Usage examples
Documentation
For detailed documentation and guides, see the directory:
- - Complete guide for MCPO integration
- - How to run the server on different ports
- - MCPO-specific documentation
- - Summary of bug fixes and improvements
- - Documentation of code cleanup and optimization
- - Comprehensive Docker deployment guide
- - License details and compliance guide
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
License
This project is licensed under the MIT License - see the file for details.
Troubleshooting
Docker Issues
Port already in use:
# Use a different port
docker run -p 8080:8090 arxiv-research-mcp
Permission denied:
# Run with proper permissions
sudo docker run -p 8090:8090 arxiv-research-mcp
Build fails:
# Clean build
docker system prune -a
docker build --no-cache -t arxiv-research-mcp .
Container exits immediately:
# Check logs
docker logs <container_id>
# Run interactively
docker run -it arxiv-research-mcp /bin/bash
Support
- Issues: GitHub Issues
- Documentation: GitHub Wiki
- Discussions: GitHub Discussions
Acknowledgments
- arXiv for providing the academic paper database
- MCP (Model Context Protocol) for the server framework
- The open-source community for the various libraries used