Ladvien/research_hub_mcp
A Model Context Protocol (MCP) server that provides AI assistants with academic paper search and retrieval capabilities through multiple research sources.
rust-research-mcp
A Model Context Protocol (MCP) server for academic research and knowledge accumulation through intelligent paper search, retrieval, and metadata extraction.
⚠️ Legal Disclaimer
IMPORTANT: This tool is intended for personal academic use only.
This software is provided for educational and research purposes. Users are responsible for ensuring their use complies with:
- All applicable laws and regulations
- Publisher terms of service
- Institutional policies
- Copyright restrictions
The developers of this tool do not condone or support any illegal activities. Users should:
- Only access papers they have legal rights to access
- Respect intellectual property rights
- Use retrieved materials in accordance with fair use principles
- Consider supporting authors and publishers through legitimate channels
By using this software, you acknowledge that you understand and will comply with all applicable laws and regulations regarding access to academic content.
Features
- Multi-Provider Search: Comprehensive search across 14 academic sources:
  - CrossRef - Authoritative metadata for 130M+ papers
  - Semantic Scholar - AI-powered search with PDF access
  - arXiv - Physics, CS, and math preprints
  - PubMed Central - Biomedical and life science papers
  - OpenReview - ML conference papers (NeurIPS, ICLR, etc.)
  - OpenAlex - Open bibliographic database
  - CORE - 350M+ open access papers
  - Unpaywall - Legal free PDF discovery
  - SSRN - Social science working papers
  - bioRxiv - Biology preprints
  - MDPI - Open access journals
  - ResearchGate - Academic social network (ethical access)
  - Sci-Hub - Full-text fallback (lowest priority)
- Intelligent Routing: Smart provider prioritization based on:
  - Academic domain detection (CS/ML, biomedical, physics, social sciences)
  - Search type optimization (DOI, author, title, keywords)
  - Content availability (PDF access, recent papers, open access)
  - Temporal relevance (recent vs. historical content)
- Robust Downloads: Multi-provider fallback with zero-byte protection and integrity verification
- Code Pattern Search: Regex-powered search for algorithm implementations in research papers
- Metadata Extraction: Extract bibliographic information from PDFs with batch processing
- Bibliography Generation: Multi-format citations (BibTeX, APA, MLA, Chicago, IEEE, Harvard)
- Smart Categorization: Automatic paper categorization and organization
- MCP Integration: Native support for Claude Desktop and Claude Code workflows
- High Performance: Built with Rust for speed and reliability
- Resilient Architecture: Circuit breakers, rate limiting, automatic retries, and graceful error handling
- Security First: HTTPS-only connections, certificate validation, and secure HTTP client factory
- Daemon Mode: Background service with health monitoring and signal handling
Installation
Quick Start (Recommended)
Build from source:
# Prerequisites: Rust 1.70+ (install from https://rustup.rs/)
git clone https://github.com/Ladvien/sci_hub_mcp.git
cd sci_hub_mcp
cargo build --release
# Binary will be at ./target/release/rust-research-mcp
# Move to a permanent location
sudo cp target/release/rust-research-mcp /usr/local/bin/
Alternative Installation Methods
Using Cargo:
cargo install rust-research-mcp
Development Build:
git clone https://github.com/Ladvien/sci_hub_mcp.git
cd sci_hub_mcp
cargo build --release
Configuration for Claude Desktop
Add the following to your Claude Desktop configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "rust-research-mcp": {
      "command": "/usr/local/bin/rust-research-mcp",
      "args": [
        "--download-dir", "~/downloads/research_papers",
        "--log-level", "info"
      ],
      "env": {
        "RUST_LOG": "info"
      }
    }
  }
}
Daemon Mode
For production deployments, you can run the server as a daemon:
# Start daemon with custom configuration
rust-research-mcp --daemon --pid-file /var/run/rust-research-mcp.pid --health-port 8090
# Check daemon status
curl http://localhost:8090/health
# Stop daemon (sends SIGTERM for graceful shutdown)
kill -TERM $(cat /var/run/rust-research-mcp.pid)
Usage
Once configured, you can ask Claude to:
- Search for papers: "Search for recent papers on quantum computing"
- Download papers: "Download the first paper from the search results"
- Extract metadata: "Extract metadata from the PDF file"
Command Line Options
rust-research-mcp [OPTIONS]
Options:
  -v, --verbose              Enable verbose logging
  -c, --config <PATH>        Configuration file path
  -d, --daemon               Run as daemon
      --pid-file <PATH>      PID file path (for daemon mode)
      --health-port <PORT>   Health check port [default: 8090]
      --port <PORT>          Override server port
      --host <HOST>          Override server host
      --log-level <LEVEL>    Override log level (trace, debug, info, warn, error)
      --profile <PROFILE>    Set environment profile (development, production)
      --download-dir <PATH>  Override download directory path
      --generate-schema      Generate JSON schema for configuration
  -h, --help                 Print help information
  -V, --version              Print version information
Environment Variables
- RUST_RESEARCH_MCP_*: Configuration variables (see config.toml for full list)
- RUST_LOG: Standard Rust logging configuration (trace, debug, info, warn, error)
Available Tools
Core Research Tools
search_papers
Search for academic papers across 14 different academic sources with intelligent provider routing.
Parameters:
- query (required): Search query (DOI, title, author, or keywords)
- search_type (optional): Search type (auto, doi, title, author, author_year)
- limit (optional): Maximum results to return (default: 10)
- offset (optional): Pagination offset (default: 0)
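The README does not say how the auto search type decides what kind of query it received. As a rough illustration only, the sketch below classifies a query string with two simple heuristics: a DOI-shaped string ("10.", a 4 to 9 digit registrant, a slash, a suffix) and a trailing four-digit year. The names SearchType, looks_like_doi, and detect_search_type are hypothetical and are not taken from the project's source; the real server may use entirely different rules.

```rust
/// Hypothetical subset of the documented `search_type` values.
#[derive(Debug, PartialEq, Eq)]
pub enum SearchType {
    Doi,
    AuthorYear,
    Title,
}

/// Returns true when `s` has the shape "10.<4-9 digits>/<non-empty suffix>".
fn looks_like_doi(s: &str) -> bool {
    let Some(rest) = s.strip_prefix("10.") else {
        return false;
    };
    let Some((registrant, suffix)) = rest.split_once('/') else {
        return false;
    };
    (4..=9).contains(&registrant.len())
        && registrant.chars().all(|c| c.is_ascii_digit())
        && !suffix.is_empty()
        && !suffix.contains(char::is_whitespace)
}

/// Guesses a search type for `query` (assumed heuristics, not the project's).
pub fn detect_search_type(query: &str) -> SearchType {
    let q = query.trim();
    // Accept the common "doi:" and resolver-URL prefixes before testing.
    let bare = q
        .strip_prefix("https://doi.org/")
        .or_else(|| q.strip_prefix("doi:"))
        .unwrap_or(q);
    if looks_like_doi(bare) {
        return SearchType::Doi;
    }
    // "Surname 2017": at least two tokens, the last one a four-digit number.
    let tokens: Vec<&str> = q.split_whitespace().collect();
    if let Some((last, rest)) = tokens.split_last() {
        let is_year = last.len() == 4 && last.chars().all(|c| c.is_ascii_digit());
        if is_year && !rest.is_empty() {
            return SearchType::AuthorYear;
        }
    }
    // Everything else is treated as a title or keyword query.
    SearchType::Title
}
```

The year heuristic is deliberately crude: a title that happens to end in a year would be misclassified, which is one reason an explicit search_type is useful when the intent is known.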
download_paper
Download a paper PDF with multi-provider fallback and integrity verification.
Parameters:
- doi (optional): DOI of the paper to download
- url (optional): Direct download URL (alternative to DOI)
- filename (optional): Custom filename for the downloaded PDF
- directory (optional): Target directory (uses default download directory if not specified)
- category (optional): Organization category (creates subdirectory)
- overwrite (optional): Whether to overwrite existing files (default: false)
- verify_integrity (optional): Verify file integrity after download (default: true)
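The README mentions zero-byte protection and integrity verification without describing the checks. One plausible minimal version, shown here purely as a sketch, rejects empty responses, requires the PDF header, and looks for the end-of-file marker near the end of the data. IntegrityError and verify_pdf_bytes are hypothetical names, and the project may verify downloads differently (for example with checksums).

```rust
/// Hypothetical reasons a downloaded file could be rejected.
#[derive(Debug, PartialEq, Eq)]
pub enum IntegrityError {
    /// The provider returned no data at all (zero-byte protection).
    Empty,
    /// The data does not start with the "%PDF-" header, e.g. an HTML error page.
    NotPdf,
    /// No "%%EOF" marker near the end, which suggests a cut-off download.
    Truncated,
}

/// Runs three cheap structural checks on downloaded bytes (assumed checks).
pub fn verify_pdf_bytes(bytes: &[u8]) -> Result<(), IntegrityError> {
    if bytes.is_empty() {
        return Err(IntegrityError::Empty);
    }
    if !bytes.starts_with(b"%PDF-") {
        return Err(IntegrityError::NotPdf);
    }
    // Only scan the last kilobyte for the end-of-file marker.
    let tail_start = bytes.len().saturating_sub(1024);
    let tail = &bytes[tail_start..];
    if !tail.windows(5).any(|w| w == b"%%EOF") {
        return Err(IntegrityError::Truncated);
    }
    Ok(())
}
```

In a fallback chain, any Err from a check like this would be a natural signal to discard the file and try the next provider.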
extract_metadata
Extract bibliographic metadata from PDF files using multiple extraction methods.
Parameters:
- file_path (required): Path to the PDF file
- extract_full_text (optional): Also extract full text content (default: false)
- extract_references (optional): Extract reference list (default: false)
Advanced Tools
search_code
Search for code patterns within downloaded research papers using regex patterns.
Parameters:
- pattern (required): Regex pattern to search for
- search_dir (optional): Directory to search in (defaults to download directory)
- file_extensions (optional): File extensions to search (default: [".pdf", ".txt"])
- max_results (optional): Maximum results to return (default: 50)
- context_lines (optional): Lines of context around matches (default: 2)
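To make the context_lines and max_results parameters concrete, here is a small sketch of how matches with surrounding context could be collected from already-extracted text. It uses plain substring matching instead of a regex so that it needs only the standard library; the tool itself is documented as regex-based. Match and search_with_context are hypothetical names, not the project's API.

```rust
/// Hypothetical search hit: where it was found and the lines around it.
#[derive(Debug, PartialEq, Eq)]
pub struct Match {
    /// 1-based line number of the matching line.
    pub line_number: usize,
    /// The matching line plus up to `context_lines` lines on each side.
    pub context: Vec<String>,
}

/// Finds lines containing `needle` and returns each with its context window.
/// Substring matching stands in for the regex engine here.
pub fn search_with_context(
    text: &str,
    needle: &str,
    context_lines: usize,
    max_results: usize,
) -> Vec<Match> {
    let lines: Vec<&str> = text.lines().collect();
    lines
        .iter()
        .enumerate()
        .filter(|(_, line)| line.contains(needle))
        .take(max_results)
        .map(|(i, _)| {
            // Clamp the window to the bounds of the text.
            let start = i.saturating_sub(context_lines);
            let end = (i + context_lines + 1).min(lines.len());
            Match {
                line_number: i + 1,
                context: lines[start..end].iter().map(|l| l.to_string()).collect(),
            }
        })
        .collect()
}
```

Because take is applied before the context windows are built, max_results caps the work done as well as the size of the output.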
generate_bibliography
Generate formatted citations from paper metadata in multiple citation styles.
Parameters:
- papers (required): Array of paper metadata or DOIs
- format (optional): Citation format (bibtex, apa, mla, chicago, ieee) (default: bibtex)
- sort_by (optional): Sort order (author, year, title) (default: author)
- include_abstracts (optional): Include abstracts in output (default: false)
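For readers unfamiliar with the default bibtex format, the following sketch shows one way a metadata record could be rendered as a BibTeX @article entry. The Paper struct, its fields, and the citation-key scheme (lowercased last name part of the first author plus the year) are assumptions made for illustration; the project's actual metadata model and key format are not documented in this README.

```rust
/// Hypothetical minimal metadata record.
pub struct Paper {
    pub authors: Vec<String>,
    pub title: String,
    pub journal: String,
    pub year: u16,
    pub doi: String,
}

/// Builds a key such as "lovelace1843" from the first author's last
/// whitespace-separated name part and the publication year.
fn citation_key(paper: &Paper) -> String {
    let surname = paper
        .authors
        .first()
        .and_then(|a| a.split_whitespace().last())
        .unwrap_or("unknown");
    let cleaned = surname
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .collect::<String>()
        .to_ascii_lowercase();
    format!("{}{}", cleaned, paper.year)
}

/// Renders `paper` as a BibTeX @article entry; authors are joined with " and ".
pub fn to_bibtex(paper: &Paper) -> String {
    format!(
        "@article{{{key},\n  author = {{{authors}}},\n  title = {{{title}}},\n  journal = {{{journal}}},\n  year = {{{year}}},\n  doi = {{{doi}}}\n}}",
        key = citation_key(paper),
        authors = paper.authors.join(" and "),
        title = paper.title,
        journal = paper.journal,
        year = paper.year,
        doi = paper.doi,
    )
}
```

The sketch does not escape special characters such as braces or ampersands in field values, which a production formatter would need to handle.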
categorize_papers
Automatically categorize research papers based on content and metadata.
Parameters:
- papers (required): Array of paper metadata or file paths
- category_scheme (optional): Categorization scheme (subject, methodology, custom)
- custom_categories (optional): Custom category definitions
- confidence_threshold (optional): Minimum confidence for categorization (default: 0.7)
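How confidence is computed is not described here. To illustrate what a confidence_threshold does, the sketch below scores each category by the fraction of its keywords found in the text and returns the best category only if its score reaches the threshold. The keyword-fraction scoring, the Category struct, and the categorize function are all assumptions for this example and may bear no resemblance to the project's method.

```rust
/// Hypothetical category definition: a name plus keywords that signal it.
pub struct Category<'a> {
    pub name: &'a str,
    pub keywords: &'a [&'a str],
}

/// Returns the best-scoring category and its confidence, or None when no
/// category reaches `confidence_threshold`. Confidence here is simply the
/// fraction of a category's keywords that occur in the text.
pub fn categorize<'a>(
    text: &str,
    categories: &[Category<'a>],
    confidence_threshold: f64,
) -> Option<(&'a str, f64)> {
    let haystack = text.to_lowercase();
    categories
        .iter()
        // Skip empty keyword lists to avoid dividing by zero.
        .filter(|c| !c.keywords.is_empty())
        .map(|c| {
            let hits = c
                .keywords
                .iter()
                .filter(|k| haystack.contains(&k.to_lowercase()))
                .count();
            (c.name, hits as f64 / c.keywords.len() as f64)
        })
        .filter(|(_, score)| *score >= confidence_threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

With the default threshold of 0.7, a paper that matches only one of three keywords for a category would be left uncategorized under this scheme.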
Example Workflows
Research Collection Workflow
# Step 1: Search for papers on a topic
"Search for recent papers on transformer architectures, limit 20"
# Step 2: Download selected papers
"Download the paper with DOI 10.1038/nature12373 to ~/research/transformers/"
# Step 3: Extract metadata for organization
"Extract metadata from ~/research/transformers/paper.pdf"
# Step 4: Search for code implementations
"Search for 'class Transformer' pattern in ~/research/transformers/"
# Step 5: Generate bibliography
"Create BibTeX bibliography from collected papers"
Literature Review Workflow
# Search across multiple aspects of a topic
"Search for papers by author 'Yoshua Bengio' on deep learning"
"Search for papers on attention mechanisms in neural networks"
# Organize papers by category
"Categorize papers in ~/research/attention/ by methodology"
# Generate comprehensive bibliography
"Generate IEEE format bibliography from all categorized papers"
Claude Code Integration
This MCP server is specifically enhanced for Claude Code workflows with advanced research capabilities:
Key Benefits for Developers
- Algorithm Discovery: Find reference implementations in academic papers
- Code Pattern Search: Regex-powered search across research publications
- Citation Management: Generate properly formatted references for projects
- Research Organization: Automatic categorization and metadata extraction
Integration Tips
- Configure download directory: Set up dedicated research workspace
- Use search patterns: Leverage regex for finding specific implementations
- Organize by categories: Use automatic categorization for better organization
- Generate documentation: Create proper citations and bibliographies
Configuration File
Create a configuration file at ~/.config/knowledge_accumulator_mcp/config.toml:
# Server configuration
[server]
port = 8080
host = "127.0.0.1"
graceful_shutdown_timeout_secs = 30
# Research source configuration
[research_source]
provider_timeout_secs = 30
max_results_per_provider = 50
# Download settings
[downloads]
directory = "~/downloads/research_papers"
max_concurrent_downloads = 5
max_file_size_mb = 100
verify_integrity = true
# Logging configuration
[logging]
level = "info"
format = "pretty"
output = "stderr"
# Resilience settings
[circuit_breaker]
failure_threshold = 5
timeout_duration_secs = 60
half_open_max_calls = 3
[rate_limiting]
requests_per_second = 2
burst_size = 10
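The requests_per_second and burst_size settings read like the two parameters of a token bucket, although the README does not name the algorithm. Assuming that interpretation, the sketch below shows how the pair interacts: the bucket starts full at burst_size tokens and refills at requests_per_second. TokenBucket and try_acquire are hypothetical names, and time is passed in explicitly (seconds since creation) to keep the example deterministic.

```rust
/// Hypothetical token bucket; not the project's rate limiter.
pub struct TokenBucket {
    rate_per_sec: f64,
    capacity: f64,
    tokens: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    /// Starts with a full bucket so an initial burst is allowed.
    pub fn new(requests_per_second: f64, burst_size: u32) -> Self {
        Self {
            rate_per_sec: requests_per_second,
            capacity: f64::from(burst_size),
            tokens: f64::from(burst_size),
            last_refill_secs: 0.0,
        }
    }

    /// Tries to take one token at `now_secs`; returns false when the caller
    /// should wait or back off.
    pub fn try_acquire(&mut self, now_secs: f64) -> bool {
        // Refill in proportion to elapsed time, never above capacity.
        let elapsed = (now_secs - self.last_refill_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.rate_per_sec).min(self.capacity);
        self.last_refill_secs = self.last_refill_secs.max(now_secs);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Under this reading, the values above would allow ten back-to-back requests to a provider, after which throughput settles at two requests per second.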
Development
Running Tests
# Run all tests (parallel execution)
cargo nextest run
# Run specific test
cargo nextest run TEST_NAME
# Run with coverage report
cargo tarpaulin --out Html
# Run integration tests
cargo test --test comprehensive_e2e_scenarios
Code Quality
# Format code
cargo fmt
# Run linter (must pass before commit)
cargo clippy -- -D warnings
# Security audit
cargo audit
# Build release version
cargo build --release
Architecture
The project follows a clean, modular architecture with dependency injection:
src/
├── main.rs                     # CLI entry point and configuration
├── lib.rs                      # Public API and exports
├── server/                     # MCP server implementation
│   ├── handler.rs              # MCP request handler
│   └── transport.rs            # Transport layer validation
├── tools/                      # MCP tool implementations
│   ├── search.rs               # Multi-provider search
│   ├── download.rs             # Paper download with fallback
│   ├── metadata.rs             # PDF metadata extraction
│   ├── code_search.rs          # Code pattern search
│   ├── bibliography.rs         # Citation generation
│   └── categorize.rs           # Paper categorization
├── client/                     # Research source integration
│   ├── meta_search.rs          # Meta-search orchestration
│   ├── mirror.rs               # Mirror management
│   ├── rate_limiter.rs         # Rate limiting
│   └── providers/              # Academic source implementations
│       ├── arxiv.rs
│       ├── crossref.rs
│       ├── semantic_scholar.rs
│       ├── pubmed_central.rs
│       ├── openreview.rs
│       ├── openalex.rs
│       └── ... (14 providers total)
├── resilience/                 # Circuit breakers and retry logic
├── services/                   # Business logic services
├── config/                     # Configuration management
└── error.rs                    # Centralized error handling
Changelog
Version 0.6.6 (Current)
- Complete Architecture Redesign: Clean hexagonal architecture with dependency injection
- 14 Academic Providers: Comprehensive coverage including OpenAlex, PubMed Central, OpenReview
- Intelligent Provider Routing: Context-aware selection based on domain, search type, and content availability
- Enhanced MCP Integration: Full rmcp framework integration with proper tool definitions
- Security Hardened: HTTPS-only clients, certificate validation, secure HTTP factory
- Resilience Features: Circuit breakers, rate limiting, automatic retries, graceful degradation
- Smart Categorization: Automatic paper categorization and organization
- Advanced Metadata Extraction: Multiple extraction methods with batch processing
- Code Pattern Search: Regex-powered search across research publications
- Multi-format Citations: BibTeX, APA, MLA, Chicago, IEEE format support
- Daemon Mode: Production-ready background service with health monitoring
- Comprehensive Testing: E2E scenarios, integration tests, security auditing
Contributing
Contributions are welcome! Please read our contributing guidelines for details.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Troubleshooting
Common Issues
Issue: Papers not downloading
- Solution: The tool uses 14 different providers with intelligent fallback. Check network connectivity and try alternative search terms.
Issue: MCP server not connecting
- Solution: Verify the binary path in claude_desktop_config.json is absolute and the binary has execute permissions (chmod +x).
Issue: High memory usage
- Solution: Configure appropriate concurrency limits in config.toml. Lower max_concurrent_downloads for systems with limited resources.
Issue: Provider timeout errors
- Solution: Increase provider_timeout_secs in configuration or check internet connectivity to academic databases.
Issue: Circuit breaker errors
- Solution: The system uses circuit breakers for resilience. Wait for the timeout period or check provider availability.
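To make the "wait for the timeout period" advice easier to reason about, here is a sketch of a generic circuit breaker driven by the three settings in the [circuit_breaker] configuration section (failure_threshold, timeout_duration_secs, half_open_max_calls). It follows the textbook closed / open / half-open pattern; the type and method names are hypothetical, and the project's implementation may differ in its details. Time is passed in as whole seconds to keep the example deterministic.

```rust
/// The three classic circuit breaker states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Closed,
    Open { since_secs: u64 },
    HalfOpen { trial_calls: u32 },
}

/// Hypothetical breaker; field names mirror the config keys.
pub struct CircuitBreaker {
    failure_threshold: u32,
    timeout_duration_secs: u64,
    half_open_max_calls: u32,
    consecutive_failures: u32,
    state: State,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, timeout_duration_secs: u64, half_open_max_calls: u32) -> Self {
        Self {
            failure_threshold,
            timeout_duration_secs,
            half_open_max_calls,
            consecutive_failures: 0,
            state: State::Closed,
        }
    }

    /// Returns true if a call to the provider may be attempted at `now_secs`.
    pub fn allow(&mut self, now_secs: u64) -> bool {
        match self.state {
            State::Closed => true,
            State::Open { since_secs } => {
                if now_secs.saturating_sub(since_secs) >= self.timeout_duration_secs {
                    // Timeout elapsed: switch to half-open and re-evaluate.
                    self.state = State::HalfOpen { trial_calls: 0 };
                    self.allow(now_secs)
                } else {
                    false
                }
            }
            State::HalfOpen { trial_calls } => {
                // Only a limited number of trial calls may pass.
                if trial_calls < self.half_open_max_calls {
                    self.state = State::HalfOpen { trial_calls: trial_calls + 1 };
                    true
                } else {
                    false
                }
            }
        }
    }

    /// A success closes the circuit and clears the failure count.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.state = State::Closed;
    }

    /// A failure opens the circuit once the threshold is reached, or
    /// immediately if it happens during a half-open trial.
    pub fn record_failure(&mut self, now_secs: u64) {
        self.consecutive_failures += 1;
        let tripped = self.consecutive_failures >= self.failure_threshold;
        if tripped || matches!(self.state, State::HalfOpen { .. }) {
            self.state = State::Open { since_secs: now_secs };
        }
    }

    pub fn state(&self) -> State {
        self.state
    }
}
```

With the example configuration values (5, 60, 3), five consecutive failures would block a provider for 60 seconds, after which up to three trial calls decide whether it is re-enabled.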
Logs
Daemon Mode Logs:
- View with: journalctl -u rust-research-mcp (systemd)
- Or check: /var/log/rust-research-mcp.log
Claude Desktop Logs:
- macOS: ~/Library/Logs/Claude/mcp-server-rust-research-mcp.log
- Linux: ~/.local/share/Claude/logs/
- Windows: %APPDATA%\Claude\logs\
Debug Mode:
# Enable debug logging
RUST_LOG=debug rust-research-mcp --verbose
License
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
Acknowledgments
- Built with rmcp - Rust SDK for Model Context Protocol
- Uses the Model Context Protocol specification
- Searches academic databases including arXiv and CrossRef
Disclaimer
This tool is provided "as is" without warranty of any kind. The authors and contributors are not responsible for any misuse or legal issues arising from the use of this software. Users must ensure they comply with all applicable laws, regulations, and terms of service when accessing academic content.
For personal academic use only.
Support
For issues, questions, or suggestions, please open an issue on GitHub.
Made with ❤️ for the academic community. Please use responsibly.