shivvor2/arxiv-txt-mcp
This MCP server converts arXiv papers into LLM-friendly text, which helps in environments where local OCR is not feasible.
arXiv-txt MCP Server
An MCP server for arxiv-txt.org, providing LLM-friendly plain text summaries and full content of arXiv papers rendered from source TeX.
Features
- Paper Summary: Retrieve the plain text abstract of any arXiv paper.
- Full Paper Content: Fetch the entire content of a paper in plain text format.
- Self-Host Support: Can be configured to point to a self-hosted arxiv-txt instance via an environment variable.
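To illustrate the self-host option, here is a minimal, standard-library-only sketch of how a base URL could be resolved from `ARXIV_TXT_URL` (the variable named in the installation steps). It is not the project's actual code: the default URL, the helper names `read_dotenv` and `resolve_base_url`, and the precedence (process environment first, then `.env`, then the default) are all assumptions made for the example.

```python
import os
from pathlib import Path

# Assumed default; the public service the README refers to.
DEFAULT_BASE_URL = "https://www.arxiv-txt.org"


def read_dotenv(path) -> dict:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper)."""
    values = {}
    env_file = Path(path)
    if not env_file.is_file():
        return values
    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_base_url(dotenv_path=".env", environ=None) -> str:
    """Return the arxiv-txt base URL without a trailing slash."""
    environ = os.environ if environ is None else environ
    url = (
        environ.get("ARXIV_TXT_URL")
        or read_dotenv(dotenv_path).get("ARXIV_TXT_URL")
        or DEFAULT_BASE_URL
    )
    return url.rstrip("/")
```

With a `.env` file containing `ARXIV_TXT_URL=http://localhost:8000`, `resolve_base_url()` would return the local address; without one, it falls back to the assumed default.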
Tools
- get_summary(arxiv_id: str) -> str: Fetches the summary of a paper given its arXiv ID (e.g., "1706.03762").
- get_full_paper(arxiv_id: str) -> str: Fetches the full paper content for a given arXiv ID.
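As a rough illustration of what these two tools do, the sketch below builds a request URL for each tool and returns the response text. It is not the project's actual code: the `/raw/abs/` and `/raw/pdf/` paths and the default base URL are assumptions about the arxiv-txt.org API, the `fetch` parameter is added only to make the functions easy to test, and `urllib` stands in for `requests` so the example has no third-party dependencies. In the real server, functions like these would be registered as FastMCP tools.

```python
import os
from typing import Callable
from urllib.request import urlopen

# Assumed default base URL; ARXIV_TXT_URL overrides it for self-hosted setups.
DEFAULT_BASE_URL = "https://www.arxiv-txt.org"


def _http_get(url: str) -> str:
    """Fetch a URL and return the body as text (performs a network call)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def _base_url() -> str:
    return os.environ.get("ARXIV_TXT_URL", DEFAULT_BASE_URL).rstrip("/")


def get_summary(arxiv_id: str, fetch: Callable[[str], str] = _http_get) -> str:
    """Return the plain-text summary; assumes a /raw/abs/<id> endpoint."""
    return fetch(f"{_base_url()}/raw/abs/{arxiv_id}")


def get_full_paper(arxiv_id: str, fetch: Callable[[str], str] = _http_get) -> str:
    """Return the full plain-text paper; assumes a /raw/pdf/<id> endpoint."""
    return fetch(f"{_base_url()}/raw/pdf/{arxiv_id}")
```

Calling `get_summary("1706.03762")` performs a real HTTP request, so pass a stub as `fetch` when testing offline.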
Installation and Usage
- Clone the repository:
    git clone <repository_url>
    cd <repository_directory>
- Install dependencies:
    pip install -r requirements.txt
- (Optional) Configure for a self-hosted instance: Create a .env file in the root directory to specify a custom arxiv-txt URL:
    ARXIV_TXT_URL=http://localhost:8000
- Run the server locally:
    python arxiv_txt_server.py
- Connect from an MCP Client: For clients that support it, add a server entry to your configuration file:
    "mcpServers": {
      "arxiv-txt": {
        "command": "python",
        "args": ["/path/to/your/main.py"]
      }
    }
Examples
After installation, you can ask your LLM assistant:
- "What is the summary of arXiv paper 1706.03762?"
- "Fetch the full text for the paper 'Attention is All You Need' (1706.03762)."
The client will automatically call the appropriate tools and use the content in its response.
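The identifiers in these prompts follow arXiv's current ID scheme (`YYMM.NNNNN`, optionally followed by a version such as `v5`). As a small sketch of how an ID could be pulled out of free text or validated before a request is made, the hypothetical helper below matches only that new-style format. Old-style IDs such as `hep-th/9901001` are deliberately not handled, and nothing here is taken from the project's code.

```python
import re
from typing import Optional

# New-style arXiv IDs: four digits, a dot, four or five digits, optional version.
_ARXIV_ID = re.compile(r"\b\d{4}\.\d{4,5}(?:v\d+)?\b")


def extract_arxiv_id(text: str) -> Optional[str]:
    """Return the first new-style arXiv ID found in text, or None."""
    match = _ARXIV_ID.search(text)
    return match.group(0) if match else None


print(extract_arxiv_id("What is the summary of arXiv paper 1706.03762?"))
```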
Why This Server?
This server suits my use case (LibreChat on a cloud VPS) better: I prebake my MCP servers into LibreChat's API image.
Most arXiv MCP servers rely on local PDF-parsing or OCR libraries such as PyMuPDF, which causes two problems:
- Build Failures on Alpine Linux: PyMuPDF lacks pre-built wheels for Alpine's musl libc, forcing slow source compilation that often fails during docker build.
- Poor Accuracy & Heavy Resources: Local OCR is resource-intensive and produces lower-quality text extraction.
Both issues are solved by delegating parsing to arxiv-txt.org:
- Lightweight: No OCR dependencies, just requests and fastmcp.
- Zero Build Issues: Works seamlessly on any OS, including Alpine Linux.
- Better Quality: Uses a service optimized for clean, LLM-friendly arXiv text extraction.
Ideal for low-resource environments (e.g. VPS deployments) or where glibc is not available.
License
This project is licensed under the MIT License.
Acknowledgments
- Built with FastMCP.
- Text content provided by the arXiv-txt.org service.