arxiv-txt-mcp

shivvor2/arxiv-txt-mcp

This MCP server converts arXiv papers into LLM-friendly plain text, which suits environments where running local OCR is not feasible.

arXiv-txt MCP Server

An MCP server for arxiv-txt.org, providing LLM-friendly plain text summaries and full content of arXiv papers rendered from source TeX.

Features

  • Paper Summary: Retrieve the plain text abstract of any arXiv paper.
  • Full Paper Content: Fetch the entire content of a paper in plain text format.
  • Self-Host Support: Can be configured to point to a self-hosted arxiv-txt instance via an environment variable.

Tools

  • get_summary(arxiv_id: str) -> str
    Fetches the summary of a paper given its arXiv ID (e.g., "1706.03762").

  • get_full_paper(arxiv_id: str) -> str
    Fetches the full paper content for a given arXiv ID.
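
As a rough illustration of what the two tools do before any network request is made, the sketch below validates an arXiv ID and builds the URL a tool would fetch. It is not the server's actual code: the default base URL, the `/raw/abs/` and `/raw/pdf/` paths, and the helper names `normalize_arxiv_id` and `build_url` are all assumptions made for illustration.

```python
import re

# Assumed default; the real server may use a different base URL.
DEFAULT_BASE_URL = "https://www.arxiv-txt.org"

# New-style IDs (e.g. 1706.03762, optionally with a version suffix such as v5)
# and old-style IDs (e.g. hep-th/9901001).
_NEW_STYLE = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")
_OLD_STYLE = re.compile(r"[a-z-]+(\.[A-Z]{2})?/\d{7}(v\d+)?")


def normalize_arxiv_id(arxiv_id: str) -> str:
    """Strip whitespace and an optional 'arXiv:' prefix, then validate."""
    cleaned = arxiv_id.strip()
    if cleaned.lower().startswith("arxiv:"):
        cleaned = cleaned[len("arxiv:"):]
    if not (_NEW_STYLE.fullmatch(cleaned) or _OLD_STYLE.fullmatch(cleaned)):
        raise ValueError(f"not a recognizable arXiv ID: {arxiv_id!r}")
    return cleaned


def build_url(kind: str, arxiv_id: str, base_url: str = DEFAULT_BASE_URL) -> str:
    """Build the request URL for a summary ('abs') or full paper ('pdf')."""
    if kind not in ("abs", "pdf"):
        raise ValueError("kind must be 'abs' or 'pdf'")
    # The /raw/<kind>/<id> layout is an assumption about the service.
    return f"{base_url.rstrip('/')}/raw/{kind}/{normalize_arxiv_id(arxiv_id)}"
```

Rejecting malformed IDs up front keeps obviously invalid requests away from the remote service and gives the LLM client a clear error message to react to.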

Installation and Usage

  1. Clone the repository:

    git clone <repository_url>
    cd <repository_directory>
    
  2. Install dependencies:

    pip install .
    
  3. (Optional) Configure for a self-hosted instance: Create a .env file in the root directory to specify a custom arxiv-txt URL:

    ARXIV_TXT_URL=http://localhost:8000
    
  4. Run the server locally:

    python arxiv_txt_server.py
    
  5. Connect from an MCP Client: For clients that support it, add a server entry to your configuration file:

    "mcpServers": {
      "arxiv-txt": {
        "command": "python",
        "args": [
          "/path/to/your/arxiv_txt_server.py"
        ]
      }
    }
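
Steps 3 and 5 can also be combined by passing the URL through the client configuration rather than a .env file. The sketch below shows one way a server could resolve its base URL, and how a server entry that forwards `ARXIV_TXT_URL` could be generated. The default URL, the helper names `resolve_base_url` and `server_entry`, and the `env` key are assumptions: some MCP clients accept `env` in a server entry, but check your client's documentation before relying on it.

```python
import json
import os

# Assumed default; the actual server may fall back to a different URL.
DEFAULT_BASE_URL = "https://www.arxiv-txt.org"


def resolve_base_url(environ=os.environ) -> str:
    """Return ARXIV_TXT_URL if set and non-empty, otherwise the default."""
    value = environ.get("ARXIV_TXT_URL", "").strip()
    return (value or DEFAULT_BASE_URL).rstrip("/")


def server_entry(script_path: str, base_url=None) -> dict:
    """Build an "mcpServers" entry; `env` is only added for a custom URL."""
    entry = {"command": "python", "args": [script_path]}
    if base_url:
        # Hypothetical: forwards the variable to the spawned server process.
        entry["env"] = {"ARXIV_TXT_URL": base_url}
    return {"mcpServers": {"arxiv-txt": entry}}
```

Calling `json.dumps(server_entry(path_to_server_script, "http://localhost:8000"), indent=2)` prints a fragment that can be merged into the client's configuration file.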
    

Examples

After installation, you can ask your LLM assistant:

  • "What is the summary of arXiv paper 1706.03762?"
  • "Fetch the full text for the paper 'Attention is All You Need' (1706.03762)."

The client will automatically call the appropriate tools and use the content in its response.
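
Under the hood, "calling a tool" means the client sends a JSON-RPC `tools/call` request to the server. The sketch below builds such a message for the first prompt above; the helper name `tool_call_request` is hypothetical, and real clients add session setup and transport framing that are omitted here.

```python
import json


def tool_call_request(request_id: int, tool: str, arxiv_id: str) -> str:
    """Serialize an MCP `tools/call` request for one of the two tools."""
    if tool not in ("get_summary", "get_full_paper"):
        raise ValueError(f"unknown tool: {tool}")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # The argument name matches the tool signature: arxiv_id.
        "params": {"name": tool, "arguments": {"arxiv_id": arxiv_id}},
    }
    return json.dumps(message)
```

For example, `tool_call_request(1, "get_summary", "1706.03762")` produces the request a client would send for the summary question.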

Why This Server?

This server suits my use case (LibreChat on a cloud VPS) better: I prebake my MCP servers into LibreChat's API image.

Most arXiv MCP servers rely on local OCR or PDF-parsing libraries such as PyMuPDF, which causes two problems:

  1. Build Failures on Alpine Linux: PyMuPDF lacks pre-built wheels for Alpine's musl libc, forcing slow source compilation that often fails during docker build.
  2. Poor Accuracy & Heavy Resources: Local OCR is resource-intensive and produces lower-quality text extraction.

Both issues are solved by delegating parsing to arxiv-txt.org:

  • Lightweight: No OCR dependencies, just requests and fastmcp.
  • Zero Build Issues: Works seamlessly on any OS, including Alpine Linux.
  • Better Quality: Uses a service optimized for clean, LLM-friendly arXiv text extraction.

Ideal for low-resource environments (e.g. VPS deployments) or when glibc is not available.

License

This project is licensed under the MIT License.

Acknowledgments

  • Built with FastMCP.
  • Text content provided by the arXiv-txt.org service.