# MCP Research Tools Server
## Project Overview
This project implements a Model Context Protocol (MCP) server designed to provide various research-oriented tools, including news summarization and scientific paper retrieval. It acts as a standardized interface, enabling AI agents or other client applications to fetch and process information from diverse sources without needing to interact directly with various external APIs or manage complex processing logic.
The server is built using Python with the FastAPI framework. It currently offers:
- A news tool that retrieves news via the Tavily Search API and summarizes content using a configurable LLM (e.g., a local model such as Mistral 7B via Ollama, or OpenAI).
- An arXiv tool that retrieves scientific paper metadata from the arXiv API.
Future enhancements will include summarizing arXiv abstracts/papers and adding more research tools.
## Key Features (as of v0.2.0)
- Standardized MCP Interface: Adheres to a defined request/response structure (sketched after this list).
- Multi-Tool Support: A single endpoint (`/mcp/tools`) routes requests to different tools based on `tool_id`.
- News Tool (`news_tool`):
  - Fetches news article search results using the Tavily Search API.
  - Summarizes news content using a locally run LLM (e.g., Mistral 7B via Ollama) or OpenAI.
- ArXiv Tool (`arxiv_tool`):
  - Fetches scientific paper metadata (titles, authors, abstracts, URLs) from the arXiv API.
  - Parses XML from arXiv API responses.
  - Includes a placeholder for summarizing arXiv abstracts.
- Configurable LLM Backend: Supports local LLMs via Ollama and OpenAI, selectable via configuration.
- Abstraction of External Services: Shields clients from Tavily and arXiv API specifics.
- Error Handling: Provides standardized MCP error messages.
- Asynchronous Operations: Leverages FastAPI's async capabilities.
- Data Validation: Uses Pydantic for request/response validation.
- Configuration Management: Loads API keys and settings from `.env`.
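
As an illustration of that request/response structure, a minimal Pydantic sketch of the MCP envelope might look like the following. The field names are taken from the request and response examples later in this README; the actual definitions live in `app/models.py` and may differ.

```python
# Hypothetical sketch of the MCP envelope; the real models live in app/models.py.
from typing import Any, Dict, Optional

from pydantic import BaseModel


class MCPRequest(BaseModel):
    protocol_version: str = "1.0"
    tool_id: str                     # e.g., "news_tool" or "arxiv_tool"
    method: str                      # e.g., "get_news_summary"
    parameters: Dict[str, Any] = {}  # tool-specific parameters


class MCPResponse(BaseModel):
    protocol_version: str = "1.0"
    tool_id: str
    status: str                      # "success" or "error"
    data: Optional[Dict[str, Any]] = None
    error: Optional[Any] = None      # standardized MCP error details, null on success
```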
## Architecture
- The Client (AI agent or test script) sends an MCP-formatted JSON request to the `/mcp/tools` endpoint, specifying `tool_id` and `method`.
- The MCP Research Tools Server (this application) parses the request and routes it based on `tool_id`.
- The corresponding tool's service handler (e.g., `news_service`, `arxiv_service`):
  - Calls the relevant external API (Tavily, arXiv).
  - Processes the data (e.g., parses XML for arXiv, prepares content for summarization).
  - If summarization is involved, calls the configured LLM (e.g., local Mistral 7B via Ollama) through LangChain.
- The MCP server transforms the result into the standardized MCP response format.
- The MCP server sends the MCP-formatted JSON response back to the client.
```mermaid
sequenceDiagram
    participant Client as AI Agent / Test Script
    participant MCPServer as MCP Research Tools Server (FastAPI)
    participant ExternalAPI as Tavily / arXiv API
    participant LLMService as Local LLM (Ollama) / OpenAI

    Client->>+MCPServer: 1. Request (tool_id, method, params) @ /mcp/tools
    MCPServer->>ExternalAPI: 2. Fetch Data (e.g., news, papers)
    ExternalAPI-->>MCPServer: 3. Raw Data
    MCPServer->>LLMService: 4. Request Summarization (if applicable)
    LLMService-->>MCPServer: 5. Summarized Text
    MCPServer-->>-Client: 6. Processed Data (MCP Format)
```
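
As a concrete version of this flow, the sketch below posts an MCP request with the `requests` library and prints the result. It assumes the server is running locally on port 8001, as described under "Running the MCP Server"; the payload shape matches the examples in the API section.

```python
# Minimal client sketch for the /mcp/tools endpoint.
# Assumes the server is running locally on port 8001.
import requests

MCP_URL = "http://localhost:8001/mcp/tools"

payload = {
    "protocol_version": "1.0",
    "tool_id": "news_tool",
    "method": "get_news_summary",
    "parameters": {"query": "latest developments in AI ethics"},
}

response = requests.post(MCP_URL, json=payload, timeout=120)
response.raise_for_status()

body = response.json()
if body["status"] == "success":
    print(body["data"]["summary"])
else:
    print("MCP error:", body["error"])
```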
## Prerequisites
- Python 3.8+ (Python 3.10+ recommended)
- API Keys / Setups:
  - Tavily API Key (`TAVILY_API_KEY` in `.env`) for the news tool.
  - Ollama installed and a model pulled (e.g., Mistral 7B via `ollama pull mistral`) if `PREFERRED_LLM_PROVIDER="local"`. The Ollama server should be accessible at its default address (`http://localhost:11434`); a quick availability check is sketched after this list.
  - (Optional) OpenAI API Key (`OPENAI_API_KEY` in `.env`) if `PREFERRED_LLM_PROVIDER="openai"`.
- The arXiv API is public and does not require a key for basic search.
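
If you use the local provider, the following minimal sketch checks that Ollama is reachable and that a Mistral model has been pulled. It assumes the default Ollama address above and uses Ollama's `/api/tags` endpoint, which lists locally available models.

```python
# Quick check that the Ollama server is up and a Mistral model is available.
# Assumes the default address mentioned above (http://localhost:11434).
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()

models = [m["name"] for m in resp.json().get("models", [])]
print("Available Ollama models:", models)
if not any(name.startswith("mistral") for name in models):
    print("Mistral not found; run `ollama pull mistral` first.")
```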
## Setup and Installation
1. Clone the repository:

   ```bash
   git clone <your-repository-url>
   cd mcp-research-server
   ```

   (Ensure your project folder is named `mcp-research-server`, or adjust as needed.)

2. Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate            # macOS/Linux
   # Windows (Command Prompt): venv\Scripts\activate.bat
   # Windows (PowerShell):     venv\Scripts\Activate.ps1
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Configure API keys and settings in `.env` (see the sample sketch after this list):
   - Create a `.env` file in the project root (copy `.env.example` if one is provided, or create it manually).
   - Add your `TAVILY_API_KEY`.
   - Set `PREFERRED_LLM_PROVIDER` (e.g., `"local"` or `"openai"`).
   - If using OpenAI, add your `OPENAI_API_KEY`.
   - Configure `LOCAL_LLM_MODEL_NAME` (e.g., `"mistral:latest"`) and `LOCAL_LLM_API_URL` if they differ from the defaults in `app/core/config.py`.
   - Ensure `.env` is listed in your `.gitignore` file.
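
For reference, a minimal `.env` might look like the sketch below. The variable names are the ones listed above; the placeholder values and defaults (`local` provider, `mistral:latest`, Ollama at `http://localhost:11434`) are assumptions that should be checked against `app/core/config.py`.

```env
# Sample .env sketch; replace placeholders with real values.
TAVILY_API_KEY=tvly-your-key-here
# "local" (Ollama) or "openai"
PREFERRED_LLM_PROVIDER=local
# Only needed when PREFERRED_LLM_PROVIDER="openai"
OPENAI_API_KEY=sk-your-key-here
LOCAL_LLM_MODEL_NAME=mistral:latest
LOCAL_LLM_API_URL=http://localhost:11434
```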
## Running the MCP Server
1. Ensure your virtual environment is activated.
2. If using a local LLM via Ollama, ensure your Ollama server is running and the model is available.
3. Navigate to the project root directory (`mcp-research-server`).
4. Start the server with Uvicorn:

   ```bash
   uvicorn app.main:app --reload --port 8001
   ```

The server will be accessible at `http://localhost:8001`.
## API Documentation (Auto-Generated)
FastAPI automatically generates interactive API documentation:
- Swagger UI: `http://localhost:8001/docs`
- ReDoc: `http://localhost:8001/redoc`
## API Endpoint: `/mcp/tools`
This single endpoint handles requests for all available tools. The specific tool and method are determined by the `tool_id` and `method` fields in the request body.
### News Tool (`tool_id: "news_tool"`)
- Method: `get_news_summary`
- Request Parameters (`parameters` object):
  - `query` (string, required): Search query for news.
  - `max_summary_sentences` (integer, optional, default: 3): Hint for summary length (the actual sentence count may vary with the LLM).
  - `include_sources` (boolean, optional, default: true): Whether to include sources in the response.
- Example Request:

  ```json
  {
    "protocol_version": "1.0",
    "tool_id": "news_tool",
    "method": "get_news_summary",
    "parameters": {
      "query": "latest developments in AI ethics",
      "include_sources": true
    }
  }
  ```
- Example Success Response (summarized by a local LLM):

  ```json
  {
    "protocol_version": "1.0",
    "tool_id": "news_tool",
    "status": "success",
    "data": {
      "query_processed": "latest developments in AI ethics",
      "summary": "Title: AI Ethics: Challenges, Importance, and Future\n\n The article discusses the importance of Artificial Intelligence (AI) ethics... and addressing ethical considerations in emerging AI applications. The article emphasizes the need for continued development of AI while mitigating potential risks.",
      "articles_processed_count": 5,
      "sources": [
        {"title": "AI Ethics : Challenges, Importance, and Future - GeeksforGeeks", "url": "https://www.geeksforgeeks.org/ai-ethics/"},
        {"title": "The future of ethics in AI: challenges and opportunities", "url": "https://link.springer.com/article/10.1007/s00146-023-01644-x"}
      ]
    },
    "error": null
  }
  ```
### ArXiv Tool (`tool_id: "arxiv_tool"`)
- Method: `search_papers`
- Request Parameters (`parameters` object):
  - `search_query` (string, required): arXiv search query (e.g., `au:Hinton AND cat:cs.LG`).
  - `max_results` (integer, optional, default: value from config, e.g., 10): Maximum number of papers to return.
  - `summarize_abstracts` (boolean, optional, default: false): Whether to request summarization of abstracts (the current implementation returns a placeholder message).
- Example Request:

  ```json
  {
    "protocol_version": "1.0",
    "tool_id": "arxiv_tool",
    "method": "search_papers",
    "parameters": {
      "search_query": "au:Lecun AND ti:deep learning",
      "max_results": 2,
      "summarize_abstracts": false
    }
  }
  ```
- Example Success Response (Metadata Fetch):

  ```json
  {
    "protocol_version": "1.0",
    "tool_id": "arxiv_tool",
    "status": "success",
    "data": {
      "query_processed": "au:Lecun AND ti:deep learning",
      "papers_found": 2,
      "papers": [
        {
          "arxiv_id": "2211.01340v3",
          "title": "POLICE: Provably Optimal Linear Constraint Enforcement for Deep Neural Networks",
          "authors": ["Randall Balestriero", "Yann LeCun"],
          "published_date": "2022-11-02T17:48:52Z",
          "updated_date": "2023-03-10T16:23:19Z",
          "summary_abstract": "Deep Neural Networks (DNNs) outshine alternative function approximators...",
          "paper_url": "http://arxiv.org/abs/2211.01340v3",
          "pdf_url": "http://arxiv.org/pdf/2211.01340v3",
          "primary_category": "cs.LG",
          "categories": ["cs.LG", "cs.CV", "stat.ML"],
          "generated_summary": null
        },
        {
          "arxiv_id": "2401.11188v1",
          "title": "Fast and Exact Enumeration of Deep Networks Partitions Regions",
          "authors": ["Randall Balestriero", "Yann LeCun"],
          "published_date": "2024-01-20T09:51:52Z",
          "updated_date": "2024-01-20T09:51:52Z",
          "summary_abstract": "One fruitful formulation of Deep Networks (DNs) enabling their theoretical study...",
          "paper_url": "http://arxiv.org/abs/2401.11188v1",
          "pdf_url": "http://arxiv.org/pdf/2401.11188v1",
          "primary_category": "cs.LG",
          "categories": ["cs.LG", "cs.AI"],
          "generated_summary": null
        }
      ]
    },
    "error": null
  }
  ```
## Project Structure
```text
mcp-research-server/            # Main project folder (ensure this matches your folder name)
├── .vscode/
│   └── settings.json
├── app/
│   ├── __init__.py
│   ├── main.py                 # FastAPI app, routes to /mcp/tools
│   ├── models.py               # Pydantic models for all tools
│   ├── services/               # Package for service logic
│   │   ├── __init__.py
│   │   ├── news_service.py
│   │   └── arxiv_service.py
│   └── core/
│       ├── __init__.py
│       └── config.py
├── tests/
│   ├── __init__.py
│   └── ...                     # placeholder for test files
├── .env                        # Local environment variables (NOT COMMITTED)
├── .gitignore
├── CHANGELOG.md
├── requirements.txt
├── README.md                   # This file
└── ...                         # other root files, e.g., news_agent_cli.py (placeholder), test_mcp_client_news.py (placeholder)
```
## License
(Specify your license, e.g., MIT License)