WikiArxivMCP - Intelligent Research Platform

A comprehensive Model Context Protocol (MCP) server that provides unified access to Wikipedia and arXiv data through both a standardized MCP interface and a modern web application with AI-powered synthesis capabilities.

Overview

WikiArxivMCP is a full-stack research platform that combines:

  • MCP Server: Standards-compliant Model Context Protocol server for programmatic access
  • Web Frontend: Modern React-based interface for interactive research
  • AI Integration: Gemini AI-powered synthesis for comprehensive research insights
  • Unified Search: Simultaneous querying of Wikipedia and arXiv databases
  • Caching System: Intelligent caching for improved performance and reduced API calls

Architecture

System Components

┌─────────────────────────────────────────────────────────────────┐
│                    WikiArxivMCP Platform                       │
├─────────────────────────────────────────────────────────────────┤
│  Frontend Layer                                                │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │   React App     │  │   Vite Server   │  │  Express API    │ │
│  │   (Port 3002)   │  │   Development   │  │  (Port 3001)    │ │
│  │                 │  │                 │  │                 │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│  MCP Layer                                                     │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │   MCP Server    │  │   MCP Client    │  │   MCP Tools     │ │
│  │   (stdio)       │  │   Interface     │  │   & Resources   │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│  Service Layer                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │ Wikipedia       │  │   arXiv         │  │   HTTP Client   │ │
│  │ Service         │  │   Service       │  │   with Caching  │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│  External APIs                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │ Wikipedia API   │  │   arXiv API     │  │   Gemini AI     │ │
│  │ REST Endpoints  │  │   XML Feed      │  │   API           │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Data Flow Architecture

Current Implementation - MCP-Based Search Flow:
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ User Input   │───▶│ Frontend     │───▶│ Express API  │
│ (Web Form)   │    │ (React)      │    │ (/api/search)│
└──────────────┘    └──────────────┘    └──────────────┘
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Search       │◀───│ MCP Client   │◀───│ API Router   │
│ Results      │    │ (JSON-RPC)   │    │ Layer        │
└──────────────┘    └──────────────┘    └──────────────┘
       │                                       │
       ▼                                       ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Gemini AI    │    │ MCP Server   │    │ Fallback     │
│ Synthesis    │    │ (stdio)      │    │ Direct Calls │
└──────────────┘    └──────────────┘    └──────────────┘
                           │                     │
                           ▼                     ▼
                    ┌──────────────┐    ┌──────────────┐
                    │ Service      │    │ Wikipedia &  │
                    │ Layer        │    │ arXiv APIs   │
                    └──────────────┘    └──────────────┘
                    ┌──────────────┐            
                    │ Wikipedia &  │            
                    │ arXiv APIs   │            
                    └──────────────┘            

Key Features:

  • Primary Path: Web app → Express API → MCP Client → MCP Server → Services → External APIs
  • Fallback Path: If MCP fails, the system automatically falls back to direct service calls (sketched below)
  • Robust Error Handling: The system continues to work even if the MCP layer encounters issues
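
The fallback behavior can be pictured with a short TypeScript sketch. The function names (searchViaMcp, searchWikipediaDirect, searchArxivDirect) are illustrative placeholders, not the project's actual identifiers:

// A minimal sketch of the MCP-first search path with a direct-service fallback.
// searchViaMcp, searchWikipediaDirect and searchArxivDirect are hypothetical
// stand-ins for the project's MCP client and service-layer calls.
type SearchResults = { wikipedia: unknown[]; arxiv: unknown[] };

declare function searchViaMcp(query: string): Promise<SearchResults>;
declare function searchWikipediaDirect(query: string): Promise<unknown[]>;
declare function searchArxivDirect(query: string): Promise<unknown[]>;

async function unifiedSearch(query: string): Promise<SearchResults> {
  try {
    // Primary path: Express API -> MCP Client -> MCP Server -> services.
    return await searchViaMcp(query);
  } catch (err) {
    // Fallback path: bypass the MCP layer and call the services directly.
    console.warn("MCP layer unavailable, using direct service calls:", err);
    const [wikipedia, arxiv] = await Promise.all([
      searchWikipediaDirect(query),
      searchArxivDirect(query),
    ]);
    return { wikipedia, arxiv };
  }
}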

MCP Protocol Flow

MCP Client Communication:
┌─────────────┐    JSON-RPC     ┌─────────────┐
│ MCP Client  │ ◀──────────────▶ │ MCP Server  │
│             │     over stdio  │             │
└─────────────┘                 └─────────────┘
       │                               │
       │ tools/call                    │
       │ resources/read                │
       │                               ▼
       │                        ┌─────────────┐
       │                        │ Tool        │
       │                        │ Handlers    │
       │                        └─────────────┘
       │                               │
       │                               ▼
       │                        ┌─────────────┐
       │ JSON Response          │ Service     │
       └◀───────────────────────│ Layer       │
                                └─────────────┘

Installation and Setup

Prerequisites

  • Node.js (version 18 or higher)
  • npm or yarn package manager
  • Optional: Gemini API key for AI synthesis features

Installation Steps

  1. Clone and Install Dependencies

    git clone <repository-url>
    cd WikiArxivMCP
    npm install
    
  2. Build the MCP Server

    npm run build
    
  3. Install Frontend Dependencies

    cd frontend
    npm install
    
  4. Configure Environment Variables

    Create a .env file in the frontend directory:

    # Optional: Gemini AI API Key for synthesis features
    GEMINI_API_KEY=your_gemini_api_key_here
    
    # Optional: Cache configuration
    CACHE_SIZE=1000
    CACHE_TTL_SECONDS=600
    

    Get your Gemini API key from: https://makersuite.google.com/app/apikey

Usage

Web Application (Recommended)

  1. Start the Backend API Server

    cd frontend
    node server/index.js
    

    Server will start on http://localhost:3001

  2. Start the Frontend Development Server

    # In a new terminal
    cd frontend
    npm run dev
    

    Application will be available at http://localhost:3002

  3. Access the Application

    Open your browser and navigate to http://localhost:3002

MCP Server (Programmatic Access)

For direct MCP client integration:

# Start the MCP server
npm start

The server communicates via stdio using the Model Context Protocol JSON-RPC format.
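
As an illustration, a minimal client connection might look like the sketch below. It assumes the official TypeScript SDK (@modelcontextprotocol/sdk) and the compiled entry point dist/index.js; adjust both to match your setup. Note that each tool returns its results as a JSON-encoded string inside the first content block (see the examples in the API Reference).

// Minimal MCP client sketch. Assumes @modelcontextprotocol/sdk is installed
// and the server has been compiled to dist/index.js (see npm run build).
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "node",
  args: ["dist/index.js"], // assumed location of the compiled MCP server
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Call the search_wikipedia tool; the results arrive as JSON text in content[0].
const result = await client.callTool({
  name: "search_wikipedia",
  arguments: { query: "quantum computing" },
});
console.log(result);

await client.close();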

Features and Capabilities

Search Functionality

Wikipedia Search

  • Title-based search with relevance ranking
  • Rich metadata including descriptions and thumbnails
  • Direct page access and summary retrieval
  • Configurable result limits (default: 8 results)

arXiv Paper Search

  • Full-text search across paper titles, abstracts, and authors
  • Comprehensive metadata including author lists and publication dates
  • Direct PDF access links
  • Category and classification information
  • Configurable result limits (default: 5 results)

AI-Powered Synthesis

When a Gemini API key is configured, the system provides:

  • Comprehensive research synthesis combining Wikipedia and arXiv sources
  • Source citation and attribution
  • Structured, accessible explanations
  • Cross-reference linking between sources
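
As a rough illustration of how such a synthesis step might be wired up, the sketch below combines both result sets into a single Gemini prompt. The @google/generative-ai usage, model name, and prompt wording are assumptions for illustration, not necessarily what the project ships:

// Illustrative synthesis sketch: combine Wikipedia and arXiv results into one
// Gemini prompt. Model name and prompt wording are assumptions.
import { GoogleGenerativeAI } from "@google/generative-ai";

async function synthesize(query: string, wikipedia: unknown[], arxiv: unknown[]) {
  const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
  const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

  const prompt = [
    `Summarize what is known about "${query}" using only the sources below.`,
    "Cite each source you use.",
    `Wikipedia results: ${JSON.stringify(wikipedia)}`,
    `arXiv results: ${JSON.stringify(arxiv)}`,
  ].join("\n\n");

  const result = await model.generateContent(prompt);
  return result.response.text(); // Markdown-formatted synthesis
}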

Web Interface Features

Search Interface

  • Unified search box for both Wikipedia and arXiv
  • Advanced filtering options
  • Real-time search suggestions
  • Search history management

Results Display

  • Tabbed interface for Wikipedia and arXiv results
  • Rich preview cards with metadata
  • Bookmark and sharing functionality
  • Export capabilities

AI Synthesis View

  • Markdown-formatted comprehensive answers
  • Source attribution and linking
  • Copy and share functionality

API Reference

MCP Tools

search_wikipedia

Search Wikipedia articles by title.

Request:

{
  "method": "tools/call",
  "params": {
    "name": "search_wikipedia",
    "arguments": {
      "query": "quantum computing"
    }
  }
}

Response:

{
  "content": [
    {
      "type": "text",
      "text": "[{\"title\": \"Quantum computing\", \"description\": \"Computing using quantum phenomena\", \"url\": \"https://en.wikipedia.org/wiki/Quantum_computing\"}]"
    }
  ]
}

search_arxiv

Search arXiv papers by query.

Request:

{
  "method": "tools/call",
  "params": {
    "name": "search_arxiv",
    "arguments": {
      "query": "machine learning",
      "max_results": 5
    }
  }
}

Response:

{
  "content": [
    {
      "type": "text",
      "text": "[{\"id\": \"2103.12345\", \"title\": \"Advanced ML Techniques\", \"authors\": [\"John Doe\"], \"summary\": \"Paper summary...\", \"pdf\": \"http://arxiv.org/pdf/2103.12345.pdf\"}]"
    }
  ]
}

MCP Resources

wikipedia:/page/

Retrieve a specific Wikipedia page summary.

URI Pattern: wikipedia:/page/{title}

Example: wikipedia:/page/Artificial Intelligence

arxiv:/id/

Retrieve a specific arXiv paper by ID.

URI Pattern: arxiv:/id/{id}

Example: arxiv:/id/2103.12345
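
Resources can be read with the same client used for tool calls. A minimal sketch, assuming the official TypeScript SDK and a connected client (see the MCP client sketch in the Usage section):

// Read a Wikipedia page summary and an arXiv paper through MCP resources.
// Assumes `client` is an already-connected @modelcontextprotocol/sdk Client.
const page = await client.readResource({
  uri: "wikipedia:/page/Artificial Intelligence",
});
const paper = await client.readResource({ uri: "arxiv:/id/2103.12345" });
console.log(page.contents, paper.contents);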

REST API Endpoints

POST /api/search

Unified search endpoint for both Wikipedia and arXiv.

Request Body:

{
  "query": "cloud computing",
  "useGemini": true,
  "filters": {
    "resultLimit": 10,
    "searchType": "all"
  }
}

Response:

{
  "query": "cloud computing",
  "wikipedia": [...],
  "arxiv": [...],
  "synthesis": "AI-generated comprehensive answer...",
  "geminiEnabled": true
}
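
For reference, the endpoint can be exercised from any HTTP client. A minimal TypeScript sketch, assuming the backend from the Usage section is running on port 3001:

// POST a unified search request to the local Express backend.
const response = await fetch("http://localhost:3001/api/search", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    query: "cloud computing",
    useGemini: true,
    filters: { resultLimit: 10, searchType: "all" },
  }),
});
const data = await response.json();
console.log(data.synthesis); // AI-generated answer when Gemini is enabled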

GET /api/health

Health check endpoint.

Response:

{
  "status": "ok",
  "geminiEnabled": true,
  "mcpConnected": true
}

Configuration

Environment Variables

HTTP Configuration:

  • HTTP_USER_AGENT: Custom user agent for external API requests
  • HTTP_TIMEOUT_MS: Request timeout in milliseconds (default: 10000)

Cache Configuration:

  • CACHE_SIZE: Maximum number of cached items (default: 1000)
  • CACHE_TTL_SECONDS: Cache time-to-live in seconds (default: 600)

AI Integration:

  • GEMINI_API_KEY: Google Gemini API key for synthesis features

Performance Tuning

Cache Settings

# High-performance configuration
CACHE_SIZE=5000
CACHE_TTL_SECONDS=1800

# Memory-constrained configuration
CACHE_SIZE=100
CACHE_TTL_SECONDS=300
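
To illustrate how these settings might drive the cache, the sketch below implements a small TTL-bounded LRU store keyed off CACHE_SIZE and CACHE_TTL_SECONDS. The actual implementation in src/cache-service.ts may differ in detail:

// Illustrative LRU cache with TTL, sized from the environment variables above.
const MAX_ENTRIES = Number(process.env.CACHE_SIZE ?? 1000);
const TTL_MS = Number(process.env.CACHE_TTL_SECONDS ?? 600) * 1000;

const cache = new Map<string, { value: unknown; expiresAt: number }>();

function cacheGet(key: string): unknown | undefined {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (Date.now() > entry.expiresAt) {
    cache.delete(key); // expired
    return undefined;
  }
  // Re-insert to mark the key as most recently used (Map preserves order).
  cache.delete(key);
  cache.set(key, entry);
  return entry.value;
}

function cacheSet(key: string, value: unknown): void {
  if (cache.size >= MAX_ENTRIES) {
    // Evict the least recently used entry (the first key in insertion order).
    const oldest = cache.keys().next().value;
    if (oldest !== undefined) cache.delete(oldest);
  }
  cache.set(key, { value, expiresAt: Date.now() + TTL_MS });
}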

Rate Limiting

The system includes built-in rate limiting to prevent API abuse:

  • Wikipedia API: 10 requests per second
  • arXiv API: 3 requests per second
  • Automatic retry with exponential backoff (see the sketch below)
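
A retry loop of the kind described above can be sketched as follows; the delays and retry counts are illustrative assumptions rather than the project's actual values:

// Illustrative retry helper with exponential backoff.
async function fetchWithBackoff(url: string, retries = 3): Promise<Response> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const response = await fetch(url);
    // Retry only on rate limiting (429) or server errors (5xx).
    if (response.status !== 429 && response.status < 500) return response;
    if (attempt === retries) return response;
    const delayMs = 500 * 2 ** attempt; // 500 ms, 1 s, 2 s, ...
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("unreachable");
}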

Development

Project Structure

WikiArxivMCP/
├── src/                    # MCP server source code
│   ├── index.ts           # Main MCP server entry point
│   ├── wikipedia-service.ts # Wikipedia API integration
│   ├── arxiv-service.ts   # arXiv API integration
│   ├── http-client.ts     # HTTP client with caching
│   ├── cache-service.ts   # LRU cache implementation
│   └── schemas.ts         # TypeScript type definitions
├── frontend/              # Web application
│   ├── src/              # React frontend source
│   ├── server/           # Express.js backend
│   └── public/           # Static assets
├── dist/                 # Compiled JavaScript output
├── package.json          # Node.js dependencies and scripts
└── tsconfig.json         # TypeScript configuration

Development Commands

# Build the MCP server
npm run build

# Development mode with auto-reload
npm run dev

# Type checking
npm run type-check

# Run tests
npm test

Frontend Development

cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

# Start production server
npm start

Error Handling

The system includes comprehensive error handling:

Network Errors

  • Automatic retry with exponential backoff
  • Graceful degradation when external APIs are unavailable
  • Timeout handling for slow responses

Data Validation

  • Input validation using Zod schemas (see the sketch after this list)
  • Type-safe data structures throughout
  • Sanitized error messages for security
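
An input schema of the kind referenced above might look like the sketch below; the exact field names and limits live in src/schemas.ts, and the values here are assumptions:

// Illustrative Zod schemas for the two tool inputs.
import { z } from "zod";

const SearchWikipediaArgs = z.object({
  query: z.string().min(1, "query must not be empty"),
});

const SearchArxivArgs = z.object({
  query: z.string().min(1),
  max_results: z.number().int().positive().max(50).default(5),
});

// parse() throws a descriptive ZodError if the input is malformed.
SearchWikipediaArgs.parse({ query: "quantum computing" });
SearchArxivArgs.parse({ query: "machine learning", max_results: 5 });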

Rate Limit Protection

  • Built-in protection against API rate limits
  • Intelligent request queuing
  • User feedback for rate limit scenarios

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes with appropriate tests
  4. Ensure TypeScript compilation: npm run build
  5. Submit a pull request with a clear description

License

MIT License - see LICENSE file for details.

Support

For issues, questions, or contributions, please refer to the project's issue tracker or documentation.