WikiArxivMCP by ajm2004 - MCP Server

WikiArxivMCP - Intelligent Research Platform

WikiArxivMCP is a Model Context Protocol (MCP) server that provides unified access to Wikipedia and arXiv data through both a standard MCP interface and a React web application with Gemini-powered synthesis.

Overview

WikiArxivMCP is a full-stack research platform that combines:

MCP Server: Standards-compliant Model Context Protocol server for programmatic access
Web Frontend: Modern React-based interface for interactive research
AI Integration: Gemini AI-powered synthesis for comprehensive research insights
Unified Search: Simultaneous querying of Wikipedia and arXiv databases
Caching System: Intelligent caching for improved performance and reduced API calls

Architecture

System Components

┌─────────────────────────────────────────────────────────────────┐
│                    WikiArxivMCP Platform                       │
├─────────────────────────────────────────────────────────────────┤
│  Frontend Layer                                                │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │   React App     │  │   Vite Server   │  │  Express API    │ │
│  │   (Port 3002)   │  │   Development   │  │  (Port 3001)    │ │
│  │                 │  │                 │  │                 │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│  MCP Layer                                                     │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │   MCP Server    │  │   MCP Client    │  │   MCP Tools     │ │
│  │   (stdio)       │  │   Interface     │  │   & Resources   │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│  Service Layer                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │ Wikipedia       │  │   arXiv         │  │   HTTP Client   │ │
│  │ Service         │  │   Service       │  │   with Caching  │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│  External APIs                                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │ Wikipedia API   │  │   arXiv API     │  │   Gemini AI     │ │
│  │ REST Endpoints  │  │   XML Feed      │  │   API           │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Data Flow Architecture

Current Implementation - MCP-Based Search Flow:
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ User Input   │───▶│ Frontend     │───▶│ Express API  │
│ (Web Form)   │    │ (React)      │    │ (/api/search)│
└──────────────┘    └──────────────┘    └──────────────┘
                                               │
                                               ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Search       │◀───│ MCP Client   │◀───│ API Router   │
│ Results      │    │ (JSON-RPC)   │    │ Layer        │
└──────────────┘    └──────────────┘    └──────────────┘
       │                                       │
       ▼                                       ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Gemini AI    │    │ MCP Server   │    │ Fallback     │
│ Synthesis    │    │ (stdio)      │    │ Direct Calls │
└──────────────┘    └──────────────┘    └──────────────┘
                           │                     │
                           ▼                     ▼
                    ┌──────────────┐    ┌──────────────┐
                    │ Service      │    │ Wikipedia &  │
                    │ Layer        │    │ arXiv APIs   │
                    └──────────────┘    └──────────────┘
                           │                     
                           ▼                     
                    ┌──────────────┐            
                    │ Wikipedia &  │            
                    │ arXiv APIs   │            
                    └──────────────┘

Key Features:

Primary Path: Web app → Express API → MCP Client → MCP Server → Services → External APIs
Fallback Path: If MCP fails, automatically falls back to direct service calls
Robust Error Handling: System continues to work even if MCP layer encounters issues
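
The primary and fallback paths above can be sketched as a small wrapper. This is a minimal illustration, not the project's actual code: `searchWithFallback`, `SearchFn`, and the `source` field are hypothetical names, and the result shape is assumed from the /api/search response documented later in this README.

```typescript
// Hypothetical result shape, assumed from the /api/search response.
interface SearchResults {
  wikipedia: unknown[];
  arxiv: unknown[];
}

type SearchFn = (query: string) => Promise<SearchResults>;

// Try the primary (MCP) path first; if it throws, use the direct-call path.
async function searchWithFallback(
  query: string,
  viaMcp: SearchFn,
  viaDirect: SearchFn,
): Promise<SearchResults & { source: "mcp" | "direct" }> {
  try {
    return { ...(await viaMcp(query)), source: "mcp" };
  } catch {
    // The MCP layer failed (process not running, timeout, protocol error).
    // If the direct path also throws, that error propagates to the caller.
    return { ...(await viaDirect(query)), source: "direct" };
  }
}
```

Tagging the result with its source makes it easy to log how often the fallback path is taken.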

MCP Protocol Flow

MCP Client Communication:
┌─────────────┐    JSON-RPC     ┌─────────────┐
│ MCP Client  │ ◀──────────────▶ │ MCP Server  │
│             │     over stdio  │             │
└─────────────┘                 └─────────────┘
       │                               │
       │ tools/call                    │
       │ resources/read                │
       │                               ▼
       │                        ┌─────────────┐
       │                        │ Tool        │
       │                        │ Handlers    │
       │                        └─────────────┘
       │                               │
       │                               ▼
       │                        ┌─────────────┐
       │ JSON Response          │ Service     │
       └◀───────────────────────│ Layer       │
                                └─────────────┘

Installation and Setup

Prerequisites

Node.js (version 18 or higher)
npm or yarn package manager
Optional: Gemini API key for AI synthesis features

Installation Steps

Clone and Install Dependencies

git clone <repository-url>
cd WikiArxivMCP
npm install

Build the MCP Server
```
npm run build
```
Install Frontend Dependencies
```
cd frontend
npm install
```

Configure Environment Variables

Create a .env file in the frontend directory:

# Optional: Gemini AI API Key for synthesis features
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: Cache configuration
CACHE_SIZE=1000
CACHE_TTL_SECONDS=600

Get your Gemini API key from: https://makersuite.google.com/app/apikey

Usage

Web Application (Recommended)

Start the Backend API Server
```
cd frontend
node server/index.js
```
Server will start on http://localhost:3001
Start the Frontend Development Server
```
# In a new terminal
cd frontend
npm run dev
```
Application will be available at http://localhost:3002
Access the Application: open your browser and navigate to http://localhost:3002

MCP Server (Programmatic Access)

For direct MCP client integration:

# Start the MCP server
npm start

The server communicates via stdio using the Model Context Protocol JSON-RPC format.
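
As a rough illustration of what travels over stdio, the sketch below builds a JSON-RPC 2.0 `tools/call` request and frames it as one JSON message per line, which is how the MCP stdio transport delimits messages. The helper names (`toolCall`, `encodeMessage`, `decodeMessages`) are hypothetical. A real session also starts with an `initialize` handshake, which an MCP client library would normally handle and which is omitted here.

```typescript
// JSON-RPC 2.0 request envelope.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Build a tools/call request for a named tool.
function toolCall(toolName: string, args: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// One JSON message per line; JSON.stringify never emits raw newlines.
function encodeMessage(msg: JsonRpcRequest): string {
  return JSON.stringify(msg) + "\n";
}

// Split buffered output into complete messages plus any trailing partial line.
function decodeMessages(buffer: string): { messages: unknown[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? "";
  const messages = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
  return { messages, rest };
}
```

The `rest` value should be prepended to the next chunk read from the server's stdout, since a message can be split across reads.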

Features and Capabilities

Search Functionality

Wikipedia Search

Title-based search with relevance ranking
Rich metadata including descriptions and thumbnails
Direct page access and summary retrieval
Configurable result limits (default: 8 results)

arXiv Paper Search

Full-text search across paper titles, abstracts, and authors
Comprehensive metadata, including author lists and publication dates
Direct PDF access links
Category and classification information
Configurable result limits (default: 5 results)
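
Both searches come down to HTTP GET requests against public endpoints. The sketch below builds such URLs using the arXiv export API (`search_query`, `start`, `max_results`) and the MediaWiki REST title-search endpoint. Whether this project calls exactly these endpoints is an assumption, and the function names are hypothetical; the default limits mirror the ones listed above.

```typescript
// arXiv export API: the "all:" prefix searches across all fields.
function arxivSearchUrl(query: string, maxResults = 5): string {
  const params = [
    `search_query=${encodeURIComponent("all:" + query)}`,
    "start=0",
    `max_results=${maxResults}`,
  ];
  return `http://export.arxiv.org/api/query?${params.join("&")}`;
}

// MediaWiki REST API title search on English Wikipedia.
function wikipediaSearchUrl(query: string, limit = 8): string {
  const q = encodeURIComponent(query);
  return `https://en.wikipedia.org/w/rest.php/v1/search/title?q=${q}&limit=${limit}`;
}
```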

AI-Powered Synthesis

When a Gemini API key is configured, the system provides:

Comprehensive research synthesis combining Wikipedia and arXiv sources
Source citation and attribution
Structured, accessible explanations
Cross-reference linking between sources

Web Interface Features

Search Interface

Unified search box for both Wikipedia and arXiv
Advanced filtering options
Real-time search suggestions
Search history management

Results Display

Tabbed interface for Wikipedia and arXiv results
Rich preview cards with metadata
Bookmark and sharing functionality
Export capabilities

AI Synthesis View

Markdown-formatted comprehensive answers
Source attribution and linking
Copy and share functionality

API Reference

MCP Tools

search_wikipedia

Search Wikipedia articles by title.

Request:

{
  "method": "tools/call",
  "params": {
    "name": "search_wikipedia",
    "arguments": {
      "query": "quantum computing"
    }
  }
}

Response:

{
  "content": [
    {
      "type": "text",
      "text": "[{\"title\": \"Quantum computing\", \"description\": \"Computing using quantum phenomena\", \"url\": \"https://en.wikipedia.org/wiki/Quantum_computing\"}]"
    }
  ]
}
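
Note that the search results arrive as a JSON string inside the `text` field, so a client has to parse twice: once for the envelope and once for the payload. A minimal sketch of the second step follows; `parseTextPayload` is a hypothetical helper, and the `WikipediaHit` fields are taken from the example response above.

```typescript
interface ToolResult {
  content: Array<{ type: string; text?: string }>;
}

// Fields as they appear in the example response; others may exist.
interface WikipediaHit {
  title: string;
  description?: string;
  url: string;
}

// Find the first text item and parse its JSON payload as an array.
function parseTextPayload<T>(result: ToolResult): T[] {
  const item = result.content.find(
    (c) => c.type === "text" && typeof c.text === "string",
  );
  if (!item || item.text === undefined) {
    throw new Error("tool result has no text content");
  }
  const parsed: unknown = JSON.parse(item.text);
  if (!Array.isArray(parsed)) {
    throw new Error("expected a JSON array in text content");
  }
  // The cast trusts the server; validate fields if the source is untrusted.
  return parsed as T[];
}
```

The same helper applies to `search_arxiv` results with a different element type.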

search_arxiv

Search arXiv papers by query.

Request:

{
  "method": "tools/call",
  "params": {
    "name": "search_arxiv",
    "arguments": {
      "query": "machine learning",
      "max_results": 5
    }
  }
}

Response:

{
  "content": [
    {
      "type": "text",
      "text": "[{\"id\": \"2103.12345\", \"title\": \"Advanced ML Techniques\", \"authors\": [\"John Doe\"], \"summary\": \"Paper summary...\", \"pdf\": \"http://arxiv.org/pdf/2103.12345.pdf\"}]"
    }
  ]
}

MCP Resources

wikipedia:/page/

Retrieve a specific Wikipedia page summary.

URI Pattern: wikipedia:/page/{title}

Example: wikipedia:/page/Artificial Intelligence

arxiv:/id/

Retrieve a specific arXiv paper by ID.

URI Pattern: arxiv:/id/{id}

Example: arxiv:/id/2103.12345
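
The two URI patterns can be matched with a small parser. The sketch below is illustrative only: `parseResourceUri` and `wikipediaPageUri` are hypothetical names, and percent-encoding titles that contain spaces is an assumption about what the server accepts, since the example above uses a literal space.

```typescript
type ResourceRef =
  | { kind: "wikipedia"; title: string }
  | { kind: "arxiv"; id: string };

// Match a URI against the two documented patterns; null if neither fits.
// decodeURIComponent throws URIError on malformed percent sequences.
function parseResourceUri(uri: string): ResourceRef | null {
  const wiki = /^wikipedia:\/page\/(.+)$/.exec(uri);
  if (wiki) return { kind: "wikipedia", title: decodeURIComponent(wiki[1]) };
  const arxiv = /^arxiv:\/id\/(.+)$/.exec(uri);
  if (arxiv) return { kind: "arxiv", id: decodeURIComponent(arxiv[1]) };
  return null;
}

// Build a Wikipedia resource URI, percent-encoding the title (assumption).
function wikipediaPageUri(title: string): string {
  return `wikipedia:/page/${encodeURIComponent(title)}`;
}
```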

REST API Endpoints

POST /api/search

Unified search endpoint for both Wikipedia and arXiv.

Request Body:

{
  "query": "cloud computing",
  "useGemini": true,
  "filters": {
    "resultLimit": 10,
    "searchType": "all"
  }
}

Response:

{
  "query": "cloud computing",
  "wikipedia": [...],
  "arxiv": [...],
  "synthesis": "AI-generated comprehensive answer...",
  "geminiEnabled": true
}
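
A typed client for this endpoint might look like the sketch below. The types are inferred from the example request and response above, so optional fields (for instance, whether `synthesis` is omitted when Gemini is disabled) are assumptions. `postSearch` and `FetchLike` are hypothetical names; the fetch implementation is injected so the function can be exercised without a running server.

```typescript
interface SearchRequest {
  query: string;
  useGemini?: boolean;
  filters?: { resultLimit?: number; searchType?: string };
}

interface SearchResponse {
  query: string;
  wikipedia: unknown[];
  arxiv: unknown[];
  synthesis?: string; // assumed absent when synthesis is disabled
  geminiEnabled: boolean;
}

// Minimal subset of the fetch API that this sketch relies on.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function postSearch(
  baseUrl: string,
  body: SearchRequest,
  fetchImpl: FetchLike,
): Promise<SearchResponse> {
  const res = await fetchImpl(`${baseUrl}/api/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`search failed with HTTP ${res.status}`);
  return (await res.json()) as SearchResponse;
}
```

On Node.js 18 or later, the built-in `fetch` can be passed as the third argument, with `http://localhost:3001` as the base URL.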

GET /api/health

Health check endpoint.

Response:

{
  "status": "ok",
  "geminiEnabled": true,
  "mcpConnected": true
}

Configuration

Environment Variables

HTTP Configuration:

HTTP_USER_AGENT: Custom user agent for external API requests
HTTP_TIMEOUT_MS: Request timeout in milliseconds (default: 10000)

Cache Configuration:

CACHE_SIZE: Maximum number of cached items (default: 1000)
CACHE_TTL_SECONDS: Cache time-to-live in seconds (default: 600)

AI Integration:

GEMINI_API_KEY: Google Gemini API key for synthesis features
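
One way to read these variables with the documented defaults is sketched below. `loadConfig`, `intFromEnv`, and the `AppConfig` field names are hypothetical; treating invalid or non-positive values as "use the default" is an assumption rather than documented behavior.

```typescript
interface AppConfig {
  userAgent: string | undefined;
  timeoutMs: number;
  cacheSize: number;
  cacheTtlSeconds: number;
  geminiApiKey: string | undefined;
}

// Parse a positive integer; fall back when the value is unset or invalid.
function intFromEnv(raw: string | undefined, fallback: number): number {
  const n = Number(raw); // Number(undefined) is NaN, Number("") is 0
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Takes the environment as a parameter so it can be tested in isolation.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    userAgent: env.HTTP_USER_AGENT,
    timeoutMs: intFromEnv(env.HTTP_TIMEOUT_MS, 10000),
    cacheSize: intFromEnv(env.CACHE_SIZE, 1000),
    cacheTtlSeconds: intFromEnv(env.CACHE_TTL_SECONDS, 600),
    geminiApiKey: env.GEMINI_API_KEY,
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`.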

Performance Tuning

Cache Settings

# High-performance configuration
CACHE_SIZE=5000
CACHE_TTL_SECONDS=1800

# Memory-constrained configuration
CACHE_SIZE=100
CACHE_TTL_SECONDS=300
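
To show how the two settings interact, here is a minimal LRU cache with per-entry expiry. It is a sketch of the general technique, not the project's `cache-service.ts`; the class name and the injectable clock are hypothetical. `CACHE_SIZE` bounds the number of entries (the least recently used entry is evicted first), while `CACHE_TTL_SECONDS` bounds how long any entry may be served.

```typescript
class TtlLruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxSize: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxSize: number, ttlSeconds: number, now: () => number = Date.now) {
    this.maxSize = maxSize;
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop and report a miss
      return undefined;
    }
    // Re-insert to mark as most recently used (Map keeps insertion order).
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A larger size with a longer TTL reduces upstream API calls at the cost of memory and staler results; the memory-constrained settings trade the other way.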

Rate Limiting

The system includes built-in rate limiting to prevent API abuse:

Wikipedia API: 10 requests per second
arXiv API: 3 requests per second
Automatic retry with exponential backoff
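
A per-API limit like the ones above can be enforced with a sliding window of request timestamps. The sketch below is one possible approach, assumed for illustration; the project's actual limiter may work differently, and `RateLimiter` and `tryAcquire` are hypothetical names.

```typescript
// Allows at most `limit` requests per `windowMs` using a sliding window.
class RateLimiter {
  private stamps: number[] = [];
  private limit: number;
  private windowMs: number;
  private now: () => number;

  constructor(limit: number, windowMs = 1000, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns 0 if a request may be sent now (and records it); otherwise
  // returns the number of milliseconds to wait before trying again.
  tryAcquire(): number {
    const t = this.now();
    // Drop timestamps that have left the window.
    while (this.stamps.length > 0 && t - this.stamps[0] >= this.windowMs) {
      this.stamps.shift();
    }
    if (this.stamps.length < this.limit) {
      this.stamps.push(t);
      return 0;
    }
    return this.windowMs - (t - this.stamps[0]);
  }
}

// Limits quoted in this section.
const wikipediaLimiter = new RateLimiter(10);
const arxivLimiter = new RateLimiter(3);
```

A caller that receives a non-zero wait can sleep for that long and try again, which gives simple request queuing.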

Development

Project Structure

WikiArxivMCP/
├── src/                    # MCP server source code
│   ├── index.ts           # Main MCP server entry point
│   ├── wikipedia-service.ts # Wikipedia API integration
│   ├── arxiv-service.ts   # arXiv API integration
│   ├── http-client.ts     # HTTP client with caching
│   ├── cache-service.ts   # LRU cache implementation
│   └── schemas.ts         # TypeScript type definitions
├── frontend/              # Web application
│   ├── src/              # React frontend source
│   ├── server/           # Express.js backend
│   └── public/           # Static assets
├── dist/                 # Compiled JavaScript output
├── package.json          # Node.js dependencies and scripts
└── tsconfig.json         # TypeScript configuration

Development Commands

# Build the MCP server
npm run build

# Development mode with auto-reload
npm run dev

# Type checking
npm run type-check

# Run tests
npm test

Frontend Development

cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

# Start production server
npm start

Error Handling

The system includes comprehensive error handling:

Network Errors

Automatic retry with exponential backoff
Graceful degradation when external APIs are unavailable
Timeout handling for slow responses
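
Retry with exponential backoff can be sketched as follows. The option names and the doubling schedule are assumptions for illustration; the source does not state the actual delays or retry counts. The `sleep` function is injectable so the schedule can be checked without waiting.

```typescript
interface RetryOptions {
  retries: number; // additional attempts after the first
  baseDelayMs: number; // delay before the first retry; doubles each time
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withBackoff<T>(
  task: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt === opts.retries) break; // out of attempts
      await sleep(opts.baseDelayMs * 2 ** attempt); // 1x, 2x, 4x, ...
    }
  }
  throw lastError;
}
```

In production code it is common to add random jitter to each delay and to retry only errors that are likely transient, such as timeouts or HTTP 429 and 5xx responses.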

Data Validation

Input validation using Zod schemas
Type-safe data structures throughout
Sanitized error messages for security
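
To illustrate the kind of check a schema performs, the sketch below validates `search_arxiv` arguments by hand. It is a dependency-free stand-in, not Zod and not the project's `schemas.ts`; the function and type names are hypothetical, and the default of 5 for `max_results` is taken from the result limits listed earlier. Error messages describe the input only, without echoing internals.

```typescript
interface ArxivSearchArgs {
  query: string;
  max_results: number;
}

type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

function validateArxivArgs(input: unknown): Validation<ArxivSearchArgs> {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "arguments must be an object" };
  }
  // Apply the assumed default when max_results is omitted.
  const { query, max_results = 5 } = input as Record<string, unknown>;
  if (typeof query !== "string" || query.trim() === "") {
    return { ok: false, error: "query must be a non-empty string" };
  }
  if (
    typeof max_results !== "number" ||
    !Number.isInteger(max_results) ||
    max_results < 1
  ) {
    return { ok: false, error: "max_results must be a positive integer" };
  }
  return { ok: true, value: { query: query.trim(), max_results } };
}
```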

Rate Limit Protection

Built-in protection against API rate limits
Intelligent request queuing
User feedback for rate limit scenarios

Contributing

Fork the repository
Create a feature branch: git checkout -b feature-name
Make your changes with appropriate tests
Ensure the TypeScript build passes: npm run build
Submit a pull request with a clear description

License

MIT License - see LICENSE file for details.

Support

For issues, questions, or contributions, please refer to the project's issue tracker or documentation.