ajm2004/WikiArxivMCP
WikiArxivMCP - Intelligent Research Platform
A comprehensive Model Context Protocol (MCP) server that provides unified access to Wikipedia and arXiv data through both a standardized MCP interface and a modern web application with AI-powered synthesis capabilities.
Overview
WikiArxivMCP is a full-stack research platform that combines:
- MCP Server: Standards-compliant Model Context Protocol server for programmatic access
- Web Frontend: Modern React-based interface for interactive research
- AI Integration: Gemini AI-powered synthesis for comprehensive research insights
- Unified Search: Simultaneous querying of Wikipedia and arXiv databases
- Caching System: Intelligent caching for improved performance and reduced API calls
Architecture
System Components
┌─────────────────────────────────────────────────────────────────┐
│ WikiArxivMCP Platform │
├─────────────────────────────────────────────────────────────────┤
│ Frontend Layer │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ React App │ │ Vite Server │ │ Express API │ │
│ │ (Port 3002) │ │ Development │ │ (Port 3001) │ │
│ │ │ │ │ │ │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ MCP Layer │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ MCP Server │ │ MCP Client │ │ MCP Tools │ │
│ │ (stdio) │ │ Interface │ │ & Resources │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Service Layer │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Wikipedia │ │ arXiv │ │ HTTP Client │ │
│ │ Service │ │ Service │ │ with Caching │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ External APIs │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Wikipedia API │ │ arXiv API │ │ Gemini AI │ │
│ │ REST Endpoints │ │ XML Feed │ │ API │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Data Flow Architecture
Current Implementation - MCP-Based Search Flow:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ User Input │───▶│ Frontend │───▶│ Express API │
│ (Web Form) │ │ (React) │ │ (/api/search)│
└──────────────┘ └──────────────┘ └──────────────┘
│
▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Search │◀───│ MCP Client │◀───│ API Router │
│ Results │ │ (JSON-RPC) │ │ Layer │
└──────────────┘ └──────────────┘ └──────────────┘
│ │
▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Gemini AI │ │ MCP Server │ │ Fallback │
│ Synthesis │ │ (stdio) │ │ Direct Calls │
└──────────────┘ └──────────────┘ └──────────────┘
│ │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Service │ │ Wikipedia & │
│ Layer │ │ arXiv APIs │
└──────────────┘ └──────────────┘
│
▼
┌──────────────┐
│ Wikipedia & │
│ arXiv APIs │
└──────────────┘
Key Features:
- Primary Path: Web app → Express API → MCP Client → MCP Server → Services → External APIs
- Fallback Path: If MCP fails, automatically falls back to direct service calls
- Robust Error Handling: System continues to work even if MCP layer encounters issues
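The primary/fallback routing described above can be sketched as a small wrapper. This is a minimal illustration, not the project's actual code: `searchWithFallback`, `viaMcp`, and `direct` are hypothetical names, and the real Express handler may distinguish error types or log the failure.

```typescript
// Hypothetical sketch: try the MCP path first, fall back to direct service calls.
type SearchPath = "mcp" | "direct";

interface Routed<T> {
  via: SearchPath; // which path produced the result
  result: T;
}

async function searchWithFallback<T>(
  viaMcp: () => Promise<T>,  // assumed: calls the MCP client
  direct: () => Promise<T>,  // assumed: calls the Wikipedia/arXiv services directly
): Promise<Routed<T>> {
  try {
    return { via: "mcp", result: await viaMcp() };
  } catch {
    // Any MCP failure (process not running, timeout, protocol error)
    // triggers the fallback; errors from the fallback itself propagate.
    return { via: "direct", result: await direct() };
  }
}
```

Returning the path alongside the result makes it easy to surface which route served a request, for example in a health or debug view.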
MCP Protocol Flow
MCP Client Communication:
┌─────────────┐ JSON-RPC ┌─────────────┐
│ MCP Client │ ◀──────────────▶ │ MCP Server │
│ │ over stdio │ │
└─────────────┘ └─────────────┘
│ │
│ tools/call │
│ resources/read │
│ ▼
│ ┌─────────────┐
│ │ Tool │
│ │ Handlers │
│ └─────────────┘
│ │
│ ▼
│ ┌─────────────┐
│ JSON Response │ Service │
└◀───────────────────────│ Layer │
└─────────────┘
Installation and Setup
Prerequisites
- Node.js (version 18 or higher)
- npm or yarn package manager
- Optional: Gemini API key for AI synthesis features
Installation Steps
1. Clone and Install Dependencies
   git clone <repository-url>
   cd WikiArxivMCP
   npm install
2. Build the MCP Server
   npm run build
3. Install Frontend Dependencies
   cd frontend
   npm install
4. Configure Environment Variables
   Create a .env file in the frontend directory:
   # Optional: Gemini AI API Key for synthesis features
   GEMINI_API_KEY=your_gemini_api_key_here
   # Optional: Cache configuration
   CACHE_SIZE=1000
   CACHE_TTL_SECONDS=600
   Get your Gemini API key from: https://makersuite.google.com/app/apikey
Usage
Web Application (Recommended)
1. Start the Backend API Server
   cd frontend
   node server/index.js
   Server will start on http://localhost:3001
2. Start the Frontend Development Server
   # In a new terminal
   cd frontend
   npm run dev
   Application will be available at http://localhost:3002
3. Access the Application
   Open your browser and navigate to http://localhost:3002
MCP Server (Programmatic Access)
For direct MCP client integration:
# Start the MCP server
npm start
The server communicates via stdio using the Model Context Protocol JSON-RPC format.
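To make the wire format concrete, here is a rough sketch of how a client might build and frame a `tools/call` request. The helper names `buildToolCall` and `frame` are illustrative assumptions. A real client must also complete the MCP `initialize` handshake before calling tools, which this sketch omits.

```typescript
// Minimal JSON-RPC 2.0 request shape, as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1; // each request needs a unique id so responses can be matched

// Hypothetical helper: wrap a tool name and its arguments in a tools/call request.
function buildToolCall(name: string, args: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// On the stdio transport each message is a single line of JSON,
// terminated by a newline, written to the server's stdin.
function frame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}
```

The string returned by `frame(buildToolCall("search_wikipedia", { query: "quantum computing" }))` is what would be written to the server process's stdin.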
Features and Capabilities
Search Functionality
Wikipedia Search
- Title-based search with relevance ranking
- Rich metadata including descriptions and thumbnails
- Direct page access and summary retrieval
- Configurable result limits (default: 8 results)
arXiv Paper Search
- Full-text search across paper titles, abstracts, and authors
- Comprehensive metadata including author lists, publication dates
- Direct PDF access links
- Category and classification information
- Configurable result limits (default: 5 results)
AI-Powered Synthesis
When a Gemini API key is configured, the system provides:
- Comprehensive research synthesis combining Wikipedia and arXiv sources
- Source citation and attribution
- Structured, accessible explanations
- Cross-reference linking between sources
Web Interface Features
Search Interface
- Unified search box for both Wikipedia and arXiv
- Advanced filtering options
- Real-time search suggestions
- Search history management
Results Display
- Tabbed interface for Wikipedia and arXiv results
- Rich preview cards with metadata
- Bookmark and sharing functionality
- Export capabilities
AI Synthesis View
- Markdown-formatted comprehensive answers
- Source attribution and linking
- Copy and share functionality
API Reference
MCP Tools
search_wikipedia
Search Wikipedia articles by title.
Request:
{
"method": "tools/call",
"params": {
"name": "search_wikipedia",
"arguments": {
"query": "quantum computing"
}
}
}
Response:
{
"content": [
{
"type": "text",
"text": "[{\"title\": \"Quantum computing\", \"description\": \"Computing using quantum phenomena\", \"url\": \"https://en.wikipedia.org/wiki/Quantum_computing\"}]"
}
]
}
search_arxiv
Search arXiv papers by query.
Request:
{
"method": "tools/call",
"params": {
"name": "search_arxiv",
"arguments": {
"query": "machine learning",
"max_results": 5
}
}
}
Response:
{
"content": [
{
"type": "text",
"text": "[{\"id\": \"2103.12345\", \"title\": \"Advanced ML Techniques\", \"authors\": [\"John Doe\"], \"summary\": \"Paper summary...\", \"pdf\": \"http://arxiv.org/pdf/2103.12345.pdf\"}]"
}
]
}
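Both tools return their results as a JSON string inside a `text` content item, so a client has to parse that string a second time. The sketch below shows one way to do this. The `ArxivPaper` fields mirror the example response above, and `parseToolItems` is a hypothetical helper name.

```typescript
// Shape of a tool result as shown in the examples above.
interface ToolResult {
  content: { type: string; text?: string }[];
}

// Fields taken from the search_arxiv example response.
interface ArxivPaper {
  id: string;
  title: string;
  authors: string[];
  summary: string;
  pdf: string;
}

// Hypothetical helper: collect the JSON arrays embedded in text content items.
// No per-field validation is performed here; callers trust the server's shape.
function parseToolItems<T>(result: ToolResult): T[] {
  const items: T[] = [];
  for (const part of result.content) {
    if (part.type !== "text" || part.text === undefined) continue;
    const parsed: unknown = JSON.parse(part.text);
    if (Array.isArray(parsed)) items.push(...(parsed as T[]));
  }
  return items;
}
```

For production use, the parsed items would normally be checked against a schema before being treated as `ArxivPaper` values.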
MCP Resources
wikipedia:/page/
Retrieve a specific Wikipedia page summary.
URI Pattern: wikipedia:/page/{title}
Example: wikipedia:/page/Artificial Intelligence
arxiv:/id/
Retrieve a specific arXiv paper by ID.
URI Pattern: arxiv:/id/{id}
Example: arxiv:/id/2103.12345
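As an illustration of the two URI patterns, the following sketch splits a resource URI into its source and key. `parseResourceUri` is a hypothetical helper. Whether the server expects percent-encoded titles is not stated here, so the sketch accepts both raw and encoded forms.

```typescript
type ResourceRef =
  | { source: "wikipedia"; title: string }
  | { source: "arxiv"; id: string };

const WIKIPEDIA_PREFIX = "wikipedia:/page/";
const ARXIV_PREFIX = "arxiv:/id/";

// Decode percent-escapes if present; leave the text untouched if it is not
// valid percent-encoding (for example a title containing a bare "%").
function safeDecode(text: string): string {
  try {
    return decodeURIComponent(text);
  } catch {
    return text;
  }
}

// Hypothetical helper: returns null for URIs that match neither pattern.
function parseResourceUri(uri: string): ResourceRef | null {
  if (uri.startsWith(WIKIPEDIA_PREFIX) && uri.length > WIKIPEDIA_PREFIX.length) {
    return { source: "wikipedia", title: safeDecode(uri.slice(WIKIPEDIA_PREFIX.length)) };
  }
  if (uri.startsWith(ARXIV_PREFIX) && uri.length > ARXIV_PREFIX.length) {
    return { source: "arxiv", id: uri.slice(ARXIV_PREFIX.length) };
  }
  return null;
}
```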
REST API Endpoints
POST /api/search
Unified search endpoint for both Wikipedia and arXiv.
Request Body:
{
"query": "cloud computing",
"useGemini": true,
"filters": {
"resultLimit": 10,
"searchType": "all"
}
}
Response:
{
"query": "cloud computing",
"wikipedia": [...],
"arxiv": [...],
"synthesis": "AI-generated comprehensive answer...",
"geminiEnabled": true
}
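A client call to this endpoint might be prepared as follows. This is a sketch under assumptions: `buildSearchRequest` and `PreparedRequest` are hypothetical names, the base URL is the development default from the Usage section, and `"all"` is the only `searchType` value shown in this document.

```typescript
// Request body fields as shown in the example above.
interface SearchRequestBody {
  query: string;
  useGemini: boolean;
  filters: { resultLimit: number; searchType: string };
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: validates the query and assembles the URL and options.
function buildSearchRequest(
  query: string,
  useGemini = false,
  resultLimit = 10,
  baseUrl = "http://localhost:3001", // assumed development address
): PreparedRequest {
  const trimmed = query.trim();
  if (trimmed.length === 0) throw new Error("query must not be empty");
  const body: SearchRequestBody = {
    query: trimmed,
    useGemini,
    filters: { resultLimit, searchType: "all" },
  };
  return {
    url: `${baseUrl}/api/search`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

With Node.js 18 or later, the result can be passed straight to the built-in `fetch`: `const { url, init } = buildSearchRequest("cloud computing", true); const res = await fetch(url, init);`.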
GET /api/health
Health check endpoint.
Response:
{
"status": "ok",
"geminiEnabled": true,
"mcpConnected": true
}
Configuration
Environment Variables
HTTP Configuration:
- HTTP_USER_AGENT: Custom user agent for external API requests
- HTTP_TIMEOUT_MS: Request timeout in milliseconds (default: 10000)
Cache Configuration:
- CACHE_SIZE: Maximum number of cached items (default: 1000)
- CACHE_TTL_SECONDS: Cache time-to-live in seconds (default: 600)
AI Integration:
- GEMINI_API_KEY: Google Gemini API key for synthesis features
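The variables and defaults listed above could be read into a typed configuration object along these lines. `AppConfig`, `positiveInt`, and `loadConfig` are illustrative names rather than the project's actual code, and treating invalid numbers as "use the default" is an assumption.

```typescript
interface AppConfig {
  httpUserAgent: string | undefined; // no default is documented
  httpTimeoutMs: number;
  cacheSize: number;
  cacheTtlSeconds: number;
  geminiApiKey: string | undefined;  // synthesis is disabled when absent
}

// Parse a positive integer, falling back when the value is missing or invalid.
function positiveInt(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

// Takes the environment as a parameter so it can be tested without process.env.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    httpUserAgent: env["HTTP_USER_AGENT"],
    httpTimeoutMs: positiveInt(env["HTTP_TIMEOUT_MS"], 10000),
    cacheSize: positiveInt(env["CACHE_SIZE"], 1000),
    cacheTtlSeconds: positiveInt(env["CACHE_TTL_SECONDS"], 600),
    geminiApiKey: env["GEMINI_API_KEY"] || undefined, // empty string counts as unset
  };
}
```

In the server this would typically be called once at startup as `loadConfig(process.env)`.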
Performance Tuning
Cache Settings
# High-performance configuration
CACHE_SIZE=5000
CACHE_TTL_SECONDS=1800
# Memory-constrained configuration
CACHE_SIZE=100
CACHE_TTL_SECONDS=300
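To show how the two settings interact, here is a minimal LRU cache with per-entry expiry. It is a sketch of the general technique, not the contents of `cache-service.ts`. The class name `TtlLruCache` and the injectable clock are assumptions made for illustration and testing.

```typescript
// Sketch of an LRU cache with TTL. A Map iterates in insertion order,
// so the first key is always the least recently used one.
class TtlLruCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxSize: number;   // corresponds to CACHE_SIZE
  private readonly ttlMs: number;     // corresponds to CACHE_TTL_SECONDS * 1000
  private readonly now: () => number; // injectable clock (assumption, eases testing)

  constructor(maxSize: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest); // evict LRU entry
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A larger `CACHE_SIZE` trades memory for fewer evictions, while a longer `CACHE_TTL_SECONDS` trades freshness for fewer upstream requests.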
Rate Limiting
The system includes built-in rate limiting to prevent API abuse:
- Wikipedia API: 10 requests per second
- arXiv API: 3 requests per second
- Automatic retry with exponential backoff
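One simple way to enforce a requests-per-second limit is to space requests at a fixed interval. The sketch below illustrates that idea. `IntervalLimiter` is a hypothetical class, and the project's actual limiter may use a different strategy such as a token bucket.

```typescript
// Spaces requests evenly: at 10 requests per second, one slot every 100 ms.
class IntervalLimiter {
  private readonly intervalMs: number;
  private nextFreeAt = 0; // timestamp (ms) of the next unreserved slot

  constructor(requestsPerSecond: number) {
    this.intervalMs = 1000 / requestsPerSecond;
  }

  // Books the next slot and returns how many milliseconds the caller
  // should wait before sending. The current time is passed in explicitly.
  reserve(nowMs: number): number {
    const startAt = Math.max(nowMs, this.nextFreeAt);
    this.nextFreeAt = startAt + this.intervalMs;
    return startAt - nowMs;
  }
}

// One limiter per upstream API, using the limits listed above.
const wikipediaLimiter = new IntervalLimiter(10);
const arxivLimiter = new IntervalLimiter(3);
```

A caller would do `const waitMs = arxivLimiter.reserve(Date.now())`, sleep for `waitMs`, and then issue the request.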
Development
Project Structure
WikiArxivMCP/
├── src/ # MCP server source code
│ ├── index.ts # Main MCP server entry point
│ ├── wikipedia-service.ts # Wikipedia API integration
│ ├── arxiv-service.ts # arXiv API integration
│ ├── http-client.ts # HTTP client with caching
│ ├── cache-service.ts # LRU cache implementation
│ └── schemas.ts # TypeScript type definitions
├── frontend/ # Web application
│ ├── src/ # React frontend source
│ ├── server/ # Express.js backend
│ └── public/ # Static assets
├── dist/ # Compiled JavaScript output
├── package.json # Node.js dependencies and scripts
└── tsconfig.json # TypeScript configuration
Development Commands
# Build the MCP server
npm run build
# Development mode with auto-reload
npm run dev
# Type checking
npm run type-check
# Run tests
npm test
Frontend Development
cd frontend
# Install dependencies
npm install
# Start development server
npm run dev
# Build for production
npm run build
# Start production server
npm start
Error Handling
The system includes comprehensive error handling:
Network Errors
- Automatic retry with exponential backoff
- Graceful degradation when external APIs are unavailable
- Timeout handling for slow responses
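Retry with exponential backoff can be sketched as follows. The base delay, cap, and attempt count are illustrative assumptions, not the project's configured values, and `withRetry` retries on any error, whereas a real client would usually retry only transient failures.

```typescript
// Delay doubles with each attempt (0-based) and is capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 200, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: run `op`, retrying with growing delays between attempts.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

With these defaults the waits between attempts are 200 ms, 400 ms, and 800 ms. Adding random jitter to each delay is a common refinement when many clients retry at once.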
Data Validation
- Input validation using Zod schemas
- Type-safe data structures throughout
- Sanitized error messages for security
Rate Limit Protection
- Built-in protection against API rate limits
- Intelligent request queuing
- User feedback for rate limit scenarios
Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes with appropriate tests
- Ensure TypeScript compilation: npm run build
- Submit a pull request with a clear description
License
MIT License - see LICENSE file for details.
Support
For issues, questions, or contributions, please refer to the project's issue tracker or documentation.