Web Research MCP Server
A comprehensive Model Context Protocol (MCP) server that provides internet research capabilities for AI assistants. This server enables AI agents to search the web, scrape websites, and fetch content from URLs.
Features
🔍 Web Search
- Search the web using DuckDuckGo
- Customizable result limits (1-20 results)
- Returns structured results with titles, URLs, and snippets
- Fallback search mechanisms for better reliability
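A structured result could be modeled as follows. This is a hypothetical sketch: the interface name, field names, and formatting function are assumptions for illustration, not taken from the server's source.

```typescript
// Hypothetical shape of a single search result (assumed field names).
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Format results as a numbered, human-readable list.
function formatResults(results: SearchResult[]): string {
  return results
    .map((r, i) => `${i + 1}. ${r.title}\n   ${r.url}\n   ${r.snippet}`)
    .join("\n\n");
}
```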
🌐 Website Scraping
- Extract content from any website
- Smart content extraction using multiple selectors
- Optional link and image extraction
- Converts relative URLs to absolute URLs
- Content length limits for optimal performance
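Relative-to-absolute URL conversion can be done with Node's built-in WHATWG URL class; a minimal sketch (the server's actual implementation may differ):

```typescript
// Resolve a possibly-relative href against the page's base URL.
// Absolute hrefs pass through unchanged.
function toAbsolute(href: string, base: string): string {
  return new URL(href, base).href;
}
```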
📡 URL Fetching
- Fetch raw content from any URL
- Support for text, JSON, and HTML formats
- Useful for APIs, RSS feeds, and structured data
- Returns status codes and headers
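Once the response body is downloaded, the three formats could be handled roughly like this (assumed behavior sketched for illustration, not the server's source):

```typescript
// Interpret a downloaded body according to the requested format.
function parseBody(body: string, format: "text" | "json" | "html"): unknown {
  switch (format) {
    case "json":
      return JSON.parse(body); // throws on malformed JSON
    case "html":
    case "text":
    default:
      return body; // returned as-is; "html" would be parsed downstream
  }
}
```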
Installation
Prerequisites
- Node.js 16 or higher
- npm or yarn
Setup
1. Clone or download this repository
2. Install dependencies:
npm install
3. Build the server:
npm run build
Configuration
Add the server to your MCP client configuration:
For Cline/Claude Dev
Add to your MCP settings file:
{
  "mcpServers": {
    "web-research-server": {
      "disabled": false,
      "autoApprove": [],
      "command": "node",
      "args": ["/path/to/web-research-server/build/index.js"]
    }
  }
}
For Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
  "mcpServers": {
    "web-research-server": {
      "command": "node",
      "args": ["/path/to/web-research-server/build/index.js"]
    }
  }
}
Usage
Once configured, you can use the following tools:
Web Search
Search for "latest AI developments 2024" with 5 results
Website Scraping
Scrape the content from https://example.com and extract all links
URL Fetching
Fetch the JSON data from https://api.example.com/data
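Under the hood, the MCP client translates such a request into a JSON-RPC tools/call message. A web_search invocation would look roughly like this (request shape per the MCP specification; the values are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "web_search",
    "arguments": {
      "query": "latest AI developments 2024",
      "limit": 5
    }
  }
}
```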
Tools Reference
web_search
Search the web using DuckDuckGo.
Parameters:
- query (string, required): Search query
- limit (number, optional): Maximum results (1-20, default: 10)
scrape_website
Extract content from a website.
Parameters:
- url (string, required): Website URL to scrape
- extract_links (boolean, optional): Extract all links (default: false)
- extract_images (boolean, optional): Extract all images (default: false)
fetch_url
Fetch raw content from a URL.
Parameters:
- url (string, required): URL to fetch
- format (string, optional): Expected format - "text", "json", or "html" (default: "text")
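The documented defaults and ranges suggest normalization logic along these lines (an assumed sketch matching the 1-20 range and default of 10; the helper name is hypothetical):

```typescript
// Clamp the requested result limit to the documented 1-20 range,
// falling back to the default of 10 when it is absent or invalid.
function normalizeLimit(limit?: number): number {
  if (limit === undefined || Number.isNaN(limit)) return 10;
  return Math.min(20, Math.max(1, Math.floor(limit)));
}
```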
Development
Building
npm run build
Watching for changes
npm run watch
Technical Details
- Built with TypeScript and the MCP SDK
- Uses axios for HTTP requests
- Uses cheerio for HTML parsing
- Implements proper error handling and timeouts
- Includes content length limits for performance
- Supports graceful shutdown
Security Features
- Request timeouts (30 seconds)
- Content length limits
- URL validation
- Error handling for malformed requests
- No automatic approval for security
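URL validation can be as simple as parsing with the WHATWG URL class and allowing only web schemes; a minimal sketch under that assumption (not the server's exact logic):

```typescript
// Accept only well-formed http/https URLs; anything else is rejected.
function isValidUrl(input: string): boolean {
  try {
    const u = new URL(input);
    return u.protocol === "http:" || u.protocol === "https:";
  } catch {
    return false; // new URL() throws on malformed input
  }
}
```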
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
If you encounter any issues or have questions, please open an issue on the GitHub repository.