Web Research MCP Server
A comprehensive Model Context Protocol (MCP) server that provides internet research capabilities for AI assistants. This server enables AI agents to search the web, scrape websites, and fetch content from URLs.
Features
🔍 Web Search
- Search the web using DuckDuckGo
- Customizable result limits (1-20 results)
- Returns structured results with titles, URLs, and snippets
- Fallback search mechanisms for better reliability
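A structured result could be modeled as follows. This is a hypothetical sketch: the interface name, field names, and formatting function are assumptions for illustration, not taken from the server's source.

```typescript
// Hypothetical shape of a single search result (assumed field names).
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Format results as a numbered, human-readable list.
function formatResults(results: SearchResult[]): string {
  return results
    .map((r, i) => `${i + 1}. ${r.title}\n   ${r.url}\n   ${r.snippet}`)
    .join("\n\n");
}
```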
🌐 Website Scraping
- Extract content from any website
- Smart content extraction using multiple selectors
- Optional link and image extraction
- Converts relative URLs to absolute URLs
- Content length limits for optimal performance
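Relative-to-absolute URL conversion can be done with Node's built-in WHATWG URL class; a minimal sketch (the server's actual implementation may differ):

```typescript
// Resolve a possibly-relative href against the page's base URL.
// Absolute hrefs pass through unchanged.
function toAbsolute(href: string, base: string): string {
  return new URL(href, base).href;
}
```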
📡 URL Fetching
- Fetch raw content from any URL
- Support for text, JSON, and HTML formats
- Useful for APIs, RSS feeds, and structured data
- Returns status codes and headers
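Once the response body is downloaded, the three formats could be handled roughly like this (assumed behavior sketched for illustration, not the server's source):

```typescript
// Interpret a downloaded body according to the requested format.
function parseBody(body: string, format: "text" | "json" | "html"): unknown {
  switch (format) {
    case "json":
      return JSON.parse(body); // throws on malformed JSON
    case "html":
    case "text":
    default:
      return body; // returned as-is; "html" would be parsed downstream
  }
}
```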
Installation
Prerequisites
- Node.js 16 or higher
- npm or yarn
Setup
1. Clone or download this repository
2. Install dependencies:
npm install
3. Build the server:
npm run build
Configuration
Add the server to your MCP client configuration:
For Cline/Claude Dev
Add to your MCP settings file:
{
  "mcpServers": {
    "web-research-server": {
      "disabled": false,
      "autoApprove": [],
      "command": "node",
      "args": ["/path/to/web-research-server/build/index.js"]
    }
  }
}
For Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
  "mcpServers": {
    "web-research-server": {
      "command": "node",
      "args": ["/path/to/web-research-server/build/index.js"]
    }
  }
}
Usage
Once configured, you can use the following tools:
Web Search
Search for "latest AI developments 2024" with 5 results
Website Scraping
Scrape the content from https://example.com and extract all links
URL Fetching
Fetch the JSON data from https://api.example.com/data
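Under the hood, the MCP client translates such a request into a JSON-RPC tools/call message. A web_search invocation would look roughly like this (request shape per the MCP specification; the values are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "web_search",
    "arguments": {
      "query": "latest AI developments 2024",
      "limit": 5
    }
  }
}
```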
Tools Reference
web_search
Search the web using DuckDuckGo.
Parameters:
- query (string, required): Search query
- limit (number, optional): Maximum results (1-20, default: 10)
scrape_website
Extract content from a website.
Parameters:
- url (string, required): Website URL to scrape
- extract_links (boolean, optional): Extract all links (default: false)
- extract_images (boolean, optional): Extract all images (default: false)
fetch_url
Fetch raw content from a URL.
Parameters:
- url (string, required): URL to fetch
- format (string, optional): Expected format - "text", "json", or "html" (default: "text")
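The documented defaults and ranges suggest normalization logic along these lines (an assumed sketch matching the 1-20 range and default of 10; the helper name is hypothetical):

```typescript
// Clamp the requested result limit to the documented 1-20 range,
// falling back to the default of 10 when it is absent or invalid.
function normalizeLimit(limit?: number): number {
  if (limit === undefined || Number.isNaN(limit)) return 10;
  return Math.min(20, Math.max(1, Math.floor(limit)));
}
```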
Development
Building
npm run build
Watching for changes
npm run watch
Technical Details
- Built with TypeScript and the MCP SDK
- Uses axios for HTTP requests
- Uses cheerio for HTML parsing
- Implements proper error handling and timeouts
- Includes content length limits for performance
- Supports graceful shutdown
Security Features
- Request timeouts (30 seconds)
- Content length limits
- URL validation
- Error handling for malformed requests
- No automatic approval for security
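URL validation can be as simple as parsing with the WHATWG URL class and allowing only web schemes; a minimal sketch under that assumption (not the server's exact logic):

```typescript
// Accept only well-formed http/https URLs; anything else is rejected.
function isValidUrl(input: string): boolean {
  try {
    const u = new URL(input);
    return u.protocol === "http:" || u.protocol === "https:";
  } catch {
    return false; // new URL() throws on malformed input
  }
}
```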
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
If you encounter any issues or have questions, please open an issue on the GitHub repository.