Web Crawler MCP Server

A Model Context Protocol (MCP) server that provides a web crawling and content extraction tool for AI assistants such as Claude Desktop, Cursor, and other MCP-compatible clients.

Features

Extracts and cleans main text content from any public web page.
Uses Puppeteer with a stealth plugin to reduce blocking by anti-bot protections.
Returns readable, whitespace-normalized text for LLM consumption.
Easy integration with Claude Desktop and other MCP clients.
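As an illustration of what "whitespace-normalized" can mean in practice, here is a minimal sketch of a normalization step. The function name `normalizeWhitespace` and its exact rules (collapse runs of spaces, keep at most one blank line) are assumptions made for this example, not the server's actual implementation; check src/index.ts for the real behavior.

```typescript
// Hypothetical helper: the name and rules are assumptions for illustration,
// not code taken from this server.
function normalizeWhitespace(raw: string): string {
  return raw
    .replace(/\u00a0/g, " ")        // non-breaking spaces become plain spaces
    .replace(/[ \t\r\f\v]+/g, " ")  // collapse runs of horizontal whitespace
    .replace(/ ?\n ?/g, "\n")       // drop spaces around line breaks
    .replace(/\n{3,}/g, "\n\n")     // keep at most one blank line in a row
    .trim();                        // strip leading and trailing whitespace
}

const messy = "  Hello \t world \n\n\n\n Next\u00a0line  ";
console.log(JSON.stringify(normalizeWhitespace(messy)));
```

Keeping a single blank line between paragraphs, rather than flattening everything onto one line, is a design choice here: it preserves paragraph boundaries that can help an LLM follow the page structure.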

Prerequisites

Node.js (v16 or higher)
MCP-compatible client (e.g., Claude Desktop, Cursor)
(Optional) Puppeteer's system library dependencies, which some Linux environments need installed separately

Installation

Install dependencies:
```
npm install
```
Build the server:
```
npm run build
```

Usage

You can run the server directly:

```
node build/index.js
```

Or configure it as an MCP server in your client (e.g., Claude Desktop):

```
{
  "mcpServers": {
    "web-crawler-mcp": {
      "command": "node",
      "args": ["<absolute-path-to>/server/web_crawler/build/index.js"]
    }
  }
}
```

Available Tool

web-crawler

Description: Extracts and returns the cleaned text content from a specified URL.
Input:
- url (string, required): The URL to extract content from.
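The input contract above can be checked before any browser is launched. The sketch below shows one way to do that; `parseWebCrawlerArgs` and `WebCrawlerArgs` are hypothetical names, and the restriction to http/https is an assumption based on the tool targeting public web pages, not something the server is documented to enforce.

```typescript
// Hypothetical validation helper: names and the http/https restriction are
// assumptions for illustration, not the server's documented behavior.
interface WebCrawlerArgs {
  url: string;
}

function parseWebCrawlerArgs(input: unknown): WebCrawlerArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("arguments must be an object");
  }
  const { url } = input as { url?: unknown };
  if (typeof url !== "string" || url.trim() === "") {
    throw new Error("url is required and must be a non-empty string");
  }

  let parsed: URL;
  try {
    parsed = new URL(url); // throws on relative or malformed URLs
  } catch {
    throw new Error(`url is not a valid absolute URL: ${url}`);
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }
  return { url: parsed.href };
}

console.log(parseWebCrawlerArgs({ url: "https://openai.com/news" }));
```

Rejecting bad input early gives the client a clear error message instead of a browser navigation failure.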

Example

```
{
  "tool_name": "web-crawler",
  "arguments": {
    "url": "https://openai.com/news"
  }
}
```
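The example above is written in a client-neutral shorthand. MCP clients normally send tool invocations as JSON-RPC 2.0 `tools/call` requests, with the tool name and arguments under `params`. The sketch below builds such a request for this tool; `buildToolCall` and `ToolCallRequest` are hypothetical names introduced for illustration, and the request `id` is arbitrary.

```typescript
// Hypothetical builder for an MCP tools/call request. The helper and type
// names are illustrative assumptions; the envelope follows JSON-RPC 2.0.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = buildToolCall(1, "web-crawler", {
  url: "https://openai.com/news",
});
console.log(JSON.stringify(request, null, 2));
```

In most setups the client (for example Claude Desktop) constructs this message itself, so a sketch like this is mainly useful when testing the server by hand.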

Development

npm run build — Compile TypeScript to JavaScript.
npm run watch — Watch and rebuild on changes.
npm run inspector — Launch MCP Inspector for debugging.

Notes

The server launches a visible, non-headless browser instance (headless: false) for best compatibility, so it needs an environment that can display a browser window.
Output is plain text, suitable for LLM input.
For advanced parsing, modify the Cheerio logic in src/index.ts.

License

MIT
