guangxiangdebizi/cleanweb-mcp
CleanWeb MCP is a lightweight Model Context Protocol server designed to extract and clean web content, converting it into a clean Markdown format.
CleanWeb MCP
A lightweight Model Context Protocol (MCP) server
Specialized in intelligently extracting core web content, automatically filtering ads and irrelevant elements, and converting to clean Markdown format
Features
Smart Extraction | Content Cleaning | Format Conversion | Lightweight Deploy |
---|---|---|---|
Axios + Cheerio + Readability | Auto-filter ads & distractions | HTML → Markdown | Zero browser dependency |
Core Advantages
- Smart Content Extraction: Uses Axios + Cheerio + Readability algorithm to extract main web content
- Intelligent Content Cleaning: Automatically removes ads, navigation, sidebars and other distracting elements
- Markdown Conversion: Converts HTML content to clean Markdown format
- Image Link Optimization: Automatically handles overly long image links for better readability
- Lightweight Deployment: No browser dependencies, simple and fast deployment
- Multiple Output Formats: Supports pure Markdown or JSON format with metadata
- MCP Protocol: Fully compatible with Model Context Protocol standard
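The README does not say how overly long image links are handled. One plausible reading, sketched here purely as an assumption, is that inline `data:` URIs are replaced with a text label and long ordinary URLs lose their query string. The function name `shortenImageLinks`, the 100-character limit, and both rules are hypothetical and not taken from the cleanweb-mcp source.

```typescript
// Hypothetical sketch: shorten overly long image links in Markdown output.
// The function name, the 100-character default, and both shortening rules
// are assumptions for illustration, not taken from the cleanweb-mcp source.
function shortenImageLinks(markdown: string, maxLength = 100): string {
  // Matches Markdown images of the form ![alt](url) without a title part.
  const imagePattern = /!\[([^\]]*)\]\(([^)\s]+)\)/g;

  return markdown.replace(imagePattern, (match: string, alt: string, url: string) => {
    if (url.length <= maxLength) {
      return match; // Short links are left untouched.
    }
    if (url.startsWith("data:")) {
      // An inline data URI has no useful address, so keep only a text label.
      return `[image: ${alt || "untitled"}]`;
    }
    // For ordinary URLs, drop the query string and fragment.
    const base = url.split(/[?#]/)[0];
    return `![${alt}](${base})`;
  });
}

const longDataUri = "data:image/png;base64," + "A".repeat(200);
console.log(shortenImageLinks(`Intro ![logo](${longDataUri}) outro`));
// prints "Intro [image: logo] outro"
```

Dropping the query string can break images served from signed URLs, so a real implementation may prefer to keep the link intact and shorten only what is displayed.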
Quick Start
Installation
# Install from npm
npm install cleanweb-mcp
# Or clone the repository
git clone https://github.com/guangxiangdebizi/cleanweb-mcp.git
cd cleanweb-mcp
npm install
Advantage: CleanWeb MCP uses a lightweight HTTP client, so there is no browser to download and deployment is simpler. The project focuses on content cleaning and optimization.
Build Project
npm run build
Usage
1. Stdio Mode (Local Development)
npm run mcp:stdio
2. SSE Mode (via Supergateway)
npm run mcp:sse
Server will start at http://localhost:3100/sse
3. WebSocket Mode
npm run mcp:ws
4. Development Mode (Watch file changes)
npm run mcp:dev
Claude Configuration
Stdio Mode Configuration
Add to Claude's configuration file:
{
  "mcpServers": {
    "cleanweb-mcp": {
      "command": "node",
      "args": ["path/to/your/project/build/index.js"]
    }
  }
}
SSE Mode Configuration
{
  "mcpServers": {
    "cleanweb-mcp-sse": {
      "type": "sse",
      "url": "http://localhost:3100/sse",
      "timeout": 600
    }
  }
}
API Reference
extract_web_content
Intelligently extract web content and convert to Markdown format.
Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
url | string | Yes | - | The web URL to extract content from |
format | string | No | markdown | Return format: markdown or json |
timeout | number | No | 30000 | Page loading timeout (milliseconds) |
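The parameter table above can be read as a typed argument object. As a minimal sketch, applying the documented defaults and some basic validation might look like the following. The names `ExtractArgs` and `normalizeArgs` are hypothetical, and the http(s) check and positive-timeout check are assumptions about sensible validation, not confirmed behavior of the server.

```typescript
// Hypothetical typing of the extract_web_content arguments.
// Only the parameter names, types, and defaults come from the table above;
// the validation rules are assumptions for illustration.
type OutputFormat = "markdown" | "json";

interface ExtractArgs {
  url: string;            // required
  format?: OutputFormat;  // default: "markdown"
  timeout?: number;       // default: 30000 (milliseconds)
}

function normalizeArgs(args: ExtractArgs): Required<ExtractArgs> {
  // Assumed rule: only http and https URLs are accepted.
  if (!/^https?:\/\//i.test(args.url)) {
    throw new Error(`url must start with http:// or https://, got: ${args.url}`);
  }

  // Tool arguments arrive as untyped JSON, so the format is re-checked at runtime.
  const format: OutputFormat = args.format ?? "markdown";
  if (format !== "markdown" && format !== "json") {
    throw new Error(`format must be "markdown" or "json", got: ${String(format)}`);
  }

  const timeout = args.timeout ?? 30000;
  if (!Number.isFinite(timeout) || timeout <= 0) {
    throw new Error(`timeout must be a positive number of milliseconds, got: ${timeout}`);
  }

  return { url: args.url, format, timeout };
}

const normalized = normalizeArgs({ url: "https://example.com/article" });
console.log(normalized.format, normalized.timeout);
// prints "markdown 30000"
```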
Usage Examples
// Basic usage
extract_web_content({
  url: "https://example.com/article"
})
// Advanced usage
extract_web_content({
  url: "https://example.com/article",
  format: "json",
  timeout: 60000
})
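The calls above are shorthand. On the wire, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`, passing the tool name and its arguments in `params`. The sketch below builds such a request for the advanced example; the helper name `buildToolCall` and the request `id` value are illustrative assumptions, and in practice an MCP client library constructs this message for you.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Hypothetical helper: wrap a tool name and its arguments in a request object.
function buildToolCall(
  id: number,
  toolName: string,
  toolArgs: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: toolArgs },
  };
}

const toolCall = buildToolCall(1, "extract_web_content", {
  url: "https://example.com/article",
  format: "json",
  timeout: 60000,
});

console.log(JSON.stringify(toolCall, null, 2));
```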
Project Structure
cleanweb-mcp/
├── README.md                      # Project documentation
├── package.json                   # Project configuration
├── tsconfig.json                  # TypeScript configuration
├── claude-config-example.json     # Claude configuration example
├── example-usage.md               # Usage examples
├── build/                         # Compiled output
│   ├── index.js
│   └── tools/
│       └── web-content-extractor.js
└── src/                           # Source code
    ├── index.ts                   # MCP server main entry
    └── tools/
        └── web-content-extractor.ts   # Web content extraction tool
Migration from Express Server
The original Express server (server.js) can still run independently:
npm start
The MCP version provides the same core functionality but integrates with AI assistants through the MCP protocol.
Important Notes
- Lightweight Implementation: Uses an HTTP client to fetch static content; no browser dependencies are required
- Network Access: Requires network access to the target websites
- Static Content: Primarily suitable for static HTML content. Content rendered dynamically by client-side JavaScript may not be accessible
- Timeout Settings: For slow-loading websites, increase the timeout parameter
- Content Optimization: Automatically optimizes image link display for better readability
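For the timeout note, one way a caller might "increase the timeout" systematically is to retry with progressively larger values. The helper below is a hypothetical caller-side sketch, not part of cleanweb-mcp: the name `timeoutSchedule`, the doubling rule, and the 120000 ms cap are all assumptions. Only the 30000 ms starting value comes from the documented default.

```typescript
// Hypothetical helper: produce increasing timeout values (in milliseconds)
// to try on successive attempts. Doubling and the cap are assumed policy.
function timeoutSchedule(base = 30000, attempts = 3, cap = 120000): number[] {
  const schedule: number[] = [];
  for (let i = 0; i < attempts; i++) {
    // Double the timeout on each attempt, but never exceed the cap.
    schedule.push(Math.min(base * 2 ** i, cap));
  }
  return schedule;
}

console.log(timeoutSchedule());
// prints [ 30000, 60000, 120000 ]
```

Each value would be passed as the `timeout` argument of `extract_web_content` on successive attempts, stopping at the first call that succeeds.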
Contributing
Issues and pull requests are welcome. If you have any questions or suggestions, feel free to contact me.
Contact
- GitHub: guangxiangdebizi
- Email: guangxiangdebizi@gmail.com
- LinkedIn: Xingyu Chen
- NPM: @xingyuchen
Related Links
- GitHub Repository: https://github.com/guangxiangdebizi/cleanweb-mcp
- NPM Package: https://www.npmjs.com/package/cleanweb-mcp
License
MIT License - See file for details