LLMTooling/cloudscraper-mcp-server
CloudScraper MCP Server is a Model Context Protocol server designed to enable AI agents to bypass Cloudflare protection and efficiently scrape web content.
CloudScraper MCP Server
A Model Context Protocol server that enables AI agents to bypass Cloudflare protection and scrape web content
Core Features
| Feature | Description |
|---|---|
| Cloudflare Bypass | Automatically handles Cloudflare protection using the cloudscraper library |
| Multiple Transports | Supports both stdio and HTTP transport protocols |
| Content Cleaning | Converts HTML to clean, LLM-friendly Markdown format |
| Smart Chunking | Automatically splits large responses into chunks of at most 10,000 tokens |
| Docker Support | Production-ready containerized deployment |
| Multiple Methods | Supports GET and POST HTTP methods |
| Binary Handling | Base64 encoding for non-text content |
| File Export | Save scraped content directly to disk |
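The Binary Handling row can be illustrated with a short sketch. The helper name `encode_body`, the `encoding` key, and the rule for deciding what counts as text are assumptions made for illustration; the server's actual logic may differ.

```python
import base64


def encode_body(body: bytes, content_type: str) -> dict:
    """Return text bodies as str and everything else as Base64 (hypothetical helper)."""
    # Drop parameters such as "; charset=utf-8" before comparing MIME types.
    mime = content_type.split(";")[0].strip().lower()
    # Assumption: text/* plus common JSON/XML types are treated as text.
    is_text = mime.startswith("text/") or mime in {"application/json", "application/xml"}
    if is_text:
        return {"content": body.decode("utf-8", errors="replace"), "encoding": "text"}
    # Non-text payloads are Base64-encoded so they survive a JSON response.
    return {"content": base64.b64encode(body).decode("ascii"), "encoding": "base64"}
```

A caller can reverse the binary case with `base64.b64decode(result["content"])`.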
Available MCP Tools
Tool Comparison
| Tool | Return Type | Use Case | Chunking Support | File Output |
|---|---|---|---|---|
| scrape_url | String (content only) | Quick content retrieval for AI processing | Yes | No |
| scrape_url_raw | Dictionary (metadata + content) | Full response details with headers and timing | Yes | No |
| scrape_url_to_file | Dictionary (save confirmation) | Export content to workspace files | No | Yes |
Shared Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Target URL to scrape |
| method | string | No | "GET" | HTTP method (GET or POST) |
| clean_content | boolean | No | true | Convert HTML to Markdown |
| continuation_token | string | No | null | Token for retrieving next chunk |
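As a rough illustration of how these parameters travel over the wire, the sketch below builds an MCP `tools/call` request as a JSON-RPC 2.0 message. The helper name `build_tool_call` and the example URL are assumptions; in practice an MCP client library normally constructs this message for you.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Only `url` is required; omitted parameters fall back to the defaults in the table.
request = build_tool_call(
    1, "scrape_url", {"url": "https://example.com", "clean_content": True}
)
print(request)
```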
scrape_url Response Fields
| Field | Type | Description |
|---|---|---|
| Response | string | Page content with chunk instructions if applicable |
Note: When content exceeds 10,000 tokens, the response includes continuation instructions embedded in the text.
scrape_url_raw Response Fields
| Field | Type | Always Present | Description |
|---|---|---|---|
| status_code | integer | Yes | HTTP response status code |
| headers | object | Yes | Response headers (hop-by-hop headers removed) |
| content | string | Yes | Page content or current chunk |
| content_type | string | Yes | MIME type of response |
| response_time | number | Yes | Request duration in seconds |
| chunked | boolean | When chunked | Indicates response was split |
| chunk_index | integer | When chunked | Current chunk number (1-based) |
| total_chunks | integer | When chunked | Total number of chunks |
| continuation_token | string | When more chunks | Token for next chunk retrieval |
| total_tokens | integer | When chunked | Total tokens in full response |
| message | string | When chunked | Human-readable chunk status |
| error | string | On failure | Error description |
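The chunk fields above suggest a simple client loop: keep calling `scrape_url_raw` with the returned `continuation_token` until none is returned. The following is a minimal sketch under that assumption. `call_tool` is a hypothetical callable standing in for whatever MCP client you use, and `fake_call_tool` exists only to make the example runnable.

```python
from typing import Callable


def fetch_all_chunks(call_tool: Callable[[str, dict], dict], url: str) -> str:
    """Collect every chunk of a page by following continuation_token."""
    parts = []
    arguments = {"url": url}
    while True:
        result = call_tool("scrape_url_raw", arguments)
        if "error" in result:
            raise RuntimeError(result["error"])
        parts.append(result["content"])
        token = result.get("continuation_token")
        if not token:
            break  # No token means this was the last (or only) chunk.
        arguments = {"url": url, "continuation_token": token}
    return "".join(parts)


def fake_call_tool(name: str, arguments: dict) -> dict:
    """Stand-in for a real MCP client call; serves two hard-coded chunks."""
    if arguments.get("continuation_token") == "abc:1":
        return {"status_code": 200, "content": "world", "chunked": True}
    return {
        "status_code": 200,
        "content": "hello ",
        "chunked": True,
        "continuation_token": "abc:1",
    }


print(fetch_all_chunks(fake_call_tool, "https://example.com"))
```

Because cached chunks expire, a real client should request the remaining chunks promptly rather than holding a token for later.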
scrape_url_to_file Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Target URL to scrape |
| file_path | string | Yes | - | Path where content should be saved |
| method | string | No | "GET" | HTTP method (GET or POST) |
| clean_content | boolean | No | false | Convert HTML to Markdown before saving |
| overwrite | boolean | No | false | Replace file if it exists |
scrape_url_to_file Response Fields
| Field | Type | Always Present | Description |
|---|---|---|---|
| status_code | integer | Yes | HTTP response status code |
| headers | object | Yes | Response headers (hop-by-hop headers removed) |
| content_type | string | Yes | MIME type of saved content |
| response_time | number | Yes | Request duration in seconds |
| file_path | string | On success | Absolute path to saved file |
| bytes_written | integer | On success | Number of bytes written to disk |
| message | string | On success | Confirmation message |
| error | string | On failure | Error description |
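The `overwrite` flag and the success fields can be pictured with a small sketch of the save step. The function name `save_content`, the message wording, and the creation of missing parent directories are assumptions for illustration rather than the server's confirmed behavior.

```python
from pathlib import Path


def save_content(file_path: str, content: bytes, overwrite: bool = False) -> dict:
    """Write content to disk, refusing to replace an existing file unless asked."""
    target = Path(file_path).resolve()  # Report an absolute path, as in the table.
    if target.exists() and not overwrite:
        return {"error": f"File already exists: {target}"}
    # Assumption: missing parent directories are created on demand.
    target.parent.mkdir(parents=True, exist_ok=True)
    bytes_written = target.write_bytes(content)
    return {
        "file_path": str(target),
        "bytes_written": bytes_written,
        "message": f"Saved {bytes_written} bytes to {target}",
    }
```

Returning an `error` entry instead of raising mirrors the response table, where `error` appears only on failure.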
Installation
Prerequisites
| Requirement | Version | Purpose |
|---|---|---|
| Python | 3.10+ | Runtime environment |
| uv | Latest | Dependency management |
| Git | Any | Repository cloning |
Setup Steps
Clone the repository and install dependencies:
git clone https://github.com/LLMTooling/cloudscraper-mcp-server.git
cd cloudscraper-mcp-server
uv sync
Configuration
Transport Protocols
| Transport | Best For | Configuration |
|---|---|---|
| stdio | Claude Code, VSCode, Direct AI integration | Default mode, no environment variables needed |
| http | n8n, Web apps, API integrations, Remote access | Requires MCP_TRANSPORT=http |
Environment Variables
| Variable | Default | Options | Description |
|---|---|---|---|
| MCP_TRANSPORT | stdio | stdio, http | Transport protocol selection |
| MCP_HOST | 0.0.0.0 | Any valid IP | Host binding for HTTP mode |
| MCP_PORT | 8000 | Any valid port | Port for HTTP mode |
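A minimal sketch of how these variables and their defaults might be read at startup. The `TransportConfig` class and `load_config` function are hypothetical names, and the validation rules are assumptions; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TransportConfig:
    transport: str
    host: str
    port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> TransportConfig:
    """Read MCP_TRANSPORT, MCP_HOST and MCP_PORT, applying the documented defaults."""
    env = os.environ if env is None else env
    transport = env.get("MCP_TRANSPORT", "stdio").lower()
    if transport not in {"stdio", "http"}:
        raise ValueError(f"Unsupported MCP_TRANSPORT: {transport!r}")
    port = int(env.get("MCP_PORT", "8000"))
    if not 0 < port < 65536:
        raise ValueError(f"MCP_PORT out of range: {port}")
    return TransportConfig(transport, env.get("MCP_HOST", "0.0.0.0"), port)


print(load_config({"MCP_TRANSPORT": "http", "MCP_PORT": "9000"}))
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test.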
Usage Examples
Running with Stdio Transport (Default)
uv run server.py
Running with HTTP Transport
MCP_TRANSPORT=http MCP_HOST=0.0.0.0 MCP_PORT=8000 uv run server.py
Claude Code Integration
claude mcp add --transport stdio cloudscraper-mcp -- \
uv run --directory "/path/to/cloudscraper-mcp-server" server.py
VSCode/IDE Configuration
{
  "mcpServers": {
    "cloudscraper-mcp": {
      "type": "stdio",
      "command": "uv",
      "args": [
        "run",
        "server.py"
      ],
      "cwd": "/path/to/cloudscraper-mcp-server"
    }
  }
}
Docker Deployment
For containerized deployment instructions, see
Technical Stack
| Component | Technology | Purpose |
|---|---|---|
| Protocol | FastMCP 2.0+ | Model Context Protocol implementation |
| Scraping | cloudscraper 1.2.71+ | Cloudflare bypass engine |
| Compression | brotli 1.0.9+ | Response decompression |
| Parsing | beautifulsoup4 4.10.0+ | HTML parsing |
| Conversion | markdownify 0.11.6+ | HTML to Markdown transformation |
| Tokenization | tiktoken 0.5.0+ | Token counting for chunking |
Advanced Features
Response Chunking System
| Feature | Value | Description |
|---|---|---|
| Max Tokens Per Chunk | 10,000 | Maximum tokens in a single response |
| Chunk Expiry | 2 minutes | Cache lifetime for chunk retrieval |
| Token Encoding | cl100k_base | tiktoken encoding model |
| Continuation Pattern | chunk_id:index | Token format for sequential retrieval |
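The values in this table can be tied together in a small sketch of a chunk cache. All function names here are hypothetical. Whitespace-separated words stand in for tokens so the example runs on the standard library alone, whereas the table names tiktoken's `cl100k_base` encoding. Treating the index in `chunk_id:index` as the zero-based position of the next chunk is also an assumption.

```python
import time
import uuid

MAX_TOKENS_PER_CHUNK = 10_000
CHUNK_TTL_SECONDS = 120  # 2 minutes

_cache = {}  # chunk_id -> (expires_at, list of chunk strings)


def split_into_chunks(text, max_tokens=MAX_TOKENS_PER_CHUNK):
    """Split text into pieces of at most max_tokens tokens (words stand in for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def store(text, max_tokens=MAX_TOKENS_PER_CHUNK, now=None):
    """Cache all chunks; return the first chunk and a continuation token (or None)."""
    now = time.time() if now is None else now
    chunks = split_into_chunks(text, max_tokens)
    chunk_id = uuid.uuid4().hex
    _cache[chunk_id] = (now + CHUNK_TTL_SECONDS, chunks)
    token = f"{chunk_id}:1" if len(chunks) > 1 else None
    return (chunks[0] if chunks else ""), token


def fetch(token, now=None):
    """Resolve a chunk_id:index token to (chunk, next token or None)."""
    now = time.time() if now is None else now
    chunk_id, _, index_text = token.partition(":")
    index = int(index_text)
    expires_at, chunks = _cache.get(chunk_id, (0.0, []))
    if now >= expires_at:
        _cache.pop(chunk_id, None)  # Drop expired entries lazily.
        raise KeyError("chunk expired or unknown")
    if not 0 <= index < len(chunks):
        raise KeyError("chunk index out of range")
    next_token = f"{chunk_id}:{index + 1}" if index + 1 < len(chunks) else None
    return chunks[index], next_token
```

The optional `now` argument exists only so expiry can be exercised without waiting two minutes.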
Security Headers
| Header | Value | Purpose |
|---|---|---|
| User-Agent | Chrome 120 | Browser impersonation |
| Sec-Ch-Ua | Chrome/Chromium | Client hints |
| Sec-Fetch-* | cors/same-origin | Fetch metadata |
| Origin/Referer | Auto-generated | Request legitimacy |
Made with CloudScraper and FastMCP