cloudscraper-mcp-server

LLMTooling/cloudscraper-mcp-server

CloudScraper MCP Server is a Model Context Protocol server designed to enable AI agents to bypass Cloudflare protection and efficiently scrape web content.

CloudScraper MCP Server

A Model Context Protocol server that enables AI agents to bypass Cloudflare protection and scrape web content


Core Features

| Feature | Description |
| --- | --- |
| Cloudflare Bypass | Automatically handles Cloudflare protection using the cloudscraper library |
| Multiple Transports | Supports both stdio and HTTP transport protocols |
| Content Cleaning | Converts HTML to clean, LLM-friendly Markdown |
| Smart Chunking | Automatically splits large responses into 10,000-token chunks |
| Docker Support | Production-ready containerized deployment |
| Multiple Methods | Supports GET and POST HTTP methods |
| Binary Handling | Base64 encoding for non-text content |
| File Export | Saves scraped content directly to disk |
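
The Binary Handling row can be illustrated with a minimal sketch. The helper name `encode_body`, the `encoding` key, and the rule for deciding what counts as text are assumptions made for illustration, not the server's actual code.

```python
import base64


def encode_body(body: bytes, content_type: str) -> dict:
    """Return text bodies as strings and everything else as Base64 (hypothetical helper)."""
    # Assumption: these MIME types are treated as text; the server's own rule may differ.
    mime = content_type.split(";")[0].strip().lower()
    textual = mime.startswith("text/") or mime in {"application/json", "application/xml"}
    if textual:
        return {"content": body.decode("utf-8", errors="replace"), "encoding": "text"}
    # Base64 keeps arbitrary bytes safe inside a JSON string.
    return {"content": base64.b64encode(body).decode("ascii"), "encoding": "base64"}
```

A client that receives Base64 content can recover the original bytes with `base64.b64decode`.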

Available MCP Tools

Tool Comparison

| Tool | Return Type | Use Case | Chunking Support | File Output |
| --- | --- | --- | --- | --- |
| scrape_url | String (content only) | Quick content retrieval for AI processing | Yes | No |
| scrape_url_raw | Dictionary (metadata + content) | Full response details with headers and timing | Yes | No |
| scrape_url_to_file | Dictionary (save confirmation) | Export content to workspace files | No | Yes |

Shared Parameters

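| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | - | Target URL to scrape |
| method | string | No | "GET" | HTTP method (GET or POST) |
| clean_content | boolean | No | true | Convert HTML to Markdown |
| continuation_token | string | No | null | Token for retrieving the next chunk |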
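
As a rough illustration of how these parameters travel over the wire, the sketch below builds a Model Context Protocol `tools/call` request for `scrape_url`. The helper name `build_tool_call` is hypothetical. In practice an MCP client library would normally construct this message for you.

```python
import json
from typing import Optional


def build_tool_call(request_id: int, url: str, method: str = "GET",
                    clean_content: bool = True,
                    continuation_token: Optional[str] = None) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for scrape_url (hypothetical helper)."""
    arguments = {"url": url, "method": method, "clean_content": clean_content}
    # Only send the token when asking for a follow-up chunk.
    if continuation_token is not None:
        arguments["continuation_token"] = continuation_token
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "scrape_url", "arguments": arguments},
    })
```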

scrape_url Response Fields

| Field | Type | Description |
| --- | --- | --- |
| Response | string | Page content with chunk instructions if applicable |

Note: When content exceeds 10,000 tokens, the response includes continuation instructions embedded in the text.


scrape_url_raw Response Fields

| Field | Type | Always Present | Description |
| --- | --- | --- | --- |
| status_code | integer | Yes | HTTP response status code |
| headers | object | Yes | Response headers (hop-by-hop headers removed) |
| content | string | Yes | Page content or current chunk |
| content_type | string | Yes | MIME type of response |
| response_time | number | Yes | Request duration in seconds |
| chunked | boolean | When chunked | Indicates response was split |
| chunk_index | integer | When chunked | Current chunk number (1-based) |
| total_chunks | integer | When chunked | Total number of chunks |
| continuation_token | string | When more chunks | Token for next chunk retrieval |
| total_tokens | integer | When chunked | Total tokens in full response |
| message | string | When chunked | Human-readable chunk status |
| error | string | On failure | Error description |
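
A client can use these fields to walk through a chunked response. The following is a minimal sketch: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, taking a tool name and an arguments dictionary and returning the response dictionary described above.

```python
from typing import Callable, Optional


def fetch_all_chunks(call_tool: Callable[[str, dict], dict], url: str) -> str:
    """Follow continuation_token until no further chunk is offered."""
    parts = []
    token: Optional[str] = None
    while True:
        args = {"url": url}
        if token is not None:
            args["continuation_token"] = token
        result = call_tool("scrape_url_raw", args)
        # The error field is only present on failure.
        if "error" in result:
            raise RuntimeError(result["error"])
        parts.append(result["content"])
        # continuation_token is only present while more chunks remain.
        token = result.get("continuation_token")
        if token is None:
            return "".join(parts)
```

Because cached chunks expire, a loop like this should request follow-up chunks promptly rather than pausing between calls.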

scrape_url_to_file Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string | Yes | - | Target URL to scrape |
| file_path | string | Yes | - | Path where content should be saved |
| method | string | No | "GET" | HTTP method (GET or POST) |
| clean_content | boolean | No | false | Convert HTML to Markdown before saving |
| overwrite | boolean | No | false | Replace file if it exists |

scrape_url_to_file Response Fields

| Field | Type | Always Present | Description |
| --- | --- | --- | --- |
| status_code | integer | Yes | HTTP response status code |
| headers | object | Yes | Response headers (hop-by-hop headers removed) |
| content_type | string | Yes | MIME type of saved content |
| response_time | number | Yes | Request duration in seconds |
| file_path | string | On success | Absolute path to saved file |
| bytes_written | integer | On success | Number of bytes written to disk |
| message | string | On success | Confirmation message |
| error | string | On failure | Error description |
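
The `overwrite` flag and the success and failure fields could interact roughly as sketched below. `save_content` is a hypothetical helper written for illustration; the server's real implementation and message wording may differ.

```python
from pathlib import Path


def save_content(content: bytes, file_path: str, overwrite: bool = False) -> dict:
    """Write content to disk and report the outcome using the documented field names."""
    target = Path(file_path).resolve()
    # Refuse to clobber an existing file unless overwrite was requested.
    if target.exists() and not overwrite:
        return {"error": f"File already exists: {target}"}
    target.parent.mkdir(parents=True, exist_ok=True)
    bytes_written = target.write_bytes(content)
    return {
        "file_path": str(target),  # absolute path, as in the table above
        "bytes_written": bytes_written,
        "message": f"Saved {bytes_written} bytes to {target}",
    }
```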

Installation

Prerequisites

| Requirement | Version | Purpose |
| --- | --- | --- |
| Python | 3.10+ | Runtime environment |
| uv | Latest | Dependency management |
| Git | Any | Repository cloning |

Setup Steps

Clone the repository and install dependencies:

git clone https://github.com/LLMTooling/cloudscraper-mcp-server.git
cd cloudscraper-mcp-server
uv sync

Configuration

Transport Protocols

| Transport | Best For | Configuration |
| --- | --- | --- |
| stdio | Claude Code, VSCode, direct AI integration | Default mode, no environment variables needed |
| http | n8n, web apps, API integrations, remote access | Requires MCP_TRANSPORT=http |

Environment Variables

| Variable | Default | Options | Description |
| --- | --- | --- | --- |
| MCP_TRANSPORT | stdio | stdio, http | Transport protocol selection |
| MCP_HOST | 0.0.0.0 | Any valid IP | Host binding for HTTP mode |
| MCP_PORT | 8000 | Any valid port | Port for HTTP mode |
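
One plausible way to read these variables with their documented defaults is sketched below. The names `TransportConfig` and `load_config` are hypothetical, and the validation shown is an assumption rather than the server's actual startup code.

```python
import os
from dataclasses import dataclass


@dataclass
class TransportConfig:
    transport: str
    host: str
    port: int


def load_config(env=None) -> TransportConfig:
    """Read MCP_TRANSPORT, MCP_HOST and MCP_PORT, falling back to the documented defaults."""
    env = os.environ if env is None else env
    transport = env.get("MCP_TRANSPORT", "stdio")
    if transport not in ("stdio", "http"):
        raise ValueError(f"Unsupported MCP_TRANSPORT: {transport!r}")
    # Host and port only matter in HTTP mode, but are parsed either way.
    return TransportConfig(
        transport=transport,
        host=env.get("MCP_HOST", "0.0.0.0"),
        port=int(env.get("MCP_PORT", "8000")),
    )
```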

Usage Examples

Running with Stdio Transport (Default)

uv run server.py

Running with HTTP Transport

MCP_TRANSPORT=http MCP_HOST=0.0.0.0 MCP_PORT=8000 uv run server.py

Claude Code Integration

claude mcp add cloudscraper-mcp -- uv --directory "/path/to/cloudscraper-mcp-server" run server.py

VSCode/IDE Configuration

{
  "mcpServers": {
    "cloudscraper-mcp": {
      "type": "stdio",
      "command": "uv",
      "args": [
        "run",
        "server.py"
      ],
      "cwd": "/path/to/cloudscraper-mcp-server"
    }
  }
}

Docker Deployment

For containerized deployment instructions, see


Technical Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Protocol | FastMCP 2.0+ | Model Context Protocol implementation |
| Scraping | cloudscraper 1.2.71+ | Cloudflare bypass engine |
| Compression | brotli 1.0.9+ | Response decompression |
| Parsing | beautifulsoup4 4.10.0+ | HTML parsing |
| Conversion | markdownify 0.11.6+ | HTML to Markdown transformation |
| Tokenization | tiktoken 0.5.0+ | Token counting for chunking |

Advanced Features

Response Chunking System

| Feature | Value | Description |
| --- | --- | --- |
| Max Tokens Per Chunk | 10,000 | Maximum tokens in a single response |
| Chunk Expiry | 2 minutes | Cache lifetime for chunk retrieval |
| Token Encoding | cl100k_base | tiktoken encoding model |
| Continuation Pattern | chunk_id:index | Token format for sequential retrieval |
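
The values above suggest a cache of fixed-size chunks addressed by `chunk_id:index` tokens. The sketch below shows one way such a scheme could work. It operates on an already-tokenized list, whereas the server counts tokens with tiktoken. The function names, the zero-based index in the token, and the in-memory dictionary are all assumptions for illustration.

```python
import time
import uuid

MAX_TOKENS_PER_CHUNK = 10_000
CHUNK_TTL_SECONDS = 120  # 2 minutes

_cache = {}  # chunk_id -> (expiry timestamp, list of chunks)


def split_tokens(tokens, size=MAX_TOKENS_PER_CHUNK):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def store_chunks(chunks, now=None):
    """Cache the chunks under a fresh id and return that id."""
    now = time.time() if now is None else now
    chunk_id = uuid.uuid4().hex
    _cache[chunk_id] = (now + CHUNK_TTL_SECONDS, chunks)
    return chunk_id


def get_chunk(continuation_token, now=None):
    """Return (chunk, next_token), or None if the token is unknown, expired or out of range."""
    now = time.time() if now is None else now
    chunk_id, _, index = continuation_token.partition(":")
    expiry, chunks = _cache.get(chunk_id, (0, None))
    if chunks is None or now > expiry:
        _cache.pop(chunk_id, None)  # drop expired entries
        return None
    i = int(index)  # assumption: zero-based index inside the token
    if not 0 <= i < len(chunks):
        return None
    next_token = f"{chunk_id}:{i + 1}" if i + 1 < len(chunks) else None
    return chunks[i], next_token
```

Passing `now` explicitly keeps the expiry logic easy to exercise without waiting two minutes.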

Security Headers

| Header | Value | Purpose |
| --- | --- | --- |
| User-Agent | Chrome 120 | Browser impersonation |
| Sec-Ch-Ua | Chrome/Chromium | Client hints |
| Sec-Fetch-* | cors/same-origin | Fetch metadata |
| Origin/Referer | Auto-generated | Request legitimacy |
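
To make the table concrete, here is a minimal sketch of how such a header set might be assembled for a target URL. The exact header strings and the rule for deriving `Origin` and `Referer` from the URL are assumptions; the helper `build_headers` is hypothetical and not taken from the server's source.

```python
from urllib.parse import urlsplit


def build_headers(url: str) -> dict:
    """Assemble browser-like request headers for the given URL (illustrative values)."""
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.netloc}"
    return {
        # Assumption: a typical Chrome 120 desktop User-Agent string.
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        ),
        "Sec-Ch-Ua": '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        # Assumption: Origin and Referer are derived from the target's scheme and host.
        "Origin": origin,
        "Referer": origin + "/",
    }
```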

Made with CloudScraper and FastMCP