Web Scraping MCP Server
A Model Context Protocol (MCP) server for web scraping using ScrapingBee API with FastMCP framework.
Features
- 7 MCP Tools for comprehensive web scraping:
  - fetch_html - Fetch raw HTML content
  - extract_page_title - Extract page titles
  - extract_meta_description - Extract meta descriptions
  - extract_open_graph_metadata - Extract Open Graph metadata
  - extract_h1_headers - Extract H1 headers
  - extract_h2_headers - Extract H2 headers
  - extract_h3_headers - Extract H3 headers
- Flexible Input: Each tool supports both single URL and batch URL operations
- Custom User Agents: Support for custom user agent strings
- JavaScript Rendering: Optional JavaScript rendering via ScrapingBee
- Async Processing: Concurrent processing for batch operations
- Comprehensive Error Handling: Categorized error responses for MCP clients
- Pydantic Settings: Environment-based configuration
Requirements
- Python 3.12+
- ScrapingBee API key
- FastMCP 2.10.6+
Installation
Method 1: UV Tool (Recommended)
Install directly as a UV tool:
uv tool install web-scraping-mcp-server
Method 2: Development Installation
- Clone the repository:
git clone <repository-url>
cd web-scraping-mcp-server
- Install dependencies:
uv sync
- Set up environment variables:
cp .env.example .env
# Edit .env and add your ScrapingBee API key
Usage
Starting the Server
If installed as UV tool:
export SCRAPINGBEE_API_KEY='your-api-key-here'
uvx web-scraping-mcp-server
If using development installation:
export SCRAPINGBEE_API_KEY='your-api-key-here'
uv run web-scraping-mcp-server
The server runs using STDIO transport for MCP protocol compatibility.
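Because the server speaks MCP over STDIO, any MCP-compatible client can launch it by running the command above. The sketch below connects to it programmatically; it assumes the official MCP Python SDK (the mcp package) on the client side, which is not part of this project:

# sketch: connect to the server over STDIO with the official MCP Python SDK
# (assumes `pip install mcp` and SCRAPINGBEE_API_KEY set in the environment)
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="uvx",
    args=["web-scraping-mcp-server"],
    env={"SCRAPINGBEE_API_KEY": os.environ["SCRAPINGBEE_API_KEY"]},
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # the 7 tools listed above

asyncio.run(main())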
Tool Usage
All tools accept either a single-URL request or a batch request:
Single URL:
{
  "url": "https://example.com",
  "user_agent": "Custom User Agent"
}
Batch URLs:
{
  "urls": ["https://example.com", "https://test.com"],
  "user_agent": "Custom User Agent"
}
Response Format
All tools return standardized responses:
Success Response:
{
  "url": "https://example.com",
  "success": true,
  "data": "extracted content",
  "error": null
}
Error Response:
{
  "url": "https://example.com",
  "success": false,
  "data": null,
  "error": {
    "type": "API_ERROR",
    "message": "Error description"
  }
}
Error Types
- API_ERROR - ScrapingBee API issues
- NETWORK_ERROR - Network connectivity problems
- TIMEOUT_ERROR - Request timeouts
- NOT_FOUND_ERROR - 404 errors
- PARSING_ERROR - HTML parsing issues
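A client can branch on these categories, for example to decide whether a failed URL is worth retrying. The helper below is purely illustrative and only inspects the response dictionaries documented above; which categories count as retryable is an assumption, not server behavior:

# sketch: client-side handling of the documented response shape
def handle_result(result: dict) -> str | None:
    retryable = {"API_ERROR", "NETWORK_ERROR", "TIMEOUT_ERROR"}  # assumption
    if result["success"]:
        return result["data"]
    error = result["error"]
    if error["type"] in retryable:
        print(f"transient failure for {result['url']}: {error['message']}")
    else:
        print(f"permanent failure for {result['url']}: {error['message']}")
    return None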
Configuration
Environment variables (SCRAPINGBEE_API_KEY is required; the rest are optional):
SCRAPINGBEE_API_KEY=your-api-key # Required
DEFAULT_CONCURRENCY=5 # Concurrent requests
DEFAULT_TIMEOUT=90.0 # Request timeout
LOG_LEVEL=INFO # Logging level
DEFAULT_USER_AGENT=Custom-Agent # Default user agent
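These variables map naturally onto a pydantic-settings model. The class below is a sketch of how such a configuration could look; field names and defaults mirror the table above, and the project's actual settings class may differ:

# sketch: environment-based configuration with pydantic-settings
# (illustrative; mirrors the variables documented above)
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    scrapingbee_api_key: str               # SCRAPINGBEE_API_KEY (required)
    default_concurrency: int = 5           # DEFAULT_CONCURRENCY
    default_timeout: float = 90.0          # DEFAULT_TIMEOUT
    log_level: str = "INFO"                # LOG_LEVEL
    default_user_agent: str | None = None  # DEFAULT_USER_AGENT

settings = Settings()  # raises a validation error if the API key is missing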
Development
Running Tests
uv run pytest
Code Quality
uv run ruff check
uv run mypy src
Example Usage
If installed as UV tool:
python example.py
If using development installation:
uv run python example.py
Architecture
- FastMCP: Modern MCP framework for tool registration
- ScrapingBee: Professional web scraping API
- BeautifulSoup: HTML parsing and extraction
- Pydantic: Data validation and settings
- Async/Await: Concurrent processing support
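To show how these pieces fit together, here is a minimal sketch of a tool registered with FastMCP in the style this server uses. The function body, the ScrapingBee request parameters, and the tool signature are illustrative assumptions, not the project's actual code; the real server exposes the 7 tools listed under Features:

# sketch: registering a scraping tool with FastMCP (illustrative only)
import os

import httpx
from bs4 import BeautifulSoup
from fastmcp import FastMCP

mcp = FastMCP("web-scraping-mcp-server")

@mcp.tool()
async def extract_page_title(url: str) -> str:
    """Fetch a page through ScrapingBee and return its <title> text."""
    async with httpx.AsyncClient(timeout=90.0) as client:
        response = await client.get(
            "https://app.scrapingbee.com/api/v1/",
            params={"api_key": os.environ["SCRAPINGBEE_API_KEY"], "url": url},
        )
        response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else ""

if __name__ == "__main__":
    mcp.run()  # STDIO transport by default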
License
MIT License