web-scraping-mcp-server

locomotive-agency/web-scraping-mcp-server

Web Scraping MCP Server

A Model Context Protocol (MCP) server for web scraping using ScrapingBee API with FastMCP framework.

Features

  • 7 MCP Tools for comprehensive web scraping:

    • fetch_html - Fetch raw HTML content
    • extract_page_title - Extract page titles
    • extract_meta_description - Extract meta descriptions
    • extract_open_graph_metadata - Extract Open Graph metadata
    • extract_h1_headers - Extract H1 headers
    • extract_h2_headers - Extract H2 headers
    • extract_h3_headers - Extract H3 headers
  • Flexible Input: Each tool supports both single URL and batch URL operations

  • Custom User Agents: Support for custom user agent strings

  • JavaScript Rendering: Optional JavaScript rendering via ScrapingBee

  • Async Processing: Concurrent processing for batch operations

  • Comprehensive Error Handling: Categorized error responses for MCP clients

  • Pydantic Settings: Environment-based configuration
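The batch and async features above can be illustrated with a small sketch. It assumes the server bounds concurrency with a semaphore, which the README does not confirm. `fetch_stub` and `process_batch` are hypothetical names, and the stub stands in for a real ScrapingBee request.

```python
import asyncio


async def fetch_stub(url: str) -> str:
    """Hypothetical stand-in for a real ScrapingBee request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html><title>{url}</title></html>"


async def process_batch(urls: list[str], concurrency: int = 5) -> list[dict]:
    """Process URLs concurrently, with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def process_one(url: str) -> dict:
        async with semaphore:
            try:
                html = await fetch_stub(url)
                return {"url": url, "success": True, "data": html, "error": None}
            except Exception as exc:  # one failure must not abort the batch
                return {
                    "url": url,
                    "success": False,
                    "data": None,
                    "error": {"type": "API_ERROR", "message": str(exc)},
                }

    # asyncio.gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(process_one(u) for u in urls))


if __name__ == "__main__":
    batch = ["https://example.com", "https://test.com"]
    for result in asyncio.run(process_batch(batch)):
        print(result["url"], result["success"])
```

Catching exceptions per URL keeps one failing page from discarding the results of the rest of the batch.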

Requirements

  • Python 3.12+
  • ScrapingBee API key
  • FastMCP 2.10.6+

Installation

Method 1: UV Tool (Recommended)

Install directly as a UV tool:

uv tool install web-scraping-mcp-server

Method 2: Development Installation

  1. Clone the repository:
git clone <repository-url>
cd web-scraping-mcp-server
  2. Install dependencies:
uv sync
  3. Set up environment variables:
cp .env.example .env
# Edit .env and add your ScrapingBee API key

Usage

Starting the Server

If installed as a UV tool:

export SCRAPINGBEE_API_KEY='your-api-key-here'
uvx web-scraping-mcp-server

If using the development installation:

export SCRAPINGBEE_API_KEY='your-api-key-here'
uv run web-scraping-mcp-server

The server communicates with MCP clients over the STDIO transport.
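For orientation, here is a rough sketch of what an MCP client writes to the server's stdin when it invokes a tool. It relies on the MCP convention that STDIO messages are newline-delimited JSON-RPC 2.0. `build_tool_call` is a hypothetical helper, and a real client would first complete the `initialize` handshake, which is omitted here.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a single line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps without `indent` emits no raw newlines, so the trailing
    # "\n" is the only one and acts as the message delimiter.
    return json.dumps(message) + "\n"


if __name__ == "__main__":
    line = build_tool_call(1, "fetch_html", {"url": "https://example.com"})
    print(line, end="")
```

In practice an MCP client library handles this framing, so the sketch is only meant to show what travels over the pipe.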

Tool Usage

All tools accept either a single URL or a batch of URLs:

Single URL:

{
  "url": "https://example.com",
  "user_agent": "Custom User Agent"
}

Batch URLs:

{
  "urls": ["https://example.com", "https://test.com"],
  "user_agent": "Custom User Agent"
}

Response Format

All tools return standardized responses:

Success Response:

{
  "url": "https://example.com",
  "success": true,
  "data": "extracted content",
  "error": null
}

Error Response:

{
  "url": "https://example.com",
  "success": false,
  "data": null,
  "error": {
    "type": "API_ERROR",
    "message": "Error description"
  }
}
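The two response shapes above are easy to build and consume with small helpers. The following is a minimal sketch with hypothetical function names, not the server's actual code. It mirrors the documented fields and shows how a client might split a batch into successes and failures.

```python
from typing import Any


def success_response(url: str, data: Any) -> dict[str, Any]:
    """Build a response in the documented success shape."""
    return {"url": url, "success": True, "data": data, "error": None}


def error_response(url: str, error_type: str, message: str) -> dict[str, Any]:
    """Build a response in the documented error shape."""
    return {
        "url": url,
        "success": False,
        "data": None,
        "error": {"type": error_type, "message": message},
    }


def partition_results(
    results: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split batch results into (succeeded, failed), preserving order."""
    succeeded = [r for r in results if r["success"]]
    failed = [r for r in results if not r["success"]]
    return succeeded, failed


if __name__ == "__main__":
    batch = [
        success_response("https://example.com", "Example Domain"),
        error_response("https://test.com", "TIMEOUT_ERROR", "Request timed out"),
    ]
    ok, bad = partition_results(batch)
    print(len(ok), "succeeded;", len(bad), "failed")
```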

Error Types

  • API_ERROR - ScrapingBee API issues
  • NETWORK_ERROR - Network connectivity problems
  • TIMEOUT_ERROR - Request timeouts
  • NOT_FOUND_ERROR - 404 errors
  • PARSING_ERROR - HTML parsing issues

Configuration

Environment variables (all optional except SCRAPINGBEE_API_KEY):

SCRAPINGBEE_API_KEY=your-api-key        # Required
DEFAULT_CONCURRENCY=5                   # Concurrent requests
DEFAULT_TIMEOUT=90.0                    # Request timeout
LOG_LEVEL=INFO                          # Logging level
DEFAULT_USER_AGENT=Custom-Agent         # Default user agent

Development

Running Tests

uv run pytest

Code Quality

uv run ruff check
uv run mypy src

Example Usage

If installed as a UV tool:

python example.py

If using the development installation:

uv run python example.py

Architecture

  • FastMCP: Modern MCP framework for tool registration
  • ScrapingBee: Professional web scraping API
  • BeautifulSoup: HTML parsing and extraction
  • Pydantic: Data validation and settings
  • Async/Await: Concurrent processing support
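As a rough illustration of the parsing layer, the sketch below extracts a page title, H1 headers, and Open Graph tags from an HTML string. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without extra dependencies. `HeadExtractor` and `extract` are hypothetical names and do not reflect the server's actual code.

```python
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collects the <title> text, <h1> texts, and Open Graph <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.h1_headers: list[str] = []
        self.open_graph: dict[str, str] = {}
        self._current: str | None = None  # tag whose text is being captured
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._current = tag
            self._buffer = []
        elif tag == "meta":
            attributes = dict(attrs)
            prop = attributes.get("property") or ""
            content = attributes.get("content")
            if prop.startswith("og:") and content is not None:
                self.open_graph[prop] = content

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is captured as well.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if tag == "title":
                self.title = text
            else:
                self.h1_headers.append(text)
            self._current = None


def extract(html: str) -> HeadExtractor:
    """Parse an HTML string and return the populated extractor."""
    parser = HeadExtractor()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    page = "<html><head><title>Example</title></head><body><h1>Hello</h1></body></html>"
    result = extract(page)
    print(result.title, result.h1_headers)
```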

License

MIT License