Web Scraper MCP Server

A web scraping server that implements the Model Context Protocol (MCP) using FastAPI. This server provides tools for extracting content and links from web pages in a structured way.

Features

  • Web content extraction with intelligent content cleaning
  • Link extraction with full URL resolution
  • Language detection
  • Headless browser automation with Playwright by default or Selenium when cookies are required
  • Open URLs in your existing browser session
  • Multi-step website actions with Playwright
  • FastAPI-based REST API
  • MCP protocol implementation
  • Streaming agent workflow with a planner, a per-step executor, and a summarizer
    • The planner outputs a <plan> block listing tool names outside of its <think> block

Prerequisites

  • Python 3.11+
  • Chrome browser installed
  • If Chrome is not in your PATH, set the CHROME_BINARY environment variable to the full path of the Chrome executable
  • The WebScraper can run in Playwright or Selenium mode. Playwright is the default unless cookie-based sessions are needed.
  • uv package manager

Setup

  1. Create a virtual environment and install dependencies using uv:
uv venv
uv pip install -r requirements.txt
  2. Create a .env file (optional):
PORT=8000

Usage modes

MCP mode

Start the server:

python mcp_server.py [--log-level info|debug|warning|error|critical]

The server will start on http://localhost:8000 by default.

Agent mode

Use agents_stream_tools.py for models that support the tools API; for other models, run agents_stream_prompt.py. The old agents.py script is deprecated, and its legacy code is kept in agents_legacy.py for reference. Run the interactive agent with tool support:

python agents_stream_tools.py [--debug] "your question here"

Use the --debug flag to print tool calls and intermediate messages. For models without tool support:

python agents_stream_prompt.py "your question here"

Both agents follow a three-part workflow:

  1. Planner decides which tools to call and returns <plan> containing only tool names outside of <think>.
  2. Executor runs each step using the previous tool output as context.
  3. Summarizer uses the final tool output to answer the query.
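
As a minimal sketch of this workflow, the snippet below wires the three parts together with a stubbed model. The llm(), parse_plan(), and TOOLS names are illustrative stand-ins, not the actual code in agents_stream_tools.py:

# Minimal sketch of the planner -> executor -> summarizer workflow.
# llm(), parse_plan(), and TOOLS are illustrative stand-ins; the real
# agents live in agents_stream_tools.py / agents_stream_prompt.py.
import re

TOOLS = {
    "extract_links": lambda ctx: f"links found for: {ctx}",
    "scrape_website": lambda ctx: f"page content for: {ctx}",
}

def llm(prompt: str) -> str:
    # Stand-in for a real model call. A planner reply looks like:
    return "<think>pick tools</think><plan>extract_links scrape_website</plan>"

def parse_plan(reply: str) -> list[str]:
    # Keep only the tool names inside <plan>, ignoring the <think> block.
    match = re.search(r"<plan>(.*?)</plan>", reply, re.S)
    return match.group(1).split() if match else []

def run_agent(question: str) -> str:
    plan = parse_plan(llm(question))      # 1. planner chooses the tools
    context = question
    for step in plan:                     # 2. executor runs each step,
        context = TOOLS[step](context)    #    feeding the last output forward
    return f"summary of: {context}"       # 3. summarizer answers from it

print(run_agent("find the docs links on example.com"))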

API Endpoints

MCP Endpoints

Scrape Website
  • URL: /mcp
  • Method: POST
  • Command: scrape_website
  • Parameters:
    {
      "url": "https://example.com",
      "query": "specific topic"
    }
    
  • Response: Filtered content relevant to the query
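
For example, calling scrape_website from Python might look like the snippet below. The {"command": ..., "parameters": ...} envelope is an assumption based on the Command/Parameters listing above; check mcp_server.py for the exact request schema.

# Hedged example: invoke the scrape_website tool over HTTP. The request
# envelope is assumed from the listing above, not taken from the server.
import requests

resp = requests.post(
    "http://localhost:8000/mcp",
    json={
        "command": "scrape_website",
        "parameters": {"url": "https://example.com", "query": "specific topic"},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
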
Extract Links
  • URL: /mcp
  • Method: POST
  • Command: extract_links
  • Parameters:
    {
      "url": "https://example.com"
    }
    
  • Response: List of all links found on the page, with relative paths converted to absolute URLs based on the page URL
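
The URL resolution behaves like Python's urllib.parse.urljoin (the server's implementation may differ, but the described result is the same):

# Relative links are resolved against the page they were found on,
# matching the behavior of urllib.parse.urljoin.
from urllib.parse import urljoin

page = "https://example.com/docs/index.html"
print(urljoin(page, "guide.html"))           # https://example.com/docs/guide.html
print(urljoin(page, "../about"))             # https://example.com/about
print(urljoin(page, "https://other.org/x"))  # absolute links pass through
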
Download PDFs
  • URL: /mcp
  • Method: POST
  • Command: download_pdfs
  • Parameters:
    {
      "links": ["https://example.com/sample.pdf"]
    }
    
  • Response: Paths to the downloaded PDF files
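
Client-side, the equivalent logic is roughly the sketch below; the download directory and filename handling are illustrative, not the server's actual behavior.

# Illustrative equivalent of download_pdfs: fetch each link and save it
# locally. Paths and naming here are assumptions.
from pathlib import Path
import requests

def download_pdfs(links: list[str], out_dir: str = "downloads") -> list[str]:
    Path(out_dir).mkdir(exist_ok=True)
    saved = []
    for link in links:
        name = link.rstrip("/").rsplit("/", 1)[-1] or "file.pdf"
        resp = requests.get(link, timeout=60)
        resp.raise_for_status()
        target = Path(out_dir) / name
        target.write_bytes(resp.content)
        saved.append(str(target))
    return saved

print(download_pdfs(["https://example.com/sample.pdf"]))
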
Open Browser
  • URL: /mcp
  • Method: POST
  • Command: open_browser
  • Parameters:
    {
      "url": "https://example.com"
    }
    
  • Response: Confirmation that the URL was opened using your browser session
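
Opening a URL in your existing browser session is what Python's standard webbrowser module does; the server is assumed to do something similar.

# The standard-library webbrowser module opens a URL in the default
# browser session; the server's implementation is assumed to be similar.
import webbrowser

webbrowser.open("https://example.com")
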
React Browser Task
  • URL: /mcp
  • Method: POST
  • Command: react_browser_task
  • Parameters:
    {
      "url": "https://example.com",
      "goal": "Click the next button and return the page text"
    }
    
  • Response: Final page content after completing the goal
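
A multi-step action like the example goal above maps onto Playwright's sync API roughly as follows; the selector is illustrative, and the actual tool lets a model decide the steps.

# Sketch of the kind of multi-step action react_browser_task performs,
# using Playwright's sync API. The "Next" selector is illustrative.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    page.get_by_role("button", name="Next").click()  # illustrative step
    print(page.inner_text("body"))                   # final page content
    browser.close()
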

Health Check

  • URL: /health
  • Method: GET

API Documentation

Once the server is running, you can access the API documentation at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Logging

Server logs are written to project_folder/logs/mcp.log and include:

  • Server startup/shutdown events
  • Web scraping operations
  • Error messages and exceptions

The agents_stream_tools.py and agents_stream_prompt.py scripts log to project_folder/logs/agent.log, in the same directory as the server logs. The agent logs capture tool calls and the <think> sections for full traceability. You can adjust server verbosity with the --log-level flag when starting the server; by default, server logs use the warning level.
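
As an illustration, the --log-level flag could map onto Python's logging module roughly as below; the server's real setup in mcp_server.py may differ.

# Illustrative mapping of --log-level onto the logging module; the
# server's actual configuration may differ.
import argparse
import logging
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument(
    "--log-level",
    default="warning",
    choices=["info", "debug", "warning", "error", "critical"],
)
args = parser.parse_args()

Path("logs").mkdir(exist_ok=True)
logging.basicConfig(
    filename="logs/mcp.log",
    level=getattr(logging, args.log_level.upper()),
    format="%(asctime)s %(levelname)s %(message)s",
)
logging.getLogger(__name__).warning("server starting")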

Project Structure

  • mcp_server.py: Core server implementation and MCP endpoints
  • agents_stream_tools.py: Interactive agent for direct tool use
  • agents_stream_prompt.py: Agent for models without Ollama tools support
  • mcp.json: MCP configuration file
  • requirements.txt: Python dependencies
  • pyproject.toml: Project metadata and build configuration

Error Handling

The server includes comprehensive error handling for:

  • Invalid URLs
  • Network connectivity issues
  • WebDriver initialization failures
  • Content extraction errors

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

Updating mcp.json Path

If you move the project directory, run the helper script to update the mcp.json configuration:

python update_mcp_path.py

The updated JSON is copied to your clipboard so you can replace the content of mcp.json easily.
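
A sketch of what such a helper might look like is below; the mcp.json layout and the pyperclip dependency are assumptions, not the actual contents of update_mcp_path.py.

# Illustrative sketch of update_mcp_path.py: point the server entry in
# mcp.json at the project's current location and copy the result to the
# clipboard. The JSON layout and pyperclip are assumptions.
import json
from pathlib import Path

import pyperclip  # assumed dependency for clipboard access

config = json.loads(Path("mcp.json").read_text())
# "mcpServers" / "webdocs" is a hypothetical layout for illustration.
config["mcpServers"]["webdocs"]["args"] = [str(Path("mcp_server.py").resolve())]

updated = json.dumps(config, indent=2)
Path("mcp.json").write_text(updated)
pyperclip.copy(updated)
print("mcp.json updated and copied to clipboard")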