250908-html-mcp-server-go

MauricioZapata00/250908-html-mcp-server-go



HTML MCP Server Go

A high-performance Go implementation of an MCP (Model Context Protocol) server for fetching and extracting content from web pages, including JavaScript-rendered content and Docsify documentation sites. Built using Clean Architecture principles with Go Fiber v2.

Features

  • Dual Mode Operation: Works as both an MCP server (JSON-RPC) and a REST API server
  • HTML Content Extraction: Extracts text content from static HTML pages
  • JavaScript Rendering: Supports SPAs and dynamically rendered content using Chrome/Chromium
  • Docsify Support: Specialized parsing for Docsify documentation sites
  • Clean Architecture: Domain-driven design with separated concerns
  • High Performance: Built with Go Fiber v2 and async processing
  • Docker Ready: Containerized deployment with health checks
  • CORS Support: Cross-origin requests enabled for API mode

Quick Start

Prerequisites

  • Go 1.21 or later
  • Chrome/Chromium (for JavaScript rendering)
  • Docker (optional, for containerized deployment)

Local Development

  1. Clone and build:

    git clone <repository-url>
    cd html-mcp-server-go
    go mod download
    go build -o html-mcp-server ./cmd
    
  2. Run as REST API server:

    ./html-mcp-server -mode=api -port=8085
    
  3. Test the API:

    # Health check
    curl http://localhost:8085/health
    
    # Fetch regular webpage
    curl -X POST http://localhost:8085/api/fetch \
      -H "Content-Type: application/json" \
      -d '{"url": "https://example.com"}'
    
    # Fetch with JavaScript rendering
    curl -X POST http://localhost:8085/api/fetch \
      -H "Content-Type: application/json" \
      -d '{"url": "https://docs.example.com", "use_js_rendering": true}'
    
    # Fetch Docsify site
    curl -X POST http://localhost:8085/api/fetch \
      -H "Content-Type: application/json" \
      -d '{"url": "https://docsify.js.org", "use_js_rendering": true, "parse_docsify": true}'
    
  4. Run as MCP server:

    ./html-mcp-server -mode=mcp
    

Docker Deployment

  1. Build and run with Docker:

    docker build -t html-mcp-server-go .
    docker run -p 8085:8085 html-mcp-server-go
    
  2. Or use Docker Compose:

    docker-compose up -d
    

API Reference

REST API

GET /health

Returns server health status.

Response:

{
  "status": "healthy",
  "version": "1.0.0"
}

POST /api/fetch

Fetches and extracts content from web pages.

Request Body:

{
  "url": "https://example.com",
  "extract_text_only": true,
  "use_js_rendering": false,
  "parse_docsify": false,
  "timeout_seconds": 30,
  "wait_for_selector": ".content"
}

Parameters:

  • url (required): The URL to fetch
  • extract_text_only: Extract only text content (default: true)
  • use_js_rendering: Use Chrome for JavaScript rendering (default: false)
  • parse_docsify: Parse Docsify-specific content (default: auto-detect)
  • timeout_seconds: Request timeout in seconds (default: 30, max: 300)
  • wait_for_selector: CSS selector to wait for (JS rendering only)
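
The defaults and limits listed above can be sketched as a small normalization step. This is only an illustration: the `FetchRequest` type and `normalize` function are hypothetical names, and clamping an oversized timeout to 300 (rather than rejecting the request) is an assumption, since the list only states the maximum.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

const (
	defaultTimeoutSeconds = 30  // documented default
	maxTimeoutSeconds     = 300 // documented maximum
)

// FetchRequest is a hypothetical type covering a subset of the documented fields.
// ExtractTextOnly is a pointer so an omitted value can default to true.
type FetchRequest struct {
	URL             string `json:"url"`
	ExtractTextOnly *bool  `json:"extract_text_only,omitempty"`
	TimeoutSeconds  int    `json:"timeout_seconds,omitempty"`
}

// normalize validates the URL and fills in the documented defaults.
func normalize(req FetchRequest) (FetchRequest, error) {
	u, err := url.Parse(req.URL)
	if err != nil || u.Host == "" || (u.Scheme != "http" && u.Scheme != "https") {
		return req, errors.New("INVALID_URL: empty or malformed URL")
	}
	if req.ExtractTextOnly == nil {
		t := true
		req.ExtractTextOnly = &t
	}
	switch {
	case req.TimeoutSeconds <= 0:
		req.TimeoutSeconds = defaultTimeoutSeconds
	case req.TimeoutSeconds > maxTimeoutSeconds:
		// Assumption: clamp to the maximum instead of returning an error.
		req.TimeoutSeconds = maxTimeoutSeconds
	}
	return req, nil
}

func main() {
	req, err := normalize(FetchRequest{URL: "https://example.com", TimeoutSeconds: 900})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(req.TimeoutSeconds, *req.ExtractTextOnly) // 300 true
}
```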

Response:

{
  "url": "https://example.com",
  "title": "Example Domain",
  "text_content": "Example Domain This domain is for use...",
  "raw_html": "<!doctype html><html>...",
  "metadata": {
    "content_type": "text/html; charset=utf-8",
    "status_code": 200,
    "content_length": 1256,
    "is_js_rendered": false,
    "is_docsify": false
  },
  "docsify_data": {
    "sidebar_content": "...",
    "navbar_content": "...",
    "main_content": "...",
    "config_data": {...}
  }
}
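
For callers written in Go, a minimal client for this endpoint might look like the sketch below. The `fetchRequest`, `fetchResponse`, `fetchPage`, and `stubServer` names are hypothetical; the struct fields simply mirror the JSON shown above. The stub server stands in for the real one so the example runs offline; against a running instance, pass `http.DefaultClient` and `http://localhost:8085` instead.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fetchRequest carries a subset of the documented request fields.
type fetchRequest struct {
	URL            string `json:"url"`
	UseJSRendering bool   `json:"use_js_rendering"`
}

// fetchResponse mirrors the documented response body (raw_html and docsify_data omitted).
type fetchResponse struct {
	URL         string `json:"url"`
	Title       string `json:"title"`
	TextContent string `json:"text_content"`
	Metadata    struct {
		ContentType   string `json:"content_type"`
		StatusCode    int    `json:"status_code"`
		ContentLength int    `json:"content_length"`
		IsJSRendered  bool   `json:"is_js_rendered"`
		IsDocsify     bool   `json:"is_docsify"`
	} `json:"metadata"`
}

// fetchPage posts the request to <baseURL>/api/fetch and decodes the JSON reply.
func fetchPage(client *http.Client, baseURL string, req fetchRequest) (*fetchResponse, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}
	resp, err := client.Post(baseURL+"/api/fetch", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var out fetchResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return &out, nil
}

// stubServer is a stand-in that returns a canned response for POST /api/fetch.
func stubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/api/fetch" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"url":"https://example.com","title":"Example Domain",`+
			`"text_content":"Example Domain","metadata":{"status_code":200}}`)
	}))
}

func main() {
	srv := stubServer()
	defer srv.Close()

	page, err := fetchPage(srv.Client(), srv.URL, fetchRequest{URL: "https://example.com"})
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println(page.Title, page.Metadata.StatusCode)
}
```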

MCP Protocol

The server implements the MCP (Model Context Protocol) and exposes a single tool:

fetch_html

Fetch and extract content from web pages.

Parameters:

  • url (required): The URL to fetch
  • extract_text_only: Extract only text content
  • use_js_rendering: Use JavaScript rendering
  • parse_docsify: Parse Docsify content
  • timeout_seconds: Request timeout
  • wait_for_selector: CSS selector to wait for
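
MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. As a rough sketch, the message an MCP client would send to invoke `fetch_html` could be built as follows; the Go type and function names are hypothetical, and a real client would normally complete the MCP `initialize` handshake before calling any tool.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// callParams is the params object of an MCP tools/call request.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// newFetchHTMLCall encodes a tools/call request for the fetch_html tool.
func newFetchHTMLCall(id int, pageURL string, useJS bool) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params: callParams{
			Name: "fetch_html",
			Arguments: map[string]any{
				"url":              pageURL,
				"use_js_rendering": useJS,
			},
		},
	})
}

func main() {
	msg, err := newFetchHTMLCall(1, "https://example.com", false)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(msg))
}
```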

MCP Client Configuration

Add one of these configurations to your MCP client settings:

Local Development (with Go installed)

{
  "mcpServers": {
    "html-mcp-server-go": {
      "command": "go",
      "args": ["run", "./cmd", "-mode=mcp"],
      "cwd": "/path/to/html-mcp-server-go"
    }
  }
}

Binary Execution (compiled binary)

{
  "mcpServers": {
    "html-mcp-server-go": {
      "command": "/path/to/html-mcp-server",
      "args": ["-mode=mcp"]
    }
  }
}

Docker Container

{
  "mcpServers": {
    "html-mcp-server-go": {
      "command": "docker",
      "args": ["run", "--rm", "-i", "html-mcp-server-go", "-mode=mcp"]
    }
  }
}

Note: Replace /path/to/html-mcp-server-go and /path/to/html-mcp-server with the actual paths to your project directory and binary file respectively.

Architecture

The project follows Clean Architecture principles:

├── cmd/                    # Application entry point
├── domain/                 # Core business logic
│   ├── model/             # Domain entities and value objects
│   └── port/              # Interfaces for external dependencies
├── application/           # Business logic and use cases
│   ├── service/           # Application services
│   └── usecase/           # Use case implementations
├── infrastructure/        # External adapters
│   ├── adapter/           # HTML parser adapter
│   ├── api/               # Fiber REST API server
│   ├── client/            # HTTP client with Chrome support
│   └── mcp/               # MCP server implementation
└── docker-compose.yaml   # Container orchestration

Configuration

Environment Variables

  • CHROME_BIN: Path to Chrome/Chromium binary
  • CHROME_PATH: Alternative Chrome path
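
One way these two variables could be consulted is sketched below. The precedence (CHROME_BIN first, then CHROME_PATH) is an assumption, and `resolveChromePath` is a hypothetical helper, not a function taken from the project.

```go
package main

import (
	"fmt"
	"os"
)

// resolveChromePath returns the first non-empty value among the documented variables.
// Assumption: CHROME_BIN takes precedence over CHROME_PATH.
// The getenv parameter makes the lookup easy to exercise without touching the real environment.
func resolveChromePath(getenv func(string) string) (string, bool) {
	for _, key := range []string{"CHROME_BIN", "CHROME_PATH"} {
		if v := getenv(key); v != "" {
			return v, true
		}
	}
	return "", false
}

func main() {
	if p, ok := resolveChromePath(os.Getenv); ok {
		fmt.Println("using Chrome at", p)
	} else {
		fmt.Println("no Chrome path configured")
	}
}
```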

Command Line Flags

  • -mode: Server mode (mcp, api, auto) - default: auto
  • -port: API server port - default: 8085
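
The two flags map naturally onto Go's standard `flag` package. The following is a minimal sketch under that assumption; the `config` type and `parseFlags` function are hypothetical, and how `auto` chooses between the two modes is not specified here, so the sketch only validates the value.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds the two documented command line settings.
type config struct {
	Mode string
	Port int
}

// parseFlags parses -mode and -port with the documented defaults.
func parseFlags(args []string) (config, error) {
	fs := flag.NewFlagSet("html-mcp-server", flag.ContinueOnError)
	mode := fs.String("mode", "auto", "server mode: mcp, api, or auto")
	port := fs.Int("port", 8085, "API server port")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	switch *mode {
	case "mcp", "api", "auto":
		// valid documented modes
	default:
		return config{}, fmt.Errorf("unknown mode %q (want mcp, api, or auto)", *mode)
	}
	return config{Mode: *mode, Port: *port}, nil
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("mode=%s port=%d\n", cfg.Mode, cfg.Port)
}
```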

Docker Configuration

The Docker image includes:

  • Alpine Linux base image
  • Chrome/Chromium browser
  • Non-root user for security
  • Health checks
  • Resource limits (1GB memory, 1 CPU core)

Development

Adding New Features

  1. Define domain models in domain/model/
  2. Create interfaces in domain/port/
  3. Implement business logic in application/service/
  4. Create adapters in infrastructure/
  5. Wire dependencies in cmd/main.go

Testing

# Run tests
go test ./...

# Run specific package tests
go test ./domain/model
go test ./application/service

# Integration testing with Docker
docker-compose up -d
curl http://localhost:8085/health
docker-compose down

Building

# Development build
go build -o html-mcp-server ./cmd

# Production build
CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o html-mcp-server ./cmd

# Docker build
docker build -t html-mcp-server-go .

Docsify Support

The server provides specialized support for Docsify documentation sites:

Features

  • Auto-detection: Automatically detects Docsify sites
  • Content Extraction: Separates sidebar, navbar, and main content
  • Configuration Parsing: Extracts Docsify configuration data
  • JavaScript Rendering: Handles dynamic content loading

Usage Example

curl -X POST http://localhost:8085/api/fetch \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docsify.js.org/#/quickstart",
    "use_js_rendering": true,
    "parse_docsify": true,
    "wait_for_selector": ".markdown-section"
  }'

Docsify Response Structure

{
  "docsify_data": {
    "sidebar_content": "Quick start\nWriting content...",
    "navbar_content": "GitHub\nGitee",
    "main_content": "Quick start\nIt is recommended...",
    "config_data": {
      "name": "docsify",
      "repo": "https://github.com/docsifyjs/docsify/",
      "loadSidebar": true
    }
  }
}
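
A consumer could decode this structure as in the sketch below. The type and function names are hypothetical. Since the keys inside `config_data` depend on the site, the sketch keeps it as a generic map. Treating a missing `docsify_data` object as an error is also an assumption about how non-Docsify pages are reported.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// docsifyData mirrors the documented docsify_data object.
type docsifyData struct {
	SidebarContent string         `json:"sidebar_content"`
	NavbarContent  string         `json:"navbar_content"`
	MainContent    string         `json:"main_content"`
	ConfigData     map[string]any `json:"config_data"` // site-specific keys
}

// fetchResult keeps only the part of the response this sketch needs.
type fetchResult struct {
	DocsifyData *docsifyData `json:"docsify_data"`
}

// parseDocsify extracts docsify_data from a raw /api/fetch response body.
func parseDocsify(raw []byte) (*docsifyData, error) {
	var r fetchResult
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	if r.DocsifyData == nil {
		// Assumption: the field is absent or null for non-Docsify pages.
		return nil, errors.New("response has no docsify_data")
	}
	return r.DocsifyData, nil
}

func main() {
	raw := []byte(`{"docsify_data":{"sidebar_content":"Quick start","navbar_content":"GitHub",
		"main_content":"Quick start","config_data":{"name":"docsify","loadSidebar":true}}}`)
	d, err := parseDocsify(raw)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(d.ConfigData["name"], d.ConfigData["loadSidebar"])
}
```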

Error Handling

In API mode, the server reports failures through an HTTP status code together with a JSON error body:

HTTP Status Codes (API Mode)

  • 200 OK: Successful request
  • 400 Bad Request: Invalid request parameters
  • 408 Request Timeout: The request timed out
  • 500 Internal Server Error: Server-side errors

Error Response Format

{
  "error": "ERROR_CODE",
  "message": "Human-readable error description"
}

Common Error Codes

  • INVALID_URL: Empty or malformed URL
  • FETCH_ERROR: Network or HTTP errors
  • PARSE_ERROR: HTML parsing failures
  • TIMEOUT: Request timeout
  • JS_RENDERING_ERROR: JavaScript rendering failures
  • DOCSIFY_PARSE_ERROR: Docsify parsing failures
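
On the client side, the error body can be decoded into a small Go error type, as sketched below. The `apiError` type and both helper functions are hypothetical. The code-to-status mapping in `statusFor` is a guess inferred from the two lists above, not a documented contract.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// apiError mirrors the documented error response format.
type apiError struct {
	Code    string `json:"error"`
	Message string `json:"message"`
}

// Error makes apiError usable as a Go error value.
func (e apiError) Error() string { return e.Code + ": " + e.Message }

// decodeAPIError parses an error response body.
func decodeAPIError(body []byte) (apiError, error) {
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil {
		return apiError{}, err
	}
	return e, nil
}

// statusFor guesses the HTTP status that accompanies an error code.
// Assumption: INVALID_URL maps to 400, TIMEOUT to 408, everything else to 500.
func statusFor(code string) int {
	switch code {
	case "INVALID_URL":
		return http.StatusBadRequest
	case "TIMEOUT":
		return http.StatusRequestTimeout
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	e, err := decodeAPIError([]byte(`{"error":"INVALID_URL","message":"URL must not be empty"}`))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(statusFor(e.Code), e.Error())
}
```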

Performance

  • Concurrent Processing: Goroutine-based async processing
  • Connection Pooling: HTTP client reuses connections
  • Memory Efficient: Streaming HTML processing
  • Resource Limits: Configurable timeouts and limits
  • Browser Optimization: Chrome launched with minimal resource usage

Security

  • Non-root Container: Docker runs as non-root user
  • Resource Limits: Memory and CPU limits in Docker
  • Input Validation: URL validation and sanitization
  • Timeout Protection: Request and browser timeouts
  • CORS Configuration: Configurable cross-origin policies

License

This project is open source and available under the MIT License.