MauricioZapata00/250908-html-mcp-server-go
HTML MCP Server Go
A high-performance Go implementation of an MCP (Model Context Protocol) server for fetching and extracting content from web pages, including JavaScript-rendered content and Docsify documentation sites. Built using Clean Architecture principles with Go Fiber v2.
Features
- Dual Mode Operation: Works as both MCP server (JSON-RPC) and REST API server
- HTML Content Extraction: Extract text content from static HTML pages
- JavaScript Rendering: Support for SPA and dynamically rendered content using Chrome/Chromium
- Docsify Support: Specialized parsing for Docsify documentation sites
- Clean Architecture: Domain-driven design with separated concerns
- High Performance: Built with Go Fiber v2 and async processing
- Docker Ready: Containerized deployment with health checks
- CORS Support: Cross-origin requests enabled for API mode
Quick Start
Prerequisites
- Go 1.21 or later
- Chrome/Chromium (for JavaScript rendering)
- Docker (optional, for containerized deployment)
Local Development
1. Clone and build:
   git clone <repository-url>
   cd html-mcp-server-go
   go mod download
   go build -o html-mcp-server ./cmd
2. Run as REST API server:
   ./html-mcp-server -mode=api -port=8085
3. Test the API:
   # Health check
   curl http://localhost:8085/health
   # Fetch regular webpage
   curl -X POST http://localhost:8085/api/fetch \
     -H "Content-Type: application/json" \
     -d '{"url": "https://example.com"}'
   # Fetch with JavaScript rendering
   curl -X POST http://localhost:8085/api/fetch \
     -H "Content-Type: application/json" \
     -d '{"url": "https://docs.example.com", "use_js_rendering": true}'
   # Fetch Docsify site
   curl -X POST http://localhost:8085/api/fetch \
     -H "Content-Type: application/json" \
     -d '{"url": "https://docsify.js.org", "use_js_rendering": true, "parse_docsify": true}'
4. Run as MCP server:
   ./html-mcp-server -mode=mcp
Docker Deployment
- Build and run with Docker:
  docker build -t html-mcp-server-go .
  docker run -p 8085:8085 html-mcp-server-go
- Or use Docker Compose:
  docker-compose up -d
API Reference
REST API
GET /health
Returns server health status.
Response:
{
"status": "healthy",
"version": "1.0.0"
}
POST /api/fetch
Fetches and extracts content from web pages.
Request Body:
{
"url": "https://example.com",
"extract_text_only": true,
"use_js_rendering": false,
"parse_docsify": false,
"timeout_seconds": 30,
"wait_for_selector": ".content"
}
Parameters:
- url (required): The URL to fetch
- extract_text_only: Extract only text content (default: true)
- use_js_rendering: Use Chrome for JavaScript rendering (default: false)
- parse_docsify: Parse Docsify-specific content (default: auto-detect)
- timeout_seconds: Request timeout (default: 30, max: 300)
- wait_for_selector: CSS selector to wait for (JS rendering only)
Response:
{
"url": "https://example.com",
"title": "Example Domain",
"text_content": "Example Domain This domain is for use...",
"raw_html": "<!doctype html><html>...",
"metadata": {
"content_type": "text/html; charset=utf-8",
"status_code": 200,
"content_length": 1256,
"is_js_rendered": false,
"is_docsify": false
},
"docsify_data": {
"sidebar_content": "...",
"navbar_content": "...",
"main_content": "...",
"config_data": {...}
}
}
MCP Protocol
The server implements the MCP (Model Context Protocol) with the following tools:
fetch_html
Fetch and extract content from web pages.
Parameters:
- url (required): The URL to fetch
- extract_text_only: Extract only text content
- use_js_rendering: Use JavaScript rendering
- parse_docsify: Parse Docsify content
- timeout_seconds: Request timeout
- wait_for_selector: CSS selector to wait for
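MCP clients talk to the server over JSON-RPC 2.0. Under the MCP specification a client first sends initialize, then the notifications/initialized notification, and only then invokes a tool with tools/call, passing the tool name and its arguments; over the stdio transport each message is one line of JSON. The sketch below only prints the three messages a client would write to the server's stdin (a real client reads the server's reply to initialize before continuing). The protocol version string and the client name are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcMessage is a JSON-RPC 2.0 message. ID is a pointer so that
// notifications, which carry no id, omit the field entirely.
type rpcMessage struct {
	JSONRPC string `json:"jsonrpc"`
	ID      *int   `json:"id,omitempty"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// newRequest builds a JSON-RPC request with the given id.
func newRequest(id int, method string, params any) rpcMessage {
	return rpcMessage{JSONRPC: "2.0", ID: &id, Method: method, Params: params}
}

// encodeLine renders one message as a single newline-terminated line,
// the framing used by the MCP stdio transport.
func encodeLine(m rpcMessage) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b) + "\n", nil
}

func main() {
	msgs := []rpcMessage{
		newRequest(1, "initialize", map[string]any{
			"protocolVersion": "2024-11-05",
			"capabilities":    map[string]any{},
			"clientInfo":      map[string]any{"name": "example-client", "version": "0.1.0"},
		}),
		// Sent after the initialize response has been received.
		{JSONRPC: "2.0", Method: "notifications/initialized"},
		newRequest(2, "tools/call", map[string]any{
			"name": "fetch_html",
			"arguments": map[string]any{
				"url":              "https://example.com",
				"use_js_rendering": false,
			},
		}),
	}
	for _, m := range msgs {
		line, err := encodeLine(m)
		if err != nil {
			panic(err)
		}
		fmt.Print(line)
	}
}
```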
MCP Client Configuration
Add one of these configurations to your MCP client settings:
Local Development (with Go installed)
{
"mcpServers": {
"html-mcp-server-go": {
"command": "go",
"args": ["run", "./cmd", "-mode=mcp"],
"cwd": "/path/to/html-mcp-server-go"
}
}
}
Binary Execution (compiled binary)
{
"mcpServers": {
"html-mcp-server-go": {
"command": "/path/to/html-mcp-server",
"args": ["-mode=mcp"]
}
}
}
Docker Container
{
"mcpServers": {
"html-mcp-server-go": {
"command": "docker",
"args": ["run", "--rm", "-i", "html-mcp-server-go", "-mode=mcp"]
}
}
}
Note: Replace /path/to/html-mcp-server-go and /path/to/html-mcp-server with the actual paths to your project directory and binary file respectively.
Architecture
The project follows Clean Architecture principles:
├── cmd/ # Application entry point
├── domain/ # Core business logic
│ ├── model/ # Domain entities and value objects
│ └── port/ # Interfaces for external dependencies
├── application/ # Business logic and use cases
│ ├── service/ # Application services
│ └── usecase/ # Use case implementations
├── infrastructure/ # External adapters
│ ├── adapter/ # HTML parser adapter
│ ├── api/ # Fiber REST API server
│ ├── client/ # HTTP client with Chrome support
│ └── mcp/ # MCP server implementation
└── docker-compose.yaml # Container orchestration
Configuration
Environment Variables
- CHROME_BIN: Path to Chrome/Chromium binary
- CHROME_PATH: Alternative Chrome path
Command Line Flags
-mode: Server mode (mcp, api, auto) - default: auto
-port: API server port - default: 8085
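A flag-parsing sketch with the documented names and defaults follows. The validation of -mode and -port is an addition for illustration, parseFlags and config are hypothetical names, and what auto mode selects at runtime is not modeled here. A dedicated flag.FlagSet, rather than the global flag set, keeps the function testable.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

type config struct {
	Mode string
	Port int
}

// parseFlags applies the documented flags and defaults to args.
func parseFlags(args []string) (config, error) {
	fs := flag.NewFlagSet("html-mcp-server", flag.ContinueOnError)
	var cfg config
	fs.StringVar(&cfg.Mode, "mode", "auto", "server mode: mcp, api or auto")
	fs.IntVar(&cfg.Port, "port", 8085, "API server port")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	switch cfg.Mode {
	case "mcp", "api", "auto":
		// accepted
	default:
		return config{}, fmt.Errorf("invalid -mode %q (want mcp, api or auto)", cfg.Mode)
	}
	if cfg.Port < 1 || cfg.Port > 65535 {
		return config{}, fmt.Errorf("invalid -port %d", cfg.Port)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("mode=%s port=%d\n", cfg.Mode, cfg.Port)
}
```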
Docker Configuration
The Docker image includes:
- Alpine Linux base image
- Chrome/Chromium browser
- Non-root user for security
- Health checks
- Resource limits (1GB memory, 1 CPU core)
Development
Adding New Features
- Define domain models in domain/model/
- Create interfaces in domain/port/
- Implement business logic in application/service/
- Create adapters in infrastructure/
- Wire dependencies in cmd/main.go
Testing
# Run tests
go test ./...
# Run specific package tests
go test ./domain/model
go test ./application/service
# Integration testing with Docker
docker-compose up -d
curl http://localhost:8085/health
docker-compose down
Building
# Development build
go build -o html-mcp-server ./cmd
# Production build
CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o html-mcp-server ./cmd
# Docker build
docker build -t html-mcp-server-go .
Docsify Support
The server provides specialized support for Docsify documentation sites:
Features
- Auto-detection: Automatically detects Docsify sites
- Content Extraction: Separates sidebar, navbar, and main content
- Configuration Parsing: Extracts Docsify configuration data
- JavaScript Rendering: Handles dynamic content loading
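Auto-detection can in principle run on the raw HTML before any JavaScript executes, because a Docsify site's index.html normally assigns window.$docsify and loads the docsify script. The marker list and looksLikeDocsify below are a hypothetical heuristic, not the project's actual detector.

```go
package main

import (
	"fmt"
	"strings"
)

// docsifyMarkers are substrings that commonly appear in a Docsify site's
// index.html: the global config object and the script's file or CDN path.
var docsifyMarkers = []string{"window.$docsify", "docsify.min.js", "/docsify@"}

// looksLikeDocsify reports whether the raw HTML contains any marker.
func looksLikeDocsify(html string) bool {
	lower := strings.ToLower(html)
	for _, m := range docsifyMarkers {
		if strings.Contains(lower, m) {
			return true
		}
	}
	return false
}

func main() {
	page := `<div id="app"></div>
<script>window.$docsify = { name: "docsify", loadSidebar: true }</script>
<script src="//cdn.jsdelivr.net/npm/docsify@4"></script>`
	fmt.Println(looksLikeDocsify(page))
}
```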
Usage Example
curl -X POST http://localhost:8085/api/fetch \
-H "Content-Type: application/json" \
-d '{
"url": "https://docsify.js.org/#/quickstart",
"use_js_rendering": true,
"parse_docsify": true,
"wait_for_selector": ".markdown-section"
}'
Docsify Response Structure
{
"docsify_data": {
"sidebar_content": "Quick start\nWriting content...",
"navbar_content": "GitHub\nGitee",
"main_content": "Quick start\nIt is recommended...",
"config_data": {
"name": "docsify",
"repo": "https://github.com/docsifyjs/docsify/",
"loadSidebar": true
}
}
}
Error Handling
The server reports failures through HTTP status codes (in API mode) and a structured JSON error body:
HTTP Status Codes (API Mode)
- 200 OK: Successful request
- 400 Bad Request: Invalid request parameters
- 408 Request Timeout: Request timeout
- 500 Internal Server Error: Server-side errors
Error Response Format
{
"error": "ERROR_CODE",
"message": "Human-readable error description"
}
Common Error Codes
- INVALID_URL: Empty or malformed URL
- FETCH_ERROR: Network or HTTP errors
- PARSE_ERROR: HTML parsing failures
- TIMEOUT: Request timeout
- JS_RENDERING_ERROR: JavaScript rendering failures
- DOCSIFY_PARSE_ERROR: Docsify parsing failures
Performance
- Concurrent Processing: Goroutine-based async processing
- Connection Pooling: HTTP client reuses connections
- Memory Efficient: Streaming HTML processing
- Resource Limits: Configurable timeouts and limits
- Browser Optimization: Chrome launched with minimal resource usage
Security
- Non-root Container: Docker runs as non-root user
- Resource Limits: Memory and CPU limits in Docker
- Input Validation: URL validation and sanitization
- Timeout Protection: Request and browser timeouts
- CORS Configuration: Configurable cross-origin policies
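Input validation and timeout protection could be combined as in this sketch. Accepting only absolute http(s) URLs, clamping out-of-range timeouts instead of rejecting them, and treating zero or negative values as "use the default" are all assumptions; only the 30-second default and 300-second maximum come from the documented /api/fetch parameters. validateURL and clampTimeout are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validateURL accepts only absolute http(s) URLs that name a host.
func validateURL(raw string) (*url.URL, error) {
	if raw == "" {
		return nil, errors.New("INVALID_URL: url is required")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("INVALID_URL: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("INVALID_URL: unsupported scheme %q", u.Scheme)
	}
	if u.Host == "" {
		return nil, errors.New("INVALID_URL: missing host")
	}
	return u, nil
}

// clampTimeout applies the documented default (30 s) and ceiling (300 s).
func clampTimeout(seconds int) int {
	switch {
	case seconds <= 0:
		return 30
	case seconds > 300:
		return 300
	default:
		return seconds
	}
}

func main() {
	for _, raw := range []string{"https://example.com", "ftp://example.com", "example.com"} {
		_, err := validateURL(raw)
		fmt.Printf("%-22s valid=%v\n", raw, err == nil)
	}
	fmt.Println(clampTimeout(0), clampTimeout(45), clampTimeout(900))
}
```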
License
This project is open source and available under the MIT License.