mcp-broken-link-checker by davinoishi - MCP Server

Broken Link Checker MCP Server

A Model Context Protocol (MCP) server that provides broken link checking capabilities with streamable HTTP transport. This server enables AI assistants like Claude to check URLs, scan web pages, and crawl entire websites for broken links.

Features

Four powerful tools for link checking:
- check_url - Verify a single URL
- check_html - Check all links in HTML content
- check_page - Fetch and check all links on a webpage
- check_site - Recursively crawl and check an entire website
Streamable HTTP transport - Serverless-ready, no persistent connections required
JSON-RPC 2.0 - Standard MCP protocol support
Comprehensive results - Status codes, redirects, response times, and detailed error messages
Configurable options - Timeouts, redirect handling, custom user agents

Installation

npm install
npm run build

Usage

Running the Server

Development mode:

npm run dev

Production mode:

npm run build
npm start

The server will start on port 3000 by default (configurable via PORT environment variable).

Endpoints

Health check: GET /health
MCP endpoint: POST /mcp

Using with Claude Code

Add to your MCP settings:

claude mcp add --transport http broken-link-checker http://localhost:3000/mcp

Or manually add to your MCP configuration:

{
  "mcpServers": {
    "broken-link-checker": {
      "transport": "http",
      "url": "http://localhost:3000/mcp"
    }
  }
}

Tools

1. check_url

Check if a single URL is working or broken.

Parameters:

url (string, required) - The URL to check
timeout (number, optional) - Request timeout in milliseconds (default: 30000)
followRedirects (boolean, optional) - Whether to follow redirects (default: true)
maxRedirects (number, optional) - Maximum number of redirects (default: 5)
userAgent (string, optional) - Custom User-Agent string

Example:

{
  "tool": "check_url",
  "arguments": {
    "url": "https://example.com",
    "timeout": 10000
  }
}

Response:

{
  "url": "https://example.com",
  "status": 200,
  "statusText": "OK",
  "broken": false,
  "responseTime": 123
}

2. check_html

Check all links within HTML content.

Parameters:

html (string, required) - HTML content to check
baseUrl (string, required) - Base URL for resolving relative links
All options from check_url

Example:

{
  "tool": "check_html",
  "arguments": {
    "html": "<a href='/about'>About</a>",
    "baseUrl": "https://example.com"
  }
}

Response:

{
  "totalLinks": 5,
  "brokenLinks": 1,
  "workingLinks": 4,
  "results": [...]
}

3. check_page

Fetch a webpage and check all links on it.

Parameters:

url (string, required) - The URL of the page to check
All options from check_url

Example:

{
  "tool": "check_page",
  "arguments": {
    "url": "https://example.com"
  }
}

Response:

{
  "page": {
    "url": "https://example.com",
    "status": 200,
    "broken": false
  },
  "totalLinks": 25,
  "brokenLinks": 2,
  "workingLinks": 23,
  "links": [...]
}

4. check_site

Recursively crawl and check all links on a website.

Parameters:

url (string, required) - The starting URL of the site
maxPages (number, optional) - Maximum pages to check (default: 50)
All options from check_url

Example:

{
  "tool": "check_site",
  "arguments": {
    "url": "https://example.com",
    "maxPages": 10
  }
}

Response:

{
  "totalPages": 10,
  "totalLinks": 150,
  "brokenLinks": 5,
  "workingLinks": 145,
  "pages": [...]
}

Conversational Usage Examples

Once connected to Claude Code, you can use natural language:

"Check if https://example.com is working"
"Scan https://mysite.com and find all broken links"
"Crawl https://blog.example.com and report any 404 errors"
"Check all the links on this page: https://docs.example.com/api"

API Examples

Direct HTTP Request

curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "check_url",
      "arguments": {
        "url": "https://example.com"
      }
    }
  }'

List Available Tools

curl -X POST http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list"
  }'

Deployment

Docker

Create a Dockerfile:

FROM node:22-alpine

WORKDIR /app

COPY package*.json ./
RUN npm ci --production

COPY dist ./dist

EXPOSE 3000

CMD ["node", "dist/index.js"]

Build and run:

npm run build
docker build -t broken-link-checker-mcp .
docker run -p 3000:3000 broken-link-checker-mcp

Cloud Deployment

This server uses streamable HTTP transport and is serverless-ready. Deploy to:

Google Cloud Run - Scales to zero when idle
AWS Elastic Beanstalk - Managed Node.js hosting
Azure App Service - Integrated deployment
Render - One-click deployment
Railway - Git-based deployment

Architecture

This implementation uses:

Express.js - HTTP server
MCP SDK - Model Context Protocol implementation
got - HTTP client for link checking
parse5 - HTML parsing
TypeScript - Type-safe implementation
Zod - Runtime schema validation

Comparison with Original broken-link-checker

This MCP server provides:

✅ MCP protocol support for AI integration
✅ HTTP transport (stateless, serverless-ready)
✅ Simplified, focused API
✅ Modern TypeScript implementation

Original library offers:

Advanced HTML parsing options
Robot exclusion protocol support
CLI interface
EventEmitter-based streaming
More granular configuration

License

MIT

Contributing

Contributions welcome! This is a reference implementation that can be extended with:

Additional link checking options
Better error handling
Caching layer
Rate limiting
Authentication support
Webhook notifications

Credits

Inspired by broken-link-checker by Steven Vachon.