super-fetch-mcp-server by j0hanz - MCP Server

🚀 superFetch MCP Server


A Model Context Protocol (MCP) server that fetches, extracts, and transforms web content into AI-optimized formats using Mozilla Readability.

Quick Start · How to Choose a Tool · Tools · Configuration · Contributing

📦 Published to MCP Registry — Search for io.github.j0hanz/superfetch

[!CAUTION] This server can access URLs on behalf of AI assistants. Built-in SSRF protection blocks private IP ranges and cloud metadata endpoints, but exercise caution when deploying in sensitive environments.

✨ Features

Feature	Description
🧠 Smart Extraction	Mozilla Readability removes ads, navigation, and boilerplate
📄 Multiple Formats	JSONL semantic blocks or clean Markdown with YAML frontmatter
🔗 Link Discovery	Extract and classify internal/external links
⚡ Built-in Caching	Configurable TTL and max entries
🛡️ Security First	SSRF protection, URL validation, header sanitization
🔄 Resilient Fetching	Exponential backoff with jitter
📊 Monitoring	Stats resource for cache performance and health
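
The "Resilient Fetching" row refers to exponential backoff with jitter. As a rough illustration of the technique (not superFetch's actual code), the sketch below computes a retry delay with "full jitter". The helper name `backoffDelay`, the 1000 ms base, and the 10000 ms cap are assumptions made for this example.

```typescript
// Hypothetical helper, not part of superFetch's API.
// Returns the wait (in ms) before retry number `attempt` (1-based).
function backoffDelay(
  attempt: number,
  baseMs = 1000, // assumed base delay
  capMs = 10000, // assumed upper bound on the delay
  random: () => number = Math.random, // injectable for testing
): number {
  // Exponential growth: base, 2*base, 4*base, ... capped at capMs.
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  // Full jitter: pick a uniformly random delay in [0, ceiling).
  return Math.floor(random() * ceiling);
}
```

Randomizing the delay spreads out retries from clients that failed at the same moment, so they are less likely to hit the target site in lockstep.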

🎯 How to Choose a Tool

Use this guide to select the right tool for your web content extraction needs:

Decision Tree

Need web content for AI?
├─ Single URL?
│   ├─ Need structured semantic blocks → fetch-url (JSONL)
│   ├─ Need readable markdown → fetch-markdown
│   └─ Need links only → fetch-links
└─ Multiple URLs?
    └─ Use fetch-urls (batch processing)

Quick Reference Table

Tool	Best For	Output Format	Use When
`fetch-url`	Single page → structured content	JSONL semantic blocks	AI analysis, RAG pipelines, content parsing
`fetch-markdown`	Single page → readable format	Clean Markdown + TOC	Documentation, human-readable output
`fetch-links`	Link discovery & classification	URL array with types	Sitemap building, finding related pages
`fetch-urls`	Batch processing multiple pages	Multiple JSONL/Markdown	Comparing pages, bulk extraction

Common Use Cases

Task	Recommended Tool	Why
Parse a blog post for AI	`fetch-url`	Returns semantic blocks (headings, paragraphs, code)
Generate documentation	`fetch-markdown`	Clean markdown with optional TOC
Build a sitemap	`fetch-links`	Extracts and classifies all links
Compare multiple docs	`fetch-urls`	Parallel fetching with concurrency control
Extract article for RAG	`fetch-url` + `extractMainContent: true`	Removes ads/nav, keeps main content

Quick Start

Add superFetch to your MCP client configuration — no installation required!

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}

VS Code

Add to .vscode/mcp.json in your workspace:

{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}

With Custom Configuration

Configure superFetch behavior by adding environment variables to the env property:

{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "CACHE_TTL": "7200",
        "LOG_LEVEL": "debug",
        "FETCH_TIMEOUT": "60000"
      }
    }
  }
}

See the Configuration section below for all available options and presets.

Cursor

1. Open Cursor Settings.
2. Go to Features > MCP Servers.
3. Click "+ Add new global MCP server".
4. Add this configuration:

{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}

Tip: On Windows, if you encounter issues, try: cmd /c "npx -y @j0hanz/superfetch@latest --stdio"

Codex IDE

Add to your ~/.codex/config.toml file:

Basic Configuration:

[mcp_servers.superfetch]
command = "npx"
args = ["-y", "@j0hanz/superfetch@latest", "--stdio"]

With Environment Variables:

[mcp_servers.superfetch]
command = "npx"
args = ["-y", "@j0hanz/superfetch@latest", "--stdio"]
env = { CACHE_TTL = "7200", LOG_LEVEL = "debug", FETCH_TIMEOUT = "60000" }

Access config file: Click the gear icon → "Codex Settings > Open config.toml"

Documentation: Codex MCP Guide

Cline (VS Code Extension)

Open the Cline MCP settings file:

macOS:

code ~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json

Windows:

code %APPDATA%\Code\User\globalStorage\saoudrizwan.claude-dev\settings\cline_mcp_settings.json

Add the configuration:

{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "disabled": false,
      "autoApprove": []
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}

Claude Desktop (Config File Locations)

macOS:

# Open config file
open -e "$HOME/Library/Application Support/Claude/claude_desktop_config.json"

# Or with VS Code
code "$HOME/Library/Application Support/Claude/claude_desktop_config.json"

Windows:

code %APPDATA%\Claude\claude_desktop_config.json

Installation (Alternative)

Global Installation

npm install -g @j0hanz/superfetch

# Run in stdio mode
superfetch --stdio

# Run HTTP server
superfetch

From Source

git clone https://github.com/j0hanz/super-fetch-mcp-server.git
cd super-fetch-mcp-server
npm install
npm run build

Running the Server

HTTP Mode (default)

# Development with hot reload
npm run dev

# Production
npm start

The server runs at http://127.0.0.1:3000 with these endpoints:

Health check: GET /health
MCP endpoint: POST /mcp

stdio Mode (direct MCP integration)

node dist/index.js --stdio

Available Tools

Note: If extracted content exceeds MAX_INLINE_CONTENT_CHARS, the tool response includes a resourceUri and a resource_link content block instead of embedding the full text.
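
A client therefore has to handle two cases: inline text, or a link to read separately. The sketch below shows one way to branch on that. The block shapes (`text` and `resource_link` with a `uri` field) follow the general MCP content-block layout and are assumptions here; the names `ToolContent`, `Resolved`, and `resolveToolContent` are hypothetical.

```typescript
// Assumed content-block shapes; superFetch's real payload may carry more fields.
type TextBlock = { type: "text"; text: string };
type LinkBlock = { type: "resource_link"; uri: string; name?: string };
type ToolContent = TextBlock | LinkBlock;

type Resolved =
  | { kind: "inline"; text: string }
  | { kind: "link"; uri: string };

function resolveToolContent(content: ToolContent[]): Resolved {
  // Prefer the link: when one is present, the full text was not embedded.
  for (const block of content) {
    if (block.type === "resource_link") {
      return { kind: "link", uri: block.uri };
    }
  }
  // Otherwise concatenate whatever text blocks were returned inline.
  const text = content
    .filter((block): block is TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("\n");
  return { kind: "inline", text };
}
```

When the result is a link, the client would read the resource at that URI to obtain the full content.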

`fetch-url`

Fetches a webpage and converts it to AI-readable JSONL format with semantic content blocks.

Parameter	Type	Default	Description
`url`	string	required	URL to fetch
`extractMainContent`	boolean	`true`	Use Readability to extract main content
`includeMetadata`	boolean	`true`	Include page metadata (title, description)
`maxContentLength`	number	–	Maximum content length in characters
`customHeaders`	object	–	Custom HTTP headers for the request
`timeout`	number	`30000`	Request timeout in milliseconds (1000-60000)
`retries`	number	`3`	Number of retry attempts (1-10)

Example Response:

{
  "url": "https://example.com/article",
  "title": "Example Article",
  "fetchedAt": "2025-12-11T10:30:00.000Z",
  "contentBlocks": [
    {
      "type": "metadata",
      "title": "Example Article",
      "description": "A sample article"
    },
    { "type": "heading", "level": 1, "text": "Introduction" },
    {
      "type": "paragraph",
      "text": "This is the main content of the article..."
    },
    {
      "type": "code",
      "language": "javascript",
      "content": "console.log('Hello');"
    }
  ],
  "cached": false
}
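
To consume this response in typed code, the blocks can be modeled as a discriminated union. The sketch below covers only the four block types shown in the example, with the field names taken from it. The type name `Block` and the helper `toPlainText` are hypothetical, and other block types (list, table, image) are left out because their fields are not shown here.

```typescript
// Field names mirror the example response; anything else is an assumption.
type Block =
  | { type: "metadata"; title?: string; description?: string }
  | { type: "heading"; level: number; text: string }
  | { type: "paragraph"; text: string }
  | { type: "code"; language?: string; content: string };

// Flattens blocks into plain text, e.g. for feeding a prompt or an index.
function toPlainText(blocks: Block[]): string {
  const parts: string[] = [];
  for (const block of blocks) {
    switch (block.type) {
      case "heading":
        parts.push("#".repeat(block.level) + " " + block.text);
        break;
      case "paragraph":
        parts.push(block.text);
        break;
      case "code":
        parts.push(block.content);
        break;
      case "metadata":
        break; // metadata is skipped in the text body
    }
  }
  return parts.join("\n\n");
}
```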

`fetch-links`

Extracts hyperlinks from a webpage with classification. Supports filtering, image links, and link limits.

Parameter	Type	Default	Description
`url`	string	required	URL to extract links from
`includeExternal`	boolean	`true`	Include external links
`includeInternal`	boolean	`true`	Include internal links
`includeImages`	boolean	`false`	Include image links (img src attributes)
`maxLinks`	number	–	Maximum number of links to return (1-1000)
`filterPattern`	string	–	Regex pattern to filter links (matches href)
`customHeaders`	object	–	Custom HTTP headers for the request
`timeout`	number	`30000`	Request timeout in milliseconds (1000-60000)
`retries`	number	`3`	Number of retry attempts (1-10)

Example Response:

{
  "url": "https://example.com/",
  "linkCount": 15,
  "links": [
    {
      "href": "https://example.com/about",
      "text": "About Us",
      "type": "internal"
    },
    {
      "href": "https://github.com/example",
      "text": "GitHub",
      "type": "external"
    },
    { "href": "https://example.com/logo.png", "text": "", "type": "image" }
  ],
  "cached": false,
  "truncated": false
}
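
The internal/external distinction in the `type` field can be pictured as a host comparison. The sketch below assumes that "internal" means "same host as the fetched page"; the server's actual rule may differ (for example in how it treats subdomains). The function name `classifyLink` is hypothetical.

```typescript
type LinkType = "internal" | "external";

// Assumption: a link is internal when it resolves to the page's own host.
function classifyLink(href: string, pageUrl: string): LinkType {
  const page = new URL(pageUrl);
  const target = new URL(href, page); // also resolves relative hrefs
  return target.host === page.host ? "internal" : "external";
}
```

For instance, `classifyLink("/about", "https://example.com/")` resolves the relative path against the page URL before comparing hosts.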

`fetch-markdown`

Fetches a webpage and converts it to clean Markdown with optional table of contents.

Parameter	Type	Default	Description
`url`	string	required	URL to fetch
`extractMainContent`	boolean	`true`	Extract main content only
`includeMetadata`	boolean	`true`	Include YAML frontmatter
`maxContentLength`	number	–	Maximum content length in characters
`generateToc`	boolean	`false`	Generate table of contents from headings
`customHeaders`	object	–	Custom HTTP headers for the request
`timeout`	number	`30000`	Request timeout in milliseconds (1000-60000)
`retries`	number	`3`	Number of retry attempts (1-10)

Example Response:

{
  "url": "https://example.com/docs",
  "title": "Documentation",
  "fetchedAt": "2025-12-11T10:30:00.000Z",
  "markdown": "---\ntitle: Documentation\nsource: \"https://example.com/docs\"\n---\n\n# Getting Started\n\nWelcome to our documentation...\n\n## Installation\n\n```bash\nnpm install example\n```",
  "toc": [
    { "level": 1, "text": "Getting Started", "slug": "getting-started" },
    { "level": 2, "text": "Installation", "slug": "installation" }
  ],
  "cached": false,
  "truncated": false
}
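
The `slug` values in `toc` look like lowercased, hyphenated heading text. The sketch below is one slug rule that reproduces the two slugs in the example; it is an assumption, not the server's documented algorithm, and `slugify` is a hypothetical name.

```typescript
// Assumed slug rule: lowercase, drop punctuation, join words with hyphens.
function slugify(heading: string): string {
  return heading
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, "") // remove anything but letters, digits, spaces, hyphens
    .replace(/\s+/g, "-"); // collapse whitespace runs into single hyphens
}
```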

`fetch-urls` (Batch)

Fetches multiple URLs in parallel with concurrency control. Ideal for comparing content or processing multiple pages efficiently.

Parameter	Type	Default	Description
`urls`	string[]	required	Array of URLs to fetch (1-10 URLs)
`extractMainContent`	boolean	`true`	Use Readability to extract main content
`includeMetadata`	boolean	`true`	Include page metadata
`maxContentLength`	number	–	Maximum content length per URL in characters
`format`	string	`'jsonl'`	Output format: `'jsonl'` or `'markdown'`
`concurrency`	number	`3`	Maximum concurrent requests (1-5)
`continueOnError`	boolean	`true`	Continue processing if some URLs fail
`customHeaders`	object	–	Custom HTTP headers for all requests
`timeout`	number	`30000`	Request timeout in milliseconds (1000-60000)
`retries`	number	`3`	Number of retry attempts (1-10)

Example Response:

{
  "results": [
    {
      "url": "https://example.com",
      "success": true,
      "title": "Example",
      "content": "...",
      "cached": false
    },
    {
      "url": "https://example.org",
      "success": true,
      "title": "Example Org",
      "content": "...",
      "cached": false
    }
  ],
  "summary": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "cached": 0,
    "totalContentBlocks": 15
  },
  "fetchedAt": "2025-12-11T10:30:00.000Z"
}
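
To illustrate what "parallel with concurrency control" and `continueOnError` mean in practice, here is a minimal worker-pool sketch. It is not superFetch's implementation; `mapWithConcurrency` and `Settled` are hypothetical names, and the sketch always behaves as if `continueOnError` were `true`.

```typescript
type Settled<R> = { ok: true; value: R } | { ok: false; error: unknown };

// Runs `worker` over `items` with at most `limit` calls in flight.
// Failures are recorded per item instead of aborting the batch.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<Settled<R>[]> {
  const results: Settled<R>[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, before any await
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results; // same order as `items`
}
```

Each runner pulls the next URL as soon as it finishes the previous one, so a slow page delays only its own slot rather than the whole batch.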

Resources

URI	Description
`superfetch://stats`	Server statistics and cache metrics
`superfetch://health`	Real-time server health and dependency status
Dynamic resources	Cached content available via resource subscriptions

Configuration

Configuration Presets

Default (Recommended) — No configuration needed

{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}

Debug Mode — Verbose logging and no cache

VS Code (.vscode/mcp.json):

{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "LOG_LEVEL": "debug",
        "CACHE_ENABLED": "false"
      }
    }
  }
}

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "LOG_LEVEL": "debug",
        "CACHE_ENABLED": "false"
      }
    }
  }
}

Cursor (MCP settings):

{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "LOG_LEVEL": "debug",
        "CACHE_ENABLED": "false"
      }
    }
  }
}

Performance Mode — Aggressive caching for speed

{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "CACHE_TTL": "7200",
        "CACHE_MAX_KEYS": "500",
        "LOG_LEVEL": "warn"
      }
    }
  }
}

Custom User Agent — For sites that block bots

{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
      }
    }
  }
}

Slow Networks / CI/CD — Extended timeouts

{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "FETCH_TIMEOUT": "60000",
        "CACHE_ENABLED": "false",
        "LOG_LEVEL": "warn"
      }
    }
  }
}

Available Environment Variables

Configure superFetch behavior by adding environment variables to your MCP client configuration's env property.
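
Environment values always arrive as strings, which is why the JSON examples quote numbers such as "7200". The sketch below shows how a bounded integer setting like `FETCH_TIMEOUT` could be validated. Falling back to the default on invalid or out-of-range input is an assumption; the server might instead clamp or reject such values. `readBoundedInt` is a hypothetical helper.

```typescript
// Hypothetical validation helper; fallback-on-invalid is an assumed policy.
function readBoundedInt(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number,
  min: number,
  max: number,
): number {
  const raw = env[name];
  if (raw === undefined) return fallback; // variable not set
  const value = Number.parseInt(raw, 10);
  if (Number.isNaN(value) || value < min || value > max) return fallback;
  return value;
}
```

With the documented range for `FETCH_TIMEOUT`, a call would look like `readBoundedInt(env, "FETCH_TIMEOUT", 30000, 5000, 120000)`, where `env` is the process environment.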

🌐 Fetcher Settings

Variable	Default	Valid Values	Description
`FETCH_TIMEOUT`	`30000`	`5000`-`120000`	Request timeout in milliseconds (5s-2min)
`USER_AGENT`	`superFetch-MCP/1.0`	Any valid user agent	Custom user agent for requests (useful for sites blocking bots)

💾 Cache Settings

Variable	Default	Valid Values	Description
`CACHE_ENABLED`	`true`	`true` / `false`	Enable response caching
`CACHE_TTL`	`3600`	`60`-`86400`	Cache lifetime in seconds (1min-24hrs)
`CACHE_MAX_KEYS`	`100`	`10`-`1000`	Maximum number of cached entries
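
To make the interaction between `CACHE_TTL` and `CACHE_MAX_KEYS` concrete, here is a small self-contained cache sketch. It is illustrative only: superFetch lists node-cache in its tech stack, and the oldest-first eviction used here is an assumption about what could happen when the key limit is reached. The class name `TtlCache` is hypothetical.

```typescript
// Illustrative TTL cache with a key limit; not superFetch's implementation.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly ttlSeconds = 3600, // mirrors the CACHE_TTL default
    private readonly maxKeys = 100, // mirrors the CACHE_MAX_KEYS default
    private readonly now: () => number = Date.now, // injectable clock
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    if (!this.entries.has(key) && this.entries.size >= this.maxKeys) {
      // Assumed policy: evict the oldest inserted key to make room.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlSeconds * 1000 });
  }
}
```

A longer TTL raises the hit rate but serves staler pages, while a larger key limit trades memory for fewer evictions.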

📦 Output Settings

Variable	Default	Valid Values	Description
`MAX_INLINE_CONTENT_CHARS`	`20000`	`1000`-`200000`	Inline content limit before returning a `resource_link` instead

📝 Logging Settings

Variable	Default	Valid Values	Description
`LOG_LEVEL`	`info`	`debug` / `info` / `warn` / `error`	Logging verbosity level
`ENABLE_LOGGING`	`true`	`true` / `false`	Enable/disable all logging

🔍 Extraction Settings

Variable	Default	Valid Values	Description
`EXTRACT_MAIN_CONTENT`	`true`	`true` / `false`	Use Mozilla Readability to extract main content
`INCLUDE_METADATA`	`true`	`true` / `false`	Include page metadata (title, description, author)

🛡️ Security Settings

Variable	Default	Description
`API_KEY`	–	API key for HTTP authentication (required in HTTP mode)
`ALLOW_REMOTE`	`false`	Allow binding to non-loopback interfaces

Rate Limiting

Variable	Default	Valid Values	Description
`RATE_LIMIT_ENABLED`	`true`	`true` / `false`	Enable/disable HTTP rate limiting
`RATE_LIMIT_MAX`	`100`	`1`-`10000`	Max requests per window per IP
`RATE_LIMIT_WINDOW_MS`	`60000`	`1000`-`3600000`	Rate limit window in milliseconds
`RATE_LIMIT_CLEANUP_MS`	`60000`	`10000`-`3600000`	Cleanup interval for limiter entries

HTTP Mode Configuration

HTTP Mode (Advanced) — For running as a standalone HTTP server

superFetch can run as an HTTP server for custom integrations. HTTP mode requires additional configuration and an API_KEY for authenticated access (send Authorization: Bearer <key> or X-API-Key: <key>).
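
For a custom integration, the two accepted header forms can be built as in the sketch below. The header names come from the paragraph above; the helper `authHeaders` and its `scheme` parameter are hypothetical.

```typescript
// Builds one of the two authentication headers described for HTTP mode.
function authHeaders(
  apiKey: string,
  scheme: "bearer" | "x-api-key" = "bearer",
): Record<string, string> {
  return scheme === "bearer"
    ? { Authorization: `Bearer ${apiKey}` }
    : { "X-API-Key": apiKey };
}
```

The returned object can be passed as the `headers` option of an HTTP client when calling the `/mcp` endpoint.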

Start HTTP Server

# API_KEY is required in HTTP mode
API_KEY="your-api-key" npx -y @j0hanz/superfetch@latest
# Server runs at http://127.0.0.1:3000

HTTP-Specific Environment Variables

Variable	Default	Description
`PORT`	`3000`	HTTP server port
`HOST`	`127.0.0.1`	HTTP server host (`0.0.0.0` for Docker/K8s)
`ALLOWED_ORIGINS`	`[]`	Comma-separated CORS origins
`CORS_ALLOW_ALL`	`false`	Allow all CORS origins (dev only, security risk)
`SESSION_TTL_MS`	`1800000`	Session time-to-live in milliseconds (30 mins)
`MAX_SESSIONS`	`200`	Maximum number of active sessions

VS Code HTTP Mode Setup

{
  "servers": {
    "superFetch": {
      "type": "http",
      "url": "http://127.0.0.1:3000/mcp",
      "headers": {
        "Authorization": "Bearer <key>"
      }
    }
  }
}

Docker/Kubernetes Example

API_KEY="your-api-key" ALLOW_REMOTE=true PORT=8080 HOST=0.0.0.0 ALLOWED_ORIGINS=https://myapp.com npx -y @j0hanz/superfetch@latest

Configuration Cookbook

Use Case	Configuration
🐛 Debugging issues	`LOG_LEVEL=debug`, `CACHE_ENABLED=false`
🚀 Maximum performance	`CACHE_TTL=7200`, `CACHE_MAX_KEYS=500`, `LOG_LEVEL=error`
🌐 Slow target sites	`FETCH_TIMEOUT=60000`
🤖 Bypass bot detection	`USER_AGENT="Mozilla/5.0 (compatible; MyBot/1.0)"`
🔄 CI/CD (always fresh)	`CACHE_ENABLED=false`, `FETCH_TIMEOUT=60000`, `LOG_LEVEL=warn`
📊 Production monitoring	`LOG_LEVEL=warn` or `LOG_LEVEL=error`

Content Block Types

JSONL output includes semantic content blocks:

Type	Description
`metadata`	Page title, description, author, URL, timestamp
`heading`	Headings (h1-h6) with level indicator
`paragraph`	Text paragraphs
`list`	Ordered/unordered lists
`code`	Code blocks with language
`table`	Tables with headers and rows
`image`	Images with src and alt text

Security

SSRF Protection

Blocked destinations:

Localhost and loopback addresses
Private IP ranges (10.x.x.x, 172.16-31.x.x, 192.168.x.x)
Cloud metadata endpoints (AWS, GCP, Azure)
IPv6 link-local and unique local addresses
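
As a rough picture of what such a check involves, the sketch below tests a dotted-quad IPv4 address against the loopback and private ranges listed above, plus the 169.254.0.0/16 link-local range that contains the common cloud metadata address 169.254.169.254. It is a simplification and not superFetch's code: a real guard also has to resolve hostnames, handle IPv6, and re-check after redirects. `isBlockedIPv4` is a hypothetical name.

```typescript
// Simplified IPv4-only check; illustrative, not the server's implementation.
function isBlockedIPv4(address: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(address);
  if (!match) throw new Error(`Not a dotted-quad IPv4 address: ${address}`);
  const octets = match.slice(1).map(Number);
  if (octets.some((octet) => octet > 255)) {
    throw new Error(`Octet out of range in: ${address}`);
  }
  const [a, b] = octets;
  return (
    a === 127 || // loopback, 127.0.0.0/8
    a === 10 || // private, 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // private, 172.16.0.0/12
    (a === 192 && b === 168) || // private, 192.168.0.0/16
    (a === 169 && b === 254) // link-local, includes 169.254.169.254
  );
}
```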

Header Sanitization

Blocked headers: host, authorization, cookie, x-forwarded-for, x-real-ip, proxy-authorization
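
Applied to the `customHeaders` parameter, this kind of filtering could look like the sketch below. The blocked names are the ones listed above; matching case-insensitively and silently dropping blocked entries are assumptions, and `sanitizeHeaders` is a hypothetical name.

```typescript
// Header names taken from the list above, stored lowercase for comparison.
const BLOCKED_HEADERS = new Set([
  "host",
  "authorization",
  "cookie",
  "x-forwarded-for",
  "x-real-ip",
  "proxy-authorization",
]);

// Returns a copy of `headers` without any blocked entries.
function sanitizeHeaders(headers: Record<string, string>): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!BLOCKED_HEADERS.has(name.toLowerCase())) {
      safe[name] = value;
    }
  }
  return safe;
}
```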

Rate Limiting

Default: 100 requests/minute per IP. Configure with RATE_LIMIT_MAX and RATE_LIMIT_WINDOW_MS.
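
The two settings map naturally onto a fixed-window counter per client IP. The sketch below shows that technique under the default values; whether superFetch uses a fixed window or another algorithm is not stated here, so treat this as an assumption. `FixedWindowLimiter` is a hypothetical name.

```typescript
// Fixed-window rate limiter keyed by IP; illustrative only.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(
    private readonly max = 100, // mirrors the RATE_LIMIT_MAX default
    private readonly windowMs = 60000, // mirrors the RATE_LIMIT_WINDOW_MS default
    private readonly now: () => number = Date.now, // injectable clock
  ) {}

  // Returns true if the request is allowed, false if the IP is over its limit.
  allow(ip: string): boolean {
    const time = this.now();
    const current = this.windows.get(ip);
    if (!current || time - current.start >= this.windowMs) {
      this.windows.set(ip, { start: time, count: 1 }); // start a new window
      return true;
    }
    if (current.count >= this.max) return false;
    current.count += 1;
    return true;
  }
}
```

Stale entries in a map like this would need periodic removal, which is the kind of work a cleanup interval such as `RATE_LIMIT_CLEANUP_MS` exists for.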

HTTP Mode Endpoints

When running without --stdio, the following endpoints are available:

Endpoint	Method	Description
`/health`	GET	Health check with uptime and version
`/mcp`	POST	MCP request handling (requires session)
`/mcp`	GET	SSE stream for notifications
`/mcp`	DELETE	Close session

Sessions are managed via the mcp-session-id header and expire after a 30-minute TTL.

Development

Scripts

Command	Description
`npm run dev`	Development server with hot reload
`npm run build`	Compile TypeScript
`npm start`	Production server
`npm run lint`	Run ESLint
`npm run type-check`	TypeScript type checking
`npm run format`	Format with Prettier
`npm test`	Run Vitest tests
`npm run test:coverage`	Run tests with coverage
`npm run bench`	Run minimal performance benchmark
`npm run release`	Create new release
`npm run knip`	Find unused exports/dependencies
`npm run knip:fix`	Auto-fix unused code

Tech Stack

Category	Technology
Runtime	Node.js ≥20.0.0
Language	TypeScript 5.9
MCP SDK	@modelcontextprotocol/sdk ^1.25.1
Content Extraction	@mozilla/readability ^0.6.0
HTML Parsing	Cheerio ^1.1.2, LinkeDOM ^0.18.12
Markdown	Turndown ^7.2.2
HTTP	Express ^5.2.1, Axios ^1.7.9
Caching	node-cache ^5.1.2
Validation	Zod ^3.24.1

Contributing

1. Fork the repository
2. Create a feature branch: git checkout -b feature/amazing-feature
3. Ensure linting passes: npm run lint
4. Run tests: npm test
5. Commit changes: git commit -m 'Add amazing feature'
6. Push: git push origin feature/amazing-feature
7. Open a Pull Request

For examples of other MCP servers, see: github.com/modelcontextprotocol/servers
