super-fetch-mcp-server

j0hanz/super-fetch-mcp-server

3.2

If you are the rightful owner of super-fetch-mcp-server and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.

SuperFetch is a Model Context Protocol (MCP) server designed to fetch, extract, and transform web content into AI-optimized formats using Mozilla Readability.

Tools
3
Resources
0
Prompts
0

🚀 superFetch MCP Server

npm version Node.js TypeScript

One-Click Install

Install with NPX in VS Code Install with NPX in VS Code Insiders

Install in Cursor

A Model Context Protocol (MCP) server that fetches, extracts, and transforms web content into AI-optimized formats using Mozilla Readability.

Quick Start · How to Choose a Tool · Tools · Configuration · Contributing

📦 Published to MCP Registry — Search for io.github.j0hanz/superfetch


[!CAUTION] This server can access URLs on behalf of AI assistants. Built-in SSRF protection blocks private IP ranges and cloud metadata endpoints, but exercise caution when deploying in sensitive environments.

✨ Features

FeatureDescription
🧠 Smart ExtractionMozilla Readability removes ads, navigation, and boilerplate
📄 Multiple FormatsJSONL semantic blocks or clean Markdown with YAML frontmatter
🔗 Link DiscoveryExtract and classify internal/external links
Built-in CachingConfigurable TTL and max entries
🛡️ Security FirstSSRF protection, URL validation, header sanitization
🔄 Resilient FetchingExponential backoff with jitter
📊 MonitoringStats resource for cache performance and health

🎯 How to Choose a Tool

Use this guide to select the right tool for your web content extraction needs:

Decision Tree

Need web content for AI?
├─ Single URL?
│   ├─ Need structured semantic blocks → fetch-url (JSONL)
│   ├─ Need readable markdown → fetch-markdown
│   └─ Need links only → fetch-links
└─ Multiple URLs?
    └─ Use fetch-urls (batch processing)

Quick Reference Table

ToolBest ForOutput FormatUse When
fetch-urlSingle page → structured contentJSONL semantic blocksAI analysis, RAG pipelines, content parsing
fetch-markdownSingle page → readable formatClean Markdown + TOCDocumentation, human-readable output
fetch-linksLink discovery & classificationURL array with typesSitemap building, finding related pages
fetch-urlsBatch processing multiple pagesMultiple JSONL/MarkdownComparing pages, bulk extraction

Common Use Cases

TaskRecommended ToolWhy
Parse a blog post for AIfetch-urlReturns semantic blocks (headings, paragraphs, code)
Generate documentationfetch-markdownClean markdown with optional TOC
Build a sitemapfetch-linksExtracts and classifies all links
Compare multiple docsfetch-urlsParallel fetching with concurrency control
Extract article for RAGfetch-url + extractMainContent: trueRemoves ads/nav, keeps main content

Quick Start

Add superFetch to your MCP client configuration — no installation required!

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}

VS Code

Add to .vscode/mcp.json in your workspace:

{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}

With Custom Configuration

Configure SuperFetch behavior by adding environment variables to the env property:

{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "CACHE_TTL": "7200",
        "LOG_LEVEL": "debug",
        "FETCH_TIMEOUT": "60000"
      }
    }
  }
}

See Configuration section below for all available options and presets.

Cursor

  1. Open Cursor Settings
  2. Go to Features > MCP Servers
  3. Click "+ Add new global MCP server"
  4. Add this configuration:
{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}

Tip: On Windows, if you encounter issues, try: cmd /c "npx -y @j0hanz/superfetch@latest --stdio"

Codex IDE

Add to your ~/.codex/config.toml file:

Basic Configuration:

[mcp_servers.superfetch]
command = "npx"
args = ["-y", "@j0hanz/superfetch@latest", "--stdio"]

With Environment Variables:

[mcp_servers.superfetch]
command = "npx"
args = ["-y", "@j0hanz/superfetch@latest", "--stdio"]
env = { CACHE_TTL = "7200", LOG_LEVEL = "debug", FETCH_TIMEOUT = "60000" }

Access config file: Click the gear icon → "Codex Settings > Open config.toml"

Documentation: Codex MCP Guide

Cline (VS Code Extension)

Open the Cline MCP settings file:

macOS:

code ~/Library/Application\ Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json

Windows:

code %APPDATA%\Code\User\globalStorage\saoudrizwan.claude-dev\settings\cline_mcp_settings.json

Add the configuration:

{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
Windsurf

Add to ./codeium/windsurf/model_config.json:

{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}
Claude Desktop (Config File Locations)

macOS:

# Open config file
open -e "$HOME/Library/Application Support/Claude/claude_desktop_config.json"

# Or with VS Code
code "$HOME/Library/Application Support/Claude/claude_desktop_config.json"

Windows:

code %APPDATA%\Claude\claude_desktop_config.json

Installation (Alternative)

Global Installation

npm install -g @j0hanz/superfetch

# Run in stdio mode
superfetch --stdio

# Run HTTP server
superfetch

From Source

git clone https://github.com/j0hanz/super-fetch-mcp-server.git
cd super-fetch-mcp-server
npm install
npm run build

Running the Server

HTTP Mode (default)
# Development with hot reload
npm run dev

# Production
npm start

Server runs at http://127.0.0.1:3000:

  • Health check: GET /health
  • MCP endpoint: POST /mcp
stdio Mode (direct MCP integration)
node dist/index.js --stdio

Available Tools

Note: If extracted content exceeds MAX_INLINE_CONTENT_CHARS, the tool response includes a resourceUri and a resource_link content block instead of embedding the full text.

fetch-url

Fetches a webpage and converts it to AI-readable JSONL format with semantic content blocks.

ParameterTypeDefaultDescription
urlstringrequiredURL to fetch
extractMainContentbooleantrueUse Readability to extract main content
includeMetadatabooleantrueInclude page metadata (title, description)
maxContentLengthnumberMaximum content length in characters
customHeadersobjectCustom HTTP headers for the request
timeoutnumber30000Request timeout in milliseconds (1000-60000)
retriesnumber3Number of retry attempts (1-10)

Example Response:

{
  "url": "https://example.com/article",
  "title": "Example Article",
  "fetchedAt": "2025-12-11T10:30:00.000Z",
  "contentBlocks": [
    {
      "type": "metadata",
      "title": "Example Article",
      "description": "A sample article"
    },
    { "type": "heading", "level": 1, "text": "Introduction" },
    {
      "type": "paragraph",
      "text": "This is the main content of the article..."
    },
    {
      "type": "code",
      "language": "javascript",
      "content": "console.log('Hello');"
    }
  ],
  "cached": false
}

fetch-links

Extracts hyperlinks from a webpage with classification. Supports filtering, image links, and link limits.

ParameterTypeDefaultDescription
urlstringrequiredURL to extract links from
includeExternalbooleantrueInclude external links
includeInternalbooleantrueInclude internal links
includeImagesbooleanfalseInclude image links (img src attributes)
maxLinksnumberMaximum number of links to return (1-1000)
filterPatternstringRegex pattern to filter links (matches href)
customHeadersobjectCustom HTTP headers for the request
timeoutnumber30000Request timeout in milliseconds (1000-60000)
retriesnumber3Number of retry attempts (1-10)

Example Response:

{
  "url": "https://example.com/",
  "linkCount": 15,
  "links": [
    {
      "href": "https://example.com/about",
      "text": "About Us",
      "type": "internal"
    },
    {
      "href": "https://github.com/example",
      "text": "GitHub",
      "type": "external"
    },
    { "href": "https://example.com/logo.png", "text": "", "type": "image" }
  ],
  "cached": false,
  "truncated": false
}

fetch-markdown

Fetches a webpage and converts it to clean Markdown with optional table of contents.

ParameterTypeDefaultDescription
urlstringrequiredURL to fetch
extractMainContentbooleantrueExtract main content only
includeMetadatabooleantrueInclude YAML frontmatter
maxContentLengthnumberMaximum content length in characters
generateTocbooleanfalseGenerate table of contents from headings
customHeadersobjectCustom HTTP headers for the request
timeoutnumber30000Request timeout in milliseconds (1000-60000)
retriesnumber3Number of retry attempts (1-10)

Example Response:

{
  "url": "https://example.com/docs",
  "title": "Documentation",
  "fetchedAt": "2025-12-11T10:30:00.000Z",
  "markdown": "---\ntitle: Documentation\nsource: \"https://example.com/docs\"\n---\n\n# Getting Started\n\nWelcome to our documentation...\n\n## Installation\n\n```bash\nnpm install example\n```",
  "toc": [
    { "level": 1, "text": "Getting Started", "slug": "getting-started" },
    { "level": 2, "text": "Installation", "slug": "installation" }
  ],
  "cached": false,
  "truncated": false
}

fetch-urls (Batch)

Fetches multiple URLs in parallel with concurrency control. Ideal for comparing content or processing multiple pages efficiently.

ParameterTypeDefaultDescription
urlsstring[]requiredArray of URLs to fetch (1-10 URLs)
extractMainContentbooleantrueUse Readability to extract main content
includeMetadatabooleantrueInclude page metadata
maxContentLengthnumberMaximum content length per URL in characters
formatstring'jsonl'Output format: 'jsonl' or 'markdown'
concurrencynumber3Maximum concurrent requests (1-5)
continueOnErrorbooleantrueContinue processing if some URLs fail
customHeadersobjectCustom HTTP headers for all requests
timeoutnumber30000Request timeout in milliseconds (1000-60000)
retriesnumber3Number of retry attempts (1-10)

Example Output:

{
  "results": [
    {
      "url": "https://example.com",
      "success": true,
      "title": "Example",
      "content": "...",
      "cached": false
    },
    {
      "url": "https://example.org",
      "success": true,
      "title": "Example Org",
      "content": "...",
      "cached": false
    }
  ],
  "summary": {
    "total": 2,
    "successful": 2,
    "failed": 0,
    "cached": 0,
    "totalContentBlocks": 15
  },
  "fetchedAt": "2024-12-11T10:30:00.000Z"
}

Resources

URIDescription
superfetch://statsServer statistics and cache metrics
superfetch://healthReal-time server health and dependency status
Dynamic resourcesCached content available via resource subscriptions

Configuration

Configuration Presets

Default (Recommended) — No configuration needed
{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"]
    }
  }
}
Debug Mode — Verbose logging and no cache

VS Code (.vscode/mcp.json):

{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "LOG_LEVEL": "debug",
        "CACHE_ENABLED": "false"
      }
    }
  }
}

Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "LOG_LEVEL": "debug",
        "CACHE_ENABLED": "false"
      }
    }
  }
}

Cursor (MCP settings):

{
  "mcpServers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "LOG_LEVEL": "debug",
        "CACHE_ENABLED": "false"
      }
    }
  }
}
Performance Mode — Aggressive caching for speed
{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "CACHE_TTL": "7200",
        "CACHE_MAX_KEYS": "500",
        "LOG_LEVEL": "warn"
      }
    }
  }
}
Custom User Agent — For sites that block bots
{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
      }
    }
  }
}
Slow Networks / CI/CD — Extended timeouts
{
  "servers": {
    "superFetch": {
      "command": "npx",
      "args": ["-y", "@j0hanz/superfetch@latest", "--stdio"],
      "env": {
        "FETCH_TIMEOUT": "60000",
        "CACHE_ENABLED": "false",
        "LOG_LEVEL": "warn"
      }
    }
  }
}

Available Environment Variables

Configure SuperFetch behavior by adding environment variables to your MCP client configuration's env property.

🌐 Fetcher Settings
VariableDefaultValid ValuesDescription
FETCH_TIMEOUT300005000-120000Request timeout in milliseconds (5s-2min)
USER_AGENTsuperFetch-MCP/1.0Any valid user agentCustom user agent for requests (useful for sites blocking bots)
💾 Cache Settings
VariableDefaultValid ValuesDescription
CACHE_ENABLEDtruetrue / falseEnable response caching
CACHE_TTL360060-86400Cache lifetime in seconds (1min-24hrs)
CACHE_MAX_KEYS10010-1000Maximum number of cached entries
📦 Output Settings
VariableDefaultValid ValuesDescription
MAX_INLINE_CONTENT_CHARS200001000-200000Inline content limit before returning a resource_link instead
📝 Logging Settings
VariableDefaultValid ValuesDescription
LOG_LEVELinfodebug / info / warn / errorLogging verbosity level
ENABLE_LOGGINGtruetrue / falseEnable/disable all logging
🔍 Extraction Settings
VariableDefaultValid ValuesDescription
EXTRACT_MAIN_CONTENTtruetrue / falseUse Mozilla Readability to extract main content
INCLUDE_METADATAtruetrue / falseInclude page metadata (title, description, author)
🛡️ Security Settings
VariableDefaultDescription
API_KEY-API Key for HTTP authentication (required for HTTP mode)
ALLOW_REMOTEfalseAllow binding to non-loopback interfaces
Rate Limiting
VariableDefaultValid ValuesDescription
RATE_LIMIT_ENABLEDtruetrue / falseEnable/disable HTTP rate limiting
RATE_LIMIT_MAX1001-10000Max requests per window per IP
RATE_LIMIT_WINDOW_MS600001000-3600000Rate limit window in milliseconds
RATE_LIMIT_CLEANUP_MS6000010000-3600000Cleanup interval for limiter entries

HTTP Mode Configuration

HTTP Mode (Advanced) — For running as a standalone HTTP server

SuperFetch can run as an HTTP server for custom integrations. HTTP mode requires additional configuration and an API_KEY for authenticated access (send Authorization: Bearer <key> or X-API-Key: <key>).

Start HTTP Server
npx -y @j0hanz/superfetch@latest
# Server runs at http://127.0.0.1:3000
HTTP-Specific Environment Variables
VariableDefaultDescription
PORT3000HTTP server port
HOST127.0.0.1HTTP server host (0.0.0.0 for Docker/K8s)
ALLOWED_ORIGINS[]Comma-separated CORS origins
CORS_ALLOW_ALLfalseAllow all CORS origins (dev only, security risk)
SESSION_TTL_MS1800000Session time-to-live in milliseconds (30 mins)
MAX_SESSIONS200Maximum number of active sessions
VS Code HTTP Mode Setup
{
  "servers": {
    "superFetch": {
      "type": "http",
      "url": "http://127.0.0.1:3000/mcp"
    }
  }
}
Docker/Kubernetes Example
PORT=8080 HOST=0.0.0.0 ALLOWED_ORIGINS=https://myapp.com npx @j0hanz/superfetch@latest

Configuration Cookbook

Use CaseConfiguration
🐛 Debugging issuesLOG_LEVEL=debug, CACHE_ENABLED=false
🚀 Maximum performanceCACHE_TTL=7200, CACHE_MAX_KEYS=500, LOG_LEVEL=error
🌐 Slow target sitesFETCH_TIMEOUT=60000
🤖 Bypass bot detectionUSER_AGENT="Mozilla/5.0 (compatible; MyBot/1.0)"
🔄 CI/CD (always fresh)CACHE_ENABLED=false, FETCH_TIMEOUT=60000, LOG_LEVEL=warn
📊 Production monitoringLOG_LEVEL=warn or error

Content Block Types

JSONL output includes semantic content blocks:

TypeDescription
metadataPage title, description, author, URL, timestamp
headingHeadings (h1-h6) with level indicator
paragraphText paragraphs
listOrdered/unordered lists
codeCode blocks with language
tableTables with headers and rows
imageImages with src and alt text

Security

SSRF Protection

Blocked destinations:

  • Localhost and loopback addresses
  • Private IP ranges (10.x.x.x, 172.16-31.x.x, 192.168.x.x)
  • Cloud metadata endpoints (AWS, GCP, Azure)
  • IPv6 link-local and unique local addresses

Header Sanitization

Blocked headers: host, authorization, cookie, x-forwarded-for, x-real-ip, proxy-authorization

Rate Limiting

Default: 100 requests/minute per IP. Configure with RATE_LIMIT_MAX and RATE_LIMIT_WINDOW_MS.

HTTP Mode Endpoints

When running without --stdio, the following endpoints are available:

EndpointMethodDescription
/healthGETHealth check with uptime and version
/mcpPOSTMCP request handling (requires session)
/mcpGETSSE stream for notifications
/mcpDELETEClose session

Sessions are managed via mcp-session-id header with 30-minute TTL.


Development

Scripts

CommandDescription
npm run devDevelopment server with hot reload
npm run buildCompile TypeScript
npm startProduction server
npm run lintRun ESLint
npm run type-checkTypeScript type checking
npm run formatFormat with Prettier
npm testRun Vitest tests
npm run test:coverageRun tests with coverage
npm run benchRun minimal performance benchmark
npm run releaseCreate new release
npm run knipFind unused exports/dependencies
npm run knip:fixAuto-fix unused code

Tech Stack

CategoryTechnology
RuntimeNode.js ≥20.0.0
LanguageTypeScript 5.9
MCP SDK@modelcontextprotocol/sdk ^1.25.1
Content Extraction@mozilla/readability ^0.6.0
HTML ParsingCheerio ^1.1.2, LinkeDOM ^0.18.12
MarkdownTurndown ^7.2.2
HTTPExpress ^5.2.1, Axios ^1.7.9
Cachingnode-cache ^5.1.2
ValidationZod ^3.24.1

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Ensure linting passes: npm run lint
  4. Run tests: npm test
  5. Commit changes: git commit -m 'Add amazing feature'
  6. Push: git push origin feature/amazing-feature
  7. Open a Pull Request

For examples of other MCP servers, see: github.com/modelcontextprotocol/servers