🕷️ Scrape.do MCP Server
Complete research suite for AI assistants. Professional web scraping + intelligent search + LLM processing. Intelligent batching, zero rate limiting, production-ready.
Transform any website into clean, structured data for your AI workflows. Search the web for research. Built for reliability with automatic retry logic and smart concurrency control.
🔧 Latest Update: Multi-model LLM failover + smart mode escalation! The server automatically tries up to 3 LLM models (primary → secondary → tertiary) and escalates scraping modes (basic → javascript → premium) for maximum reliability, with a cleaner, less verbose output format.
🚀 Quick Start
Installation
Option A: NPX Installation (Recommended)
Zero setup - no global installation required
# Simple Mode (default - only scrape_links tool)
claude mcp add scrape-do npx --scope user \
-e SCRAPEDO_API_KEY=your_scrape_do_key \
-- -y @yigitkonur/scrape-do-mcp-server
# Research Suite Mode (all 6 tools + search + LLM processing)
claude mcp add scrape-do npx --scope user \
-e SCRAPEDO_API_KEY=your_scrape_do_key \
-e SERPER_API_KEY=your_serper_key \
-e ADVANCED_MODE=true \
-e LLM_ENABLED=true \
-e OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
-e OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxx \
-e OPENAI_MODEL=google/gemini-2.0-flash-exp:free \
-e OPENAI_MODEL_SECONDARY=x-ai/grok-2-1212:free \
-e OPENAI_MODEL_TERTIARY=openai/gpt-4o-mini \
-- -y @yigitkonur/scrape-do-mcp-server
# Advanced Mode (all 6 tools, LLM disabled for speed)
claude mcp add scrape-do npx --scope user \
-e SCRAPEDO_API_KEY=your_scrape_do_key \
-e SERPER_API_KEY=your_serper_key \
-e ADVANCED_MODE=true \
-e LLM_ENABLED=false \
-- -y @yigitkonur/scrape-do-mcp-server
Option B: Global Installation
Install once, use anywhere
# Step 1: Install globally
npm install -g @yigitkonur/scrape-do-mcp-server
# Step 2: Add to Claude (Simple Mode - default)
claude mcp add scrape-do scrape-do-mcp-server --scope user \
-e SCRAPEDO_API_KEY=your_scrape_do_key
# Step 2 (alternative): Add to Claude (Research Suite Mode with LLM Failover)
claude mcp add scrape-do scrape-do-mcp-server --scope user \
-e SCRAPEDO_API_KEY=your_scrape_do_key \
-e SERPER_API_KEY=your_serper_key \
-e ADVANCED_MODE=true \
-e LLM_ENABLED=true \
-e OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
-e OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxx \
-e OPENAI_MODEL=google/gemini-2.0-flash-exp:free \
-e OPENAI_MODEL_SECONDARY=x-ai/grok-2-1212:free \
-e OPENAI_MODEL_TERTIARY=openai/gpt-4o-mini
Other MCP Clients
First, install globally:
npm install -g @yigitkonur/scrape-do-mcp-server
Simple Mode Configuration (Default)
{
"mcpServers": {
"scrape-do": {
"command": "scrape-do-mcp-server",
"env": {
"SCRAPEDO_API_KEY": "your_scrape_do_key"
}
}
}
}
Research Suite Configuration (All Tools + Search + LLM with Failover)
{
"mcpServers": {
"scrape-do": {
"command": "scrape-do-mcp-server",
"env": {
"SCRAPEDO_API_KEY": "your_scrape_do_key",
"SERPER_API_KEY": "your_serper_key",
"ADVANCED_MODE": "true",
"LLM_ENABLED": "true",
"OPENAI_BASE_URL": "https://openrouter.ai/api/v1",
"OPENAI_API_KEY": "sk-xxxxxxxxxxxxxxxxx",
"OPENAI_MODEL": "google/gemini-2.0-flash-exp:free",
"OPENAI_MODEL_SECONDARY": "x-ai/grok-2-1212:free",
"OPENAI_MODEL_TERTIARY": "openai/gpt-4o-mini"
}
}
}
}
Advanced Mode Configuration (All Tools + Search, LLM Disabled)
{
"mcpServers": {
"scrape-do": {
"command": "scrape-do-mcp-server",
"env": {
"SCRAPEDO_API_KEY": "your_scrape_do_key",
"SERPER_API_KEY": "your_serper_key",
"ADVANCED_MODE": "true",
"LLM_ENABLED": "false"
}
}
}
}
| Client | Config Location |
|---|---|
| Cline | ~/.cline_mcp_settings.json |
| Cursor | ~/.cursor/mcp.json or .cursor/mcp.json |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
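For example, a Cursor setup in ~/.cursor/mcp.json reuses the same Simple Mode block shown above; only the file location differs per client:
{
  "mcpServers": {
    "scrape-do": {
      "command": "scrape-do-mcp-server",
      "env": {
        "SCRAPEDO_API_KEY": "your_scrape_do_key"
      }
    }
  }
}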
Get Your API Keys
Required: Scrape.do API Key
- Visit scrape.do and create an account
- Generate your API key from the dashboard
- Add it to your MCP configuration as SCRAPEDO_API_KEY
Optional: Search API Key (for research suite)
- Visit SERPER and create an account
- Generate your API key from the dashboard
- Add it to your configuration as SERPER_API_KEY
Optional: LLM API Key (for enhanced mode)
- Visit OpenRouter and create an account
- Generate your API key from the dashboard
- Add it to your configuration as OPENAI_API_KEY
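Once you have the keys, export them in your shell before configuring a client (values below are placeholders):
# Required
export SCRAPEDO_API_KEY=your_scrape_do_key
# Optional: web search and LLM processing
export SERPER_API_KEY=your_serper_key
export OPENAI_API_KEY=your_openrouter_key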
Test Installation
Verify Installation:
# Check if binary is available
which scrape-do-mcp-server
# Verify Claude MCP connection
claude mcp list
Test Basic Scraping:
# Set environment variables first
export SCRAPEDO_API_KEY=your_key
npx @modelcontextprotocol/inspector --cli \
'scrape-do-mcp-server' \
--method tools/call --tool-name scrape_single \
--tool-arg url="https://example.com" \
--tool-arg use_llm=false
Test LLM-Enhanced Scraping:
# Set environment variables first
export SCRAPEDO_API_KEY=your_key
export OPENAI_API_KEY=your_openrouter_key
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
npx @modelcontextprotocol/inspector --cli \
'scrape-do-mcp-server' \
--method tools/call --tool-name scrape_single \
--tool-arg url="https://example.com" \
--tool-arg use_llm=true \
--tool-arg what_to_extract="main content only"
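Test Credit Monitoring (optional, Advanced Mode):
If you run with ADVANCED_MODE=true, the check_credits tool offers a quick end-to-end connectivity check using the same inspector pattern as above (key value is a placeholder):
# Set environment variables first
export SCRAPEDO_API_KEY=your_key
export ADVANCED_MODE=true
npx @modelcontextprotocol/inspector --cli \
'scrape-do-mcp-server' \
--method tools/call --tool-name check_credits \
--tool-arg includeDetails=true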
Direct NPX Usage
For standalone testing or automation, you can run the server directly via NPX. Important: Environment variables must be passed correctly using the env command:
# Test Simple Mode (default - 1 tool)
env SCRAPEDO_API_KEY=your_key \
npx @modelcontextprotocol/inspector --cli -- \
npx @yigitkonur/scrape-do-mcp-server --method tools/list
# Test Advanced Mode (6 tools)
env SCRAPEDO_API_KEY=your_key ADVANCED_MODE=true \
npx @modelcontextprotocol/inspector --cli -- \
npx @yigitkonur/scrape-do-mcp-server --method tools/list
# Test with LLM processing
env SCRAPEDO_API_KEY=your_key \
OPENAI_API_KEY=your_openrouter_key \
OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
npx @modelcontextprotocol/inspector --cli -- \
npx @yigitkonur/scrape-do-mcp-server \
--method tools/call --tool-name scrape_links \
--tool-arg 'urls=["https://example.com"]' \
--tool-arg 'use_llm=true' \
--tool-arg 'what_to_extract="main content"'
Note: The env command ensures environment variables are properly inherited by nested NPX processes.
🎯 Core Features
Complete Research Suite
- Web Scraping: Process up to 50 URLs with intelligent batching and automatic mode escalation
- Web Search: Batch search up to 100 keywords via SERPER API
- LLM Processing: AI-powered content cleanup with automatic failover (up to 3 models)
Intelligent Batching
Process up to 50 URLs simultaneously with automatic rate limiting prevention. Our intelligent batching system prevents HTTP 429 errors by processing URLs in optimized chunks of 10 concurrent requests with smart delays.
# Process multiple URLs efficiently with automatic batching
scrape_links(urls=["site1.com", "site2.com", "site3.com"])
# Batch processes 10 URLs concurrently, then next 10, etc.
scrape_links(urls=[...30_urls...]) # Processed in 3 batches of 10
Batch Web Search
Search multiple keywords simultaneously and get structured results for comprehensive research workflows.
# Search multiple topics at once
get_multiple_search_results(keywords=["AI research", "machine learning", "deep learning"])
Multiple Scraping Modes
| Mode | Use Case | Cost | Speed |
|---|---|---|---|
| Basic | Static content, fast processing | 1 credit | ~2s |
| JavaScript | SPAs, dynamic content | 5 credits | ~5s |
| Premium | Geo-restricted, high-success rate | 10 credits | ~3s |
Built-in Resilience
- Smart mode escalation: Automatically upgrades basic → javascript → premium on 502 errors
- Multi-model LLM failover: Tries up to 3 different models (primary → secondary → tertiary) so processing completes even if a provider fails
- Automatic retries with exponential backoff
- Graceful fallbacks when sites are unreachable
- Smart timeout handling based on content type
- Zero rate limiting with intelligent batching (10 concurrent URLs per batch)
- Clean output format: Removed verbose status details from successful scrapes
🎛️ Operation Modes
Simple Mode (Default) ⭐ Recommended
Streamlined experience that drives LLM focus to the optimal tool
By default, only the scrape_links tool is available. This intentional design choice ensures LLMs consistently use the most powerful, reliable scraping tool instead of getting distracted by legacy alternatives like web_fetch or single-URL tools.
# Default mode - only scrape_links available
export SCRAPEDO_API_KEY=your_key
# ADVANCED_MODE=false (default)
Why Simple Mode?
- Eliminates tool confusion: LLMs often default to familiar but inferior tools like web_fetch
- Drives optimal usage: Forces use of scrape_links, which handles 1-50 URLs with intelligent batching
- Reduces cognitive load: Single tool choice = faster, more reliable workflows
- Better results: scrape_links outperforms all other URL tools in speed, reliability, and features
Best for: 95% of use cases, new users, production workflows, consistent results
Advanced Mode
Full toolset for specialized edge cases
Enable all 6 tools when you need fine-grained control or have specific technical requirements that demand tool variety.
# Enable all tools
export SCRAPEDO_API_KEY=your_key
export ADVANCED_MODE=true
Available tools in Advanced Mode:
- scrape_links - Universal URL processor with LLM support (still the best choice)
- scrape_single - Single URL processing with LLM support
- scrape_premium - High-success rate with residential proxies and LLM support
- scrape_javascript - Dynamic content with JS rendering and LLM support
- scrape_interactive - Browser automation with interactions (no LLM support)
- check_credits - API usage monitoring
Best for: Power users who need specific tool access, complex automation workflows, debugging scenarios
🛠️ Available Tools
get_multiple_search_results - Batch Web Search 🔍
Search up to 100 keywords simultaneously via SERPER API
# Basic batch search
get_multiple_search_results(
keywords=["python tutorials", "machine learning", "data science"]
)
# Research workflow example
get_multiple_search_results(
keywords=["competitor analysis", "market research 2024", "industry trends"]
)
Perfect for: Research workflows, competitive analysis, comprehensive topic exploration, finding multiple sources quickly.
Requires: SERPER_API_KEY environment variable set.
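A quick way to verify search is wired up, assuming Advanced Mode is enabled alongside the SERPER key (placeholder values), follows the same inspector pattern shown earlier:
env SCRAPEDO_API_KEY=your_key SERPER_API_KEY=your_serper_key ADVANCED_MODE=true \
npx @modelcontextprotocol/inspector --cli -- \
npx @yigitkonur/scrape-do-mcp-server \
--method tools/call --tool-name get_multiple_search_results \
--tool-arg 'keywords=["python tutorials"]'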
scrape_links - Primary Web Scraper
Process 1-50 URLs in parallel with intelligent batching, automatic mode escalation, and optional LLM processing
# Basic usage - automatically retries with upgraded modes on 502 errors
scrape_links(
urls=["https://news.ycombinator.com", "https://github.com/trending"],
mode="basic",
use_llm=false
)
# LLM-enhanced batch processing with targeted extraction
scrape_links(
urls=["https://site1.com", "https://site2.com", "https://site3.com"],
mode="basic",
use_llm=true,
what_to_extract="main content and key points"
)
# Advanced usage with JavaScript rendering and LLM processing
scrape_links(
urls=["https://example.com"],
mode="javascript",
timeout=60,
waitFor=5000,
use_llm=true
)
# Batch processing - handles 10 concurrent URLs per batch
scrape_links(
urls=[...30_different_urls...], # Processes in 3 batches of 10
mode="basic",
use_llm=false
)
# Premium mode with geo-targeting and LLM extraction
scrape_links(
urls=["https://geo-restricted.com"],
mode="premium",
country="US",
use_llm=true,
what_to_extract="product pricing and availability"
)
Perfect for: Bulk content extraction, research workflows, competitive analysis, AI-powered data extraction
Smart Features:
- Processes 10 URLs concurrently per batch with 500ms delays
- Automatically upgrades basic → javascript → premium on 502 errors
- Preserves successful results while retrying failed URLs
- Zero configuration needed for resilient scraping
scrape_single - Individual URL Processing
When you need focused scraping of a single URL with optional LLM processing
# Basic scraping (1 credit)
scrape_single(
url="https://example.com",
use_llm=false
)
# LLM-enhanced scraping with targeted extraction
scrape_single(
url="https://example.com",
use_llm=true,
what_to_extract="main content and key points"
)
# Advanced options
scrape_single(
url="https://example.com",
followRedirects=true,
timeout=30000,
use_llm=true
)
Perfect for: Testing, single-page extraction, fallback processing, targeted content extraction
scrape_premium - High-Success Rate Scraping
For geo-restricted or hard-to-access content with optional LLM processing
# Basic premium scraping with residential proxy (10 credits)
scrape_premium(
url="https://restricted-site.com",
country="US",
use_llm=false
)
# LLM-enhanced premium scraping
scrape_premium(
url="https://restricted-site.com",
country="US",
use_llm=true,
what_to_extract="product information and availability"
)
# Advanced options with sticky sessions
scrape_premium(
url="https://restricted-site.com",
country="US",
sticky=true,
sessionId="my-session-123",
followRedirects=true,
timeout=30000,
use_llm=true
)
Perfect for: E-commerce, geo-blocked content, enterprise sites, high-success rate requirements
scrape_javascript - Dynamic Content
For React, Vue, and other SPA applications with optional LLM processing
# Basic JavaScript rendering (5 credits)
scrape_javascript(
url="https://spa-application.com",
waitFor=5000,
use_llm=false
)
# Wait for specific element to load
scrape_javascript(
url="https://spa-application.com",
waitSelector=".content-loaded",
use_llm=false
)
# LLM-enhanced extraction from dynamic content
scrape_javascript(
url="https://spa-application.com",
waitFor=3000,
use_llm=true,
what_to_extract="product details and pricing"
)
# Advanced options
scrape_javascript(
url="https://spa-application.com",
waitFor=5000,
waitSelector="#main-content",
viewport={"width": 1920, "height": 1080},
timeout=30000,
use_llm=true
)
Perfect for: Modern web apps, dashboards, dynamic pricing, SPAs requiring JavaScript execution
scrape_interactive - Browser Automation
For sites requiring user interaction
scrape_interactive(
url="https://complex-site.com",
actions=[
{"type": "click", "selector": ".cookie-accept"},
{"type": "fill", "selector": "input[name='search']", "value": "query"},
{"type": "click", "selector": ".search-button"},
{"type": "wait", "wait": 2000}
]
)
Perfect for: Login flows, form submissions, multi-step processes
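A login flow, for example, can be expressed with the same action types used above (URL and selectors are illustrative placeholders):
scrape_interactive(
url="https://site-with-login.com",
actions=[
{"type": "fill", "selector": "input[name='username']", "value": "your_username"},
{"type": "fill", "selector": "input[name='password']", "value": "your_password"},
{"type": "click", "selector": "button[type='submit']"},
{"type": "wait", "wait": 2000}
]
)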
check_credits - Account Management
Monitor your API usage and remaining credits
check_credits(includeDetails=true)
⚙️ Configuration
Environment Variables
Core Configuration
| Variable | Required | Default | Description |
|---|---|---|---|
| SCRAPEDO_API_KEY | ✅ Yes | - | Your Scrape.do API key |
| SERPER_API_KEY | ❌ No | - | SERPER API key for web search (enables get_multiple_search_results) |
| LLM_ENABLED | ❌ No | false | Enable LLM processing by default |
| ADVANCED_MODE | ❌ No | false | Enable all tools (default: only scrape_links) |
LLM Configuration (Optional)
| Variable | Required | Default | Description |
|---|---|---|---|
| OPENAI_API_KEY | ❌ No | - | OpenRouter/OpenAI-compatible API key |
| OPENAI_BASE_URL | ❌ No | https://openrouter.ai/api/v1 | API endpoint URL |
| OPENAI_MODEL | ❌ No | openai/gpt-4o-mini | Primary model for processing |
| OPENAI_MODEL_SECONDARY | ❌ No | Same as primary | Secondary failover model (automatic fallback) |
| OPENAI_MODEL_TERTIARY | ❌ No | Same as primary | Tertiary failover model (final fallback) |
| LLM_TIMEOUT_MS | ❌ No | 300000 | LLM processing timeout in ms (5 minutes) |
| LLM_MAX_CONCURRENT | ❌ No | 5 | Max concurrent LLM requests |
| LLM_MAX_RETRIES | ❌ No | 3 | Max retry attempts for LLM failures |
LLM Failover Strategy: The server automatically tries models in order (primary → secondary → tertiary) if one fails. All models share the same API key and base URL for simplicity.
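Putting the two tables together, a complete research-suite .env could look like the following; every value is a placeholder or an example already used above, and any OpenAI-compatible endpoint and models can be substituted:
# Core
SCRAPEDO_API_KEY=your_scrape_do_key
SERPER_API_KEY=your_serper_key
ADVANCED_MODE=true
# LLM processing with failover
LLM_ENABLED=true
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxx
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=google/gemini-2.0-flash-exp:free
OPENAI_MODEL_SECONDARY=x-ai/grok-2-1212:free
OPENAI_MODEL_TERTIARY=openai/gpt-4o-mini
# Optional tuning (defaults shown)
LLM_TIMEOUT_MS=300000
LLM_MAX_CONCURRENT=5
LLM_MAX_RETRIES=3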
🔒 Security Best Practices
⚠️ CRITICAL: Never commit API keys to version control
# ✅ Correct: Use environment variables
export SCRAPEDO_API_KEY=your_actual_key
export SERPER_API_KEY=your_actual_key
export OPENAI_API_KEY=your_actual_key
# ✅ Correct: Use .env file (automatically ignored)
cp .env.example .env
# Edit .env with your actual keys
# ❌ Wrong: Never put real keys in code or public repos
Environment Configuration:
- API keys are automatically excluded from git (.gitignore)
- Use .env.example as a template
- Real .env files are never published to NPM
- Rotate keys immediately if accidentally exposed
Server Configuration
| Variable | Required | Default | Description |
|---|---|---|---|
| MCP_TRANSPORT | ❌ No | stdio | MCP transport method |
| LOG_LEVEL | ❌ No | info | Logging level (error, warn, info, debug) |
| LOG_FORMAT | ❌ No | json | Log format (json, text) |
Performance Tuning
| Variable | Required | Default | Description |
|---|---|---|---|
| CACHE_TTL_SECONDS | ❌ No | 60 | Cache duration in seconds (1-86400) |
| CACHE_MAX_ITEMS | ❌ No | 1000 | Maximum cached items (1-100000) |
| RATE_LIMIT_REQUESTS | ❌ No | 100 | Rate limit per window (1-10000) |
| RATE_LIMIT_WINDOW_MS | ❌ No | 60000 | Rate limit window in ms (1000-3600000) |
| DEFAULT_TIMEOUT_MS | ❌ No | 30000 | Default timeout in ms (1000-300000) |
| MAX_TIMEOUT_MS | ❌ No | 120000 | Maximum timeout in ms (5000-600000) |
LLM Mode Control
Default Behavior: LLM features are disabled (LLM_ENABLED=false) for maximum speed.
Enable LLM Globally: Set LLM_ENABLED=true to make all tools use LLM by default.
Per-Request Override: Always specify use_llm=true/false in tool calls to override the global default.
# Example: LLM enabled globally, but can be disabled per request
LLM_ENABLED=true # Global default: ON
scrape_single(url="...", use_llm=false) # This request: OFF
# Example: LLM disabled globally, but can be enabled per request
LLM_ENABLED=false # Global default: OFF
scrape_single(url="...", use_llm=true) # This request: ON
Performance Optimization
Recommended Settings for Different Use Cases:
# High-volume processing
CACHE_TTL_SECONDS=300
RATE_LIMIT_REQUESTS=200
# Development/testing
CACHE_TTL_SECONDS=30
LOG_LEVEL=debug
# Production reliability
DEFAULT_TIMEOUT_MS=45000
MAX_TIMEOUT_MS=180000
📊 Performance & Reliability
Intelligent Batching System
- 10 URLs per batch with 500ms delays between batches
- Zero HTTP 429 errors across all test scenarios
- Smart mode escalation: Automatically retries failed URLs with upgraded modes
- 40%+ success rate on challenging sites (vs 12% without batching)
Speed Benchmarks
- Single URL: ~2-5 seconds depending on mode
- 10 URLs: ~8-15 seconds with concurrent batching
- 30 URLs: ~25-40 seconds with intelligent delays (3 batches of 10)
- 50 URLs: ~40-60 seconds with full batching (5 batches of 10)
Error Handling
- Smart mode escalation: Automatic basic → javascript → premium on 502 errors
- Automatic fallbacks: Retry logic with exponential backoff
- Graceful degradation: Always returns available content
- Detailed error reporting: Clear guidance for resolution
- Batch resilience: Failed URLs don't block successful ones
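As an illustration of that batch resilience (URLs below are placeholders), a mixed batch keeps whatever succeeded on the first pass and only re-runs the failures with upgraded modes, as described in the escalation section below:
scrape_links(
urls=["https://loads-fine.com", "https://returns-502.com"],
mode="basic"
)
# loads-fine.com is returned from the basic pass
# returns-502.com is retried with javascript, then premium if needed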
🔧 Advanced Usage
Handling Different Content Types
# News articles and blogs
scrape_links(urls=["news-site.com"], mode="basic")
# E-commerce product pages
scrape_links(urls=["shop.com/product"], mode="premium", country="US")
# Single-page applications
scrape_links(urls=["app.com"], mode="javascript", waitFor=3000)
# Complex interactive sites
scrape_interactive(
url="complex-site.com",
actions=[{"type": "click", "selector": ".load-more"}]
)
Smart Mode Escalation (Automatic)
Zero-configuration automatic retry with mode upgrades on 502 errors:
# You only need to call once - the system handles retries automatically
scrape_links(urls=["site1.com", "site2.com"], mode="basic")
# Behind the scenes, for URLs that return 502 errors:
# 1. First attempt: basic mode (fast, 1 credit)
# 2. Auto-retry: javascript mode if basic returns 502 (5 credits)
# 3. Final retry: premium mode if javascript returns 502 (10 credits)
# 4. Returns best available result or clear error message
How it works:
- Detects HTTP 502 (Bad Gateway) errors automatically
- Upgrades mode: basic → javascript → premium
- Preserves successful results from initial batch
- Only retries failed URLs with upgraded modes
- Maximizes success rate while minimizing credit usage
Manual Error Recovery (for non-502 errors):
# For timeouts or other errors, manually escalate:
1. Try: scrape_links(mode="basic")
2. If timeout: scrape_links(mode="javascript", timeout=60)
3. If blocked: scrape_links(mode="premium", country="US")
4. Complex sites: scrape_interactive with custom actions
Credit Optimization
# Check credits before large operations
check_credits()
# Use appropriate modes to minimize costs
- Basic (1 credit): Static content
- JavaScript (5 credits): Dynamic content only when needed
- Premium (10 credits): When basic/javascript fail
🧠 AI-Powered Content Enhancement
Optional Feature: Enhance scraped content with AI-powered cleanup and targeted extraction.
How It Works
After scraping content via Scrape.do, the LLM processes the markdown to:
- Content Cleanup: Remove navigation, ads, and clutter while preserving all meaningful information
- Targeted Extraction: Extract specific data based on your instructions
- Structure Enhancement: Improve formatting and readability
Setup Requirements
For LLM Features, Add These Environment Variables:
LLM_ENABLED=true
OPENAI_API_KEY=your_openrouter_api_key
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=moonshotai/kimi-k2-0905:nitro
Recommended Provider: OpenRouter with the Kimi model for fast, reliable processing.
Usage Examples
Content Cleanup Mode
scrape_single(
url="https://messy-blog-post.com",
use_llm=true
)
Result: Clean, well-structured markdown with clutter removed, all content preserved.
Targeted Extraction Mode
scrape_single(
url="https://product-page.com",
use_llm=true,
what_to_extract="Extract only price, title, and availability status"
)
Result: Only the requested data, cleanly formatted.
Batch Processing with LLM
scrape_links(
urls=["site1.com", "site2.com", "site3.com"],
use_llm=true,
what_to_extract="main content and key points"
)
Result: Each successfully scraped URL processed through LLM in parallel.
Performance Impact
- Processing Time: +2-5 seconds per URL for LLM processing
- Parallel Processing: LLM processes successful scrapes concurrently with semaphore control
- Graceful Fallback: If LLM fails, original scraped content is returned
- Token Limits: Very large documents (200k+ characters) may hit output limits
📝 Quick Configuration Examples
Minimal Setup (Fast Mode)
# Only Scrape.do API key required
export SCRAPEDO_API_KEY=your_scrape_do_key
export LLM_ENABLED=false
# Usage: Pure scraping, maximum speed
scrape_single(url="https://example.com", use_llm=false)
Full Setup (LLM Enhanced)
# Scrape.do + OpenRouter configuration
export SCRAPEDO_API_KEY=your_scrape_do_key
export LLM_ENABLED=true
export OPENAI_API_KEY=your_openrouter_key
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
export OPENAI_MODEL=moonshotai/kimi-k2-0905:nitro
# Usage: AI-enhanced content processing
scrape_single(url="https://example.com", use_llm=true, what_to_extract="key information")
Hybrid Setup (Flexible)
# LLM available but disabled by default
export SCRAPEDO_API_KEY=your_scrape_do_key
export LLM_ENABLED=false
export OPENAI_API_KEY=your_openrouter_key
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
# Usage: Choose per request
scrape_single(url="https://example.com", use_llm=false) # Fast
scrape_single(url="https://example.com", use_llm=true) # Enhanced
🐛 Troubleshooting
Installation Issues
"Claude MCP Shows ✗ Failed to connect"
- Primary Solution: Use the recommended NPX syntax
# Remove any existing configuration
claude mcp remove scrape-do -s user
# Use correct NPX syntax with proper argument separation
claude mcp add scrape-do npx --scope user \
-e SCRAPEDO_API_KEY=your_key \
-- -y @yigitkonur/scrape-do-mcp-server
# Verify connection
claude mcp list
- Alternative Solution: Global installation
# Install globally then use direct binary
npm install -g @yigitkonur/scrape-do-mcp-server
claude mcp add scrape-do scrape-do-mcp-server --scope user \
-e SCRAPEDO_API_KEY=your_key
"scrape-do-mcp-server: command not found"
- Ensure global installation completed successfully
- Check if binary is in PATH: which scrape-do-mcp-server
- Restart terminal after global installation
- If still not found, try: npm install -g @yigitkonur/scrape-do-mcp-server --force
"NPX Permission Denied Errors"
- NPX may have cached a version with incorrect permissions
- Clear NPX cache: rm -rf ~/.npm/_npx
- Use global installation instead (recommended approach)
Installation Method Comparison
NPX (Recommended):
- ✅ Zero global installation required
- ✅ Always uses latest version
- ✅ Works with proper syntax: claude mcp add scrape-do npx ... -- -y @yigitkonur/scrape-do-mcp-server
- ⚠️ Requires specific argument structure for Claude MCP compatibility
Global Installation (Alternative):
- ✅ Reliable binary availability with correct permissions
- ✅ Faster startup time (no package resolution)
- ✅ Consistent behavior across different environments
- ⚠️ Requires manual updates for new versions
Common Issues
"HTTP 429 Rate Limiting"
- This should not occur with our intelligent batching
- If it happens, reduce batch size in source code
"Timeout Errors"
- Try escalating: basic → javascript → premium modes
- Increase timeout: timeout=60 for slow sites
"Content Not Loading"
- Use scrape_javascript for dynamic content
- Try waitSelector for specific elements
- Use scrape_interactive for complex interactions
"Credits Running Low"
- Use check_credits() to monitor usage
- Optimize mode selection (basic < javascript < premium)
- Cache results when possible
Performance Optimization
# Monitor your usage patterns
check_credits(includeDetails=true)
# Use appropriate timeouts
scrape_links(urls=["..."], timeout=30) # Standard
scrape_links(urls=["..."], timeout=60) # Slow sites
# Choose efficient modes
scrape_links(mode="basic") # Try first
scrape_links(mode="javascript") # If content missing
scrape_links(mode="premium") # If blocked
LLM Troubleshooting (Optional)
"LLM Processing Failed"
- Verify OPENAI_API_KEY and OPENAI_BASE_URL are set
- Check OpenRouter account credits
- Original content is returned as fallback
"Content Too Long"
- Large documents may hit LLM output limits
- Use targeted extraction: what_to_extract="specific data"
- Consider processing in smaller chunks
"LLM Not Working Despite Configuration"
- Check the LLM_ENABLED environment variable setting
- Verify all LLM environment variables are exported correctly
- Use explicit use_llm=true in tool calls to override defaults
- Test with: check_credits() to verify connectivity
Configuration Behavior Summary
| LLM_ENABLED | use_llm parameter | Final Behavior | Use Case |
|---|---|---|---|
| false (default) | Not specified | LLM OFF | Fast mode |
| false | use_llm=true | LLM ON | Selective enhancement |
| true | Not specified | LLM ON | Enhanced by default |
| true | use_llm=false | LLM OFF | Selective speed boost |
Key Points:
- Default: LLM_ENABLED=false for maximum speed
- Tool parameter use_llm always overrides the global setting
- Missing LLM API keys automatically disable LLM features
- All tools gracefully fall back to original content if LLM fails
🔧 Development
Git Workflow for Contributors
This repository uses semantic-release for automated versioning and NPM publishing. To avoid git push conflicts:
# Always pull before pushing (recommended workflow)
git pull --rebase origin main && git push origin main
# Use conventional commit format for proper versioning
git commit -m "feat: add new feature" # triggers minor version bump
git commit -m "fix: resolve bug" # triggers patch version bump
git commit -m "docs: update readme" # no version bump
Note: Semantic-release automatically creates version commits after each push, which requires rebasing on subsequent pushes.