🕷️ Scrape.do MCP Server
Complete research suite for AI assistants. Professional web scraping + intelligent search + LLM processing. Intelligent batching, zero rate limiting, production-ready.
Transform any website into clean, structured data for your AI workflows. Search the web for research. Built for reliability with automatic retry logic and smart concurrency control.
🔧 Latest Update: Multi-model LLM failover + smart mode escalation! The server automatically tries up to 3 LLM models (primary → secondary → tertiary) and escalates scraping modes (basic → javascript → premium) for maximum reliability, with a cleaner, less verbose output format.
🚀 Quick Start
Installation
Option A: NPX Installation (Recommended)
Zero setup - no global installation required
# Simple Mode (default - only scrape_links tool)
claude mcp add scrape-do npx --scope user \
-e SCRAPEDO_API_KEY=your_scrape_do_key \
-- -y @yigitkonur/scrape-do-mcp-server
# Research Suite Mode (all 6 tools + search + LLM processing)
claude mcp add scrape-do npx --scope user \
-e SCRAPEDO_API_KEY=your_scrape_do_key \
-e SERPER_API_KEY=your_serper_key \
-e ADVANCED_MODE=true \
-e LLM_ENABLED=true \
-e OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
-e OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxx \
-e OPENAI_MODEL=google/gemini-2.0-flash-exp:free \
-e OPENAI_MODEL_SECONDARY=x-ai/grok-2-1212:free \
-e OPENAI_MODEL_TERTIARY=openai/gpt-4o-mini \
-- -y @yigitkonur/scrape-do-mcp-server
# Advanced Mode (all 6 tools, LLM disabled for speed)
claude mcp add scrape-do npx --scope user \
-e SCRAPEDO_API_KEY=your_scrape_do_key \
-e SERPER_API_KEY=your_serper_key \
-e ADVANCED_MODE=true \
-e LLM_ENABLED=false \
-- -y @yigitkonur/scrape-do-mcp-server
Option B: Global Installation
Install once, use anywhere
# Step 1: Install globally
npm install -g @yigitkonur/scrape-do-mcp-server
# Step 2: Add to Claude (Simple Mode - default)
claude mcp add scrape-do scrape-do-mcp-server --scope user \
-e SCRAPEDO_API_KEY=your_scrape_do_key
# Step 2 (alternative): Add to Claude (Research Suite Mode with LLM Failover)
claude mcp add scrape-do scrape-do-mcp-server --scope user \
-e SCRAPEDO_API_KEY=your_scrape_do_key \
-e SERPER_API_KEY=your_serper_key \
-e ADVANCED_MODE=true \
-e LLM_ENABLED=true \
-e OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
-e OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxx \
-e OPENAI_MODEL=google/gemini-2.0-flash-exp:free \
-e OPENAI_MODEL_SECONDARY=x-ai/grok-2-1212:free \
-e OPENAI_MODEL_TERTIARY=openai/gpt-4o-mini
Other MCP Clients
First, install globally:
npm install -g @yigitkonur/scrape-do-mcp-server
Simple Mode Configuration (Default)
{
"mcpServers": {
"scrape-do": {
"command": "scrape-do-mcp-server",
"env": {
"SCRAPEDO_API_KEY": "your_scrape_do_key"
}
}
}
}
Research Suite Configuration (All Tools + Search + LLM with Failover)
{
"mcpServers": {
"scrape-do": {
"command": "scrape-do-mcp-server",
"env": {
"SCRAPEDO_API_KEY": "your_scrape_do_key",
"SERPER_API_KEY": "your_serper_key",
"ADVANCED_MODE": "true",
"LLM_ENABLED": "true",
"OPENAI_BASE_URL": "https://openrouter.ai/api/v1",
"OPENAI_API_KEY": "sk-xxxxxxxxxxxxxxxxx",
"OPENAI_MODEL": "google/gemini-2.0-flash-exp:free",
"OPENAI_MODEL_SECONDARY": "x-ai/grok-2-1212:free",
"OPENAI_MODEL_TERTIARY": "openai/gpt-4o-mini"
}
}
}
}
Advanced Mode Configuration (All Tools + Search, LLM Disabled)
{
"mcpServers": {
"scrape-do": {
"command": "scrape-do-mcp-server",
"env": {
"SCRAPEDO_API_KEY": "your_scrape_do_key",
"SERPER_API_KEY": "your_serper_key",
"ADVANCED_MODE": "true",
"LLM_ENABLED": "false"
}
}
}
}
| Client | Config Location |
|---|---|
| Cline | ~/.cline_mcp_settings.json |
| Cursor | ~/.cursor/mcp.json or .cursor/mcp.json |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
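For example, a Cursor setup in ~/.cursor/mcp.json reuses the same Simple Mode block shown above; only the file location differs per client:
{
  "mcpServers": {
    "scrape-do": {
      "command": "scrape-do-mcp-server",
      "env": {
        "SCRAPEDO_API_KEY": "your_scrape_do_key"
      }
    }
  }
}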
Get Your API Keys
Required: Scrape.do API Key
- Visit scrape.do and create an account
- Generate your API key from the dashboard
- Add it to your MCP configuration as SCRAPEDO_API_KEY
Optional: Search API Key (for research suite)
- Visit SERPER and create an account
- Generate your API key from the dashboard
- Add it to your configuration as SERPER_API_KEY
Optional: LLM API Key (for enhanced mode)
- Visit OpenRouter and create an account
- Generate your API key from the dashboard
- Add it to your configuration as OPENAI_API_KEY
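Once you have the keys, export them in your shell before configuring a client (values below are placeholders):
# Required
export SCRAPEDO_API_KEY=your_scrape_do_key
# Optional: web search and LLM processing
export SERPER_API_KEY=your_serper_key
export OPENAI_API_KEY=your_openrouter_key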
Test Installation
Verify Installation:
# Check if binary is available
which scrape-do-mcp-server
# Verify Claude MCP connection
claude mcp list
Test Basic Scraping:
# Set environment variables first
export SCRAPEDO_API_KEY=your_key
npx @modelcontextprotocol/inspector --cli \
'scrape-do-mcp-server' \
--method tools/call --tool-name scrape_single \
--tool-arg url="https://example.com" \
--tool-arg use_llm=false
Test LLM-Enhanced Scraping:
# Set environment variables first
export SCRAPEDO_API_KEY=your_key
export OPENAI_API_KEY=your_openrouter_key
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
npx @modelcontextprotocol/inspector --cli \
'scrape-do-mcp-server' \
--method tools/call --tool-name scrape_single \
--tool-arg url="https://example.com" \
--tool-arg use_llm=true \
--tool-arg what_to_extract="main content only"
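Test Credit Monitoring (optional, Advanced Mode):
If you run with ADVANCED_MODE=true, the check_credits tool offers a quick end-to-end connectivity check using the same inspector pattern as above (key value is a placeholder):
# Set environment variables first
export SCRAPEDO_API_KEY=your_key
export ADVANCED_MODE=true
npx @modelcontextprotocol/inspector --cli \
'scrape-do-mcp-server' \
--method tools/call --tool-name check_credits \
--tool-arg includeDetails=true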
Direct NPX Usage
For standalone testing or automation, you can run the server directly via NPX. Important: Environment variables must be passed correctly using the env command:
# Test Simple Mode (default - 1 tool)
env SCRAPEDO_API_KEY=your_key \
npx @modelcontextprotocol/inspector --cli -- \
npx @yigitkonur/scrape-do-mcp-server --method tools/list
# Test Advanced Mode (6 tools)
env SCRAPEDO_API_KEY=your_key ADVANCED_MODE=true \
npx @modelcontextprotocol/inspector --cli -- \
npx @yigitkonur/scrape-do-mcp-server --method tools/list
# Test with LLM processing
env SCRAPEDO_API_KEY=your_key \
OPENAI_API_KEY=your_openrouter_key \
OPENAI_BASE_URL=https://openrouter.ai/api/v1 \
npx @modelcontextprotocol/inspector --cli -- \
npx @yigitkonur/scrape-do-mcp-server \
--method tools/call --tool-name scrape_links \
--tool-arg 'urls=["https://example.com"]' \
--tool-arg 'use_llm=true' \
--tool-arg 'what_to_extract="main content"'
Note: The env command ensures environment variables are properly inherited by nested NPX processes.
🎯 Core Features
Complete Research Suite
- Web Scraping: Process up to 50 URLs with intelligent batching and automatic mode escalation
- Web Search: Batch search up to 100 keywords via SERPER API
- LLM Processing: AI-powered content cleanup with automatic failover (up to 3 models)
Intelligent Batching
Process up to 50 URLs simultaneously with automatic rate limiting prevention. Our intelligent batching system prevents HTTP 429 errors by processing URLs in optimized chunks of 10 concurrent requests with smart delays.
# Process multiple URLs efficiently with automatic batching
scrape_links(urls=["site1.com", "site2.com", "site3.com"])
# Batch processes 10 URLs concurrently, then next 10, etc.
scrape_links(urls=[...30_urls...]) # Processed in 3 batches of 10
Batch Web Search
Search multiple keywords simultaneously and get structured results for comprehensive research workflows.
# Search multiple topics at once
get_multiple_search_results(keywords=["AI research", "machine learning", "deep learning"])
Multiple Scraping Modes
| Mode | Use Case | Cost | Speed |
|---|---|---|---|
| Basic | Static content, fast processing | 1 credit | ~2s |
| JavaScript | SPAs, dynamic content | 5 credits | ~5s |
| Premium | Geo-restricted, high-success rate | 10 credits | ~3s |
Built-in Resilience
- Smart mode escalation: Automatically upgrades basic → javascript → premium on 502 errors
- Multi-model LLM failover: Tries up to 3 different models (primary → secondary → tertiary) so processing completes even if a provider fails
- Automatic retries with exponential backoff
- Graceful fallbacks when sites are unreachable
- Smart timeout handling based on content type
- Zero rate limiting with intelligent batching (10 concurrent URLs per batch)
- Clean output format: Removed verbose status details from successful scrapes
🎛️ Operation Modes
Simple Mode (Default) ⭐ Recommended
Streamlined experience that drives LLM focus to the optimal tool
By default, only the scrape_links tool is available. This intentional design choice ensures LLMs consistently use the most powerful, reliable scraping tool instead of getting distracted by legacy alternatives like web_fetch or single-URL tools.
# Default mode - only scrape_links available
export SCRAPEDO_API_KEY=your_key
# ADVANCED_MODE=false (default)
Why Simple Mode?
- Eliminates tool confusion: LLMs often default to familiar but inferior tools like web_fetch
- Drives optimal usage: Forces use of scrape_links, which handles 1-50 URLs with intelligent batching
- Reduces cognitive load: Single tool choice = faster, more reliable workflows
- Better results: scrape_links outperforms all other URL tools in speed, reliability, and features
Best for: 95% of use cases, new users, production workflows, consistent results
Advanced Mode
Full toolset for specialized edge cases
Enable all 6 tools when you need fine-grained control or have specific technical requirements that demand tool variety.
# Enable all tools
export SCRAPEDO_API_KEY=your_key
export ADVANCED_MODE=true
Available tools in Advanced Mode:
- scrape_links - Universal URL processor with LLM support (still the best choice)
- scrape_single - Single URL processing with LLM support
- scrape_premium - High-success rate with residential proxies and LLM support
- scrape_javascript - Dynamic content with JS rendering and LLM support
- scrape_interactive - Browser automation with interactions (no LLM support)
- check_credits - API usage monitoring
Best for: Power users who need specific tool access, complex automation workflows, debugging scenarios
🛠️ Available Tools
get_multiple_search_results - Batch Web Search 🔍
Search up to 100 keywords simultaneously via SERPER API
# Basic batch search
get_multiple_search_results(
keywords=["python tutorials", "machine learning", "data science"]
)
# Research workflow example
get_multiple_search_results(
keywords=["competitor analysis", "market research 2024", "industry trends"]
)
Perfect for: Research workflows, competitive analysis, comprehensive topic exploration, finding multiple sources quickly.
Requires: SERPER_API_KEY environment variable set.
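A quick way to verify search is wired up, assuming Advanced Mode is enabled alongside the SERPER key (placeholder values), follows the same inspector pattern shown earlier:
env SCRAPEDO_API_KEY=your_key SERPER_API_KEY=your_serper_key ADVANCED_MODE=true \
npx @modelcontextprotocol/inspector --cli -- \
npx @yigitkonur/scrape-do-mcp-server \
--method tools/call --tool-name get_multiple_search_results \
--tool-arg 'keywords=["python tutorials"]'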
scrape_links - Primary Web Scraper
Process 1-50 URLs in parallel with intelligent batching, automatic mode escalation, and optional LLM processing
# Basic usage - automatically retries with upgraded modes on 502 errors
scrape_links(
urls=["https://news.ycombinator.com", "https://github.com/trending"],
mode="basic",
use_llm=false
)
# LLM-enhanced batch processing with targeted extraction
scrape_links(
urls=["https://site1.com", "https://site2.com", "https://site3.com"],
mode="basic",
use_llm=true,
what_to_extract="main content and key points"
)
# Advanced usage with JavaScript rendering and LLM processing
scrape_links(
urls=["https://example.com"],
mode="javascript",
timeout=60,
waitFor=5000,
use_llm=true
)
# Batch processing - handles 10 concurrent URLs per batch
scrape_links(
urls=[...30_different_urls...], # Processes in 3 batches of 10
mode="basic",
use_llm=false
)
# Premium mode with geo-targeting and LLM extraction
scrape_links(
urls=["https://geo-restricted.com"],
mode="premium",
country="US",
use_llm=true,
what_to_extract="product pricing and availability"
)
Perfect for: Bulk content extraction, research workflows, competitive analysis, AI-powered data extraction
Smart Features:
- Processes 10 URLs concurrently per batch with 500ms delays
- Automatically upgrades basic → javascript → premium on 502 errors
- Preserves successful results while retrying failed URLs
- Zero configuration needed for resilient scraping
scrape_single - Individual URL Processing
When you need focused scraping of a single URL with optional LLM processing
# Basic scraping (1 credit)
scrape_single(
url="https://example.com",
use_llm=false
)
# LLM-enhanced scraping with targeted extraction
scrape_single(
url="https://example.com",
use_llm=true,
what_to_extract="main content and key points"
)
# Advanced options
scrape_single(
url="https://example.com",
followRedirects=true,
timeout=30000,
use_llm=true
)
Perfect for: Testing, single-page extraction, fallback processing, targeted content extraction
scrape_premium - High-Success Rate Scraping
For geo-restricted or hard-to-access content with optional LLM processing
# Basic premium scraping with residential proxy (10 credits)
scrape_premium(
url="https://restricted-site.com",
country="US",
use_llm=false
)
# LLM-enhanced premium scraping
scrape_premium(
url="https://restricted-site.com",
country="US",
use_llm=true,
what_to_extract="product information and availability"
)
# Advanced options with sticky sessions
scrape_premium(
url="https://restricted-site.com",
country="US",
sticky=true,
sessionId="my-session-123",
followRedirects=true,
timeout=30000,
use_llm=true
)
Perfect for: E-commerce, geo-blocked content, enterprise sites, high-success rate requirements
scrape_javascript - Dynamic Content
For React, Vue, and other SPA applications with optional LLM processing
# Basic JavaScript rendering (5 credits)
scrape_javascript(
url="https://spa-application.com",
waitFor=5000,
use_llm=false
)
# Wait for specific element to load
scrape_javascript(
url="https://spa-application.com",
waitSelector=".content-loaded",
use_llm=false
)
# LLM-enhanced extraction from dynamic content
scrape_javascript(
url="https://spa-application.com",
waitFor=3000,
use_llm=true,
what_to_extract="product details and pricing"
)
# Advanced options
scrape_javascript(
url="https://spa-application.com",
waitFor=5000,
waitSelector="#main-content",
viewport={"width": 1920, "height": 1080},
timeout=30000,
use_llm=true
)
Perfect for: Modern web apps, dashboards, dynamic pricing, SPAs requiring JavaScript execution
scrape_interactive - Browser Automation
For sites requiring user interaction
scrape_interactive(
url="https://complex-site.com",
actions=[
{"type": "click", "selector": ".cookie-accept"},
{"type": "fill", "selector": "input[name='search']", "value": "query"},
{"type": "click", "selector": ".search-button"},
{"type": "wait", "wait": 2000}
]
)
Perfect for: Login flows, form submissions, multi-step processes
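A login flow, for example, can be expressed with the same action types used above (URL and selectors are illustrative placeholders):
scrape_interactive(
url="https://site-with-login.com",
actions=[
{"type": "fill", "selector": "input[name='username']", "value": "your_username"},
{"type": "fill", "selector": "input[name='password']", "value": "your_password"},
{"type": "click", "selector": "button[type='submit']"},
{"type": "wait", "wait": 2000}
]
)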
check_credits - Account Management
Monitor your API usage and remaining credits
check_credits(includeDetails=true)
⚙️ Configuration
Environment Variables
Core Configuration
| Variable | Required | Default | Description |
|---|---|---|---|
| SCRAPEDO_API_KEY | ✅ Yes | - | Your Scrape.do API key |
| SERPER_API_KEY | ❌ No | - | SERPER API key for web search (enables get_multiple_search_results) |
| LLM_ENABLED | ❌ No | false | Enable LLM processing by default |
| ADVANCED_MODE | ❌ No | false | Enable all tools (default: only scrape_links) |
LLM Configuration (Optional)
| Variable | Required | Default | Description |
|---|---|---|---|
| OPENAI_API_KEY | ❌ No | - | OpenRouter/OpenAI-compatible API key |
| OPENAI_BASE_URL | ❌ No | https://openrouter.ai/api/v1 | API endpoint URL |
| OPENAI_MODEL | ❌ No | openai/gpt-4o-mini | Primary model for processing |
| OPENAI_MODEL_SECONDARY | ❌ No | Same as primary | Secondary failover model (automatic fallback) |
| OPENAI_MODEL_TERTIARY | ❌ No | Same as primary | Tertiary failover model (final fallback) |
| LLM_TIMEOUT_MS | ❌ No | 300000 | LLM processing timeout in ms (5 minutes) |
| LLM_MAX_CONCURRENT | ❌ No | 5 | Max concurrent LLM requests |
| LLM_MAX_RETRIES | ❌ No | 3 | Max retry attempts for LLM failures |
LLM Failover Strategy: The server automatically tries models in order (primary → secondary → tertiary) if one fails. All models share the same API key and base URL for simplicity.
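Putting the two tables together, a complete research-suite .env could look like the following; every value is a placeholder or an example already used above, and any OpenAI-compatible endpoint and models can be substituted:
# Core
SCRAPEDO_API_KEY=your_scrape_do_key
SERPER_API_KEY=your_serper_key
ADVANCED_MODE=true
# LLM processing with failover
LLM_ENABLED=true
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxx
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=google/gemini-2.0-flash-exp:free
OPENAI_MODEL_SECONDARY=x-ai/grok-2-1212:free
OPENAI_MODEL_TERTIARY=openai/gpt-4o-mini
# Optional tuning (defaults shown)
LLM_TIMEOUT_MS=300000
LLM_MAX_CONCURRENT=5
LLM_MAX_RETRIES=3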
🔒 Security Best Practices
⚠️ CRITICAL: Never commit API keys to version control
# ✅ Correct: Use environment variables
export SCRAPEDO_API_KEY=your_actual_key
export SERPER_API_KEY=your_actual_key
export OPENAI_API_KEY=your_actual_key
# ✅ Correct: Use .env file (automatically ignored)
cp .env.example .env
# Edit .env with your actual keys
# ❌ Wrong: Never put real keys in code or public repos
Environment Configuration:
- API keys are automatically excluded from git (.gitignore)
- Use .env.example as a template
- Real .env files are never published to NPM
- Rotate keys immediately if accidentally exposed
Server Configuration
| Variable | Required | Default | Description |
|---|---|---|---|
| MCP_TRANSPORT | ❌ No | stdio | MCP transport method |
| LOG_LEVEL | ❌ No | info | Logging level (error, warn, info, debug) |
| LOG_FORMAT | ❌ No | json | Log format (json, text) |
Performance Tuning
| Variable | Required | Default | Description |
|---|---|---|---|
| CACHE_TTL_SECONDS | ❌ No | 60 | Cache duration in seconds (1-86400) |
| CACHE_MAX_ITEMS | ❌ No | 1000 | Maximum cached items (1-100000) |
| RATE_LIMIT_REQUESTS | ❌ No | 100 | Rate limit per window (1-10000) |
| RATE_LIMIT_WINDOW_MS | ❌ No | 60000 | Rate limit window in ms (1000-3600000) |
| DEFAULT_TIMEOUT_MS | ❌ No | 30000 | Default timeout in ms (1000-300000) |
| MAX_TIMEOUT_MS | ❌ No | 120000 | Maximum timeout in ms (5000-600000) |
LLM Mode Control
Default Behavior: LLM features are disabled (LLM_ENABLED=false) for maximum speed.
Enable LLM Globally: Set LLM_ENABLED=true to make all tools use LLM by default.
Per-Request Override: Always specify use_llm=true/false in tool calls to override the global default.
# Example: LLM enabled globally, but can be disabled per request
LLM_ENABLED=true # Global default: ON
scrape_single(url="...", use_llm=false) # This request: OFF
# Example: LLM disabled globally, but can be enabled per request
LLM_ENABLED=false # Global default: OFF
scrape_single(url="...", use_llm=true) # This request: ON
Performance Optimization
Recommended Settings for Different Use Cases:
# High-volume processing
CACHE_TTL_SECONDS=300
RATE_LIMIT_REQUESTS=200
# Development/testing
CACHE_TTL_SECONDS=30
LOG_LEVEL=debug
# Production reliability
DEFAULT_TIMEOUT_MS=45000
MAX_TIMEOUT_MS=180000
📊 Performance & Reliability
Intelligent Batching System
- 10 URLs per batch with 500ms delays between batches
- Zero HTTP 429 errors across all test scenarios
- Smart mode escalation: Automatically retries failed URLs with upgraded modes
- 40%+ success rate on challenging sites (vs 12% without batching)
Speed Benchmarks
- Single URL: ~2-5 seconds depending on mode
- 10 URLs: ~8-15 seconds with concurrent batching
- 30 URLs: ~25-40 seconds with intelligent delays (3 batches of 10)
- 50 URLs: ~40-60 seconds with full batching (5 batches of 10)
Error Handling
- Smart mode escalation: Automatic basic → javascript → premium on 502 errors
- Automatic fallbacks: Retry logic with exponential backoff
- Graceful degradation: Always returns available content
- Detailed error reporting: Clear guidance for resolution
- Batch resilience: Failed URLs don't block successful ones
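As an illustration of that batch resilience (URLs below are placeholders), a mixed batch keeps whatever succeeded on the first pass and only re-runs the failures with upgraded modes, as described in the escalation section below:
scrape_links(
urls=["https://loads-fine.com", "https://returns-502.com"],
mode="basic"
)
# loads-fine.com is returned from the basic pass
# returns-502.com is retried with javascript, then premium if needed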
🔧 Advanced Usage
Handling Different Content Types
# News articles and blogs
scrape_links(urls=["news-site.com"], mode="basic")
# E-commerce product pages
scrape_links(urls=["shop.com/product"], mode="premium", country="US")
# Single-page applications
scrape_links(urls=["app.com"], mode="javascript", waitFor=3000)
# Complex interactive sites
scrape_interactive(
url="complex-site.com",
actions=[{"type": "click", "selector": ".load-more"}]
)
Smart Mode Escalation (Automatic)
Zero-configuration automatic retry with mode upgrades on 502 errors:
# You only need to call once - the system handles retries automatically
scrape_links(urls=["site1.com", "site2.com"], mode="basic")
# Behind the scenes, for URLs that return 502 errors:
# 1. First attempt: basic mode (fast, 1 credit)
# 2. Auto-retry: javascript mode if basic returns 502 (5 credits)
# 3. Final retry: premium mode if javascript returns 502 (10 credits)
# 4. Returns best available result or clear error message
How it works:
- Detects HTTP 502 (Bad Gateway) errors automatically
- Upgrades mode: basic → javascript → premium
- Preserves successful results from initial batch
- Only retries failed URLs with upgraded modes
- Maximizes success rate while minimizing credit usage
Manual Error Recovery (for non-502 errors):
# For timeouts or other errors, manually escalate:
1. Try: scrape_links(mode="basic")
2. If timeout: scrape_links(mode="javascript", timeout=60)
3. If blocked: scrape_links(mode="premium", country="US")
4. Complex sites: scrape_interactive with custom actions
Credit Optimization
# Check credits before large operations
check_credits()
# Use appropriate modes to minimize costs
- Basic (1 credit): Static content
- JavaScript (5 credits): Dynamic content only when needed
- Premium (10 credits): When basic/javascript fail
🧠 AI-Powered Content Enhancement
Optional Feature: Enhance scraped content with AI-powered cleanup and targeted extraction.
How It Works
After scraping content via Scrape.do, the LLM processes the markdown to:
- Content Cleanup: Remove navigation, ads, and clutter while preserving all meaningful information
- Targeted Extraction: Extract specific data based on your instructions
- Structure Enhancement: Improve formatting and readability
Setup Requirements
For LLM Features, Add These Environment Variables:
LLM_ENABLED=true
OPENAI_API_KEY=your_openrouter_api_key
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=moonshotai/kimi-k2-0905:nitro
Recommended Provider: OpenRouter with the Kimi model for fast, reliable processing.
Usage Examples
Content Cleanup Mode
scrape_single(
url="https://messy-blog-post.com",
use_llm=true
)
Result: Clean, well-structured markdown with clutter removed, all content preserved.
Targeted Extraction Mode
scrape_single(
url="https://product-page.com",
use_llm=true,
what_to_extract="Extract only price, title, and availability status"
)
Result: Only the requested data, cleanly formatted.
Batch Processing with LLM
scrape_links(
urls=["site1.com", "site2.com", "site3.com"],
use_llm=true,
what_to_extract="main content and key points"
)
Result: Each successfully scraped URL processed through LLM in parallel.
Performance Impact
- Processing Time: +2-5 seconds per URL for LLM processing
- Parallel Processing: LLM processes successful scrapes concurrently with semaphore control
- Graceful Fallback: If LLM fails, original scraped content is returned
- Token Limits: Very large documents (200k+ characters) may hit output limits
📝 Quick Configuration Examples
Minimal Setup (Fast Mode)
# Only Scrape.do API key required
export SCRAPEDO_API_KEY=your_scrape_do_key
export LLM_ENABLED=false
# Usage: Pure scraping, maximum speed
scrape_single(url="https://example.com", use_llm=false)
Full Setup (LLM Enhanced)
# Scrape.do + OpenRouter configuration
export SCRAPEDO_API_KEY=your_scrape_do_key
export LLM_ENABLED=true
export OPENAI_API_KEY=your_openrouter_key
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
export OPENAI_MODEL=moonshotai/kimi-k2-0905:nitro
# Usage: AI-enhanced content processing
scrape_single(url="https://example.com", use_llm=true, what_to_extract="key information")
Hybrid Setup (Flexible)
# LLM available but disabled by default
export SCRAPEDO_API_KEY=your_scrape_do_key
export LLM_ENABLED=false
export OPENAI_API_KEY=your_openrouter_key
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
# Usage: Choose per request
scrape_single(url="https://example.com", use_llm=false) # Fast
scrape_single(url="https://example.com", use_llm=true) # Enhanced
🐛 Troubleshooting
Installation Issues
"Claude MCP Shows ✗ Failed to connect"
- Primary Solution: Use the recommended NPX syntax
# Remove any existing configuration
claude mcp remove scrape-do -s user
# Use correct NPX syntax with proper argument separation
claude mcp add scrape-do npx --scope user \
-e SCRAPEDO_API_KEY=your_key \
-- -y @yigitkonur/scrape-do-mcp-server
# Verify connection
claude mcp list
- Alternative Solution: Global installation
# Install globally then use direct binary
npm install -g @yigitkonur/scrape-do-mcp-server
claude mcp add scrape-do scrape-do-mcp-server --scope user \
-e SCRAPEDO_API_KEY=your_key
"scrape-do-mcp-server: command not found"
- Ensure global installation completed successfully
- Check if binary is in PATH: which scrape-do-mcp-server
- Restart terminal after global installation
- If still not found, try: npm install -g @yigitkonur/scrape-do-mcp-server --force
"NPX Permission Denied Errors"
- NPX may have cached a version with incorrect permissions
- Clear NPX cache: rm -rf ~/.npm/_npx
- Use global installation instead (recommended approach)
Installation Method Comparison
NPX (Recommended):
- ✅ Zero global installation required
- ✅ Always uses latest version
- ✅ Works with proper syntax: claude mcp add scrape-do npx ... -- -y @yigitkonur/scrape-do-mcp-server
- ⚠️ Requires specific argument structure for Claude MCP compatibility
Global Installation (Alternative):
- ✅ Reliable binary availability with correct permissions
- ✅ Faster startup time (no package resolution)
- ✅ Consistent behavior across different environments
- ⚠️ Requires manual updates for new versions
Common Issues
"HTTP 429 Rate Limiting"
- This should not occur with our intelligent batching
- If it happens, reduce batch size in source code
"Timeout Errors"
- Try escalating: basic → javascript → premium modes
- Increase timeout: timeout=60 for slow sites
"Content Not Loading"
- Use scrape_javascript for dynamic content
- Try waitSelector for specific elements
- Use scrape_interactive for complex interactions
"Credits Running Low"
- Use check_credits() to monitor usage
- Optimize mode selection (basic < javascript < premium)
- Cache results when possible
Performance Optimization
# Monitor your usage patterns
check_credits(includeDetails=true)
# Use appropriate timeouts
scrape_links(urls=["..."], timeout=30) # Standard
scrape_links(urls=["..."], timeout=60) # Slow sites
# Choose efficient modes
scrape_links(mode="basic") # Try first
scrape_links(mode="javascript") # If content missing
scrape_links(mode="premium") # If blocked
LLM Troubleshooting (Optional)
"LLM Processing Failed"
- Verify OPENAI_API_KEY and OPENAI_BASE_URL are set
- Check OpenRouter account credits
- Original content is returned as fallback
"Content Too Long"
- Large documents may hit LLM output limits
- Use targeted extraction: what_to_extract="specific data"
- Consider processing in smaller chunks
"LLM Not Working Despite Configuration"
- Check the LLM_ENABLED environment variable setting
- Verify all LLM environment variables are exported correctly
- Use explicit use_llm=true in tool calls to override defaults
- Test with: check_credits() to verify connectivity
Configuration Behavior Summary
| LLM_ENABLED | use_llm parameter | Final Behavior | Use Case |
|---|---|---|---|
| false (default) | Not specified | LLM OFF | Fast mode |
| false | use_llm=true | LLM ON | Selective enhancement |
| true | Not specified | LLM ON | Enhanced by default |
| true | use_llm=false | LLM OFF | Selective speed boost |
Key Points:
- Default: LLM_ENABLED=false for maximum speed
- Tool parameter use_llm always overrides the global setting
- Missing LLM API keys automatically disable LLM features
- All tools gracefully fall back to original content if LLM fails
🔧 Development
Git Workflow for Contributors
This repository uses semantic-release for automated versioning and NPM publishing. To avoid git push conflicts:
# Always pull before pushing (recommended workflow)
git pull --rebase origin main && git push origin main
# Use conventional commit format for proper versioning
git commit -m "feat: add new feature" # triggers minor version bump
git commit -m "fix: resolve bug" # triggers patch version bump
git commit -m "docs: update readme" # no version bump
Note: Semantic-release automatically creates version commits after each push, which requires rebasing on subsequent pushes.