Spider Cloud MCP Server
A high-performance Model Context Protocol (MCP) server that provides comprehensive web scraping, crawling, and data extraction capabilities through the Spider Cloud API. This server enables AI assistants like Claude to interact with web content using Spider Cloud's advanced scraping infrastructure.
Features
Core Tools
- `spider_scrape` - Advanced single-page scraping with JavaScript rendering and anti-bot bypass
- `spider_crawl` - Intelligent website crawling with depth control and filtering
- `spider_search` - Google-like web search with content fetching capabilities
- `spider_links` - Comprehensive link extraction and analysis
- `spider_screenshot` - High-quality webpage screenshots with customization
- `spider_transform` - HTML to markdown/text conversion with readability processing
Advanced Capabilities
- Anti-bot Detection Bypass - Stealth mode and advanced evasion techniques
- Premium Proxy Support - Geographic targeting with country-specific proxies
- JavaScript Rendering - Full browser emulation for dynamic content
- Metadata Extraction - Comprehensive page metadata and analytics
- CSS Selectors - Precise content targeting and extraction
- Cloud Storage - Optional data persistence in Spider Cloud
- High Performance - Optimized for speed with configurable timeouts
- Secure Authentication - Bearer token authentication with API key
- Cost Tracking - Real-time API usage cost monitoring
- Debug Mode - Comprehensive logging for troubleshooting
Prerequisites
- Node.js 18 or higher
- Spider Cloud API key (Get one free at spider.cloud)
- MCP-compatible client (Claude Desktop, Claude Code, Cursor, etc.)
Quick Start
Option 1: Install from npm (Recommended)
```bash
# Global installation
npm install -g @willbohn/spider-mcp

# Or use with npx (no installation needed)
npx @willbohn/spider-mcp
```
Option 2: Clone from GitHub
Windows:
```powershell
# Clone and install
git clone https://github.com/willbohn/spider-mcp.git
cd spider-mcp

# Run the Windows installer (PowerShell)
.\install-windows.ps1

# Or use the batch file (Command Prompt)
install-windows.bat

# Test the installation
$env:SPIDER_API_KEY="your_key"
node test.js
```
macOS/Linux:
```bash
# Clone and install
git clone https://github.com/willbohn/spider-mcp.git
cd spider-mcp
./install-local.sh

# Or manually:
npm install
npm link

# Test the installation
SPIDER_API_KEY=your_key node test.js
```
Option 3: Direct Path Configuration
Skip installation and point directly to the built files in your MCP client configuration.
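For example, a client entry pointing straight at the compiled server might look like the following (the path is a placeholder for wherever you cloned and built the repo):

```json
{
  "mcpServers": {
    "spider": {
      "command": "node",
      "args": ["/path/to/spider-mcp/dist/index.js"],
      "env": {
        "SPIDER_API_KEY": "your_spider_api_key_here"
      }
    }
  }
}
```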
Configuration
Platform-Specific Setup Instructions
Windows Users
Claude Desktop (Windows)
1. Find your configuration file:
   - Press `Win + R`, type `%APPDATA%\Claude`, and press Enter
   - Open `claude_desktop_config.json` (create it if it doesn't exist)
2. Add the Spider MCP configuration:
```json
{
  "mcpServers": {
    "spider": {
      "command": "npx",
      "args": ["@willbohn/spider-mcp"],
      "env": {
        "SPIDER_API_KEY": "your_spider_api_key_here"
      }
    }
  }
}
```
3. Alternative: use a direct path (if npm doesn't work):
```json
{
  "mcpServers": {
    "spider": {
      "command": "node",
      "args": ["C:\\Users\\YourName\\spider-mcp\\dist\\index.js"],
      "env": {
        "SPIDER_API_KEY": "your_spider_api_key_here"
      }
    }
  }
}
```
Note: On Windows, use double backslashes (`\\`) in paths or forward slashes (`/`).
Testing on Windows
```powershell
# PowerShell
$env:SPIDER_API_KEY="your_key"
node test.js
```

```cmd
:: Command Prompt
set SPIDER_API_KEY=your_key
node test.js
```
macOS Users
Claude Desktop (macOS)
1. Find your configuration file:
   ```bash
   open ~/Library/Application\ Support/Claude/
   ```
   Open `claude_desktop_config.json` (create it if it doesn't exist).
2. Add the Spider MCP configuration:
```json
{
  "mcpServers": {
    "spider": {
      "command": "npx",
      "args": ["@willbohn/spider-mcp"],
      "env": {
        "SPIDER_API_KEY": "your_spider_api_key_here"
      }
    }
  }
}
```
Testing on macOS
```bash
export SPIDER_API_KEY="your_key"
node test.js
```
Linux Users
Claude Desktop (Linux)
1. Find your configuration file:
   ```bash
   # Location varies by distribution, commonly:
   # ~/.config/Claude/claude_desktop_config.json
   # or ~/.claude/claude_desktop_config.json
   ```
2. Add the Spider MCP configuration:
```json
{
  "mcpServers": {
    "spider": {
      "command": "npx",
      "args": ["@willbohn/spider-mcp"],
      "env": {
        "SPIDER_API_KEY": "your_spider_api_key_here"
      }
    }
  }
}
```
Testing on Linux
```bash
export SPIDER_API_KEY="your_key"
node test.js
```
Other MCP Clients
Claude Code Configuration
Claude Code automatically detects MCP servers. Simply:
1. Install the package globally:
   ```bash
   npm install -g @willbohn/spider-mcp
   ```
2. Set your API key:
   - Windows (PowerShell): `$env:SPIDER_API_KEY="your_key"`
   - Windows (CMD): `set SPIDER_API_KEY=your_key`
   - macOS/Linux: `export SPIDER_API_KEY="your_key"`
3. The server will be available in Claude Code.
Cursor IDE Configuration
Add to your Cursor settings:
```json
{
  "mcp.servers": {
    "spider": {
      "command": "npx",
      "args": ["@willbohn/spider-mcp"],
      "env": {
        "SPIDER_API_KEY": "your_spider_api_key_here"
      }
    }
  }
}
```
VS Code with Continue Extension
Add to your Continue configuration:
```json
{
  "mcpServers": [
    {
      "name": "spider",
      "command": "npx",
      "args": ["@willbohn/spider-mcp"],
      "env": {
        "SPIDER_API_KEY": "your_spider_api_key_here"
      }
    }
  ]
}
```
Environment Variables
| Variable | Required | Description | Default |
|---|---|---|---|
| `SPIDER_API_KEY` | Yes | Your Spider Cloud API key | - |
| `SPIDER_API_BASE_URL` | No | API endpoint URL | `https://api.spider.cloud` |
| `SPIDER_REQUEST_TIMEOUT` | No | Request timeout in milliseconds | `60000` |
| `DEBUG` | No | Enable debug logging | `false` |
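The optional variables ride along in the same `env` block as the API key. For example (the values here are illustrative overrides, not required settings):

```json
{
  "env": {
    "SPIDER_API_KEY": "your_spider_api_key_here",
    "SPIDER_REQUEST_TIMEOUT": "120000",
    "DEBUG": "true"
  }
}
```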
Tool Documentation
spider_scrape
Scrape content from a single URL with advanced options.
Parameters:
- `url` (required): Target URL to scrape
- `return_format`: Output format (`markdown`, `raw`, `text`, `html`, `screenshot`, `links`)
- `js`: Enable JavaScript rendering
- `wait_for`: Wait time for page load (0-60000ms)
- `css_selector`: CSS selector for specific content
- `proxy_enabled`: Use premium proxy
- `proxy_country`: Two-letter country code
- `stealth`: Enable stealth mode
- `anti_bot`: Advanced anti-bot bypass
- `headers`: Custom HTTP headers
- `cookies`: Cookie string
- `metadata`: Include metadata
- `clean_html`: Clean and sanitize HTML
- `media`: Include media elements
Example:
```json
{
  "url": "https://example.com",
  "return_format": "markdown",
  "js": true,
  "stealth": true,
  "css_selector": ".main-content"
}
```
spider_crawl
Crawl an entire website with intelligent navigation.
Parameters:
- `url` (required): Starting URL
- `limit`: Max pages to crawl (1-10000)
- `depth`: Max crawl depth (0-10)
- `return_format`: Output format
- `whitelist`: URL patterns to include
- `blacklist`: URL patterns to exclude
- `budget`: Crawl budget configuration
- `subdomains`: Include subdomains
- `sitemap`: Use sitemap.xml
- `respect_robots`: Respect robots.txt
- Plus all proxy and rendering options from scrape
Example:
```json
{
  "url": "https://docs.example.com",
  "limit": 50,
  "depth": 3,
  "whitelist": ["*/api/*"],
  "return_format": "markdown"
}
```
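The `budget` option isn't illustrated above; in the Spider Cloud API it takes an object mapping path patterns to page allowances. Assuming that shape, a budgeted crawl might look like this (the patterns and counts are illustrative, not defaults):

```json
{
  "url": "https://docs.example.com",
  "limit": 100,
  "budget": {
    "*": 100,
    "/blog": 10
  },
  "return_format": "markdown"
}
```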
spider_search
Search the web with Google-like results.
Parameters:
- `query` (required): Search query
- `search_limit`: Max results (1-100)
- `fetch_page_content`: Fetch full content
- `tbs`: Time-based search (`qdr:d`, `qdr:w`, `qdr:m`, `qdr:y`)
- `gl`: Country code (e.g., `us`, `uk`)
- `hl`: Language code (e.g., `en`, `es`)
- `safe`: SafeSearch level (`off`, `medium`, `high`)
- Plus content fetching options
Example:
```json
{
  "query": "artificial intelligence news",
  "search_limit": 10,
  "tbs": "qdr:w",
  "gl": "us",
  "fetch_page_content": true
}
```
spider_links
Extract and analyze links from a webpage.
Parameters:
- `url` (required): Target URL
- `limit`: Max links (1-5000)
- `depth`: Extraction depth (0-5)
- `unique`: Return only unique links
- `subdomains`: Include subdomain links
- `external`: Include external links
- Plus standard options
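Example (illustrative, for parity with the tools above; the values are placeholders, not documented defaults):

```json
{
  "url": "https://example.com",
  "limit": 500,
  "unique": true,
  "external": false
}
```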
spider_screenshot
Capture webpage screenshots.
Parameters:
- `url` (required): Target URL
- `fullpage`: Full page screenshot
- `viewport_width`: Width in pixels (320-3840)
- `viewport_height`: Height in pixels (240-2160)
- `format`: Image format (`png`, `jpeg`, `webp`)
- `quality`: JPEG/WebP quality (0-100)
- `omit_background`: Transparent background (PNG only)
- `clip`: Region to capture
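Example (illustrative; viewport values and format are placeholders):

```json
{
  "url": "https://example.com",
  "fullpage": true,
  "viewport_width": 1280,
  "viewport_height": 800,
  "format": "png"
}
```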
spider_transform
Transform HTML to clean, readable formats.
Parameters:
- `data` (required): HTML/text to transform
- `return_format` (required): Target format (`markdown`, `text`, `raw`, `clean_html`)
- `readability`: Apply readability processing
- `clean`: Remove unnecessary elements
- `include_links`: Include hyperlinks
- `include_images`: Include images
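Example (illustrative; the HTML snippet is made up for the example):

```json
{
  "data": "<article><h1>Title</h1><p>Hello world.</p></article>",
  "return_format": "markdown",
  "readability": true,
  "include_links": true
}
```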
Testing
Run the comprehensive test suite:
Windows
```powershell
# PowerShell
$env:SPIDER_API_KEY="your_api_key_here"
node test.js

# With debug output
$env:DEBUG="true"
$env:SPIDER_API_KEY="your_api_key_here"
node test.js
```

```cmd
:: Command Prompt
set SPIDER_API_KEY=your_api_key_here
node test.js
```
macOS/Linux
```bash
# Set your API key
export SPIDER_API_KEY=your_api_key_here

# Run tests
node test.js

# With debug output
DEBUG=true SPIDER_API_KEY=your_api_key_here node test.js
```
Test Suites
```bash
# Quick smoke tests
npm run test:quick

# Full comprehensive suite (100+ tests)
npm run test:full

# LinkedIn-specific tests
npm run test:linkedin

# Run specific category
npm run test:category -- --category scraping
```
API Response Format
All tools return responses in a consistent format:
```json
{
  "success": true,
  "results": [...],
  "count": 10,
  "costs": {
    "total_cost": 0.00012,
    "compute_cost": 0.00008,
    "bandwidth_cost": 0.00004
  },
  "metadata": {
    "duration": 1234,
    "status": 200
  }
}
```
Development
Building from Source
```bash
npm install
npm run build
```
Running in Development Mode
```bash
npm run dev
```
Project Structure
```
spider-mcp/
├── src/
│   └── index.ts       # Main server implementation
├── dist/              # Compiled JavaScript
├── examples/          # Configuration examples
├── package.json       # Dependencies and scripts
├── tsconfig.json      # TypeScript configuration
└── README.md          # This file
```
Troubleshooting
Common Issues
"SPIDER_API_KEY environment variable is required"
- Ensure your API key is set in the environment or configuration
- Check the key is valid at spider.cloud
"Payment required" error
- Your API key needs credits
- Add credits at spider.cloud
"Rate limit exceeded"
- You've hit the API rate limit
- Wait a few minutes or upgrade your plan
Search tool timeout
- Search operations can take 15-30 seconds
- This is normal behavior for comprehensive searches
Debug Mode
Enable detailed logging:
Windows (PowerShell):
```powershell
$env:DEBUG="true"
$env:SPIDER_API_KEY="your_key"
node dist/index.js
```
Windows (Command Prompt):
```cmd
set DEBUG=true
set SPIDER_API_KEY=your_key
node dist/index.js
```
macOS/Linux:
```bash
DEBUG=true SPIDER_API_KEY=your_key node dist/index.js
```
Error Handling
The server provides detailed error messages:
- 401: Invalid API key
- 402: Payment required (add credits)
- 429: Rate limit exceeded
- 500+: Server errors (contact support)
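The error payload itself isn't documented above; assuming it mirrors the success envelope from the API Response Format section, a failed call might come back roughly like this (the shape and field names are an assumption, not confirmed API output):

```json
{
  "success": false,
  "error": {
    "status": 402,
    "message": "Payment required (add credits)"
  }
}
```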
Security
- API keys are never logged or stored
- All requests use HTTPS
- Bearer token authentication
- Input validation on all parameters
- Sanitized error messages
Performance
- Configurable timeouts (default: 60s)
- Automatic retry logic for transient failures
- Connection pooling for efficiency
- Response caching at API level
- Optimized for concurrent requests
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Open a Pull Request
License
MIT License - see the LICENSE file for details
Resources
Support
- MCP Server Issues: GitHub Issues
- Spider API Support: spider.cloud/support
- API Status: status.spider.cloud
Built with ❤️ for the MCP ecosystem