MCP Crawl4AI Server
A powerful Model Context Protocol (MCP) server that provides AI agents with advanced web scraping and content extraction capabilities using Crawl4AI and Playwright.
🚀 Features
- 5 Powerful MCP Tools for comprehensive web scraping
- AI-Powered Content Extraction with Google Gemini and Anthropic Claude
- JavaScript Site Support with Playwright browser automation
- Structured Data Extraction using CSS selectors
- Batch Processing for multiple URLs
- Secure API Key Authentication
- Docker Deployment for easy VPS hosting
- Swedish Language Support and international content handling
🛠️ Available Tools
1. scrape_url
Basic web content extraction with markdown formatting
Parameters:
- url: Target website URL
- api_key: Authentication key
- wait_for: CSS selector to wait for (optional)
- css_selector: Specific content selector (optional)
- exclude_tags: HTML tags to exclude (optional)
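For illustration, the arguments an MCP client would pass in a scrape_url call might look like the following (the URL and selector values are made up, and exclude_tags is assumed to take a list of tag names):
# Illustrative scrape_url arguments; only url and api_key are required.
arguments = {
    "url": "https://example.com/article",
    "api_key": "test-key-123",
    "wait_for": "article.main",         # optional: CSS selector to wait for
    "css_selector": "article.main",     # optional: extract only this element
    "exclude_tags": ["nav", "footer"],  # optional: tags to drop from output
}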
2. scrape_with_css_extraction
Structured data extraction using CSS selectors
Parameters:
- url: Target website URL
- extraction_schema: Object mapping fields to CSS selectors
- api_key: Authentication key
- wait_for: Element to wait for (optional)
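As a sketch, a hypothetical extraction_schema for a product listing page maps output field names to CSS selectors (the field names and selectors here are examples, not a fixed vocabulary):
# Hypothetical extraction_schema for scrape_with_css_extraction.
arguments = {
    "url": "https://shop.example.com/products",
    "api_key": "test-key-123",
    "extraction_schema": {
        "name": "h2.product-title",        # one output field per CSS selector
        "price": "span.price",
        "description": "div.product-description",
    },
    "wait_for": "div.product-grid",        # optional
}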
3. scrape_with_llm_extraction
AI-powered intelligent content analysis and extraction
Parameters:
- url: Target website URL
- extraction_prompt: Instructions for AI extraction
- api_key: Authentication key
- model: AI model (gemini-2.5-flash, claude-3-5-sonnet-20241022)
- provider: AI provider (google, anthropic)
- wait_for: Element to wait for (optional)
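A sketch of the arguments for an LLM extraction call, using the provider/model pairs listed above (the URL and prompt are illustrative):
# Illustrative scrape_with_llm_extraction arguments.
arguments = {
    "url": "https://company.example.com/about",
    "api_key": "test-key-123",
    "extraction_prompt": "Extract as JSON: {name, founded, services, contact_email}",
    "provider": "anthropic",                # or "google"
    "model": "claude-3-5-sonnet-20241022",  # or "gemini-2.5-flash"
}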
4. scrape_multiple_urls
Process multiple websites in batch (max 10 URLs)
Parameters:
- urls: Array of URLs to process
- api_key: Authentication key
- max_concurrent: Parallel processing limit (default: 3)
- css_selector: Selector for all URLs (optional)
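And a sketch of a batch call (the URLs are placeholders; the server caps each batch at 10):
# Illustrative scrape_multiple_urls arguments.
arguments = {
    "urls": [
        "https://example.com/page-1",
        "https://example.com/page-2",
        "https://example.com/page-3",
    ],
    "api_key": "test-key-123",
    "max_concurrent": 3,             # optional: parallelism limit (default 3)
    "css_selector": "main article",  # optional: applied to every URL
}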
5. get_server_status
Check server health, authentication status, and available tools
Parameters:
- api_key: Authentication key (optional)
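Putting it together, here is a minimal sketch of driving these tools from Python over stdio. It assumes the standard mcp Python SDK (ClientSession, stdio_client), which is not part of this project; adjust the command, path, and key to your setup:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the server locally over stdio (see Option 2 below).
    params = StdioServerParameters(
        command="python",
        args=["mcp_server.py"],
        env={"MCP_API_KEY": "test-key-123"},
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Health check first, then a basic scrape.
            status = await session.call_tool(
                "get_server_status", {"api_key": "test-key-123"}
            )
            print(status.content)
            page = await session.call_tool(
                "scrape_url",
                {"url": "https://example.com", "api_key": "test-key-123"},
            )
            print(page.content)

asyncio.run(main())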
📦 Quick Start
Option 1: Use Existing VPS Deployment
The server is already deployed and running! Just configure your AI agent:
Claude Desktop Configuration (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"crawl4ai-server": {
"command": "ssh",
"args": [
"pallefrej@46.246.38.24",
"cd ~/mcp-crawl4ai && docker exec -i mcp-crawl4ai-server python mcp_server.py"
],
"env": {
"MCP_API_KEY": "test-key-123"
}
}
}
}
Option 2: Local Installation
- Clone and Setup
git clone <this-repo>
cd mcp-crawl4ai
pip install -r requirements.txt
playwright install chromium
- Configure Environment
cp .env.example .env
# Edit .env with your API keys
- Run Server
python mcp_server.py
- Configure AI Agent
{
"mcpServers": {
"crawl4ai-local": {
"command": "python",
"args": ["/path/to/mcp-crawl4ai/mcp_server.py"],
"env": {
"MCP_API_KEY": "test-key-123"
}
}
}
}
Option 3: Docker Deployment
- Build and Run
docker-compose up -d --build
- Configure AI Agent
{
"mcpServers": {
"crawl4ai-docker": {
"command": "docker",
"args": ["exec", "-i", "mcp-crawl4ai-server", "python", "mcp_server.py"],
"env": {
"MCP_API_KEY": "test-key-123"
}
}
}
}
🔐 Authentication
Default API Keys
- test-key-123 - Development and testing
- production-key-456 - Production usage
Custom API Keys
Add your own keys to .env:
MCP_API_KEYS=key1,key2,key3
GOOGLE_API_KEY=your-google-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
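The server's own validation logic isn't shown here, but checking a tool call's api_key against a comma-separated MCP_API_KEYS variable typically takes only a few lines; a minimal sketch, assuming the server reads the variable from the environment:
import os

def is_authorized(api_key: str) -> bool:
    """Check an api_key against the comma-separated MCP_API_KEYS list."""
    allowed = {
        k.strip()
        for k in os.environ.get("MCP_API_KEYS", "").split(",")
        if k.strip()
    }
    return api_key in allowed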
💡 Usage Examples
Basic Web Scraping
"Scrape the content from https://example.com and show me the main article"
Structured Data Extraction
"Extract all product names, prices, and descriptions from this e-commerce page"
AI-Powered Analysis
"Analyze this Swedish business website and extract: company info, services offered, contact details, and key benefits in Swedish"
Competitive Research
"Scrape these 5 competitor websites and compare their pricing models"
Content Monitoring
"Extract the latest news headlines from this news site and summarize the top 3 stories"
🌍 Supported Sites
- ✅ Static HTML Sites - Standard websites
- ✅ JavaScript/React Sites - GitHub, Reddit, modern SPAs
- ✅ Swedish Content - Euromaster.se, Swedish business sites
- ✅ E-commerce Sites - Product catalogs, pricing pages
- ✅ News Sites - Article extraction, headline monitoring
- ✅ Business Sites - Company info, service descriptions
🎯 AI Integration
Google Gemini Models
- gemini-2.5-flash - Fast, cost-effective extraction
- Best for: Quick summaries, basic data extraction
Anthropic Claude Models
- claude-3-5-sonnet-20241022 - Advanced reasoning and analysis
- Best for: Complex analysis, structured data, nuanced content
Example AI Extraction
Tool: scrape_with_llm_extraction
URL: https://company.com/about
Prompt: "Extract company data in JSON format: {name, founded, employees, services, contact_email}"
Model: claude-3-5-sonnet-20241022
🐳 Docker Configuration
Environment Variables
# Authentication
MCP_API_KEYS=test-key-123,production-key-456
# AI Integration
GOOGLE_API_KEY=your-google-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
# Crawl4AI Settings
CRAWL4AI_CACHE_DIR=/tmp/crawl4ai_cache
CRAWL4AI_LOG_LEVEL=INFO
Volume Mounts
- crawl4ai_cache:/tmp/crawl4ai_cache - Browser cache for performance
Container Management
# Start server
docker-compose up -d
# View logs
docker logs mcp-crawl4ai-server --follow
# Restart server
docker-compose restart
# Update and rebuild
docker-compose down && docker-compose up -d --build
📊 Performance & Limits
- Max URLs per batch: 10 URLs
- Default concurrency: 3 parallel requests
- Browser pooling: Shared Chromium instance for efficiency
- Cache system: Persistent storage for improved performance
- Memory usage: ~200-500MB depending on content complexity
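The concurrency cap above follows the standard asyncio pattern; as a sketch (the sleep stands in for the real Crawl4AI/Playwright fetch, which is not shown here):
import asyncio

async def scrape_all(urls: list[str], max_concurrent: int = 3) -> list[str]:
    # A semaphore ensures at most max_concurrent scrapes run at once.
    sem = asyncio.Semaphore(max_concurrent)

    async def scrape_one(url: str) -> str:
        async with sem:
            await asyncio.sleep(0.1)  # stand-in for the real crawler call
            return f"scraped {url}"

    return await asyncio.gather(*(scrape_one(u) for u in urls))

# Example: asyncio.run(scrape_all([f"https://example.com/p{i}" for i in range(10)]))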
🔧 Troubleshooting
Server Not Responding
# Check container status
docker-compose ps
# View detailed logs
docker logs mcp-crawl4ai-server --tail 50
# Restart container
docker-compose restart
Authentication Issues
- Verify MCP_API_KEY in your AI agent configuration
- Check server logs for authentication attempts
- Ensure the API key exists in the MCP_API_KEYS environment variable
Scraping Failures
- Some sites may block automated requests
- Try adding the wait_for parameter for dynamic content
- Check whether the site requires specific user agents or headers
📁 Project Structure
mcp-crawl4ai/
├── mcp_server.py # Main MCP server
├── requirements.txt # Python dependencies
├── .env # Environment configuration
├── Dockerfile # Container definition
├── docker-compose.yml # Container orchestration
├── README.md # This file
├── MCP_CONFIGURATION.md # Detailed configuration guide
└── examples/ # Usage examples
🔄 Updates & Maintenance
Updating the Server
- Update code files
- Rebuild the container: docker-compose up -d --build
- Test functionality with a status check
Monitoring
- Container logs: docker logs mcp-crawl4ai-server
- Server status: use the get_server_status tool
- Resource usage: docker stats mcp-crawl4ai-server
⚠️ Important Notes
- Respect robots.txt and website terms of service
- Rate limiting is built-in but be mindful of target sites
- JavaScript execution requires resources - monitor container memory
- API keys for LLM features are optional but enhance functionality
- Network access required for both scraping and AI API calls
📞 Support
Configuration Help
See MCP_CONFIGURATION.md for detailed setup instructions.
Common Issues
- MCP Server not detected: Check JSON syntax and restart AI client
- Permission denied: Verify SSH access and API keys
- Docker issues: Ensure Docker daemon is running
Debug Mode
Enable detailed logging:
CRAWL4AI_LOG_LEVEL=DEBUG
🎊 Success Indicators
When properly configured, you should see:
- 🟢 MCP server indicator in your AI client
- ✅ 5 tools available when asking "What tools do you have?"
- 🤖 Successful scraping of test websites
- 🔐 Authentication working with your API keys
📄 License
This project is designed for educational and development purposes. Ensure compliance with target websites' terms of service and applicable laws when scraping content.
Ready to supercharge your AI agent with web scraping capabilities? Start with the Quick Start guide above!