crawl4ai-mcp-server

CoachSteff/crawl4ai-mcp-server

Crawl4AI MCP Server is a Model Context Protocol server that enables web crawling capabilities for MCP-enabled applications.

Tools: 2 · Resources: 0 · Prompts: 0

Crawl4AI MCP Server

A Model Context Protocol (MCP) server that provides web crawling functionality via Crawl4AI. It lets you use Crawl4AI's web crawling capabilities directly within MCP-enabled applications such as Claude Desktop, BoltAI, and other MCP clients.

Features

  • 🌐 Web Crawling: Crawl any publicly accessible URL and extract content
  • 📄 Multiple Output Formats: Get content as Markdown, HTML, plain text, or structured JSON
  • 📊 Metadata Extraction: Extract page metadata including title, description, links, and more
  • 🔧 MCP Integration: Seamlessly integrates with any MCP-enabled application
  • 🐳 Flexible Deployment: Run locally, on a VPS, or in a Docker container
  • ⚡ Async Performance: Built on Python's asyncio for non-blocking I/O

Prerequisites

  • Python 3.8+ (3.9+ recommended)
  • Crawl4AI (automatically installed as dependency)
  • Playwright browsers (automatically installed during setup)

Quick Start

1. Clone the repository

git clone https://github.com/CoachSteff/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server

2. Install dependencies

pip install -r requirements.txt

3. Install Playwright browsers

python -m playwright install chromium

4. Configure for your MCP client

Copy the configuration template and update paths:

cp config.json.template config.json
# Edit config.json with your Python and server paths

5. Test the installation

python3 tests/test_wrapper.py

MCP Client Integration

Claude Desktop

Add this to your Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "crawl4ai": {
      "command": "/path/to/crawl4ai-mcp-server/crawl4ai-mcp-server",
      "args": [],
      "env": {}
    }
  }
}
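
The entry can be added by hand, or merged programmatically so that existing servers in the file are preserved. The following is a minimal sketch and not part of the project: the helper names add_crawl4ai_server and update_config_file are hypothetical, and because the location of claude_desktop_config.json depends on your platform, the path is passed in rather than assumed.

```python
import json
from pathlib import Path
from typing import Any, Dict


def add_crawl4ai_server(config: Dict[str, Any], command: str) -> Dict[str, Any]:
    """Return a copy of a Claude Desktop config with a "crawl4ai" entry added.

    Other entries under "mcpServers" are kept untouched.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["crawl4ai"] = {"command": command, "args": [], "env": {}}
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_path: Path, command: str) -> None:
    """Read the config file (if it exists), merge the entry, and write it back."""
    existing = json.loads(config_path.read_text()) if config_path.exists() else {}
    merged = add_crawl4ai_server(existing, command)
    config_path.write_text(json.dumps(merged, indent=2))
```

Call update_config_file with the path to your own configuration file and the absolute path to the wrapper script, then restart the client so it picks up the change.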

Other MCP Clients

Use the wrapper script as your MCP server command:

  • Command: /path/to/crawl4ai-mcp-server/crawl4ai-mcp-server
  • Args: []
  • Env: {}

Available Tools

1. crawl_url

Crawl a single URL and extract content in your preferred format.

Parameters:

  • url (required): The URL to crawl
  • output_format (optional): Output format - markdown, html, text, or json (default: markdown)

Example Usage in Claude Desktop:

Please crawl https://example.com and extract the content as markdown.
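
Behind a prompt like the one above, an MCP client sends a JSON-RPC 2.0 tools/call request. The sketch below shows what such a request could look like for crawl_url. It is an illustration only: the helper build_crawl_request and the client-side format check are assumptions, and the server may validate its arguments differently.

```python
import json
from typing import Any, Dict

# The four formats listed for the output_format parameter.
ALLOWED_FORMATS = ("markdown", "html", "text", "json")


def build_crawl_request(
    url: str, output_format: str = "markdown", request_id: int = 1
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 "tools/call" request for the crawl_url tool."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(
            f"output_format must be one of {ALLOWED_FORMATS}, got {output_format!r}"
        )
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "crawl_url",
            "arguments": {"url": url, "output_format": output_format},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_crawl_request("https://example.com"), indent=2))
```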

2. get_page_metadata

Extract metadata from a webpage without downloading the full content.

Parameters:

  • url (required): The URL to get metadata from

Example Usage in Claude Desktop:

Get the metadata for https://example.com

Example Outputs

Markdown Output

# Example Page Title

This is the main content of the page...

## Section 1
Content here...

JSON Output

{
  "url": "https://example.com",
  "title": "Example Page Title",
  "markdown": "# Example Page Title\n\nThis is the main content...",
  "html": "<html>...</html>",
  "metadata": {
    "title": "Example Page Title",
    "description": "Page description",
    "language": "en"
  }
}
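
A client that receives the JSON output can pull out the fields it needs with the standard library. The sketch below assumes the response has the shape shown above; the summarize helper is hypothetical, and the heading extraction is a simple line-based pass that would also pick up "#" lines inside code samples.

```python
import json
from typing import Any, Dict


def summarize(raw: str) -> Dict[str, Any]:
    """Extract a few fields from a crawl_url JSON result.

    Missing keys yield None (or an empty list) instead of raising.
    """
    result = json.loads(raw)
    markdown = result.get("markdown", "")
    return {
        "url": result.get("url"),
        "title": result.get("title"),
        "language": result.get("metadata", {}).get("language"),
        # Treat every line starting with "#" as a Markdown heading.
        "headings": [
            line.lstrip("#").strip()
            for line in markdown.splitlines()
            if line.startswith("#")
        ],
    }
```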

Metadata Output

{
  "url": "https://example.com",
  "title": "Example Page Title",
  "description": "Page description",
  "keywords": "example, test, page",
  "language": "en",
  "author": "Page Author",
  "links_count": 15,
  "media_count": 3,
  "word_count": 250,
  "status_code": 200
}
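
If you consume the metadata output in your own code, a small typed wrapper makes the fields easier to work with. This is a sketch under the assumption that the response carries the keys shown above; the PageMetadata class and its ok property are hypothetical and not part of the server.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class PageMetadata:
    """Typed view of the get_page_metadata output shown above."""

    url: str
    title: Optional[str] = None
    description: Optional[str] = None
    keywords: Optional[str] = None
    language: Optional[str] = None
    author: Optional[str] = None
    links_count: int = 0
    media_count: int = 0
    word_count: int = 0
    status_code: Optional[int] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "PageMetadata":
        # Ignore keys this sketch does not know about, so extra fields do not break it.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    @property
    def ok(self) -> bool:
        """True when the page answered with a 2xx HTTP status."""
        return self.status_code is not None and 200 <= self.status_code < 300
```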

Advanced Usage

Running Examples

Try the included examples to see the server in action:

# Run basic usage examples
python3 examples/basic_usage.py

# Run advanced examples
python3 examples/advanced_usage.py

Docker Deployment

For containerized deployment:

# Build the Docker image
docker build -t crawl4ai-mcp-server .

# Run the container
docker run -p 3000:3000 crawl4ai-mcp-server

Configuration Options

The server can be configured through the config.json file:

{
  "python_path": "/usr/bin/python3",
  "server_path": "/path/to/server.py",
  "headless": true,
  "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"
}
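
Before starting the server, it can help to check that config.json has the expected keys and types. The sketch below is one way to do that. It assumes all four keys shown above are required, which the project may not enforce, and the helpers validate_config and load_config are hypothetical.

```python
import json
from typing import Any, Dict

# Key names and types taken from the example config above.
EXPECTED_TYPES = {
    "python_path": str,
    "server_path": str,
    "headless": bool,
    "user_agent": str,
}


def validate_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Raise KeyError or TypeError if a key is missing or has the wrong type."""
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            raise KeyError(f"config.json is missing {key!r}")
        if not isinstance(config[key], expected):
            raise TypeError(
                f"{key!r} should be {expected.__name__}, "
                f"got {type(config[key]).__name__}"
            )
    return config


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Load config.json from disk and validate it."""
    with open(path, encoding="utf-8") as handle:
        return validate_config(json.load(handle))
```

A common mistake this catches is writing "headless": "true" (a string) instead of the JSON boolean true.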

Troubleshooting

Common Issues

  1. Playwright browser not found

    python -m playwright install chromium
    
  2. Permission denied on wrapper script

    chmod +x crawl4ai-mcp-server
    
  3. Import errors

    pip install -r requirements.txt
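
The checks behind issues 2 and 3 can also be run from Python before launching the server. The following preflight sketch is not part of the project: the function name is hypothetical, and it assumes the packages are importable under the module names crawl4ai and playwright. It does not verify that the Chromium browser itself has been downloaded.

```python
import importlib.util
import os
from pathlib import Path
from typing import List, Sequence


def preflight(
    wrapper: Path, modules: Sequence[str] = ("crawl4ai", "playwright")
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    if not wrapper.is_file():
        problems.append(f"wrapper script not found: {wrapper}")
    elif not os.access(wrapper, os.X_OK):
        problems.append(f"wrapper script is not executable: run chmod +x {wrapper}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(
                f"module {name!r} is not importable: run pip install -r requirements.txt"
            )
    return problems


if __name__ == "__main__":
    for problem in preflight(Path("crawl4ai-mcp-server")):
        print(problem)
```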
    

Logs

Check the log file for debugging information:

tail -f crawl4ai-mcp.log

Development

Running Tests

# Run configuration tests
python3 tests/test_wrapper.py

# Run server tests (if available)
pytest tests/

Code Quality

The project uses several tools for code quality:

# Format code
black server.py

# Lint code
flake8 server.py

# Type checking
mypy server.py

Contributing

Contributions are welcome! Please see the contributing guidelines in the repository for how to contribute to this project.

Code of Conduct

Please note that this project is released with a Code of Conduct. By participating in this project you agree to abide by its terms.

License

This project is licensed under the MIT License. See the license file in the repository for details.