
Firecrawl Lite MCP Server

License: MIT

A privacy-first, standalone MCP server that provides web scraping and data extraction tools using local browser automation and your own LLM API key. No external services or additional API keys are required beyond your own LLM key.

🎯 What Makes Firecrawl Lite Special

🔒 Privacy-First Architecture

  • Local Processing - All web scraping and data extraction happens on your machine
  • Your Data Stays Local - Content is processed locally, not sent to third parties
  • No External Service Lock-in - Doesn't require a cloud API
  • Complete Control - You own your data and infrastructure

💰 Cost-Effective & Transparent

  • Pay Only for LLM Usage - No additional subscription or API fees
  • Your LLM Provider - Compatible with OpenAI, xAI, Anthropic, Ollama, etc.
  • Predictable Costs - Transparent pricing based on your chosen LLM rates

⚡ Performance & Simplicity

  • Lightning-Fast Startup - Lightweight design means quick initialization
  • Single Container - Simple deployment with Docker support
  • Minimal Resource Usage - Optimized for efficiency and low memory footprint

🛠️ Available Tools

scrape_page - Extract content from a single webpage

  • Use case: Get webpage content for LLMs to read
  • Parameters: url, onlyMainContent

batch_scrape - Scrape multiple URLs in a single request

  • Use case: Process multiple pages efficiently
  • Parameters: urls[], onlyMainContent

extract_data - Extract structured data using LLM

  • Use case: Pull specific data from pages using natural language prompts
  • Parameters: urls[], prompt, enableWebSearch

extract_with_schema - Extract data using JSON schema

  • Use case: Extract structured data with predefined schema
  • Parameters: urls[], schema, prompt, enableWebSearch

screenshot - Take a screenshot of a webpage

  • Use case: Capture visual representation of pages
  • Parameters: url, width, height, fullPage
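
Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` messages (after the usual initialize handshake). A minimal sketch of the request envelope; the `tool_call` helper is illustrative, not part of the server:

```python
import json

def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 tools/call request for one of the tools above."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

# Example: request a single-page scrape with main-content filtering.
msg = tool_call(1, "scrape_page", {"url": "https://example.com", "onlyMainContent": True})
```

MCP clients such as Claude Desktop build these messages for you; the sketch only shows what travels over the wire.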

🚀 Quick Start (Recommended)

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS path):

{
  "mcpServers": {
    "firecrawl-lite": {
      "command": "npx",
      "args": ["-y", "@ariangibson/firecrawl-lite-mcp-server"],
      "env": {
        "LLM_API_KEY": "your_llm_api_key_here",
        "LLM_PROVIDER_BASE_URL": "https://api.x.ai/v1",
        "LLM_MODEL": "grok-code-fast-1"
      }
    }
  }
}

Claude Code (CLI)

claude mcp add firecrawl-lite --env LLM_API_KEY=your_key --env LLM_PROVIDER_BASE_URL=https://api.x.ai/v1 --env LLM_MODEL=grok-code-fast-1 -- npx -y @ariangibson/firecrawl-lite-mcp-server

Cursor

Add to your Cursor MCP configuration:

{
  "mcpServers": {
    "firecrawl-lite": {
      "command": "npx",
      "args": ["-y", "@ariangibson/firecrawl-lite-mcp-server"],
      "env": {
        "LLM_API_KEY": "your_llm_api_key_here",
        "LLM_PROVIDER_BASE_URL": "https://api.x.ai/v1",
        "LLM_MODEL": "grok-code-fast-1"
      }
    }
  }
}

⚙️ Configuration

Required Environment Variables

# Your LLM API key (xAI, OpenAI, Anthropic, etc.)
LLM_API_KEY=your_api_key_here

# LLM provider base URL
LLM_PROVIDER_BASE_URL=https://api.x.ai/v1

# LLM model name
LLM_MODEL=grok-code-fast-1

LLM Provider Examples

# xAI (Grok)
LLM_PROVIDER_BASE_URL=https://api.x.ai/v1
LLM_API_KEY=xai-your-key-here
LLM_MODEL=grok-code-fast-1

# OpenAI
LLM_PROVIDER_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-your-key-here
LLM_MODEL=gpt-4o-mini

# Anthropic
LLM_PROVIDER_BASE_URL=https://api.anthropic.com
LLM_API_KEY=sk-ant-your-key-here
LLM_MODEL=claude-3-haiku-20240307

# Local LLM (Ollama)
LLM_PROVIDER_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=your-local-key
LLM_MODEL=llama2
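
All three variables must be set before the server starts. A small fail-fast sketch for validating your environment ahead of launch; the `load_llm_config` helper is an assumption for illustration, not the server's actual startup code:

```python
import os

REQUIRED = ("LLM_API_KEY", "LLM_PROVIDER_BASE_URL", "LLM_MODEL")

def load_llm_config(env=os.environ):
    """Return the three required settings, raising early if any are missing or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

Running a check like this before wiring the server into an MCP client turns a silent misconfiguration into an immediate, readable error.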

🌐 Remote Deployment

For remote servers or Docker deployments, enable at least one of the HTTP endpoints, depending on which transport protocol you plan to use; neither is enabled by default:

Docker

docker run -d \
  -p 3000:3000 \
  -e ENABLE_HTTP_STREAMABLE_ENDPOINT=true \
  -e ENABLE_SSE_ENDPOINT=true \
  -e LLM_API_KEY=your_key_here \
  -e LLM_PROVIDER_BASE_URL=https://api.x.ai/v1 \
  -e LLM_MODEL=grok-code-fast-1 \
  ariangibson/firecrawl-lite-mcp-server:latest

Claude Code (Remote)

claude mcp add firecrawl-lite-remote http://your-server:3000/mcp -t http

Claude Desktop (Remote)

Method 1: Connectors (Recommended; HTTPS only)

The official Claude Desktop method for remote MCP servers:

  • Go to Claude Desktop → Settings → Connectors
  • Add connector: https://your-server.com:3000/mcp
  • Requires: HTTPS server with valid SSL certificate

Method 2: mcp-proxy (HTTP/HTTPS fallback)

For servers without an SSL certificate:

pip install mcp-proxy

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "firecrawl-lite": {
      "command": "mcp-proxy",
      "args": ["http://your-server:3000/sse"]
    }
  }
}

🛠️ Advanced Configuration

Proxy Support

PROXY_SERVER_URL=http://your-proxy-server.com:1337
PROXY_SERVER_USERNAME=your-username
PROXY_SERVER_PASSWORD=your-password

Anti-Detection

SCRAPE_USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
SCRAPE_DELAY_MIN=1000
SCRAPE_DELAY_MAX=3000

Performance Tuning

SCRAPE_VIEWPORT_WIDTH=1920
SCRAPE_VIEWPORT_HEIGHT=1080
SCRAPE_BATCH_DELAY_MIN=2000
SCRAPE_BATCH_DELAY_MAX=5000
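
The delay variables bound a randomized pause between requests. Assuming a uniform draw between the min and max values (an assumption; the server's exact strategy is not documented here), the logic amounts to:

```python
import random

def pick_delay_ms(min_ms: int = 2000, max_ms: int = 5000) -> float:
    """Pick a random per-request delay (milliseconds) between the configured bounds."""
    return random.uniform(min_ms, max_ms)

delay = pick_delay_ms(2000, 5000)  # mirrors SCRAPE_BATCH_DELAY_MIN/MAX above
```

Randomizing the delay, rather than sleeping a fixed interval, makes the request pattern look less like automated traffic.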

🛠️ Troubleshooting

Chrome Issues

Chrome is automatically installed on first use. If you encounter issues:

# Manual installation
npx puppeteer browsers install chrome

# Reset if corrupted
rm -rf ~/.cache/puppeteer && npx puppeteer browsers install chrome

Connection Issues

  • Verify internet connectivity
  • Check LLM provider URL accessibility
  • Ensure API keys are valid
  • For corporate networks, configure proxy settings

📊 Usage Examples

Scrape a webpage

{
  "name": "scrape_page",
  "arguments": {
    "url": "https://example.com"
  }
}

Batch scrape multiple URLs

{
  "name": "batch_scrape",
  "arguments": {
    "urls": ["https://example.com", "https://example.org"],
    "onlyMainContent": true
  }
}

Extract data with prompt

{
  "name": "extract_data",
  "arguments": {
    "urls": ["https://example.com"],
    "prompt": "Extract the main article title and summary"
  }
}

Extract with schema

{
  "name": "extract_with_schema",
  "arguments": {
    "urls": ["https://example.com"],
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"}
      }
    }
  }
}
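
The schema above follows standard JSON Schema conventions. A stdlib-only sanity sketch of checking an extraction result against it; `matches_schema` is a hypothetical helper supporting only a tiny subset of JSON Schema, not part of the server:

```python
def matches_schema(value, schema):
    """Check a value against a tiny subset of JSON Schema: 'type' plus object 'properties'."""
    types = {"object": dict, "array": list, "string": str,
             "number": (int, float), "boolean": bool}
    expected = schema.get("type")
    if expected and not isinstance(value, types[expected]):
        return False
    if expected == "object":
        for key, sub in schema.get("properties", {}).items():
            if key in value and not matches_schema(value[key], sub):
                return False
    return True

schema = {"type": "object",
          "properties": {"title": {"type": "string"},
                         "description": {"type": "string"}}}
```

A lightweight check like this catches an LLM returning the wrong shape before the result flows into downstream code; for full validation, a complete JSON Schema library would be the right tool.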

🐳 Container Registries

Pre-built images are available:

  • Docker Hub: ariangibson/firecrawl-lite-mcp-server:latest
  • GitHub Container Registry: ghcr.io/ariangibson/firecrawl-lite-mcp-server:latest

Both support multi-architecture (amd64, arm64) with automatic updates.

🙏 Credits & Acknowledgments

This project is inspired by the excellent work of the original Firecrawl projects:

🔥 Firecrawl

The original Firecrawl project by Mendable.ai - a comprehensive web scraping platform with advanced features.

🔥 Firecrawl MCP Server

The official MCP server implementation by the Firecrawl team.

We give huge thanks to the Firecrawl team for their pioneering work in web scraping and MCP integration! 🚀

💡 Looking for enterprise-grade web scraping?
Visit firecrawl.com for their cloud service with zero setup complexity.

📝 License

MIT License - see the LICENSE file for details.