
WebSearch MCP Server

Python 3.12+ · License: MIT · Pylint: 10.00/10

High-performance Model Context Protocol (MCP) server for web search and content extraction with intelligent fallback system.

✨ Features

  • πŸš€ Fast: Async implementation with parallel execution
  • πŸ” Multi-Engine: Google, Bing, DuckDuckGo, Startpage, Brave Search
  • πŸ›‘οΈ Intelligent Fallbacks: Googleβ†’Startpage, Bingβ†’DuckDuckGo, Brave (standalone)
  • πŸ“„ Content Extraction: Clean text extraction from web pages
  • πŸ’Ύ Smart Caching: LRU cache with compression and deduplication
  • πŸ”‘ API Integration: Google Custom Search, Brave Search APIs with quota management
  • ⚑ Resilient: Automatic failover and comprehensive error handling
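The "parallel execution" behind the multi-engine search can be sketched with `asyncio.gather`. This is an illustrative sketch, not the project's actual engine API; `fake_engine` is a stand-in for a real engine call:

```python
import asyncio

async def fake_engine(name: str, query: str) -> list[dict]:
    # Stand-in for a real engine request; returns url/title result dicts.
    await asyncio.sleep(0)  # placeholder for network latency
    return [{"url": f"https://example.com/{name}", "title": f"{name}: {query}"}]

async def search_all(query: str) -> list[dict]:
    engines = ["google", "bing", "duckduckgo"]
    # Fire all engines concurrently and flatten the per-engine batches.
    batches = await asyncio.gather(*(fake_engine(e, query) for e in engines))
    return [r for batch in batches for r in batch]

results = asyncio.run(search_all("python tutorials"))
```

`gather` preserves the order of the engine list, so results from the first engine come first even though the requests ran concurrently.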

πŸ“¦ Installation

Production Use (Recommended)

# Create virtual environment
python -m venv ~/.websearch/venv
source ~/.websearch/venv/bin/activate

# Install from GitHub
pip install git+https://github.com/vishalkg/web-search.git

Development

git clone https://github.com/vishalkg/web-search.git
cd web-search
pip install -e .

βš™οΈ Configuration

Q CLI

# Add to Q CLI (after installation)
q mcp add --name websearch --command ~/.websearch/venv/bin/websearch-server

# Test
q chat "search for python tutorials"

Claude Desktop

Register the server with the Claude CLI (user scope):

claude mcp add websearch ~/.websearch/venv/bin/websearch-server -s user

πŸ—‚οΈ File Structure (Installation Independent)

The server automatically creates and manages files in a unified user directory:

~/.websearch/                 # Single websearch directory
β”œβ”€β”€ venv/                    # Virtual environment (recommended)
β”œβ”€β”€ config/
β”‚   └── .env                 # Configuration file
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ search-metrics.jsonl # Search analytics
β”‚   └── quota/              # API quota tracking
β”‚       β”œβ”€β”€ google_quota.json
β”‚       └── brave_quota.json
β”œβ”€β”€ logs/
β”‚   └── web-search.log      # Application logs
└── cache/                  # Optional caching

Environment Variable Overrides

  • WEBSEARCH_HOME: Base directory (default: ~/.websearch)
  • WEBSEARCH_CONFIG_DIR: Config directory override
  • WEBSEARCH_LOG_DIR: Log directory override

πŸ”§ Usage

The server provides two main tools with multiple search modes:

Search Web

# Standard 5-engine search (backward compatible)
search_web("quantum computing applications", num_results=10)

# New 3-engine fallback search (optimized)
search_web_fallback("machine learning tutorials", num_results=5)

Search Engines:

  • Google Custom Search API (with Startpage fallback)
  • Bing (with DuckDuckGo fallback)
  • Brave Search API (standalone)
  • DuckDuckGo (scraping)
  • Startpage (scraping)
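Deduplicating results returned by several engines typically means normalizing URLs before comparing them, since the same page often appears with different schemes, host casing, or trailing slashes. A minimal sketch of the idea (not the project's exact normalization rules):

```python
from urllib.parse import urlsplit

def normalize(url: str) -> str:
    # Lower-case the host, ignore the scheme, drop any trailing slash.
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")

def dedupe(results: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for r in results:
        key = normalize(r["url"])
        if key not in seen:
            seen.add(key)
            unique.append(r)  # first occurrence wins
    return unique

hits = [
    {"url": "https://Example.com/page/", "title": "A"},
    {"url": "http://example.com/page", "title": "A (duplicate)"},
]
# dedupe(hits) keeps only the first entry
```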

Fetch Page Content

# Extract clean text from URLs
fetch_page_content("https://example.com")
fetch_page_content(["https://site1.com", "https://site2.com"])  # Batch processing
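Under the hood, "clean text extraction" amounts to dropping markup plus `<script>`/`<style>` bodies and keeping visible text. A rough stdlib-only sketch of the idea; the project likely uses a dedicated extraction library instead:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Collects visible text, skipping <script> and <style> contents.
    def __init__(self):
        super().__init__()
        self.parts, self.skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```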

πŸ—οΈ Architecture

websearch/
β”œβ”€β”€ core/
β”‚   β”œβ”€β”€ search.py              # Sync search orchestration
β”‚   β”œβ”€β”€ async_search.py        # Async search orchestration
β”‚   β”œβ”€β”€ fallback_search.py     # 3-engine fallback system
β”‚   β”œβ”€β”€ async_fallback_search.py # Async fallback system
β”‚   β”œβ”€β”€ ranking.py             # Quality-first result ranking
β”‚   └── common.py              # Shared utilities
β”œβ”€β”€ engines/
β”‚   β”œβ”€β”€ google_api.py          # Google Custom Search API
β”‚   β”œβ”€β”€ brave_api.py           # Brave Search API
β”‚   β”œβ”€β”€ bing.py                # Bing scraping
β”‚   β”œβ”€β”€ duckduckgo.py          # DuckDuckGo scraping
β”‚   └── startpage.py           # Startpage scraping
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ unified_quota.py       # Unified API quota management
β”‚   β”œβ”€β”€ deduplication.py       # Result deduplication
β”‚   β”œβ”€β”€ advanced_cache.py      # Enhanced caching system
β”‚   └── http.py                # HTTP utilities
└── server.py                  # FastMCP server
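The Google→Startpage and Bing→DuckDuckGo fallbacks can be modeled as ordered engine chains: try each engine in turn until one returns results. A simplified sketch with stand-in engine functions, not the project's `fallback_search.py` implementation:

```python
def try_chain(chain, query):
    # Walk the chain until one engine yields results; empty list means all failed.
    for engine in chain:
        try:
            results = engine(query)
            if results:
                return results
        except Exception:
            continue  # engine failed; fall through to the next one
    return []

def failing_google(query):
    raise RuntimeError("quota exhausted")

def startpage(query):
    return [f"startpage result for {query}"]

# try_chain([failing_google, startpage], "mcp servers") falls back to Startpage
```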

πŸ”§ Advanced Configuration

Environment Variables

# API Configuration
export GOOGLE_CSE_API_KEY=your_google_api_key
export GOOGLE_CSE_ID=your_google_cse_id
export BRAVE_SEARCH_API_KEY=your_brave_api_key

# Quota Management (Optional)
export GOOGLE_DAILY_QUOTA=100        # Default: 100 requests/day
export BRAVE_MONTHLY_QUOTA=2000      # Default: 2000 requests/month

# Performance Tuning
export WEBSEARCH_CACHE_SIZE=1000
export WEBSEARCH_TIMEOUT=10
export WEBSEARCH_LOG_LEVEL=INFO

How to Get API Keys

Google Custom Search API
  1. API Key: Go to https://developers.google.com/custom-search/v1/introduction and click "Get a Key"
  2. CSE ID: Go to https://cse.google.com/cse/ and follow prompts to create a search engine
Brave Search API
  1. Go to Brave Search API
  2. Sign up for a free account
  3. Go to your dashboard
  4. Copy the API key and export it as BRAVE_SEARCH_API_KEY
  5. Free tier: 2000 requests/month

Quota Management

  • Unified System: Single quota manager for all APIs
  • Google: Daily quota (default 100 requests/day)
  • Brave: Monthly quota (default 2000 requests/month)
  • Storage: Quota files stored in ~/.websearch/ directory
  • Auto-reset: Quotas automatically reset at period boundaries
  • Fallback: Automatic fallback to scraping when quotas exhausted

Search Modes

  • Standard Mode: Uses all 5 engines for maximum coverage
  • Fallback Mode: Uses 3 engines with intelligent fallbacks for efficiency
  • API-First Mode: Prioritizes API calls over scraping when keys available

πŸ› Troubleshooting

| Issue | Solution |
| --- | --- |
| No results | Check internet connection and logs |
| API quota exhausted | System automatically falls back to scraping |
| Google API errors | Verify GOOGLE_CSE_API_KEY and GOOGLE_CSE_ID |
| Brave API errors | Check BRAVE_SEARCH_API_KEY and quota status |
| Permission denied | chmod +x start.sh |
| Import errors | Ensure Python 3.12+ and dependencies are installed |
| Circular import warnings | Fixed in v2.0+ (10.00/10 pylint score) |

Debug Mode

# Enable detailed logging
export WEBSEARCH_LOG_LEVEL=DEBUG
python -m websearch.server

API Status Check

# Test API connectivity
cd debug/
python test_brave_api.py      # Test Brave API
python test_fallback.py       # Test fallback system

πŸ“ˆ Performance & Monitoring

Metrics

  • Pylint Score: 10.00/10 (perfect code quality)
  • Search Speed: ~2-3 seconds for 5-engine search
  • Fallback Speed: ~1-2 seconds for 3-engine search
  • Cache Hit Rate: ~85% for repeated queries
  • API Quota Efficiency: Automatic fallback prevents service interruption

Monitoring

Logs are written to ~/.websearch/logs/web-search.log in a structured format:

tail -f ~/.websearch/logs/web-search.log | grep "search completed"

πŸ”’ Security

  • No hardcoded secrets: All API keys via environment variables
  • Clean git history: Secrets scrubbed from all commits
  • Input validation: Comprehensive sanitization of search queries
  • Rate limiting: Built-in quota management for API calls
  • Secure defaults: HTTPS-only requests, timeout protection

πŸš€ Performance Tips

  1. Use fallback mode for faster searches when you don't need maximum coverage
  2. Set API keys to reduce reliance on scraping (faster + more reliable)
  3. Enable caching for repeated queries (enabled by default)
  4. Tune batch sizes for content extraction based on your needs

🀝 Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Run tests (pytest)
  4. Commit changes (git commit -m 'Add amazing feature')
  5. Push to branch (git push origin feature/amazing-feature)
  6. Open Pull Request

πŸ“„ License

MIT License - see the LICENSE file for details.