vishalkg/web-search
A standalone Model Context Protocol (MCP) server that enables web search using multiple search engines with parallel execution and result deduplication.
WebSearch MCP Server
High-performance Model Context Protocol (MCP) server for web search and content extraction, with an intelligent fallback system.
Features
- Fast: Async implementation with parallel execution
- Multi-Engine: Google, Bing, DuckDuckGo, Startpage, Brave Search
- Intelligent Fallbacks: Google → Startpage, Bing → DuckDuckGo, Brave (standalone)
- Content Extraction: Clean text extraction from web pages
- Smart Caching: LRU cache with compression and deduplication
- API Integration: Google Custom Search, Brave Search APIs with quota management
- Resilient: Automatic failover and comprehensive error handling
Installation
Production Use (Recommended)
# Create virtual environment
python -m venv ~/.websearch/venv
source ~/.websearch/venv/bin/activate
# Install from GitHub
pip install git+https://github.com/vishalkg/web-search.git
Development
git clone https://github.com/vishalkg/web-search.git
cd web-search
pip install -e .
Configuration
Q CLI
# Add to Q CLI (after installation)
q mcp add --name websearch --command ~/.websearch/venv/bin/websearch-server
# Test
q chat "search for python tutorials"
Claude Desktop
Register the server with the claude CLI (user scope):
claude mcp add websearch ~/.websearch/venv/bin/websearch-server -s user
File Structure (Installation Independent)
The server automatically creates and manages files in a unified user directory:
~/.websearch/                     # Single websearch directory
├── venv/                         # Virtual environment (recommended)
├── config/
│   └── .env                      # Configuration file
├── data/
│   ├── search-metrics.jsonl      # Search analytics
│   └── quota/                    # API quota tracking
│       ├── google_quota.json
│       └── brave_quota.json
├── logs/
│   └── web-search.log            # Application logs
└── cache/                        # Optional caching
Environment Variable Overrides
- WEBSEARCH_HOME: Base directory (default: ~/.websearch)
- WEBSEARCH_CONFIG_DIR: Config directory override
- WEBSEARCH_LOG_DIR: Log directory override
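The override rules above can be sketched in Python as follows. This is a minimal illustration, not the server's actual code: `resolve_dirs` is a hypothetical helper, and the assumption that the config and log directories default to `config/` and `logs/` under the base directory is inferred from the directory layout shown earlier.

```python
import os
from pathlib import Path


def resolve_dirs(env=None):
    """Resolve websearch directories, honoring the documented overrides.

    Hypothetical helper for illustration; the real server may resolve
    paths differently.
    """
    env = os.environ if env is None else env
    # WEBSEARCH_HOME replaces the default base directory (~/.websearch).
    home = Path(env.get("WEBSEARCH_HOME", "~/.websearch")).expanduser()
    # Assumed defaults: config/ and logs/ live under the base directory
    # unless their own override variable is set.
    config_dir = Path(env.get("WEBSEARCH_CONFIG_DIR", home / "config")).expanduser()
    log_dir = Path(env.get("WEBSEARCH_LOG_DIR", home / "logs")).expanduser()
    return {"home": home, "config": config_dir, "logs": log_dir}
```

Calling `resolve_dirs()` with no argument reads the current process environment; passing a plain dict makes the behavior easy to check without touching real environment variables.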
Usage
The server provides two main tools with multiple search modes:
Search Web
# Standard 5-engine search (backward compatible)
search_web("quantum computing applications", num_results=10)
# New 3-engine fallback search (optimized)
search_web_fallback("machine learning tutorials", num_results=5)
Search Engines:
- Google Custom Search API (with Startpage fallback)
- Bing (with DuckDuckGo fallback)
- Brave Search API (standalone)
- DuckDuckGo (scraping)
- Startpage (scraping)
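To make the fallback pairing concrete, here is a rough sketch of how three chains (Google → Startpage, Bing → DuckDuckGo, Brave alone) could run in parallel with their results merged and deduplicated by URL. It is an assumption-laden illustration rather than the project's implementation: the engine coroutines are stubs with no network access, and `first_success`, `fallback_search`, and `CHAINS` are hypothetical names.

```python
import asyncio


async def first_success(chain, query):
    """Try each (name, engine) in order; return the first non-empty result list."""
    for name, engine in chain:
        try:
            results = await engine(query)
        except Exception:
            continue  # engine failed: fall through to the next one in the chain
        if results:
            return [dict(r, source=name) for r in results]
    return []


async def fallback_search(chains, query, num_results=5):
    """Run all chains concurrently, then merge while dropping duplicate URLs."""
    per_chain = await asyncio.gather(*(first_success(c, query) for c in chains))
    seen, merged = set(), []
    for results in per_chain:
        for r in results:
            key = r["url"].rstrip("/").lower()  # deliberately simplistic normalization
            if key not in seen:
                seen.add(key)
                merged.append(r)
    return merged[:num_results]


# Stub engines standing in for the real ones (hypothetical, no network access).
async def google(query):
    raise RuntimeError("quota exhausted")

async def startpage(query):
    return [{"url": "https://example.com/a", "title": "A"}]

async def bing(query):
    return [{"url": "https://example.com/a/", "title": "A (duplicate)"},
            {"url": "https://example.com/b", "title": "B"}]

async def duckduckgo(query):
    return [{"url": "https://example.com/c", "title": "C"}]

async def brave(query):
    return [{"url": "https://example.com/d", "title": "D"}]

CHAINS = [
    [("google", google), ("startpage", startpage)],  # Google -> Startpage
    [("bing", bing), ("duckduckgo", duckduckgo)],    # Bing -> DuckDuckGo
    [("brave", brave)],                              # Brave (standalone)
]
```

Running `asyncio.run(fallback_search(CHAINS, "python tutorials"))` with these stubs exercises the failure path for the first chain, since the `google` stub always raises and `startpage` answers in its place.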
Fetch Page Content
# Extract clean text from URLs
fetch_page_content("https://example.com")
fetch_page_content(["https://site1.com", "https://site2.com"]) # Batch processing
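As a rough idea of what "clean text extraction" with single-URL and batch input can involve, the sketch below uses only the Python standard library. It is not the server's extraction code: `extract_text`, `fetch_text`, and the injectable `download` parameter are hypothetical, and the real tool may use a different HTML parser and return format.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>, <style> and <noscript> contents."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return whitespace-normalized visible text from an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def _download(url, timeout=10):
    """Fetch a page body as text (used only when no custom downloader is given)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_text(urls, download=_download):
    """Accept one URL or a list of URLs; return {url: text or error message}."""
    if isinstance(urls, str):
        urls = [urls]
    out = {}
    for url in urls:
        try:
            out[url] = extract_text(download(url))
        except Exception as exc:  # keep the batch going if one URL fails
            out[url] = f"error: {exc}"
    return out
```

Keeping the download step injectable makes the extraction logic testable offline, and a failure on one URL does not abort the rest of the batch.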
Architecture
websearch/
├── core/
│   ├── search.py                 # Sync search orchestration
│   ├── async_search.py           # Async search orchestration
│   ├── fallback_search.py        # 3-engine fallback system
│   ├── async_fallback_search.py  # Async fallback system
│   ├── ranking.py                # Quality-first result ranking
│   └── common.py                 # Shared utilities
├── engines/
│   ├── google_api.py             # Google Custom Search API
│   ├── brave_api.py              # Brave Search API
│   ├── bing.py                   # Bing scraping
│   ├── duckduckgo.py             # DuckDuckGo scraping
│   └── startpage.py              # Startpage scraping
├── utils/
│   ├── unified_quota.py          # Unified API quota management
│   ├── deduplication.py          # Result deduplication
│   ├── advanced_cache.py         # Enhanced caching system
│   └── http.py                   # HTTP utilities
└── server.py                     # FastMCP server
Advanced Configuration
Environment Variables
# API Configuration
export GOOGLE_CSE_API_KEY=your_google_api_key
export GOOGLE_CSE_ID=your_google_cse_id
export BRAVE_SEARCH_API_KEY=your_brave_api_key
# Quota Management (Optional)
export GOOGLE_DAILY_QUOTA=100 # Default: 100 requests/day
export BRAVE_MONTHLY_QUOTA=2000 # Default: 2000 requests/month
# Performance Tuning
export WEBSEARCH_CACHE_SIZE=1000
export WEBSEARCH_TIMEOUT=10
export WEBSEARCH_LOG_LEVEL=INFO
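One way to read these variables into a typed settings object is sketched below. The `Settings` class and `load_settings` function are hypothetical and not part of the package. The two quota defaults come from the comments above; the cache size, timeout, and log level defaults are assumptions taken from the example values, since the source does not state them as defaults.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    google_api_key: Optional[str]
    google_cse_id: Optional[str]
    brave_api_key: Optional[str]
    google_daily_quota: int = 100     # documented default
    brave_monthly_quota: int = 2000   # documented default
    cache_size: int = 1000            # assumed default (example value)
    timeout: int = 10                 # assumed default (example value)
    log_level: str = "INFO"           # assumed default (example value)

    @property
    def google_enabled(self):
        # Google Custom Search needs both the API key and the engine ID.
        return bool(self.google_api_key and self.google_cse_id)

    @property
    def brave_enabled(self):
        return bool(self.brave_api_key)


def load_settings(env=None):
    """Build Settings from an environment mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        google_api_key=env.get("GOOGLE_CSE_API_KEY") or None,
        google_cse_id=env.get("GOOGLE_CSE_ID") or None,
        brave_api_key=env.get("BRAVE_SEARCH_API_KEY") or None,
        google_daily_quota=int(env.get("GOOGLE_DAILY_QUOTA", 100)),
        brave_monthly_quota=int(env.get("BRAVE_MONTHLY_QUOTA", 2000)),
        cache_size=int(env.get("WEBSEARCH_CACHE_SIZE", 1000)),
        timeout=int(env.get("WEBSEARCH_TIMEOUT", 10)),
        log_level=env.get("WEBSEARCH_LOG_LEVEL", "INFO").upper(),
    )
```

With a loader like this, an engine whose key is missing can simply be treated as disabled, so the scraping engines carry the search instead.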
How to Get API Keys
Google Custom Search API
- API Key: Go to https://developers.google.com/custom-search/v1/introduction and click "Get a Key"
- CSE ID: Go to https://cse.google.com/cse/ and follow prompts to create a search engine
Brave Search API
- Go to Brave Search API
- Sign up for a free account
- Go to your dashboard
- Copy the API key and set it as BRAVE_SEARCH_API_KEY
- Free tier: 2000 requests/month
Quota Management
- Unified System: Single quota manager for all APIs
- Google: Daily quota (default 100 requests/day)
- Brave: Monthly quota (default 2000 requests/month)
- Storage: Quota files stored in the ~/.websearch/ directory
- Auto-reset: Quotas automatically reset at period boundaries
- Fallback: Automatic fallback to scraping when quotas exhausted
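The daily and monthly reset behavior described above could look roughly like this. It is a sketch under stated assumptions: `QuotaTracker` is a hypothetical class, and the JSON layout (`period` and `used` keys) is invented for illustration, not the actual format of google_quota.json or brave_quota.json.

```python
import json
from datetime import date
from pathlib import Path


class QuotaTracker:
    """Count API calls per period and persist the count to a JSON file."""

    def __init__(self, path, limit, period):
        self.path = Path(path)
        self.limit = limit
        self.period = period  # "daily" or "monthly"

    def _period_key(self, today):
        # A new key means a new period, which resets the counter.
        return today.isoformat() if self.period == "daily" else today.strftime("%Y-%m")

    def _load(self, today):
        key = self._period_key(today)
        try:
            state = json.loads(self.path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            state = {}
        if state.get("period") != key:
            state = {"period": key, "used": 0}  # auto-reset at the period boundary
        return state

    def try_consume(self, today=None):
        """Return True and record one call if quota remains, else False."""
        today = today or date.today()
        state = self._load(today)
        if state["used"] >= self.limit:
            return False  # caller should fall back to a scraping engine
        state["used"] += 1
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(state))
        return True
```

A caller would check `try_consume()` before each API request and switch to the scraping engine for that chain when it returns False.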
Search Modes
- Standard Mode: Uses all 5 engines for maximum coverage
- Fallback Mode: Uses 3 engines with intelligent fallbacks for efficiency
- API-First Mode: Prioritizes API calls over scraping when keys available
Troubleshooting
| Issue | Solution |
|---|---|
| No results | Check internet connection and logs |
| API quota exhausted | System automatically falls back to scraping |
| Google API errors | Verify GOOGLE_CSE_API_KEY and GOOGLE_CSE_ID |
| Brave API errors | Check BRAVE_SEARCH_API_KEY and quota status |
| Permission denied | chmod +x start.sh |
| Import errors | Ensure Python 3.12+ and dependencies installed |
| Circular import warnings | Fixed in v2.0+ (10.00/10 pylint score) |
Debug Mode
# Enable detailed logging
export WEBSEARCH_LOG_LEVEL=DEBUG
python -m websearch.server
API Status Check
# Test API connectivity
cd debug/
python test_brave_api.py # Test Brave API
python test_fallback.py # Test fallback system
Performance & Monitoring
Metrics
- Pylint Score: 10.00/10 (perfect code quality)
- Search Speed: ~2-3 seconds for 5-engine search
- Fallback Speed: ~1-2 seconds for 3-engine search
- Cache Hit Rate: ~85% for repeated queries
- API Quota Efficiency: Automatic fallback prevents service interruption
Monitoring
Logs are written to web-search.log (by default under ~/.websearch/logs/) in a structured format:
tail -f ~/.websearch/logs/web-search.log | grep "search completed"
Security
- No hardcoded secrets: All API keys via environment variables
- Clean git history: Secrets scrubbed from all commits
- Input validation: Comprehensive sanitization of search queries
- Rate limiting: Built-in quota management for API calls
- Secure defaults: HTTPS-only requests, timeout protection
Performance Tips
- Use fallback mode for faster searches when you don't need maximum coverage
- Set API keys to reduce reliance on scraping (faster + more reliable)
- Enable caching for repeated queries (enabled by default)
- Tune batch sizes for content extraction based on your needs
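For readers curious how caching of repeated queries can combine LRU eviction with compression, here is a small self-contained sketch. `CompressedLRUCache` is a hypothetical class, not the project's advanced_cache.py, and the query normalization (lowercasing and collapsing whitespace) is an assumption about what counts as a repeated query.

```python
import json
import zlib
from collections import OrderedDict


class CompressedLRUCache:
    """LRU cache that stores JSON-serializable results as zlib-compressed bytes."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self._data = OrderedDict()  # oldest entry first, most recent last

    @staticmethod
    def _key(query):
        # Assumed normalization: case-insensitive, whitespace-insensitive.
        return " ".join(query.lower().split())

    def get(self, query):
        key = self._key(query)
        blob = self._data.get(key)
        if blob is None:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return json.loads(zlib.decompress(blob).decode("utf-8"))

    def put(self, query, results):
        key = self._key(query)
        self._data[key] = zlib.compress(json.dumps(results).encode("utf-8"))
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry
```

The `max_size` argument plays the role that WEBSEARCH_CACHE_SIZE presumably plays in the real server; compression trades a little CPU time for a smaller memory footprint per cached result list.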
Contributing
- Fork the repository
- Create feature branch (git checkout -b feature/amazing-feature)
- Run tests (pytest)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open Pull Request
License
MIT License - see file for details.