liliang-cn/mcp-websearch-server
MCP Web Search Server
A Model Context Protocol (MCP) server that provides multi-engine web search capabilities with intelligent content extraction using a hybrid approach.
Features
- 🔍 Hybrid Search Engine: Fast goquery-based search results + intelligent chromedp content extraction
- 🌐 Multi-Engine Support: Bing, Brave, and DuckDuckGo with smart fallback mechanisms
- 📄 Intelligent Content Extraction: Advanced article parsing with multiple content selectors
- 🚀 Concurrent Processing: Parallel content extraction with rate limiting
- 🤖 AI-Ready Summaries: Aggregated content optimized for AI analysis and summarization
- 🛠️ MCP Protocol: Full compliance with Model Context Protocol specification
Installation
Via go install
go install github.com/liliang-cn/mcp-websearch-server@latest
From Source
git clone https://github.com/liliang-cn/mcp-websearch-server
cd mcp-websearch-server
go build -o mcp-websearch-server
Usage
Standalone
# Show help
mcp-websearch-server --help
# Run the server (stdio mode)
mcp-websearch-server
Integration with Claude Desktop
Add to your Claude Desktop configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"websearch": {
"command": "mcp-websearch-server"
}
}
}
go install places the binary in $GOBIN, or in $GOPATH/bin (~/go/bin by default) when GOBIN is unset. Make sure that directory is in your PATH.
Available Tools
🔍 websearch_basic
Basic web search returning titles, URLs and snippets from a single search engine using the hybrid approach.
Parameters:
- query (string, required): The search query
- max_results (int, optional): Maximum results to return (default: 10)
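An MCP client invokes this tool with a JSON-RPC `tools/call` request. The following is a minimal sketch of how such a request could be built in Go. The tool name and argument keys come from the parameter list above; the struct and function names are illustrative and not taken from the server's source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// basicSearchArgs mirrors the websearch_basic parameters listed above.
// The type name is hypothetical.
type basicSearchArgs struct {
	Query      string `json:"query"`
	MaxResults int    `json:"max_results,omitempty"`
}

// toolCallParams carries the tool name and its arguments.
type toolCallParams struct {
	Name      string          `json:"name"`
	Arguments basicSearchArgs `json:"arguments"`
}

// toolCall is the JSON-RPC 2.0 envelope used for the MCP tools/call method.
type toolCall struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

// newBasicSearchCall encodes a tools/call request for websearch_basic.
func newBasicSearchCall(id int, query string, maxResults int) ([]byte, error) {
	return json.Marshal(toolCall{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params: toolCallParams{
			Name:      "websearch_basic",
			Arguments: basicSearchArgs{Query: query, MaxResults: maxResults},
		},
	})
}

func main() {
	req, err := newBasicSearchCall(1, "model context protocol", 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(req))
}
```

In stdio mode a client would write this message as a single line to the server's standard input, after completing the MCP `initialize` handshake.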
📄 websearch_with_content
Web search with intelligent content extraction from result pages using chromedp.
Parameters:
- query (string, required): The search query
- max_results (int, optional): Maximum results to return (default: 5)
- extract_content (bool, optional): Extract full page content (default: true)
🚀 websearch_multi_engine
Comprehensive search across multiple engines (Bing, Brave, DuckDuckGo) with content extraction.
Parameters:
- query (string, required): The search query
- max_results (int, optional): Maximum results to return (default: 3)
- engines (array, optional): Search engines to use ["bing", "brave", "duckduckgo"] (default: all)
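The `engines` parameter defaults to all three engines when omitted. A sketch of how that default could be resolved is shown below. Rejecting unknown names, ignoring case, and dropping duplicates are assumptions of this example, not documented server behavior, and `resolveEngines` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// supportedEngines lists the engine names from the parameter description.
var supportedEngines = []string{"bing", "brave", "duckduckgo"}

// resolveEngines returns every supported engine when none are requested.
// Otherwise it validates the requested names and removes duplicates,
// keeping the order in which they were given.
func resolveEngines(requested []string) ([]string, error) {
	if len(requested) == 0 {
		return append([]string(nil), supportedEngines...), nil
	}
	known := make(map[string]bool, len(supportedEngines))
	for _, name := range supportedEngines {
		known[name] = true
	}
	seen := make(map[string]bool)
	var out []string
	for _, raw := range requested {
		name := strings.ToLower(strings.TrimSpace(raw))
		if !known[name] {
			return nil, fmt.Errorf("unsupported engine %q", raw)
		}
		if !seen[name] {
			seen[name] = true
			out = append(out, name)
		}
	}
	return out, nil
}

func main() {
	engines, err := resolveEngines([]string{"Bing", "brave", "bing"})
	if err != nil {
		panic(err)
	}
	fmt.Println(engines)
}
```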
🤖 websearch_ai_summary
Search and return AI-ready aggregated content optimized for analysis and summarization.
Parameters:
- query (string, required): The search query
- max_results (int, optional): Maximum results to return (default: 3)
Returns: Formatted markdown content with proper structure for AI processing.
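To illustrate what aggregated markdown output might look like, here is a small sketch that renders extracted pages into one document. The `page` type and the exact layout (one numbered heading per result followed by a source line) are assumptions; the server's real output format may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// page is a hypothetical container for one extracted search result.
type page struct {
	Title   string
	URL     string
	Content string
}

// toMarkdown renders the query and its pages as a single markdown document.
// The layout is an assumption made for this example.
func toMarkdown(query string, pages []page) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# Search results for: %s\n", query)
	for i, p := range pages {
		fmt.Fprintf(&b, "\n## %d. %s\n\nSource: %s\n\n%s\n",
			i+1, p.Title, p.URL, strings.TrimSpace(p.Content))
	}
	return b.String()
}

func main() {
	fmt.Print(toMarkdown("golang", []page{
		{
			Title:   "The Go Programming Language",
			URL:     "https://go.dev",
			Content: "Go is an open source programming language.",
		},
	}))
}
```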
Architecture
mcp-websearch-server/
├── main.go # Entry point with CLI flags
├── mcp/ # MCP protocol implementation
│ └── server.go # MCP server and tool registration
├── search/ # Search engine implementations
│ ├── interface.go # Common interfaces
│ ├── hybrid_searcher.go # Hybrid multi-engine searcher
│ ├── multi_engine.go # Basic multi-engine orchestration
│ ├── bing_goquery.go # Fast Bing search with goquery
│ ├── brave_goquery.go # Fast Brave search with goquery
│ ├── duckduckgo_goquery.go # Fast DuckDuckGo search with goquery
│ ├── bing.go # Original Bing search (chromedp)
│ ├── brave.go # Original Brave search (chromedp)
│ └── duckduckgo.go # Original DuckDuckGo search (chromedp)
├── extraction/ # Content extraction
│ ├── hybrid_extractor.go # Intelligent chromedp-based extraction
│ └── chromedp.go # Basic browser-based extraction
├── examples/ # Demo applications
│ ├── basic_search_demo/ # Basic search functionality demo
│ ├── hybrid_search_demo/ # Hybrid search with content extraction
│ └── mcp_tools_demo/ # MCP server tools demonstration
└── utils/ # Utilities
└── retry.go # Retry logic with backoff
Hybrid Approach
The server combines fast HTTP scraping for search results with browser-based extraction for page content, then aggregates the output for AI consumption:
1. Fast Search Results (goquery)
- Bing: Scrapes www.bing.com/search with proper CSS selectors
- Brave: Scrapes search.brave.com/search for results
- DuckDuckGo: Scrapes duckduckgo.com with lite interface
- Benefits: Fast response times, reliable result parsing
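The fast path is an ordinary HTTP request followed by HTML parsing. The sketch below covers only the request side for Bing, using the standard library. The function name and the header values are illustrative assumptions, and the CSS selectors the server uses are not reproduced here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newBingRequest builds a GET request for a Bing results page.
// The User-Agent and Accept-Language values are placeholders.
func newBingRequest(query string) (*http.Request, error) {
	u := url.URL{
		Scheme:   "https",
		Host:     "www.bing.com",
		Path:     "/search",
		RawQuery: url.Values{"q": {query}}.Encode(), // escapes the query
	}
	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", "Mozilla/5.0 (compatible; example-client/1.0)")
	req.Header.Set("Accept-Language", "en-US,en;q=0.9")
	return req, nil
}

func main() {
	req, err := newBingRequest("model context protocol")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

After the request is sent, the response body can be passed to `goquery.NewDocumentFromReader`, and result titles, links, and snippets can be selected from the parsed document.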
2. Intelligent Content Extraction (chromedp)
- Article Detection: Uses advanced selectors to find main content
- Content Cleaning: Removes scripts, styles, and navigation elements
- Fallback Strategy: Falls back to paragraph extraction if article content not found
- Benefits: High-quality content extraction, JavaScript handling
3. AI-Ready Aggregation
- Structured Output: Properly formatted markdown for AI processing
- Content Summarization: Truncates content intelligently at sentence boundaries
- Multi-Source: Combines content from multiple search engines
- Benefits: Optimized for AI analysis and summarization
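Truncating at sentence boundaries can be sketched as follows. This is one possible approach, not the server's confirmed implementation: it cuts at a byte limit, backs up to the last `.`, `!`, or `?` that fits, and otherwise falls back to a hard cut on a rune boundary. `truncateAtSentence` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// truncateAtSentence returns at most limit bytes of text, ending after the
// last sentence terminator (., ! or ?) that fits within the limit. If no
// terminator fits, it cuts at the nearest rune boundary instead.
func truncateAtSentence(text string, limit int) string {
	if len(text) <= limit {
		return text
	}
	if limit <= 0 {
		return ""
	}
	// Back up so the cut never splits a multi-byte UTF-8 character.
	for limit > 0 && !utf8.RuneStart(text[limit]) {
		limit--
	}
	cut := text[:limit]
	if i := strings.LastIndexAny(cut, ".!?"); i >= 0 {
		return cut[:i+1]
	}
	return cut
}

func main() {
	text := "Go is a compiled language. It has garbage collection. It was released in 2009."
	fmt.Println(truncateAtSentence(text, 60))
}
```

A terminator-based rule like this one also treats the period in an abbreviation or decimal number as a sentence end, so a production version would likely need a more careful boundary check.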
Development
Prerequisites
- Go 1.21 or higher
- Chrome/Chromium browser (for content extraction)
Building
# Build the server
go build -o mcp-websearch-server
# Run tests
go test ./...
# Run tests with coverage
go test -cover ./...
# Format code
go fmt ./...
# Lint (requires golangci-lint)
golangci-lint run
Testing
The project includes unit tests with over 60% coverage:
# Run all tests
go test ./...
# Run with verbose output
go test -v ./...
# Generate coverage report
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
Example Applications
# Test basic search functionality
go run ./examples/basic_search_demo/main.go
# Test hybrid search with content extraction
go run ./examples/hybrid_search_demo/main.go
# Test MCP server tools
go run ./examples/mcp_tools_demo/main.go
How It Works
1. Search Request: Receives search query via MCP protocol
2. Engine Selection: Uses goquery-based engines for fast results
3. Search Execution: Performs HTTP-based search with proper headers
4. Content Extraction: Uses chromedp for intelligent content extraction
5. Aggregation: Combines and formats content for AI analysis
6. Response: Returns structured results via MCP protocol
Search Engine Priority
The hybrid searcher prioritizes engines in this order:
1. DuckDuckGo - Primary engine (privacy-focused)
2. Bing - First fallback (comprehensive results)
3. Brave - Second fallback (independent search)
If one engine fails, the server automatically tries the next available engine.
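The ordered fallback described here can be sketched as a loop over engines. The `searcher` interface, the `result` type, and `fakeEngine` are hypothetical stand-ins for the server's own types, and treating an empty result set as a failure is an assumption of this example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// result and searcher are illustrative types, not the server's real ones.
type result struct{ Title, URL string }

type searcher interface {
	Name() string
	Search(ctx context.Context, query string) ([]result, error)
}

// searchWithFallback tries each engine in order and returns the first
// non-empty result set along with the name of the engine that produced it.
func searchWithFallback(ctx context.Context, engines []searcher, query string) ([]result, string, error) {
	if len(engines) == 0 {
		return nil, "", errors.New("no search engines configured")
	}
	var errs []error
	for _, e := range engines {
		res, err := e.Search(ctx, query)
		if err == nil && len(res) > 0 {
			return res, e.Name(), nil
		}
		if err == nil {
			err = errors.New("no results")
		}
		errs = append(errs, fmt.Errorf("%s: %w", e.Name(), err))
	}
	// Every engine failed: report all of the individual errors.
	return nil, "", errors.Join(errs...)
}

// fakeEngine returns canned data so the example runs without network access.
type fakeEngine struct {
	name string
	res  []result
	err  error
}

func (f fakeEngine) Name() string { return f.name }

func (f fakeEngine) Search(context.Context, string) ([]result, error) {
	return f.res, f.err
}

var errUnavailable = errors.New("engine unavailable")

func main() {
	engines := []searcher{
		fakeEngine{name: "duckduckgo", err: errUnavailable},
		fakeEngine{name: "bing", res: []result{{Title: "Example", URL: "https://example.com"}}},
		fakeEngine{name: "brave"},
	}
	res, used, err := searchWithFallback(context.Background(), engines, "golang")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s returned %d result(s)\n", used, len(res))
}
```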
Error Handling
- Implements retry logic with exponential backoff
- Graceful fallback to alternative search engines
- Structured error messages via MCP protocol
- Timeout handling for long-running operations
- Rate limiting for content extraction
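Retry with exponential backoff, combined with cancellation through a context, might look like the sketch below. It is not the contents of utils/retry.go; the function names, the doubling schedule, and the delay cap are assumptions, and jitter is left out for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// backoffDelay returns base doubled attempt times, capped at max.
func backoffDelay(base, max time.Duration, attempt int) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

// retry calls op up to attempts times. It waits with exponential backoff
// between failed attempts and stops early if ctx is cancelled.
func retry(ctx context.Context, attempts int, base, max time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(backoffDelay(base, max, i)):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

var errTransient = errors.New("transient failure")

func main() {
	calls := 0
	err := retry(context.Background(), 4, 10*time.Millisecond, time.Second, func() error {
		calls++
		if calls < 3 {
			return errTransient // fail the first two attempts
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```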
Performance
- Search Speed: ~200-500ms per search using goquery
- Content Extraction: ~2-5s per page using chromedp
- Concurrent Extraction: Limited to 2-3 simultaneous browser instances
- Memory Usage: Optimized with proper context cleanup
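Capping the number of simultaneous extractions is commonly done with a buffered channel used as a counting semaphore. The sketch below shows that pattern under stated assumptions: `extractAll`, `outcome`, and `fakeExtract` are hypothetical names, and `fakeExtract` stands in for the chromedp-based extraction, which is not reproduced here.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// outcome pairs a URL with its extracted content or error.
type outcome struct {
	URL     string
	Content string
	Err     error
}

// extractAll runs extract for every URL with at most limit calls in flight.
// Results are returned in the same order as urls.
func extractAll(ctx context.Context, urls []string, limit int,
	extract func(context.Context, string) (string, error)) []outcome {
	if limit < 1 {
		limit = 1
	}
	out := make([]outcome, len(urls))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit extractions are running
		go func(i int, u string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			content, err := extract(ctx, u)
			out[i] = outcome{URL: u, Content: content, Err: err}
		}(i, u)
	}
	wg.Wait()
	return out
}

// fakeExtract is a placeholder for a real browser-based extractor.
func fakeExtract(_ context.Context, url string) (string, error) {
	return "content of " + url, nil
}

func main() {
	urls := []string{"https://example.com/a", "https://example.com/b", "https://example.com/c"}
	for _, o := range extractAll(context.Background(), urls, 2, fakeExtract) {
		fmt.Println(o.URL, "->", o.Content)
	}
}
```

Each goroutine writes only to its own slice index, so the results need no additional locking.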
Dependencies
- MCP Go SDK: Model Context Protocol implementation
- chromedp: Browser automation for content extraction
- goquery: Fast HTML parsing and scraping
- Standard Library: HTTP client, context, sync primitives
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License - see LICENSE file for details
Acknowledgments
- Built with MCP Go SDK
- Uses chromedp for browser automation
- Uses goquery for HTML parsing
- Implements Model Context Protocol specification