Python MCP WebSearch Server
A powerful Model Context Protocol (MCP) server that provides web search and deep research capabilities with support for multiple search providers.
🌟 Features
- Multiple Search Providers: DuckDuckGo (primary) and Google search support
- Deep Research: Iterative search with content extraction and analysis
- Content Extraction: Automatic web page content extraction using Trafilatura
- Async Architecture: High-performance asynchronous operations
- MCP Compliant: Full compatibility with the Model Context Protocol specification
- Production Ready: Tested and optimized for production use
🚀 Quick Start
Installation
```bash
git clone https://github.com/IN-PUN-COAONE-AUTOMATNSA/python-mcp-websearch-server.git
cd python-mcp-websearch-server
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
Usage
```bash
# Test the server
python test_server.py

# Run production demo
python demo.py

# Start the MCP server
python -m src.server
```
📁 Project Structure
```
├── src/
│   ├── server.py            # Main MCP server
│   ├── models/              # Pydantic data models
│   │   └── search.py
│   ├── providers/           # Search provider implementations
│   │   ├── base.py
│   │   ├── duckduckgo.py
│   │   └── google.py
│   ├── tools/               # MCP tools
│   │   ├── search.py
│   │   └── deep_research.py
│   └── utils/               # Utilities
│       └── web_scraper.py
├── test_server.py           # Main test suite
├── demo.py                  # Production demo
├── requirements.txt         # Python dependencies
├── Dockerfile               # Docker configuration
└── README.md                # This file
```
🧪 Testing
Run the comprehensive test suite:
```bash
python test_server.py
```
This will test:
- Google and DuckDuckGo search providers
- Error handling and validation
- Provider availability and performance
- End-to-end functionality
Basic Usage Example
```python
import asyncio
from src.tools.search import SearchTool

async def example():
    search_tool = SearchTool()

    # Perform a web search
    result = await search_tool.search(
        query="Python programming tutorial",
        provider="duckduckgo",
        max_results=5,
    )
    print(result)  # JSON-formatted results

asyncio.run(example())
```
📚 Tools Reference
1. web_search
Perform a web search using the specified provider.
Parameters:
- `query` (string, required): The search query
- `provider` (string, optional): Search provider (`duckduckgo`, `google`; default: `duckduckgo`)
- `max_results` (integer, optional): Maximum results (1-100; default: 10)
Example:
```json
{
  "tool": "web_search",
  "arguments": {
    "query": "artificial intelligence trends 2024",
    "provider": "duckduckgo",
    "max_results": 5
  }
}
```
Response Format:
```json
{
  "query": "artificial intelligence trends 2024",
  "provider": "duckduckgo",
  "total_results": 5,
  "search_time": 0.923,
  "timestamp": "2024-01-15T10:30:00Z",
  "results": [
    {
      "rank": 1,
      "title": "AI Trends 2024: What to Expect",
      "url": "https://example.com/ai-trends-2024",
      "snippet": "The latest trends in artificial intelligence for 2024...",
      "timestamp": "2024-01-15T10:30:00Z"
    }
  ]
}
```
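Consumed programmatically, this response is plain JSON, so decoding it needs only the standard library. A minimal sketch, with sample data copied from the format above:

```python
import json

# Sample payload mirroring the documented response format
raw = """
{
  "query": "artificial intelligence trends 2024",
  "provider": "duckduckgo",
  "total_results": 5,
  "results": [
    {
      "rank": 1,
      "title": "AI Trends 2024: What to Expect",
      "url": "https://example.com/ai-trends-2024",
      "snippet": "The latest trends in artificial intelligence for 2024..."
    }
  ]
}
"""

response = json.loads(raw)
print(f"{response['total_results']} results from {response['provider']}")
for hit in response["results"]:
    print(f"{hit['rank']}. {hit['title']} - {hit['url']}")
```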
2. deep_research
Perform comprehensive research on a topic with content extraction.
Parameters:
- `topic` (string, required): The topic to research
- `depth` (integer, optional): Research depth, 1-5 (default: 3)
- `max_sources` (integer, optional): Maximum sources to analyze (5-50; default: 20)
- `provider` (string, optional): Search provider (default: `duckduckgo`)
Example:
```json
{
  "tool": "deep_research",
  "arguments": {
    "topic": "sustainable energy solutions",
    "depth": 4,
    "max_sources": 15,
    "provider": "duckduckgo"
  }
}
```
3. test_providers
Test all available search providers.
Parameters: None
Example:
```json
{
  "tool": "test_providers",
  "arguments": {}
}
```
🤖 Integration with AI Agents
Claude Desktop Integration
Add to your Claude Desktop configuration (%APPDATA%\Claude\claude_desktop_config.json on Windows or ~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
```json
{
  "mcpServers": {
    "websearch": {
      "command": "python",
      "args": ["-m", "src.server"],
      "cwd": "/absolute/path/to/python-mcp-websearch-server",
      "env": {
        "LOG_LEVEL": "INFO"
      }
    }
  }
}
```
Continue.dev Integration
Add to your Continue configuration (.continue/config.json):
```json
{
  "mcpServers": [
    {
      "name": "websearch",
      "command": ["python", "-m", "src.server"],
      "cwd": "/path/to/python-mcp-websearch-server"
    }
  ]
}
```
Codeium Integration
```json
{
  "mcp": {
    "servers": {
      "websearch": {
        "command": "python -m src.server",
        "working_directory": "/path/to/python-mcp-websearch-server"
      }
    }
  }
}
```
Custom MCP Client Integration
The example below uses the official `mcp` Python SDK, which launches the server subprocess itself via `stdio_client` and exposes tools through a `ClientSession`:

```python
import asyncio
from contextlib import AsyncExitStack

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


class WebSearchMCPClient:
    def __init__(self, server_path: str):
        self.server_path = server_path
        self.session: ClientSession | None = None
        self._stack = AsyncExitStack()

    async def start(self):
        # Launch the server as a subprocess and connect over stdio
        params = StdioServerParameters(
            command="python",
            args=["-m", "src.server"],
            cwd=self.server_path,
        )
        read, write = await self._stack.enter_async_context(stdio_client(params))
        self.session = await self._stack.enter_async_context(ClientSession(read, write))
        await self.session.initialize()

    async def search(self, query, provider="duckduckgo", max_results=10):
        return await self.session.call_tool("web_search", {
            "query": query,
            "provider": provider,
            "max_results": max_results,
        })

    async def research(self, topic, depth=3, max_sources=20):
        return await self.session.call_tool("deep_research", {
            "topic": topic,
            "depth": depth,
            "max_sources": max_sources,
        })

    async def close(self):
        # Tears down the session and terminates the server subprocess
        await self._stack.aclose()


# Usage example
async def main():
    client = WebSearchMCPClient("/path/to/python-mcp-websearch-server")
    await client.start()

    # Perform search
    results = await client.search("Python machine learning libraries", "duckduckgo", 5)
    print(results)

    # Perform research
    research = await client.research("quantum computing applications", depth=3, max_sources=10)
    print(research)

    await client.close()


asyncio.run(main())
```
🔧 Configuration
Environment Variables
- `LOG_LEVEL`: Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`; default: `INFO`)
- `SEARCH_PROVIDER`: Default search provider (`duckduckgo`, `google`; default: `duckduckgo`)
- `MAX_CONCURRENT_REQUESTS`: Maximum concurrent search requests (default: 5)
- `REQUEST_TIMEOUT`: Request timeout in seconds (default: 30.0)
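How the server reads these variables is an implementation detail, but the usual pattern is `os.getenv` with the documented defaults. A minimal sketch (variable names and defaults taken from the list above; the exact code in `src/server.py` may differ):

```python
import os

# Each setting falls back to its documented default when the
# environment variable is unset.
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
SEARCH_PROVIDER = os.getenv("SEARCH_PROVIDER", "duckduckgo")
MAX_CONCURRENT_REQUESTS = int(os.getenv("MAX_CONCURRENT_REQUESTS", "5"))
REQUEST_TIMEOUT = float(os.getenv("REQUEST_TIMEOUT", "30.0"))

print(LOG_LEVEL, SEARCH_PROVIDER, MAX_CONCURRENT_REQUESTS, REQUEST_TIMEOUT)
```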
Provider Configuration
```python
from src.tools.search import SearchTool

# Custom configuration
config = {
    "request_timeout": 45.0,
    "max_results": 50,
    "user_agent": "Custom Bot 1.0",
    "max_concurrent_extractions": 10,
}
search_tool = SearchTool(config)
```
Search Provider Notes
- DuckDuckGo: ✅ Primary provider - No API key required, highly reliable
- Google: ⚠️ Secondary provider - May be blocked by anti-bot measures
- Bing: 🚧 Future implementation - Will require API key
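Since Google can be blocked while DuckDuckGo stays reliable, clients may want automatic failover between providers. The `search_with_fallback` helper below is hypothetical (not part of this package); it assumes the `SearchTool.search` signature shown in the usage example:

```python
import asyncio

async def search_with_fallback(search_tool, query, providers=("duckduckgo", "google")):
    """Try each provider in order; return the first successful response."""
    last_error = None
    for provider in providers:
        try:
            return await search_tool.search(query=query, provider=provider)
        except Exception as exc:  # blocked or failing provider: try the next one
            last_error = exc
    raise RuntimeError(f"All providers failed: {last_error}")

# Demo with a stub tool whose first provider always fails
class _StubTool:
    async def search(self, query, provider):
        if provider == "duckduckgo":
            raise ConnectionError("simulated block")
        return {"provider": provider, "query": query}

result = asyncio.run(search_with_fallback(_StubTool(), "mcp servers"))
print(result["provider"])
```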
🛠️ Development
Setup Development Environment
```bash
# Clone the repo
git clone https://github.com/abhilashjaiswal0110/python-mcp-websearch-server.git
cd python-mcp-websearch-server

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Run tests
python test_server.py
python demo.py
```
Adding New Search Providers
1. Create a new provider class in `src/providers/`:

```python
from .base import BaseSearchProvider
from ..models.search import SearchResult, SearchResponse, SearchProvider

class NewSearchProvider(BaseSearchProvider):
    def get_provider_name(self) -> SearchProvider:
        return SearchProvider.NEW_PROVIDER

    async def search(self, query: str, max_results: int = 10) -> SearchResponse:
        # Implementation here
        pass
```

2. Add the new value to the `SearchProvider` enum in `src/models/search.py`
3. Register the provider in `src/tools/search.py`
🚀 Production Deployment
Docker Deployment
```bash
# Build image
docker build -t mcp-websearch-server .

# Run container
docker run -d --name mcp-websearch \
  -e LOG_LEVEL=INFO \
  -e SEARCH_PROVIDER=duckduckgo \
  mcp-websearch-server
```
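For compose-based setups, an equivalent definition might look like this. It is a sketch only: no compose file ships with the repository, and the service name is illustrative.

```yaml
# docker-compose.yml (hypothetical)
services:
  mcp-websearch:
    build: .
    container_name: mcp-websearch
    environment:
      LOG_LEVEL: INFO
      SEARCH_PROVIDER: duckduckgo
```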
Systemd Service (Linux)
```ini
[Unit]
Description=MCP WebSearch Server
After=network.target

[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/python-mcp-websearch-server
ExecStart=/path/to/python-mcp-websearch-server/venv/bin/python -m src.server
Restart=always
Environment=LOG_LEVEL=INFO
Environment=SEARCH_PROVIDER=duckduckgo

[Install]
WantedBy=multi-user.target
```
📊 Performance Benchmarks
| Provider | Avg Response Time | Success Rate | Max Results | Notes |
|---|---|---|---|---|
| DuckDuckGo | 1.0-3.0s | 99% | 100 | Primary provider |
| Google | 0.5-0.9s | 70%\* | 100 | May be blocked |

\*Google success rate varies due to anti-bot measures
✅ Production Readiness Checklist
- ✅ Core Functionality: Web search and deep research working
- ✅ Error Handling: Robust error handling and logging
- ✅ Type Safety: Full type hints and validation
- ✅ Async Performance: High-performance async operations
- ✅ Content Extraction: Multiple extraction methods
- ✅ Provider Redundancy: Multiple search providers
- ✅ MCP Compliance: Full MCP specification compliance
- ✅ Docker Support: Ready for containerized deployment
- ✅ Documentation: Comprehensive documentation and examples
- ✅ Testing: Comprehensive test suite
🤝 Contributing
We welcome contributions!
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
📄 License
This project is licensed under the MIT License.
🙏 Acknowledgments
- Built on the Model Context Protocol
- Search functionality powered by DuckDuckGo and Google
- Content extraction using Trafilatura and BeautifulSoup
- Inspired by the need for reliable AI agent web search capabilities
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: abhilash.jaiswal@atos.net
Ready for Production Use 🚀 | MCP Compliant ✅ | High Performance ⚡