Python MCP WebSearch Server
A powerful Model Context Protocol (MCP) server that provides web search and deep research capabilities with support for multiple search providers.
🌟 Features
- Multiple Search Providers: DuckDuckGo (primary) and Google search support
- Deep Research: Iterative search with content extraction and analysis
- Content Extraction: Automatic web page content extraction using Trafilatura
- Async Architecture: High-performance asynchronous operations
- MCP Compliant: Full compatibility with the Model Context Protocol specification
- Production Ready: Tested and optimized for production use
🚀 Quick Start
Installation
```bash
git clone https://github.com/IN-PUN-COAONE-AUTOMATNSA/python-mcp-websearch-server.git
cd python-mcp-websearch-server
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
Usage
```bash
# Test the server
python test_server.py

# Run production demo
python demo.py

# Start the MCP server
python -m src.server
```
📁 Project Structure
```
├── src/
│   ├── server.py            # Main MCP server
│   ├── models/              # Pydantic data models
│   │   └── search.py
│   ├── providers/           # Search provider implementations
│   │   ├── base.py
│   │   ├── duckduckgo.py
│   │   └── google.py
│   ├── tools/               # MCP tools
│   │   ├── search.py
│   │   └── deep_research.py
│   └── utils/               # Utilities
│       └── web_scraper.py
├── test_server.py           # Main test suite
├── demo.py                  # Production demo
├── requirements.txt         # Python dependencies
├── Dockerfile               # Docker configuration
└── README.md                # This file
```
🧪 Testing
Run the comprehensive test suite:
```bash
python test_server.py
```
This will test:
- Google and DuckDuckGo search providers
- Error handling and validation
- Provider availability and performance
- End-to-end functionality
Basic Usage Example
```python
import asyncio
from src.tools.search import SearchTool

async def example():
    search_tool = SearchTool()

    # Perform a web search
    result = await search_tool.search(
        query="Python programming tutorial",
        provider="duckduckgo",
        max_results=5,
    )
    print(result)  # JSON-formatted results

asyncio.run(example())
```
📚 Tools Reference
1. web_search
Perform a web search using the specified provider.
Parameters:
- `query` (string, required): The search query
- `provider` (string, optional): Search provider (`duckduckgo`, `google`; default: `duckduckgo`)
- `max_results` (integer, optional): Maximum results (1-100; default: 10)
Example:
```json
{
  "tool": "web_search",
  "arguments": {
    "query": "artificial intelligence trends 2024",
    "provider": "duckduckgo",
    "max_results": 5
  }
}
```
Response Format:
```json
{
  "query": "artificial intelligence trends 2024",
  "provider": "duckduckgo",
  "total_results": 5,
  "search_time": 0.923,
  "timestamp": "2024-01-15T10:30:00Z",
  "results": [
    {
      "rank": 1,
      "title": "AI Trends 2024: What to Expect",
      "url": "https://example.com/ai-trends-2024",
      "snippet": "The latest trends in artificial intelligence for 2024...",
      "timestamp": "2024-01-15T10:30:00Z"
    }
  ]
}
```
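Consumed programmatically, this response is plain JSON, so decoding it needs only the standard library. A minimal sketch, with sample data copied from the format above:

```python
import json

# Sample payload mirroring the documented response format
raw = """
{
  "query": "artificial intelligence trends 2024",
  "provider": "duckduckgo",
  "total_results": 5,
  "results": [
    {
      "rank": 1,
      "title": "AI Trends 2024: What to Expect",
      "url": "https://example.com/ai-trends-2024",
      "snippet": "The latest trends in artificial intelligence for 2024..."
    }
  ]
}
"""

response = json.loads(raw)
print(f"{response['total_results']} results from {response['provider']}")
for hit in response["results"]:
    print(f"{hit['rank']}. {hit['title']} - {hit['url']}")
```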
2. deep_research
Perform comprehensive research on a topic with content extraction.
Parameters:
- `topic` (string, required): The topic to research
- `depth` (integer, optional): Research depth, 1-5 (default: 3)
- `max_sources` (integer, optional): Maximum sources to analyze (5-50; default: 20)
- `provider` (string, optional): Search provider (default: `duckduckgo`)
Example:
```json
{
  "tool": "deep_research",
  "arguments": {
    "topic": "sustainable energy solutions",
    "depth": 4,
    "max_sources": 15,
    "provider": "duckduckgo"
  }
}
```
3. test_providers
Test all available search providers.
Parameters: None
Example:
```json
{
  "tool": "test_providers",
  "arguments": {}
}
```
🤖 Integration with AI Agents
Claude Desktop Integration
Add to your Claude Desktop configuration (%APPDATA%\Claude\claude_desktop_config.json on Windows or ~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
```json
{
  "mcpServers": {
    "websearch": {
      "command": "python",
      "args": ["-m", "src.server"],
      "cwd": "/absolute/path/to/python-mcp-websearch-server",
      "env": {
        "LOG_LEVEL": "INFO"
      }
    }
  }
}
```
Continue.dev Integration
Add to your Continue configuration (.continue/config.json):
```json
{
  "mcpServers": [
    {
      "name": "websearch",
      "command": ["python", "-m", "src.server"],
      "cwd": "/path/to/python-mcp-websearch-server"
    }
  ]
}
```
Codeium Integration
```json
{
  "mcp": {
    "servers": {
      "websearch": {
        "command": "python -m src.server",
        "working_directory": "/path/to/python-mcp-websearch-server"
      }
    }
  }
}
```
Custom MCP Client Integration
The example below uses the official `mcp` Python SDK, which launches the server subprocess itself via `stdio_client` and exposes tools through a `ClientSession`:

```python
import asyncio
from contextlib import AsyncExitStack

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


class WebSearchMCPClient:
    def __init__(self, server_path: str):
        self.server_path = server_path
        self.session: ClientSession | None = None
        self._stack = AsyncExitStack()

    async def start(self):
        # Launch the server as a subprocess and connect over stdio
        params = StdioServerParameters(
            command="python",
            args=["-m", "src.server"],
            cwd=self.server_path,
        )
        read, write = await self._stack.enter_async_context(stdio_client(params))
        self.session = await self._stack.enter_async_context(ClientSession(read, write))
        await self.session.initialize()

    async def search(self, query, provider="duckduckgo", max_results=10):
        return await self.session.call_tool("web_search", {
            "query": query,
            "provider": provider,
            "max_results": max_results,
        })

    async def research(self, topic, depth=3, max_sources=20):
        return await self.session.call_tool("deep_research", {
            "topic": topic,
            "depth": depth,
            "max_sources": max_sources,
        })

    async def close(self):
        # Tears down the session and terminates the server subprocess
        await self._stack.aclose()


# Usage example
async def main():
    client = WebSearchMCPClient("/path/to/python-mcp-websearch-server")
    await client.start()

    # Perform search
    results = await client.search("Python machine learning libraries", "duckduckgo", 5)
    print(results)

    # Perform research
    research = await client.research("quantum computing applications", depth=3, max_sources=10)
    print(research)

    await client.close()


asyncio.run(main())
```
🔧 Configuration
Environment Variables
- `LOG_LEVEL`: Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`; default: `INFO`)
- `SEARCH_PROVIDER`: Default search provider (`duckduckgo`, `google`; default: `duckduckgo`)
- `MAX_CONCURRENT_REQUESTS`: Maximum concurrent search requests (default: 5)
- `REQUEST_TIMEOUT`: Request timeout in seconds (default: 30.0)
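How the server reads these variables is an implementation detail, but the usual pattern is `os.getenv` with the documented defaults. A minimal sketch (variable names and defaults taken from the list above; the exact code in `src/server.py` may differ):

```python
import os

# Each setting falls back to its documented default when the
# environment variable is unset.
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
SEARCH_PROVIDER = os.getenv("SEARCH_PROVIDER", "duckduckgo")
MAX_CONCURRENT_REQUESTS = int(os.getenv("MAX_CONCURRENT_REQUESTS", "5"))
REQUEST_TIMEOUT = float(os.getenv("REQUEST_TIMEOUT", "30.0"))

print(LOG_LEVEL, SEARCH_PROVIDER, MAX_CONCURRENT_REQUESTS, REQUEST_TIMEOUT)
```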
Provider Configuration
```python
from src.tools.search import SearchTool

# Custom configuration
config = {
    "request_timeout": 45.0,
    "max_results": 50,
    "user_agent": "Custom Bot 1.0",
    "max_concurrent_extractions": 10,
}
search_tool = SearchTool(config)
```
Search Provider Notes
- DuckDuckGo: ✅ Primary provider - No API key required, highly reliable
- Google: ⚠️ Secondary provider - May be blocked by anti-bot measures
- Bing: 🚧 Future implementation - Will require API key
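Since Google can be blocked while DuckDuckGo stays reliable, clients may want automatic failover between providers. The `search_with_fallback` helper below is hypothetical (not part of this package); it assumes the `SearchTool.search` signature shown in the usage example:

```python
import asyncio

async def search_with_fallback(search_tool, query, providers=("duckduckgo", "google")):
    """Try each provider in order; return the first successful response."""
    last_error = None
    for provider in providers:
        try:
            return await search_tool.search(query=query, provider=provider)
        except Exception as exc:  # blocked or failing provider: try the next one
            last_error = exc
    raise RuntimeError(f"All providers failed: {last_error}")

# Demo with a stub tool whose first provider always fails
class _StubTool:
    async def search(self, query, provider):
        if provider == "duckduckgo":
            raise ConnectionError("simulated block")
        return {"provider": provider, "query": query}

result = asyncio.run(search_with_fallback(_StubTool(), "mcp servers"))
print(result["provider"])
```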
🛠️ Development
Setup Development Environment
```bash
# Clone the repo
git clone https://github.com/abhilashjaiswal0110/python-mcp-websearch-server.git
cd python-mcp-websearch-server

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Run tests
python test_server.py
python demo.py
```
Adding New Search Providers
1. Create a new provider class in `src/providers/`:

```python
from .base import BaseSearchProvider
from ..models.search import SearchResult, SearchResponse, SearchProvider

class NewSearchProvider(BaseSearchProvider):
    def get_provider_name(self) -> SearchProvider:
        return SearchProvider.NEW_PROVIDER

    async def search(self, query: str, max_results: int = 10) -> SearchResponse:
        # Implementation here
        pass
```

2. Add the new value to the `SearchProvider` enum in `src/models/search.py`
3. Register the provider in `src/tools/search.py`
🚀 Production Deployment
Docker Deployment
```bash
# Build image
docker build -t mcp-websearch-server .

# Run container
docker run -d --name mcp-websearch \
  -e LOG_LEVEL=INFO \
  -e SEARCH_PROVIDER=duckduckgo \
  mcp-websearch-server
```
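For compose-based setups, an equivalent definition might look like this. It is a sketch only: no compose file ships with the repository, and the service name is illustrative.

```yaml
# docker-compose.yml (hypothetical)
services:
  mcp-websearch:
    build: .
    container_name: mcp-websearch
    environment:
      LOG_LEVEL: INFO
      SEARCH_PROVIDER: duckduckgo
```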
Systemd Service (Linux)
```ini
[Unit]
Description=MCP WebSearch Server
After=network.target

[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/python-mcp-websearch-server
ExecStart=/path/to/python-mcp-websearch-server/venv/bin/python -m src.server
Restart=always
Environment=LOG_LEVEL=INFO
Environment=SEARCH_PROVIDER=duckduckgo

[Install]
WantedBy=multi-user.target
```
📊 Performance Benchmarks
| Provider | Avg Response Time | Success Rate | Max Results | Notes |
|---|---|---|---|---|
| DuckDuckGo | 1.0-3.0s | 99% | 100 | Primary provider |
| Google | 0.5-0.9s | 70%\* | 100 | May be blocked |

\*Google success rate varies due to anti-bot measures
✅ Production Readiness Checklist
- ✅ Core Functionality: Web search and deep research working
- ✅ Error Handling: Robust error handling and logging
- ✅ Type Safety: Full type hints and validation
- ✅ Async Performance: High-performance async operations
- ✅ Content Extraction: Multiple extraction methods
- ✅ Provider Redundancy: Multiple search providers
- ✅ MCP Compliance: Full MCP specification compliance
- ✅ Docker Support: Ready for containerized deployment
- ✅ Documentation: Comprehensive documentation and examples
- ✅ Testing: Comprehensive test suite
🤝 Contributing
We welcome contributions!
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
📄 License
This project is licensed under the MIT License.
🙏 Acknowledgments
- Built on the Model Context Protocol
- Search functionality powered by DuckDuckGo and Google
- Content extraction using Trafilatura and BeautifulSoup
- Inspired by the need for reliable AI agent web search capabilities
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: abhilash.jaiswal@atos.net
Ready for Production Use 🚀 | MCP Compliant ✅ | High Performance ⚡