UpperMoon0/Web-Scout-MCP-Server
Web Scout MCP Server
A comprehensive Model Context Protocol (MCP) server for web scraping, searching, and analysis. This server provides AI assistants with powerful web intelligence capabilities through a standardized MCP interface.
Features
Web Scraping
- Multi-engine support: HTTP requests and JavaScript-enabled scraping with Selenium
- Content extraction: Text, links, images, and metadata
- Smart parsing: BeautifulSoup-powered HTML analysis
- Caching system: Efficient content caching and history tracking
Web Search
- Multiple search APIs: Google Custom Search, Bing Search API
- Search types: Web, images, news, videos
- Domain-specific search: Search within specific websites
- Fallback system: Mock search when APIs are unavailable
Website Analysis
- SEO Analysis: Title optimization, meta descriptions, heading structure
- Performance Metrics: Page size, resource counts, load estimations
- Accessibility Audit: Alt text, ARIA labels, form labels
- Security Assessment: HTTPS usage, security headers, mixed content
- Technology Detection: Frameworks, CMSs, analytics tools
Content Analysis
- Text Processing: Summary generation, word counts, reading time
- Sentiment Analysis: Positive/negative/neutral classification
- Entity Extraction: People, organizations, dates, emails, URLs
- Keyword Extraction: Topic identification and frequency analysis
- Readability Scoring: Flesch Reading Ease calculation
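The readability score above uses the standard Flesch Reading Ease formula. As an illustration (not the server's exact implementation, and using a crude vowel-group syllable counter), the calculation can be sketched as:

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    word_count = max(1, len(words))
    # Crude syllable estimate: count vowel groups, at least one per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * (word_count / sentences) - 84.6 * (syllables / word_count)

score = flesch_reading_ease("The cat sat on the mat. It was happy.")
```

Scores roughly above 60 indicate text readable by most adults; short sentences with short words score higher.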
Website Monitoring
- Change Detection: Monitor websites for content changes
- Configurable Intervals: Custom check frequencies
- History Tracking: Maintain monitoring logs and statistics
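Change detection of this kind is commonly built on content hashing; a minimal sketch (hypothetical, not the server's actual code) compares a stored digest against the latest fetch:

```python
import hashlib

def content_digest(html: str) -> str:
    """Stable fingerprint of page content for change comparison."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def has_changed(previous_digest: str, latest_html: str) -> bool:
    """True if the page content differs from the stored snapshot."""
    return content_digest(latest_html) != previous_digest

snapshot = content_digest("<html><body>v1</body></html>")
changed = has_changed(snapshot, "<html><body>v2</body></html>")
```

In practice the digest would be stored alongside a timestamp per monitored URL, so each scheduled check only needs the previous hash rather than the full page.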
Installation
Prerequisites
- Python 3.8 or higher
- Chrome/Chromium browser (for JavaScript scraping)
- ChromeDriver (automatically managed by Selenium)
Install Dependencies
pip install -r requirements.txt
Development Installation
pip install -e .
Configuration
Environment Variables
Create a .env file or set environment variables:
# Optional: API keys for enhanced search functionality
GOOGLE_API_KEY=your_google_api_key
GOOGLE_CSE_ID=your_custom_search_engine_id
BING_API_KEY=your_bing_search_api_key
# Server Configuration
WEB_SCOUT_USER_AGENT=Web-Scout-MCP/0.1.0
WEB_SCOUT_MAX_RETRIES=3
WEB_SCOUT_TIMEOUT=30
WEB_SCOUT_HEADLESS=true
WEB_SCOUT_CACHE_DIR=.web_scout_cache
WEB_SCOUT_LOG_LEVEL=INFO
WEB_SCOUT_ENV=production
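A loader for these variables might look like the following sketch. The field names mirror the variables above, but the server's actual config.py may structure this differently:

```python
import os
from dataclasses import dataclass

@dataclass
class WebScoutConfig:
    user_agent: str
    max_retries: int
    timeout: int
    headless: bool
    cache_dir: str
    log_level: str

def load_config() -> WebScoutConfig:
    """Read settings from the environment, falling back to the documented defaults."""
    return WebScoutConfig(
        user_agent=os.getenv("WEB_SCOUT_USER_AGENT", "Web-Scout-MCP/0.1.0"),
        max_retries=int(os.getenv("WEB_SCOUT_MAX_RETRIES", "3")),
        timeout=int(os.getenv("WEB_SCOUT_TIMEOUT", "30")),
        headless=os.getenv("WEB_SCOUT_HEADLESS", "true").lower() == "true",
        cache_dir=os.getenv("WEB_SCOUT_CACHE_DIR", ".web_scout_cache"),
        log_level=os.getenv("WEB_SCOUT_LOG_LEVEL", "INFO"),
    )

config = load_config()
```

Keeping defaults in one place like this means a bare environment still produces a working configuration.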
API Keys Setup
Google Custom Search API
- Go to Google Cloud Console
- Create a new project or select existing one
- Enable the Custom Search API
- Create credentials (API key)
- Set up a Custom Search Engine at cse.google.com
Bing Search API
- Go to Azure Portal
- Create a Bing Search resource
- Get your API key from the resource
Usage
Running the MCP Server
python -m src.server
MCP Configuration
Add to your MCP settings file (e.g., claude_desktop_config.json):
{
  "mcpServers": {
    "web-scout": {
      "command": "python",
      "args": ["-m", "src.server"],
      "cwd": "/path/to/web-scout",
      "env": {
        "WEB_SCOUT_USER_AGENT": "Web-Scout-MCP/0.1.0",
        "GOOGLE_API_KEY": "your_google_api_key",
        "GOOGLE_CSE_ID": "your_cse_id",
        "BING_API_KEY": "your_bing_api_key"
      }
    }
  }
}
Available Tools
scrape_url
Scrape content from a website.
Parameters:
- url (string, required): The URL to scrape
- use_javascript (boolean, optional): Use JavaScript rendering
- extract_links (boolean, optional): Extract all links
- extract_images (boolean, optional): Extract all images
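Over MCP, tools are invoked with a JSON-RPC tools/call request. An illustrative payload for scrape_url (the framing is normally handled by your MCP client, so this is for orientation only):

```python
import json

# Illustrative JSON-RPC request an MCP client sends to invoke the tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "scrape_url",
        "arguments": {
            "url": "https://example.com",
            "use_javascript": False,
            "extract_links": True,
            "extract_images": False,
        },
    },
}
payload = json.dumps(request)
```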
search_web
Search the web for information.
Parameters:
- query (string, required): Search query
- max_results (integer, optional): Maximum results to return
- search_type (string, optional): Type of search (web, images, news, videos)
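Since search_type only accepts the four values listed, a client can validate arguments before calling the tool. A hypothetical helper (not part of the server) might look like:

```python
VALID_SEARCH_TYPES = {"web", "images", "news", "videos"}

def build_search_arguments(query: str, max_results: int = 10, search_type: str = "web") -> dict:
    """Assemble search_web arguments, rejecting unsupported search types early."""
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {sorted(VALID_SEARCH_TYPES)}")
    return {"query": query, "max_results": max_results, "search_type": search_type}

args = build_search_arguments("model context protocol", max_results=5, search_type="news")
```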
analyze_website
Perform comprehensive website analysis.
Parameters:
- url (string, required): Website URL to analyze
- include_seo (boolean, optional): Include SEO analysis
- include_performance (boolean, optional): Include performance analysis
- include_accessibility (boolean, optional): Include accessibility analysis
- include_security (boolean, optional): Include security analysis
analyze_content
Analyze text content for insights.
Parameters:
- content (string, required): Text content to analyze
- include_sentiment (boolean, optional): Include sentiment analysis
- include_entities (boolean, optional): Include entity extraction
- include_keywords (boolean, optional): Include keyword extraction
search_domain
Search within a specific domain.
Parameters:
- domain (string, required): Domain to search within
- query (string, required): Search query
- max_results (integer, optional): Maximum results to return
monitor_website
Monitor a website for changes.
Parameters:
- url (string, required): URL to monitor
- check_interval (integer, optional): Check interval in minutes
- notify_changes (boolean, optional): Whether to notify about changes
Available Resources
Static Resources
- webscout://history/scraping: Scraping operation history
- webscout://cache/analysis: Cached analysis results
Dynamic Resources
- webscout://{domain}/analysis: Analysis data for a specific domain
- webscout://{domain}/content: Scraped content for a specific domain
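Dynamic resource URIs are formed by substituting a domain into the template, then read with a JSON-RPC resources/read request. An illustrative sketch (the framing is again handled by the MCP client):

```python
import json

def resource_uri(template: str, domain: str) -> str:
    """Fill the {domain} placeholder in a webscout:// resource template."""
    return template.format(domain=domain)

request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "resources/read",
    "params": {"uri": resource_uri("webscout://{domain}/analysis", "example.com")},
}
payload = json.dumps(request)
```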
Architecture
src/
├── __init__.py                 # Package initialization
├── server.py                   # Main MCP server implementation
├── config.py                   # Configuration management
├── services/                   # Core service modules
│   ├── __init__.py
│   ├── scraping_service.py     # Web scraping functionality
│   ├── search_service.py       # Search functionality
│   └── analysis_service.py     # Analysis functionality
└── tools/                      # MCP tool implementations
    ├── __init__.py
    └── web_scout_tools.py      # Tool handlers
Development
Project Structure
The project follows a modular architecture:
- Services: Core business logic for scraping, searching, and analysis
- Tools: MCP tool implementations that call services
- Server: MCP protocol handling and resource management
- Config: Environment-specific configuration management
Running Tests
pytest tests/
Code Formatting
black src/
flake8 src/
Type Checking
mypy src/
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Troubleshooting
Common Issues
- ChromeDriver not found: Install ChromeDriver or ensure it's in your PATH
- Permission errors: Check file permissions for cache directory
- API rate limits: Implement delays between requests or upgrade API plans
- Memory issues: Reduce max_content_length for large pages
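For the API rate-limit issue, a common client-side mitigation is exponential backoff between retries. A minimal sketch (the delays and retry count here are illustrative):

```python
import time

def with_backoff(fetch, max_retries: int = 3, base_delay: float = 1.0):
    """Retry a fetch callable, doubling the delay after each failure."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # Out of retries; surface the last error.
            time.sleep(base_delay * (2 ** attempt))  # e.g. 1s, 2s, 4s, ...

result = with_backoff(lambda: "ok", max_retries=3, base_delay=0.01)
```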
Debug Mode
Set environment variable for detailed logging:
WEB_SCOUT_LOG_LEVEL=DEBUG python -m src.server
Support
For issues and questions:
- Create an issue on GitHub
- Check the documentation
- Review the troubleshooting section
Roadmap
- Advanced NLP analysis with spaCy
- PDF and document scraping support
- Real-time website monitoring with webhooks
- Database storage for long-term caching
- API rate limiting and queuing system
- Distributed scraping with multiple workers
- Machine learning-based content classification