webcrawl-mcp

The Webcrawl MCP Server is a production-ready implementation of the Model Context Protocol (MCP) designed for comprehensive web crawling and intelligent content extraction.

The server is a fully compliant MCP implementation that handles complex crawling tasks through intelligent content extraction, link analysis, and sitemap generation. It is built for reliable, efficient large-scale web data extraction, supports modern transport protocols, and provides tools for in-page content search and web search. An abort function gracefully cancels long-running operations, which makes the server suitable for developers and businesses that automate web data collection and analysis.

Features

  • 100% MCP Compliant: Implements the latest MCP specification in full, providing a reliable and standardized protocol implementation.
  • Abort Functionality: Allows for the graceful cancellation of long-running operations, enhancing control and flexibility.
  • Smart Crawling: Features intelligent content extraction with relevance scoring to prioritize important information.
  • Link Analysis: Offers advanced link extraction and categorization for comprehensive web data analysis.
  • Sitemap Generation: Automatically generates detailed sitemaps to map out website structures efficiently.
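
The abort feature can be pictured in protocol terms. The sketch below builds the cancellation notification defined by the MCP specification (`notifications/cancelled`, carrying the `requestId` of the in-flight request). Whether this server's abort functionality is wired to that notification is an assumption, and the helper name `build_cancel_notification` is hypothetical.

```python
import json


def build_cancel_notification(request_id, reason=None):
    """Return a JSON-RPC 2.0 notification asking the server to cancel a request.

    Hypothetical helper: it only builds the message, it does not send it.
    """
    params = {"requestId": request_id}
    if reason is not None:
        # The reason is optional, human-readable text.
        params["reason"] = reason
    # Notifications carry no "id" field, so no response is expected.
    return {
        "jsonrpc": "2.0",
        "method": "notifications/cancelled",
        "params": params,
    }


if __name__ == "__main__":
    # Cancel the in-flight request with id 7 (an illustrative id).
    print(json.dumps(build_cancel_notification(7, "user aborted crawl")))
```

A client would send this message over the same connection on which it issued the long-running request, for example a `generateSitemap` call on a large site.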

Tools

  1. crawl

    Basic web crawling tool for extracting page content, metadata, and links.

  2. smartCrawl

    Intelligent crawling tool with relevance scoring and smart navigation.

  3. extractLinks

    Tool for extracting and categorizing links from a web page.

  4. searchInPage

    Tool for searching specific content within a web page.

  5. generateSitemap

    Tool for creating a sitemap by crawling website pages.

  6. webSearch

    Tool for performing web searches with multi-engine support.

  7. getDateTime

    Utility tool for date/time functions with timezone support.
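
A client invokes any of these tools through the standard MCP `tools/call` request. The following is a minimal sketch of building such a request for `crawl`. The argument name `url` and the helper `build_tool_call` are assumptions for illustration, since the tool input schemas are not listed here; the actual schemas can be read from the server's `tools/list` response.

```python
import itertools
import json

# Tool names as listed above.
TOOL_NAMES = {
    "crawl", "smartCrawl", "extractLinks", "searchInPage",
    "generateSitemap", "webSearch", "getDateTime",
}

# Each JSON-RPC request needs its own id within a session.
_request_ids = itertools.count(1)


def build_tool_call(name, arguments):
    """Return a JSON-RPC 2.0 `tools/call` request for one of the listed tools.

    Hypothetical helper: it only builds the message, it does not send it.
    """
    if name not in TOOL_NAMES:
        raise ValueError(f"unknown tool: {name}")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # "url" is an assumed argument name, not confirmed by the tool list.
    request = build_tool_call("crawl", {"url": "https://example.com"})
    print(json.dumps(request, indent=2))
```

The same helper covers the other six tools by changing `name` and `arguments`. The `id` it assigns is also the value a later cancellation would refer to.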