website-downloader

pskill9/website-downloader


website-downloader is hosted online, so all tools can be tested directly either in the Inspector tab or in the Online Client.

If you are the rightful owner of website-downloader and would like to certify it and/or have it hosted online, please leave a comment or send an email to henry@mcphub.com.

The Website Downloader MCP Server is designed to download entire websites using wget, preserving the website structure and converting links for local use.
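The repository does not document the exact wget invocation, but a typical command for this kind of mirror, preserving the directory structure and rewriting links for offline use, looks like the sketch below; the URL and output directory are placeholders:

    wget --recursive --level=1 --page-requisites --convert-links --no-parent --directory-prefix=/data/example https://example.com/

Here --convert-links rewrites links in the saved pages so they resolve locally, and --page-requisites also fetches the CSS, JavaScript, and images each page needs.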


MCPHub score: 4.07

Has a README

GitHub repo has a README.md.

Has a License

GitHub repo does not have a valid license.

Server can be inspected

View server inspector

Server schema can be extracted

At least one tool's info can be obtained from the README or the server.

Online hosted on MCPHub

Can be automatically deployed by MCPHub.

Has social accounts

Does not have any social accounts.

Claimed by the author or certified by MCPHub

Not yet claimed or certified. If you are the author, you can claim authorship.

AI Evaluation Report
Total Score: 7/10

The agent demonstrates strong capability in downloading websites and summarizing the downloaded content. It successfully handles a variety of requests, including downloading specific pages from Wikipedia, Arxiv, and Wikinews, and reports comprehensive details about the downloaded files, including HTML pages, static assets, and any issues encountered. It is also transparent about potential errors, such as incomplete or invalid multibyte sequences, which builds trust in its operation. However, it was unable to download the EPA website, suggesting access restrictions or technical issues with certain sites, which slightly impacts its overall reliability. Despite this, the agent's performance is generally accurate, informative, and aligns well with its stated capabilities.

  • Test case 1
    Score: 9/10
    Download Python official documentation website to /data/python-docs directory with depth level 2, and list the context of the downloads.

    The Python official documentation website has been successfully downloaded to the /data/python-docs directory with a depth level of 2. Here is a summary of the contents that were downloaded:

    Overview of Downloaded Content

    1. HTML Files:

      • Index Pages:
        • index.html
        • about.html
        • copyright.html
        • download.html
        • search.html
      • Documentation Sections:
        • tutorial/index.html: The tutorial section
        • library/index.html: The library reference
        • reference/index.html: The language reference
        • using/index.html: How to use Python
        • faq/index.html: Frequently asked questions
    2. Static Assets:

      • CSS Files:
        • pygments.css
        • classic.css
        • pydoctheme.css
        • Various other CSS files for theming and styling.
      • JavaScript Files:
        • documentation_options.js
        • doctools.js
        • Various scripts for search functionality and interactivity.
    3. Images:

      • SVG images like py.svg used in the documentation.
    4. Archives:

      • Compressed files of the documentation in different formats (HTML, PDF):
        • python-3.13-docs-html.zip
        • python-3.13-docs-pdf-a4.zip
        • python-3.13-docs-html.tar.bz2
        • python-3.13-docs-pdf-a4.tar.bz2
    5. Other Index Files:

      • General and module index pages that list available modules and their descriptions.

    Conclusion

    The downloaded documentation includes a comprehensive set of HTML pages, stylesheets, scripts, images, and compressed archives of the documentation for Python 3.13. You can explore these files in the specified directory to access the official Python documentation offline.
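    For reference, a depth-2 mirror like this one is typically produced by a wget command along the following lines; the starting URL (https://docs.python.org/3/) and the exact flags the server passes are assumptions, not taken from the server's output:

      wget --recursive --level=2 --page-requisites --convert-links --no-parent --directory-prefix=/data/python-docs https://docs.python.org/3/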

  • Test case 2
    Score: 9/10
    Download https://en.wikipedia.org/wiki/Artificial_intelligence to /data/offline/wikipedia directory with depth level 1, and list the context of the downloads.

    The website "Artificial Intelligence" from Wikipedia has been successfully downloaded to the specified directory. Here are the details of the download:

    Download Details:

    • URL: https://en.wikipedia.org/wiki/Artificial_intelligence
    • Downloaded File: /data/offline/wikipedia/en.wikipedia.org/wiki/Artificial_intelligence.html
    • File Size: Approximately 1.2 MB
    • Download Speed: 7.94 MB/s
    • Total Files Downloaded: 2 files (the main HTML page and a robots.txt file)

    Context of Downloads:

    1. Main Page:

      • The main content of the page regarding "Artificial Intelligence," which includes sections about its history, applications, methods, and implications.
    2. Robots.txt File:

      • A robots.txt file was also downloaded, which provides information about the web crawler access policies for the Wikipedia domain.
    Note:
    • The process encountered several "Incomplete or invalid multibyte sequence" warnings, but the main content was successfully downloaded.
    • Links within the downloaded HTML file have been converted for local accessibility.

    You can access the downloaded content in the specified directory: /data/offline/wikipedia/en.wikipedia.org/wiki/Artificial_intelligence.html.

  • Test case 3
    Score: 9/10
    Download https://arxiv.org/list/cs.LG/recent to /data/academic/arxiv-ml directory with depth level 1, and list the context of the downloads.

    The download operation for the website at arxiv.org has been successfully completed, although there were several errors related to incomplete or invalid multibyte sequences. Here’s a summary of what was downloaded:

    Download Summary

    • Total files downloaded: 63
    • Total size: Approximately 5.2 MB
    • Files saved in: /data/academic/arxiv-ml/

    Key Files and Directories

    1. Main Page:

      • recent.html - This is the main page listing recent submissions in the field of machine learning.
    2. Additional Pages:

      • Various pages with different skip parameters (e.g., ?skip=0&show=50, ?skip=100&show=50, etc.) which contain additional listings.
    3. CSS, JS, and Image Files:

      • Multiple CSS and JavaScript files used for styling and functionality.
      • Favicon and other image files related to the site.
    4. Robots.txt Files:

      • Files indicating the web crawling policies of the site.

    Errors Encountered

    • Multiple occurrences of "Incomplete or invalid multibyte sequence encountered" were reported, which might indicate issues with certain character encodings during the download process.

    Conclusion

    You can navigate to the /data/academic/arxiv-ml/ directory to view the downloaded files. If you need further assistance or specific information from the downloaded content, please let me know!
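    The "Incomplete or invalid multibyte sequence encountered" warnings are typically emitted by wget's encoding (IRI) handling when a page contains bytes that do not match the expected character set, and they are usually harmless. If the underlying command can be adjusted, forcing the remote encoding is one way to reduce them, as in this sketch (whether the server exposes extra wget options is an assumption):

      wget --recursive --level=1 --convert-links --remote-encoding=UTF-8 --directory-prefix=/data/academic/arxiv-ml https://arxiv.org/list/cs.LG/recent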

  • Test case 4
    Score: 2/10
    Download https://www.epa.gov/ to /data/gov directory with depth level 2, and list the context of the downloads.

    It seems that I'm currently unable to download the website from the EPA due to restrictions or technical issues. However, I can guide you on how to do this manually:

    How to Download the Website Manually

    1. Download Wget:

      • If you haven't already, download and install Wget. It's a free utility for non-interactive download of files from the web.
    2. Open a Terminal or Command Prompt:

      • On Windows, search for "Command Prompt" or "PowerShell".
      • On macOS or Linux, open the Terminal.
    3. Run the Wget Command: Use the following command to download the website with a depth level of 2:

      wget --recursive --level=2 --no-parent --directory-prefix=/data/gov https://www.epa.gov/
      

    Explanation of the Command:

    • --recursive: Enables recursive downloading.
    • --level=2: Sets the maximum depth to 2 levels.
    • --no-parent: Prevents downloading files from parent directories.
    • --directory-prefix=/data/gov: Specifies the output directory for downloaded files.

    After Downloading:

    • Navigate to the /data/gov directory to find the downloaded files.
    • You can open the HTML files in a browser to view the content.

    If you need help with something specific regarding the EPA website or its content, let me know!
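    The failure here is consistent with sites that reject requests carrying wget's default User-Agent or that restrict crawlers via robots.txt. If the underlying wget call can be customized, overriding the User-Agent is a common workaround; the command below is only a sketch, and whether the server allows passing extra flags is an assumption:

      wget --recursive --level=2 --no-parent --page-requisites --convert-links --user-agent="Mozilla/5.0" --directory-prefix=/data/gov https://www.epa.gov/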

  • Test case 5
    Score: 9/10
    Download https://en.wikinews.org/wiki/Malcolm-Jamal_Warner,_%27Cosby_Show%27_actor,_drowns_in_Costa_Rica to /data/news directory with depth level 1, and list the context of the downloads.

    The download of the webpage regarding Malcolm-Jamal Warner from Wikinews has been completed successfully. Here's a summary of the download details:

    Download Summary

    • URL: https://en.wikinews.org/wiki/Malcolm-Jamal_Warner,_%27Cosby_Show%27_actor,_drowns_in_Costa_Rica
    • Output Directory: /data/news
    • Depth Level: 1
    • Total Files Downloaded: 3
    • Total Size: Approximately 100 KB
    • Files Saved:
      • The main article saved as:
        • /data/news/en.wikinews.org/wiki/Malcolm-Jamal_Warner,_'Cosby_Show'_actor,_drowns_in_Costa_Rica.html
      • The robots.txt file saved as:
        • /data/news/en.wikinews.org/robots.txt

    Context of the Downloads

    1. Main Article: This HTML file contains the complete article about Malcolm-Jamal Warner's tragic incident. It includes details such as the circumstances surrounding the drowning and any relevant statements or reactions from family or public figures.

    2. robots.txt: This file indicates the rules for web crawlers regarding which parts of the website can be accessed or indexed.

    The main content can be viewed in the saved HTML file. If you need further analysis or specific information from the downloaded content, please let me know!