mcp-playwright-scraper
A Model Context Protocol (MCP) server that scrapes web content and converts it to Markdown.
mcp-playwright-scraper is a Model Context Protocol server that scrapes web content and converts it to Markdown. It drives a headless browser with Playwright, so it can render modern web pages, including JavaScript-heavy ones, before extracting their content. BeautifulSoup parses and cleans up the HTML, and Pypandoc converts the cleaned HTML to Markdown. This makes the server useful for developers and content creators who need to extract and reformat web content. SSL verification can be enabled or disabled, which helps in environments where certificates cannot be validated. The server can be installed with pip or run directly with uvx.
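The pipeline described above (render with Playwright, clean with BeautifulSoup, convert with Pypandoc) can be sketched roughly as follows. This is a minimal illustration and not the server's actual code: the function body, the list of stripped tags, the `networkidle` wait strategy, and the default of `verify_ssl=True` are all assumptions. It also assumes the `playwright`, `beautifulsoup4`, and `pypandoc` packages, a Playwright Chromium install, and a pandoc binary are available.

```python
def scrape_to_markdown(url: str, verify_ssl: bool = True) -> str:
    """Hypothetical sketch: render a page, clean its HTML, return Markdown."""
    # Imports live inside the function so this module can be loaded
    # even where the third-party dependencies are not installed.
    from playwright.sync_api import sync_playwright
    from bs4 import BeautifulSoup
    import pypandoc

    # 1. Render the page in a headless browser so JavaScript runs.
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        # ignore_https_errors=True skips certificate validation.
        context = browser.new_context(ignore_https_errors=not verify_ssl)
        page = context.new_page()
        page.goto(url, wait_until="networkidle")  # assumed wait strategy
        html = page.content()
        browser.close()

    # 2. Remove elements that carry no readable content (assumed tag list).
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()

    # 3. Convert the cleaned HTML to Markdown with pandoc.
    return pypandoc.convert_text(str(soup), "markdown", format="html")
```

Calling `scrape_to_markdown("https://example.com")` would return the page as a Markdown string; passing `verify_ssl=False` makes the browser context accept invalid certificates.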
Features
- Headless browser automation with Playwright
- HTML parsing and cleanup using BeautifulSoup
- High-quality HTML to Markdown conversion with Pypandoc
- SSL verification option for secure scraping
- Easy installation and execution via pip or uvx
Tools
scrape_to_markdown
Scrapes content from a URL and converts it to Markdown. Takes a required 'url' parameter and an optional 'verify_ssl' parameter.
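As a rough illustration of how a client might invoke this tool, the sketch below builds the JSON-RPC `tools/call` request that MCP uses for tool invocation. The helper name `build_scrape_request` is hypothetical, and the choice to omit `verify_ssl` when it is not given (leaving the default to the server) is an assumption; in practice an MCP client library would normally build and send this message for you.

```python
import json


def build_scrape_request(url, verify_ssl=None, request_id=1):
    """Build an MCP tools/call request for scrape_to_markdown (hypothetical helper)."""
    arguments = {"url": url}
    # verify_ssl is optional, so only include it when the caller sets it.
    if verify_ssl is not None:
        arguments["verify_ssl"] = verify_ssl
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "scrape_to_markdown", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_scrape_request("https://example.com", verify_ssl=False)
    print(json.dumps(request, indent=2))
```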