alicenjr/BeautifulSoup-MCP-Server
The Flipkart Scraper MCP is a lightweight HTML scraper using Requests and BeautifulSoup, with a small SQLite store and an MCP server for exposing scraping and read APIs.
flipkart-scraper-mcp
Lightweight HTML scraper using Requests + BeautifulSoup with a small SQLite store and an MCP server (via fastmcp) exposing scraping and read APIs.
This project does not use Playwright. It performs fast, static HTML fetches suitable for simple pages and list pagination.
Features
- MCP tools: health check, quick title fetch, multi-page scrape with storage, and read access to scraped text
- SQLite storage with normalized tables for pages, text content, headings, links, and images
- Simple pagination: use a {page} placeholder or an automatic ?page= query parameter
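The pagination rule can be sketched with the standard library as shown here. This is a hedged illustration, not the code in min.py. The helper names build_page_url and page_urls are hypothetical, and moving page to the end of the query string is a choice made by this sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PLACEHOLDER = "{page}"


def build_page_url(url: str, page: int) -> str:
    """Return the URL for one page number (hypothetical helper)."""
    if PLACEHOLDER in url:
        # Template style, e.g. https://example.com/list?page={page}
        return url.replace(PLACEHOLDER, str(page))
    # Query style: drop any existing page=... pair, then append the new value
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "page"]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def page_urls(url: str, num_pages: int = 1) -> list[str]:
    """Page numbers start at 1; defaulting num_pages to 1 is an assumption."""
    return [build_page_url(url, n) for n in range(1, num_pages + 1)]
```

For example, page_urls("https://example.com/list?q=phones", 2) yields https://example.com/list?q=phones&page=1 and https://example.com/list?q=phones&page=2.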
Requirements
- Python >= 3.13
- Windows, macOS, or Linux
Dependencies (see pyproject.toml):
- requests
- beautifulsoup4
- lxml
- fastmcp
Installation
Using uv (recommended):
uv sync
Or using pip:
pip install -e .
# or, without an editable install, install from a requirements list exported by uv (the <(...) syntax requires bash or zsh):
pip install -r <(uv export --format requirements-txt)
Running
Interactive CLI mode (scrape pages, then optionally start the server):
python min.py
Non-interactive MCP server mode:
python min.py --server
The default SQLite database path is flipkart.db in the project root.
Available MCP Tools
The MCP server (from min.py) registers the following tools:
- health_check: returns a basic success payload to verify that the server is running.
- fetch_page_title
  - Input: { "url": string }
  - Fetches a single URL and returns { ok, title, url, status_code }.
- scrape_pages_store_sqlite
  - Input: { "url": string, "num_pages"?: number }
  - Behavior:
    - If url includes {page}, it is replaced with page numbers starting at 1.
    - Otherwise, a page query parameter is added or replaced.
  - Returns a summary { ok, pages_scraped, errors, results[] } and stores the data in SQLite.
- read_page_text
  - Input: { "page_id"?: number, "contains"?: string, "limit"?: number, "offset"?: number }
  - Provides read-only access to the page_text table. Returns { ok, rows, count }.
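The following minimal BeautifulSoup sketch gives a rough idea of what scrape_pages_store_sqlite might extract from each fetched page. The function name extract_page_data and the dictionary layout are hypothetical. The sketch uses Python's built-in html.parser, although the project also lists lxml, which could be passed as the parser instead. Fetching (for example requests.get(url, timeout=...)) is left out so that the parsing step can be tried offline.

```python
from bs4 import BeautifulSoup

HEADING_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def extract_page_data(html: str) -> dict:
    """Reduce one HTML page to the fields stored in flipkart.db (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")

    def meta(name):
        # <meta name="description" content="..."> style tags; None when absent
        tag = soup.find("meta", attrs={"name": name})
        return tag.get("content") if tag else None

    data = {
        "title": soup.title.get_text(strip=True) if soup.title else None,
        "meta_description": meta("description"),
        "meta_keywords": meta("keywords"),
        # (level, text) pairs in document order, e.g. ("h1", "Deals")
        "headings": [(h.name, h.get_text(strip=True)) for h in soup.find_all(HEADING_TAGS)],
        "links": [(a["href"], a.get_text(strip=True)) for a in soup.find_all("a", href=True)],
        "images": [(img["src"], img.get("alt", "")) for img in soup.find_all("img", src=True)],
    }
    # Remove script/style elements before collecting the visible text
    for tag in soup(["script", "style"]):
        tag.decompose()
    data["content"] = soup.get_text(separator=" ", strip=True)
    return data
```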
Example Calls (conceptual)
Fetch a quick title:
mcp call fetch_page_title '{"url": "https://example.com"}'
Scrape multiple pages using a placeholder:
mcp call scrape_pages_store_sqlite '{"url": "https://example.com/list?page={page}", "num_pages": 3}'
Scrape a site that paginates with a ?page= query parameter (the tool adds or replaces it):
mcp call scrape_pages_store_sqlite '{"url": "https://example.com/list?q=phones", "num_pages": 2}'
Query stored text content containing a keyword (latest first):
mcp call read_page_text '{"contains": "iphone", "limit": 20}'
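The read_page_text filters map naturally onto a parameterized SQLite query. This sketch assumes the page_text columns listed under Database Schema. It also assumes three behaviors that min.py has not been confirmed to share: ordering by id descending to approximate "latest first", a default limit of 50, and reporting count as the number of rows returned. SQLite's LIKE is case-insensitive for ASCII, so "iphone" also matches "iPhone". A strictly read-only connection could be opened with sqlite3.connect("file:flipkart.db?mode=ro", uri=True).

```python
import sqlite3


def read_page_text(conn, page_id=None, contains=None, limit=50, offset=0):
    """Filter page_text rows (hypothetical re-implementation; limit=50 is assumed)."""
    clauses, params = [], []
    if page_id is not None:
        clauses.append("page_id = ?")
        params.append(page_id)
    if contains:
        # Substring match; SQLite LIKE ignores ASCII case by default
        clauses.append("content LIKE ?")
        params.append(f"%{contains}%")
    where = "WHERE " + " AND ".join(clauses) if clauses else ""
    sql = ("SELECT id, page_id, content FROM page_text "
           f"{where} ORDER BY id DESC LIMIT ? OFFSET ?")  # newest rows first
    params += [limit, offset]
    rows = [
        {"id": r[0], "page_id": r[1], "content": r[2]}
        for r in conn.execute(sql, params)
    ]
    return {"ok": True, "rows": rows, "count": len(rows)}
```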
Database Schema
On first run, the following tables are created in flipkart.db:
- pages: id INTEGER PRIMARY KEY, url TEXT UNIQUE, status_code INTEGER, title TEXT, meta_description TEXT, meta_keywords TEXT, fetched_at TEXT (UTC ISO 8601)
- page_text: id INTEGER PRIMARY KEY, page_id INTEGER REFERENCES pages(id), content TEXT
- headings: id INTEGER PRIMARY KEY, page_id INTEGER REFERENCES pages(id), level TEXT (e.g., h1 to h6), text TEXT
- links: id INTEGER PRIMARY KEY, page_id INTEGER REFERENCES pages(id), href TEXT, text TEXT
- images: id INTEGER PRIMARY KEY, page_id INTEGER REFERENCES pages(id), src TEXT, alt TEXT
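The schema above could be created and populated with Python's built-in sqlite3 module roughly as follows. init_db and store_page are hypothetical helpers. Replacing a page's child rows when the same URL is scraped again is an assumption of this sketch, since the README only says that url is UNIQUE. The upsert syntax requires SQLite 3.24 or newer.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id INTEGER PRIMARY KEY, url TEXT UNIQUE, status_code INTEGER, title TEXT,
    meta_description TEXT, meta_keywords TEXT, fetched_at TEXT
);
CREATE TABLE IF NOT EXISTS page_text (
    id INTEGER PRIMARY KEY, page_id INTEGER REFERENCES pages(id), content TEXT
);
CREATE TABLE IF NOT EXISTS headings (
    id INTEGER PRIMARY KEY, page_id INTEGER REFERENCES pages(id), level TEXT, text TEXT
);
CREATE TABLE IF NOT EXISTS links (
    id INTEGER PRIMARY KEY, page_id INTEGER REFERENCES pages(id), href TEXT, text TEXT
);
CREATE TABLE IF NOT EXISTS images (
    id INTEGER PRIMARY KEY, page_id INTEGER REFERENCES pages(id), src TEXT, alt TEXT
);
"""


def init_db(path: str = "flipkart.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce REFERENCES pages(id)
    conn.executescript(SCHEMA)
    return conn


def store_page(conn, url, status_code, title, content, meta_description=None,
               meta_keywords=None, headings=(), links=(), images=()) -> int:
    """Upsert one page and replace its child rows (hypothetical helper)."""
    fetched_at = datetime.now(timezone.utc).isoformat()  # UTC ISO 8601
    with conn:  # commit all rows for this page in one transaction
        conn.execute(
            """INSERT INTO pages (url, status_code, title, meta_description,
                                  meta_keywords, fetched_at)
               VALUES (?, ?, ?, ?, ?, ?)
               ON CONFLICT(url) DO UPDATE SET
                   status_code = excluded.status_code, title = excluded.title,
                   meta_description = excluded.meta_description,
                   meta_keywords = excluded.meta_keywords,
                   fetched_at = excluded.fetched_at""",
            (url, status_code, title, meta_description, meta_keywords, fetched_at),
        )
        (page_id,) = conn.execute("SELECT id FROM pages WHERE url = ?", (url,)).fetchone()
        # Assumption: a re-scrape replaces the previously stored child rows
        for table in ("page_text", "headings", "links", "images"):
            conn.execute(f"DELETE FROM {table} WHERE page_id = ?", (page_id,))
        conn.execute("INSERT INTO page_text (page_id, content) VALUES (?, ?)", (page_id, content))
        conn.executemany("INSERT INTO headings (page_id, level, text) VALUES (?, ?, ?)",
                         [(page_id, level, text) for level, text in headings])
        conn.executemany("INSERT INTO links (page_id, href, text) VALUES (?, ?, ?)",
                         [(page_id, href, text) for href, text in links])
        conn.executemany("INSERT INTO images (page_id, src, alt) VALUES (?, ?, ?)",
                         [(page_id, src, alt) for src, alt in images])
    return page_id
```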
Notes and Limitations
- Designed for static HTML. Sites requiring JS rendering may return limited content.
- Keep request timeouts reasonable and respect target sites' robots.txt rules and rate limits.
- The database file path is fixed to flipkart.db in the current implementation.
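Here is one way to honor robots.txt and pace requests using only the standard library. MinIntervalLimiter, robots_checker, and the user-agent string are hypothetical. In a real run, the robots.txt lines would be downloaded first (RobotFileParser.set_url() followed by read() can do this), and each requests.get call would also pass an explicit timeout.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "flipkart-scraper-mcp"  # hypothetical user-agent string


def robots_checker(robots_lines):
    """Build a parser from already-downloaded robots.txt lines."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return rp


class MinIntervalLimiter:
    """Sleep as needed so consecutive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In a fetch loop, you would call limiter.wait() before each request and skip any URL for which rp.can_fetch(USER_AGENT, url) returns False. The clock and sleep parameters exist only so that the pacing can be tested without real delays.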
Development
Run with auto-reload or your preferred debugger. The code entrypoint is min.py, and the FastMCP app name is test-scraper.