DrBoom-233/ODLP_MCP
If you are the rightful owner of ODLP_MCP and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.
ODLP MCP is a lightweight Model Context Protocol (MCP) service that leverages a large language model (LLM) to extract product and price information from e-commerce pages.
🚀 ODLP MCP
A lightweight MCP service that uses a large language model (LLM) to extract price and product information from e‑commerce pages.
✨ Highlights
- Adapts to many e-commerce layouts and page structures.
- Extracts fields like product name, price, size, color, country/region, etc.
- No hardcoded rules, just describe your needs in natural language.
✨ Difference between ODLP_MCP and crawl4ai (https://github.com/unclecode/crawl4ai):
- ODLP_MCP focuses on e-commerce product data extraction, while crawl4ai is a general web scraping framework. ODLP_MCP is designed for robust price data extraction across diverse e-commerce sites.
✨ How it works
- Describe your extraction needs in natural language (e.g., “Extract product title, current price, available sizes, shipping country, and return JSON”).
- The LLM plans steps, generates CSS/XPath selectors, and performs extraction automatically.
🔧 Before You Start
- Save a page as MHTML: in your browser use "Save as" → "Webpage, Single File (*.mhtml)". To ensure extraction success, please go to category specific grid-style product listing pages, where the DOM Tree traversing algorithm can work. Example url like:
https://www.metro.ca/en/online-grocery/aisles/fruits-vegetables
. Not likehttps://www.amazon.ca/
. - Obtain an LLM API key (e.g., OpenAI). Recommended: one general model (e.g.,
gpt-4o-mini
) and one reasoning model (e.g.,o4-mini
).
🛠 Environment Setup
The project manages Python dependencies with uv.
# install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# create and activate virtual environment
uv venv
source .venv/bin/activate
# install project dependencies
uv sync
🔑 API Key Configuration
config.py
reads API keys and model names from environment variables. Create a .env
file in the project root:
OPENAI_API_KEY=your_openai_key
OPENAI_MODEL=gpt-4o-mini
OPENAI_REASONING_MODEL=o4-mini
Use OPENAI_MODEL
for general extraction and OPENAI_REASONING_MODEL
for selector/reasoning tasks.
🔁 Model & Service Customization
OpenAI is the default. To switch providers (e.g., DeepSeek), update API calls in:
config.py
extractor/ocr.py
extractor/css_selector_generator.py
OCR service can be changed via the service_type
parameter of process_ocr_price
.
🔌 Connect to MCP Clients
Create an MCP.json
to register this server with an MCP-compatible client:
{
"servers": {
"ODLP_MCP": {
"type": "stdio",
"command": "uv",
"args": [
"run",
"--project", "${workspaceFolder}",
"--with", "mcp",
"--with", "python-dotenv",
"--with", "openai",
"--with", "drissionpage",
"--with", "beautifulsoup4",
"--with", "playwright",
"--with", "pytesseract",
"mcp",
"run",
"/absolute/path/to/server.py"
]
}
}
}
Replace /absolute/path/to/server.py
with the absolute path to server.py
on your machine.
⭐ From my side I use GitHub Copilot in VS Code as the client. Tutorial website: https://code.visualstudio.com/docs/copilot/customization/mcp-servers; you may use other MCP-compatible clients (Claude Code, Gemini CLI, etc.).