JeremyDong22/taobao_mcp
Taobao MCP Server
Model Context Protocol (MCP) server for scraping Taobao/Tmall (淘宝/天猫) product information.
This MCP server enables AI assistants to fetch comprehensive product data from Taobao and Tmall, including product details, images, specifications, customer reviews, and Q&A sections. Perfect for product research, comparison, and analysis.
🌟 Features
- Automatic Link Detection: Recognizes Taobao/Tmall links in Chinese share text
- Multiple Input Formats: Supports product IDs, direct URLs, short links, and share text
- Comprehensive Data: Scrapes titles, prices, images, specs, reviews, and Q&A
- Persistent Sessions: Browser session remains logged in across multiple requests
- Bilingual Support: Handles both English and Chinese (中文) input/output
- Markdown Output: Returns structured, AI-friendly Markdown format
📦 What Gets Scraped
For each product, this MCP server extracts:
| Data Type | Details |
|---|---|
| Basic Info | Title (标题), Price (价格), Store Name (店铺), Product ID |
| Images | Thumbnail images (缩略图) + Detailed product images (详情图) |
| Parameters | Product specifications (参数) and attributes (属性) |
| Reviews | Customer reviews (用户评价) with text, ratings, and photos |
| Q&A | Customer questions and seller answers (问答) |
🚀 Installation
Prerequisites
- Python 3.10 or higher
- uv package manager (or regular pip)
- Chromium browser (installed automatically by Playwright)
Step 1: Install Dependencies
cd taobao_mcp
uv pip install -e .
# OR
pip install -e .
Step 2: Install Playwright Browser
playwright install chromium
Step 3: Configure MCP Server
Create or edit the MCP configuration file:
For Claude Code CLI
Create .mcp.json in your project root:
{
"mcpServers": {
"taobao-scraper": {
"command": "python3",
"args": [
"/absolute/path/to/taobao_mcp/server.py"
],
"env": {}
}
}
}
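Replace /absolute/path/to/ with the actual path to this directory. If you installed the dependencies into a virtual environment, point command at that environment's interpreter (for example /absolute/path/to/taobao_mcp/.venv/bin/python) rather than python3.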
For Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"taobao-scraper": {
"command": "python3",
"args": [
"/absolute/path/to/taobao_mcp/server.py"
]
}
}
}
Step 4: Restart Your AI Assistant
- Claude Code CLI: Start a new conversation or reload
- Claude Desktop: Restart the application
🎯 Usage Workflow
Step 1: Initialize Login (REQUIRED FIRST)
Before scraping any products, you MUST initialize the browser session:
User: "Initialize Taobao login"
AI: [Calls taobao_initialize_login]
The AI will:
- Launch a Chrome browser window
- Navigate to Taobao homepage
- Check if login is required
- If login needed: Wait for you to scan the QR code (扫码登录)
- Save the session for future use
This only needs to be done ONCE per session!
Step 2: Fetch Product Information
After initialization, you can fetch products using any of these formats:
User: "【淘宝】假一赔四 https://e.tb.cn/h.StvCjJlWxkNatsx?tk=xxx MF937 「UBV美式休闲空气层棉...」"
AI: [Calls taobao_fetch_product_info with the full text]
User: "Analyze this product: https://detail.tmall.com/item.htm?id=881280651752"
AI: [Calls taobao_fetch_product_info with the URL]
User: "Research product 881280651752"
AI: [Calls taobao_fetch_product_info with the ID]
The AI will automatically:
- Extract the product ID/URL from your message
- Navigate to the product page
- Scrape all available information
- Return structured Markdown
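Seen from the client side, the two-step workflow above is just two tool calls in a fixed order. The following is a minimal sketch, not code from this project: it assumes `session` is an already-connected MCP client session with an async `call_tool(name, arguments)` method (the MCP Python SDK's `ClientSession` offers one), and the helper name `research_product` is hypothetical.

```python
import asyncio
from typing import Any, Protocol


class ToolSession(Protocol):
    """Anything with an async call_tool method, e.g. an MCP ClientSession."""

    async def call_tool(self, name: str, arguments: dict[str, Any]) -> Any: ...


async def research_product(session: ToolSession, product_ref: str) -> Any:
    """Initialize the Taobao login, then fetch one product (hypothetical helper)."""
    # Step 1: the login tool has to run before any fetch.
    await session.call_tool("taobao_initialize_login", {})
    # Step 2: pass the ID, URL, short link, or share text through unchanged.
    return await session.call_tool(
        "taobao_fetch_product_info", {"product_url_or_id": product_ref}
    )
```

The helper always calls the login tool first for simplicity; a longer-lived client would remember that the session is already initialized and skip that call.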
🔧 Available Tools
1. taobao_initialize_login
Purpose: Initialize browser session and handle Taobao authentication
Parameters: None
When to call:
- User mentions Taobao (淘宝), Tmall (天猫), or provides a product link
- MUST be called BEFORE taobao_fetch_product_info
- Only needs to be called ONCE per session
Returns:
- Status: success, login_required, already_initialized, or error
- Message with next steps
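A caller can branch on the four status values listed above. The sketch below is illustrative only: the names `LoginStatus` and `can_fetch` are hypothetical, and treating `success` and `already_initialized` as "ready to fetch" is an assumption based on the descriptions in this README, not on the server's code.

```python
from enum import Enum


class LoginStatus(str, Enum):
    """The status strings documented for taobao_initialize_login."""

    SUCCESS = "success"
    LOGIN_REQUIRED = "login_required"
    ALREADY_INITIALIZED = "already_initialized"
    ERROR = "error"


def can_fetch(status: str) -> bool:
    """Return True when the session looks ready for taobao_fetch_product_info."""
    parsed = LoginStatus(status)  # raises ValueError on an unknown status
    # Assumption: only these two statuses mean a usable, logged-in session.
    return parsed in (LoginStatus.SUCCESS, LoginStatus.ALREADY_INITIALIZED)
```

With `login_required`, the user presumably still has to scan the QR code, so the function returns False until the login tool reports one of the ready states.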
Example:
User: "帮我做一个research" ("Help me do some research") + Taobao link
AI: → Calls taobao_initialize_login first
2. taobao_fetch_product_info
Purpose: Scrape comprehensive product information and return as Markdown
Parameters:
- product_url_or_id (string): Product ID, URL, short link, or share text
Supported formats:
- Product ID: "881280651752"
- Direct URL: "https://detail.tmall.com/item.htm?id=881280651752"
- Short link: "https://e.tb.cn/h.StvCjJlWxkNatsx?tk=xxx"
- Share text: "【淘宝】假一赔四 https://e.tb.cn/h.xxx MF937 「商品名称」"
Returns:
- Markdown-formatted product information
- Includes metadata (scrape time, image counts, review counts)
- Image URLs as Markdown image links
- Parameter tables
- Structured reviews and Q&A
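Because images come back as Markdown image links, a client can pull the URLs out of the returned text with a regular expression. This is a minimal sketch under the assumption that the links use the standard `![alt](url)` form; the function name `image_urls` is hypothetical and the exact layout of the server's Markdown is not specified here.

```python
import re

# Matches ![alt text](url) and captures the URL (no spaces, no closing paren).
IMAGE_LINK_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def image_urls(markdown: str) -> list[str]:
    """Return image URLs in document order, without duplicates."""
    seen: dict[str, None] = {}  # dict keeps insertion order
    for url in IMAGE_LINK_RE.findall(markdown):
        seen.setdefault(url, None)
    return list(seen)
```

Deduplicating while keeping order is a design choice for the sketch: thumbnails and detail images may repeat, and order usually matters when presenting them.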
Example:
User: "【淘宝】product https://e.tb.cn/h.xxx"
AI: → Calls taobao_fetch_product_info(product_url_or_id='【淘宝】product https://e.tb.cn/h.xxx')
🤖 How AI Assistants Will Use This
Automatic Detection
When you mention Taobao-related keywords or provide links, the AI will automatically:
Trigger Keywords (English):
- Taobao, Tmall, product, scrape, research, analyze, compare, reviews
Trigger Keywords (Chinese):
- 淘宝, 天猫, 商品, 分析, 对比, 评价, 价格
Trigger Patterns:
- URLs starting with https://e.tb.cn/
- URLs containing detail.tmall.com or item.taobao.com
- Share text like 【淘宝】...
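The trigger patterns above amount to a handful of substring checks. The following sketch is an assumption about how such a check could look, not how any AI assistant actually decides; `mentions_taobao` is a hypothetical name, and the sketch deliberately uses only the site-specific keywords, since generic ones such as "product" or "research" would match unrelated messages.

```python
# Site-specific keywords only (assumption: generic words are too noisy).
KEYWORDS = ("taobao", "tmall", "淘宝", "天猫")
SHORT_LINK_PREFIX = "https://e.tb.cn/"
PRODUCT_HOSTS = ("detail.tmall.com", "item.taobao.com")


def mentions_taobao(message: str) -> bool:
    """Return True if the message contains a Taobao/Tmall keyword or link."""
    lowered = message.lower()
    if any(keyword in lowered for keyword in KEYWORDS):
        return True  # also covers share text such as 【淘宝】...
    if SHORT_LINK_PREFIX in message:
        return True  # short links carry no brand name in the URL
    return any(host in lowered for host in PRODUCT_HOSTS)
```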
Example Conversations
Scenario 1: Research Request
User: "帮我做一个research" ("Help me do some research") + [Taobao link]
AI reasoning:
1. User wants to research a Taobao product
2. Need to initialize first (if not already done)
3. Then fetch the product info
AI actions:
→ taobao_initialize_login()
→ taobao_fetch_product_info(product_url_or_id='[link]')
→ Analyzes the returned Markdown and presents insights
Scenario 2: Product Comparison
User: "Compare these two products: [link1] and [link2]"
AI actions:
→ taobao_initialize_login() (if needed)
→ taobao_fetch_product_info([link1])
→ taobao_fetch_product_info([link2])
→ Compares prices, specs, reviews, etc.
Scenario 3: User Doesn't Know About MCP
User: "Can you help me browse Taobao?"
AI:
"I can help you research Taobao products! I have access to a Taobao scraping tool.
First, I need to initialize the browser session. This will open a browser window where you may need to scan a QR code if login is required.
Let me start the initialization..."
→ taobao_initialize_login()
⚠️ Important Notes
Prerequisites
- ALWAYS call taobao_initialize_login first
  - This is mandatory before any product scraping
  - The AI should do this automatically when detecting Taobao content
- Browser must remain open
  - The browser window will stay open to maintain the session
  - Don't close it manually during scraping
- QR Code Login
  - If Taobao requires login, scan the QR code in the browser window
  - Use the Taobao mobile app to scan
  - Session will be saved for future use
Data Freshness
- Scrapes live data from Taobao/Tmall
- Reviews and prices are current as of scrape time
- Scraped data is returned immediately (not cached)
Limitations
- Only works with public product pages
- Some products may require login to view full details
- Rate limiting may apply if too many requests are sent in a short time
- Page structure changes may require updates to selectors
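One way to stay clear of rate limiting when fetching several products is to space the requests out. The generator below is a sketch of that idea only; `paced` is a hypothetical helper, and the right interval is an assumption you would need to tune, since no limits are documented here.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def paced(
    items: Iterable[T],
    min_interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items no faster than one per min_interval seconds."""
    last: Optional[float] = None
    for item in items:
        if last is not None:
            # Wait out whatever is left of the interval since the last item.
            remaining = min_interval - (clock() - last)
            if remaining > 0:
                sleep(remaining)
        last = clock()
        yield item
```

For example, `for link in paced(links, min_interval=5.0): ...` would start each fetch at least five seconds after the previous one started; the five-second figure is purely illustrative.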
🐛 Troubleshooting
Error: "Browser not initialized"
Cause: Trying to fetch product before initializing
Solution:
AI should call: taobao_initialize_login()
Error: "Could not extract product ID"
Cause: Invalid product link or ID format
Solution: Verify the input format matches one of:
- Product ID (12-13 digits): 881280651752
- Direct URL: https://detail.tmall.com/item.htm?id=881280651752
- Short link: https://e.tb.cn/h.xxx
- Share text containing the above
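To check an input before sending it, the formats above can be told apart with a few string checks. This is a minimal sketch, not the server's actual parser: the function name `classify_input` is hypothetical, and the rules (a bare 12-13 digit string is an ID, an `id` query parameter holds the product ID, `e.tb.cn` marks a short link) are assumptions drawn from the examples in this README.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# First URL in the text; stops at whitespace or CJK brackets used in share text.
URL_RE = re.compile(r"https?://[^\s「」【】]+")
ID_RE = re.compile(r"\d{12,13}")


def classify_input(text: str) -> tuple[str, Optional[str]]:
    """Return (kind, value); kind is 'id', 'url', 'short_link', or 'unknown'."""
    text = text.strip()
    if ID_RE.fullmatch(text):
        return "id", text
    match = URL_RE.search(text)
    if match is None:
        return "unknown", None
    url = match.group(0)
    parsed = urlparse(url)
    if parsed.netloc == "e.tb.cn":
        return "short_link", url  # still needs a network round trip to resolve
    ids = parse_qs(parsed.query).get("id")
    if ids and ids[0].isdigit():
        return "id", ids[0]
    return "url", url
```

An `unknown` result corresponds to the case where the input matches none of the listed formats.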
Error: "Login required"
Cause: Session expired or not logged in
Solution:
- Call taobao_initialize_login again
- Scan QR code in browser if prompted
- Retry fetching the product
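The three recovery steps above can be wrapped in a small retry helper. This is a sketch under stated assumptions: `fetch` and `relogin` are caller-supplied async callables (for example thin wrappers around the two tools), the name `fetch_with_relogin` is hypothetical, and detecting the failure by looking for the text "Login required" in the result is a guess at how the error surfaces.

```python
from typing import Awaitable, Callable


async def fetch_with_relogin(
    fetch: Callable[[str], Awaitable[str]],
    relogin: Callable[[], Awaitable[None]],
    product_ref: str,
    marker: str = "Login required",  # assumption: error text appears in the result
) -> str:
    """Fetch once; if the result reports a login error, log in again and retry once."""
    result = await fetch(product_ref)
    if marker in result:
        await relogin()  # the user may need to scan the QR code here
        result = await fetch(product_ref)
    return result
```

Retrying only once keeps the helper from looping forever when the login itself keeps failing.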
Browser Doesn't Open
Cause: Playwright browser not installed or permission issues
Solution:
playwright install chromium
# Verify installation
python3 -c "from playwright.sync_api import sync_playwright; p = sync_playwright().start(); browser = p.chromium.launch(); print('✓ OK'); browser.close(); p.stop()"
Short Link Resolution Fails
Cause: Network issues or invalid short link
Solution: Try using the direct product URL or ID instead
MCP Server Connection Failed (Claude Code)
Symptoms:
- MCP server shows as "failed" in the /mcp management panel
- Error in logs: "No such file or directory (os error 2)"
- Logs location: /Users/yourusername/Library/Caches/claude-cli-nodejs/-Users-yourusername-Desktop-test/mcp-logs-taobao-scraper/
Common Causes:
- Incorrect path in .mcp.json: The configuration file may have wrong directory paths
- Using wrong Python interpreter: Not using the virtual environment's Python
- Double-nested directories: Path like /path/to/taobao_mcp/taobao_mcp (incorrect)
Solution:
- Check your .mcp.json configuration (located in your project root):
  {
    "mcpServers": {
      "taobao-scraper": {
        "command": "/absolute/path/to/taobao_mcp/.venv/bin/python",
        "args": [
          "/absolute/path/to/taobao_mcp/server.py"
        ],
        "env": {}
      }
    }
  }
- Verify paths are correct:
  # Check Python exists
  ls -la /path/to/taobao_mcp/.venv/bin/python
  # Check server.py exists
  ls -la /path/to/taobao_mcp/server.py
  # Test server can start
  /path/to/taobao_mcp/.venv/bin/python /path/to/taobao_mcp/server.py <<< '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}'
- Common mistakes to avoid:
  - ❌ Using python3 instead of full path to venv Python
  - ❌ Path like /path/to/taobao_mcp/taobao_mcp/server.py (double directory)
  - ❌ Relative paths like ./server.py (use absolute paths)
  - ✅ Correct: Full absolute paths to both Python and server.py
- After fixing, restart Claude Code to reload the MCP server
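The path checks above can also be automated. The script below is a rough sketch rather than part of this project: `check_mcp_config` is a hypothetical helper, and it assumes every entry in `args` is a file path, which holds for the configuration shown here but not for MCP configurations in general.

```python
import json
from pathlib import Path


def check_mcp_config(config_path: str, server_name: str = "taobao-scraper") -> list[str]:
    """Return a list of problems found; an empty list means the paths look fine."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    entry = config.get("mcpServers", {}).get(server_name)
    if entry is None:
        return [f"no '{server_name}' entry under mcpServers"]

    problems: list[str] = []
    # Assumption: the command and every argument are filesystem paths.
    for raw in [entry.get("command", ""), *entry.get("args", [])]:
        path = Path(raw)
        if not path.is_absolute():
            problems.append(f"not an absolute path: {raw}")
        elif not path.exists():
            problems.append(f"path does not exist: {raw}")
    return problems
```

Calling `check_mcp_config(".mcp.json")` from the project root would flag a bare `python3` command, a relative `./server.py`, or a double-nested path that does not exist on disk.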
Debug Tips:
- Run claude --debug to see detailed logs
- Check log files in ~/Library/Caches/claude-cli-nodejs/
- Look for the most recent log file in the mcp-logs-taobao-scraper/ directory
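Finding the most recent log file by hand is tedious when the cache directory holds many sessions. A small helper can do it; this is a sketch with a hypothetical function name (`newest_log`), and it makes no assumption about log file names or extensions, so it considers every regular file under the directory.

```python
from pathlib import Path
from typing import Optional


def newest_log(log_dir: str) -> Optional[Path]:
    """Return the most recently modified file under log_dir, or None if empty."""
    root = Path(log_dir).expanduser()  # allows paths that start with ~
    files = [path for path in root.rglob("*") if path.is_file()]
    return max(files, key=lambda path: path.stat().st_mtime, default=None)
```

For example, `newest_log("~/Library/Caches/claude-cli-nodejs/")` searches the whole cache directory, including any mcp-logs-taobao-scraper/ subdirectories.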
📂 File Structure
taobao_mcp/
├── README.md # This file (comprehensive documentation)
├── server.py # Main MCP server with tool registration
├── taobao_scraper.py # Core scraping logic and browser automation
├── pyproject.toml # Python dependencies
├── USAGE.txt # Quick reference guide
└── .venv/ # Virtual environment (created during install)
🔄 Update & Maintenance
Updating the MCP Server
cd taobao_mcp
git pull # If using git
uv pip install -e . --force-reinstall
Clearing Browser Cache
If experiencing issues, clear the browser profile:
rm -rf ../user_data/chrome_profile
Then re-initialize and login again.
Version Information
- Current Version: 1.2
- Last Updated: 2025-11-17
- MCP Protocol Version: 1.0
- Python SDK Version: Compatible with mcp>=0.9.0
💡 Tips for Best Results
- Use Chinese share text directly - No need to extract the link manually
- Initialize once per session - Don't re-initialize unless login expires
- Wait for initialization to complete - Scan QR code if prompted
- Keep browser window open - Don't close it during scraping
- Use direct URLs when possible - Faster than resolving short links
🌐 Language Support
This MCP server fully supports:
- English: All tool descriptions and error messages
- Chinese (中文): Recognizes Chinese product names, descriptions, and share text
- Mixed Input: Handles bilingual text seamlessly
📝 License & Credits
Created for research and product analysis purposes.
Dependencies:
- MCP Python SDK - Model Context Protocol implementation
- Playwright - Browser automation
- Pydantic - Input validation
- aiohttp - HTTP client
🆘 Support
If you encounter issues:
- Check this README thoroughly
- Verify installation steps were followed correctly
- Check browser profile permissions
- Try clearing browser cache and re-initializing
- Ensure Taobao website is accessible from your network
Happy Scraping! 🎉
This MCP server helps AI assistants research Taobao products efficiently.