rolfoz/pymcp
A lightweight, Python-based MCP server for LM Studio, offering web search, content extraction, spidering, and file reading.
Python MCP Web Search and Spider Tool
This Python script (`pymcp.py`) is an MCP (Model Context Protocol) server for web searching, website spidering, and local file reading. It exposes the tools `full-web-search`, `get-web-search-summaries`, `get-single-web-page-content`, `fetch_url_raw`, `spider_website`, and `read_local_file`. It uses Selenium with `webdriver-manager` for robust web scraping on Ubuntu Linux, bypassing anti-bot measures, with a fallback to `urllib` for basic requests.
Features
- Web Search: Query Bing, Google, or DuckDuckGo for search results with titles, URLs, and snippets (`full-web-search`, `get-web-search-summaries`).
- Website Spidering: Crawl a website up to a specified depth, collecting page content (`spider_website`).
- Single Page Fetch: Retrieve content from a specific URL (`get-single-web-page-content`, `fetch_url_raw`).
- Local File Access: Read local files (`read_local_file`).
- Debugging: Extensive logging to `stderr` and debug files (`debug_search.html`) for troubleshooting.
- Optimized for Speed: Reduced timeouts and page limits to avoid MCP timeouts (e.g., `spider_website` capped at 10 pages, 30s).
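If the script implements the standard MCP `tools/list` method, you can confirm which tools are registered directly from the shell:

```bash
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | python pymcp.py
```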
Requirements
- Python: 3.6+ (tested on 3.10+).
- Ubuntu Linux Setup (as of October 11, 2025):
  - Install Google Chrome:

    ```bash
    wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
    sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
    sudo apt update
    sudo apt install google-chrome-stable
    ```

    Verify with `google-chrome --version` (e.g., "Google Chrome 120.0.6099.71").
  - Install Python dependencies: `pip install selenium webdriver-manager`
  - No manual ChromeDriver download is needed; `webdriver-manager` handles it automatically (see the sketch after this list).
- Optional Manual ChromeDriver (if `webdriver-manager` fails):
  - Download from https://googlechromelabs.github.io/chrome-for-testing/ (match your Chrome version).
  - Extract `chromedriver` to the script directory and uncomment the `executable_path` line in `fetch_url`.
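For reference, the Selenium/`webdriver-manager` wiring looks roughly like the sketch below. This is illustrative only, not the exact code in `pymcp.py`; it just shows the standard Selenium 4 pattern the script is assumed to follow:

```python
# Sketch: headless Chrome via Selenium 4 + webdriver-manager (illustrative, not pymcp.py's code).
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument("--headless=new")        # render pages without a display
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")

# webdriver-manager downloads a ChromeDriver matching the installed Chrome and returns its path.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
try:
    driver.get("https://example.com")
    print(driver.page_source[:200])
finally:
    driver.quit()
```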
Installation
- Save the script as `pymcp.py`.
- Ensure Chrome and dependencies are installed (see above).
- Verify network access (for web requests) and file write permissions (for debug logs).
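Optionally, confirm that the Python dependencies import cleanly before wiring the server into LM Studio:

```bash
python3 -c "import selenium, webdriver_manager; print('selenium + webdriver-manager OK')"
```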
Usage
Terminal (Standalone)
Run the script as an MCP server, piping JSON-RPC commands:
```bash
echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"spider_website","arguments":{"url":"https://lampdatabase.com","max_depth":2}}}' | python pymcp.py
```
Example for search:
```bash
echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"full-web-search","arguments":{"query":"rolf schatzmann","limit":5}}}' | python pymcp.py
```
Output: JSON-RPC response with results (e.g., pages for `spider_website` or search results with URLs/snippets).
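You can also drive the server from a short Python wrapper instead of the shell. The sketch below pipes a single request over stdin the same way the `echo` examples do (the tool name and arguments are just examples):

```python
# Sketch: run pymcp.py as a subprocess and send one JSON-RPC request over stdin.
import json
import subprocess

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get-web-search-summaries",
        "arguments": {"query": "rolf schatzmann", "limit": 5},
    },
}

proc = subprocess.run(
    ["python", "pymcp.py"],
    input=json.dumps(request) + "\n",
    capture_output=True,
    text=True,
    timeout=120,
)
print(proc.stdout)   # JSON-RPC response
# Debug logging from the script goes to proc.stderr.
```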
LM Studio
- Load `pymcp.py` as an MCP server in LM Studio (Tools > Custom Scripts or equivalent; see the config sketch after this list for `mcp.json`-based setups).
- Ensure "Network Access" is enabled in LM Studio settings.
- Call tools via the interface or API, e.g., `spider_website({"url":"https://lampdatabase.com","max_depth":2})`.
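Depending on your LM Studio version, MCP servers may instead be registered through an `mcp.json` file. If so, an entry along these lines should work (adjust the path to wherever you saved `pymcp.py`):

```json
{
  "mcpServers": {
    "pymcp": {
      "command": "python",
      "args": ["/path/to/pymcp.py"]
    }
  }
}
```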
Tools
- `full-web-search`: Search with full page content (`query`, `limit=1-10`, `includeContent=true`).
- `get-web-search-summaries`: Search with only titles/URLs/snippets (`query`, `limit=1-10`).
- `get-single-web-page-content`: Fetch one URL’s content (`url`, `maxContentLength=5000`).
- `fetch_url_raw`: Fetch raw HTML (`url`).
- `spider_website`: Crawl a site up to depth 2 (`url`, `max_depth=2`, max 10 pages).
- `read_local_file`: Read a local file (`path`).
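For example, based on the parameter names above, a `read_local_file` call looks like this (the path is just an example):

```bash
echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"read_local_file","arguments":{"path":"/etc/hostname"}}}' | python pymcp.py
```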
Expected Output for spider_website
For `spider_website({"url":"https://lampdatabase.com","max_depth":2})`:

```json
{
  "pages": {
    "https://lampdatabase.com/": "<html content truncated... (homepage HTML)>",
    "https://lampdatabase.com/contact.php": "<html content truncated... (contact form HTML)>"
  }
}
```
- Completes in ~10-15s (2 pages, Selenium fetch).
- Debug logs in `debug_search.html` and `stderr`.
Troubleshooting
- "Selenium failed": Check the Chrome installation (`google-chrome --version`) and `pip install selenium webdriver-manager`. Ensure the ChromeDriver version matches Chrome (the standalone check below can help).
- Timeout (-32001): Increase LM Studio’s MCP timeout or reduce `max_depth` to 1. Check `stderr` for "Crawled X/Y pages".
- Empty Results: Inspect `debug_search.html` for HTML content. If it is minimal (e.g., just ""), anti-bot measures are active; Selenium should resolve this.
- LM Studio Sandbox: Ensure "Allow subprocesses" and "Network Access" are enabled in settings.
- Logs: Check `stderr` for "Debug: Crawling...", "Selenium fetched...", or errors.
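To separate Selenium problems from MCP/LM Studio problems, a standalone check like the following can help; it only verifies that Chrome is on the PATH and that `webdriver-manager` can provision a matching ChromeDriver:

```python
# Quick diagnostic: is Chrome installed, and can webdriver-manager fetch a matching driver?
import shutil
import subprocess

from webdriver_manager.chrome import ChromeDriverManager

chrome = shutil.which("google-chrome") or shutil.which("google-chrome-stable")
if chrome is None:
    print("Chrome not found on PATH; install google-chrome-stable first")
else:
    print(subprocess.run([chrome, "--version"], capture_output=True, text=True).stdout.strip())
    print("ChromeDriver path:", ChromeDriverManager().install())
```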
Notes
- Selenium uses headless Chrome for full page rendering, bypassing anti-bot measures (e.g., Bing’s block pages).
- `spider_website` is capped at 10 pages and 30s to prevent timeouts (a sketch of the crawl limits follows this list).
- Debug files (`debug_search.html`) are written to the script directory for inspection.
- If issues persist, share `stderr` logs or `debug_search.html` contents (first 500 chars).
- If `spider_website` still times out, adjust `max_spider_time` in the script or consider switching to Firefox/GeckoDriver.
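Conceptually, the caps amount to a bounded breadth-first crawl, roughly like the sketch below. It is illustrative only: `fetch_page`, `extract_links`, and the constants are placeholders, not the script's actual identifiers.

```python
# Illustrative sketch of a bounded crawl: stop at max_depth, MAX_PAGES, or MAX_SECONDS,
# whichever comes first. Names are placeholders, not pymcp.py's actual code.
import time
from collections import deque
from urllib.parse import urljoin, urlparse

MAX_PAGES = 10
MAX_SECONDS = 30

def spider(start_url, max_depth=2, fetch_page=None, extract_links=None):
    """fetch_page(url) -> html and extract_links(html) -> hrefs are supplied by the caller."""
    pages, seen = {}, {start_url}
    queue = deque([(start_url, 0)])
    deadline = time.monotonic() + MAX_SECONDS
    while queue and len(pages) < MAX_PAGES and time.monotonic() < deadline:
        url, depth = queue.popleft()
        html = fetch_page(url)
        pages[url] = html
        if depth < max_depth:
            for href in extract_links(html):
                link = urljoin(url, href)
                # Stay on the same site and avoid revisiting pages.
                if urlparse(link).netloc == urlparse(start_url).netloc and link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return {"pages": pages}
```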