pymcp

A lightweight, Python-based MCP server for LM Studio, offering web search, content extraction, spidering, and file reading.

Python MCP Web Search and Spider Tool

This Python script (pymcp.py) is an MCP (Model Context Protocol) server for web searching, website spidering, and local file reading. It exposes the tools full-web-search, get-web-search-summaries, get-single-web-page-content, fetch_url_raw, spider_website, and read_local_file. It uses Selenium with webdriver-manager for robust web scraping on Ubuntu Linux, helping to bypass anti-bot measures, with a fallback to urllib for basic requests.
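
In rough outline, the fetch path is a headless-Chrome request provisioned by webdriver-manager, with a plain urllib fallback when the browser path fails. The sketch below is illustrative only; the function name and Chrome options are assumptions, not the script's actual code:

from urllib.request import Request, urlopen

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

def fetch_page(url, timeout=15):
    """Fetch a page with headless Chrome; fall back to urllib if Selenium fails."""
    try:
        options = Options()
        options.add_argument("--headless=new")        # no display needed
        options.add_argument("--no-sandbox")
        options.add_argument("--disable-dev-shm-usage")
        driver = webdriver.Chrome(
            service=Service(ChromeDriverManager().install()),  # auto-managed ChromeDriver
            options=options,
        )
        try:
            driver.set_page_load_timeout(timeout)
            driver.get(url)
            return driver.page_source                 # fully rendered HTML
        finally:
            driver.quit()
    except Exception:
        # Fallback: plain HTTP GET, no JavaScript rendering
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")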

Features

  • Web Search: Query Bing, Google, or DuckDuckGo for search results with titles, URLs, and snippets (full-web-search, get-web-search-summaries).
  • Website Spidering: Crawl a website up to a specified depth, collecting page content (spider_website).
  • Single Page Fetch: Retrieve content from a specific URL (get-single-web-page-content, fetch_url_raw).
  • Local File Access: Read local files (read_local_file).
  • Debugging: Extensive logging to stderr and debug files (debug_search.html) for troubleshooting.
  • Optimized for Speed: Reduced timeouts and page limits to avoid MCP timeouts (e.g., spider_website capped at 10 pages, 30s).

Requirements

  • Python: 3.6+ (tested on 3.10+).
  • Ubuntu Linux Setup (as of October 11, 2025):
    1. Install Google Chrome:
      wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
      sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
      sudo apt update
      sudo apt install google-chrome-stable
      
      Verify: google-chrome --version (e.g., "Google Chrome 120.0.6099.71").
    2. Install Python dependencies:
      pip install selenium webdriver-manager
      
    3. No manual ChromeDriver download is needed; webdriver-manager handles it automatically (a one-line check is shown after this list).
  • Optional manual ChromeDriver: if webdriver-manager fails, download the ChromeDriver release matching your installed Chrome version and place the binary on your PATH.
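
A quick way to confirm that webdriver-manager can provision a matching driver (assuming the packages above installed cleanly) is to run the following in a Python shell:

from webdriver_manager.chrome import ChromeDriverManager
print(ChromeDriverManager().install())  # prints the path to the downloaded ChromeDriver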

Installation

  1. Save the script as pymcp.py.
  2. Ensure Chrome and dependencies are installed (see above).
  3. Verify network access (for web requests) and file write permissions (for debug logs).

Usage

Terminal (Standalone)

Run the script as an MCP server, piping JSON-RPC commands:

echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"spider_website","arguments":{"url":"https://lampdatabase.com","max_depth":2}}}' | python pymcp.py

Example for search:

echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"full-web-search","arguments":{"query":"rolf schatzmann","limit":5}}}' | python pymcp.py

Output: JSON-RPC response with results (e.g., pages for spider_website or search results with URLs/snippets).
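
The same exchange can be driven from Python instead of a shell pipe. This is a minimal sketch assuming the one-request-per-invocation, line-delimited JSON-RPC framing that the echo examples above rely on:

import json
import subprocess

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get-web-search-summaries",
        "arguments": {"query": "rolf schatzmann", "limit": 5},
    },
}

# Pipe one JSON-RPC request into the server and capture its output, as the echo examples do.
proc = subprocess.run(
    ["python", "pymcp.py"],
    input=json.dumps(request) + "\n",
    capture_output=True,
    text=True,
    timeout=120,
)
print(proc.stdout)        # JSON-RPC response with the tool result
print(proc.stderr[:500])  # debug logging goes to stderr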

LM Studio

  1. Load pymcp.py as an MCP server in LM Studio (Tools > Custom Scripts or equivalent; a sample configuration sketch follows this list).
  2. Ensure "Network Access" is enabled in LM Studio settings.
  3. Call tools via the interface or API, e.g., spider_website({"url":"https://lampdatabase.com","max_depth":2}).
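
Some LM Studio builds register MCP servers through an mcp.json file rather than a script-loading dialog. If that is what your version uses, an entry along these lines is the usual shape for a stdio server (the path and Python command are placeholders):

{
  "mcpServers": {
    "pymcp": {
      "command": "python",
      "args": ["/path/to/pymcp.py"]
    }
  }
}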

Tools

  • full-web-search: Search with full page content (query, limit=1-10, includeContent=true).
  • get-web-search-summaries: Search with only titles/URLs/snippets (query, limit=1-10).
  • get-single-web-page-content: Fetch one URL’s content (url, maxContentLength=5000).
  • fetch_url_raw: Fetch raw HTML (url).
  • spider_website: Crawl a site up to depth 2 (url, max_depth=2, max 10 pages).
  • read_local_file: Read a local file (path).

Expected Output for spider_website

For spider_website({"url":"https://lampdatabase.com","max_depth":2}):

{
  "pages": {
    "https://lampdatabase.com/": "<html content truncated... (homepage HTML)>",
    "https://lampdatabase.com/contact.php": "<html content truncated... (contact form HTML)>"
  }
}
  • Completes in ~10-15s (2 pages, Selenium fetch).
  • Debug logs in debug_search.html and stderr.

Troubleshooting

  • "Selenium failed": Check Chrome installation (google-chrome --version) and pip install selenium webdriver-manager. Ensure ChromeDriver matches Chrome version.
  • Timeout (-32001): Increase LM Studio’s MCP timeout or reduce max_depth=1. Check stderr for "Crawled X/Y pages".
  • Empty Results: Inspect debug_search.html for HTML content. If minimal (e.g., just ""), anti-bot measures are active; Selenium should resolve this.
  • LM Studio Sandbox: Ensure "Allow subprocesses" and "Network Access" are enabled in settings.
  • Logs: Check stderr for "Debug: Crawling...", "Selenium fetched...", or errors.
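
If Selenium itself seems to be the problem, a short standalone smoke test (independent of pymcp.py) usually isolates it:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument("--headless=new")
options.add_argument("--no-sandbox")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get("https://example.com")
print(driver.title)   # expect "Example Domain" if Chrome and ChromeDriver are working
driver.quit()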

Notes

  • Selenium uses headless Chrome for full page rendering, bypassing anti-bot measures (e.g., Bing’s block pages).
  • spider_website is capped at 10 pages and 30s to prevent timeouts (a simplified sketch of this cap follows this list).
  • Debug files (debug_search.html) are written to the script directory for inspection.
  • If issues persist, share stderr logs or debug_search.html contents (first 500 chars).
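
The caps above amount to a bounded breadth-first crawl. This is a simplified sketch of the idea, not the script's exact code: fetch_page stands in for the Selenium fetch shown earlier, and the link extraction is deliberately naive.

import re
import time
from urllib.parse import urljoin, urlparse

MAX_PAGES = 10        # hard cap on pages per crawl
MAX_SPIDER_TIME = 30  # wall-clock budget in seconds for the whole crawl

def spider_website(start_url, fetch_page, max_depth=2):
    """Breadth-first crawl limited by depth, page count, and elapsed time."""
    start = time.time()
    base_host = urlparse(start_url).netloc
    queue = [(start_url, 0)]
    pages, seen = {}, {start_url}
    while queue and len(pages) < MAX_PAGES and time.time() - start < MAX_SPIDER_TIME:
        url, depth = queue.pop(0)
        html = fetch_page(url)                      # Selenium fetch in the real script
        pages[url] = html
        if depth < max_depth:
            for href in re.findall(r'href=["\'](.*?)["\']', html):
                link = urljoin(url, href)
                # Stay on the same host and skip already-queued pages
                if urlparse(link).netloc == base_host and link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return {"pages": pages}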
