mcp-server-python

AIwithhassan/mcp-server-python


MCP Server: Documentation Retrieval & Web Scraping (uv + FastMCP)

This project provides a minimal, async MCP (Model Context Protocol) server that exposes a tool for retrieving and cleaning official documentation content for popular AI / Python ecosystem libraries. It uses:

  • fastmcp to define and run the MCP server over stdio.
  • httpx for async HTTP calls.
  • serper.dev for Google-like search (via API).
  • groq API (LLM) to clean raw HTML into readable text chunks.
  • python-dotenv for environment variable management.
  • uv as the package manager & runner (fast, lockfile-based, Python 3.11+).

Features

  • Search restricted to official docs domains (uv, langchain, openai, llama-index).
  • Tool: get_docs(query, library) returns concatenated cleaned sections with SOURCE: labels.
  • Streaming-safe async design (chunking large HTML pages before LLM cleaning).
  • Separate client.py demonstrating how to connect as an MCP client and call the tool, then post-process with an LLM.

Quick Start

Prerequisites:

  • uv installed (manages the Python 3.11+ environment and lockfile).
  • A serper.dev API key and a Groq API key (see step 2).

1. Clone & Install

git clone <your-repo-url> mcp-server-python
cd mcp-server-python
uv sync

This will create/refresh a .venv based on pyproject.toml + uv.lock.

2. Environment Variables

Create a .env file in the project root:

SERPER_API_KEY=your_serper_api_key_here
GROQ_API_KEY=your_groq_api_key_here

Optional: add other model settings if you later extend functionality.
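Because a missing key only surfaces later as a confusing 401 or LLM error, the server could validate both variables at startup. A minimal sketch (require_env is a hypothetical helper, not part of the project):

```python
import os

REQUIRED_VARS = ("SERPER_API_KEY", "GROQ_API_KEY")

def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Validating everything up front makes a missing key fail fast, not mid-request:
# keys = {name: require_env(name) for name in REQUIRED_VARS}
```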

3. Run the MCP Server Directly

uv run mcp_server.py

The server will start and wait on stdio (no extra output unless you add logging). It registers the tool get_docs.

4. Use the Provided Client

uv run client.py

You should see something like:

Available tools: ['get_docs']
ANSWER: <model-produced answer referencing SOURCE lines>

If the list is empty, ensure the server started correctly and no exceptions were raised (add logging—see below).


Tool: get_docs

Signature:

get_docs(query: str, library: str) -> str

Supported libraries (keys): uv, langchain, openai, llama-index.
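The library keys map to official docs domains roughly as below; the exact mapping lives in mcp_server.py, and the domain values here are illustrative assumptions:

```python
# Illustrative mapping; check mcp_server.py for the actual domains.
docs_urls = {
    "uv": "docs.astral.sh/uv",
    "langchain": "python.langchain.com/docs",
    "openai": "platform.openai.com/docs",
    "llama-index": "docs.llamaindex.ai",
}

def build_query(query: str, library: str) -> str:
    """Build the site-restricted search string sent to the Serper API."""
    if library not in docs_urls:
        raise ValueError(f"Library not supported: {library}")
    return f"site:{docs_urls[library]} {query}"
```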

Flow:

  1. Build a site-restricted query: site:<docs-domain> <query>.
  2. Call Serper API for organic results.
  3. Fetch each result URL (async) via httpx.
  4. Split HTML into ~4000‑char chunks (memory safety & LLM limits).
  5. Clean each chunk using Groq LLM (openai/gpt-oss-20b) with a system prompt.
  6. Concatenate and label each block with SOURCE: <url> for traceability.

Returned value: A large text blob suitable for retrieval-augmented prompting, preserving source attribution lines.
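Step 4 of the flow can be sketched as a simple fixed-size splitter (chunk_text is a hypothetical helper; the real implementation may split on different boundaries):

```python
def chunk_text(html: str, size: int = 4000) -> list[str]:
    """Split raw HTML into ~size-character chunks for LLM cleaning."""
    return [html[i:i + size] for i in range(0, len(html), size)]
```

Fixed-size chunks keep memory bounded and stay under the LLM's context limit, at the cost of occasionally splitting a sentence across two chunks.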


Architecture

File overview:

  • mcp_server.py – Defines the FastMCP instance and implements search_web, fetch_url, and the get_docs tool.
  • client.py – Launches the server via stdio, lists tools, calls get_docs, then feeds the result to an LLM for a user-friendly answer.
  • utils.py – HTML cleaning helper (currently uses trafilatura for extraction and the Groq LLM for chunk transformation).
  • .env – Environment variables (excluded from VCS).
  • pyproject.toml – Declares dependencies and metadata.
  • uv.lock – Reproducible lockfile generated by uv.

Dependency Notes

Core runtime deps (from pyproject.toml):

  • fastmcp – MCP server helper.
  • httpx – async HTTP client.
  • groq – Groq API client.
  • python-dotenv – load variables from .env.
  • trafilatura – heuristic content extraction (currently partially used / can be extended).

Tip: If you add more scraping tools, reuse a single httpx.AsyncClient for performance.


Logging & Debugging

To see what the server is doing, you can temporarily add:

import logging, sys
logging.basicConfig(level=logging.INFO, stream=sys.stderr)

Place this near the top of mcp_server.py, after the imports. Since the MCP protocol uses stdout for JSON-RPC, logs must go to stderr only.

Common issues:

  • Empty tool list: The server exited early or crashed—add logging.
  • SERPER_API_KEY missing → 401 or empty search results.
  • GROQ_API_KEY missing → LLM cleaning fails (exception in get_response_from_llm).
  • Network timeouts: Adjust timeout in httpx.AsyncClient calls.

Extending

Ideas:

  • Add caching layer (e.g., sqlite or in-memory dict) to avoid re-fetching same URLs.
  • Parallelize URL fetch + clean with asyncio.gather() (mind rate limits / LLM cost).
  • Add another tool (e.g., summarize_diff, list_endpoints).
  • Provide structured JSON output (list of sources + cleaned text) instead of concatenated string.
  • Add tests using pytest + pytest-asyncio (mock Serper + LLM APIs).

Example Programmatic Use (Without Client Wrapper)

If you want to call the tool directly in a Python script using the client-side MCP library:

from mcp.client.stdio import stdio_client
from mcp import ClientSession, StdioServerParameters
import asyncio

async def demo():
	params = StdioServerParameters(command="uv", args=["run", "mcp_server.py"])
	async with stdio_client(params) as (r, w):
		async with ClientSession(r, w) as session:
			await session.initialize()
			tools = await session.list_tools()
			print([t.name for t in tools.tools])
			docs = await session.call_tool("get_docs", {"query": "install", "library": "uv"})
			# call_tool returns a CallToolResult; the text lives on its content items
			print(docs.content[0].text[:500])

asyncio.run(demo())

Running With Active Virtualenv

If you have an already activated virtual environment and want to use that instead of the project’s pinned environment, you can force uv to target it:

uv run --active client.py

Otherwise, uv will warn that your active $VIRTUAL_ENV differs from the project .venv but continue using the project environment.


License

Add a license section here (e.g., MIT) if you intend to distribute.


Troubleshooting Cheat Sheet

Symptom                  | Cause                           | Fix
No tools listed          | Server not running / crashed    | Add stderr logging; run uv run mcp_server.py manually
AttributeError on .text  | Cleaner returned None           | Ensure fetch_url / the LLM call returns an actual string
401 from Serper          | Bad/missing API key             | Check .env and reload shell
Empty search results     | Query too narrow                | Simplify the query or verify the domain key
High latency             | Many sequential LLM chunk calls | Batch calls or reduce chunk size

Contributing

  1. Fork & branch.
  2. Run uv sync.
  3. Add tests for new tools (if added).
  4. Open PR with clear description.

Roadmap (Optional)

  • [ ] Add JSON schema metadata for tool params.
  • [ ] Structured response format (list of {source, text}).
  • [ ] Add caching layer.
  • [ ] Add rate limiting/backoff.
  • [ ] Add CI workflow (lint + tests).

Acknowledgments

  • Serper.dev for search API
  • Groq for fast OSS model serving
  • Astral for uv
  • MCP ecosystem for protocol foundation