AI Web Scraping MCP Server
An intelligent web scraping tool built with the Model Context Protocol (MCP) that searches documentation, fetches web content, and provides AI-powered responses using Groq's LLM API.
🚀 Features
- Smart Documentation Search: Search through popular library documentation (LangChain, OpenAI, Llama-Index, UV)
- AI-Powered Web Scraping: Automatically clean and extract meaningful content from web pages
- Rate-Limited API Integration: Built-in handling for Groq API rate limits
- MCP Protocol: Seamless integration with MCP-compatible clients
- Async Processing: High-performance asynchronous operations
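The content-cleaning step mentioned above can be sketched with Python's standard-library `html.parser`. Note this is an illustrative sketch only, not the repo's actual `utils.py` implementation; the class and function names here are assumptions:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping non-content tags."""

    SKIP_TAGS = {"script", "style", "nav", "footer", "header"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())


def clean_html(html: str) -> str:
    """Return the visible text of an HTML page as a single string."""
    parser = _TextExtractor()
    parser.feed(html)
    return " ".join(parser._parts)
```

A real scraper would likely use a heavier library (e.g. BeautifulSoup), but the principle is the same: strip scripts, styles, and navigation chrome before the text is sent to the LLM.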
📋 Prerequisites
- Python 3.10 or higher
- UV package manager
- Valid API keys for:
  - Serper API (for web search)
  - Groq API (for LLM processing)
🔧 Installation
1. Clone the repository:

   ```bash
   git clone https://github.com/iqbal-waqar/MCP-SERVER
   cd mcp-server-python
   ```

2. Install dependencies using UV:

   ```bash
   uv sync
   ```

3. Set up environment variables by creating a `.env` file in the root directory:

   ```
   SERPER_API_KEY=your_serper_api_key_here
   GROQ_API_KEY=your_groq_api_key_here
   ```
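Missing or empty keys are easier to debug when the server fails fast at startup instead of deep inside an API call. A minimal sketch of such a check, using only the standard library (`require_env` is a hypothetical helper, not part of this repo):

```python
import os


def require_env(*names: str) -> dict:
    """Return the requested environment variables, raising if any are unset."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(
            f"Missing environment variables: {', '.join(missing)}. "
            "Add them to your .env file."
        )
    return {n: os.environ[n] for n in names}
```

Calling `require_env("SERPER_API_KEY", "GROQ_API_KEY")` once at startup turns a cryptic mid-request failure into a clear error message.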
🔑 API Keys Setup
Serper API Key
- Visit Serper.dev
- Sign up for a free account
- Get your API key from the dashboard
- Add it to your `.env` file
Groq API Key
- Visit Groq Console
- Create an account and verify your email
- Navigate to API Keys section
- Generate a new API key
- Add it to your `.env` file
⚠️ Important: Use an API key with sufficient rate limits. The free tier allows only 6,000 tokens per minute (TPM), which can trigger rate-limit errors during heavy usage.
🏗️ Project Structure
```
mcp-server-python/
├── mcp_server.py     # Main MCP server implementation
├── client.py         # Example client for testing
├── utils.py          # Utility functions for HTML cleaning and LLM calls
├── .env              # Environment variables (create this)
├── pyproject.toml    # Project dependencies
└── README.md         # This file
```
🚀 Usage
Running the MCP Server
The server runs using the stdio transport protocol:
```bash
uv run mcp_server.py
```
Using the Client
Test the server with the included client:
```bash
uv run client.py
```
Available Tools
`get_docs`
Search documentation for specific libraries and queries.
Parameters:
- `query` (string): The search query (e.g., "How to publish a package with UV")
- `library` (string): The library to search in (`langchain`, `openai`, `llama-index`, `uv`)
Example:
```python
result = await session.call_tool("get_docs", {
    "query": "How to publish a package with uv on gitlab",
    "library": "uv"
})
```
🔧 Configuration
Supported Libraries
- LangChain: python.langchain.com/docs
- OpenAI: platform.openai.com/docs
- Llama-Index: docs.llamaindex.ai/en/stable
- UV: docs.astral.sh/uv
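A documentation search can be restricted to one of these sites by prefixing the Serper query with a Google-style `site:` operator. The sketch below shows the idea; the dictionary mirrors the supported libraries above, but `build_search_query` is an assumed helper name, not necessarily the repo's actual code:

```python
# Map of supported libraries to their documentation domains
docs_urls = {
    "langchain": "python.langchain.com/docs",
    "openai": "platform.openai.com/docs",
    "llama-index": "docs.llamaindex.ai/en/stable",
    "uv": "docs.astral.sh/uv",
}


def build_search_query(query: str, library: str) -> str:
    """Scope a web search to a single library's documentation site."""
    if library not in docs_urls:
        raise ValueError(f"Unsupported library: {library}")
    return f"site:{docs_urls[library]} {query}"
```

For example, `build_search_query("publish a package", "uv")` yields `site:docs.astral.sh/uv publish a package`, which a search API can consume directly.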
LLM Model
Currently configured to use `llama-3.1-8b-instant` from Groq. You can modify this in:
- `mcp_server.py` (line 52)
- `client.py` (line 41)
⚠️ Common Issues & Solutions
Rate Limiting
If you encounter rate limit errors:
- Wait: Rate limits reset after a short period (usually 60 seconds)
- Upgrade: Consider upgrading your Groq plan for higher limits
- Optimize: Reduce the chunk size in the `fetch_url` function
- Retry: The system will automatically retry after rate limit resets
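The wait-and-retry strategy above is commonly implemented as exponential backoff. A minimal sketch, assuming a stand-in `RateLimitError` (the real Groq client raises its own exception type):

```python
import time


class RateLimitError(Exception):
    """Stand-in for the rate-limit exception raised by the LLM client."""


def call_with_backoff(fn, retries=3, base_delay=1.0):
    """Call fn, sleeping base_delay * 2**attempt after each rate-limited failure."""
    for attempt in range(retries):
        try:
            return fn()
        except RateLimitError:
            time.sleep(base_delay * 2 ** attempt)
    # Final attempt: if it still fails, let the error propagate to the caller
    return fn()
```

With `base_delay=1.0` and three retries, the waits are 1 s, 2 s, and 4 s, which is usually enough for a per-minute token window to partially refill.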
API Key Issues
- Ensure your `.env` file is in the root directory
- Verify API keys are valid and active
- Check that you have sufficient credits/quota
Import Errors
Make sure you're running commands from within the virtual environment:
```bash
source .venv/bin/activate   # On Linux/Mac
# or
.venv\Scripts\activate      # On Windows
```
🛠️ Development
Adding New Documentation Sources
To add support for new documentation sites, update the `docs_urls` dictionary in `mcp_server.py`:
```python
docs_urls = {
    "your-library": "docs.yourlibrary.com",
    # ... existing entries
}
```
Popular Libraries You Can Add:
- FastAPI: fastapi.tiangolo.com
- Django: docs.djangoproject.com
- Flask: flask.palletsprojects.com
- Pandas: pandas.pydata.org/docs
- NumPy: numpy.org/doc
- Scikit-learn: scikit-learn.org/stable
- TensorFlow: tensorflow.org/api_docs
- PyTorch: pytorch.org/docs
- Requests: docs.python-requests.org
- Pydantic: docs.pydantic.dev
- SQLAlchemy: docs.sqlalchemy.org
- Celery: docs.celeryq.dev
- Streamlit: docs.streamlit.io
- Gradio: gradio.app/docs
Simply add any library's official documentation URL to expand the search capabilities!
Customizing LLM Behavior
Modify the system prompts in:
- the `fetch_url()` function for web scraping behavior
- `client.py` for response formatting
📊 Performance Notes
- Token Usage: Each web page can consume 1000-5000 tokens depending on content size
- Rate Limits: Free Groq tier allows 6000 tokens per minute
- Processing Time: Typical response time is 5-15 seconds per query
- Concurrent Requests: Limited by API rate limits
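Since each page can consume thousands of tokens against a 6,000 TPM budget, large pages are best split before being sent to the LLM. A minimal fixed-size chunking sketch (the 4,000-character default here is an assumption, not the value used in `fetch_url`):

```python
def chunk_text(text: str, max_chars: int = 4000) -> list:
    """Split text into fixed-size chunks so each LLM call stays within the token budget."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Smaller chunks mean more API calls but a lower chance of any single call exceeding the per-minute limit; tune `max_chars` to your plan's TPM.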
🤝 Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🆘 Support
If you encounter issues:
- Check the Common Issues section
- Verify your API keys and quotas
- Ensure you're using the latest dependencies
- Open an issue with detailed error messages
Note: This tool is designed for educational and development purposes. Please respect rate limits and terms of service for all APIs used.