
🧠 MCP Server (Model Compute Paradigm)

A modular, production-ready FastAPI server built to route and orchestrate multiple AI/LLM-powered models behind a unified, scalable interface. It supports streaming chat, LLM-based task routing, and multi-model pipelines (like analyze → summarize → recommend), all fully async and Dockerized.


🎯 Project Status (Production Readiness)

| Capability | Status | Details |
| --- | --- | --- |
| 🧠 Multi-Model Orchestration | ✅ Complete | Dynamic routing between chat, summarize, sentiment, recommend |
| 🤖 LLM-Based Task Router | ✅ Complete | GPT-powered routing via the `auto` task type |
| 🔁 Async FastAPI + Concurrency | ✅ Complete | Async/await + concurrent task execution with simulated/model API delays |
| 🔊 GPT Streaming Support | ✅ Complete | `text/event-stream` chunked responses for chat endpoints |
| 🧪 Unit + Mocked API Tests | ✅ Complete | Pytest-based test suite with mocked `run()` responses |
| 🐳 Dockerized + Clean Layout | ✅ Complete | Python 3.13 base image, no Conda dependency, production-ready Dockerfile |
| 📦 Metadata-Driven Registry | ✅ Complete | Model metadata loaded from external YAML config |
| 🔐 Rate Limiting & Retry | ⏳ In Progress | Handles the 429 retry loop (see the backoff sketch below); rate-limiting controls are WIP |
| 🧪 CI + Docs | ⏳ Next | GitHub Actions + Swagger/ReDoc planned |
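The 429 handling above amounts to retry-with-backoff around each model API call. A minimal sketch of that pattern (the helper name, backoff constants, and the use of httpx are assumptions, not the repo's exact code):

```python
import asyncio

import httpx


async def post_with_retry(url: str, payload: dict, max_retries: int = 5) -> httpx.Response:
    """POST to a model API, backing off exponentially on HTTP 429 (sketch)."""
    delay = 1.0
    async with httpx.AsyncClient() as client:
        for _ in range(max_retries):
            response = await client.post(url, json=payload)
            if response.status_code != 429:
                response.raise_for_status()
                return response
            # Honor Retry-After when given as seconds; otherwise back off exponentially.
            retry_after = response.headers.get("Retry-After")
            await asyncio.sleep(float(retry_after) if retry_after else delay)
            delay *= 2
    raise RuntimeError(f"Still rate-limited after {max_retries} attempts")
```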

🧩 Why This Project? (Motivation)

Modern ML/LLM deployments often involve:

  • Multiple task types and model backends (OpenAI, HF, local, REST)
  • Routing decisions based on input intent
  • Combining outputs of multiple models (e.g., summarize + recommend)
  • Handling 429 retries, async concurrency, streaming responses

🔧 However, building an LLM backend API server that is:

  • Async + concurrent
  • Streamable
  • Pluggable (via metadata)
  • Testable
  • Dockerized

…is non-trivial, and reference implementations covering all of it in one place are hard to find.

💡 What We’ve Built (Solution)

This repo is a production-ready proof of concept (PoC) of an MCP (Model Compute Paradigm) architecture:

  • ✅ FastAPI-based microserver handling multiple tasks via a single /task endpoint
  • ✅ Task router (sketched after this list) that can:
    • 🔁 Dispatch to specific model types (chat, sentiment, summarize, recommend)
    • 🤖 Use an LLM to infer which task to run (auto)
    • 🧠 Run multiple models in sequence (analyze)
  • ✅ GPT streaming via text/event-stream
  • ✅ Async/await enabled architecture for concurrency
  • ✅ Clean modular code for easy extension
  • ✅ Dockerized for deployment
  • ✅ Tested using Pytest with mocking
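To make the routing concrete, here is a condensed dispatch sketch. The module layout and function names are illustrative assumptions; the real implementation lives in agent/task_router.py:

```python
# Condensed task-router sketch; "models" and its submodules are assumptions,
# not the repo's exact code (see agent/task_router.py for the real thing).
from models import chat, recommend, sentiment, summarize  # hypothetical modules

HANDLERS = {
    "chat": chat.run,
    "sentiment": sentiment.run,
    "summarize": summarize.run,
    "recommend": recommend.run,
}


async def infer_task_type(text: str) -> str:
    """Ask the routing LLM which task fits the input (stubbed here)."""
    ...  # call GPT with a classification prompt, return e.g. "summarize"


async def route_task(task_type: str, text: str) -> dict:
    if task_type == "auto":
        task_type = await infer_task_type(text)  # LLM-powered intent parsing
    if task_type == "analyze":
        # Multi-model pipeline: analyze sentiment, summarize, then recommend.
        mood = await sentiment.run(text)
        summary = await summarize.run(text)
        suggestion = await recommend.run(summary)
        return {"sentiment": mood, "summary": summary, "recommendation": suggestion}
    handler = HANDLERS.get(task_type)
    if handler is None:
        raise ValueError(f"Unknown task type: {task_type!r}")
    return {"result": await handler(text)}
```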

🛠️ Use Cases

| Use Case | MCP Server Support |
| --- | --- |
| Build your own ChatGPT-style API | ✅ `chat` task with streaming |
| Build an intelligent task router | ✅ `auto` task with GPT-powered intent parsing |
| Build AI pipelines (e.g., RAG/RL) | ✅ `analyze` task with sequential execution |
| Swap between OpenAI/Hugging Face APIs | ✅ Via the `model_registry.yaml` config |
| Add custom models (e.g., OCR, vision) | ✅ Just add a new module + registry entry |
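The registry is plain metadata, so swapping providers means editing YAML rather than code. A loader in the spirit of agent/model_registry.py might look like this (the field names in the example schema are assumptions):

```python
# Registry loader sketch; the YAML schema below is an assumption.
#
# Example entry in registry/models.yaml:
#   chat:
#     provider: openai
#     model_name: gpt-4
#     streaming: true
#   sentiment:
#     provider: huggingface
#     model_name: distilbert-base-uncased-finetuned-sst-2-english
import yaml  # PyYAML


def load_registry(path: str = "registry/models.yaml") -> dict:
    """Load model metadata (provider, model name, options) from YAML."""
    with open(path) as f:
        return yaml.safe_load(f)


registry = load_registry()
chat_backend = registry["chat"]["provider"]  # e.g. "openai"
```

Adding a custom model (OCR, vision, etc.) then means dropping a new module into models/ and adding one registry entry, with no router changes.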

🚀 Features

  • Async FastAPI server
  • 🧠 Task-based Model Routing (chat, sentiment, recommender, summarize)
  • 📄 Model Registry from YAML/JSON
  • 🔁 Automatic Retry and Rate-Limit Handling for APIs (see the backoff sketch above)
  • 🔄 Streaming Responses for Chat (see the sketch after this list)
  • 🧪 Unit Tests + Mocked API Calls
  • 🐳 Dockerized for production deployment
  • 📦 Modular structure, ready for CI/CD
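Chat streaming uses FastAPI's StreamingResponse with the text/event-stream media type. A minimal sketch, with the token source faked for illustration (the endpoint shape matches the /task example further below, but names and payload fields are assumptions):

```python
# Illustrative streaming endpoint; names and payload shapes are assumptions.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class TaskRequest(BaseModel):
    type: str
    input: str


async def fake_token_stream(prompt: str):
    # The real server yields chunks from the GPT API; faked here for illustration.
    for token in ["Restorative ", "yoga ", "calms ", "the ", "nervous ", "system."]:
        yield f"data: {token}\n\n"


@app.post("/task")
async def run_task(req: TaskRequest):
    if req.type == "chat":
        return StreamingResponse(
            fake_token_stream(req.input), media_type="text/event-stream"
        )
    # Non-chat tasks go through the task router (omitted in this sketch).
```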

🏗 Architecture Overview

          ┌────────────┐
          │  Frontend  │
          └─────┬──────┘
                │
                ▼
          ┌────────────┐       YAML/JSON
          │  FastAPI   │◄───── Model Registry
          │   Server   │
          └─────┬──────┘
       ┌────────┼────────┐
       ▼        ▼        ▼
   [chat]  [sentiment]  [recommender]
    GPT-4  HF pipeline  stub logic / API

---
🛠 Setup

📦 Install dependencies

git clone https://github.com/YOUR_USERNAME/mcp-server.git
cd mcp-server

# Optional: create a virtual environment
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows

# ...or use Conda instead
conda create -n <env_name> python=3.13
conda activate <env_name>

pip install -r requirements.txt

▶️ Run the server
uvicorn app:app --reload

Access the docs at: http://localhost:8000/docs
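FastAPI also serves an alternative ReDoc view at http://localhost:8000/redoc by default.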


🧪 Running Tests
pytest tests/

Unit tests mock external API calls using unittest.mock.AsyncMock.
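A test in that style might look like the following (the module paths, the mocked target, and the pytest-asyncio dependency are assumptions about the repo's layout):

```python
# Illustrative async endpoint test with a mocked model call.
from unittest.mock import AsyncMock, patch

import pytest
from httpx import ASGITransport, AsyncClient

from app import app  # FastAPI entry point


@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_sentiment_task_uses_mocked_model():
    # Replace the real model call so no external API is hit.
    with patch("models.sentiment.run", new=AsyncMock(return_value="positive")):
        transport = ASGITransport(app=app)
        async with AsyncClient(transport=transport, base_url="http://test") as client:
            resp = await client.post(
                "/task", json={"type": "sentiment", "input": "I love restorative yoga"}
            )
    assert resp.status_code == 200
```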

🐳 Docker Support
🔨 Build image
docker build -t mcp-server .

🚀 Run container
docker run -p 8000:8000 mcp-server

🧰 Example API Request
curl -X POST http://localhost:8000/task \
  -H "Content-Type: application/json" \
  -d '{
        "type": "chat",
        "input": "What are the benefits of restorative yoga?"
      }'
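The same call from Python, consuming the chat stream chunk by chunk (a sketch using httpx; the exact chunk framing depends on the server's event format):

```python
import httpx

# Stream the chat response from the running server.
with httpx.stream(
    "POST",
    "http://localhost:8000/task",
    json={"type": "chat", "input": "What are the benefits of restorative yoga?"},
    timeout=None,
) as resp:
    for chunk in resp.iter_text():
        print(chunk, end="", flush=True)
```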

🔍 Directory Structure
mcp/
├── app.py                  # FastAPI entry
├── models/                 # ML models (chat, sentiment, etc.)
├── agent/
│   ├── task_router.py      # Task router
│   └── model_registry.py   # Registry loader
├── registry/models.yaml    # YAML registry of model metadata
├── tests/                  # Unit tests
├── Dockerfile
├── requirements.txt
├── README.md
└── .env / .gitignore


🤝 Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you’d like to change.

📄 License
MIT

✨ Author
Built by Sriram Kumar Reddy Challa