ParthaPRay/Ollama_MCP_Gradio
Gradio + Ollama + MCP: Privacy-Aware Local LLM Agent Demo
Overview
This project demonstrates how to build a privacy-aware, locally hosted LLM agent that uses Ollama (for running LLMs on your hardware), the Model Context Protocol (MCP) for safe tool calling, and Gradio for a conversational web UI, all powered by a local SQLite database and exposed as both an agent and an MCP server.
Key Concepts
- Model Context Protocol (MCP):
  - An open protocol that lets LLMs "call" local or remote tools as APIs, standardizing how tools (DBs, functions, search, etc.) are plugged into LLM workflows.
  - MCP allows privacy-respecting, auditable tool use, as all function calls and data access can be monitored locally.
- Localized Ollama LLM:
  - Ollama lets you run state-of-the-art LLMs (like Granite, Llama 3, Qwen, etc.) entirely on your machine; no data leaves your computer.
  - This project uses the Granite 3.1 MoE model for local inference.
- Privacy-Aware Agent:
  - Your queries and data never leave your device.
  - All tools and database operations are locally executed.
  - The architecture is compatible with edge devices and self-hosted deployment.
Architecture
- server.py: launches a FastMCP server exposing database tools over HTTP.
- client.py: runs a Gradio chat UI and connects a local LLM (via Ollama) to the MCP tools for tool-augmented responses.
High-level Flow
- User interacts via the Gradio UI (client.py).
- The agent uses the Ollama LLM + MCP client to invoke tools (e.g., read/write SQLite).
- The MCP server (server.py) exposes the tool API (add_data/read_data) and executes SQL on your local DB.
- All logic and data remain private and local.
System Configuration
Operating System
- Ubuntu 24.04 LTS
- Kernel: 6.11.0-25-generic
- Architecture: x86_64 (64-bit)
Processor (CPU)
- Model: 13th Gen Intel® Core™ i9-13950HX
- Cores: 24 cores / 32 threads
- Max Frequency: 5.50 GHz
- Virtualization: VT-x supported
- L1 Cache: 896 KiB (Data), 1.3 MiB (Instruction)
- L2 Cache: 32 MiB
- L3 Cache: 36 MiB
Graphics (GPU)
- NVIDIA RTX 5000 Ada Generation
- VRAM: 16 GB
- Driver Version: 550.144.03
- CUDA Version: 12.4
Python Environment
- Python Version: 3.11.9
- Virtual Environment: final1 (created with python -m venv final1)
Misc
- Virtualization Capabilities: Enabled (VT-x)
- NUMA Nodes: 1 (all CPUs in node0: 0â31)
server.py: MCP Server for SQLite
Purpose:
Expose SQLite as a set of tools (add_data, read_data) via MCP so any MCP-compatible LLM agent can safely query/update the database.
Highlights:
- Uses FastMCP for quick MCP server setup.
- Initializes SQLite and creates two tables: people and interactions.
- Exposes two tools:
  - add_data(query): Insert any SQL row (for demo purposes; could be restricted for production).
  - read_data(query): Run SQL SELECT queries and return results.
- Designed for local usage; easy to swap DBs or add more tools.
Code Summary:
import sqlite3
from fastmcp import FastMCP
# Create and configure MCP server
mcp = FastMCP(name="SQLiteMCPServer", port=8000, transport="streamable-http", ...)
# Setup SQLite
...
# Tool: Insert SQL record
@mcp.tool(name="add_data", ...)
def add_data(query: str) -> bool:
...
# Tool: Query records
@mcp.tool(name="read_data", ...)
def read_data(query: str = "SELECT * FROM people") -> list:
...
# Start server
if __name__ == "__main__":
mcp.run(transport="streamable-http", host="127.0.0.1", port=8000)
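For orientation, here is a minimal, self-contained sketch of what the full server could look like. The demo.db filename, table schemas, and error handling are assumptions rather than the repo's exact code:

import sqlite3
from fastmcp import FastMCP

DB_PATH = "demo.db"  # assumed database file name

mcp = FastMCP(name="SQLiteMCPServer")

def init_db() -> None:
    # Create the two demo tables if they do not exist (assumed schemas)
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER, profession TEXT)")
        conn.execute("CREATE TABLE IF NOT EXISTS interactions (timestamp TEXT, role TEXT, content TEXT)")

@mcp.tool(name="add_data")
def add_data(query: str) -> bool:
    """Execute an INSERT (or other write) statement against the local SQLite DB."""
    try:
        with sqlite3.connect(DB_PATH) as conn:
            conn.execute(query)
        return True
    except sqlite3.Error:
        return False

@mcp.tool(name="read_data")
def read_data(query: str = "SELECT * FROM people") -> list:
    """Run a SELECT statement and return the matching rows."""
    with sqlite3.connect(DB_PATH) as conn:
        return conn.execute(query).fetchall()

if __name__ == "__main__":
    init_db()
    mcp.run(transport="streamable-http", host="127.0.0.1", port=8000)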
client.py: Gradio Chatbot with MCP-Aware Ollama Agent
Purpose: A Gradio chatbot interface powered by a local LLM (via Ollama) that can autonomously call MCP-exposed database tools (add/read) as part of its workflow.
Highlights:
- Ollama LLM: Runs granite3.1-moe locally; no data is sent to external servers.
- MCP Client: Connects to the MCP server at http://127.0.0.1:8000/mcp and loads available tools dynamically.
- FunctionAgent: An LLM agent (via llama_index) that can use both language reasoning and tool-calling to fulfill queries.
- Gradio UI: Simple chat interface + recent interactions display.
- Full local logging: Each user-agent chat and tool call is logged to SQLite for auditability and privacy.
Code Structure:
# Import LLM (Ollama), MCP client, Gradio, etc.
from llama_index.llms.ollama import Ollama
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from gradio.queueing import Queue
...
# Set up Ollama LLM
llm = Ollama(model="granite3.1-moe", ...)
# Connect to MCP server, get tool specs
mcp_client = BasicMCPClient("http://127.0.0.1:8000/mcp")
mcp_spec = McpToolSpec(client=mcp_client)
# Initialize FunctionAgent with loaded tools
agent = FunctionAgent(...)
# Gradio UI: Chatbot, input, buttons, history display
with gr.Blocks(...):
...
# Message handling: Sends chat to agent, which may call tools, logs all activity
def handle_message(...):
...
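A rough, simplified sketch of how this wiring could look. It assumes tool discovery via McpToolSpec.to_tool_list_async, uses gr.ChatInterface instead of the repo's full Blocks layout, and omits the interactions logging, so treat it as orientation rather than the repo's exact code:

import gradio as gr
from llama_index.llms.ollama import Ollama
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec
from llama_index.core.agent.workflow import FunctionAgent

# Local LLM served by Ollama; nothing leaves the machine
llm = Ollama(model="granite3.1-moe", request_timeout=120.0)

# MCP client pointed at the local server started by server.py
mcp_client = BasicMCPClient("http://127.0.0.1:8000/mcp")
mcp_spec = McpToolSpec(client=mcp_client)

async def handle_message(message: str, history: list) -> str:
    # Load the MCP tools (add_data, read_data) and let the agent decide
    # whether to answer directly or call one of them
    tools = await mcp_spec.to_tool_list_async()
    agent = FunctionAgent(
        tools=tools,
        llm=llm,
        system_prompt="You can read and write the local SQLite database via the provided tools.",
    )
    response = await agent.run(message)
    return str(response)

demo = gr.ChatInterface(fn=handle_message)

if __name__ == "__main__":
    demo.launch()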
How to Run Locally
The entire application is designed to run inside a dedicated Python virtual environment (final1).
Assumptions:
- Ollama is already installed (using curl -fsSL https://ollama.com/install.sh | sh).
- The model granite3.1-moe is already pulled (ollama pull granite3.1-moe).
Step-by-Step Instructions
- Create and activate a virtual environment:
  python3 -m venv final1
  source final1/bin/activate
- Install Python requirements:
  pip install -r requirements.txt
- Start the MCP server (in terminal 1):
  python server.py
- Start Ollama (in a separate terminal, if not already running):
  ollama serve
- Launch the Gradio chat UI (in terminal 2):
  python client.py
- Open the Gradio web interface: navigate to the link shown in your terminal (typically http://127.0.0.1:7860).
- Inspect the database: open demo.db in sqlitebrowser to verify the stored records.
Tip:
Keep your final1 virtual environment activated whenever running these scripts to avoid conflicts with system Python packages.
If you need to (re)pull the Ollama model:
ollama pull granite3.1-moe
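To confirm that Ollama is serving and that the model is available, you can list the local models via Ollama's REST API (assuming the default port 11434):
curl http://localhost:11434/api/tags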
Privacy & Security Notes
- Everything runs locally (code, model, and data): no cloud inference and no remote databases unless you explicitly configure them.
- All tool calls are routed via the MCP server, making tool invocation explicit and monitorable.
- No user data is sent externally unless you specifically write a tool that does so.
MCP + Ollama + Gradio: What's Unique?
- Local LLM Reasoning: The agent is truly private; your prompts, data, and results are never seen by any third party.
- Composable Tool Use: You can add more tools (APIs, custom Python functions, etc.) as MCP endpoints and the agent will auto-discover them (see the sketch after this list).
- Reproducible for hackathons, research, and teaching: Easily demo local LLM agent autonomy and privacy with minimal setup.
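As an illustration of that composability, a new tool could be registered on the same FastMCP instance in server.py and would be picked up automatically the next time the client loads the MCP tool list. The tool name and query below are hypothetical, not part of the repo:

# Hypothetical extra tool for server.py (name and SQL are illustrative only)
@mcp.tool(name="count_people")
def count_people() -> int:
    """Return the number of rows in the people table."""
    with sqlite3.connect("demo.db") as conn:
        return conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]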
Example Use Cases
- Query people in the database: "Who are all doctors over 30?" (see the example SQL after this list)
- Add a new person: "Add a person named Akash, age 35, profession scientist."
- View tool call traces and timing for debugging and research.
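For reference, this is the kind of SQL the agent might generate for the first two prompts and pass to the MCP tools. The people(name, age, profession) column names are an assumption, since the README does not spell out the schema:

# Hypothetical SQL for the example prompts above (assumed people schema)
read_data("SELECT name, age, profession FROM people WHERE profession = 'doctor' AND age > 30")
add_data("INSERT INTO people (name, age, profession) VALUES ('Akash', 35, 'scientist')")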
License
Apache-2.0
Credits
- Partha Pratim Ray (2025 Gradio Agents & MCP Hackathon)
- Ollama, Gradio, FastMCP, LlamaIndex