llm-vulnerability-mcp by arjun-krishna1 - MCP Server

LLM Vulnerability Scanner

An open-source MCP server that lets agents or humans trigger garak scans against any LLM endpoint and receive a concise vulnerability report (hallucination, prompt-injection, data-leak, toxicity, etc.).

TODO

MVP

Create repo
Run MCP hello world
Run garak hello world
Add dependencies: mcp[cli], garak
Wrap garak CLI in async function
Parse garak output to a JSON summary
Run an end-to-end demo in Cursor
Prepare demo presentation with two slides
Installation guide (uvx --from git+https://github.com/arjun-krishna1/llm-vulnerability-mcp) and JSON

Nice to haves

Custom probe selection
Streaming progress updates
Batch comparison

MVP

One MCP tool called scan_model that takes model_type, model_name, and (optionally) api_key, runs garak with its default probe set, and returns a JSON summary plus the original garak.log as an MCP resource.
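As a rough illustration of the "run garak" step, the sketch below builds a garak command line and runs it without blocking the event loop. It is not the repository's actual code: the helper names build_garak_command and run_command are hypothetical, and it assumes garak is installed in the current Python environment. The --model_type and --model_name flags are garak's own; omitting --probes leaves garak on its default probe set.

```python
import asyncio
import sys
from typing import Dict, List, Optional, Sequence, Tuple


def build_garak_command(model_type: str, model_name: str) -> List[str]:
    """Build a garak invocation (hypothetical helper, not from the repo)."""
    # No --probes flag is passed, so garak uses its default probe set.
    return [
        sys.executable, "-m", "garak",
        "--model_type", model_type,
        "--model_name", model_name,
    ]


async def run_command(
    cmd: Sequence[str], env: Optional[Dict[str, str]] = None
) -> Tuple[int, str]:
    """Run a command asynchronously; return (exit code, combined output)."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
        env=env,  # None means "inherit the parent environment"
    )
    out, _ = await proc.communicate()
    return proc.returncode, out.decode(errors="replace")
```

A scan_model implementation could then await run_command(build_garak_command(model_type, model_name)) and hand the captured output to whatever produces the summary. Passing the API key through env rather than on the command line keeps it out of process listings; which environment variable garak expects depends on the generator in use.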

Usage

The MCP server exposes a scan_model tool that scans an LLM for vulnerabilities.

Example usage in Claude:

Use the scan_model tool to test the model at https://openrouter.ai/models/mistralai/mistral-7b-instruct

The tool will:

Parse OpenRouter URLs automatically
Use the OPENROUTER_API_KEY from environment if not provided
Return a JSON summary of vulnerabilities found
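The JSON summary step could look roughly like the sketch below, which folds JSONL report lines into per-probe pass/fail counts. The schema is an assumption: it expects evaluation rows with entry_type set to "eval" and probe, passed, and total fields, so check it against the report your garak version writes. The function name summarize_report and the failure_rate field are hypothetical, not taken from the repository.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def summarize_report(lines: Iterable[str]) -> Dict[str, dict]:
    """Aggregate per-probe pass/fail counts from JSONL report lines."""
    summary = defaultdict(lambda: {"passed": 0, "total": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        # Assumed schema: only rows with entry_type == "eval" carry results.
        if entry.get("entry_type") != "eval":
            continue
        counts = summary[entry["probe"]]
        counts["passed"] += entry["passed"]
        counts["total"] += entry["total"]
    for counts in summary.values():
        total = counts["total"]
        # Share of attempts that did NOT pass the detector.
        counts["failure_rate"] = (
            round(1 - counts["passed"] / total, 3) if total else 0.0
        )
    return dict(summary)
```

Because the result is a plain dict, json.dumps(summarize_report(open(path))) is enough to produce the payload returned by the tool.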

Parameters:

model_type: Type of model (e.g., 'openai', 'openrouter', 'huggingface')
model_name: Model name or OpenRouter URL
api_key: Optional API key (uses OPENROUTER_API_KEY env var if not provided)
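The parameter handling described above can be sketched as follows. This is a minimal illustration, not the server's real code: resolve_model_name and resolve_api_key are hypothetical names, and the URL layout (openrouter.ai/models/<vendor>/<model>) is inferred from the example URL in this README.

```python
import os
from typing import Optional
from urllib.parse import urlparse

OPENROUTER_MODELS_PREFIX = "/models/"


def resolve_model_name(model_name: str) -> str:
    """Turn an OpenRouter model URL into 'vendor/model'; pass plain names through."""
    parsed = urlparse(model_name)
    if parsed.netloc == "openrouter.ai" and parsed.path.startswith(
        OPENROUTER_MODELS_PREFIX
    ):
        return parsed.path[len(OPENROUTER_MODELS_PREFIX):].strip("/")
    return model_name


def resolve_api_key(api_key: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit key, then fall back to OPENROUTER_API_KEY."""
    return api_key or os.environ.get("OPENROUTER_API_KEY")
```

With these helpers, the URL from the usage example resolves to mistralai/mistral-7b-instruct, while a plain name such as gpt-3.5-turbo is returned unchanged. resolve_api_key returns None when neither source provides a key, so the caller can decide whether to fail early.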

Nice-to-haves

choose probes interactively
live progress streaming
batch compare models
downloadable HTML dashboard
Slack / Discord webhook alerts
GitHub Action wrapper
OWASP-style severity score
persistent scan history (SQLite)