arjun-krishna1/llm-vulnerability-mcp
The LLM Vulnerability Scanner is an open-source MCP server designed to facilitate the scanning of LLM endpoints for vulnerabilities such as hallucinations, prompt-injection, data leaks, and toxicity.
scan_model
A tool that scans LLMs for vulnerabilities using garak and returns a JSON summary.
LLM Vulnerability Scanner
An open-source MCP server that lets agents or humans trigger garak scans against any LLM endpoint and receive a concise vulnerability report (hallucination, prompt-injection, data-leak, toxicity, etc.).
TODO
MVP
- Create repo
- Run MCP hello world
- Run garak hello world
- Add dependencies mcp[cli], garak
- Wrap garak CLI in async function
- Parse garak output to a JSON summary
- Do end to end demo on Cursor
- Prepare demo presentation with two slides
- Installation guide (uvx --from git+https://github.com/arjun-krishna1/llm-vulnerability-mcp) and JSON
Nice to haves
- Custom probe selection
- Streaming progress updates
- Batch comparison
MVP
One MCP tool called scan_model that takes model_type, model_name, and (optionally) api_key, runs garak with its default probe set, and returns a JSON summary plus the original garak.log as an MCP resource.
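The core of this design, running garak from an async function, could be sketched as follows. This is a minimal illustration, not the repository's actual code: `build_garak_command` and `run_command` are hypothetical helper names, and the `--model_type` / `--model_name` flags should be checked against `garak --help` for the installed version. How the API key reaches garak is not specified above, so the sketch only offers an optional `env` mapping for it.

```python
import asyncio
import sys
from typing import Optional, Sequence


def build_garak_command(model_type: str, model_name: str) -> list[str]:
    """Build a garak CLI invocation (hypothetical helper).

    No probe flag is passed, so garak falls back to its default probe set.
    """
    return [
        sys.executable, "-m", "garak",
        "--model_type", model_type,
        "--model_name", model_name,
    ]


async def run_command(
    cmd: Sequence[str], env: Optional[dict] = None
) -> tuple[int, str]:
    """Run a command without blocking the event loop.

    Returns (exit code, combined stdout and stderr as text).
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
        env=env,  # None means "inherit the current environment"
    )
    stdout, _ = await proc.communicate()
    return proc.returncode, stdout.decode(errors="replace")
```

A scan would then be `await run_command(build_garak_command("openai", "gpt-3.5-turbo"))`. Keeping command construction separate from execution makes the argument list easy to unit-test without starting a real scan.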
Usage
The MCP server exposes a scan_model tool that can be used to scan LLM models for vulnerabilities.
Example usage in Claude:
Use the scan_model tool to test the model at https://openrouter.ai/models/mistralai/mistral-7b-instruct
The tool will:
- Parse OpenRouter URLs automatically
- Use the OPENROUTER_API_KEY from environment if not provided
- Return a JSON summary of vulnerabilities found
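The last step, turning garak's output into a JSON summary, might look like the sketch below. It assumes garak's JSONL report contains records with `"entry_type": "eval"` and the fields `probe`, `passed`, and `total`; those field names are an assumption to verify against a real report file, and `summarize_report` is a hypothetical name rather than the project's function.

```python
import json
from collections import defaultdict


def summarize_report(jsonl_text: str) -> dict:
    """Aggregate pass/fail counts per probe from JSONL report text."""
    totals = defaultdict(lambda: {"passed": 0, "total": 0})
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Assumed record shape: only "eval" entries carry pass/total counts.
        if entry.get("entry_type") != "eval":
            continue
        counts = totals[entry["probe"]]
        counts["passed"] += entry["passed"]
        counts["total"] += entry["total"]
    # Add a derived "failed" count so callers do not have to compute it.
    return {
        name: {**counts, "failed": counts["total"] - counts["passed"]}
        for name, counts in totals.items()
    }
```

The result is a plain dict keyed by probe name, so it can be returned from the tool with `json.dumps` as-is.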
Parameters:
- model_type: Type of model (e.g., 'openai', 'openrouter', 'huggingface')
- model_name: Model name or OpenRouter URL
- api_key: Optional API key (uses OPENROUTER_API_KEY env var if not provided)
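To illustrate how model_name and api_key might be normalised before a scan, here is a small sketch. The helper names `parse_model_name` and `resolve_api_key` are hypothetical, and the URL handling assumes OpenRouter model pages follow the `https://openrouter.ai/models/<vendor>/<model>` pattern shown in the usage example.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def parse_model_name(model_name: str) -> str:
    """Return 'vendor/model' for an OpenRouter URL; pass other names through."""
    parsed = urlparse(model_name)
    if parsed.netloc == "openrouter.ai":
        path = parsed.path.strip("/")
        prefix = "models/"
        # Drop the leading "models/" segment of the page URL, if present.
        if path.startswith(prefix):
            path = path[len(prefix):]
        return path
    return model_name


def resolve_api_key(api_key: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit key, else fall back to OPENROUTER_API_KEY."""
    return api_key or os.environ.get("OPENROUTER_API_KEY")
```

With these helpers, the URL from the usage example resolves to `mistralai/mistral-7b-instruct`, while a plain name such as `gpt-3.5-turbo` is returned unchanged.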
Nice-to-haves
- choose probes interactively
- live progress streaming
- batch compare models
- downloadable HTML dashboard
- Slack / Discord webhook alerts
- GitHub Action wrapper
- OWASP-style severity score
- persistent scan history (SQLite)