bryankthompson/mcp_vulnerable_testbed
If you are the rightful owner of mcp_vulnerable_testbed and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
The MCP Vulnerable Testbed is a server designed for testing the security assessment capabilities of the MCP Inspector tool, containing intentional security vulnerabilities.
MCP Vulnerable Testbed
⚠️ CRITICAL WARNING ⚠️
THIS SERVER CONTAINS INTENTIONAL SECURITY VULNERABILITIES
- Purpose: Testing MCP Inspector security assessment tool ONLY
- DO NOT use in production
- DO NOT expose to untrusted networks
- DO NOT connect to real data sources
- DO NOT deploy outside of localhost testing environments
Purpose
This MCP server is designed to test the security assessment capabilities of the MCP Inspector tool. It contains deliberately vulnerable tools that exhibit real security vulnerabilities to validate detection accuracy.
Dual Setup Architecture:
- Vulnerable Server (
mcp-vulnerable-testbed): Intentionally broken for baseline testing - Hardened Server (
mcp-hardened-testbed): For testing Inspector-guided fixes - Both servers run side-by-side for direct comparison
Tool Categories
HIGH Risk Vulnerable Tools (32 tools)
These tools actually execute malicious payloads including command injection, authentication bypass, session vulnerabilities, and cryptographic failures:
- Core Execution Vulnerabilities (9 tools): calculator, system_exec, data_leak, tool_override, config_modifier, fetcher, deserializer, template, file_reader
- Auth & State Vulnerabilities (4 tools): auth_bypass, admin_action, chain_executor, session
- OWASP/DVMCP Patterns (7 tools): document_processor, service_status, network_diagnostic, safe_executor (blacklist bypass), crypto_tool, encryption_tool, plus AUP violations
- AUP Violations (8 tools): political_campaign, fraud, harassment, privacy_violation, medical_advice, drm_bypass, hiring_bot, scada
- Challenge #14-22 Vulnerable (4+ tools): weather, directory_lookup, summarizer, malicious_calculate, cron, script_generator, auth_response, sse_reconnect, content_processor, excessive_permissions, scope_escalation
See docs/TOOLS-REFERENCE.md for detailed vulnerability breakdown per tool.
MEDIUM Risk Vulnerable Tools (10 tools)
- Encoding & Parsing (3 tools): unicode_processor, nested_parser, package_installer
- Temporal Vulnerabilities (1 tool): rug_pull (requires 11+ invocations to trigger)
- Content Type Vulnerabilities (1 tool): content_processor (Challenge #20)
- Additional Patterns (5 tools): Various encoding and blacklist bypass techniques
SAFE Control Tools (15 tools)
These tools should NOT be flagged as vulnerable (false positive testing):
- Original 9 tools: storage, search, list, info, echo, validate, logger, json_formatter, url_validator
- Challenge #14 Safe Resources (2 tools): safe_resource_access_tool, safe_resource_validator_tool
- Challenge #15-18 Safe (4 tools): safe_math_tool, safe_path_validator_tool, safe_command_help_tool, safe_network_check_tool
Utility Tools (2 tools)
- get_testbed_info - Returns server metadata, configuration, and tool counts
- reset_testbed_state - Clears all stateful tracking for clean test runs
Total: 59 tools (32 HIGH risk + 10 MEDIUM risk + 15 SAFE + 2 utility) + 12 resources
Security Testing Challenges
This testbed includes 23 advanced security testing challenges that evaluate the sophistication of security assessment tools:
Challenge #1: Tool Annotation Deception
Can your security tool detect when a tool's behavior contradicts its own description?
Five HIGH-risk tools intentionally use deceptive MCP annotations that misrepresent their behavior:
| Tool | Deceptive Annotation | Actual Behavior |
|---|---|---|
| vulnerable_calculator_tool | readOnlyHint=True, title="Safe Calculator" | Uses eval() to execute code |
| vulnerable_system_exec_tool | readOnlyHint=True, title="Safe Command Runner" | Uses subprocess.run() |
| vulnerable_data_leak_tool | readOnlyHint=True, title="Safe Data Query" | Exposes credentials |
| vulnerable_config_modifier_tool | readOnlyHint=True, title="Safe Config Viewer" | Modifies runtime config |
| vulnerable_fetcher_tool | openWorldHint=False, title="Safe URL Validator" | Performs SSRF attacks |
A sophisticated security auditor should flag the mismatch between annotations and actual code behavior.
Challenge #2: Temporal Rug Pull Vulnerability
Can your tool detect vulnerabilities that only manifest after multiple invocations?
The vulnerable_rug_pull_tool tests stateful security analysis:
- Invocations 1-10: Behaves normally, returns safe responses
- Invocation 11+: Executes system commands via
subprocess.run()
A simple, stateless scanner would miss this vulnerability entirely. Detection requires:
- Tracking invocation state across multiple calls
- Testing tools with 11+ invocations
- Analyzing behavioral changes over time
Challenge #3: Denial of Service via Unbounded Input
Can your tool identify missing input validation that enables resource exhaustion?
The testbed implements a deliberate distinction between safe and vulnerable tools:
| Tool Type | Input Validation | Behavior |
|---|---|---|
Safe tools (safe_*) | ✅ 10KB limit enforced | Rejects oversized input with error |
Vulnerable tools (vulnerable_*) | ❌ No validation | Processes any size input |
Test case:
# Safe tool rejects 15KB payload
safe_storage_tool(data="A" * 15000)
# → "Input exceeds maximum length of 10000 characters"
# Vulnerable tool processes 15KB payload (DoS risk)
vulnerable_calculator_tool(query="A" * 15000)
# → Processes without rejection
A security auditor should detect:
- Missing
MAX_INPUT_LENGTHvalidation in vulnerable tools - Potential for memory exhaustion attacks
- Asymmetric protection between tool categories
Challenge #4-#13: Advanced MCP-Specific Attacks
Challenges #4-#13 test MCP-specific vulnerabilities:
- Challenge #4: Fail-Open Authentication (CVE-2025-52882) - Authentication failures grant access instead of denying
- Challenge #5: Mixed Auth Patterns - Distinguishing fail-open vs fail-closed implementations
- Challenge #6: Chained Exploitation - Multi-tool attack chains with output injection and state poisoning
- Challenge #7: Cross-Tool State-Based Authorization - Privilege escalation via shared configuration state
- Challenge #8: Indirect Prompt Injection via Tool Output - Unsanitized content in tool responses
- Challenge #9: Secret Leakage via Error Messages - Credentials exposed in verbose error handling
- Challenge #10: Network Diagnostic Command Injection - shell=True with unsanitized input
- Challenge #11: Weak Blacklist Bypass - Incomplete security controls (blacklist anti-pattern)
- Challenge #12: Session Management Vulnerabilities - Session fixation, predictable tokens, no timeout
- Challenge #13: Cryptographic Failures (OWASP A02:2021) - Weak hashing, ECB mode, hardcoded keys
Challenge #14-#20: Advanced Resource-Based and Persistence Attacks
- Challenge #14: Resource-Based Vulnerabilities - MCP resources with injection points (notes://{user_id}, internal://secrets, company://data/{department})
- Challenge #15: Tool Description Poisoning - Hidden instructions embedded in tool descriptions (weather, directory_lookup, summarizer)
- Challenge #16: Multi-Server Shadowing - Tool name collision attacks (trusted_calculate_tool vs malicious_calculate_tool)
- Challenge #17: Persistence Mechanisms - Post-exploitation persistence (cron_tool, script_generator_tool)
- Challenge #18: JWT Token Leakage - Authentication token exposure in responses (auth_response_tool)
- Challenge #19: SSE Session Desync Attack - Predictable event IDs, no validation, session scope bypass (sse_reconnect_tool)
- Challenge #20: Content Type Confusion Attack - MIME type mismatch, polyglot attacks, magic byte bypass (content_processor_tool)
See CLAUDE.md for complete challenge specifications and test implementations in tests/.
Installation
cd /home/bryan/mcp-servers/mcp-vulnerable-testbed
docker-compose up -d --build
This starts both servers:
- Vulnerable:
http://localhost:10900/mcp - Hardened:
http://localhost:10901/mcp
Usage
HTTP Transport (Default)
Both servers run with HTTP transport by default for easy Inspector integration.
Connection URLs:
- Vulnerable Server:
http://localhost:10900/mcp - Hardened Server:
http://localhost:10901/mcp
Test connectivity:
./test-http-endpoint.sh
MCP Inspector HTTP Config:
{
"mcpServers": {
"vulnerable-testbed": {
"url": "http://localhost:10900/mcp",
"transport": "http"
},
"hardened-testbed": {
"url": "http://localhost:10901/mcp",
"transport": "http"
}
}
}
stdio Transport (Alternative)
To use stdio transport instead of HTTP:
- Edit
docker-compose.ymland setTRANSPORT=stdiofor both services - Restart containers:
docker-compose restart - Use stdio connection:
{
"mcpServers": {
"vulnerable-testbed": {
"command": "docker",
"args": [
"exec",
"-i",
"mcp-vulnerable-testbed",
"python3",
"src/server.py"
]
},
"hardened-testbed": {
"command": "docker",
"args": [
"exec",
"-i",
"mcp-hardened-testbed",
"python3",
"src/server.py"
]
}
}
}
Note: Use python3 src/server.py directly, NOT python3 -m mcp run src/server.py
MCP Inspector Testing Workflow
- Start both containers:
docker-compose up -d - Run Inspector on vulnerable server (
http://localhost:10900/mcp) - Review vulnerability findings and recommended fixes
- Apply fixes to hardened server (
./src-hardened/) - Rebuild:
docker-compose up -d --build - Run Inspector on hardened server (
http://localhost:10901/mcp) - Compare results to validate fixes
MCP Inspector Assessment Results
Latest Results (December 2024)
| Server | Vulnerabilities | Risk Level | Status |
|---|---|---|---|
| Vulnerable (10900) | 125+ | HIGH | ❌ FAIL |
| Hardened (10901) | 0 | LOW | ✅ PASS |
Key Metrics:
- Total tools per server: 59 (32 HIGH, 10 MEDIUM, 15 SAFE, 2 utility) + 12 resources
- Detection rate: 100% (all 42 vulnerable tools detected)
- False positive rate: 0% (all 15 safe tools correctly classified)
- Pytest validation: 873+ total tests across 29 test files (25 resource-based injection, 41 tool description poisoning, 40 multi-server shadowing, 41 persistence mechanisms, 35 JWT token leakage, 28 SSE session desync, 28 content type confusion, 25 excessive permissions, 20 Challenge #22 fixes, 6 type safety, plus additional coverage)
See docs/VULNERABILITY-VALIDATION-RESULTS.md for detailed breakdown.
Expected Assessment Results
Expected Detections (100% Recall)
The inspector SHOULD flag these 42 tools as vulnerable:
HIGH Risk (32 tools):
- Core execution (9): calculator, system_exec, data_leak, tool_override, config_modifier, fetcher, deserializer, template, file_reader
- Auth/state (4): auth_bypass, admin_action, chain_executor, session
- DVMCP/OWASP (7): document_processor, service_status, network_diagnostic, crypto_tool, encryption_tool, safe_executor, plus AUP base patterns
- AUP violations (8): political_campaign, fraud, harassment, privacy_violation, medical_advice, drm_bypass, hiring_bot, scada
- Challenge #14-22 (4+): weather, directory_lookup, summarizer, malicious_calculate, cron, script_generator, auth_response, sse_reconnect, content_processor, excessive_permissions, scope_escalation
MEDIUM Risk (10 tools):
- Encoding/parsing (3): unicode_processor, nested_parser, package_installer
- Temporal (1): rug_pull (requires 11+ invocations)
- Content type (1): content_processor (Challenge #20)
- Additional patterns (5): Various encoding and bypass techniques
Expected Safe Classifications (0% False Positives)
The inspector should NOT flag these 15 tools:
- ✅ safe_storage_tool_mcp, safe_search_tool_mcp, safe_list_tool_mcp, safe_info_tool_mcp, safe_echo_tool_mcp
- ✅ safe_validate_tool_mcp, safe_logger_tool_mcp, safe_json_formatter_tool_mcp, safe_url_validator_tool_mcp
- ✅ safe_math_tool, safe_path_validator_tool, safe_command_help_tool, safe_network_check_tool
- ✅ safe_resource_access_tool, safe_resource_validator_tool
Testing Strategy
Phase 1: Baseline Testing
# Connect inspector to vulnerable testbed
# Run full assessment
# Verify all 42 tools are tested
Phase 2: Validation
- HIGH risk tools: 32 should be flagged
- MEDIUM risk tools: 10 should be flagged
- SAFE tools: 15 should NOT be flagged
- Resources: 5 should be tested for injection points
- Target: 100% detection (42/42), 0% false positives (0/15)
Phase 3: Advanced Challenges
- Challenges #1-#3: Annotation deception, temporal rug pull, DoS via unbounded input
- Challenges #4-#7: Auth bypass, chained exploitation, cross-tool state
- Challenges #8-#13: Indirect injection, secret leakage, network injection, blacklist bypass, session management, cryptographic failures
- Challenges #14-#24: Resource-based injection, tool description poisoning, multi-server shadowing, persistence mechanisms, JWT token leakage, SSE session desync, content type confusion, excessive permissions scope, multi-parameter template injection, binary resource attacks
Configuration
Transport Mode
Set in docker-compose.yml:
environment:
- TRANSPORT=http # HTTP transport (default)
# - TRANSPORT=stdio # Alternative: stdio transport
- HOST=0.0.0.0 # Required for Docker HTTP
- LOG_LEVEL=info
Vulnerability Modes
Control vulnerability behavior per container:
# Vulnerable server (default)
environment:
- VULNERABILITY_MODE=high # All vulnerabilities active
# Hardened server (default)
environment:
- VULNERABILITY_MODE=safe # All vulnerabilities disabled
Available modes:
high: All vulnerabilities active (default for vulnerable server)medium: Only MEDIUM and LOW risk activelow: Only LOW risk activesafe: All vulnerabilities disabled (default for hardened server)
Logs
Container logs:
- Vulnerable:
./logs/vulnerable-testbed.log - Hardened:
./logs-hardened/vulnerable-testbed.log
Monitor vulnerabilities triggered:
# Vulnerable server
tail -f logs/vulnerable-testbed.log | grep "VULNERABILITY TRIGGERED"
# Hardened server
tail -f logs-hardened/vulnerable-testbed.log | grep "VULNERABILITY TRIGGERED"
# Both servers via Docker logs
docker logs -f mcp-vulnerable-testbed 2>&1 | grep "VULNERABILITY TRIGGERED"
docker logs -f mcp-hardened-testbed 2>&1 | grep "VULNERABILITY TRIGGERED"
Safety Measures
- Isolated Docker container with resource limits
- No real credentials - all secrets are fake
- Localhost only - not exposed to external networks
- Clear warnings on container startup
- Limited command execution - dangerous commands truncated
Testing the Inspector
# 1. Start the testbed
docker-compose up -d
# 2. Connect MCP Inspector
cd ~/inspector
npm run dev
# 3. Configure connection to vulnerable-testbed
# 4. Run security assessment
# 5. Review results:
# - Verify 31 vulnerabilities detected (22 HIGH + 9 MEDIUM)
# - Verify 9 safe tools not flagged (0% false positives)
# - Test Challenge #1: Annotation deception (5 tools)
# - Test Challenge #2: Rug pull after 11+ calls
# 6. Document findings
Continuous Integration
This repository includes AI-powered code review via GitHub Actions (.github/workflows/code-review.yml):
- Automatically reviews all pull requests using Claude Sonnet 4
- Detects security vulnerabilities specific to MCP testbed patterns
- Posts findings as PR comments with P0-P3 severity levels
- Requires
ANTHROPIC_API_KEYin repository secrets
Cleanup
# Stop and remove containers
docker-compose down
# Remove images
docker rmi mcp-vulnerable-testbed-vulnerable-testbed
docker rmi mcp-vulnerable-testbed-hardened-testbed
# Clean up logs
rm -rf logs/ logs-hardened/
Security Note
This server is designed to help improve security tooling by providing realistic test cases. It should only be run in controlled, isolated testing environments. All vulnerabilities are intentional and documented.
License
FOR TESTING PURPOSES ONLY - Not for production use
Contact
Built for testing the MCP Inspector assessment module at Anthropic