

🛠️ Build Output Tools MCP

An MCP server that executes build/test commands, routes the output to smaller LLMs for concise, actionable summaries, and stores the full output for detailed inspection when needed.

Blog post: https://gordles.io/blog/llm-friendly-test-suite-outputs-pytest-llm

Why This Server?

The primary goal of this tool is to save context in the main coding agent thread by reducing the number of tokens processed when executing builds or tests.

Instead of flooding Claude's context with thousands of lines of build output, this server:

  • Executes any build/test command - npm test, pytest, docker build, cargo test, etc.
  • Provides intelligent summaries - an LLM analyzes the output and surfaces only what the main thread needs to act on
  • Stores full outputs - Access complete logs when you need detailed analysis
  • Maintains history - Track builds over time with unique IDs

Features

Secure Provider-Based Execution

# Safe, provider-based commands only
run_build("/my-app", "npm", ["run", "test"])
run_build("/my-api", "pytest", ["--cov=src", "tests/"])
run_build("/my-container", "docker", ["build", "-t", "myapp", "."])
run_build("/my-python", "unittest", ["discover", "-s", "tests"])

Supported Test/Build Frameworks

| Provider | Description | Common Flags | Example Usage |
|----------|-------------|--------------|---------------|
| pytest | Python testing framework | --cov=src, --verbose, -x, --tb=short | run_build("/app", "pytest", ["--cov=src", "tests/"]) |
| unittest | Python built-in testing | discover, -s, -p, -v | run_build("/app", "unittest", ["discover", "-s", "tests"]) |
| npm | Node.js package manager | run, test, install, --coverage | run_build("/app", "npm", ["run", "test", "--", "--coverage"]) |
| docker | Container platform | build, run, -t, --no-cache | run_build("/app", "docker", ["build", "-t", "myapp", "."]) |

Intelligent Analysis

  • Smart summaries from OpenRouter LLMs (Mistral, Gemini, etc.)
  • Error highlighting - Focuses on actionable failures
  • Success metrics - Extract test counts, coverage, performance data
  • Configurable models - Choose the right LLM for your analysis needs
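
Conceptually, the analysis step amounts to sending the captured output to an OpenRouter model. A minimal sketch of what that call might look like (OpenRouter exposes an OpenAI-compatible API; the prompt and the 20,000-character truncation are illustrative, not the server's actual values):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # matches OPENROUTER_BASE_URL below
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def summarize(output: str, model: str) -> str:
    # Trim long logs so they fit the model's context window
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Summarize this build output: failures, test counts, and next actions."},
            {"role": "user", "content": output[:20000]},
        ],
    )
    return response.choices[0].message.content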

Output Storage & Retrieval

  • Automatic storage - Every build gets a unique ID
  • Full text access - Retrieve complete stdout/stderr when needed
  • Build history - Track builds over time
  • Auto-cleanup - Manage disk space automatically

Workflow Integration

Perfect for development workflows:

  1. Run builds - Get instant intelligent summaries
  2. Debug failures - Access full logs for detailed analysis
  3. Track progress - Monitor builds over time
  4. Share results - Build IDs make collaboration easy
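
That loop in code, as a sketch (tool calls return JSON strings; the status, summary, and build_id keys match the output examples later in this README):

import json

# 1. Run a build and read the summary
result = json.loads(run_build("/my-app", "pytest", ["tests/"]))
print(result["summary"])

# 2. On failure, pull the complete logs
if result["status"] == "failed":
    full_output = get_build_output(result["build_id"])

# 3. Review recent builds over time
history = list_build_history(10)

# 4. Share result["build_id"] so others can fetch the same output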

Getting Started

Prerequisites

  • Python 3.9+
  • OpenRouter API key
  • Claude Code CLI (recommended) or another MCP-compatible agent CLI

1. Quick Setup

# Clone and setup
git clone https://github.com/jgordley/build-output-tools-mcp.git
cd build-output-tools-mcp

# One-command setup
./run_server.sh

The setup script will:

  • Create Python virtual environment
  • Install dependencies
  • Set up the .env file
  • Optionally add to Claude Code

2. Add API Key

Edit the .env file and add your OpenRouter API key:

OPENROUTER_API_KEY=your_openrouter_api_key_here
DEFAULT_MODEL=mistralai/mistral-small-3.2-24b-instruct-2506:free

3. Start Using!

In Claude Code:

# List available providers
list_providers()

# Run tests with specific provider
run_build(
  project_path="/path/to/my-app",
  provider="npm",
  flags=["run", "test"]
)

Available Tools

run_build - Execute & Analyze Commands

Execute build/test commands using supported providers and get intelligent analysis.

Parameters:

  • project_path (string) - Directory to run command in
  • provider (string) - Build provider: "pytest", "unittest", "npm", or "docker"
  • flags (optional list) - List of flags/arguments to pass to the provider
  • timeout (optional int) - Command timeout in seconds (default: 600)
  • model (optional string) - LLM model for analysis

Returns:

  • Summarized build or test results from a smaller LLM
  • Build ID for retrieving full output
  • Exit code and basic metrics

Example:

run_build(
  project_path="/my-react-app",
  provider="npm",
  flags=["run", "test", "--", "--coverage"],
  timeout=300,
  model="openai/gpt-4o-mini"
)

get_build_output - Retrieve Full Output

Get complete stdout/stderr from any previous build.

Parameters:

  • build_id (string) - Build ID from run_build result

Returns:

  • Complete stdout and stderr text
  • Command details and metadata
  • Execution timestamps

Example:

import json

# First run a build
result = run_build("/my-app", "npm", ["test"])
build_id = json.loads(result)["build_id"]

# Later, get full output for detailed analysis
full_output = get_build_output(build_id)

list_build_history - Browse Past Builds

List recent builds with their IDs and summaries.

Parameters:

  • limit (optional int) - Maximum builds to return (default: 10)

Returns:

  • List of recent builds with metadata
  • Build IDs for retrieving full outputs
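
Example:

# List the 20 most recent builds
history = list_build_history(limit=20)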

list_providers - Supported Providers

Show supported build/test providers and example usage.

Returns:

  • List of supported providers
  • Example flags for each provider
  • Usage guidance

list_models - Available AI Models

Show available LLM models for analysis.

Returns:

  • List of common OpenRouter models
  • Default model configuration

cleanup_old_builds - Manage Storage

Clean up old build outputs to save disk space.

Parameters:

  • max_age_days (optional int) - Maximum age in days (default: 7)
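
Example:

# Remove build outputs older than two weeks
cleanup_old_builds(max_age_days=14)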

Usage Examples

Basic Build Analysis

# Run tests and get summary
result = run_build("/my-app", "npm", ["run", "test"])

# Output:
{
  "status": "failed",
  "summary": "Tests failed: 2 of 15 tests failed in UserAuth module. TypeError in login validation - expected string but received undefined.",
  "build_id": "1704123456_1234",
  "exit_code": 1
}

Detailed Error Investigation

import json

# Get full output for debugging
full_output = get_build_output("1704123456_1234")

# Access complete logs (parse the JSON once)
output = json.loads(full_output)
stdout = output["stdout"]
stderr = output["stderr"]

Docker Build Analysis

# Analyze Docker builds
run_build(
  project_path="/my-container-app",
  provider="docker",
  flags=["build", "-t", "myapp:latest", "."]
)

# Output might be:
{
  "status": "success", 
  "summary": "Docker build completed successfully. Image size: 1.2GB. Build time: 3m 45s. All layers cached except final application layer.",
  "build_id": "1704123789_5678"
}

Python Testing with Coverage

# Run Python tests with coverage
run_build(
  project_path="/my-python-api",
  provider="pytest",
  flags=["--cov=src", "--cov-report=term-missing", "tests/"],
  model="anthropic/claude-3-haiku"
)

# Run unittest discovery
run_build(
  project_path="/my-python-api",
  provider="unittest",
  flags=["discover", "-s", "tests", "-p", "test_*.py"]
)

Build History Tracking

# Check recent builds
history = list_build_history(5)

Configuration

Environment Variables

Configure in .env file:

# OpenRouter API Configuration (Required)
OPENROUTER_API_KEY=your_openrouter_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1

# Default model for analysis
DEFAULT_MODEL=mistralai/mistral-small-3.2-24b-instruct-2506:free

# Command timeout (seconds)
DEFAULT_TIMEOUT=600
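
The setup script creates this file for you. For reference, a typical way a Python server reads these values is via python-dotenv (a sketch; the variable names match the file above, but the exact loading code is assumed):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory
API_KEY = os.environ["OPENROUTER_API_KEY"]  # required
MODEL = os.getenv("DEFAULT_MODEL", "mistralai/mistral-small-3.2-24b-instruct-2506:free")
TIMEOUT = int(os.getenv("DEFAULT_TIMEOUT", "600"))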

Model Selection

Popular OpenRouter models for build analysis:

Fast & Free:

  • mistralai/mistral-small-3.2-24b-instruct-2506:free
  • google/gemini-flash-1.5-8b:free
  • deepseek/deepseek-r1-0528:free
  • qwen/qwen3-32b:free
  • google/gemini-2.0-flash-exp:free
  • mistralai/mistral-nemo:free

Balanced:

  • anthropic/claude-3-haiku
  • openai/gpt-4o-mini

High Quality:

  • anthropic/claude-3-sonnet
  • openai/gpt-4o

Storage Configuration

Build outputs are stored in the build_outputs/ directory:

  • JSON files with full command output
  • Index file for quick lookups
  • Automatic cleanup after 7 days (configurable)
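
Each build is saved as a JSON file keyed by its build ID. The exact schema isn't documented here, but based on what get_build_output returns, a stored record plausibly looks like this (field names and timestamps assumed for illustration):

{
  "build_id": "1704123456_1234",
  "provider": "npm",
  "flags": ["run", "test"],
  "exit_code": 1,
  "started_at": "2024-01-01T15:37:36Z",
  "finished_at": "2024-01-01T15:39:02Z",
  "stdout": "...",
  "stderr": "..."
}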

Claude Code Integration

Automatic Setup

The run_server.sh script can automatically add the server to Claude Code:

./run_server.sh
# Choose 'Y' when prompted to add to Claude Code

Manual Setup

claude mcp add build-output-tools -s user -- /path/to/.venv/bin/python /path/to/src/build_output_tools_mcp/server.py
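
To confirm the server registered, list the configured MCP servers (assumes the Claude Code CLI is on your PATH):

# Verify registration
claude mcp list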

Server Management

# Stop the MCP server connection
./scripts/stop_server.sh

# Completely remove from Claude Code
./scripts/uninstall_server.sh

# Reinstall after removal
./run_server.sh

Claude Desktop Setup

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "build-output-tools": {
      "command": "/path/to/.venv/bin/python",
      "args": ["/path/to/src/build_output_tools_mcp/server.py"]
    }
  }
}

Troubleshooting

Common Issues

MCP Server Connection Issues:

# Stop and restart the server
./scripts/stop_server.sh
./scripts/uninstall_server.sh
./run_server.sh

API Key Not Working:

# Check your .env file
grep OPENROUTER_API_KEY .env

# Verify API key at https://openrouter.ai/

Command Timeouts:

# Increase timeout for long builds
run_build("/my-app", "npm", ["run", "build"], timeout=1200)

Storage Full:

# Clean up old builds
cleanup_old_builds(max_age_days=3)

Testing the Server

# Run test suite
python -m pytest tests/ -v

# Test with sample providers
run_build("/tmp", "npm", ["--version"])
run_build("/tmp", "pytest", ["--help"])

Advanced Usage

Provider Discovery & Validation

import json

# Discover available providers
providers = list_providers()
print(json.loads(providers)["supported_providers"])  # ["pytest", "unittest", "npm", "docker"]

# Invalid provider handling
result = run_build("/my-app", "invalid_provider", ["test"])
# Returns: {"status": "error", "error": "Unsupported provider: invalid_provider..."}

Custom Models

# Use specific models for different analysis needs
run_build("/my-app", "npm", ["test"], model="anthropic/claude-3-sonnet")  # Deep analysis
run_build("/my-app", "npm", ["run", "lint"], model="google/gemini-flash-1.5")  # Quick checks

Error Pattern Detection

import json

# Get history and analyze patterns
history = list_build_history(50)

# Parse the JSON result before filtering (the "builds" key is assumed here)
builds = json.loads(history).get("builds", [])

# Use build IDs to analyze common failure patterns
failing_builds = [b for b in builds if not b["success"]]

Contributing

We welcome contributions! Some potential areas for improvement:

  • Direct API integration with providers - Removing the hard dependency on OpenRouter and adding API access for other model providers like Anthropic or OpenAI
  • Safety checks for running test commands - Invoking subprocess with a fixed, allow-listed command prefix (pytest, docker, etc.) is a good start, but there may still be room for malicious flag combinations, so additional validation would help.
  • Framework-specific parsers - Looking for contributors to add support for more testing frameworks!
  • Build comparison - Diff analysis between builds

License

MIT License - see LICENSE file for details.

Support

  • Issues: GitHub Issues
  • Documentation: This README
  • API Reference: See tool descriptions above