

🛠️ Build Output Tools MCP

An MCP server that executes build/test commands, routes the output to smaller LLMs for concise, actionable summaries, and stores the full output for detailed inspection when needed.

Blog post: https://gordles.io/blog/llm-friendly-test-suite-outputs-pytest-llm

Why This Server?

The primary goal of this tool is to save context in the main coding agent thread by reducing the number of tokens processed when executing builds or tests.

Instead of flooding Claude's context with thousands of lines of build output, this server:

  • Executes any build/test command - npm test, pytest, docker build, cargo test, etc.
  • Provides intelligent summaries - an LLM analyzes the output and surfaces only what the main thread needs to act on
  • Stores full outputs - Access complete logs when you need detailed analysis
  • Maintains history - Track builds over time with unique IDs

Features

Secure Provider-Based Execution

# Safe, provider-based commands only
run_build("/my-app", "npm", ["run", "test"])
run_build("/my-api", "pytest", ["--cov=src", "tests/"])
run_build("/my-container", "docker", ["build", "-t", "myapp", "."])
run_build("/my-python", "unittest", ["discover", "-s", "tests"])

Supported Test/Build Frameworks

| Provider | Description | Common Flags | Example Usage |
|----------|-------------|--------------|---------------|
| pytest | Python testing framework | --cov=src, --verbose, -x, --tb=short | run_build("/app", "pytest", ["--cov=src", "tests/"]) |
| unittest | Python built-in testing | discover, -s, -p, -v | run_build("/app", "unittest", ["discover", "-s", "tests"]) |
| npm | Node.js package manager | run, test, install, --coverage | run_build("/app", "npm", ["run", "test", "--", "--coverage"]) |
| docker | Container platform | build, run, -t, --no-cache | run_build("/app", "docker", ["build", "-t", "myapp", "."]) |

Intelligent Analysis

  • Smart summaries from OpenRouter LLMs (Mistral, Gemini, etc.)
  • Error highlighting - Focuses on actionable failures
  • Success metrics - Extract test counts, coverage, performance data
  • Configurable models - Choose the right LLM for your analysis needs
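
Conceptually, the analysis step amounts to sending the captured output to an OpenRouter model. A minimal sketch of what that call might look like (OpenRouter exposes an OpenAI-compatible API; the prompt and the 20,000-character truncation are illustrative, not the server's actual values):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # matches OPENROUTER_BASE_URL below
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def summarize(output: str, model: str) -> str:
    # Trim long logs so they fit the model's context window
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Summarize this build output: failures, test counts, and next actions."},
            {"role": "user", "content": output[:20000]},
        ],
    )
    return response.choices[0].message.content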

Output Storage & Retrieval

  • Automatic storage - Every build gets a unique ID
  • Full text access - Retrieve complete stdout/stderr when needed
  • Build history - Track builds over time
  • Auto-cleanup - Manage disk space automatically

Workflow Integration

Perfect for development workflows:

  1. Run builds - Get instant intelligent summaries
  2. Debug failures - Access full logs for detailed analysis
  3. Track progress - Monitor builds over time
  4. Share results - Build IDs make collaboration easy
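
That loop in code, as a sketch (tool calls return JSON strings; the status, summary, and build_id keys match the output examples later in this README):

import json

# 1. Run a build and read the summary
result = json.loads(run_build("/my-app", "pytest", ["tests/"]))
print(result["summary"])

# 2. On failure, pull the complete logs
if result["status"] == "failed":
    full_output = get_build_output(result["build_id"])

# 3. Review recent builds over time
history = list_build_history(10)

# 4. Share result["build_id"] so others can fetch the same output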

Getting Started

Prerequisites

  • Python 3.9+
  • OpenRouter API key
  • Claude Code CLI (recommended) or another MCP-compatible agent CLI

1. Quick Setup

# Clone and setup
git clone https://github.com/jgordley/build-output-tools-mcp.git
cd build-output-tools-mcp

# One-command setup
./run_server.sh

The setup script will:

  • Create Python virtual environment
  • Install dependencies
  • Set up the .env file
  • Optionally add to Claude Code

2. Add API Key

Edit the .env file and add your OpenRouter API key:

OPENROUTER_API_KEY=your_openrouter_api_key_here
DEFAULT_MODEL=mistralai/mistral-small-3.2-24b-instruct-2506:free

3. Start Using!

In Claude Code:

# List available providers
list_providers()

# Run tests with specific provider
run_build(
  project_path="/path/to/my-app",
  provider="npm",
  flags=["run", "test"]
)

Available Tools

run_build - Execute & Analyze Commands

Execute build/test commands using supported providers and get intelligent analysis.

Parameters:

  • project_path (string) - Directory to run command in
  • provider (string) - Build provider: "pytest", "unittest", "npm", or "docker"
  • flags (optional list) - List of flags/arguments to pass to the provider
  • timeout (optional int) - Command timeout in seconds (default: 600)
  • model (optional string) - LLM model for analysis

Returns:

  • Summarized build or test results from a smaller LLM
  • Build ID for retrieving full output
  • Exit code and basic metrics

Example:

run_build(
  project_path="/my-react-app",
  provider="npm",
  flags=["run", "test", "--", "--coverage"],
  timeout=300,
  model="openai/gpt-4o-mini"
)

get_build_output - Retrieve Full Output

Get complete stdout/stderr from any previous build.

Parameters:

  • build_id (string) - Build ID from run_build result

Returns:

  • Complete stdout and stderr text
  • Command details and metadata
  • Execution timestamps

Example:

import json

# First run a build
result = run_build("/my-app", "npm", ["test"])
build_id = json.loads(result)["build_id"]

# Later, get full output for detailed analysis
full_output = get_build_output(build_id)

list_build_history - Browse Past Builds

List recent builds with their IDs and summaries.

Parameters:

  • limit (optional int) - Maximum builds to return (default: 10)

Returns:

  • List of recent builds with metadata
  • Build IDs for retrieving full outputs
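
Example:

# List the 20 most recent builds
history = list_build_history(limit=20)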

list_providers - Supported Providers

Show supported build/test providers and example usage.

Returns:

  • List of supported providers
  • Example flags for each provider
  • Usage guidance

list_models - Available AI Models

Show available LLM models for analysis.

Returns:

  • List of common OpenRouter models
  • Default model configuration

cleanup_old_builds - Manage Storage

Clean up old build outputs to save disk space.

Parameters:

  • max_age_days (optional int) - Maximum age in days (default: 7)
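
Example:

# Remove build outputs older than two weeks
cleanup_old_builds(max_age_days=14)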

Usage Examples

Basic Build Analysis

# Run tests and get summary
result = run_build("/my-app", "npm", ["run", "test"])

# Output:
{
  "status": "failed",
  "summary": "Tests failed: 2 of 15 tests failed in UserAuth module. TypeError in login validation - expected string but received undefined.",
  "build_id": "1704123456_1234",
  "exit_code": 1
}

Detailed Error Investigation

import json

# Get full output for debugging
full_output = get_build_output("1704123456_1234")

# Access complete logs (parse the JSON once)
output = json.loads(full_output)
stdout = output["stdout"]
stderr = output["stderr"]

Docker Build Analysis

# Analyze Docker builds
run_build(
  project_path="/my-container-app",
  provider="docker",
  flags=["build", "-t", "myapp:latest", "."]
)

# Output might be:
{
  "status": "success", 
  "summary": "Docker build completed successfully. Image size: 1.2GB. Build time: 3m 45s. All layers cached except final application layer.",
  "build_id": "1704123789_5678"
}

Python Testing with Coverage

# Run Python tests with coverage
run_build(
  project_path="/my-python-api",
  provider="pytest",
  flags=["--cov=src", "--cov-report=term-missing", "tests/"],
  model="anthropic/claude-3-haiku"
)

# Run unittest discovery
run_build(
  project_path="/my-python-api",
  provider="unittest",
  flags=["discover", "-s", "tests", "-p", "test_*.py"]
)

Build History Tracking

# Check recent builds
history = list_build_history(5)

Configuration

Environment Variables

Configure in .env file:

# OpenRouter API Configuration (Required)
OPENROUTER_API_KEY=your_openrouter_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1

# Default model for analysis
DEFAULT_MODEL=mistralai/mistral-small-3.2-24b-instruct-2506:free

# Command timeout (seconds)
DEFAULT_TIMEOUT=600
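
The setup script creates this file for you. For reference, a typical way a Python server reads these values is via python-dotenv (a sketch; the variable names match the file above, but the exact loading code is assumed):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory
API_KEY = os.environ["OPENROUTER_API_KEY"]  # required
MODEL = os.getenv("DEFAULT_MODEL", "mistralai/mistral-small-3.2-24b-instruct-2506:free")
TIMEOUT = int(os.getenv("DEFAULT_TIMEOUT", "600"))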

Model Selection

Popular OpenRouter models for build analysis:

Fast & Free:

  • mistralai/mistral-small-3.2-24b-instruct-2506:free
  • google/gemini-flash-1.5-8b:free
  • deepseek/deepseek-r1-0528:free
  • qwen/qwen3-32b:free
  • google/gemini-2.0-flash-exp:free
  • mistralai/mistral-nemo:free

Balanced:

  • anthropic/claude-3-haiku
  • openai/gpt-4o-mini

High Quality:

  • anthropic/claude-3-sonnet
  • openai/gpt-4o

Storage Configuration

Build outputs are stored in the build_outputs/ directory:

  • JSON files with full command output
  • Index file for quick lookups
  • Automatic cleanup after 7 days (configurable)
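
Each build is saved as a JSON file keyed by its build ID. The exact schema isn't documented here, but based on what get_build_output returns, a stored record plausibly looks like this (field names and timestamps assumed for illustration):

{
  "build_id": "1704123456_1234",
  "provider": "npm",
  "flags": ["run", "test"],
  "exit_code": 1,
  "started_at": "2024-01-01T15:37:36Z",
  "finished_at": "2024-01-01T15:39:02Z",
  "stdout": "...",
  "stderr": "..."
}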

Claude Code Integration

Automatic Setup

The run_server.sh script can automatically add the server to Claude Code:

./run_server.sh
# Choose 'Y' when prompted to add to Claude Code

Manual Setup

claude mcp add build-output-tools -s user -- /path/to/.venv/bin/python /path/to/src/build_output_tools_mcp/server.py
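
To confirm the server registered, list the configured MCP servers (assumes the Claude Code CLI is on your PATH):

# Verify registration
claude mcp list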

Server Management

# Stop the MCP server connection
./scripts/stop_server.sh

# Completely remove from Claude Code
./scripts/uninstall_server.sh

# Reinstall after removal
./run_server.sh

Claude Desktop Setup

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "build-output-tools": {
      "command": "/path/to/.venv/bin/python",
      "args": ["/path/to/src/build_output_tools_mcp/server.py"]
    }
  }
}

Troubleshooting

Common Issues

MCP Server Connection Issues:

# Stop and restart the server
./scripts/stop_server.sh
./scripts/uninstall_server.sh
./run_server.sh

API Key Not Working:

# Check your .env file
grep OPENROUTER_API_KEY .env

# Verify API key at https://openrouter.ai/

Command Timeouts:

# Increase timeout for long builds
run_build("/my-app", "npm", ["run", "build"], timeout=1200)

Storage Full:

# Clean up old builds
cleanup_old_builds(max_age_days=3)

Testing the Server

# Run test suite
python -m pytest tests/ -v

# Test with sample providers
run_build("/tmp", "npm", ["--version"])
run_build("/tmp", "pytest", ["--help"])

Advanced Usage

Provider Discovery & Validation

import json

# Discover available providers
providers = list_providers()
print(json.loads(providers)["supported_providers"])  # ["pytest", "unittest", "npm", "docker"]

# Invalid provider handling
result = run_build("/my-app", "invalid_provider", ["test"])
# Returns: {"status": "error", "error": "Unsupported provider: invalid_provider..."}

Custom Models

# Use specific models for different analysis needs
run_build("/my-app", "npm", ["test"], model="anthropic/claude-3-sonnet")  # Deep analysis
run_build("/my-app", "npm", ["run", "lint"], model="google/gemini-flash-1.5")  # Quick checks

Error Pattern Detection

import json

# Get history and analyze patterns
history = list_build_history(50)

# Parse the JSON result before filtering (the "builds" key is assumed here)
builds = json.loads(history).get("builds", [])

# Use build IDs to analyze common failure patterns
failing_builds = [b for b in builds if not b["success"]]

Contributing

We welcome contributions! Some potential areas for improvement:

  • Direct API integration with providers - Removing the hard dependency on OpenRouter and adding API access for other model providers like Anthropic or OpenAI
  • Safety checks for running test commands - Invoking subprocess with a fixed, allow-listed command prefix (pytest, docker, etc.) is a good start, but there may still be room for malicious flag combinations, so additional validation would help.
  • Framework-specific parsers - Looking for contributors to add support for more testing frameworks!
  • Build comparison - Diff analysis between builds

License

MIT License - see LICENSE file for details.

Support

  • Issues: GitHub Issues
  • Documentation: This README
  • API Reference: See tool descriptions above