veritas-mcp by movibe-ai - MCP Server

🛡️ Veritas MCP - AgentGuard

Prevents AI agents from reporting false success by actually running tests, builds, and other verification steps.

🎯 Problem Solved

AI agents frequently lie to users about work quality:

❌ "Tests pass" → Tests never actually ran
❌ "Feature works" → Code doesn't compile
❌ "Build succeeds" → Build was never attempted
❌ "No breaking changes" → System completely broken

🛡️ AgentGuard Solution

MANDATORY verification before any agent success claim:

✅ Actually executes tests, builds, and validation
✅ Captures evidence with screenshots and logs
✅ Produces comprehensive reports with failure analysis
✅ Blocks false positive claims automatically

🚀 Quick Start

Installation

npm install @movibe-ai/veritas-mcp

Basic Usage

import { AgentGuard } from '@movibe-ai/veritas-mcp';

const guard = new AgentGuard();

// Verify agent claims before reporting to user
const result = await guard.verifyAgentClaim({
  claim: {
    type: 'feature_implementation',
    description: 'Added user authentication with JWT tokens',
    changedFiles: ['src/auth.ts', 'src/middleware.ts'],
    expectedBehavior: 'Users can login and access protected routes',
    testCommands: ['npm test', 'npm run test:integration'],
    buildCommands: ['npm run build']
  },
  evidenceLevel: 'comprehensive'
});

if (result.success && result.result?.verified) {
  console.log('✅ Agent claim verified - safe to report to user');
  console.log(`Confidence: ${result.result.confidence}%`);
} else {
  console.log('🚨 Agent claim FAILED - do not report success!');
  console.log('Failures:', result.result?.failures);
}

🏗️ Architecture

Core Components

AgentGuard.verifyAgentClaim(claim) → VerificationResult
├── CodeChangeVerifier     // Syntax, compilation, imports validation
├── TestExecutionEngine    // ACTUALLY runs tests (Jest/pytest/XCTest)
├── BuildSystemIntegrator  // ACTUALLY builds (npm/gradle/xcode)
└── VerificationOrchestrator // Coordinates all verification steps
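
The component tree above suggests an orchestrator that calls each verifier in turn and aggregates what they report. The following is a minimal sketch of that idea, not the package's actual internals: the `Verifier` interface, the `StepFailure` shape, and the `orchestrate` function are all hypothetical names introduced for illustration, and the stop-on-critical rule is an assumption.

```typescript
// Hypothetical shapes for illustration; not veritas-mcp's real internals.
interface StepFailure {
  type: string;
  severity: 'critical' | 'high' | 'medium' | 'low';
  message: string;
}

interface Verifier {
  name: string;
  // Returns the failures found by this step (an empty array means it passed).
  run(): StepFailure[];
}

interface OrchestratedResult {
  verified: boolean;
  failures: StepFailure[];
  stepsRun: string[];
}

// Runs each verifier in order and stops after a step that reports a critical
// failure (assumed policy: tests and builds say little if the code does not compile).
function orchestrate(verifiers: Verifier[]): OrchestratedResult {
  const failures: StepFailure[] = [];
  const stepsRun: string[] = [];
  for (const verifier of verifiers) {
    stepsRun.push(verifier.name);
    const found = verifier.run();
    failures.push(...found);
    if (found.some((f) => f.severity === 'critical')) break;
  }
  return { verified: failures.length === 0, failures, stepsRun };
}

// Usage with stubbed verifiers standing in for the real components.
const demo = orchestrate([
  { name: 'code', run: () => [] },
  { name: 'tests', run: () => [{ type: 'test', severity: 'high', message: '3/10 tests failing' }] },
  { name: 'build', run: () => [] },
]);
console.log(demo.verified, demo.stepsRun);
```

Keeping each verifier behind one small interface makes it easy to stub steps in unit tests, which fits the TDD approach the project describes.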

Verification Process

Code Verification: Syntax ✓ Compilation ✓ Imports ✓
Test Execution: Unit ✓ Integration ✓ E2E ✓
Build Verification: npm ✓ gradle ✓ xcode ✓
Evidence Collection: Screenshots, logs, metrics
Comprehensive Report: Pass/fail with detailed analysis
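
The core idea behind the steps above is that a claim counts only if the commands were really executed and exited cleanly. Here is a minimal sketch of that idea using Node's `child_process.spawnSync`; the `runCommand` and `runAll` helpers and the `CommandOutcome` shape are hypothetical and are not part of the veritas-mcp API.

```typescript
import { spawnSync } from 'node:child_process';

// Hypothetical record of one executed command, kept as evidence.
interface CommandOutcome {
  command: string;
  exitCode: number | null; // null when the process was killed (e.g. on timeout)
  passed: boolean;
  output: string; // stdout followed by stderr
}

// Runs one shell command and records its real exit code and output.
function runCommand(command: string, timeoutMs = 120_000): CommandOutcome {
  const proc = spawnSync(command, { shell: true, encoding: 'utf8', timeout: timeoutMs });
  return {
    command,
    exitCode: proc.status,
    passed: proc.status === 0,
    output: `${proc.stdout ?? ''}${proc.stderr ?? ''}`,
  };
}

// A claim is verified only if every command ran and exited with status 0.
function runAll(commands: string[]): { verified: boolean; outcomes: CommandOutcome[] } {
  const outcomes = commands.map((c) => runCommand(c));
  return { verified: outcomes.every((o) => o.passed), outcomes };
}

// Usage: the second command exits non-zero, so the claim is not verified.
const report = runAll(['echo tests-ran', 'exit 1']);
console.log(report.verified, report.outcomes.map((o) => o.exitCode));
```

In real use the command list would come from the claim's `testCommands` and `buildCommands`. Running arbitrary shell strings is only appropriate when those commands come from a trusted source.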

🎖️ Features

Multi-Platform Support

Node.js: npm, yarn, pnpm builds with Jest/Mocha tests
Android: Gradle builds with JUnit/Espresso tests
iOS: Xcode builds with XCTest
Web: Webpack/Vite builds with Playwright/Cypress E2E

Framework Support

Test Frameworks: Jest, pytest, XCTest, Mocha, JUnit
Build Systems: npm, yarn, pnpm, gradle, xcodebuild, maven
Code Analysis: TypeScript, JavaScript, Python, Java, Swift
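
With this many build systems in play, a verifier needs some way to decide which one applies to a project. One common approach is to look for marker files in the project root. The sketch below assumes that approach. `detectBuildSystems` and the marker table are hypothetical, and the README does not say how veritas-mcp actually selects a build system.

```typescript
type BuildSystem = 'npm' | 'yarn' | 'pnpm' | 'gradle' | 'maven' | 'xcodebuild';

// Conventional marker files for each build system (assumed mapping).
const MARKERS: Array<[RegExp, BuildSystem]> = [
  [/^pnpm-lock\.yaml$/, 'pnpm'],
  [/^yarn\.lock$/, 'yarn'],
  [/^package-lock\.json$/, 'npm'],
  [/^build\.gradle(\.kts)?$/, 'gradle'],
  [/^pom\.xml$/, 'maven'],
  [/\.xcodeproj$/, 'xcodebuild'],
];

// Takes the names of entries in the project root and returns every build
// system whose marker is present, in the order the entries were given.
function detectBuildSystems(rootEntries: string[]): BuildSystem[] {
  const found = new Set<BuildSystem>();
  for (const entry of rootEntries) {
    for (const [pattern, system] of MARKERS) {
      if (pattern.test(entry)) found.add(system);
    }
  }
  return Array.from(found);
}

// Usage: a repository with a Node.js package and an Xcode project.
console.log(detectBuildSystems(['package-lock.json', 'src', 'App.xcodeproj']));
```

Passing in the directory listing rather than reading the file system inside the function keeps the detection logic pure and easy to test.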

Evidence Collection

Screenshots: Visual proof of functionality
Videos: E2E test recordings
Logs: Detailed execution logs
Performance: Metrics and timing data

📊 Verification Results

Success Response

{
  success: true,
  result: {
    verified: true,
    confidence: 95,
    codeVerification: { /* detailed results */ },
    testResults: [ /* all test executions */ ],
    buildResults: [ /* all build attempts */ ],
    evidence: { /* screenshots, logs, etc */ },
    failures: [] // Empty on success
  }
}

Failure Response

{
  success: true, // Process completed
  result: {
    verified: false, // Verification failed
    confidence: 25,
    failures: [
      {
        type: 'compilation',
        severity: 'critical',
        message: 'TypeScript errors in auth.ts:45',
        details: "Cannot find name 'express'"
      },
      {
        type: 'test',
        severity: 'high',
        message: '3/10 tests failing',
        details: 'Login tests failed with 401 errors'
      }
    ]
  }
}
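
Note that `success` and `verified` answer different questions: `success` says the verification process completed, while `verified` says the claim held up. The sketch below makes that distinction explicit. The field names mirror the sample responses above, but the interface definitions and the `classify` and `describeFailures` helpers are assumptions for illustration, not types exported by the package.

```typescript
// Field names follow the sample responses; the exact types are assumed.
interface VerificationFailure {
  type: string;
  severity: string;
  message: string;
  details?: string;
}

interface VerificationResponse {
  success: boolean; // did the verification process itself complete?
  result?: {
    verified: boolean; // did the claim hold up?
    confidence: number;
    failures: VerificationFailure[];
  };
}

type Outcome = 'verified' | 'rejected' | 'process_error';

// Separates "the verifier could not finish" from "the claim was checked and failed".
function classify(response: VerificationResponse): Outcome {
  if (!response.success || !response.result) return 'process_error';
  return response.result.verified ? 'verified' : 'rejected';
}

// Turns each failure into a one-line summary suitable for a log or report.
function describeFailures(response: VerificationResponse): string[] {
  return (response.result?.failures ?? []).map(
    (f) => `[${f.severity}] ${f.type}: ${f.message}`,
  );
}

// Usage with a trimmed copy of the failure response shown above.
const sample: VerificationResponse = {
  success: true,
  result: {
    verified: false,
    confidence: 25,
    failures: [
      { type: 'compilation', severity: 'critical', message: 'TypeScript errors in auth.ts:45' },
      { type: 'test', severity: 'high', message: '3/10 tests failing' },
    ],
  },
};
console.log(classify(sample), describeFailures(sample));
```

Treating a missing `result` as a process error rather than a rejection avoids reporting "the claim is false" when the verifier itself could not run.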

🧪 Testing

This project follows strict Test-Driven Development (TDD):

# Run all tests
npm test

# Run with coverage
npm run test:coverage

# Run specific test suite
npm test -- tests/unit/AgentGuard.test.ts

Test Statistics

104 tests passing across 7 test suites
91.66% code coverage (statements)
TDD methodology throughout

🔧 Development

Prerequisites

Node.js ≥18.0.0
TypeScript 5.3+

Setup

git clone https://github.com/movibe-ai/veritas-mcp.git
cd veritas-mcp
npm install
npm run build
npm test

Scripts

npm run build          # Compile TypeScript
npm run test           # Run all tests
npm run test:coverage  # Coverage report
npm run lint           # ESLint check
npm run clean          # Clean dist/

🎯 Use Cases

1. Claude Code Integration

Prevent Claude from claiming success without actual verification:

// Assumes AgentGuard is imported as in Basic Usage
const agentGuard = new AgentGuard();

// Before reporting "Feature implemented successfully"
const verification = await agentGuard.verifyAgentClaim({
  claim: {
    type: 'feature_implementation',
    description: 'User dashboard with real-time data',
    changedFiles: ['src/dashboard.tsx', 'src/api.ts'],
    expectedBehavior: 'Dashboard shows live user metrics',
    testCommands: ['npm test', 'npm run test:e2e'],
    buildCommands: ['npm run build']
  }
});

// Only report success if verification passes
if (verification.result?.verified) {
  reportToUser('✅ Feature implemented and verified');
} else {
  reportToUser('🚨 Implementation incomplete - fixing issues...');
}

2. CI/CD Pipeline Integration

Ensure deployment readiness:

// Assumes AgentGuard is imported as in Basic Usage
const agentGuard = new AgentGuard();

const result = await agentGuard.verifyAgentClaim({
  claim: {
    type: 'bug_fix',
    description: 'Fixed authentication timeout issue',
    changedFiles: ['src/auth/session.ts'],
    expectedBehavior: 'Sessions persist correctly',
    testCommands: ['npm run test:auth'],
    buildCommands: ['npm run build:production']
  },
  evidenceLevel: 'comprehensive'
});

// Deploy only if verification passes
if (result.result?.verified && (result.result?.confidence ?? 0) > 90) {
  triggerDeployment();
}
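
The deployment condition above can be factored into a small gate function so the threshold lives in one place and a missing result is handled safely. This is a sketch under assumptions: `passesGate` and `GateInput` are hypothetical names, and the default threshold of 90 simply mirrors the example.

```typescript
// Only the two fields the gate needs; both may be absent on a failed run.
interface GateInput {
  verified?: boolean;
  confidence?: number;
}

// Returns true only when the claim was verified and confidence is strictly
// above the threshold. Missing or non-numeric values always fail the gate.
function passesGate(result: GateInput | undefined, minConfidence = 90): boolean {
  if (!result || result.verified !== true) return false;
  const confidence = result.confidence;
  return typeof confidence === 'number' && isFinite(confidence) && confidence > minConfidence;
}

// Usage: a CI script would deploy when the gate passes and otherwise fail the job.
console.log(passesGate({ verified: true, confidence: 95 }));
console.log(passesGate({ verified: true, confidence: 90 }));
console.log(passesGate(undefined));
```

Failing closed on missing data means a crashed or skipped verification step can never be mistaken for a passing one.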

3. Code Review Automation

Validate PR claims:

// Assumes AgentGuard is imported as in Basic Usage
const agentGuard = new AgentGuard();

// Validate PR description claims
const verification = await agentGuard.verifyAgentClaim({
  claim: {
    type: 'refactor',
    description: 'Optimized database queries',
    changedFiles: pullRequest.changedFiles,
    expectedBehavior: 'Same functionality, better performance',
    testCommands: ['npm test', 'npm run test:performance'],
    buildCommands: ['npm run build']
  }
});

// Auto-approve if verification passes
if (verification.result?.verified) {
  approvePullRequest(pullRequest.id);
}

📈 Benefits

For Development Teams

Increased Confidence: Trust agent outputs that have passed verification
Time Savings: Automated verification instead of manual checking
Quality Assurance: Every claim is verified with evidence
Risk Reduction: Fewer broken deployments caused by false claims

For CI/CD Pipelines

Deployment Safety: Only verified code reaches production
Automated Quality Gates: Built-in verification checkpoints
Evidence Trail: Complete audit log of all verifications
Failure Prevention: Catch issues before they impact users

For AI Safety

Prevents Hallucination: Agents can't claim false success
Evidence-Based: All claims backed by actual execution
Transparency: Clear reporting of what was verified
Accountability: Complete log of verification attempts

🤝 Contributing

We welcome contributions! Please follow TDD methodology:

RED: Write failing tests first
GREEN: Implement minimum code to pass
REFACTOR: Improve while keeping tests green

See the contributing guidelines for details.

📄 License

MIT License - see the license file for details.

🙏 Acknowledgments

Built with Test-Driven Development methodology
Designed for Claude Code integration
Inspired by the need for AI agent accountability

⚡ Stop AI agents from lying about work quality. Use AgentGuard.