claude-praetorian-mcp by Vvkmnn - MCP Server

claude-praetorian-mcp

A Model Context Protocol (MCP) server for aggressive context compaction in Claude Code. Save 90%+ tokens by compacting web research, task outputs, and conversations into structured snapshots.

Inspired by a talk by Dexter Horthy of HumanLayer and his team's work on ACE: Advanced Context Engineering for Coding Agents and 12-Factor Agents, as well as the TOON (Token-Oriented Object Notation) format.

install

Requirements: Node.js >=20.0.0 and an MCP client such as Claude Code.

From shell:

claude mcp add claude-praetorian-mcp -- npx claude-praetorian-mcp

From inside Claude (restart required):

Add this to our global mcp config: npx claude-praetorian-mcp

Install this mcp: https://github.com/Vvkmnn/claude-praetorian-mcp

From any manually configurable mcp.json: (Cursor, Windsurf, etc.)

{
  "mcpServers": {
    "claude-praetorian-mcp": {
      "command": "npx",
      "args": ["claude-praetorian-mcp"],
      "env": {}
    }
  }
}

There is no npm install required -- no external databases or services, only flat files.

However, if npx resolves the wrong package, you can force resolution with:

npm install -g claude-praetorian-mcp

Optionally, install the skill to teach Claude when to proactively use praetorian:

npx skills add Vvkmnn/claude-praetorian-mcp --skill claude-praetorian --global
# Optional: add --yes to skip interactive prompt and install to all agents

This makes Claude automatically compact after research, subagent tasks, and before context resets. The MCP works without the skill, but the skill improves discoverability.

plugin

For full automation with hooks, install from the claude-emporium marketplace:

/plugin marketplace add Vvkmnn/claude-emporium
/plugin install claude-praetorian@claude-emporium

The claude-praetorian plugin provides:

Hooks (targeted, fire only at high-value moments):

Before EnterPlanMode → Restore prior compactions for this project
Before context compaction → Save decisions/insights before context resets
After WebFetch/WebSearch → Compact web research findings
After SubagentStop → Compact subagent task results

Commands: /praetorian-compact, /praetorian-restore, /praetorian-search

Requires the MCP server installed first. See the emporium for other Claude Code plugins and MCPs.

features

MCP server for aggressive context compaction. Generates structured incremental snapshots to yield 90%+ token savings and easily refresh context with "frequent intentional compaction".

Runs project by project and saves artifacts to <project>/.claude/praetorian/ (with a royal guard ⚜️):

`praetorian_compact`

(Incrementally) compact context using the TOON format to get the most valuable tokens from an activity.

⚜️ praetorian_compact type=<type> title=<title>
  > "ACE Framework research - save 1,450 tokens"
  > "Icon rendering bug investigation - compact the findings"
  > "Database architecture decisions - preserve the rationale"
  > "WebFetch results from authentication docs"
  > "Task output from explore subagent - code structure analysis"

⚜️ compact | Created

┌─ ⚜️  ────────────────────────────────────────────────── Created ─┐
│ Compacted: "ACE Framework Research" • 1,450 tokens saved
│ Type: web_research • ID: cpt_1765245902396_nxetoc
└───────────────────────────────────────────────────────────────────┘

{
  "type": "web_research",
  "title": "ACE Framework Research",
  "source": "https://github.com/humanlayer/ace-fca",
  "key_insights": [
    "Frequent intentional compaction saves 90%+ tokens",
    "TOON format is 30-60% smaller than JSON/YAML",
    "Compaction should happen after every expensive operation"
  ],
  "refs": ["ace-fca.md:42 - compaction strategy", "toon-spec.md:1 - format definition"],
  "recommendations": ["Compact after every WebFetch", "Use type='decisions' for architecture choices"]
}

⚜️ compact | Merged

┌─ ⚜️  ───────────────────────────────────────────────────── Merged ─┐
│ Compacted: "Authentication Patterns" • 890 tokens saved
│ Type: decisions • ID: cpt_1765245903512_xk9mp1
│ Merged with: cpt_1765245903512_xk9mp1
└────────────────────────────────────────────────────────────────────┘

{
  "type": "decisions",
  "title": "Authentication Patterns",
  "decisions": [
    { "chose": "JWT with refresh tokens", "over": ["sessions", "API keys"], "reason": "Stateless, works across microservices" },
    { "chose": "httpOnly cookies", "over": ["localStorage"], "reason": "XSS protection" }
  ],
  "refs": ["src/middleware/auth.ts:45 - token validation", "src/routes/login.ts:23 - refresh flow"],
  "anti_patterns": ["Never store tokens in localStorage", "Never skip CSRF on cookie-based auth"]
}
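The two payloads above hint at the shape praetorian_compact accepts. Here is a minimal TypeScript sketch of that shape with a hand-written check. The server itself validates with Zod (see methodology); this sketch swaps in plain type guards so it runs without dependencies. The `CompactInput` interface, the `validateCompactInput` helper, and the choice of which fields are optional are assumptions inferred from the examples, not the project's actual schema.

```typescript
// Hypothetical shapes inferred from the example payloads above.
interface Decision {
  chose: string;
  over: string[];
  reason: string;
}

interface CompactInput {
  type: string;
  title: string;
  source?: string;
  key_insights?: string[];
  decisions?: Decision[];
  refs?: string[];
  recommendations?: string[];
  anti_patterns?: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

function isDecision(value: unknown): value is Decision {
  if (typeof value !== "object" || value === null) return false;
  const d = value as Record<string, unknown>;
  return typeof d["chose"] === "string" && isStringArray(d["over"]) && typeof d["reason"] === "string";
}

// Returns a list of problems; an empty list means the input looks valid.
function validateCompactInput(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["input must be an object"];
  const input = value as Record<string, unknown>;
  const problems: string[] = [];

  const type = input["type"];
  const title = input["title"];
  const source = input["source"];
  const decisions = input["decisions"];

  if (typeof type !== "string" || type.length === 0) problems.push("type must be a non-empty string");
  if (typeof title !== "string" || title.length === 0) problems.push("title must be a non-empty string");
  if (source !== undefined && typeof source !== "string") problems.push("source must be a string");

  // Every list-valued field in the examples is an array of strings.
  ["key_insights", "refs", "recommendations", "anti_patterns"].forEach((field) => {
    const list = input[field];
    if (list !== undefined && !isStringArray(list)) problems.push(`${field} must be an array of strings`);
  });

  if (decisions !== undefined && !(Array.isArray(decisions) && decisions.every(isDecision))) {
    problems.push("decisions must be an array of { chose, over, reason }");
  }
  return problems;
}

function isCompactInput(value: unknown): value is CompactInput {
  return validateCompactInput(value).length === 0;
}
```

Returning a list of problems rather than throwing makes it easy to report every issue with a payload at once.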

`praetorian_restore`

Search and restore context by injecting TOON tokens back into current context as needed.

⚜️ praetorian_restore query=<query>
  > "What did we learn about authentication?"
  > "Find the Docker container debugging session"
  > "Show recent architecture decisions"
  > "Search for MCP server implementation patterns"
  > "" (empty = recent compactions)

⚜️ restore | Search

┌─ ⚜️  ───────────────────────────────────────────────────── Search ─┐
│ Found 2 compactions
│ Query: "authentication"
└────────────────────────────────────────────────────────────────────┘

{
  "compactions": [
    {
      "id": "cpt_1765245903512_xk9mp1",
      "type": "decisions",
      "title": "Authentication Patterns",
      "relevance": 0.85,
      "key_insights": ["JWT with refresh tokens", "httpOnly cookies for XSS protection"],
      "refs": ["src/middleware/auth.ts:45", "src/routes/login.ts:23"]
    },
    {
      "id": "cpt_1765245902396_nxetoc",
      "type": "web_research",
      "title": "OAuth2 Best Practices",
      "relevance": 0.72,
      "key_insights": ["PKCE flow for public clients", "Token rotation every 15min"]
    }
  ],
  "total": 2
}

⚜️ restore | Recent

┌─ ⚜️  ───────────────────────────────────────────────────── Recent ─┐
│ Found 3 compactions
└────────────────────────────────────────────────────────────────────┘

Status indicators:

Created - New compaction saved
Merged - Updated existing compaction (>70% title similarity via Jaccard index)
Search - Search results returned (keyword matching)
Recent - Recent compactions listed (by updated time)
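
The Merged status depends on a Jaccard index over titles. Below is a small sketch of how such a check could look in TypeScript. The tokenization (lowercase, split on non-alphanumerics) and the names `titleTokens`, `jaccard`, and `shouldMerge` are assumptions for illustration; only the >70% threshold comes from this README.

```typescript
// Split a title into a set of lowercase words (tokenization is an assumption).
function titleTokens(title: string): Set<string> {
  return new Set(
    title
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((word) => word.length > 0)
  );
}

// Jaccard index: |A ∩ B| / |A ∪ B|, defined here as 0 when both sets are empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

const MERGE_THRESHOLD = 0.7; // ">70% title similarity" from the status list

function shouldMerge(existingTitle: string, incomingTitle: string): boolean {
  return jaccard(titleTokens(existingTitle), titleTokens(incomingTitle)) > MERGE_THRESHOLD;
}

// 3 shared words out of 4 distinct words = 0.75, so these would merge.
console.log(shouldMerge("ACE Framework Research", "ACE Framework Research Notes")); // true
// 2 shared words out of 3 distinct words ≈ 0.67, so these would not.
console.log(shouldMerge("Authentication Patterns", "Authentication Patterns Review")); // false
```

With short titles a single extra word can move the score across the threshold, so consistent titles matter if you want compactions to merge.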

Praetorian is designed for heavy, frequent use. The more you compact, the more you save.

When to compact:

After every WebFetch
After every Task/subagent completes
After reading multiple files
After making decisions
During long conversations (proactive compaction)
Before context gets >60% full

Real-world example session:

Compaction	Before (tokens)	After (tokens)	Saved (tokens)
Web research (3 URLs)	4,500	300	4,200
Subagent outputs (2)	3,500	300	3,200
Architecture debates	5,000	300	4,700
Hook research	1,500	150	1,350
Total	14,500	1,050	13,450 (93%)

Next session: praetorian_restore loads ~1,000 tokens. Instant resume, no re-research.
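
The totals in the example session can be reproduced with a few lines of arithmetic. This sketch only re-adds the numbers from the table; the `CompactionRow` type and `summarize` helper are hypothetical names, not part of the server.

```typescript
interface CompactionRow {
  label: string;
  before: number; // tokens before compaction
  after: number; // tokens after compaction
}

// Rows copied from the example session table.
const session: CompactionRow[] = [
  { label: "Web research (3 URLs)", before: 4500, after: 300 },
  { label: "Subagent outputs (2)", before: 3500, after: 300 },
  { label: "Architecture debates", before: 5000, after: 300 },
  { label: "Hook research", before: 1500, after: 150 },
];

function summarize(rows: CompactionRow[]) {
  const before = rows.reduce((sum, row) => sum + row.before, 0);
  const after = rows.reduce((sum, row) => sum + row.after, 0);
  const saved = before - after;
  // Percentage of the original tokens that no longer need to be carried.
  const percent = before === 0 ? 0 : Math.round((saved / before) * 100);
  return { before, after, saved, percent };
}

const totals = summarize(session);
console.log(totals); // { before: 14500, after: 1050, saved: 13450, percent: 93 }
```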

methodology

How claude-praetorian-mcp works:

              ⚜️  claude-praetorian-mcp
              ════════════════════════


        compact (save)          restore (load)
        ──────────────          ──────────────

            INPUT                   QUERY
              │                       │
              ▼                       ▼
          ┌─────────┐           ┌─────────┐
          │   Zod   │           │ Inverted│
          │Validate │           │  Index  │
          └────┬────┘           └────┬────┘
               │                     │
               ▼                     ▼
          ┌─────────┐           ┌─────────┐
          │ Jaccard │           │  TOON   │
          │ >70% ?  │           │ Decode  │
          └──┬───┬──┘           └────┬────┘
             │   │                   │
          new│   │match              ▼
             │   │                OUTPUT
             ▼   ▼
          ┌─────────┐
          │  TOON   │
          │ Encode  │
          └────┬────┘
               │
               ▼
          ┌─────────┐
          │  Index  │
          │ Update  │
          └────┬────┘
               │
               ▼
            OUTPUT


        storage: .claude/praetorian/
        ────────────────────────────
        index.json            word index + metadata
        compactions/*.toon    encoded compaction files

Core optimizations:

TOON format: 30-60% fewer tokens than YAML/JSON
Zod validation: Production-grade runtime type safety
Jaccard similarity: Smart deduplication via title matching (>70% threshold)
Inverted index: Fast keyword search without vector embeddings
Smart merging: Combine similar compactions without duplication
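
To illustrate the inverted index idea, here is a minimal in-memory sketch in TypeScript: each word maps to the set of compaction IDs that contain it, so a query only touches the entries for its own words. The on-disk layout of index.json is not shown in this README, and the relevance score used here (fraction of query words matched) is an assumption, as are the names `addToIndex` and `search` and the IDs `cpt_a` and `cpt_b`.

```typescript
// word -> IDs of compactions containing that word
type InvertedIndex = Map<string, Set<string>>;

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0);
}

function addToIndex(index: InvertedIndex, id: string, text: string): void {
  tokenize(text).forEach((word) => {
    let ids = index.get(word);
    if (!ids) {
      ids = new Set<string>();
      index.set(word, ids);
    }
    ids.add(id);
  });
}

// Assumed scoring: share of distinct query words found in each compaction.
function search(index: InvertedIndex, query: string): Array<{ id: string; relevance: number }> {
  const words = Array.from(new Set(tokenize(query)));
  if (words.length === 0) return [];

  const hits = new Map<string, number>();
  words.forEach((word) => {
    const ids = index.get(word);
    if (!ids) return;
    ids.forEach((id) => hits.set(id, (hits.get(id) ?? 0) + 1));
  });

  return Array.from(hits.entries())
    .map(([id, count]) => ({ id, relevance: count / words.length }))
    .sort((a, b) => b.relevance - a.relevance);
}

const index: InvertedIndex = new Map();
addToIndex(index, "cpt_a", "Authentication Patterns JWT refresh tokens");
addToIndex(index, "cpt_b", "OAuth2 Best Practices token rotation");

console.log(search(index, "authentication tokens")); // [ { id: 'cpt_a', relevance: 1 } ]
```

Matching is exact on whole words in this sketch, which is why "tokens" does not hit the compaction that only contains "token".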

Design principles:

Incremental -- merge similar compactions via Jaccard similarity, don't replace
Token-minimal -- TOON format for 30-60% fewer tokens than YAML/JSON
Project-scoped -- each project stores in its own .claude/praetorian/
Flat files only -- no databases, no external services
Offline -- never leaves your machine, no network calls
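
The "merge, don't replace" principle can be sketched as a union of list fields that keeps existing entries and appends only what is new. This is an illustration under assumptions: the `Compaction` shape is reduced to two list fields, duplicates are detected by a trimmed, case-insensitive comparison, and the existing ID and title are kept. The real merge logic may differ.

```typescript
// Reduced, hypothetical shape of a stored compaction.
interface Compaction {
  id: string;
  title: string;
  key_insights: string[];
  refs: string[];
}

// Append incoming items that are not already present, preserving order.
function mergeUnique(existing: string[], incoming: string[]): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  existing.concat(incoming).forEach((item) => {
    const key = item.trim().toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(item);
    }
  });
  return merged;
}

function mergeCompaction(existing: Compaction, incoming: Omit<Compaction, "id">): Compaction {
  return {
    id: existing.id, // assumption: the merged record keeps the existing ID
    title: existing.title, // assumption: the existing title wins
    key_insights: mergeUnique(existing.key_insights, incoming.key_insights),
    refs: mergeUnique(existing.refs, incoming.refs),
  };
}

const stored: Compaction = {
  id: "cpt_example",
  title: "Authentication Patterns",
  key_insights: ["JWT with refresh tokens"],
  refs: ["src/middleware/auth.ts:45"],
};

const merged = mergeCompaction(stored, {
  title: "Authentication Patterns",
  key_insights: ["jwt with refresh tokens", "httpOnly cookies for XSS protection"],
  refs: ["src/routes/login.ts:23"],
});

console.log(merged.key_insights.length); // 2
```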

File access:

Stores in: <project>/.claude/praetorian/
Format: .toon files (encoded compactions) + index.json (word index + metadata)

development

git clone https://github.com/Vvkmnn/claude-praetorian-mcp && cd claude-praetorian-mcp
npm install && npm run build
npm test

Package requirements:

Node.js: >=20.0.0 (ES modules)
Runtime: @modelcontextprotocol/sdk, @toon-format/toon, zod
Zero external databases -- works with npx

Development workflow:

npm run build          # TypeScript compilation with executable permissions
npm run dev            # Watch mode with tsc --watch
npm run start          # Run the MCP server directly
npm run lint           # ESLint code quality checks
npm run lint:fix       # Auto-fix linting issues
npm run format         # Prettier formatting (src/)
npm run format:check   # Check formatting without changes
npm run typecheck      # TypeScript validation without emit
npm run test           # Lint + type check
npm run prepublishOnly # Pre-publish validation (build + lint + format:check)

Git hooks (via Husky):

pre-commit: Auto-formats staged .ts files with Prettier and ESLint

Contributing:

Fork the repository and create feature branches
Test with multiple compaction types before submitting PRs
Follow TypeScript strict mode and MCP protocol standards

Learn from examples:

Official MCP servers for reference implementations
TypeScript SDK for best practices
Creating Node.js modules for npm package development

license

A Roman Emperor AD 41 by Lawrence Alma-Tadema (1871). Gratus of the Praetorian Guard discovers Claudius hiding behind a curtain and declares him emperor. Walters Art Museum, public domain.