Cairo-Context-MCP

ColbySerpa/Cairo-Context-MCP


Cairo Context MCP Toolkit

AI's Cairo Cartographer 🗺️



Navigate Cairo documentation with precision—your intelligent guide to Starknet's proving language 👨‍💻

A production-ready Model Context Protocol (MCP) server providing semantic search across Cairo/Starknet documentation using size-based chunking, multi-provider embeddings (Gemini/Mistral), and Qdrant vector search. Built to prevent context overflow while delivering the most relevant results. Your AI sidekick now learns Cairo syntax on the fly.


🚀 Quick Start

Prerequisites

1. Install Docker:

# macOS (using Homebrew)
brew install --cask docker

# Windows
# Download Docker Desktop from: https://www.docker.com/products/docker-desktop/

2. Start Qdrant vector database:

docker run -d -p 6333:6333 qdrant/qdrant

3. Install Node.js >= 18.0.0

Setup Cairo Context

# 1. Clone and install dependencies
git clone <repo-url>
cd cairo-context
npm install

# 2. Configure your embedding provider
cp .env.example .env
# Edit .env - choose "gemini" or "mistral" and add your API key

# 3. Generate embeddings (one-time setup, ~3-5 minutes)
npm run generate-embeddings

# 4. Build the project (MCP should be online after this step)
npm run build

The system will:

  • ✅ Download all 9 documentation sources from GitHub
  • ✅ Process 2,150+ chunks with beautiful progress bars
  • ✅ Generate embeddings with your chosen provider
  • ✅ Store everything in Qdrant for instant semantic search

Provider Options:

  • Gemini (Google AI): 3072D, highest quality - Get API key
  • Mistral (Mistral AI): 1024D, faster/cheaper - Get API key

Configure Your IDE

Recommended: Roo (VS Code Extension)

  1. Install Roo extension
  2. Click the 3 dots icon in Roo → MCP Servers → Edit Global Config
  3. Add this configuration:
{
  "mcpServers": {
    "cairo-context": {
      "command": "node",
      "args": [
        "C:\\\\Users\\\\<your-username>\\\\path\\\\to\\\\cairo-context\\\\dist\\\\src\\\\index.js"
      ],
      "alwaysAllow": [
        "get-cairo-example",
        "list-cairo-resources",
        "semantic-search-cairo"
      ],
      "disabled": false
    }
  }
}

Alternative IDEs:

  • Claude Code: claude mcp add cairo-context -- node /path/to/cairo-context/dist/src/index.js
  • Cursor: Add to ~/.cursor/mcp.json (same JSON structure as Roo)

Why "Cairo Cartographer"?

Just as a cartographer maps uncharted territory, this MCP server charts the landscape of Cairo documentation, guiding AI assistants through:

  • 3 MCP tools for semantic search and code retrieval
  • Complete Cairo Coder ingester system ported to Qdrant
  • 8 production Cairo examples
  • 2,150 documentation chunks across 9 comprehensive sources
    • Dynamic documentation processing from 6 GitHub repositories
    • 3 AI-summarized knowledge bases (Cairo Book, Core Library, Starknet Blog)

No more getting lost in massive docs. The Cartographer dynamically ingests, chunks, indexes, and retrieves relevant answers using natural language semantic search.


Features

🎯 Size-Based Semantic Search

Problem Solved: Earlier MCP servers returned entire massive sections whenever the AI ran an overly broad search, crashing the chat with the classic "prompt too long" error.

Our Solution:

  • Size-based chunking: Each chunk = ~500 tokens (not 15,000)
  • Automatic token rounding: max_tokens rounds up to nearest 500
  • Score-sorted results: Best matches first (1.0 → 0.0)
  • Configurable limits: Control both relevance (score_threshold) and output size (max_tokens)

Example of how the AI calls the MCP tool (3 parameters only):

semantic-search-cairo({
  query: "How do I implement Poseidon hash in a STARK circuit?",
  score_threshold: 0.5,  // 0.0-1.0 (higher = stricter)
  max_tokens: 10000      // Rounds to 10000 (20 chunks of 500 tokens)
})
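The token-rounding rule above can be sketched in a few lines (helper names are illustrative, not the actual implementation):

```typescript
// Sketch of size-based rounding: max_tokens rounds UP to the nearest
// multiple of 500, with a floor of 500, so the result is always a whole
// number of ~500-token chunks.
const CHUNK_TOKENS = 500;

function roundTokens(maxTokens: number): number {
  const rounded = Math.ceil(maxTokens / CHUNK_TOKENS) * CHUNK_TOKENS;
  return Math.max(rounded, CHUNK_TOKENS); // minimum of one chunk
}

function chunkCount(maxTokens: number): number {
  return roundTokens(maxTokens) / CHUNK_TOKENS;
}
```

So a request for `max_tokens: 10000` yields exactly 20 chunks, and any value under 500 still returns one chunk.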

📚 Documentation Sources (9 Total)

3 AI-Summarized Sources (pre-processed by Cairo Coder):

  • Cairo Book (234 chunks) - Comprehensive language reference
  • Core Library (167 chunks) - Standard library documentation
  • Starknet Blog (182 chunks) - Latest updates and announcements

6 Dynamically Ingested Sources (GitHub clones + processing):

  • Starknet Docs (222 chunks) - Official Starknet documentation
  • Starknet Foundry (681 chunks) - Testing framework documentation
  • Cairo By Example (134 chunks) - Practical code examples
  • OpenZeppelin (166 chunks) - Secure contract components
  • Scarb (181 chunks) - Package manager documentation
  • Starknet.js (183 chunks) - JavaScript SDK guides

Total: 2,150 chunks across 9 sources

💻 Code Examples

8 production-ready Cairo programs:

  • Counter Contract - Simple state management
  • ERC20 Token - Fungible token standard
  • ERC721 NFT - Non-fungible token standard
  • Ownable ERC20 - Access control pattern
  • Pausable ERC20 - Emergency stop pattern
  • Reentrancy Guard - Security pattern
  • Rollback Component - State recovery pattern
  • Debugging Values - Debug techniques

Architecture

Reverse-Engineered Cairo Coder Ingester → Qdrant

We fully ported Cairo Coder's PostgreSQL-based ingester system to work with Qdrant, achieving:

9 Sources → Dynamic Ingestion → Gemini Embeddings (3072D) → Qdrant → Semantic Search

Major Engineering Achievement:

  • Ported entire Cairo Coder ingester architecture (9 specialized ingesters, 20+ utilities)
  • Created a custom Qdrant adapter
  • Python wrappers for OpenZeppelin (Antora) and Starknet Foundry (mdbook)
  • Automatic dimension detection (3072D full Gemini embeddings)
  • Incremental updates (detects content changes, only updates modified chunks)

What We Replaced:

- PostgreSQL + pgvector extension
- Complex database migrations
- Manual dependency management

What We Built:

+ Qdrant vector database (1 container, auto-refilled)
+ Full Cairo Coder ingester compatibility for all 9 sources
+ Gemini or Mistral embeddings 
+ Python bypass wrappers for 2 of the sources
+ Automatic collection dimension matching
+ <200ms query latency across 2,150 chunks

Available Tools (3 Total)

1. semantic-search-cairo (Primary Search Tool)

Semantic search using natural language queries, powered by Gemini embeddings and Qdrant.

Parameters (3 total):

  • query (required) - Natural language question
  • score_threshold (optional) - 0.0-1.0 in steps of 0.05 (default: 0.5)
  • max_tokens (optional) - Default: 50000, minimum: 500 (auto-rounds to nearest 500)

Token Rounding Examples:

  • 290 → 500 (1 chunk minimum)
  • 750 → 1000 (2 chunks)
  • 2400 → 2500 (5 chunks)
  • 10000 → 10000 (20 chunks)
  • 50000 → 50000 (100 chunks, default)

Example Usage:

// Broad search with default settings
{
  query: "How do I implement Poseidon hash in a STARK circuit?",
  score_threshold: 0.5,
  max_tokens: 50000
}

// Precise search with limited output
{
  query: "felt252 modular arithmetic",
  score_threshold: 0.7,
  max_tokens: 5000  // Returns ~10 highly relevant chunks
}

// Exploration mode
{
  query: "storage optimization techniques",
  score_threshold: 0.3,
  max_tokens: 20000  // Returns ~40 loosely related chunks
}
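Combining score sorting with the token budget, result selection behaves roughly like this sketch (the `Chunk` shape and function name are illustrative assumptions, not the server's actual code):

```typescript
// Hypothetical sketch: results are sorted best-score-first, then chunks
// are kept until the max_tokens budget would be exceeded.
interface Chunk { text: string; score: number; tokens: number; }

function trimToBudget(chunks: Chunk[], maxTokens: number): Chunk[] {
  const byScore = [...chunks].sort((a, b) => b.score - a.score); // best first
  const kept: Chunk[] = [];
  let used = 0;
  for (const c of byScore) {
    if (used + c.tokens > maxTokens) break; // budget exhausted
    kept.push(c);
    used += c.tokens;
  }
  return kept;
}
```

This is why a `max_tokens` of 5000 returns roughly the 10 highest-scoring chunks rather than an arbitrary slice of the docs.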

2. get-cairo-example

Retrieve complete Cairo code examples.

Parameters:

  • example_id (required) - One of:
    • counter - Simple state management
    • erc20 - Fungible token
    • erc721 - NFT standard
    • ownable-erc20 - Access control
    • pausable-erc20 - Emergency stop
    • reentrancy-guard - Security pattern
    • rollback-component - State recovery
    • debugging - Debug techniques

Example:

example_id: "erc20"

3. list-cairo-resources

List all available documentation sources and examples.

Parameters: none — just call the tool.


Embeddings

Supported Providers:

Gemini (Google AI) - Default

  • Model: gemini-embedding-001
  • Dimension: 3072 (highest quality)
  • Task Type: RETRIEVAL_DOCUMENT for docs, RETRIEVAL_QUERY for queries
  • Cost: ~$0.0015 for 2,150 chunks
  • Get API Key: https://aistudio.google.com/api-keys

Mistral AI - Alternative

  • Model: mistral-embed
  • Dimension: 1024 (faster and cheaper than Gemini)

Features:

  • Automatic dimension matching: System detects embedding size and recreates collection if needed
  • Provider switching: Change provider in .env and re-run embedding generation
  • LangChain integration: Both providers use unified interface
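The dimension-matching decision can be illustrated with a small sketch (function and constant names are assumptions for illustration; the real check lives in the ingester):

```typescript
// Gemini produces 3072-dimensional vectors, Mistral 1024-dimensional ones.
// If the existing Qdrant collection's vector size differs from the active
// provider's, the collection must be dropped and recreated.
const PROVIDER_DIMS: Record<string, number> = { gemini: 3072, mistral: 1024 };

function needsRecreate(collectionDim: number | null, provider: string): boolean {
  const wanted = PROVIDER_DIMS[provider];
  if (wanted === undefined) throw new Error(`unknown provider: ${provider}`);
  // Recreate when no collection exists or the stored vector size differs.
  return collectionDim === null || collectionDim !== wanted;
}
```

Switching the provider in `.env` therefore triggers a collection rebuild on the next embedding run.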

Vector Database

  • Qdrant running on localhost:6333
  • Collection: cairo-docs
  • Distance Metric: Cosine similarity
  • Vectors: 2,150 (dynamically ingested from 9 sources)
  • Automatic management: Collection auto-recreated if dimension mismatch detected
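For reference, cosine similarity — the distance metric the collection uses — can be computed as follows (a standalone sketch, not code from this repo):

```typescript
// Cosine similarity: 1.0 means identical direction (near-duplicate content),
// 0.0 means orthogonal (unrelated) vectors. Qdrant returns this as the
// search score that score_threshold filters on.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```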

Search Quality

Score Threshold Guide:

  • 0.9-1.0 - Nearly identical matches only
  • 0.7-0.9 - High similarity (recommended for precise queries)
  • 0.5-0.7 - Moderate similarity (default, good balance)
  • 0.3-0.5 - Broader matching
  • 0.0-0.3 - Very loose matching (may include unrelated results)

Token Limit Guide:

  • 500 - Minimum (1 chunk)
  • 5000 - Quick reference (~10 chunks)
  • 10000 - Moderate exploration (~20 chunks)
  • 50000 - Deep dive (default, ~100 chunks)

Performance

  • Startup: <100ms (no initialization overhead)
  • Search: <200ms (Gemini embedding + Qdrant lookup)
  • Memory: <50MB (lightweight compared to RAG systems)
  • Storage: ~5KB per chunk in Qdrant (2,150 chunks = ~10.7MB)
  • Ingestion: ~1-2 minutes to download and process all 9 sources
  • Incremental updates: Only re-processes changed documentation

License

MIT

Credits

  • Ingester System: Ported from KasarLabs/cairo-coder (complete reverse-engineering and Qdrant adaptation)
  • Documentation: 9 sources including Cairo Coder's 3 AI-summarized docs + 6 dynamically ingested repositories
  • Architecture: MCP server pattern with a custom Qdrant vector store, inspired by Roo's codebase_search tool