dataform-mcp-server

MHooson/dataform-mcp-server

3.2

If you are the rightful owner of dataform-mcp-server and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.

A Model Context Protocol (MCP) server that provides Large Language Models with access to comprehensive Dataform documentation.

Tools
5
Resources
0
Prompts
0

Dataform Documentation MCP Server

npm version License: MIT Node Version

A Model Context Protocol (MCP) server for Claude Code (VS Code extension) that provides access to comprehensive Dataform documentation. Get instant answers about Dataform's features, syntax, and best practices while coding.

✨ New in v0.2.0: Pre-built documentation database included! Install via npm and start using immediately - no ingestion wait required.

Features

  • Semantic Search: Search Dataform documentation using natural language queries
  • Concept Retrieval: Get detailed explanations of Dataform concepts with examples
  • Syntax Reference: Look up syntax for specific Dataform constructs
  • Error Solutions: Find solutions for common Dataform errors
  • Best Practices: Retrieve curated best practices for various use cases

Installation

From npm (Recommended)

The easiest way to install with a pre-built documentation database:

npm install -g dataform-docs-mcp

Includes: Pre-built database with all Dataform documentation (~5.5MB download, 12.6MB unpacked)

From Source (Development)

For contributors or those who want to build from source:

# Clone the repository
git clone https://github.com/mhooson/dataform-mcp.git
cd dataform-mcp

# Install dependencies
npm install

# Build the project
npm run build

# Run initial documentation ingestion
npm run ingest

Quick Start

Prerequisites

Before starting, ensure you have:

  • Node.js 18+ installed
  • ChromaDB installed: pip install chromadb
  • ~13MB disk space for the documentation database
  • OpenAI API key required for query embeddings: Get one here
    • Used to convert search queries to embeddings (~$0.0001 per query)
    • Also needed for updating the documentation database

Quick Start (npm installation)

The package includes a pre-built database with all Dataform documentation, so you can start using it immediately!

Step 1: Install the package

npm install -g dataform-docs-mcp

This installs the package including the pre-built documentation database (~5.5MB download, 12.6MB unpacked).

Step 2: Start ChromaDB server

# The database is included in the npm package installation
# Find the package location:
npm root -g  # Example: /usr/local/lib/node_modules

# Start ChromaDB pointing to the installed database
chroma run --path $(npm root -g)/dataform-docs-mcp/data/vectordb --port 8000

Keep this terminal running.

Step 3: Start the MCP server

dataform-mcp serve

That's it! The server is ready to use with the pre-built database.

Optional: Update the documentation later

# Only needed if you want the latest Dataform docs
export OPENAI_API_KEY=your-api-key-here
dataform-mcp update

The update command refreshes the database with the latest documentation from Google Cloud.

Step 4: Configure your IDE (see IDE Configuration section below)


Alternative: Source Installation Quick Start

For those installing from source (see Installation section above):

1. Start ChromaDB Server

The MCP server requires ChromaDB to be running:

chroma run --path ./data/vectordb --port 8000

Keep this running in a separate terminal.

Note: ChromaDB must be running on port 8000 (default) for the MCP server to connect.

2. Configure Claude Code (VS Code Extension)

For npm installation:

Create a .mcp.json file in your project root:

{
  "mcpServers": {
    "dataform-docs": {
      "command": "dataform-mcp",
      "args": ["serve"],
      "env": {
        "OPENAI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Or use the CLI command:

claude mcp add --transport stdio dataform-docs \
  --env OPENAI_API_KEY=your-api-key-here \
  -- dataform-mcp serve

For development (from source):

{
  "mcpServers": {
    "dataform-docs": {
      "command": "node",
      "args": ["/absolute/path/to/dataform-mcp/dist/src/index.js"],
      "env": {
        "OPENAI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Scope Options:

  • --scope local (default): Private to current project
  • --scope project: Shared with team via version control
  • --scope user: Available across all your projects

Why OpenAI API key is required: The server needs to convert your search queries into embeddings to search the vector database. This incurs minimal cost (~$0.0001 per query).

3. Start the MCP Server

For development (from source):

npm run dev

Or for production (after global install):

dataform-mcp serve

Available Tools

1. search_dataform_docs

Search Dataform documentation semantically.

Parameters:

  • query (string, required): The search query
  • max_results (number, optional): Maximum results to return (default: 5)
  • doc_type (array, optional): Filter by type: ["guide", "reference", "tutorial", "api"]
  • include_code (boolean, optional): Prioritize code examples (default: false)

Example:

{
  "query": "How do I create an incremental table?",
  "max_results": 3,
  "include_code": true
}

2. get_dataform_concept

Retrieve detailed explanation of a Dataform concept.

Parameters:

  • concept (string, required): The concept name (e.g., "assertions", "dependencies")

Example:

{
  "concept": "incremental tables"
}

3. get_syntax_reference

Get syntax reference for Dataform constructs.

Parameters:

  • construct (string, required): The construct name (e.g., "config block", "ref() function")
  • language (string, optional): "sqlx" or "javascript"

Example:

{
  "construct": "ref() function",
  "language": "sqlx"
}

4. search_error_solutions

Find solutions for Dataform errors.

Parameters:

  • error_message (string, required): The error message
  • context (string, optional): Additional context

Example:

{
  "error_message": "Cannot find module '@dataform/core'",
  "context": "Occurs when running dataform compile"
}

5. get_best_practices

Retrieve best practices for specific topics.

Parameters:

  • topic (string, required): The topic (e.g., "testing", "performance", "incremental models")

Example:

{
  "topic": "testing"
}

Project Structure

dataform-mcp/
├── src/
│   ├── index.ts              # MCP server entry point
│   ├── cli.ts                # CLI interface
│   ├── tools/                # Tool implementations
│   ├── ingestion/            # Documentation scraping & processing
│   ├── storage/              # Vector & metadata storage
│   ├── search/               # Search implementations
│   └── utils/                # Utilities (config, logger)
├── data/
│   ├── vectordb/             # Vector database files
│   ├── metadata.db           # SQLite metadata
│   └── cache/                # Query cache
├── scripts/
│   ├── ingest.ts             # Initial ingestion
│   └── update.ts             # Update documentation
├── config.json               # Configuration
└── package.json

Configuration

Edit config.json to customize behavior:

{
  "documentation": {
    "base_url": "https://cloud.google.com/dataform/docs",
    "update_frequency": "weekly"
  },
  "embeddings": {
    "model": "text-embedding-3-small",
    "dimensions": 1536
  },
  "vector_store": {
    "type": "chroma",
    "path": "./data/vectordb"
  },
  "search": {
    "default_max_results": 5,
    "similarity_threshold": 0.7
  }
}

Create config.local.json for local overrides (gitignored).

Development

Build

npm run build

Run in Development Mode

npm run dev

Run Tests

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Run tests with coverage
npm test -- --coverage

Test Structure:

  • tests/unit/ - Unit tests for individual components
  • tests/integration/ - Integration tests for full pipelines
  • tests/fixtures/ - Mock data and test fixtures

Test Coverage:

  • ✓ Document parsing (HTML to Markdown)
  • ✓ Document chunking with overlap
  • ✓ Query caching
  • ✓ Tool handlers (all 5 tools)
  • ✓ Metadata storage
  • ✓ End-to-end ingestion pipeline

Update Documentation

npm run update

Maintenance

Scheduled Updates

Set up a cron job for weekly updates:

0 0 * * 0 dataform-mcp update

Monitoring

Logs are written to logs/dataform-mcp.log. Set log level via environment variable:

LOG_LEVEL=0 dataform-mcp serve  # DEBUG
LOG_LEVEL=1 dataform-mcp serve  # INFO (default)
LOG_LEVEL=2 dataform-mcp serve  # WARN
LOG_LEVEL=3 dataform-mcp serve  # ERROR

Architecture

The MCP server uses:

  • Vector Store: ChromaDB for semantic search
  • Embeddings: OpenAI text-embedding-3-small
  • Metadata: SQLite for structured data
  • Cache: In-memory cache for frequent queries
  • Scraping: Playwright + Cheerio for documentation extraction

Environment Variables

# OpenAI API key for embeddings
OPENAI_API_KEY=your_api_key_here

# Log level (0-3)
LOG_LEVEL=1

# Custom config path
CONFIG_PATH=/path/to/config.json

Troubleshooting

Native Module Error (better-sqlite3)

Error: The module 'better_sqlite3.node' was compiled against a different Node.js version

This occurs when the native SQLite module needs to be rebuilt for your Node.js version.

Solution:

# Find the package installation directory
cd $(npm root -g)/dataform-docs-mcp

# Rebuild native modules
npm rebuild better-sqlite3

# Or rebuild all native modules
npm rebuild

Why this happens: VS Code or Claude Code may use a different Node.js version than your system, requiring native modules to be recompiled.

Server won't start

  1. Check logs in logs/dataform-mcp.log
  2. Ensure all dependencies are installed: npm install
  3. Verify configuration in config.json
  4. Try rebuilding native modules (see above)

Search returns no results

  1. Run ingestion: npm run ingest
  2. Check if vector database exists in data/vectordb/
  3. Verify OpenAI API key is set
  4. Check that ChromaDB is running on port 8000

Documentation is stale

Run update manually:

npm run update

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

License

MIT

Support

For issues and questions:

Acknowledgments

Built with: