BigQuery Schema MCP Server
An MCP (Model Context Protocol) server that provides safe, read-only access to BigQuery schema information. This server enables LLMs to explore dataset structures, table schemas, and metadata without accessing actual data.
Features
- Schema-Only Access: Retrieve table and view schemas without accessing row data
- INFORMATION_SCHEMA Queries: Query BigQuery metadata catalog safely
- Resource Discovery: List all available datasets, tables, and views
- Type Safety: Full TypeScript implementation with strict type checking
- Secure by Design: No access to actual data, only structural information
Installation
Via npx (Recommended - No Installation Required)
Use the package directly with npx in your Claude Desktop configuration - see the Configuration section below.
Global Installation
npm install -g @waaronmorris/mcp-bigquery-schema@1.1.1
From Source
git clone https://github.com/waaronmorris/mcp-bigquery-schema.git
cd mcp-bigquery-schema
npm install
npm run build
Authentication
The server supports two authentication methods:
1. Service Account Key File (Recommended)
Create a service account in Google Cloud Console with the following roles:
- See the Security section below for the recommended custom role and permissions
Download the JSON key file and provide it via the --key-file argument.
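For reference, the downloaded key file is a standard Google Cloud service account key in JSON form, roughly like the abridged sketch below (the service account name and all values are placeholders, not output from this project):
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "schema-reader@your-project-id.iam.gserviceaccount.com",
  "client_id": "...",
  "token_uri": "https://oauth2.googleapis.com/token"
}
Keep this file out of version control; the pre-commit hooks described under Contributing are set up to block commits containing service account files.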
2. Application Default Credentials
If no key file is specified, the server will use Application Default Credentials (ADC). Set up ADC by running:
gcloud auth application-default login
Configuration
Add the server to your MCP client configuration:
Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location; on other platforms, edit claude_desktop_config.json in Claude's application data directory):
{
"mcpServers": {
"bigquery-schema": {
"command": "npx",
"args": [
"@waaronmorris/mcp-bigquery-schema@1.1.1",
"--project-id", "your-project-id",
"--location", "US",
"--key-file", "/path/to/service-account-key.json"
]
}
}
}
Without Key File (Using ADC)
{
"mcpServers": {
"bigquery-schema": {
"command": "npx",
"args": [
"@waaronmorris/mcp-bigquery-schema@1.1.1",
"--project-id", "your-project-id",
"--location", "US"
]
}
}
}
Alternative: Global Installation
If you prefer to install globally:
npm install -g @waaronmorris/mcp-bigquery-schema@1.1.1
Then use in Claude Desktop config:
{
"mcpServers": {
"bigquery-schema": {
"command": "mcp-bigquery-schema",
"args": [
"--project-id", "your-project-id",
"--location", "US"
]
}
}
}
Available Tools
explore_datasets
List and explore available BigQuery datasets in the project with metadata.
Parameters:
- includeMetadata (optional): Include detailed metadata for each dataset (location, creation time, etc.)
- datasetId (optional): Specific dataset ID to get detailed information about
- includeTableDescriptions (optional): Include table descriptions when listing tables (requires additional API calls)
Usage Examples:
// List all datasets with basic info
{}
// List all datasets with metadata
{
"includeMetadata": true
}
// Get specific dataset details with all tables and descriptions
{
"datasetId": "my_dataset",
"includeTableDescriptions": true
}
get_table_metadata
Get comprehensive metadata for a specific table including descriptions, labels, and schema details.
Parameters:
- dataset (required): The dataset containing the table
- table (required): The table name to get metadata for
- includeColumnStats (optional): Include column-level statistics from INFORMATION_SCHEMA
Usage Examples:
// Get basic table metadata with descriptions
{
"dataset": "my_dataset",
"table": "users"
}
// Get table metadata with column statistics
{
"dataset": "my_dataset",
"table": "users",
"includeColumnStats": true
}
Returns:
- Table description and labels
- Column names, types, and descriptions
- Partitioning and clustering configuration
- Size and row count statistics
- Creation and modification timestamps
- Optional: Usage statistics from query history
get_table_usage_stats
Analyze table usage patterns and popularity from query job history.
Parameters:
- dataset (optional): The dataset to analyze (analyzes all if not specified)
- table (optional): Specific table to analyze
- days (optional): Number of days of history to analyze (default: 7)
- limit (optional): Maximum number of results to return (default: 20)
Usage Example:
// Get most popular tables across all datasets
{
"days": 7,
"limit": 10
}
// Returns:
{
"period": "Last 7 days",
"totalTablesAnalyzed": 10,
"mostPopularTables": [
{
"dataset": "analytics",
"table": "user_events",
"queryCount": 150,
"uniqueUsers": 12,
"avgBytesProcessed": 1024000000,
"lastAccessed": "2024-01-15T10:30:00Z",
"estimatedCostUSD": 0.70
}
]
}
suggest_related_tables
Find tables commonly queried together based on usage patterns.
Parameters:
- dataset (required): The dataset containing the table
- table (required): The table to find related tables for
- days (optional): Number of days of history to analyze (default: 30)
Usage Example:
// Find tables often joined with user_events
{
"dataset": "analytics",
"table": "user_events",
"days": 30
}
// Returns:
{
"sourceTable": "analytics.user_events",
"analysisPeriod": "Last 30 days",
"relatedTables": [
{
"dataset": "analytics",
"table": "sessions",
"coOccurrenceCount": 45,
"suggestedJoinLikelihood": "high",
"uniqueUsers": 8
}
]
}
query_information_schema
Query BigQuery INFORMATION_SCHEMA views to retrieve metadata about datasets, tables, columns, and more.
Parameters:
- dataset (required): The dataset to query
- table (required): The INFORMATION_SCHEMA table to query
- filter (optional): WHERE clause conditions
Supported INFORMATION_SCHEMA tables:
- TABLES - List all tables and views in a dataset
- COLUMNS - Get column information for tables
- VIEWS - View definitions
- TABLE_OPTIONS - Table-level options and settings
- COLUMN_FIELD_PATHS - Nested field information
- PARTITIONS - Partition information
- TABLE_SNAPSHOTS - Table snapshot metadata
- TABLE_STORAGE - Storage statistics
- ROUTINES - Stored procedures and functions
- ROUTINE_OPTIONS - Routine options
- PARAMETERS - Routine parameters
- SCHEMATA - Dataset information
- SCHEMATA_OPTIONS - Dataset options
- JOBS - Query job history and metadata
- JOBS_BY_PROJECT - Project-level job information
Example queries:
// List all tables in a dataset
{
"dataset": "my_dataset",
"table": "TABLES",
"filter": "table_type = 'BASE TABLE'"
}
// Get columns for a specific table
{
"dataset": "my_dataset",
"table": "COLUMNS",
"filter": "table_name = 'my_table'"
}
// Find all partitioned tables
{
"dataset": "my_dataset",
"table": "TABLES",
"filter": "is_partitioned = TRUE"
}
Available Resources
The server provides schema information as resources for all tables and views in your BigQuery project:
- URI Format: bigquery://project-id/dataset-id/table-id/schema
- Content: JSON schema definition including field names, types, modes, and descriptions
Examples
Exploring Available Datasets
// The LLM can request:
"What datasets are available in this project?"
// This will use the explore_datasets tool:
{
"includeMetadata": true
}
// Returns:
{
"projectId": "my-project",
"datasetCount": 3,
"datasets": [
{
"id": "analytics",
"location": "US",
"creationTime": "2024-01-01T00:00:00.000Z",
"description": "Analytics data",
"tableCount": 15,
"labels": {}
},
// ...
]
}
Getting Table Schema
// The LLM can request:
"Show me the schema for the users table in the analytics dataset"
// This will fetch the resource:
// bigquery://my-project/analytics/users/schema
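The resource content is the table's JSON schema definition; a hypothetical response for this users table might look like the following (field names and descriptions are illustrative only):
{
  "fields": [
    {
      "name": "user_id",
      "type": "STRING",
      "mode": "REQUIRED",
      "description": "Primary identifier for the user"
    },
    {
      "name": "created_at",
      "type": "TIMESTAMP",
      "mode": "NULLABLE",
      "description": "When the user account was created"
    }
  ]
}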
Exploring Dataset Structure
// The LLM can request:
"What tables are in the sales dataset?"
// This will query:
{
"dataset": "sales",
"table": "TABLES"
}
Finding Column Information
// The LLM can request:
"Show me all columns in the orders table"
// This will query:
{
"dataset": "sales",
"table": "COLUMNS",
"filter": "table_name = 'orders'"
}
Working with Table and Column Descriptions
// Get comprehensive table metadata including descriptions
{
"dataset": "analytics",
"table": "user_events"
}
// Returns:
{
"dataset": "analytics",
"table": "user_events",
"description": "Stores all user interaction events for analytics",
"labels": {
"team": "analytics",
"pii": "true"
},
"schema": {
"fields": [
{
"name": "event_id",
"type": "STRING",
"mode": "REQUIRED",
"description": "Unique identifier for each event"
},
{
"name": "user_id",
"type": "STRING",
"mode": "REQUIRED",
"description": "Reference to the user who triggered the event"
},
{
"name": "event_type",
"type": "STRING",
"mode": "NULLABLE",
"description": "Type of event (click, view, purchase, etc.)"
}
]
},
"partitioning": {
"timePartitioning": {
"type": "DAY",
"field": "event_timestamp"
}
}
}
// Query INFORMATION_SCHEMA for column descriptions
{
"dataset": "analytics",
"table": "COLUMNS",
"filter": "table_name = 'user_events' AND description IS NOT NULL"
}
Using Table Usage Analytics
The MCP server can analyze query history to identify popular tables and common join patterns:
// Find the most frequently queried tables in the last week
{
"tool": "get_table_usage_stats",
"days": 7,
"limit": 5
}
// Get usage statistics with table metadata
{
"tool": "get_table_metadata",
"dataset": "analytics",
"table": "user_events",
"includeUsageStats": true
}
// Discover tables commonly used together
{
"tool": "suggest_related_tables",
"dataset": "analytics",
"table": "user_events"
}
Benefits of Usage Analytics:
- Prioritize Investigation: Focus on the most frequently accessed tables first
- Understand Relationships: Discover which tables are commonly joined
- Cost Optimization: Identify expensive queries and optimize them
- User Patterns: See which teams or services use specific tables
- Data Freshness: Know when tables were last accessed
Security
Multi-Layer Security Architecture
This server implements defense-in-depth with multiple security layers to ensure it can NEVER access actual data:
Layer 1: Input Validation
- Project ID Validation: Only lowercase letters, numbers, and hyphens
- Dataset Name Validation: Only alphanumeric and underscores
- Table Whitelist: Only approved INFORMATION_SCHEMA tables
- Filter Sanitization: Blocks SQL injection attempts
Layer 2: Query Pattern Enforcement
- Strict Query Format: Only SELECT * FROM INFORMATION_SCHEMA queries are allowed
- Forbidden Keywords: Blocks INSERT, UPDATE, DELETE, DROP, CREATE, etc.
- Forbidden Patterns: Blocks JOINs, UNIONs, subqueries, CTEs
- Comment Prevention: Blocks SQL comments (--, /*, #)
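As a hypothetical illustration of these rules (not captured validator output), the first filter below satisfies the checks above, while the second would be rejected because it contains a forbidden keyword (DROP) and a SQL comment (--):
// Accepted: plain column comparison against an INFORMATION_SCHEMA view
{
  "dataset": "my_dataset",
  "table": "COLUMNS",
  "filter": "table_name = 'orders'"
}
// Rejected: attempts statement injection and uses a SQL comment
{
  "dataset": "my_dataset",
  "table": "COLUMNS",
  "filter": "table_name = 'orders'; DROP TABLE orders; --"
}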
Layer 3: Safe Query Execution
- Query Wrapper: Additional validation before execution
- Read-Only Mode: Enforced at BigQuery client level
- Timeout Limits: 30-second maximum query execution
- Result Size Limits: Maximum 10,000 rows
Layer 4: Result Validation
- Field Inspection: Checks for PII-like field names
- Data Pattern Detection: Identifies potential data leaks
- Size Validation: Prevents large data exfiltration
Recommended IAM Permissions
For maximum security, create a custom IAM role with ONLY these permissions:
{
"title": "BigQuery Schema Reader",
"description": "Read-only access to BigQuery metadata only",
"permissions": [
"bigquery.datasets.get",
"bigquery.tables.get",
"bigquery.tables.list",
"bigquery.jobs.create"
]
}
DO NOT grant these permissions:
- ❌ bigquery.tables.getData - Allows reading table data
- ❌ bigquery.tables.export - Allows exporting data
- ❌ bigquery.datasets.update - Allows modifying datasets
- ❌ bigquery.tables.update - Allows modifying tables
- ❌ bigquery.tables.delete - Allows deleting tables
Security Best Practices
- Use a Dedicated Service Account: Create a service account specifically for this MCP server
- Minimal Permissions: Use the custom role above, not predefined roles
- Regular Audits: Review Cloud Audit Logs for any suspicious queries
- Network Restrictions: Use VPC Service Controls if available
- Key Rotation: Rotate service account keys regularly
- Monitor Usage: Set up alerts for unusual query patterns
Audit Logging
All queries are logged to stderr with an [AUDIT] prefix for monitoring:
[AUDIT] Executing INFORMATION_SCHEMA query: SELECT * FROM `project.dataset.INFORMATION_SCHEMA.TABLES`
[AUDIT] Query completed in 123ms, returned 42 rows
What This Server CANNOT Do
- ❌ Cannot read actual table data
- ❌ Cannot execute arbitrary SQL queries
- ❌ Cannot modify any data or schema
- ❌ Cannot access tables outside INFORMATION_SCHEMA
- ❌ Cannot perform JOIN operations with data tables
- ❌ Cannot execute stored procedures or functions
- ❌ Cannot export or import data
Development
Building from Source
npm install
npm run build
Development Mode
npm run dev
Clean Build
npm run clean
npm run build
Contributing
This project uses Conventional Commits to maintain a clear and meaningful commit history.
Making Commits
We use Commitizen to help create properly formatted commit messages:
# Instead of git commit, use:
npm run commit
# This will launch an interactive prompt to help you create a conventional commit
The interactive prompt will guide you through:
- Type: Select the type of change (feat, fix, docs, etc.)
- Scope: Optionally specify what part of the codebase changed
- Description: Write a short description
- Body: Optionally provide additional details
- Breaking Changes: Note any breaking changes
- Issues: Reference any related issues
Commit Types
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- style: Code style changes (formatting, etc.)
- refactor: Code refactoring
- perf: Performance improvements
- test: Adding or updating tests
- build: Build system changes
- ci: CI/CD changes
- chore: Other changes that don't modify src or test files
- revert: Revert a previous commit
- security: Security-related changes
Examples
# Using npm run commit:
npm run commit
# Follow the prompts to create: "feat(auth): add service account validation"
# Manual conventional commit (if needed):
git commit -m "fix: resolve location parameter validation issue"
git commit -m "feat(security): add gitleaks pre-commit hook"
git commit -m "docs: update README with commitizen instructions"
Pre-commit Hooks
All commits are automatically checked for:
- Conventional commit format: Ensures commit messages follow the standard
- Secret detection: Scans for potential credentials or API keys
- File validation: Blocks commits containing service account files
If a commit is rejected, review the error message and adjust accordingly.
Requirements
- Node.js >= 14.0.0
- Google Cloud Project with BigQuery API enabled
- Appropriate IAM permissions for BigQuery access
Troubleshooting
Authentication Issues
If you encounter authentication errors:
- Verify your service account key file path is correct
- Ensure the service account has the required BigQuery permissions
- Check that the BigQuery API is enabled in your Google Cloud project
Permission Errors
The service account needs:
- bigquery.datasets.get
- bigquery.tables.get
- bigquery.tables.list
- bigquery.jobs.create (for running INFORMATION_SCHEMA queries)
Location Issues
Ensure the --location parameter matches your BigQuery datasets' location. Common values:
- US - United States
- EU - European Union
- us-central1 - Specific regional locations
License
MIT
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
For issues and questions, please use the GitHub Issues page.