BigQuery MCP Server

A Model Context Protocol (MCP) server for Google BigQuery, enabling AI assistants to query and interact with BigQuery datasets.

Features

  • Query Catalog: Pre-defined, tested SQL queries exposed as MCP resources (no context window bloat!)
  • Custom SQL Queries: Execute parameterized BigQuery SQL queries when needed
  • Dataset Discovery: List datasets and tables in your project
  • Schema Inspection: Get table schemas with field types and descriptions
  • Simple Queries: Query tables with WHERE clauses and LIMIT without writing SQL
  • Type-Safe: Built with Pydantic for robust data validation
  • Configurable: YAML-based configuration with environment variable support
  • Scalable: Add hundreds of business queries without impacting LLM performance

Installation

Prerequisites

  • Python 3.13+ (recommended for full features)
    • Python 3.10+ works for most features; the code uses the | union type-hint syntax, which requires at least Python 3.10
  • Google Cloud Platform account with BigQuery enabled
  • GCP credentials (service account or default credentials)

Method 1: Using pip (Recommended)

Install from GCP Artifact Registry:

# Point pip at your GCP Artifact Registry for this install
pip install bigquery-mcp \
  --index-url https://us-central1-python.pkg.dev/<your-project-id>/<your-repo>/simple/

Or configure authentication and the index once:

# Authenticate with GCP Artifact Registry
gcloud auth application-default login
pip install keyring keyrings.google-artifactregistry-auth
pip config set global.index-url https://us-central1-python.pkg.dev/<your-project-id>/<your-repo>/simple/

# Install the package
pip install bigquery-mcp

# Run the server
bigquery-mcp

Method 2: Using pipx (Isolated Install)

For isolated installation without affecting your global Python environment:

# Install pipx if not already installed
pip install pipx
pipx ensurepath

# Install bigquery-mcp from Artifact Registry
pipx install bigquery-mcp \
  --pip-args="--index-url https://us-central1-python.pkg.dev/<your-project-id>/<your-repo>/simple/"

# Run the server
bigquery-mcp

Method 3: Using uv (Development)

For development with local source:

# Clone the repository
git clone <your-repo-url>
cd python-mcp

# Install dependencies
uv sync

# Run the server
uv run bigquery-mcp

Which Method Should I Use?

Method  Best For                          Pros                           Cons
pip     Production, system-wide installs  Standard tooling, simple       May affect global Python environment
pipx    User installs, multiple versions  Isolated environments per app  Extra tool required
uv      Development, contributing         Fast, full dev environment     Requires source code

Configuration

Option 1: Configuration File (Recommended)

Create config/bigquery.yaml:

project_id: "your-gcp-project-id"
dataset_id: "your-default-dataset"  # optional
credentials_path: "/path/to/service-account.json"  # optional
location: "US"  # optional, default: US
max_results: 1000  # optional, default: 1000

Option 2: Environment Variables

export GCP_PROJECT_ID="your-gcp-project-id"
export GCP_DATASET_ID="your-default-dataset"  # optional
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"  # optional
export GCP_LOCATION="US"  # optional
export GCP_MAX_RESULTS="1000"  # optional

Option 3: Custom Config Path

export BIGQUERY_MCP_CONFIG="/custom/path/to/config.yaml"
uv run bigquery-mcp
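
For illustration, layered loading of these options might look like the sketch below. This is hypothetical code, not the project's actual config.py; the field names simply mirror the options above.

# Hypothetical sketch: resolve config from a YAML file first, then
# fall back to environment variables. Not the server's real loader.
import os
from pathlib import Path

import yaml
from pydantic import BaseModel

class BigQueryConfig(BaseModel):
    project_id: str
    dataset_id: str | None = None
    credentials_path: str | None = None
    location: str = "US"
    max_results: int = 1000

def load_config() -> BigQueryConfig:
    # Honor BIGQUERY_MCP_CONFIG, else the default path
    path = Path(os.environ.get("BIGQUERY_MCP_CONFIG", "config/bigquery.yaml"))
    if path.exists():
        return BigQueryConfig(**yaml.safe_load(path.read_text()))
    return BigQueryConfig(
        project_id=os.environ["GCP_PROJECT_ID"],
        dataset_id=os.environ.get("GCP_DATASET_ID"),
        credentials_path=os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"),
        location=os.environ.get("GCP_LOCATION", "US"),
        max_results=int(os.environ.get("GCP_MAX_RESULTS", "1000")),
    )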

Authentication

The server supports multiple authentication methods (in order of precedence):

  1. Service Account File: Specified in config or GOOGLE_APPLICATION_CREDENTIALS
  2. Application Default Credentials: Uses gcloud credentials

To set up gcloud credentials:

gcloud auth application-default login
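
For illustration, that precedence might be implemented as follows (a hypothetical sketch built on google-auth, not the server's actual code):

# Hypothetical sketch of the documented credential precedence.
from google.auth import default as google_auth_default
from google.oauth2 import service_account

def resolve_credentials(credentials_path: str | None = None):
    # 1. An explicit service-account file wins
    if credentials_path:
        return service_account.Credentials.from_service_account_file(credentials_path)
    # 2. Otherwise fall back to Application Default Credentials
    #    (google-auth also honors GOOGLE_APPLICATION_CREDENTIALS here)
    credentials, _project = google_auth_default()
    return credentials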

Query Catalog

The server includes a query catalog system that allows you to define pre-written SQL queries that the LLM can discover and execute. This avoids bloating the LLM context window with hundreds of tool definitions.

How It Works

  1. Define Queries: Create YAML files in config/queries/ with your business queries
  2. Exposed as Resources: Queries are automatically exposed as MCP resources
  3. LLM Discovery: The LLM can search and read query definitions via resources
  4. Execute by ID: Use the execute_catalog_query tool to run queries

Benefits

  • No Context Bloat: Add hundreds of queries without impacting LLM performance
  • Natural Language: LLM matches user requests to catalog queries by description/tags
  • Consistent SQL: Pre-written, tested queries ensure reliability
  • Easy Maintenance: Update queries in YAML files without changing code

Example Query Definition

queries:
  - id: account_balance_reconciliation
    name: Account Balance Reconciliation
    description: Compare account balances between internal and external systems
    category: reconciliation
    tags: [reconciliation, accounting, balance]
    parameters:
      - name: start_date
        type: DATE
        description: Start date (YYYY-MM-DD)
        required: true
      - name: end_date
        type: DATE
        description: End date (YYYY-MM-DD)
        required: true
    sql: |
      SELECT account_id, internal_balance, external_balance,
             (internal_balance - external_balance) as discrepancy
      FROM ...
      WHERE date BETWEEN @start_date AND @end_date

Example Use Cases

The included example queries demonstrate:

  • Account Balance Reconciliation: Compare internal vs external balances
  • Wallet Balance Reconciliation: Verify wallet transactions match balances
  • Transaction Reconciliation: Match internal transactions with payment processors
  • Daily Transaction Summaries: Revenue and volume metrics by day
  • Customer Lifetime Value: LTV calculations and segmentation
  • Revenue by Category: Monthly breakdowns and trends

See config/queries/README.md for full documentation on creating your own queries.

Available Tools

1. execute_catalog_query

Execute a pre-defined query from the catalog. Use this for common business operations.

Parameters:

  • query_id (string, required): ID of the catalog query to execute
  • parameters (object, optional): Query parameters as key-value pairs
  • max_results (integer, optional): Maximum number of results to return

Example:

{
  "query_id": "account_balance_reconciliation",
  "parameters": {
    "start_date": "2024-01-01",
    "end_date": "2024-01-31"
  }
}

Discovery:

  • Use MCP resources (list_resources / read_resource) to discover available queries
  • The LLM automatically matches user requests to catalog queries

2. execute_query

Execute a custom BigQuery SQL query with optional parameters.

Parameters:

  • query (string, required): SQL query to execute (use @param_name for parameters)
  • parameters (object, optional): Query parameters as key-value pairs
  • max_results (integer, optional): Maximum number of results to return

Example:

{
  "query": "SELECT * FROM `project.dataset.table` WHERE name = @name AND age > @min_age",
  "parameters": {
    "name": "John",
    "min_age": 25
  },
  "max_results": 100
}
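
Under the hood, @param_name placeholders correspond to BigQuery named query parameters. A minimal sketch of the equivalent call with the google-cloud-bigquery client (illustrative only; the parameter types are assumed):

# Equivalent parameterized query issued directly against BigQuery.
from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project-id")
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("name", "STRING", "John"),
        bigquery.ScalarQueryParameter("min_age", "INT64", 25),
    ]
)
rows = client.query(
    "SELECT * FROM `project.dataset.table` WHERE name = @name AND age > @min_age",
    job_config=job_config,
).result(max_results=100)
for row in rows:
    print(dict(row))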

Catalog Query vs Custom SQL

The server provides intelligent guidance to the LLM on when to use catalog queries vs custom SQL:

  • Prefer catalog queries for common business operations and reporting
  • Use custom SQL for ad-hoc exploration or unique one-time queries
  • Resources expose catalog without polluting the tool list

3. list_datasets

List all datasets in the BigQuery project.

Parameters: None

Returns:

{
  "datasets": ["dataset1", "dataset2", "dataset3"]
}

4. list_tables

List all tables in a specific dataset.

Parameters:

  • dataset_id (string, required): Dataset ID to list tables from

Returns:

{
  "tables": ["table1", "table2", "table3"]
}

5. get_table_schema

Get the schema of a specific table.

Parameters:

  • dataset_id (string, required): Dataset ID
  • table_id (string, required): Table ID

Returns:

{
  "schema": [
    {
      "name": "id",
      "type": "INTEGER",
      "mode": "REQUIRED",
      "description": "Unique identifier"
    },
    {
      "name": "name",
      "type": "STRING",
      "mode": "NULLABLE",
      "description": "User name"
    }
  ]
}

6. query_table

Query a table with optional WHERE clause and LIMIT.

Parameters:

  • dataset_id (string, required): Dataset ID
  • table_id (string, required): Table ID
  • limit (integer, optional): Maximum number of rows to return
  • where (string, optional): WHERE clause (without the WHERE keyword)

Example:

{
  "dataset_id": "my_dataset",
  "table_id": "users",
  "limit": 10,
  "where": "age > 18 AND country = 'US'"
}
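
For reference, the tool plausibly assembles SQL along these lines (a hypothetical helper; the real implementation in bigquery_client.py may differ, e.g. in identifier validation and project qualification):

# Hypothetical sketch of how query_table could build its SQL.
def build_table_query(dataset_id: str, table_id: str,
                      where: str | None = None,
                      limit: int | None = None) -> str:
    sql = f"SELECT * FROM `{dataset_id}.{table_id}`"
    if where:
        sql += f" WHERE {where}"
    if limit is not None:
        sql += f" LIMIT {limit}"
    return sql

# The example above would produce:
# SELECT * FROM `my_dataset.users` WHERE age > 18 AND country = 'US' LIMIT 10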

MCP Resources

The server exposes catalog queries as MCP resources:

  • URI Format: bigquery://queries/{query_id}
  • Discovery: Use list_resources to see all available catalog queries
  • Details: Use read_resource to get full query details including SQL, parameters, and descriptions
  • No Context Bloat: Resources don't consume tool slots or context window space
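
For example, discovering and reading catalog queries from a Python MCP client might look like this sketch (using the reference mcp SDK; the query ID is the one from the catalog example above):

# Hypothetical client-side discovery of catalog query resources.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from pydantic import AnyUrl

async def main() -> None:
    params = StdioServerParameters(command="bigquery-mcp")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            listing = await session.list_resources()
            for resource in listing.resources:
                print(resource.uri, "-", resource.name)
            detail = await session.read_resource(
                AnyUrl("bigquery://queries/account_balance_reconciliation")
            )
            print(detail)

asyncio.run(main())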

Using with MCP Clients

Claude Desktop

Add to your Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json on macOS or %APPDATA%\Claude\claude_desktop_config.json on Windows):

Using pip or pipx (Recommended)

{
  "mcpServers": {
    "bigquery": {
      "command": "bigquery-mcp",
      "env": {
        "GCP_PROJECT_ID": "your-project-id",
        "GOOGLE_APPLICATION_CREDENTIALS": "/path/to/credentials.json"
      }
    }
  }
}

Using uv (Development)

{
  "mcpServers": {
    "bigquery": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/python-mcp",
        "run",
        "bigquery-mcp"
      ],
      "env": {
        "GCP_PROJECT_ID": "your-project-id",
        "GOOGLE_APPLICATION_CREDENTIALS": "/path/to/credentials.json"
      }
    }
  }
}

Other MCP Clients

The server communicates via stdio using the MCP protocol, so any MCP-compatible client can connect by launching:

bigquery-mcp

Development

Running Type Checks

uv run basedpyright src/

Running Linter

uv run ruff check src/

Formatting Code

uv run ruff format src/

Running Tests

uv run pytest

Project Structure

python-mcp/
├── src/
│   └── bigquery_mcp/
│       ├── __init__.py
│       ├── server.py          # Main MCP server
│       ├── bigquery_client.py # BigQuery client wrapper
│       ├── tools.py           # Tool definitions
│       ├── models.py          # Pydantic models
│       └── config.py          # Configuration loader
├── config/
│   └── bigquery.yaml.example  # Example configuration
├── examples/
│   └── example_queries.sql    # Sample SQL queries
├── tests/                     # Test files
├── pyproject.toml            # Project configuration
└── README.md

Architecture

The server follows the MCP Toolbox pattern:

  1. Configuration Layer (config.py): Loads settings from YAML or environment variables
  2. Client Layer (bigquery_client.py): Wraps BigQuery API with async methods
  3. Tools Layer (tools.py): Defines MCP tools and handles parameter validation
  4. Server Layer (server.py): MCP server implementation with stdio transport
  5. Models Layer (models.py): Pydantic models for type safety
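
As a rough sketch, the server layer resembles the FastMCP pattern from the Python MCP SDK (illustrative only, not the project's actual server.py):

# Hypothetical skeleton of the server layer using FastMCP over stdio.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("bigquery")

@mcp.tool()
def list_datasets() -> dict:
    """List all datasets in the BigQuery project."""
    # The real tool delegates to the BigQuery client wrapper.
    return {"datasets": ["dataset1", "dataset2"]}

if __name__ == "__main__":
    mcp.run()  # stdio transport by default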

Comparison to MCP Toolbox

This implementation is inspired by googleapis/genai-toolbox but:

  • Python-based instead of Go
  • BigQuery-specific instead of multi-database
  • Simpler POC focused on core functionality
  • Direct MCP integration without separate control plane

License

MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.