BigQuery MCP Server

A Model Context Protocol (MCP) server for Google BigQuery, enabling AI assistants to query and interact with BigQuery datasets.

Features

  • Query Catalog: Pre-defined, tested SQL queries exposed as MCP resources (no context window bloat!)
  • Custom SQL Queries: Execute parameterized BigQuery SQL queries when needed
  • Dataset Discovery: List datasets and tables in your project
  • Schema Inspection: Get table schemas with field types and descriptions
  • Simple Queries: Query tables with WHERE clauses and LIMIT without writing SQL
  • Type-Safe: Built with Pydantic for robust data validation
  • Configurable: YAML-based configuration with environment variable support
  • Scalable: Add hundreds of business queries without impacting LLM performance

Installation

Prerequisites

  • Python 3.13+ (recommended for full features)
    • Python 3.10+ works for most features; the code uses the | union type-hint syntax, which requires at least Python 3.10
  • Google Cloud Platform account with BigQuery enabled
  • GCP credentials (service account or default credentials)

Method 1: Using pip (Recommended)

Install from GCP Artifact Registry:

# Point pip at your GCP Artifact Registry for this install
pip install bigquery-mcp \
  --index-url https://us-central1-python.pkg.dev/<your-project-id>/<your-repo>/simple/

Or configure authentication and the index once:

# Authenticate with GCP Artifact Registry
gcloud auth application-default login
pip install keyring keyrings.google-artifactregistry-auth
pip config set global.index-url https://us-central1-python.pkg.dev/<your-project-id>/<your-repo>/simple/

# Install the package
pip install bigquery-mcp

# Run the server
bigquery-mcp

Method 2: Using pipx (Isolated Install)

For isolated installation without affecting your global Python environment:

# Install pipx if not already installed
pip install pipx
pipx ensurepath

# Install bigquery-mcp from Artifact Registry
pipx install bigquery-mcp \
  --pip-args="--index-url https://us-central1-python.pkg.dev/<your-project-id>/<your-repo>/simple/"

# Run the server
bigquery-mcp

Method 3: Using uv (Development)

For development with local source:

# Clone the repository
git clone <your-repo-url>
cd python-mcp

# Install dependencies
uv sync

# Run the server
uv run bigquery-mcp

Which Method Should I Use?

Method  Best For                          Pros                           Cons
pip     Production, system-wide installs  Standard tooling, simple       May affect global Python environment
pipx    User installs, multiple versions  Isolated environments per app  Extra tool required
uv      Development, contributing         Fast, full dev environment     Requires source code

Configuration

Option 1: Configuration File (Recommended)

Create config/bigquery.yaml:

project_id: "your-gcp-project-id"
dataset_id: "your-default-dataset"  # optional
credentials_path: "/path/to/service-account.json"  # optional
location: "US"  # optional, default: US
max_results: 1000  # optional, default: 1000

Option 2: Environment Variables

export GCP_PROJECT_ID="your-gcp-project-id"
export GCP_DATASET_ID="your-default-dataset"  # optional
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"  # optional
export GCP_LOCATION="US"  # optional
export GCP_MAX_RESULTS="1000"  # optional

Option 3: Custom Config Path

export BIGQUERY_MCP_CONFIG="/custom/path/to/config.yaml"
uv run bigquery-mcp
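
For illustration, layered loading of these options might look like the sketch below. This is hypothetical code, not the project's actual config.py; the field names simply mirror the options above.

# Hypothetical sketch: resolve config from a YAML file first, then
# fall back to environment variables. Not the server's real loader.
import os
from pathlib import Path

import yaml
from pydantic import BaseModel

class BigQueryConfig(BaseModel):
    project_id: str
    dataset_id: str | None = None
    credentials_path: str | None = None
    location: str = "US"
    max_results: int = 1000

def load_config() -> BigQueryConfig:
    # Honor BIGQUERY_MCP_CONFIG, else the default path
    path = Path(os.environ.get("BIGQUERY_MCP_CONFIG", "config/bigquery.yaml"))
    if path.exists():
        return BigQueryConfig(**yaml.safe_load(path.read_text()))
    return BigQueryConfig(
        project_id=os.environ["GCP_PROJECT_ID"],
        dataset_id=os.environ.get("GCP_DATASET_ID"),
        credentials_path=os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"),
        location=os.environ.get("GCP_LOCATION", "US"),
        max_results=int(os.environ.get("GCP_MAX_RESULTS", "1000")),
    )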

Authentication

The server supports multiple authentication methods (in order of precedence):

  1. Service Account File: Specified in config or GOOGLE_APPLICATION_CREDENTIALS
  2. Application Default Credentials: Uses gcloud credentials

To set up gcloud credentials:

gcloud auth application-default login
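
For illustration, that precedence might be implemented as follows (a hypothetical sketch built on google-auth, not the server's actual code):

# Hypothetical sketch of the documented credential precedence.
from google.auth import default as google_auth_default
from google.oauth2 import service_account

def resolve_credentials(credentials_path: str | None = None):
    # 1. An explicit service-account file wins
    if credentials_path:
        return service_account.Credentials.from_service_account_file(credentials_path)
    # 2. Otherwise fall back to Application Default Credentials
    #    (google-auth also honors GOOGLE_APPLICATION_CREDENTIALS here)
    credentials, _project = google_auth_default()
    return credentials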

Query Catalog

The server includes a query catalog system that allows you to define pre-written SQL queries that the LLM can discover and execute. This avoids bloating the LLM context window with hundreds of tool definitions.

How It Works

  1. Define Queries: Create YAML files in config/queries/ with your business queries
  2. Exposed as Resources: Queries are automatically exposed as MCP resources
  3. LLM Discovery: The LLM can search and read query definitions via resources
  4. Execute by ID: Use the execute_catalog_query tool to run queries

Benefits

  • No Context Bloat: Add hundreds of queries without impacting LLM performance
  • Natural Language: LLM matches user requests to catalog queries by description/tags
  • Consistent SQL: Pre-written, tested queries ensure reliability
  • Easy Maintenance: Update queries in YAML files without changing code

Example Query Definition

queries:
  - id: account_balance_reconciliation
    name: Account Balance Reconciliation
    description: Compare account balances between internal and external systems
    category: reconciliation
    tags: [reconciliation, accounting, balance]
    parameters:
      - name: start_date
        type: DATE
        description: Start date (YYYY-MM-DD)
        required: true
      - name: end_date
        type: DATE
        description: End date (YYYY-MM-DD)
        required: true
    sql: |
      SELECT account_id, internal_balance, external_balance,
             (internal_balance - external_balance) as discrepancy
      FROM ...
      WHERE date BETWEEN @start_date AND @end_date

Example Use Cases

The included example queries demonstrate:

  • Account Balance Reconciliation: Compare internal vs external balances
  • Wallet Balance Reconciliation: Verify wallet transactions match balances
  • Transaction Reconciliation: Match internal transactions with payment processors
  • Daily Transaction Summaries: Revenue and volume metrics by day
  • Customer Lifetime Value: LTV calculations and segmentation
  • Revenue by Category: Monthly breakdowns and trends

See config/queries/README.md for full documentation on creating your own queries.

Available Tools

1. execute_catalog_query

Execute a pre-defined query from the catalog. Use this for common business operations.

Parameters:

  • query_id (string, required): ID of the catalog query to execute
  • parameters (object, optional): Query parameters as key-value pairs
  • max_results (integer, optional): Maximum number of results to return

Example:

{
  "query_id": "account_balance_reconciliation",
  "parameters": {
    "start_date": "2024-01-01",
    "end_date": "2024-01-31"
  }
}

Discovery:

  • Use MCP resources (list_resources / read_resource) to discover available queries
  • The LLM automatically matches user requests to catalog queries

2. execute_query

Execute a custom BigQuery SQL query with optional parameters.

Parameters:

  • query (string, required): SQL query to execute (use @param_name for parameters)
  • parameters (object, optional): Query parameters as key-value pairs
  • max_results (integer, optional): Maximum number of results to return

Example:

{
  "query": "SELECT * FROM `project.dataset.table` WHERE name = @name AND age > @min_age",
  "parameters": {
    "name": "John",
    "min_age": 25
  },
  "max_results": 100
}
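
Under the hood, @param_name placeholders correspond to BigQuery named query parameters. A minimal sketch of the equivalent call with the google-cloud-bigquery client (illustrative only; the parameter types are assumed):

# Equivalent parameterized query issued directly against BigQuery.
from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project-id")
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("name", "STRING", "John"),
        bigquery.ScalarQueryParameter("min_age", "INT64", 25),
    ]
)
rows = client.query(
    "SELECT * FROM `project.dataset.table` WHERE name = @name AND age > @min_age",
    job_config=job_config,
).result(max_results=100)
for row in rows:
    print(dict(row))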

Catalog Query vs Custom SQL

The server provides intelligent guidance to the LLM on when to use catalog queries vs custom SQL:

  • Prefer catalog queries for common business operations and reporting
  • Use custom SQL for ad-hoc exploration or unique one-time queries
  • Resources expose catalog without polluting the tool list

3. list_datasets

List all datasets in the BigQuery project.

Parameters: None

Returns:

{
  "datasets": ["dataset1", "dataset2", "dataset3"]
}

4. list_tables

List all tables in a specific dataset.

Parameters:

  • dataset_id (string, required): Dataset ID to list tables from

Returns:

{
  "tables": ["table1", "table2", "table3"]
}

5. get_table_schema

Get the schema of a specific table.

Parameters:

  • dataset_id (string, required): Dataset ID
  • table_id (string, required): Table ID

Returns:

{
  "schema": [
    {
      "name": "id",
      "type": "INTEGER",
      "mode": "REQUIRED",
      "description": "Unique identifier"
    },
    {
      "name": "name",
      "type": "STRING",
      "mode": "NULLABLE",
      "description": "User name"
    }
  ]
}

6. query_table

Query a table with optional WHERE clause and LIMIT.

Parameters:

  • dataset_id (string, required): Dataset ID
  • table_id (string, required): Table ID
  • limit (integer, optional): Maximum number of rows to return
  • where (string, optional): WHERE clause (without the WHERE keyword)

Example:

{
  "dataset_id": "my_dataset",
  "table_id": "users",
  "limit": 10,
  "where": "age > 18 AND country = 'US'"
}
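
For reference, the tool plausibly assembles SQL along these lines (a hypothetical helper; the real implementation in bigquery_client.py may differ, e.g. in identifier validation and project qualification):

# Hypothetical sketch of how query_table could build its SQL.
def build_table_query(dataset_id: str, table_id: str,
                      where: str | None = None,
                      limit: int | None = None) -> str:
    sql = f"SELECT * FROM `{dataset_id}.{table_id}`"
    if where:
        sql += f" WHERE {where}"
    if limit is not None:
        sql += f" LIMIT {limit}"
    return sql

# The example above would produce:
# SELECT * FROM `my_dataset.users` WHERE age > 18 AND country = 'US' LIMIT 10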

MCP Resources

The server exposes catalog queries as MCP resources:

  • URI Format: bigquery://queries/{query_id}
  • Discovery: Use list_resources to see all available catalog queries
  • Details: Use read_resource to get full query details including SQL, parameters, and descriptions
  • No Context Bloat: Resources don't consume tool slots or context window space
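
For example, discovering and reading catalog queries from a Python MCP client might look like this sketch (using the reference mcp SDK; the query ID is the one from the catalog example above):

# Hypothetical client-side discovery of catalog query resources.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from pydantic import AnyUrl

async def main() -> None:
    params = StdioServerParameters(command="bigquery-mcp")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            listing = await session.list_resources()
            for resource in listing.resources:
                print(resource.uri, "-", resource.name)
            detail = await session.read_resource(
                AnyUrl("bigquery://queries/account_balance_reconciliation")
            )
            print(detail)

asyncio.run(main())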

Using with MCP Clients

Claude Desktop

Add to your Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json on macOS or %APPDATA%\Claude\claude_desktop_config.json on Windows):

Using pip or pipx (Recommended)

{
  "mcpServers": {
    "bigquery": {
      "command": "bigquery-mcp",
      "env": {
        "GCP_PROJECT_ID": "your-project-id",
        "GOOGLE_APPLICATION_CREDENTIALS": "/path/to/credentials.json"
      }
    }
  }
}

Using uv (Development)

{
  "mcpServers": {
    "bigquery": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/python-mcp",
        "run",
        "bigquery-mcp"
      ],
      "env": {
        "GCP_PROJECT_ID": "your-project-id",
        "GOOGLE_APPLICATION_CREDENTIALS": "/path/to/credentials.json"
      }
    }
  }
}

Other MCP Clients

The server communicates via stdio using the MCP protocol, so any MCP-compatible client can connect by launching:

bigquery-mcp

Development

Running Type Checks

uv run basedpyright src/

Running Linter

uv run ruff check src/

Formatting Code

uv run ruff format src/

Running Tests

uv run pytest

Project Structure

python-mcp/
├── src/
│   └── bigquery_mcp/
│       ├── __init__.py
│       ├── server.py          # Main MCP server
│       ├── bigquery_client.py # BigQuery client wrapper
│       ├── tools.py           # Tool definitions
│       ├── models.py          # Pydantic models
│       └── config.py          # Configuration loader
├── config/
│   └── bigquery.yaml.example  # Example configuration
├── examples/
│   └── example_queries.sql    # Sample SQL queries
├── tests/                     # Test files
├── pyproject.toml            # Project configuration
└── README.md

Architecture

The server follows the MCP Toolbox pattern:

  1. Configuration Layer (config.py): Loads settings from YAML or environment variables
  2. Client Layer (bigquery_client.py): Wraps BigQuery API with async methods
  3. Tools Layer (tools.py): Defines MCP tools and handles parameter validation
  4. Server Layer (server.py): MCP server implementation with stdio transport
  5. Models Layer (models.py): Pydantic models for type safety
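
As a rough sketch, the server layer resembles the FastMCP pattern from the Python MCP SDK (illustrative only, not the project's actual server.py):

# Hypothetical skeleton of the server layer using FastMCP over stdio.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("bigquery")

@mcp.tool()
def list_datasets() -> dict:
    """List all datasets in the BigQuery project."""
    # The real tool delegates to the BigQuery client wrapper.
    return {"datasets": ["dataset1", "dataset2"]}

if __name__ == "__main__":
    mcp.run()  # stdio transport by default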

Comparison to MCP Toolbox

This implementation is inspired by googleapis/genai-toolbox but:

  • Python-based instead of Go
  • BigQuery-specific instead of multi-database
  • Simpler POC focused on core functionality
  • Direct MCP integration without separate control plane

License

MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.