BigQuery Schema MCP Server
An MCP (Model Context Protocol) server that provides safe, read-only access to BigQuery schema information. This server enables LLMs to explore dataset structures, table schemas, and metadata without accessing actual data.
Features
- Schema-Only Access: Retrieve table and view schemas without accessing row data
- INFORMATION_SCHEMA Queries: Query BigQuery metadata catalog safely
- Resource Discovery: List all available datasets, tables, and views
- Type Safety: Full TypeScript implementation with strict type checking
- Secure by Design: No access to actual data, only structural information
Installation
Via npx (Recommended - No Installation Required)
Use the package directly with npx in your Claude Desktop configuration - see the Configuration section below.
Global Installation
npm install -g @waaronmorris/mcp-bigquery-schema@1.1.1
From Source
git clone https://github.com/waaronmorris/mcp-bigquery-schema.git
cd mcp-bigquery-schema
npm install
npm run build
Authentication
The server supports two authentication methods:
1. Service Account Key File (Recommended)
Create a service account in Google Cloud Console with the following roles:
- See the Security section below for the recommended custom role and permissions
Download the JSON key file and provide it via the --key-file argument.
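For reference, the downloaded key file is a standard Google Cloud service account key in JSON form, roughly like the abridged sketch below (the service account name and all values are placeholders, not output from this project):
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "schema-reader@your-project-id.iam.gserviceaccount.com",
  "client_id": "...",
  "token_uri": "https://oauth2.googleapis.com/token"
}
Keep this file out of version control; the pre-commit hooks described under Contributing are set up to block commits containing service account files.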
2. Application Default Credentials
If no key file is specified, the server will use Application Default Credentials (ADC). Set up ADC by running:
gcloud auth application-default login
Configuration
Add the server to your MCP client configuration:
Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location; on other platforms, edit claude_desktop_config.json in Claude's application data directory):
{
"mcpServers": {
"bigquery-schema": {
"command": "npx",
"args": [
"@waaronmorris/mcp-bigquery-schema@1.1.1",
"--project-id", "your-project-id",
"--location", "US",
"--key-file", "/path/to/service-account-key.json"
]
}
}
}
Without Key File (Using ADC)
{
"mcpServers": {
"bigquery-schema": {
"command": "npx",
"args": [
"@waaronmorris/mcp-bigquery-schema@1.1.1",
"--project-id", "your-project-id",
"--location", "US"
]
}
}
}
Alternative: Global Installation
If you prefer to install globally:
npm install -g @waaronmorris/mcp-bigquery-schema@1.1.1
Then use in Claude Desktop config:
{
"mcpServers": {
"bigquery-schema": {
"command": "mcp-bigquery-schema",
"args": [
"--project-id", "your-project-id",
"--location", "US"
]
}
}
}
Available Tools
explore_datasets
List and explore available BigQuery datasets in the project with metadata.
Parameters:
- includeMetadata (optional): Include detailed metadata for each dataset (location, creation time, etc.)
- datasetId (optional): Specific dataset ID to get detailed information about
- includeTableDescriptions (optional): Include table descriptions when listing tables (requires additional API calls)
Usage Examples:
// List all datasets with basic info
{}
// List all datasets with metadata
{
"includeMetadata": true
}
// Get specific dataset details with all tables and descriptions
{
"datasetId": "my_dataset",
"includeTableDescriptions": true
}
get_table_metadata
Get comprehensive metadata for a specific table including descriptions, labels, and schema details.
Parameters:
- dataset (required): The dataset containing the table
- table (required): The table name to get metadata for
- includeColumnStats (optional): Include column-level statistics from INFORMATION_SCHEMA
Usage Examples:
// Get basic table metadata with descriptions
{
"dataset": "my_dataset",
"table": "users"
}
// Get table metadata with column statistics
{
"dataset": "my_dataset",
"table": "users",
"includeColumnStats": true
}
Returns:
- Table description and labels
- Column names, types, and descriptions
- Partitioning and clustering configuration
- Size and row count statistics
- Creation and modification timestamps
- Optional: Usage statistics from query history
get_table_usage_stats
Analyze table usage patterns and popularity from query job history.
Parameters:
- dataset (optional): The dataset to analyze (analyzes all if not specified)
- table (optional): Specific table to analyze
- days (optional): Number of days of history to analyze (default: 7)
- limit (optional): Maximum number of results to return (default: 20)
Usage Example:
// Get most popular tables across all datasets
{
"days": 7,
"limit": 10
}
// Returns:
{
"period": "Last 7 days",
"totalTablesAnalyzed": 10,
"mostPopularTables": [
{
"dataset": "analytics",
"table": "user_events",
"queryCount": 150,
"uniqueUsers": 12,
"avgBytesProcessed": 1024000000,
"lastAccessed": "2024-01-15T10:30:00Z",
"estimatedCostUSD": 0.70
}
]
}
suggest_related_tables
Find tables commonly queried together based on usage patterns.
Parameters:
- dataset (required): The dataset containing the table
- table (required): The table to find related tables for
- days (optional): Number of days of history to analyze (default: 30)
Usage Example:
// Find tables often joined with user_events
{
"dataset": "analytics",
"table": "user_events",
"days": 30
}
// Returns:
{
"sourceTable": "analytics.user_events",
"analysisPeriod": "Last 30 days",
"relatedTables": [
{
"dataset": "analytics",
"table": "sessions",
"coOccurrenceCount": 45,
"suggestedJoinLikelihood": "high",
"uniqueUsers": 8
}
]
}
query_information_schema
Query BigQuery INFORMATION_SCHEMA views to retrieve metadata about datasets, tables, columns, and more.
Parameters:
- dataset (required): The dataset to query
- table (required): The INFORMATION_SCHEMA table to query
- filter (optional): WHERE clause conditions
Supported INFORMATION_SCHEMA tables:
- TABLES - List all tables and views in a dataset
- COLUMNS - Get column information for tables
- VIEWS - View definitions
- TABLE_OPTIONS - Table-level options and settings
- COLUMN_FIELD_PATHS - Nested field information
- PARTITIONS - Partition information
- TABLE_SNAPSHOTS - Table snapshot metadata
- TABLE_STORAGE - Storage statistics
- ROUTINES - Stored procedures and functions
- ROUTINE_OPTIONS - Routine options
- PARAMETERS - Routine parameters
- SCHEMATA - Dataset information
- SCHEMATA_OPTIONS - Dataset options
- JOBS - Query job history and metadata
- JOBS_BY_PROJECT - Project-level job information
Example queries:
// List all tables in a dataset
{
"dataset": "my_dataset",
"table": "TABLES",
"filter": "table_type = 'BASE TABLE'"
}
// Get columns for a specific table
{
"dataset": "my_dataset",
"table": "COLUMNS",
"filter": "table_name = 'my_table'"
}
// Find all partitioned tables
{
"dataset": "my_dataset",
"table": "TABLES",
"filter": "is_partitioned = TRUE"
}
Available Resources
The server provides schema information as resources for all tables and views in your BigQuery project:
- URI Format: bigquery://project-id/dataset-id/table-id/schema
- Content: JSON schema definition including field names, types, modes, and descriptions
Examples
Exploring Available Datasets
// The LLM can request:
"What datasets are available in this project?"
// This will use the explore_datasets tool:
{
"includeMetadata": true
}
// Returns:
{
"projectId": "my-project",
"datasetCount": 3,
"datasets": [
{
"id": "analytics",
"location": "US",
"creationTime": "2024-01-01T00:00:00.000Z",
"description": "Analytics data",
"tableCount": 15,
"labels": {}
},
// ...
]
}
Getting Table Schema
// The LLM can request:
"Show me the schema for the users table in the analytics dataset"
// This will fetch the resource:
// bigquery://my-project/analytics/users/schema
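The resource content is the table's JSON schema definition; a hypothetical response for this users table might look like the following (field names and descriptions are illustrative only):
{
  "fields": [
    {
      "name": "user_id",
      "type": "STRING",
      "mode": "REQUIRED",
      "description": "Primary identifier for the user"
    },
    {
      "name": "created_at",
      "type": "TIMESTAMP",
      "mode": "NULLABLE",
      "description": "When the user account was created"
    }
  ]
}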
Exploring Dataset Structure
// The LLM can request:
"What tables are in the sales dataset?"
// This will query:
{
"dataset": "sales",
"table": "TABLES"
}
Finding Column Information
// The LLM can request:
"Show me all columns in the orders table"
// This will query:
{
"dataset": "sales",
"table": "COLUMNS",
"filter": "table_name = 'orders'"
}
Working with Table and Column Descriptions
// Get comprehensive table metadata including descriptions
{
"dataset": "analytics",
"table": "user_events"
}
// Returns:
{
"dataset": "analytics",
"table": "user_events",
"description": "Stores all user interaction events for analytics",
"labels": {
"team": "analytics",
"pii": "true"
},
"schema": {
"fields": [
{
"name": "event_id",
"type": "STRING",
"mode": "REQUIRED",
"description": "Unique identifier for each event"
},
{
"name": "user_id",
"type": "STRING",
"mode": "REQUIRED",
"description": "Reference to the user who triggered the event"
},
{
"name": "event_type",
"type": "STRING",
"mode": "NULLABLE",
"description": "Type of event (click, view, purchase, etc.)"
}
]
},
"partitioning": {
"timePartitioning": {
"type": "DAY",
"field": "event_timestamp"
}
}
}
// Query INFORMATION_SCHEMA for column descriptions
{
"dataset": "analytics",
"table": "COLUMNS",
"filter": "table_name = 'user_events' AND description IS NOT NULL"
}
Using Table Usage Analytics
The MCP server can analyze query history to identify popular tables and common join patterns:
// Find the most frequently queried tables in the last week
{
"tool": "get_table_usage_stats",
"days": 7,
"limit": 5
}
// Get usage statistics with table metadata
{
"tool": "get_table_metadata",
"dataset": "analytics",
"table": "user_events",
"includeUsageStats": true
}
// Discover tables commonly used together
{
"tool": "suggest_related_tables",
"dataset": "analytics",
"table": "user_events"
}
Benefits of Usage Analytics:
- Prioritize Investigation: Focus on the most frequently accessed tables first
- Understand Relationships: Discover which tables are commonly joined
- Cost Optimization: Identify expensive queries and optimize them
- User Patterns: See which teams or services use specific tables
- Data Freshness: Know when tables were last accessed
Security
Multi-Layer Security Architecture
This server implements defense-in-depth with multiple security layers to ensure it can NEVER access actual data:
Layer 1: Input Validation
- Project ID Validation: Only lowercase letters, numbers, and hyphens
- Dataset Name Validation: Only alphanumeric and underscores
- Table Whitelist: Only approved INFORMATION_SCHEMA tables
- Filter Sanitization: Blocks SQL injection attempts
Layer 2: Query Pattern Enforcement
- Strict Query Format: Only SELECT * FROM INFORMATION_SCHEMA queries are allowed
- Forbidden Keywords: Blocks INSERT, UPDATE, DELETE, DROP, CREATE, etc.
- Forbidden Patterns: Blocks JOINs, UNIONs, subqueries, CTEs
- Comment Prevention: Blocks SQL comments (--, /*, #)
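As a hypothetical illustration of these rules (not captured validator output), the first filter below satisfies the checks above, while the second would be rejected because it contains a forbidden keyword (DROP) and a SQL comment (--):
// Accepted: plain column comparison against an INFORMATION_SCHEMA view
{
  "dataset": "my_dataset",
  "table": "COLUMNS",
  "filter": "table_name = 'orders'"
}
// Rejected: attempts statement injection and uses a SQL comment
{
  "dataset": "my_dataset",
  "table": "COLUMNS",
  "filter": "table_name = 'orders'; DROP TABLE orders; --"
}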
Layer 3: Safe Query Execution
- Query Wrapper: Additional validation before execution
- Read-Only Mode: Enforced at BigQuery client level
- Timeout Limits: 30-second maximum query execution
- Result Size Limits: Maximum 10,000 rows
Layer 4: Result Validation
- Field Inspection: Checks for PII-like field names
- Data Pattern Detection: Identifies potential data leaks
- Size Validation: Prevents large data exfiltration
Recommended IAM Permissions
For maximum security, create a custom IAM role with ONLY these permissions:
{
"title": "BigQuery Schema Reader",
"description": "Read-only access to BigQuery metadata only",
"permissions": [
"bigquery.datasets.get",
"bigquery.tables.get",
"bigquery.tables.list",
"bigquery.jobs.create"
]
}
DO NOT grant these permissions:
- ❌ bigquery.tables.getData - Allows reading table data
- ❌ bigquery.tables.export - Allows exporting data
- ❌ bigquery.datasets.update - Allows modifying datasets
- ❌ bigquery.tables.update - Allows modifying tables
- ❌ bigquery.tables.delete - Allows deleting tables
Security Best Practices
- Use a Dedicated Service Account: Create a service account specifically for this MCP server
- Minimal Permissions: Use the custom role above, not predefined roles
- Regular Audits: Review Cloud Audit Logs for any suspicious queries
- Network Restrictions: Use VPC Service Controls if available
- Key Rotation: Rotate service account keys regularly
- Monitor Usage: Set up alerts for unusual query patterns
Audit Logging
All queries are logged to stderr with an [AUDIT] prefix for monitoring:
[AUDIT] Executing INFORMATION_SCHEMA query: SELECT * FROM `project.dataset.INFORMATION_SCHEMA.TABLES`
[AUDIT] Query completed in 123ms, returned 42 rows
What This Server CANNOT Do
- ❌ Cannot read actual table data
- ❌ Cannot execute arbitrary SQL queries
- ❌ Cannot modify any data or schema
- ❌ Cannot access tables outside INFORMATION_SCHEMA
- ❌ Cannot perform JOIN operations with data tables
- ❌ Cannot execute stored procedures or functions
- ❌ Cannot export or import data
Development
Building from Source
npm install
npm run build
Development Mode
npm run dev
Clean Build
npm run clean
npm run build
Contributing
This project uses Conventional Commits to maintain a clear and meaningful commit history.
Making Commits
We use Commitizen to help create properly formatted commit messages:
# Instead of git commit, use:
npm run commit
# This will launch an interactive prompt to help you create a conventional commit
The interactive prompt will guide you through:
- Type: Select the type of change (feat, fix, docs, etc.)
- Scope: Optionally specify what part of the codebase changed
- Description: Write a short description
- Body: Optionally provide additional details
- Breaking Changes: Note any breaking changes
- Issues: Reference any related issues
Commit Types
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- style: Code style changes (formatting, etc.)
- refactor: Code refactoring
- perf: Performance improvements
- test: Adding or updating tests
- build: Build system changes
- ci: CI/CD changes
- chore: Other changes that don't modify src or test files
- revert: Revert a previous commit
- security: Security-related changes
Examples
# Using npm run commit:
npm run commit
# Follow the prompts to create: "feat(auth): add service account validation"
# Manual conventional commit (if needed):
git commit -m "fix: resolve location parameter validation issue"
git commit -m "feat(security): add gitleaks pre-commit hook"
git commit -m "docs: update README with commitizen instructions"
Pre-commit Hooks
All commits are automatically checked for:
- Conventional commit format: Ensures commit messages follow the standard
- Secret detection: Scans for potential credentials or API keys
- File validation: Blocks commits containing service account files
If a commit is rejected, review the error message and adjust accordingly.
Requirements
- Node.js >= 14.0.0
- Google Cloud Project with BigQuery API enabled
- Appropriate IAM permissions for BigQuery access
Troubleshooting
Authentication Issues
If you encounter authentication errors:
- Verify your service account key file path is correct
- Ensure the service account has the required BigQuery permissions
- Check that the BigQuery API is enabled in your Google Cloud project
Permission Errors
The service account needs:
- bigquery.datasets.get
- bigquery.tables.get
- bigquery.tables.list
- bigquery.jobs.create (for running INFORMATION_SCHEMA queries)
Location Issues
Ensure the --location parameter matches your BigQuery datasets' location. Common values:
- US - United States
- EU - European Union
- us-central1 - Specific regional locations
License
MIT
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
For issues and questions, please use the GitHub Issues page.