GDELT 2.0 MCP Server
Model Context Protocol server providing AI agents with access to GDELT 2.0 (Global Database of Events, Language, and Tone) via Google BigQuery. Query global events, news coverage, themes, and visual content with cost-optimized workflows.
Why This MCP Rocks 🚀
- 🎯 Cost-Optimized Workflows: Built-in materialization tools make iterative analysis 50-100x cheaper
- 📊 4 Massive Datasets: Events, mentions, themes, and visual analysis - all queryable
- 🔐 Secure Architecture: Your credentials, your queries, your control
- ⚡ Production-Ready: Partition pruning, date filters, and cost estimation built-in
- 🎓 CAMEO Taxonomy: Built-in event/actor code lookups for precise queries
Quick Start
Prerequisites
- Python 3.11+
- uv package manager (install)
- GCP project with BigQuery API enabled
- Service account with BigQuery permissions
Installation
cd gdelt-mcp
uv sync
Getting GCP Credentials
- Create GCP Project and enable BigQuery API
- Create Service Account in IAM & Admin
- Grant Roles: BigQuery Data Viewer + BigQuery Job User
- Generate Key: Download JSON key file
- Extract Values: project_id, private_key, client_email
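The three values can be read straight from the JSON key downloaded in the previous step; a minimal sketch (the file name is illustrative):
import json

# Load the service account key downloaded from GCP (file name is illustrative)
with open("service-account-key.json") as f:
    key = json.load(f)

project_id = key["project_id"]
private_key = key["private_key"]
client_email = key["client_email"]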
Running the Server
# Development
uv run python server.py
# With FastMCP CLI
uv run fastmcp run server.py
# Interactive testing
uv run fastmcp dev server.py
Authentication
Bearer token format (pipe-delimited):
project_id|private_key|client_email
Example in Python:
from fastmcp import Client
from fastmcp.client.auth import BearerAuth

# project_id, private_key, client_email come from your service account JSON key
token = f"{project_id}|{private_key}|{client_email}"
client = Client("https://your-server/mcp", auth=BearerAuth(token=token))
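A quick connectivity check with this client; a minimal sketch (list_tools simply confirms the server responds and shows what is available):
import asyncio

async def main():
    async with client:
        tools = await client.list_tools()
        print([tool.name for tool in tools])

asyncio.run(main())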
LangChain Agent with Remote Server
Create a LangChain agent that uses the usage guide in its system prompt and organizes tools by their tags:
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain.agents import create_agent
import httpx

async def create_gdelt_agent():
    """Create a LangChain agent with GDELT tools organized by category."""
    # Bearer token format: project_id|private_key|client_email
    token = f"{project_id}|{private_key}|{client_email}"

    # Initialize MultiServerMCPClient with the remote server
    client = MultiServerMCPClient({
        "gdelt": {
            "transport": "streamable_http",
            "url": "https://your-server/mcp",
            "auth": httpx.BasicAuth(username="", password=token),
        }
    })

    # Get the usage guide resource
    usage_guide_blobs = await client.get_resources(
        "gdelt",
        uris="gdelt://guide/usage"
    )
    usage_guide = usage_guide_blobs[0].as_string()

    # Get LangChain tools (these include metadata with tags)
    tools = await client.get_tools(server_name="gdelt")

    # Group tools by FastMCP tags
    def group_tools_by_tag(tools):
        categories = {
            'schema': [],
            'query': [],
            'cost': [],
            'cameo': [],
            'guide': [],
        }
        for tool in tools:
            # Extract tags from LangChain tool metadata
            if hasattr(tool, 'metadata') and tool.metadata:
                tags = tool.metadata.get('tags', [])
                for tag in tags:
                    if tag in categories:
                        categories[tag].append(tool.name)
        return categories

    tool_categories = group_tools_by_tag(tools)

    # Format the categorized tool list for the system prompt
    tools_by_category = "\n\n".join([
        f"**{category.upper()} TOOLS**:\n" +
        "\n".join([f"- {name}" for name in tool_names])
        for category, tool_names in tool_categories.items()
        if tool_names
    ])

    # Build a comprehensive system prompt
    system_prompt = f"""You are a GDELT data analysis assistant with access to the complete GDELT 2.0 dataset via BigQuery.

IMPORTANT USAGE GUIDE:
{usage_guide}

AVAILABLE TOOLS BY CATEGORY:
{tools_by_category}

Follow the recommended workflow from the usage guide above. Always:
1. Check existing materialized subsets first (list_materialized_subsets)
2. Create materialized subsets for iterative analysis (create_materialized_subset)
3. Use appropriate date filters in all queries
4. Start with Events table (smallest) before querying larger tables"""

    # Create the agent with gpt-5.1
    agent = create_agent(
        "openai:gpt-5.1",
        tools,
        system_prompt=system_prompt,
    )
    return agent

# Use the agent (run inside an async context)
agent = await create_gdelt_agent()
result = await agent.ainvoke({
    "messages": "Analyze protests in the United States during the first week of January 2025"
})
print(result)
Key Features:
- Remote server with authentication: Uses bearer token auth (project_id|private_key|client_email)
- Usage guide integration: Fetches and includes the usage guide in system prompt
- Tag-based tool grouping: Organizes tools by category (schema, query, cost, cameo, guide)
- Comprehensive system prompt: Integrates best practices and tool categories
- GPT-5.1 model: Latest OpenAI model for optimal performance
MCP Client with Remote Server
Connect to the server with the FastMCP client API to access tools and resources directly:
from fastmcp import Client
import httpx

# Bearer token format: project_id|private_key|client_email
token = f"{project_id}|{private_key}|{client_email}"
server_url = "https://gdelt-mcp.fastmcp.app/mcp"

auth = httpx.BasicAuth(username="", password=token)
client = Client(server_url, auth=auth)

async with client:  # run inside an async function
    # Get the usage guide resource (read_resource returns a list of contents)
    usage_guide = await client.read_resource("gdelt://guide/usage")
    print(f"Usage guide: {usage_guide[0].text[:200]}...")

    # List all tools
    tools = await client.list_tools()
    print(f"Available tools: {[tool.name for tool in tools]}")

    # Filter tools by FastMCP tag (tags live under the '_fastmcp' key in tool.meta)
    def tools_with_tag(tools, tag):
        return [
            tool for tool in tools
            if getattr(tool, "meta", None)
            and tag in tool.meta.get("_fastmcp", {}).get("tags", [])
        ]

    schema_tools = tools_with_tag(tools, "schema")
    query_tools = tools_with_tag(tools, "query")
    cost_tools = tools_with_tag(tools, "cost")

    print(f"Found {len(schema_tools)} schema tools")
    print(f"Found {len(query_tools)} query tools")
    print(f"Found {len(cost_tools)} cost optimization tools")
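Discovered tools can then be invoked with call_tool; a minimal sketch using the query_events parameters documented below (the filter values are illustrative):
async with client:
    result = await client.call_tool(
        "query_events",
        {
            "where_clause": "EventRootCode = '14' AND SQLDATE >= 20250101",
            "limit": 10,
        },
    )
    print(result)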
Key Features:
- Remote server with authentication: Uses bearer token auth
- Direct resource access: Read usage guides and schemas
- Tool listing and filtering: Filter tools by FastMCP tags
- Simple API: Straightforward client interface for tool discovery
Available Tool Tags:
- schema: Schema documentation tools
- query: Direct query tools (events, eventmentions, gkg, cloudvision)
- cost: Cost optimization tools (estimate, materialize, list, query subsets)
- cameo: CAMEO taxonomy lookup tools
- guide: Usage guide and best practices
MCP Resources
Schema documentation and best practices:
- gdelt://events/schema - Events table (who/what/where/when)
- gdelt://eventmentions/schema - Media mentions and sources
- gdelt://gkg/schema - Global Knowledge Graph (themes, entities)
- gdelt://cloudvision/schema - Visual analysis of news images
- gdelt://guide/usage - Cost-effective querying guide 🚨
MCP Tools
Query Tools
query_events - Query the Events table (smallest, query first)
- Parameters: where_clause, select_fields, limit, order_by
- ⚠️ REQUIRED: Include a SQLDATE >= YYYYMMDD filter
query_eventmentions - Query media mentions of events
- Parameters: where_clause, select_fields, limit
- 💡 RECOMMENDED: Use GLOBALEVENTID values from Events queries
query_gkg - Query Global Knowledge Graph (largest/most expensive)
- Parameters: where_clause, select_fields, limit
- ⚠️ REQUIRED: Include a DATE >= YYYYMMDDhhmmss filter
query_cloudvision - Query visual analysis of news images
- Parameters: where_clause, select_fields, limit
- 💡 RECOMMENDED: Include timestamp filters
Cost Optimization Tools
estimate_query_cost - Check query cost before execution (dry-run)
- Prevents expensive accidents
- Get cost warnings for >1GB scans (see the sketch after this list)
create_materialized_subset - Create filtered subset with auto-expiration
- Filter once, query many times (50-100x cheaper)
- Auto-expires in 48 hours (configurable)
- Must include date filters in where_clause
list_materialized_subsets - View your materialized subsets
- Shows expiration status, size, row count
query_materialized_subset - Query subsets (near-free!)
- ~$0.00001 per query vs $0.01+ on full tables
- Perfect for iterative analysis
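Before hitting the full tables, a dry-run estimate is cheap insurance. A hedged sketch in the same call style as the workflow below; the parameter names here are assumptions mirroring create_materialized_subset, not confirmed signatures:
# Hypothetical parameter names, mirroring create_materialized_subset
estimate_query_cost(
    source_table="gkg",
    where_clause="Themes LIKE '%ENV_CLIMATECHANGE%' AND DATE >= 20250101000000",
)
# Expect a warning if the dry-run reports a >1GB scan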
CAMEO Taxonomy Tools
get_cameo_event_codes - Get CAMEO event code taxonomy
- Parameters: category (e.g., "19"), search_keyword (e.g., "protest")
- 300+ hierarchical event codes
get_cameo_actor_codes - Get CAMEO actor code taxonomy
- Parameters: code_type ("countries", "types", or "all")
- Country codes and actor type classifications
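For example, to look up the right codes before building an events query (parameters as listed above; protests fall under CAMEO root code 14):
get_cameo_event_codes(search_keyword="protest")  # matching event codes (root 14)
get_cameo_actor_codes(code_type="countries")     # country code lookup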
Cost-Optimized Workflow
🎯 The recommended approach for all analysis:
# Step 1: Create subset once (filters data, small cost)
create_materialized_subset(
source_table="events",
subset_name="ukraine_jan2025",
where_clause="SQLDATE BETWEEN 20250101 AND 20250131 AND (Actor1CountryCode = 'UKR' OR Actor2CountryCode = 'UKR')",
select_fields="SQLDATE, Actor1Name, Actor2Name, EventCode",
description="Ukraine events January 2025"
)
# Step 2: Query subset multiple times (near-free)
query_materialized_subset(
subset_name="ukraine_jan2025",
where_clause="EventCode LIKE '19%'", # Military events
limit=1000
)
query_materialized_subset(
subset_name="ukraine_jan2025",
where_clause="GoldsteinScale < -5", # High conflict
limit=500
)
Cost Comparison:
- Direct queries (3x): 3 × $0.01 = $0.03
- Materialized workflow: $0.01 + 3 × $0.00001 = $0.01
- Savings: 66% here, and it grows with query count (at 10 queries: $0.10 direct vs ~$0.01 materialized, roughly 90%)
📖 Read the gdelt://guide/usage resource for complete guidance.
Example Queries
Find Recent Military Conflicts
query_events(
where_clause="EventRootCode = '19' AND SQLDATE >= 20250101",
select_fields="SQLDATE, Actor1Name, Actor2Name, EventCode",
limit=100
)
Track Protests in a Country
query_events(
where_clause="EventRootCode = '14' AND Actor1CountryCode = 'USA' AND SQLDATE >= 20250101",
limit=50
)
Analyze Media Coverage
# First get events
events = query_events(
where_clause="Actor1Name LIKE '%Biden%' AND SQLDATE >= 20250101",
limit=10
)
# Then get mentions for specific event
mentions = query_eventmentions(
where_clause=f"GLOBALEVENTID = {events[0]['GLOBALEVENTID']}"
)
Find Articles About Climate Change
query_gkg(
where_clause="Themes LIKE '%ENV_CLIMATECHANGE%' AND DATE >= 20250101000000",
select_fields="DATE, Themes, V2Locations, V2Tone",
limit=100
)
GDELT Tables Overview
| Table | Size | Update Frequency | Best For |
|---|---|---|---|
| Events | Small (~100-200MB/day) | 15 minutes | Event tracking, actor analysis |
| EventMentions | Medium (~1-2GB/day) | 15 minutes | Source analysis, media coverage |
| GKG | Large (~5-10GB/day) | 15 minutes | Themes, entities, sentiment |
| CloudVision | Variable | 15 minutes | Visual content analysis |
Query Priority: Events → Mentions → GKG → CloudVision
Best Practices
- Always use date filters to enable partition pruning
- Start with Events table - it's the smallest
- Use estimate_query_cost before expensive queries
- Create materialized subsets for iterative analysis
- Select specific fields instead of * when possible
- Check CAMEO codes before querying events
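A quick illustration of the rules above in practice (field list and date values are illustrative):
# Good: date filter enables partition pruning; explicit fields limit scan size
query_events(
    where_clause="SQLDATE >= 20250101 AND EventRootCode = '19'",
    select_fields="SQLDATE, Actor1Name, EventCode",
    limit=100,
)

# Bad: no date filter and an implicit SELECT * scan the whole table
# query_events(where_clause="EventRootCode = '19'")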
Troubleshooting
BigQuery Access Denied
- Verify service account has BigQuery Data Viewer + Job User roles
- Check that BigQuery API is enabled in your GCP project
Expensive Queries
- Add date filters (SQLDATE or DATE)
- Select fewer fields (not *)
- Use materialization for repeated queries
- Check costs with estimate_query_cost first
Query Timeout
- Reduce date range
- Limit number of results
- Select fewer fields
Resources
- GDELT Project
- GDELT 2.0 Documentation
- CAMEO Event Codes
- FastMCP Documentation
- Google BigQuery Pricing
License
This MCP server implementation is provided as-is. GDELT data is freely available for research and analysis.