sports-mcp-server
Model Context Protocol (MCP) server that exposes curated sports analytics to large language model agents.
High-level scope
- Provide read-only, intent-focused endpoints (e.g., get_game_summary, compare_pitchers, list_active_bet_signals).
- Wrap warehouse + MLflow + control-plane APIs to keep the agent sandboxed to domain-appropriate data.
- Enforce authentication, rate limits, and response shaping to prevent arbitrary SQL access.
Skeleton plan
app/
  main.py        # MCP entrypoint
  handlers/      # Functions per capability
  clients/       # Warehouse + control-plane adapters
  schemas/
    responses/   # Pydantic models describing payloads
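A minimal sketch of how those pieces could fit together, assuming a FastAPI-style app (the actual wiring in app/main.py may differ, and GameSummary/get_game_summary are hypothetical names taken from the scope list above):

from fastapi import FastAPI
from pydantic import BaseModel

# schemas/responses/: Pydantic model shaping exactly what the agent sees
class GameSummary(BaseModel):
    game_id: str
    home_team: str
    away_team: str
    home_score: int
    away_score: int

app = FastAPI()

# handlers/: one read-only function per capability
@app.get("/games/{game_id}/summary", response_model=GameSummary)
def get_game_summary(game_id: str) -> GameSummary:
    # A real handler would fetch from a warehouse adapter in clients/;
    # this stub just illustrates the response shaping.
    return GameSummary(
        game_id=game_id,
        home_team="BOS",
        away_team="NYY",
        home_score=3,
        away_score=2,
    )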
MVP status
The service now loads every JSON spec under dataset_registry.curated/ and exposes their metadata via:
- GET /datasets: enumerate available datasets (driven entirely by the curated registry).
- GET /datasets/{dataset_id}: describe schema, columns, primary keys, and documentation.
- POST /datasets/{dataset_id}/query: execute parameterised SQL (eq/gte/lte filters, column projection, limit/offset) against Postgres using those definitions.
As soon as you add or edit a dataset JSON file and restart the server, the endpoints surface the new schema automatically—no more stubbed data living in the codebase.
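A quick smoke test of the three endpoints from Python, assuming the service is reachable at http://localhost:9100 (the port used in the Docker quick start below) and a hypothetical marts_baseball.pitcher_game_logs dataset; the field names in the query payload are illustrative, so check the models under schemas/responses/ for the actual contract:

import requests

BASE_URL = "http://localhost:9100"

# Enumerate datasets, then describe one of them.
datasets = requests.get(f"{BASE_URL}/datasets").json()
detail = requests.get(f"{BASE_URL}/datasets/marts_baseball.pitcher_game_logs").json()

# Parameterised query: eq/gte/lte filters, column projection, limit/offset.
payload = {
    "columns": ["pitcher_id", "game_date", "strikeouts"],
    "filters": [{"column": "game_date", "op": "gte", "value": "2024-04-01"}],
    "limit": 25,
    "offset": 0,
}
rows = requests.post(
    f"{BASE_URL}/datasets/marts_baseball.pitcher_game_logs/query",
    json=payload,
).json()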
Dataset metadata scaffolding
To bootstrap real datasets from Postgres without hand-copying every column, use the introspection script:
pip install -e . # ensures psycopg is available
cd scripts
python generate_dataset_registry.py \
--database sports_dw_dev \
--schemas marts_baseball staging_baseball \
--output-dir ../dataset_registry.generated
The script connects using the usual PG* env vars (or CLI flags), enumerates every base table in the provided schemas, and emits a JSON registry with:
- One file per table at dataset_registry.generated/<schema>/<table>.json with dataset_id=<schema>.<table>
- An auto-generated display name ("Pitcher Game Logs", etc.)
- Columns with name + Postgres data type and blank description placeholders
You can then fill in dataset/table descriptions, primary keys, and richer column docs directly in the generated file before wiring those datasets into the MCP server.
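For reference, a generated file might look like the sketch below; the key names are assumptions based on the fields described above, so treat an actual emitted file as the source of truth:

{
  "dataset_id": "marts_baseball.pitcher_game_logs",
  "display_name": "Pitcher Game Logs",
  "description": "",
  "primary_keys": [],
  "columns": [
    {"name": "pitcher_id", "data_type": "bigint", "description": ""},
    {"name": "game_date", "data_type": "date", "description": ""},
    {"name": "strikeouts", "data_type": "integer", "description": ""}
  ]
}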
Auto-annotating curated tables
After copying the tables you plan to expose into dataset_registry.curated/, run:
python scripts/annotate_registry.py
The annotator combines the latest Baseball Savant CSV documentation with a handful of heuristics to populate dataset descriptions, primary keys, and column blurbs (e.g., counts, percentages, IDs). Review and edit the output as needed; it should give each dataset enough context for an LLM agent to reason about the available fields.
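Continuing the generated-file sketch above, a blank column entry might come back from the annotator looking something like this (illustrative wording, not actual annotator output):

{"name": "strikeouts", "data_type": "integer", "description": "Total strikeouts recorded by the pitcher over the game."}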
Runtime configuration
- DATASET_REGISTRY_DIR (default dataset_registry.curated): directory the API scans for dataset JSON specs.
- POSTGRES_DSN: override the composed connection string if you prefer to supply a single DSN (otherwise the usual PG* env vars are read).
Connecting to Postgres
The service expects Postgres connection details via environment variables (defaulting to the sports-data-platform warehouse container when both stacks share the sports_net network):
PGHOST=warehouse
PGPORT=5432
PGDATABASE=sports_dw
PGUSER=sports_admin
PGPASSWORD=sports_admin_password
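For development scripts it can help to mirror the service's precedence when building a connection, as in this sketch (psycopg and libpq also read the PG* variables natively, so a bare psycopg.connect() works once they are exported; the exact composition logic in clients/ may differ):

import os
import psycopg

# POSTGRES_DSN wins if set; otherwise compose a DSN from the PG* variables,
# falling back to the warehouse defaults listed above.
dsn = os.environ.get("POSTGRES_DSN") or (
    "host={host} port={port} dbname={db} user={user} password={pw}".format(
        host=os.environ.get("PGHOST", "warehouse"),
        port=os.environ.get("PGPORT", "5432"),
        db=os.environ.get("PGDATABASE", "sports_dw"),
        user=os.environ.get("PGUSER", "sports_admin"),
        pw=os.environ.get("PGPASSWORD", "sports_admin_password"),
    )
)

with psycopg.connect(dsn) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")  # connectivity check
        print(cur.fetchone())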
To run locally with Docker Compose, ensure sports-data-platform is up first, then launch this service:
docker compose up --build
When running directly on the host, point PGHOST/PGPORT at the exposed warehouse port (e.g., localhost:5433).
Docker quick start
docker network create sports_net # one-time shared network
docker compose up --build
The placeholder service is reachable at http://localhost:9100/healthz.