odp-mcp

socrata/odp-mcp

Socrata SODA MCP Server

Lightweight Model Context Protocol server that exposes read-only tools for Socrata’s SODA API (dataset search, metadata, preview, and structured queries).

Setup

pnpm install
cp .env.example .env   # edit domains/tokens
pnpm build
pnpm start             # runs stdio MCP server
PORT=3000 pnpm start   # runs HTTP bridge on the given port (Heroku/localhost)

Environment variables:

  • SODA_DOMAINS: optional comma-separated Socrata domains to pre-warm clients (not required; any domain is allowed at call time)
  • SODA_APP_TOKEN: optional app token applied to all domains by default (can be overridden per call)
  • SODA_REQUESTS_PER_HOUR: optional client-side throttle per domain
  • HTTP_API_KEYS: optional comma-separated keys to protect the HTTP bridge (expects X-API-Key header)
  • MANIFEST_SHA256: optional sha256 of sorted tool names; startup fails if mismatched
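MANIFEST_SHA256 is described as a sha256 of the sorted tool names. The sketch below shows one way such a digest could be computed. The join separator (a newline here), the hex encoding, and the helper name manifestDigest are assumptions rather than the server's confirmed behavior, so check the source before pinning a value.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hashes tool names after sorting them.
// ASSUMPTION: names are joined with "\n" and the digest is hex-encoded;
// the real server may use a different separator or encoding.
function manifestDigest(toolNames: readonly string[]): string {
  const sorted = [...toolNames].sort();
  return createHash("sha256").update(sorted.join("\n"), "utf8").digest("hex");
}

const digest = manifestDigest([
  "query_dataset",
  "list_datasets",
  "preview_dataset",
  "get_metadata",
]);
console.log(digest); // 64 hex characters
```

Sorting before hashing makes the digest independent of registration order, so it changes only when a tool is added, removed, or renamed.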

Tools

  • list_datasets(domain, query?, limit?)
  • get_metadata(domain, uid)
  • preview_dataset(domain, uid, limit?)
  • query_dataset(domain, uid, select?, where?, order?, group?, having?, limit?, offset?)

Tool input examples (HTTP POST /tools/{name}):

  • list_datasets: { "domain": "data.cityofnewyork.us", "query": "311", "limit": 5 }
  • get_metadata: { "domain": "data.cityofnewyork.us", "uid": "erm2-nwe9" }
  • preview_dataset: { "domain": "data.cityofnewyork.us", "uid": "erm2-nwe9", "limit": 10 }
  • query_dataset: { "domain": "data.cityofnewyork.us", "uid": "erm2-nwe9", "select": ["unique_key","complaint_type"], "where": "borough = 'MANHATTAN'", "order": ["created_date DESC"], "limit": 5 }
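A call to the HTTP bridge can be assembled as in the sketch below. buildToolRequest is a hypothetical helper, and the JSON Content-Type header and the localhost base URL are assumptions; only the POST /tools/{name} path and the X-API-Key header come from this README.

```typescript
// Shape of a request to the HTTP bridge; mirrors the arguments fetch() accepts.
interface ToolRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds a POST /tools/{name} request.
// ASSUMPTION: the bridge accepts a JSON body with Content-Type application/json.
function buildToolRequest(
  baseUrl: string,
  tool: string,
  input: Record<string, unknown>,
  apiKey?: string,
): ToolRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["X-API-Key"] = apiKey; // only needed when HTTP_API_KEYS is set
  const url = `${baseUrl.replace(/\/+$/, "")}/tools/${encodeURIComponent(tool)}`;
  return { url, init: { method: "POST", headers, body: JSON.stringify(input) } };
}

const req = buildToolRequest("http://localhost:3000/", "get_metadata", {
  domain: "data.cityofnewyork.us",
  uid: "erm2-nwe9",
});
console.log(req.url); // http://localhost:3000/tools/get_metadata
// To send it: const res = await fetch(req.url, req.init);
```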

Defaults and guards:

  • Limits clamp to max 5000 rows; offsets clamp to 50,000.
  • Preview default limit: 50. Query default limit: 500.
  • Client-side rate limiter if SODA_REQUESTS_PER_HOUR is set.
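The clamping rules above can be sketched as follows. The constants match the documented values, but the helper names, the lower bounds (1 for limit, 0 for offset), and the truncation of fractional input are assumptions, not a copy of src/limits.ts.

```typescript
const MAX_LIMIT = 5000;
const MAX_OFFSET = 50000;
const DEFAULT_PREVIEW_LIMIT = 50;
const DEFAULT_QUERY_LIMIT = 500;

// Returns `fallback` when the value is missing or not a finite number,
// otherwise the value truncated to an integer and bounded to [min, max].
function clamp(value: number | undefined, fallback: number, min: number, max: number): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.trunc(value)));
}

// ASSUMPTION: limit has a floor of 1 and offset a floor of 0.
const clampLimit = (limit: number | undefined, fallback: number): number =>
  clamp(limit, fallback, 1, MAX_LIMIT);
const clampOffset = (offset?: number): number => clamp(offset, 0, 0, MAX_OFFSET);

console.log(clampLimit(undefined, DEFAULT_PREVIEW_LIMIT)); // 50
console.log(clampLimit(99999, DEFAULT_QUERY_LIMIT)); // 5000
console.log(clampOffset(75000)); // 50000
```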

Development

pnpm test          # unit tests; e2e skipped by default
RUN_E2E=true pnpm test   # includes live call to NYC 311 dataset
pnpm exec tsc --noEmit   # type-check
pnpm exec tsc --noEmit --watch   # optional faster inner-loop typecheck
RUN_E2E_MCP=true pnpm test   # optional live MCP e2e

Notes

  • Transport: stdio via @modelcontextprotocol/sdk; server name socrata-soda-mcp.
  • Responses are serialized as JSON text to satisfy SDK content typings; callers should parse the text payload.
  • Write operations are intentionally excluded; extend tools with strong validation before enabling Producer APIs.
  • E2E tests are opt-in to avoid network flakiness; set RUN_E2E=true to exercise a live Socrata dataset.
  • HTTP bridge extras: GET /healthz, GET /tools (manifest), HTTPS redirect when behind proxy, optional API key gate via HTTP_API_KEYS (uses X-API-Key).
  • Optional manifest integrity: set MANIFEST_SHA256 (sha256 of sorted tool names) to fail closed on mismatch.
  • Per-call auth overrides supported on every tool: appToken, username+password (basic), or bearerToken.
  • SoQL safety: identifiers validated; limit/offset clamped; $query only used when structured clauses present.
  • Rate limiting: optional per-domain client bucket (SODA_REQUESTS_PER_HOUR); HTTP client retries 429/5xx with backoff.
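Because responses arrive as JSON text, a caller has to parse the text payload itself. The sketch below shows one way to do that, assuming the payload sits in the first text content item; parseToolPayload and the row shape in the usage example are hypothetical.

```typescript
// Minimal slice of an MCP tool result: an array of content items,
// where text items carry their payload in `text`.
interface ToolResultLike {
  content: Array<{ type: string; text?: string }>;
}

// Hypothetical helper: finds the first text item and parses it as JSON.
// ASSUMPTION: the server puts the whole payload in a single text item.
function parseToolPayload<T = unknown>(result: ToolResultLike): T {
  const item = result.content.find((c) => c.type === "text" && typeof c.text === "string");
  if (!item || item.text === undefined) {
    throw new Error("tool result has no text content");
  }
  return JSON.parse(item.text) as T;
}

const rows = parseToolPayload<Array<{ unique_key: string }>>({
  content: [{ type: "text", text: '[{"unique_key":"123"}]' }],
});
console.log(rows[0].unique_key); // 123
```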

HTTP Endpoints (bridge)

  • GET / — friendly descriptor with name/description/endpoints/capabilities.
  • GET /healthz and GET /readyz — liveness/readiness probes.
  • GET /tools or /manifest — tool manifest with schemas and examples.
  • POST /tools/{tool_name} — invoke a tool; body is the tool input JSON.
    • Pass dataset domain via domain and dataset id via uid on metadata/preview/query tools.

Implementation Details & Behavior

  • Tools:
    • list_datasets: catalog search with domains filter. Example {domain:"data.cityofnewyork.us", query:"311", limit:5}.
    • get_metadata: /api/views/{uid}.json, cached via LRU (keyed by domain+uid). Example {domain:"data.cityofnewyork.us", uid:"nc67-uf89"}.
    • preview_dataset: /resource/{uid}.json with $limit; default 50, max 5000. Example {domain:"data.cityofnewyork.us", uid:"nc67-uf89", limit:20}.
    • query_dataset: structured SoQL builder; uses $query only when select/where/order/group/having present, otherwise $limit/$offset. Default limit 500 (max 5000), default offset 0 (max 50k). Example {domain:"data.cityofnewyork.us", uid:"erm2-nwe9", select:["unique_key","complaint_type"], where:"borough = 'MANHATTAN'", order:["created_date DESC"], limit:5}.
  • HTTP client: fetch-based with AbortController timeout, auth headers (app token/basic/bearer), retry with exponential backoff (3x) on 429/5xx, optional client-side rate limiter (requestsPerHour), JSON parsing, and header normalization.
  • Clamping: shared in src/limits.ts (limit max 5000, default query 500, preview 50; offset max 50k).
  • Config: env-driven (SODA_DOMAINS, SODA_APP_TOKEN, SODA_REQUESTS_PER_HOUR).
  • Caching: metadata LRU size 100.
  • Tests: 19 suites (33 tests) including auth/retry/rate-limit, clamping, error bubbling, MCP registration/validation, and optional live e2e (RUN_E2E, RUN_E2E_MCP).
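The query_dataset rule of using $query only when structured clauses are present can be sketched as below. This is an illustration under assumptions, not the project's builder: buildSoqlParams, the QueryInput shape (including group as a string array), and the clause ordering are hypothetical, and identifier validation and clamping are assumed to have run beforehand.

```typescript
interface QueryInput {
  select?: string[];
  where?: string;
  order?: string[];
  group?: string[];
  having?: string;
  limit?: number;
  offset?: number;
}

// Hypothetical sketch: plain $limit/$offset paging when no clause is given,
// a single $query statement otherwise.
function buildSoqlParams(input: QueryInput): URLSearchParams {
  const limit = input.limit ?? 500; // documented query default
  const offset = input.offset ?? 0;
  const structured =
    !!input.select?.length || !!input.where || !!input.order?.length ||
    !!input.group?.length || !!input.having;

  const params = new URLSearchParams();
  if (!structured) {
    params.set("$limit", String(limit));
    params.set("$offset", String(offset));
    return params;
  }
  const parts = [`SELECT ${input.select?.length ? input.select.join(", ") : "*"}`];
  if (input.where) parts.push(`WHERE ${input.where}`);
  if (input.group?.length) parts.push(`GROUP BY ${input.group.join(", ")}`);
  if (input.having) parts.push(`HAVING ${input.having}`);
  if (input.order?.length) parts.push(`ORDER BY ${input.order.join(", ")}`);
  parts.push(`LIMIT ${limit}`, `OFFSET ${offset}`);
  params.set("$query", parts.join(" "));
  return params;
}

const structuredParams = buildSoqlParams({
  select: ["unique_key", "complaint_type"],
  where: "borough = 'MANHATTAN'",
  order: ["created_date DESC"],
  limit: 5,
});
console.log(structuredParams.get("$query"));
// SELECT unique_key, complaint_type WHERE borough = 'MANHATTAN' ORDER BY created_date DESC LIMIT 5 OFFSET 0
```

Folding limit and offset into the $query statement in the structured case keeps paging in one place, so the two styles of parameters are never mixed in a single request.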