
Dataflow Template MCP Server

MCP server and CLI tool for creating standardized Dataflow projects from templates. The template is built on Apache Beam, which provides a unified programming model for both batch and streaming data processing. It was created to help developers start from a standard format without worrying about the structure of their pipeline package or its deployment setup.

Demo MCP Endpoint: https://dataflow-mcp-server-308763801667.us-central1.run.app/mcp

Note: The endpoint is currently public and unauthenticated. Authentication will be implemented in a future release (see Future Changes section).

Demo Video: Watch a walkthrough of using the MCP server with an AI coding assistant.

What's Included

  • MCP server for AI coding assistants (Cursor, Claude, etc.)
  • CLI tool for manual project creation
  • Standardized Dataflow template structure
  • GitHub Actions workflow for automated deployment

Getting Started

Clone the repository:

git clone https://github.com/bharath03-a/gcp-dataflow-template-kit
cd gcp-dataflow-template-kit

Installation

# Install dependencies
pip install -e .

# Or using uv
uv sync

# Install pre-commit hooks
pre-commit install

Usage

CLI Tool

Create a new Dataflow project:

dataflow-create create /path/to/new-project

Or run the template locally:

dataflow-create run-template

MCP Server (Local/Stdio)

Add to your MCP client configuration (e.g., ~/.cursor/mcp.json):

{
  "mcpServers": {
    "dataflow-template": {
      "command": "python",
      "args": ["-m", "mcp_server.mcp_server"],
      "cwd": "/path/to/dataflow_template"
    }
  }
}

MCP Server (HTTP/Cloud Run)

The server can run as an HTTP service for remote access.

Set environment variables:

export MCP_TRANSPORT=streamable-http
export MCP_HOST=0.0.0.0
export MCP_PORT=8000

Run the server:

python -m mcp_server.mcp_server
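For reference, the transport selection driven by these variables could be sketched as follows. This is a minimal illustration, not the actual implementation in mcp_server; it only assumes the variable names and defaults shown above (MCP_TRANSPORT, MCP_HOST, MCP_PORT):

```python
import os


def resolve_transport_config():
    """Resolve transport settings from the environment.

    Illustrative sketch: falls back to stdio with local defaults when
    no MCP_* variables are set, mirroring the export commands above.
    """
    transport = os.environ.get("MCP_TRANSPORT", "stdio")
    host = os.environ.get("MCP_HOST", "127.0.0.1")
    port = int(os.environ.get("MCP_PORT", "8000"))
    return transport, host, port
```

With the three exports above in place, this would return `("streamable-http", "0.0.0.0", 8000)`; with nothing set, it falls back to stdio on localhost.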

Deployment

Local Docker Build

docker build -t dataflow-mcp-server .
docker run -p 8080:8080 \
  -e MCP_TRANSPORT=streamable-http \
  -e MCP_HOST=0.0.0.0 \
  dataflow-mcp-server

Cloud Run Deployment

The project includes a GitHub Actions workflow for automatic deployment to Cloud Run.

  1. Set up GitHub Secrets:

    • GCP_PROJECT_ID: Your Google Cloud project ID
    • GCP_SA_KEY: Service account JSON key
  2. Push to the main branch to trigger a deployment

  3. Manual deployment:

gcloud run deploy dataflow-mcp-server \
  --source . \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars MCP_TRANSPORT=streamable-http \
  --port 8080

Add to your MCP client configuration (e.g., ~/.cursor/mcp.json):

{
  "mcpServers": {
    "dataflow-template": {
      "url": "<CLOUD-RUN-ENDPOINT>",
      "transport": "http"
    }
  }
}

Testing

Test the MCP server:

python tests/test_mcp_server_remote.py

Set the MCP_SERVER_URL environment variable to test against a remote server.
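As a sketch of what such a remote test might do, the target URL can be resolved from the environment and a tools/call request built for the health_check tool. The payload shape follows the standard MCP JSON-RPC convention; the localhost fallback URL is an assumption, not something the test script is confirmed to use:

```python
import json
import os


def build_health_check_request(request_id=1):
    """Build a JSON-RPC 2.0 tools/call request for the health_check tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "health_check", "arguments": {}},
    }


# Fall back to a local server when MCP_SERVER_URL is unset (assumed default)
url = os.environ.get("MCP_SERVER_URL", "http://localhost:8000/mcp")
body = json.dumps(build_health_check_request())
```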

Project Structure

.
├── mcp_server/          # MCP server implementation
├── cli/                 # CLI tool
├── template_files/      # Dataflow template files
├── tests/               # Test script
├── .github/workflows/   # GitHub Actions workflow
└── Dockerfile           # Docker configuration

Development

Run linting:

ruff check .
ruff format .

Available Tools

  • create_dataflow_project: Creates a new Dataflow project from the template
  • health_check: Checks that the MCP server is running and the template is accessible
  • list_template_files: Lists all files in the template directory, with folder structure
  • get_template_file_content: Gets the content of a specific template file from the server's template directory
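For example, invoking create_dataflow_project over the HTTP transport would carry a tools/call payload along these lines. The argument key "project_path" is an assumption for illustration; check the tool's actual input schema through your MCP client before relying on it:

```python
def build_create_project_request(project_path, request_id=2):
    """Build a JSON-RPC 2.0 tools/call request for create_dataflow_project.

    The argument name "project_path" is hypothetical; the real tool may
    expect a different parameter name.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "create_dataflow_project",
            "arguments": {"project_path": project_path},
        },
    }
```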

Future Changes

The following improvements are planned for future releases:

  • Additional Language Templates: Adding more templates for Java and Scala to support a broader range of Dataflow use cases
  • Template Variations: Providing different variations of templates (e.g., batch vs streaming, simple vs complex architectures) to better suit various project requirements
  • Secure Authenticated Endpoint: Implementing authentication for the MCP server endpoint to secure the public deployment and protect against unauthorized access

Contributing

Contributions are welcome! We appreciate your help in making this project better. Here's how you can contribute:

How to Contribute

  1. Fork the repository and clone your fork locally
  2. Create a branch for your feature or bug fix:
    git checkout -b feature/your-feature-name
    # or
    git checkout -b fix/your-bug-fix
    
  3. Make your changes and ensure they follow the project's code style
  4. Run tests and linting to ensure everything passes:
    ruff check .
    ruff format .
    python tests/test_mcp_server_remote.py
    
  5. Commit your changes with clear, descriptive commit messages
  6. Push to your fork and create a Pull Request

Thank you for contributing! 🙏


With love, from a fellow frustrated data engineer