DigitalOcean Gradient + Playwright MCP CUA Template
A Next.js application demonstrating DigitalOcean's AI platform capabilities, featuring:
- Gradient Integration: Chat with multiple LLM models powered by DigitalOcean's Gradient platform
- Playwright Browser Automation: Remote browser control through MCP (Model Context Protocol)
- DigitalOcean Spaces: Automatic file upload and optimization for media content
- Interactive Web Tools: Screenshot capture and browser automation capabilities
Features
Core Applications
1. AI Chat with MCP Browser Automation
- Multi-Model Support: Access to various LLMs through DigitalOcean's Gradient (requires models with tool support - see Limitations section)
- Browser Control: AI can navigate websites, take screenshots, fill forms, and interact with web pages (OpenAI models recommended)
- Visual AI: Support for vision capabilities - AI can see and understand screenshots
- PDF Processing: AI can read and process PDF documents
- Media Support: Display images, videos, audio, PDFs, and documents inline
2. Screenshotter Tool
- Multi-Browser Support: Chromium, Firefox, Safari (WebKit), and Microsoft Edge
- Device Emulation: Simulate various devices (iPhones, iPads, Android devices)
- Resolution Presets: Common desktop and mobile resolutions
- Full Page Screenshots: Capture entire scrollable pages
- High Quality Mode: Toggle between compressed and high-quality screenshots
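Under the hood these options map onto standard Playwright primitives. A minimal sketch of a device-emulated, full-page capture (the 'iPhone 13' descriptor is one of Playwright's built-in device profiles; the function name is illustrative):

import { chromium, devices } from 'playwright';

async function captureIPhoneFullPage(url: string): Promise<Buffer> {
  const browser = await chromium.launch();
  // Device emulation sets the viewport, user agent, and touch support.
  const context = await browser.newContext({ ...devices['iPhone 13'] });
  const page = await context.newPage();
  await page.goto(url, { waitUntil: 'networkidle' });
  // fullPage captures the entire scrollable page, not just the viewport.
  const png = await page.screenshot({ fullPage: true, type: 'png' });
  await browser.close();
  return png;
}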
User Interface
Chat Interface
- Responsive Design: Full-width messages with proper mobile support
- Resizable Sidebar: Drag to resize between 280px-600px
- Syntax Highlighting: Code blocks with VS Code Dark+ theme
- Message Styling:
  - User messages: Blue background (#3b82f6)
  - Assistant messages: Green background (#22c55e)
- Collapsible Content: Large outputs automatically collapse with expand/collapse controls
Advanced Features
- Debug Mode: Toggle to view raw message JSON for development
- Model Parameters: Adjustable temperature, max tokens, top P, and frequency penalty
- Streaming Responses: Real-time token streaming with visual indicators
- Error Handling: Graceful error display with retry capabilities
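A minimal sketch of a streaming chat client with retry, assuming AI SDK v4's @ai-sdk/react hook (the component and markup are illustrative, not the template's actual UI):

'use client';
import { useChat } from '@ai-sdk/react';

export default function ChatDemo() {
  // useChat streams tokens from the /api/chat route into `messages` as they arrive.
  const { messages, input, handleInputChange, handleSubmit, isLoading, error, reload } =
    useChat({ api: '/api/chat' });

  return (
    <div>
      {messages.map((m) => (
        <p key={m.id}><b>{m.role}:</b> {m.content}</p>
      ))}
      {isLoading && <em>streaming…</em>}
      {error && <button onClick={() => reload()}>Retry</button>}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Ask something" />
      </form>
    </div>
  );
}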
Technical Features
- Token Optimization: Replace large base64 strings with presigned URLs, automatically uploaded to DigitalOcean Spaces
- Concurrent Processing: Batch uploads with concurrency limits for performance
- MCP Protocol Support: Full implementation of Model Context Protocol for tool integration
- Streamable HTTP Transport: Real-time communication with MCP servers
- Keyboard Shortcuts: Clear chat and start a new conversation, with OS-aware shortcut display (⌘ on Mac, Ctrl elsewhere)
Architecture
Services
- Next.js Web App (Port 3000)
  - Main application with chat and screenshotter interfaces
  - Server-side API routes for AI and browser operations
  - React components with TypeScript
- Playwright Server (Port 8081)
  - Headless browser instance management
  - WebSocket API for browser control
  - Supports Chromium, Firefox, WebKit, and Edge
- Playwright MCP Server (Port 8080)
  - Model Context Protocol implementation
  - Bridges AI tools with browser automation
  - Provides screenshot, navigation, and interaction capabilities
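To make the split concrete: the web app attaches to the shared browser server over WebSocket instead of launching a browser per request. A sketch, assuming the local default endpoint and standard Playwright APIs:

import { chromium } from 'playwright';

async function main() {
  // The Playwright server speaks WebSocket; swap the scheme on the configured endpoint.
  const wsEndpoint = (process.env.PLAYWRIGHT_SERVER_ENDPOINT ?? 'http://localhost:8081')
    .replace(/^http/, 'ws');
  const browser = await chromium.connect(wsEndpoint);
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
}

main();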
API Endpoints
- /api/chat - Main chat endpoint with streaming responses
- /api/gradient-models - Fetch available AI models
- /api/screenshot - Direct screenshot API
- /api/devices - Get device emulation profiles
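For example, the screenshot endpoint could be called directly from a script or client component; the request fields shown (url, browser, fullPage) are illustrative, so check app/api/screenshot for the actual schema:

async function takeScreenshot(): Promise<Blob> {
  const res = await fetch('/api/screenshot', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: 'https://example.com', browser: 'chromium', fullPage: true }),
  });
  if (!res.ok) throw new Error(`Screenshot failed: ${res.status}`);
  return res.blob(); // the captured image
}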
Limitations and Requirements
Model Requirements
The chat interface with browser automation requires LLM models that support function calling/tools. Not all models available through Gradient support this feature.
Supported Models
The following models have been tested and confirmed to work with browser automation tools:
| Model ID | Provider | Description | Performance |
|---|---|---|---|
| openai-gpt-41 | OpenAI | GPT-4.1 | Best overall performance |
| openai-gpt-4o | OpenAI | GPT-4 Omni | Better than mini, but not as good as 4.1 |
| openai-gpt-4o-mini | OpenAI | GPT-4 Omni Mini | Cost-effective, fast |
| alibaba-qwen3-32b | Alibaba | Qwen 3 32B | Excellent open model |
| deepseek-r1-distill-llama-70b | DeepSeek | R1 Distilled Llama 70B | Powerful open model |
| llama3.3-70b-instruct | Meta | Llama 3.3 70B Instruct | High-quality open model |
| mistral-nemo-instruct-2407 | Mistral | Nemo Instruct 2407 | Efficient, good tool support |
Note: Other models that support function calling may also work but have not been fully tested.
Currently Unsupported Models
The following models have limitations with browser automation in this template:
- Anthropic Claude models (Claude 3 Opus, Sonnet, Haiku) - While these models do support tools, the current implementation uses the AI SDK's OpenAI-compatible provider which doesn't properly support tool calling for Anthropic models through Gradient
- Most open-source models without function calling support
- Text-only models without tool capabilities
Feature Limitations
- Browser automation only works with tool-supporting models
- Without tool support, the chat will function as a standard LLM chat without browser control
- Screenshot tool requires Playwright servers to be running and accessible
- File uploads require configured DigitalOcean Spaces access
- Browser sessions are not maintained between messages - each browser action starts fresh (AI SDK limitation)
Technical Notes
- This template uses the AI SDK with an OpenAI-compatible provider to communicate with Gradient
- Tool calling implementation follows OpenAI's function calling format
- The Playwright MCP server supports sessions for maintaining browser state across requests, but the AI SDK doesn't yet support MCP session management
- Future updates may add:
- Native support for Anthropic models once the AI SDK's provider properly supports their tool format through Gradient
- Session support once the AI SDK implements MCP session management
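A condensed sketch of this wiring, assuming AI SDK v4 (@ai-sdk/openai-compatible plus experimental_createMCPClient) and the MCP TypeScript SDK's Streamable HTTP transport; this is not the template's exact route code:

import { createOpenAICompatible } from '@ai-sdk/openai-compatible';
import { streamText, experimental_createMCPClient } from 'ai';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

export async function POST(req: Request) {
  const { messages } = await req.json();

  // OpenAI-compatible provider pointed at Gradient's inference endpoint.
  const gradient = createOpenAICompatible({
    name: 'gradient',
    baseURL: process.env.GRADIENT_BASE_URL!, // https://inference.do-ai.run/v1
    apiKey: process.env.GRADIENT_API_KEY!,
  });

  // Connect to the Playwright MCP server and expose its tools to the model.
  const mcpClient = await experimental_createMCPClient({
    transport: new StreamableHTTPClientTransport(
      new URL(process.env.PLAYWRIGHT_MCP_ENDPOINT ?? 'http://localhost:8080/mcp'),
    ),
  });
  const tools = await mcpClient.tools();

  const result = streamText({
    model: gradient('openai-gpt-4o'), // any tool-capable model ID
    messages,
    tools,
    onFinish: async () => { await mcpClient.close(); },
  });
  return result.toDataStreamResponse();
}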
DigitalOcean Spaces Integration
The application uses DigitalOcean Spaces (S3-compatible object storage) to optimize token usage by automatically uploading base64-encoded files and replacing them with presigned URLs.
How it works
1. Automatic Detection: The system detects base64 data in:
   - Message content (images and files)
   - Tool inputs (before execution)
   - Tool outputs (after execution)
2. S3 Upload: Base64 data is uploaded to S3 with the structure:
   /uploads/{uuid}/{original-filename}
3. URL Replacement: Base64 data is replaced with presigned URLs that expire after 7 days
4. Supported Formats: Most file types are supported, including:
   - Images (PNG, JPEG, GIF, WebP, SVG)
   - Videos (MP4, WebM)
   - Audio (MP3, WAV, OGG)
   - Documents (PDF, JSON, TXT, HTML, CSS, JS)
Performance Features
- Concurrent uploads with batching (max 10 simultaneous)
- Non-blocking async operations
- 7-day presigned URL expiration
- Automatic MIME type detection
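A sketch of the upload path using the AWS SDK v3 (Spaces is S3-compatible); the helper names are illustrative and the concurrency cap is simplified to fixed batches of 10:

import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
import { randomUUID } from 'node:crypto';

const s3 = new S3Client({
  region: process.env.DO_SPACES_REGION,
  endpoint: process.env.DO_SPACES_ENDPOINT,
  credentials: {
    accessKeyId: process.env.DO_SPACES_ACCESS_KEY!,
    secretAccessKey: process.env.DO_SPACES_SECRET_KEY!,
  },
});

async function uploadBase64(base64: string, filename: string, mime: string): Promise<string> {
  const Key = `uploads/${randomUUID()}/${filename}`;
  await s3.send(new PutObjectCommand({
    Bucket: process.env.DO_SPACES_BUCKET!,
    Key,
    Body: Buffer.from(base64, 'base64'),
    ContentType: mime,
  }));
  // Presigned GET URL, expiring after 7 days (the SigV4 maximum), in seconds.
  return getSignedUrl(s3, new GetObjectCommand({ Bucket: process.env.DO_SPACES_BUCKET!, Key }), {
    expiresIn: 7 * 24 * 60 * 60,
  });
}

// Upload in batches of at most 10 concurrent requests.
async function uploadAll(items: { base64: string; filename: string; mime: string }[]): Promise<string[]> {
  const urls: string[] = [];
  for (let i = 0; i < items.length; i += 10) {
    const batch = items.slice(i, i + 10);
    urls.push(...await Promise.all(batch.map((x) => uploadBase64(x.base64, x.filename, x.mime))));
  }
  return urls;
}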
Getting Started
Prerequisites
- Node.js 22.14.0 or higher (< 23)
- Yarn 1.22.22
- Docker and Docker Compose (for running Playwright servers)
- DigitalOcean account with:
  - Gradient serverless inference access (API key)
  - A Spaces bucket with access keys
Local Development
1. Clone and install:
   git clone https://github.com/digitalocean/template-app-platform-gradient-cua-chat
   cd template-app-platform-playwright-mcp-cua
   yarn install
2. Configure environment:
   cp .env.example .env.local
3. Update .env.local with your credentials (see the Environment Variables section below for details)
4. Start the Playwright servers using Docker Compose (recommended):
   docker-compose up -d
5. Start the app:
   yarn dev
6. Access the application:
   - Homepage: http://localhost:3000
   - Chat: http://localhost:3000/chat
   - Screenshotter: http://localhost:3000/screenshotter
7. Stop services (when using Docker Compose):
   docker-compose down
Environment Variables
The application requires several environment variables for different services. Copy .env.example to .env.local and configure:
Base Configuration
# Base URL for the Next.js application
# Set to your deployed app URL in production
NEXT_PUBLIC_BASE_URL="http://localhost:3000"
Gradient Configuration
Gradient is DigitalOcean's AI platform for running LLMs.
# Get your API key from: https://cloud.digitalocean.com/ai-ml/inference
# How to create: https://docs.digitalocean.com/products/gradientai-platform/how-to/use-serverless-inference/#create
GRADIENT_API_KEY=your_gradient_api_key_here
# Gradient inference endpoint (typically doesn't need changes)
GRADIENT_BASE_URL="https://inference.do-ai.run/v1"
DigitalOcean Spaces Configuration
Spaces is DigitalOcean's S3-compatible object storage for uploading chat media.
# Create a Space: https://docs.digitalocean.com/products/spaces/how-to/create/
# Available regions: nyc3, ams3, sfo3, sgp1, fra1, syd1
DO_SPACES_ENDPOINT=https://nyc3.digitaloceanspaces.com
DO_SPACES_REGION=nyc3
# Generate keys: https://cloud.digitalocean.com/account/api/spaces
DO_SPACES_ACCESS_KEY=your_spaces_access_key_here
DO_SPACES_SECRET_KEY=your_spaces_secret_key_here
# Your Space name (must be globally unique)
DO_SPACES_BUCKET=your_bucket_name_here
Playwright MCP Server Configuration
The Playwright MCP server enables browser automation in chat:
For Local Development
# Default ports for local services (these are the defaults if not specified)
PLAYWRIGHT_SERVER_ENDPOINT=http://localhost:8081
PLAYWRIGHT_MCP_ENDPOINT=http://localhost:8080/mcp
Note: If these environment variables are not set, the application will automatically use the local development defaults shown above.
For Production (App Platform)
Option 1 - External Access (through public internet):
PLAYWRIGHT_SERVER_ENDPOINT=https://my-app-name.ondigitalocean.app/playwright-server
PLAYWRIGHT_MCP_ENDPOINT=https://my-app-name.ondigitalocean.app/playwright-mcp/mcp
Option 2 - Internal App Network Access (recommended for performance & security):
PLAYWRIGHT_SERVER_ENDPOINT=http://playwright-server:8081
PLAYWRIGHT_MCP_ENDPOINT=http://playwright-mcp:8080/mcp
Deployment on DigitalOcean App Platform
Prerequisites
- DigitalOcean account with billing enabled
- GitHub account with the repository forked
- The following DigitalOcean services configured:
  - Gradient serverless inference (API key)
  - A Spaces bucket with access keys
Quick Deploy
Use the Deploy to DigitalOcean button in the repository README, or follow the manual steps below.
Manual Deployment Steps
1. Fork the Repository
Fork this repository to your GitHub account so App Platform can access it.
2. Create a New App
- Go to DigitalOcean App Platform
- Click "Create App"
- Choose "GitHub" as your source
- Select your forked repository
3. Configure App Spec
You can either:
- Use the UI to configure components
- Upload the provided .do/app.yaml spec file
The app requires 3 components:
- Web Service: The Next.js application
- Worker 1: Playwright browser server
- Worker 2: Playwright MCP server
4. Set Environment Variables
Configure these environment variables in the App Platform settings (see Environment Variables section above for details):
Required Secrets:
- GRADIENT_API_KEY - Your Gradient API key
- DO_SPACES_ACCESS_KEY - Your Spaces access key
- DO_SPACES_SECRET_KEY - Your Spaces secret key
- DO_SPACES_BUCKET - Your Spaces bucket name
- DO_SPACES_ENDPOINT - Your Spaces endpoint (e.g., https://nyc3.digitaloceanspaces.com)
- DO_SPACES_REGION - Your Spaces region (e.g., nyc3)
Required for Production (choose one option):
- For internal networking (recommended):
PLAYWRIGHT_SERVER_ENDPOINT=http://playwright-server:8081
PLAYWRIGHT_MCP_ENDPOINT=http://playwright-mcp:8080/mcp
- For external access:
PLAYWRIGHT_SERVER_ENDPOINT=https://your-app-name.ondigitalocean.app/playwright-server
PLAYWRIGHT_MCP_ENDPOINT=https://your-app-name.ondigitalocean.app/playwright-mcp/mcp
5. Configure Component Settings
Web Component
- Instance Size: Basic XXS (512 MB RAM, 1 vCPU)
- HTTP Port: 3000
- Routes: /
Playwright Server Worker
- Instance Size: Professional XS (1 GB RAM, 1 vCPU)
- Internal Port: 8081
- Dockerfile: Dockerfile.playwright
Playwright MCP Worker
- Instance Size: Professional XS (1 GB RAM, 1 vCPU)
- Internal Port: 8080
- Dockerfile: Dockerfile.mcp
6. Deploy
Click "Create Resources" to start the deployment. The initial build may take 10-15 minutes.
Post-Deployment
Verify Services
- Check that all 3 components show as running and healthy
- Visit your app URL to see the homepage
- Test the Chat interface
- Test the Screenshotter tool
Monitor Performance
Use the App Platform metrics to monitor:
- CPU and memory usage
- Request rates
- Error logs
Troubleshooting Deployment
Build Failures
If the build fails:
- Check the build logs for errors
- Ensure the correct values are passed as build arguments to each component
- Verify the Dockerfiles are correct
Runtime Errors
Connection Issues
If services can't communicate:
- Use internal hostnames (playwright-server, playwright-mcp)
- Check the internal ports are correct
- Verify environment variables point to internal URLs
Security Considerations
- API Keys: Always use App Platform secrets for sensitive values
- Network: Use internal networking between components
- Spaces: Configure bucket policies to restrict access
- Updates: Keep dependencies updated for security patches
Project Structure
├── app/
│   ├── api/                     # API routes
│   │   ├── chat/                # Main chat endpoint
│   │   ├── gradient-models/     # Model listing
│   │   ├── screenshot/          # Screenshot API
│   │   └── devices/             # Device profiles
│   ├── chat/                    # Chat interface
│   ├── screenshotter/           # Screenshot tool
│   └── page.tsx                 # Homepage
├── components/
│   ├── chat/                    # Chat UI components
│   │   ├── ChatSidebar.tsx
│   │   ├── Message.tsx
│   │   └── MessagesArea.tsx
│   └── media-renderers/         # Media display components
│       ├── MediaRenderer.tsx    # Main router
│       ├── PDFRenderer.tsx      # PDF viewer
│       └── DocumentRenderer.tsx # Documents
├── lib/
│   ├── mcp-transport.ts         # MCP WebSocket client
│   ├── tool-handlers.tsx        # Tool result rendering
│   └── s3-utils.ts              # Spaces upload logic
├── hooks/                       # React hooks
├── Dockerfile.mcp               # MCP server image
└── Dockerfile.playwright        # Browser server image
Development
Available Scripts
# Development with Turbopack
yarn dev
# Production build
yarn build
yarn start
# Testing
yarn test # Run all tests
yarn test:watch # Watch mode
yarn test:coverage # Coverage report
# Linting
yarn lint
Testing
The project includes comprehensive test coverage:
- Unit tests for components
- API route tests
- Hook tests
- Utility function tests
Run yarn test:coverage to see the full coverage report.
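A minimal example of the test style, assuming a Jest-compatible runner; buildUploadKey is a hypothetical helper used only for illustration:

// Hypothetical helper mirroring the Spaces key layout described above.
function buildUploadKey(uuid: string, filename: string): string {
  return `uploads/${uuid}/${filename}`;
}

describe('buildUploadKey', () => {
  it('places files under uploads/{uuid}/', () => {
    expect(buildUploadKey('abc-123', 'shot.png')).toBe('uploads/abc-123/shot.png');
  });
});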
Troubleshooting
Common Issues
1. "Bad Request" errors in chat
   - Most common cause: the Max Tokens setting in Advanced Settings is too high
   - See the "Max Tokens Configuration" section below for a detailed explanation
2. "Cannot connect to Playwright server"
   - Ensure both Playwright containers are running
   - Check that ports 8080 and 8081 are not already in use (when running locally)
   - Verify environment variables are set correctly
3. "Gradient API error"
   - Verify your API key is correct
   - Check that you have access to Gradient
   - Ensure you're not exceeding rate limits
4. "Spaces upload failed"
   - Verify the bucket exists and is accessible
   - Check that the access keys have write permissions
   - Ensure the bucket name is globally unique
5. "Screenshot timeout"
   - Check that the Playwright server is running and reachable
   - Try different browser options
   - Check whether the site requires authentication
Max Tokens Configuration
The most common cause of "Bad Request" errors in the chat interface is incorrect Max Tokens settings in the Advanced Settings panel.
How Max Tokens Actually Works
The number of tokens a model generates is determined by:
generated_tokens = min(request.max_tokens, (model_context_length - prompt_token_length))
Where:
- request.max_tokens - The value you set in Advanced Settings
- model_context_length - The model's total context window (varies by model)
- prompt_token_length - Tokens used by your messages + system prompt + tool definitions
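A small helper applying this formula (names are illustrative):

// Clamp the requested max_tokens so a request always fits the context window.
function safeMaxTokens(requestedMax: number, contextLength: number, promptTokens: number): number {
  const available = contextLength - promptTokens;
  if (available <= 0) {
    throw new Error('Prompt already fills the context window; trim the conversation');
  }
  return Math.min(requestedMax, available);
}

// Example: a 32,000-token context with a 30,000-token prompt leaves room
// for only 2,000 generated tokens.
// safeMaxTokens(32000, 32000, 30000) === 2000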
Common Issues and Solutions
1. Setting Max Tokens too high
   - If you set Max Tokens to 32,000 but your prompt already uses 30,000 tokens of a 32,000-token context window, the model can only generate 2,000 tokens
   - If Max Tokens exceeds the available space, you'll get a "Bad Request" error
2. Solution
   - Start with a lower Max Tokens value (e.g., 4,096)
   - If you get "max tokens reached" warnings, gradually increase it
   - Monitor the token usage shown in the chat interface
3. Model-Specific Context Limits
   - Each model has a different context length
   - Check the model's documentation for its specific limit
   - Leave room for both input and output tokens
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Support
For issues specific to:
- App Platform: DigitalOcean Support
- This application: GitHub Issues
License
This is a template application provided by DigitalOcean. See LICENSE for details.