oscaromsn/harvest-mcp
If you are the rightful owner of harvest-mcp and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.
HARvest MCP Server is a tool that enables AI coding agents to analyze API interactions and generate API wrappers by processing HAR files.
session.start
Initializes a new analysis session.
analysis.run_initial_analysis
Identifies the target action URL and creates the master node.
analysis.process_next_node
Processes the next unresolved node in the dependency graph.
codegen.generate_wrapper_script
Generates the final TypeScript wrapper script.
HARvest MCP Server
Overview
HARvest MCP Server enables AI coding agents to programmatically analyze API interactions and generate API wrappers. By analyzing browser network traffic (HAR files), it generates executable code that reproduces entire API workflows including authentication, dependency chains, and data extraction.
Key Features
- š§ AI-Powered Analysis: Uses LLM function calling for intelligent request analysis
- š Dependency Graph Management: Builds and manages complex API dependency chains
- š§ Interactive Debugging: Granular control with manual intervention capabilities
- ā” High Performance: <30ms analysis, supports 12+ concurrent sessions
- š Real-Time Inspection: Live access to analysis state via MCP resources
- š”ļø Type-Safe: Full TypeScript implementation with stricter tsconfig flags and Biome rules.
- ā Comprehensive Testing: 100% coverage driven by a strict test-first (TDD) workflow
Architecture
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
ā MCP Server (STDIO) ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā¤
ā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā ā
ā ā Tools ā ā Resources ā ā Prompts ā ā
ā ā ā ā ā ā ā ā
ā ā session_* ā ā dag.json ā ā full_run ā ā
ā ā analysis_* ā ā log.txt ā ā ā ā
ā ā debug_* ā ā status.json ā ā ā ā
ā ā codegen_* ā ā code.ts ā ā ā ā
ā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā āāāāāāāāāāāāāāā ā
āāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāāā
Quick Start
Installation
# Clone the repository
git clone <repository-url>
cd harvest-mcp
# Install dependencies
bun install
# Build the project
bun run build
Basic Usage
-
Start the server:
bun run start
-
Connect with MCP client:
# Using MCP Inspector (if available) mcp-inspector --transport stdio --command "bun run start"
-
Basic workflow:
// 1. Create a session const session = await tools.session.start({ harPath: './path/to/traffic.har', cookiePath: './path/to/cookies.json', // optional prompt: 'Login to the application' }); // 2. Run analysis await tools.analysis.run_initial_analysis({ sessionId: session.id }); // 3. Process dependencies while (!(await tools.analysis.is_complete({ sessionId: session.id }))) { await tools.analysis.process_next_node({ sessionId: session.id }); } // 4. Generate code const code = await tools.codegen.generate_wrapper_script({ sessionId: session.id });
MCP Tools Reference
Session Management
session.start
Initializes a new analysis session.
Parameters:
harPath
(string): Path to HAR filecookiePath
(string, optional): Path to cookie fileprompt
(string): Description of the action to analyzeinputVariables
(object, optional): Pre-defined input variables
Returns:
{
"sessionId": "uuid-string"
}
session.list
Lists all active sessions.
Returns:
{
"sessions": [
{
"id": "uuid",
"prompt": "Login to application",
"createdAt": "2025-01-01T00:00:00Z",
"isComplete": false,
"nodeCount": 3
}
]
}
session.delete
Deletes a session and cleans up resources.
Parameters:
sessionId
(string): Session to delete
Analysis Tools
analysis.run_initial_analysis
Identifies the target action URL and creates the master node.
Parameters:
sessionId
(string): Session ID
Returns:
{
"masterNodeId": "uuid",
"actionUrl": "https://api.example.com/login"
}
analysis.process_next_node
Processes the next unresolved node in the dependency graph.
Parameters:
sessionId
(string): Session ID
Returns:
{
"processedNodeId": "uuid",
"foundDependencies": [
{
"type": "request",
"nodeId": "uuid",
"extractedPart": "auth_token"
}
],
"status": "completed"
}
analysis.is_complete
Checks if the dependency analysis is finished.
Parameters:
sessionId
(string): Session ID
Returns:
{
"isComplete": true
}
Debug Tools
debug_get_unresolved_nodes
Returns nodes with unresolved dependencies.
Parameters:
sessionId
(string): Session ID
Returns:
{
"unresolvedNodes": [
{
"nodeId": "uuid",
"unresolvedParts": ["auth_token", "session_id"]
}
]
}
debug_get_node_details
Gets detailed information about a specific node.
Parameters:
sessionId
(string): Session IDnodeId
(string): Node to inspect
Returns:
{
"nodeType": "curl",
"content": "curl -X POST ...",
"dynamicParts": ["auth_token"],
"extractedParts": ["user_id"],
"inputVariables": {"username": "user@example.com"}
}
debug_list_all_requests
Lists all available requests from the HAR file.
Parameters:
sessionId
(string): Session ID
Returns:
{
"requests": [
{
"method": "POST",
"url": "https://api.example.com/auth",
"responsePreview": "{\"token\":\"abc123\"...}"
}
]
}
debug_force_dependency
Manually creates a dependency link in the DAG.
Parameters:
sessionId
(string): Session IDconsumerNodeId
(string): Node that needs the dependencyproviderNodeId
(string): Node that provides the dependencyprovidedPart
(string): The variable being provided
Code Generation
codegen.generate_wrapper_script
Generates the final TypeScript wrapper script.
Parameters:
sessionId
(string): Session ID (analysis must be complete)
Returns:
// Generated TypeScript code
async function authLogin(): Promise<ApiResponse> { /* ... */ }
async function searchDocuments(): Promise<ApiResponse> { /* ... */ }
async function loginAndSearchDocuments(): Promise<ApiResponse> { /* ... */ }
MCP Resources
harvest://{sessionId}/dag.json
Real-time JSON representation of the dependency graph.
MIME Type: application/json
Example:
{
"nodes": {
"node-1": {
"type": "master_curl",
"content": "curl -X POST ...",
"dynamicParts": [],
"extractedParts": ["result"]
}
},
"edges": []
}
harvest://{sessionId}/log.txt
Plain-text analysis log with timestamps.
MIME Type: text/plain
harvest://{sessionId}/status.json
Current analysis status and progress.
MIME Type: application/json
harvest://{sessionId}/generated_code.ts
Generated TypeScript wrapper script (available after code generation).
MIME Type: text/typescript
MCP Prompts
harvest.full_run
Complete automated analysis from HAR to code generation.
Arguments:
harPath
(string): Path to HAR filecookiePath
(string, optional): Path to cookie fileprompt
(string): Description of the actioninputVariables
(object, optional): Pre-defined variables
Returns: Complete workflow results including generated code and analysis summary.
Examples
Complete Workflow Example
// Automated login analysis
const result = await prompts.harvest.full_run({
harPath: './login-traffic.har',
cookiePath: './cookies.json',
prompt: 'Login to the dashboard and navigate to settings'
});
console.log(result.generatedCode); // Complete TypeScript implementation
Manual Debugging Example
// Create session
const session = await tools.session.start({
harPath: './complex-workflow.har',
prompt: 'Multi-step document processing'
});
// Start analysis
await tools.analysis.run_initial_analysis({ sessionId: session.id });
// Monitor progress
while (true) {
const complete = await tools.analysis.is_complete({ sessionId: session.id });
if (complete.isComplete) break;
try {
await tools.analysis.process_next_node({ sessionId: session.id });
} catch (error) {
// Handle failed dependency resolution
const unresolved = await tools.debug.get_unresolved_nodes({ sessionId: session.id });
console.log('Manual intervention needed:', unresolved);
// Example manual fix
await tools.debug.force_dependency({
sessionId: session.id,
consumerNodeId: 'problematic-node-id',
providerNodeId: 'auth-node-id',
providedPart: 'auth_token'
});
}
}
// Generate final code
const code = await tools.codegen.generate_wrapper_script({ sessionId: session.id });
State Inspection Example
// Monitor analysis state in real-time
const sessionId = 'your-session-id';
// View dependency graph
const dagResource = await resources.read(`harvest://${sessionId}/dag.json`);
console.log('Current DAG:', JSON.parse(dagResource.content));
// Check analysis logs
const logResource = await resources.read(`harvest://${sessionId}/log.txt`);
console.log('Analysis log:', logResource.content);
// Monitor status
const statusResource = await resources.read(`harvest://${sessionId}/status.json`);
const status = JSON.parse(statusResource.content);
console.log(`Progress: ${status.totalNodes - status.nodesRemaining}/${status.totalNodes} nodes processed`);
Performance Characteristics
Based on comprehensive benchmarking:
- Analysis Speed: <60ms for typical workflows
- Tool Response Time: <1ms for non-LLM operations
- Memory Usage: ~16MB per active session
- Concurrent Sessions: Supports 12+ simultaneous sessions
- Bulk Operations: 15 sessions + 20 operations in ~200ms
Development
Prerequisites
- Bun: Package manager and runtime
- TypeScript: Strict typing enabled
- Node.js 18+: For compatibility
Development Commands
# Development server with hot reload
bun run dev
# Run tests
bun test # All tests
bun test:unit # Unit tests only
bun test:integration # Integration tests
bun test:e2e # End-to-end tests
bun test:coverage # With coverage report
# Code quality
bun run check # Lint and format check
bun run check:fix # Auto-fix issues
bun run typecheck # TypeScript validation
# Build
bun run build # Production build
Project Structure
src/
āāā core/ # Core business logic
ā āāā SessionManager.ts # Stateful session management
ā āāā DAGManager.ts # Dependency graph operations
ā āāā HARParser.ts # HAR file processing
ā āāā LLMClient.ts # OpenAI integration
ā āāā CodeGenerator.ts # TypeScript code generation
āāā agents/ # Analysis agents
ā āāā URLIdentificationAgent.ts
ā āāā DynamicPartsAgent.ts
ā āāā InputVariablesAgent.ts
ā āāā DependencyAgent.ts
āāā models/ # Data models
ā āāā Request.ts # HTTP request modeling
āāā types/ # TypeScript definitions
ā āāā index.ts # Centralized types
āāā server.ts # MCP server entry point
tests/
āāā unit/ # Unit tests
āāā integration/ # Integration tests
āāā e2e/ # End-to-end tests
āāā fixtures/ # Test data
Adding New Features
-
New Analysis Agent:
// src/agents/MyNewAgent.ts export class MyNewAgent { static async analyze(session: HarvestSession): Promise<AnalysisResult> { // Implementation } }
-
New MCP Tool:
// In src/server.ts server.tool('my.new_tool', MyToolSchema, async (params) => { // Tool implementation return { content: [{ type: 'text', text: 'result' }] }; });
-
New Resource:
// Add to resource handler case 'my_resource.json': return { contents: [{ type: 'text', text: JSON.stringify(data) }], mimeType: 'application/json' };
Testing Guidelines
- Unit Tests: Test individual functions and classes
- Integration Tests: Test MCP tool workflows
- E2E Tests: Test complete user scenarios
- Performance Tests: Validate speed and memory requirements
All tests use Vitest with comprehensive mocking for LLM calls.
Configuration
Environment Variables
HARvest MCP Server supports multiple LLM providers. Configure your preferred provider using environment variables:
# LLM Provider Selection (optional)
# Supported values: openai, gemini
# If not set, auto-detects based on available API keys
LLM_PROVIDER=openai
# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key-here
# Google Gemini Configuration
GOOGLE_API_KEY=your-google-api-key-here
# Model Configuration (optional)
# Overrides the default model for your provider
LLM_MODEL=gpt-4o # OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo
# Gemini: gemini-1.5-pro, gemini-1.5-flash, gemini-1.0-pro
Provider Auto-Detection
If LLM_PROVIDER
is not set, the system automatically selects a provider based on available API keys:
- If
OPENAI_API_KEY
is present ā Uses OpenAI - If only
GOOGLE_API_KEY
is present ā Uses Gemini - If neither is present ā Throws configuration error
Supported Models
OpenAI Models:
gpt-4o
(default) - Latest GPT-4 Optimizedgpt-4o-mini
- Smaller, faster variantgpt-4-turbo
- GPT-4 Turbogpt-4
- Standard GPT-4gpt-3.5-turbo
- Fast, cost-effective
Gemini Models:
gemini-1.5-pro
(default) - Most capablegemini-1.5-flash
- Faster, lighter variantgemini-1.0-pro
- Previous generation
Session Configuration
// SessionManager configuration
const MAX_SESSIONS = 100; // Maximum concurrent sessions
const SESSION_TIMEOUT = 30 * 60 * 1000; // 30 minutes
const CLEANUP_INTERVAL = 5 * 60 * 1000; // 5 minutes
Troubleshooting
Common Issues
1. LLM API Failures
Error: URL identification failed: API call failed
- Check API key environment variables (
OPENAI_API_KEY
orGOOGLE_API_KEY
) - Verify the correct provider is selected (check
LLM_PROVIDER
) - Verify internet connectivity
- Check API status for your provider (OpenAI or Google)
2. HAR File Parsing Errors
Error: Failed to parse HAR file: Invalid JSON
- Ensure HAR file is valid JSON
- Check file permissions
- Verify file path is correct
3. Session Not Found
Error: Session not found: uuid
- Session may have expired (30 min timeout)
- Check session ID is correct
- Verify session was created successfully
4. Memory Issues
Out of memory errors
- Sessions automatically clean up after 30 minutes
- Manually delete unused sessions
- Check for memory leaks in custom code
Debug Logging
Enable detailed logging:
# Enable MCP debug logging
DEBUG=mcp:* bun run start
# Server-side logging (stderr)
bun run start 2> debug.log
Performance Debugging
Use the built-in performance tests:
# Run performance benchmarks
bun test tests/integration/performance.test.ts
# Monitor memory usage
bun test tests/integration/performance.test.ts --reporter=verbose
Security Considerations
- Local Execution Only: Server uses STDIO transport, no network binding
- File Path Validation: All file paths are validated to prevent traversal
- No Code Execution: Server only generates code, never executes it
- Session Isolation: Each session is completely isolated
- Automatic Cleanup: Session data is cleared on timeout/termination
Test Coverage & Quality
Current test metrics:
- Total Tests: 279 tests across 24 files
- Pass Rate: 100.0% (test-driven development power)
- Coverage Areas: Unit, integration, E2E, and performance tests
Contributing
- Fork the repository
- Create a feature branch:
git checkout -b feature/my-feature
- Make changes with tests:
bun test
- Ensure code quality:
bun run check
- Submit a pull request
Code Standards
- TypeScript Strict Mode: No
any
types allowed - Test Coverage: >90% target coverage
- Performance: Tools must respond in <200ms
- Documentation: All public APIs documented
Aknowledges
The approach adopted on this project - HAR file creation, DAG transversing and script generation - is inspired on the amazing work of Integuru agent.
Support
- Issues:
- Documentation: This README and inline code documentation
- Examples: See
tests/
directory for comprehensive examples