database-agent

susfungit/database-agent



Database Agent MCP Server - Phase 2A

A natural language to SQL query generator with MCP (Model Context Protocol) server capabilities. This is Phase 2A of a planned AI agent that will eventually provide autonomous database interaction capabilities.

Current Phase: SQL Query Generator with Basic Schema Integration
Future Vision: Autonomous Database AI Agent


πŸš€ Features (Phase 2A)

Current Capabilities

  • Natural Language to SQL: Convert natural language prompts to SQL queries
  • LLMWrapper Integration: Uses your existing llmwrapper for LLM interactions
  • MCP Protocol: Implements Model Context Protocol for tool integration
  • FastAPI Server: RESTful API endpoints for easy integration
  • Health Monitoring: Built-in health checks and monitoring
  • Comprehensive Logging: Detailed logging with configurable levels
  • Configuration Management: YAML-based configuration with environment variable overrides
  • Schema Loading & Caching: Loads and caches database schema using schema-graph-builder (see below)

Planned Features (Phase 2B+)

  • Schema-Aware Generation: Use database schema for context-aware SQL generation
  • Query Execution: Execute generated SQL and return results
  • Autonomous Decision Making: Plan and execute multi-step database operations
  • Memory & Learning: Remember user preferences and learn from interactions
  • Interactive Refinement: Ask clarifying questions and refine queries
  • Relationship Detection: Automatic foreign key relationship inference

🧩 Schema Integration (Phase 2A)

What’s New in Phase 2A?

  • SchemaManager: Loads, caches, and provides access to the database schema (tables, relationships, metadata) using the schema-graph-builder package.
  • Schema Context: Schema is now available for future use in prompt enhancement, validation, and agent reasoning (but not yet used for SQL generation).
  • Health Checks: SchemaManager exposes health and summary info for monitoring.
  • Unit Tests: Comprehensive tests for schema loading, caching, and health.

How Schema Integration Works

  • On startup, if enabled in config, the agent loads the schema from your database using schema-graph-builder.
  • The schema (tables, columns, relationships) is cached and periodically refreshed.
  • The schema is not yet used to influence SQL generation, but is available for future phases.
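The load-and-cache flow above can be sketched in a few lines. This is an illustrative stand-in, not the project's actual SchemaManager: it introspects a SQLite connection directly with stdlib calls instead of delegating to schema-graph-builder, and the class and method names here are hypothetical.

```python
import sqlite3
import time


class SchemaManager:
    """Illustrative sketch of the load-and-cache flow described above.

    The real project delegates introspection to schema-graph-builder;
    this stand-in uses sqlite3's catalog queries so the idea is runnable
    without extra dependencies.
    """

    def __init__(self, connection, refresh_interval=3600):
        self.conn = connection
        self.refresh_interval = refresh_interval  # seconds, as in the config
        self._schema = None
        self._loaded_at = 0.0

    def _load(self):
        # Collect table names, then column names per table.
        schema = {}
        tables = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"
        ).fetchall()
        for (table,) in tables:
            cols = self.conn.execute(f"PRAGMA table_info({table})").fetchall()
            schema[table] = [col[1] for col in cols]  # col[1] is the column name
        return schema

    def get_schema(self):
        """Return the cached schema, reloading after refresh_interval seconds."""
        expired = time.time() - self._loaded_at > self.refresh_interval
        if self._schema is None or expired:
            self._schema = self._load()
            self._loaded_at = time.time()
        return self._schema
```

The real SchemaManager also tracks relationships and exposes health/summary info; this sketch only shows the caching mechanic.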

Configuration Example

Edit config/llm_config.yaml:

schema:
  enabled: true
  database_url: "sqlite:///:memory:"
  refresh_interval: 3600  # seconds

Requirements

  • schema-graph-builder must be installed (see Installation below).
  • The schema config section must be present and enabled.

πŸ“‹ Prerequisites

  • Python 3.8+
  • Your existing llmwrapper library (installed with pip install -e ../llmwrapper)
  • Your existing schema-graph-builder library (installed with pip install -e ../schema-graph-builder)
  • OpenAI API key (or other LLM provider)

πŸ”§ Installation

  1. Clone and set up the project:
# Navigate to the database-agent directory
cd database-agent

# Install dependencies
pip install -r requirements.txt

# Install your existing packages in development mode
pip install -e ../llmwrapper
pip install -e ../schema-graph-builder
  2. Set up environment variables:
# Copy the example environment file
cp env.example .env

# Edit .env with your API keys
nano .env
  3. Configure your LLM provider and schema integration:
# Edit config/llm_config.yaml
llm:
  provider: "openai"  # or anthropic, gemini, grok, ollama
  model: "gpt-4"      # Model name for your provider
  api_key: ""         # Will be overridden by OPENAI_API_KEY env var

schema:
  enabled: true
  database_url: "sqlite:///:memory:"
  refresh_interval: 3600

🚦 Component Status Table

| Feature | Status |
|---------|--------|
| Natural language → SQL (LLM) | ✅ Complete |
| SchemaManager (load/cache) | ✅ Complete |
| Schema context API | ✅ Complete |
| Unit tests for schema features | ✅ Complete |
| Query execution | ❌ Not yet |
| Schema-aware prompt/validation | ❌ Not yet |

πŸ“ Log Files & .gitignore

  • All log files (*.log, *.log.*) and the logs/ directory are ignored by git (see .gitignore).
  • You do not need to manually clean up logs before committing.

πŸ§ͺ Testing

# Run all tests (including schema integration)
pytest tests/

# Run with coverage
pytest tests/ --cov=src --cov-report=html

# Run specific test
pytest tests/test_database_agent.py -v

All tests are currently passing.


πŸ“ Project Structure

database-agent/
β”œβ”€β”€ README.md                 # This file
β”œβ”€β”€ requirements.txt          # Python dependencies (includes schema-graph-builder)
β”œβ”€β”€ env.example              # Environment variables template
β”œβ”€β”€ architecture_diagram.md  # System architecture documentation
β”œβ”€β”€ config/
β”‚   └── llm_config.yaml      # Configuration file
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ mcp_server.py        # Main MCP server (FastAPI)
β”‚   β”œβ”€β”€ database_agent/
β”‚   β”‚   β”œβ”€β”€ agent.py         # Core database agent (Phase 1: SQL generator)
β”‚   β”‚   β”œβ”€β”€ llm_integration.py # LLMWrapper integration
β”‚   β”‚   β”œβ”€β”€ schema_manager.py # SchemaManager (Phase 2A)
β”‚   β”‚   └── tools/
β”‚   β”‚       └── query_tool.py # MCP query tool
β”‚   └── utils/
β”‚       β”œβ”€β”€ config_loader.py # Configuration management
β”‚       └── logger.py        # Logging setup
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ test_database_agent.py # Unit tests (Phase 1)
β”‚   └── test_schema_manager.py # Unit tests (Phase 2A)
└── examples/
    └── basic_usage.py       # Usage example

πŸ“– API Reference

Endpoints

GET /
  • Description: Server information
  • Response: {"message": "Database Agent MCP Server", "version": "1.0.0"}
GET /health
  • Description: Health check
  • Response: Health status with LLM status and timestamp
POST /generate-sql
  • Description: Generate SQL from natural language (generation only; execution is planned for Phase 2B+)
  • Request Body:
    {
      "prompt": "Show me all users who made orders in the last month"
    }
    
  • Response:
    {
      "sql_query": "SELECT u.* FROM users u JOIN orders o ON u.user_id = o.user_id WHERE o.order_date >= DATE_SUB(NOW(), INTERVAL 1 MONTH);",
      "explanation": "Generated SQL query for: Show me all users who made orders in the last month",
      "prompt": "Show me all users who made orders in the last month",
      "timestamp": "2024-01-15T10:30:45.123456"
    }
    
  • Note: This endpoint generates SQL but doesn't execute it. Query execution will be added in Phase 2B+.
GET /tools
  • Description: Get available MCP tools
  • Response: List of available tools with schemas

πŸ”§ Configuration

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| OPENAI_API_KEY | OpenAI API key | Required |
| LLM_PROVIDER | LLM provider (openai, anthropic, etc.) | openai |
| LLM_MODEL | Model name | gpt-4 |
| SERVER_HOST | Server host | localhost |
| SERVER_PORT | Server port | 8000 |
| LOG_LEVEL | Logging level | INFO |
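The override precedence above (environment variable wins, config default otherwise) can be expressed as a small helper. This is a sketch of the idea, not the project's actual `config_loader.py`:

```python
import os

# Defaults mirroring config/llm_config.yaml; environment variables take
# precedence, matching the override behaviour described above.
DEFAULTS = {
    "LLM_PROVIDER": "openai",
    "LLM_MODEL": "gpt-4",
    "SERVER_HOST": "localhost",
    "SERVER_PORT": "8000",
    "LOG_LEVEL": "INFO",
}


def resolve(name):
    """Return the environment value if set, otherwise the config default."""
    return os.environ.get(name, DEFAULTS.get(name))
```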

Configuration File

Edit config/llm_config.yaml:

# LLM Configuration
llm:
  provider: "openai"
  model: "gpt-4"
  api_key: ""
  timeout: 30

# Server Configuration
server:
  host: "localhost"
  port: 8000
  debug: false

# Logging Configuration
logging:
  level: "INFO"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  file: "logs/database_agent.log"

πŸ” Troubleshooting

Common Issues

  1. LLMWrapper not found

    ModuleNotFoundError: No module named 'llmwrapper'
    
    • Install llmwrapper in development mode: pip install -e ../llmwrapper
    • Ensure you're in the database-agent directory when running the command
  2. API Key not set

    Failed to initialize LLM: API key not provided
    
    • Set OPENAI_API_KEY environment variable
    • Or add it to your .env file
  3. Port already in use

    Error: [Errno 48] Address already in use
    
    • Change the port in config or use --port argument
    • Kill the process using the port
  4. SQL not executing

    Note: This is expected behavior in the current phase
    
    • Phase 2A only generates SQL; it does not execute queries
    • Query execution will be added in Phase 2B+

Logs

Logs are written to:

  • Console output
  • logs/database_agent.log (if configured)

Check logs for detailed error information.
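A minimal sketch of the kind of setup a module like src/utils/logger.py performs, using the format string and level from the logging configuration above (the actual implementation may differ):

```python
import logging

# Format string taken from the "Logging Configuration" section above.
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def setup_logger(name="database_agent", level="INFO", logfile=None):
    """Build a logger that writes to the console and, optionally, a file."""
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, level))
    formatter = logging.Formatter(LOG_FORMAT)

    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)

    if logfile:  # e.g. "logs/database_agent.log" when file logging is configured
        file_handler = logging.FileHandler(logfile)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger
```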


🎯 Current Limitations (Phase 2A)

  • No Query Execution: Generated SQL is not executed against databases
  • No Schema-Aware Generation: The schema is loaded and cached, but not yet used to inform SQL generation
  • No Memory: Each request is processed independently
  • No Validation: Generated SQL is not validated against the actual database structure
  • No Interactive Refinement: Cannot ask clarifying questions

These limitations will be addressed in future phases.


πŸš€ Development Roadmap

Phase 1: SQL Query Generator ✅ (Complete)

  • βœ… Natural language to SQL conversion
  • βœ… MCP server with REST API
  • βœ… LLM integration
  • βœ… Basic error handling and logging

Phase 2: Schema-Aware Generator (In Progress)

  • ✅ Schema Integration (2A): Load and cache database context via schema-graph-builder
  • 🔄 Relationship Detection: Automatic foreign key relationship inference
  • 🔄 Query Validation: Validate generated SQL against schema
  • 🔄 Enhanced Prompts: Schema-aware SQL generation
  • 🔄 Query Execution: Execute queries and return results

Phase 3: Autonomous Agent (Future)

  • πŸ”„ Memory System: Remember conversation history and user preferences
  • πŸ”„ Planning Engine: Break complex requests into executable steps
  • πŸ”„ Tool Orchestration: Multiple specialized tools beyond SQL generation
  • πŸ”„ Interactive Capabilities: Ask clarifying questions when requests are ambiguous
  • πŸ”„ Learning & Adaptation: Improve responses based on user feedback

Phase 4: Full AI Agent (Vision)

  • πŸ”„ Autonomous Decision Making: Plan and execute multi-step operations
  • πŸ”„ Context Awareness: Understand database state and user intent
  • πŸ”„ Proactive Suggestions: Suggest optimizations and insights
  • πŸ”„ Natural Conversation: Maintain context across multiple interactions

🧩 Schema Integration (Phase 2A)

Schema integration allows the agent to load and cache your actual database schema using the schema-graph-builder library. This enables future schema-aware SQL generation and validation.

What it does

  • Loads tables, columns, and relationships from your database
  • Caches schema for fast access
  • Provides schema context for future LLM-based SQL generation

How to enable

  1. Install schema-graph-builder (see below)
  2. Add a schema section to your config/llm_config.yaml:
schema:
  enabled: true
  database_url: "postgresql://user:pass@localhost/dbname"
  refresh_interval: 3600  # seconds
  3. The agent will automatically load and cache your schema if enabled.

Dependency

You must install schema-graph-builder in development mode:

pip install -e ../schema-graph-builder

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

πŸ“„ License

This project is licensed under the MIT License.