tyra-mcp by bigrock776 - MCP Server

Tyra Advanced Memory MCP Server

A sophisticated Model Context Protocol (MCP) server providing advanced memory capabilities with RAG (Retrieval-Augmented Generation), hallucination detection, adaptive learning, and enterprise-grade AI infrastructure for intelligent agent ecosystems.

🚀 AI Enhancement Roadmap

The Tyra Memory Server ships with the AI dependencies and infrastructure needed for the planned enhancements below:

🧠 AI-Powered Memory Synthesis (Implementation Ready)

Intelligent Deduplication: 40%+ reduction in duplicate memories using local semantic analysis
Automated Summarization: Multi-layer summarization with Pydantic AI validation and anti-hallucination
Pattern Recognition: Cross-memory insights using local ML clustering and topic modeling
Temporal Evolution: Track memory content changes and concept development over time
Status: Dependencies included (pydantic-ai>=0.0.13), ready for implementation

🛡️ Pydantic AI & Anti-Hallucination Architecture (Dependency Ready)

Pydantic AI Status: Dependency included (pydantic-ai>=0.0.13) but not yet implemented
Current Implementation: Extensive Pydantic schemas throughout codebase for data validation
Current Hallucination Detection: Grounding-based analysis with confidence scoring
Ready for Enhancement: Pydantic AI integration for structured, validated AI outputs
Planned Features: Multi-layer validation, real-time detection, mandatory citation validation

⚡ Advanced RAG Pipeline Enhancement (Phase 1-2)

Multi-Modal Support: Images, videos, audio, and code with local CLIP and Whisper models
Contextual Chunk Linking: Intelligent chunk relationships and coherence scoring
Dynamic Reranking: Real-time ranking adaptation with local cross-encoder models
Intent-Aware Retrieval: Query intent classification with strategy selection

🌊 Real-Time Memory Streams (Phase 2)

WebSocket Infrastructure: Live memory updates and collaborative search
Event-Driven Triggers: Custom automation with local rule engines
Progressive Search: Real-time query suggestions and result refinement
Live Analytics: Streaming performance metrics and insights

🔮 Predictive Intelligence (Phase 2-3)

Usage Pattern Analysis: ML-driven access pattern prediction
Smart Auto-Archiving: Intelligent memory lifecycle management
Predictive Preloading: 40%+ latency reduction through anticipation
Context-Aware Embeddings: Session-adaptive embeddings with fine-tuning

🎯 Self-Optimization Engine (Phase 3)

Continuous Learning: Online adaptation without manual intervention
A/B Testing Framework: Statistical experimentation with local validation
Hyperparameter Optimization: Bayesian optimization for 20%+ performance gains
Memory Personality Profiles: Personalized user experience adaptation

🔥 All enhancements are designed to run 100% locally on open-source components, with no external API dependencies!

🌟 Current Features

🔧 Implementation Status

Data Validation: Pydantic schemas (BaseModel, Field validation) are used throughout the codebase
Pydantic AI: Dependency included (pydantic-ai>=0.0.13) but not yet implemented in the codebase
Available for Integration: Pydantic AI can be adopted for structured AI outputs and enhanced validation
Hallucination Safeguards: Confidence scoring plus grounding-based hallucination detection

🧠 Advanced Memory System

Multi-Modal Storage: Vector embeddings + temporal knowledge graphs with Neo4j + Graphiti integration
Agent Isolation: Separate memory spaces for Tyra, Claude, Archon with session management
Intelligent Chunking: 6 dynamic strategies (auto, paragraph, semantic, slide, line, token) with size optimization
Entity Extraction: Automated NER with relationship mapping using temporal knowledge graphs
Memory Versioning: Track memory evolution with temporal validity intervals
Memory Health: Automated stale detection, redundancy removal, and consolidation
Multi-Level Caching: L1 (in-memory), L2 (Redis), L3 (materialized views) for <100ms p95 latency
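
The multi-level caching idea above can be illustrated with a minimal two-tier sketch. This is not the server's implementation: the class name `TwoTierCache`, the `loader` callback, and the plain dict standing in for Redis are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Any, Callable


class TwoTierCache:
    """Hypothetical L1 (in-process LRU) in front of a slower L2 store."""

    def __init__(self, l2: dict, l1_capacity: int = 128) -> None:
        self.l1 = OrderedDict()          # L1: in-memory, least recently used first
        self.l2 = l2                     # L2: stand-in for Redis (any mapping works)
        self.l1_capacity = l1_capacity

    def _put_l1(self, key: str, value: Any) -> None:
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)  # evict the least recently used entry

    def get(self, key: str, loader: Callable[[str], Any]) -> Any:
        if key in self.l1:               # L1 hit
            self.l1.move_to_end(key)
            return self.l1[key]
        if key in self.l2:               # L2 hit: promote into L1
            value = self.l2[key]
            self._put_l1(key, value)
            return value
        value = loader(key)              # miss: compute, then fill both tiers
        self.l2[key] = value
        self._put_l1(key, value)
        return value
```

A lookup only reaches `loader` when both tiers miss, which is the property the latency target depends on.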

📄 Universal Document Ingestion

9 File Formats: PDF (PyMuPDF), DOCX (python-docx), PPTX (python-pptx), TXT/MD (encoding detection), HTML (html2text), JSON (nested objects), CSV (streaming), EPUB (chapters), and custom format support
Smart Processing: Auto-format detection with specialized loaders and fallback mechanisms
Dynamic Chunking: File-type-aware strategies (semantic for PDFs, paragraph for DOCX, slide for PPTX)
LLM Enhancement: Context injection with rule-based templates + optional vLLM integration
Batch Processing: Batches of up to 100 documents, ingested concurrently with a configurable limit (default: 20)
Streaming Pipeline: Memory-efficient processing for large files (>10MB) with progress tracking
Comprehensive Metadata: Document properties, chunk metadata, confidence scoring, hallucination detection
Error Recovery: Graceful fallback with retry logic and detailed error reporting

🔍 Sophisticated Search & RAG

Hybrid Search: Weighted combination (0.7 vector + 0.3 keyword) with graph traversal enhancement
Multi-Strategy Retrieval: Vector similarity, keyword matching, temporal queries, graph traversal
Advanced Reranking: Cross-encoder models + optional vLLM-based reranking with caching
Confidence Scoring: Multi-level assessment (💪 Rock Solid 95%+, 🧠 High 80%+, 🤔 Fuzzy 60%+, ⚠️ Low <60%)
Hallucination Detection: Real-time grounding analysis with evidence collection and consistency checking
Trading Safety: Unbypassable 95% confidence requirement for financial operations with audit logging
Context Enrichment: Graph-based context expansion with temporal relevance weighting
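
The confidence tiers and the 95% trading requirement listed above can be sketched as a pair of small functions. The function names and tier identifiers are hypothetical, and the sketch assumes scores on a 0-100 scale, matching the percentages in the list.

```python
def confidence_level(score: float) -> str:
    """Map a 0-100 confidence score to one of the four tiers listed above."""
    if score >= 95:
        return "rock_solid"
    if score >= 80:
        return "high"
    if score >= 60:
        return "fuzzy"
    return "low"


def allowed_for_trading(score: float) -> bool:
    """Financial operations pass only at the top tier (95% or higher)."""
    return confidence_level(score) == "rock_solid"
```

Keeping the trading gate as a function of the tier, rather than a separate threshold, means the two cannot drift apart.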

🕸️ Temporal Knowledge Graph

Neo4j Integration: High-performance graph database with Cypher query support
Graphiti Framework: Advanced temporal knowledge management with validity intervals
Entity Management: Automated extraction, typing, merging, and updates with conflict resolution
Relationship Tracking: Temporal relationship extraction with time-based validity and evolution tracking
Graph Traversal: Efficient path finding, subgraph extraction, and entity timeline queries
Temporal Queries: Time-range filtering, relationship evolution, and temporal pattern matching

🔀 Modular Provider System

Hot-Swappable Components: Runtime provider switching without restart
Embedding Providers: HuggingFace (intfloat/e5-large-v2 primary, all-MiniLM-L12-v2 fallback), OpenAI fallback
Vector Stores: PostgreSQL + pgvector with HNSW indexing, future support for Weaviate, Qdrant
Graph Engines: Neo4j + Graphiti (primary), extensible to other graph databases
Rerankers: Cross-encoder models, vLLM integration, custom reranking strategies
Cache Providers: Redis (multi-layer), in-memory LRU, future distributed caching
File Loaders: Extensible loader registry with custom format support
Fallback Mechanisms: Automatic failover with circuit breakers and health monitoring
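
As an illustration of failover with a circuit breaker, here is a minimal sketch. The class, its parameters (`failure_threshold`, `recovery_timeout`), and the injectable `clock` are assumptions for the example and are not taken from the server's code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical breaker: after repeated failures, skip the primary for a while."""

    def __init__(self, failure_threshold: int = 3, recovery_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # time the circuit opened, or None while closed

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the timeout has elapsed, let one trial call through (half-open).
        return self.clock() - self.opened_at < self.recovery_timeout

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        self.failures = 0                      # success closes the circuit
        self.opened_at = None
        return result
```

In the embedding case, `primary` would wrap the primary model and `fallback` the lighter one, so a failing provider stops being retried on every request.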

📊 Performance Analytics & Observability

Real-Time Monitoring: Response time, accuracy, memory usage, cache hit rates with configurable dashboards
OpenTelemetry Integration: Complete distributed tracing, metrics collection, and structured logging
Trend Analysis: Automated performance trend detection with statistical significance testing
Smart Alerts: Configurable warning (response time >100ms) and critical thresholds with notification channels
Performance Metrics: Request latency histograms, error rate tracking, resource utilization monitoring
Health Checks: Comprehensive component health monitoring with automatic recovery
Audit Logging: Complete operation audit trail with request correlation and compliance reporting

🎯 Adaptive Learning & Self-Optimization

Self-Optimization: Automated parameter tuning based on performance data and user feedback
A/B Testing Framework: Systematic experimentation with statistical significance testing and rollback protection
Learning Insights: Pattern recognition from successful configurations and failure analysis
Multi-Strategy Optimization: Gradient descent, Bayesian optimization, random search with ensemble methods
Memory Health Management: Stale memory detection, redundancy identification, and automated cleanup
Prompt Evolution: Continuous improvement of prompts based on success/failure patterns
Configuration Adaptation: Dynamic configuration updates with safety constraints and rollback capabilities
Performance Baselines: Automatic establishment and tracking of performance benchmarks

🌐 Dual Interface Architecture

MCP Protocol: Full Model Context Protocol support for Claude, Tyra, and other MCP-compatible agents
REST API: Comprehensive HTTP API with OpenAPI documentation for web integrations and custom clients
WebSocket Support: Real-time updates and streaming for long-running operations
n8n Integration: Pre-built webhook endpoints and workflow templates for automation
SDK Support: Python and JavaScript client libraries with async support and retry logic

🔒 Enterprise Security & Safety

Local Operation: Runs fully locally by default for data privacy; the only external API is the optional OpenAI embedding fallback
Multi-Agent Isolation: Secure separation of agent memory spaces with access controls
Trading Safety: Unbypassable confidence requirements for financial operations with multiple validation layers
Input Validation: Comprehensive request validation with SQL injection and XSS protection
Rate Limiting: Configurable limits (1000/min default) with burst protection (50/sec)
Circuit Breakers: Automatic failure protection with configurable thresholds and recovery timeouts
Audit Trail: Complete operation logging with compliance reporting and forensic capabilities
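
One common way to combine a per-minute limit with burst protection is a token bucket. The sketch below is an assumption about how such a limiter could look, not the server's code: it reads the burst figure as bucket capacity and the per-minute limit as the refill rate, and the class and parameter names are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical limiter: refills at rate_per_minute, holds at most `burst` tokens."""

    def __init__(self, rate_per_minute: float = 1000, burst: int = 50,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.refill_per_second = rate_per_minute / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)       # start with a full bucket
        self.clock = clock
        self.updated_at = clock()

    def allow(self) -> bool:
        """Return True and consume a token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.updated_at = now
        # Add the tokens earned since the last call, capped at the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With the defaults, a client can send 50 requests back to back, after which requests are admitted at roughly 16.7 per second.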

🚀 Quick Start

Note: Current version is production-ready with advanced AI infrastructure. Enhanced features can be implemented using included dependencies with full backward compatibility.

Prerequisites

Python 3.11+
PostgreSQL with pgvector extension
Redis (for caching)
Neo4j (for knowledge graphs)
HuggingFace CLI (for model downloads)
Git LFS (for large model files)

Implementation Guide: Advanced features can be built on the included dependencies.

Automated Setup

# Run unified setup script
./setup.sh --env development

# Start the server
source venv/bin/activate
python main.py

Manual Installation

Clone and Setup

git clone https://github.com/your-org/tyra-mcp-memory-server.git
cd tyra-mcp-memory-server
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Install Model Prerequisites

# Install HuggingFace CLI and Git LFS
pip install huggingface-hub
git lfs install

Download Required Models ⚠️ REQUIRED - No Automatic Downloads

# Create model directories
mkdir -p ./models/embeddings ./models/cross-encoders

# Download primary embedding model (~1.34GB)
huggingface-cli download intfloat/e5-large-v2 \
  --local-dir ./models/embeddings/e5-large-v2 \
  --local-dir-use-symlinks False

# Download fallback embedding model (~120MB)
huggingface-cli download sentence-transformers/all-MiniLM-L12-v2 \
  --local-dir ./models/embeddings/all-MiniLM-L12-v2 \
  --local-dir-use-symlinks False

# Download cross-encoder for reranking (~120MB)
huggingface-cli download cross-encoder/ms-marco-MiniLM-L-6-v2 \
  --local-dir ./models/cross-encoders/ms-marco-MiniLM-L-6-v2 \
  --local-dir-use-symlinks False

Verify Model Installation

# Test all models are working
python scripts/test_model_pipeline.py

Database Setup

# Start databases with Docker
docker-compose -f docker-compose.dev.yml up -d

# Or configure your own PostgreSQL, Redis, Neo4j instances

Configuration

cp .env.example .env
# Edit .env with your database credentials

Start Server
```
python main.py
```

🔧 MCP Integration

AI Enhancement Available: MCP tools can be enhanced with Pydantic AI validation using included dependencies. Current tools include basic confidence scoring and hallucination detection.

Claude Desktop Configuration

Add to your Claude Desktop MCP settings:

{
  "mcpServers": {
    "tyra-memory": {
      "command": "python",
      "args": ["/path/to/tyra-mcp-memory-server/main.py"],
      "env": {
        "TYRA_ENV": "production"
      }
    }
  }
}

Available Tools

🧠 Core Memory Tools

📝 `store_memory`

Store information with automatic entity extraction and metadata enrichment.

{
  "tool": "store_memory",
  "content": "User prefers morning trading sessions and uses technical analysis",
  "agent_id": "tyra",
  "session_id": "trading_session_001",
  "extract_entities": true,
  "chunk_content": false,
  "metadata": {"category": "trading_preferences", "confidence": 95}
}

🔍 `search_memory`

Advanced hybrid search with confidence scoring and hallucination analysis.

{
  "tool": "search_memory",
  "query": "What are the user's trading preferences?",
  "agent_id": "tyra",
  "search_type": "hybrid",
  "top_k": 10,
  "min_confidence": 0.7,
  "include_analysis": true,
  "rerank": true,
  "temporal_weight": 0.3
}

📋 `get_all_memories`

Retrieve all memories for an agent with filtering and pagination.

{
  "tool": "get_all_memories",
  "agent_id": "tyra",
  "limit": 50,
  "offset": 0,
  "include_metadata": true,
  "filter_by_date": "2024-01-01",
  "category": "trading"
}

🗑️ `delete_memory`

Remove specific memories with optional cascade deletion.

{
  "tool": "delete_memory",
  "memory_id": "mem_12345",
  "agent_id": "tyra",
  "cascade_delete": false
}

📄 Document Processing Tools

📄 `ingest_document`

Ingest documents with automatic format detection and intelligent processing.

{
  "tool": "ingest_document",
  "file_path": "/path/to/document.pdf",
  "file_type": "pdf",
  "chunking_strategy": "auto",
  "chunk_size": 512,
  "chunk_overlap": 50,
  "enable_llm_context": true,
  "extract_entities": true,
  "metadata": {"source": "user_upload", "priority": "high"}
}

📊 `batch_ingest`

Process multiple documents concurrently with progress tracking.

{
  "tool": "batch_ingest",
  "documents": [
    {"file_path": "/docs/report1.pdf", "chunking_strategy": "semantic"},
    {"file_path": "/docs/data.csv", "chunking_strategy": "line"}
  ],
  "max_concurrent": 10,
  "progress_callback": true
}

📈 `get_ingestion_status`

Monitor document processing status and progress.

{
  "tool": "get_ingestion_status",
  "job_id": "batch_12345",
  "include_details": true
}

🛡️ Analysis & Validation Tools

🛡️ `analyze_response`

Analyze any response for hallucinations and confidence scoring.

{
  "tool": "analyze_response",
  "response": "Based on your history, you prefer swing trading",
  "query": "What's my trading style?",
  "retrieved_memories": [...],
  "detailed_analysis": true,
  "include_evidence": true
}

🎯 `validate_for_trading`

Special validation for financial operations with 95% confidence requirement.

{
  "tool": "validate_for_trading",
  "query": "Should I buy AAPL stock?",
  "response": "Based on your risk profile, AAPL looks good",
  "context_memories": [...],
  "require_rock_solid": true
}

🔄 `rerank_results`

Improve search result relevance with advanced reranking.

{
  "tool": "rerank_results",
  "query": "trading strategies",
  "results": [...],
  "reranker_type": "cross_encoder",
  "top_k": 5
}

🕸️ Knowledge Graph Tools

🕸️ `query_graph`

Execute graph traversal queries on the knowledge graph.

{
  "tool": "query_graph",
  "cypher_query": "MATCH (p:PERSON)-[r:TRADES]->(s:STOCK) RETURN p, r, s",
  "agent_id": "tyra",
  "include_temporal": true
}

🔗 `get_entity_relationships`

Explore entity connections and relationship paths.

{
  "tool": "get_entity_relationships",
  "entity_name": "AAPL",
  "relationship_types": ["CORRELATES_WITH", "COMPETES_WITH"],
  "max_depth": 3,
  "temporal_filter": "last_30_days"
}

📊 `get_entity_timeline`

View entity evolution over time.

{
  "tool": "get_entity_timeline",
  "entity_id": "entity_12345",
  "start_date": "2024-01-01",
  "end_date": "2024-12-31",
  "include_relationships": true
}

📊 System Monitoring Tools

📊 `get_memory_stats`

Comprehensive system statistics and health metrics.

{
  "tool": "get_memory_stats",
  "agent_id": "tyra",
  "include_performance": true,
  "include_recommendations": true,
  "include_cache_stats": true,
  "time_range": "last_24_hours"
}

🏥 `health_check`

Complete system health assessment.

{
  "tool": "health_check",
  "detailed": true,
  "include_components": ["vector_store", "graph_engine", "cache", "embeddings"],
  "run_diagnostics": true
}

⚡ `get_performance_metrics`

Real-time performance and resource utilization.

{
  "tool": "get_performance_metrics",
  "metric_types": ["latency", "throughput", "error_rate"],
  "time_window": "5m",
  "include_predictions": true
}

🎯 Learning & Optimization Tools

🎯 `get_learning_insights`

Access adaptive learning insights and optimization recommendations.

{
  "tool": "get_learning_insights",
  "category": "parameter_optimization",
  "days": 7,
  "include_experiments": true,
  "confidence_threshold": 0.8
}

🔧 `optimize_configuration`

Trigger configuration optimization based on usage patterns.

{
  "tool": "optimize_configuration",
  "components": ["embeddings", "cache", "reranking"],
  "optimization_strategy": "bayesian",
  "safety_constraints": true
}

📈 `get_improvement_suggestions`

AI-generated recommendations for system enhancement.

{
  "tool": "get_improvement_suggestions",
  "focus_areas": ["performance", "accuracy", "reliability"],
  "priority": "high",
  "include_implementation": true
}

🔧 Administrative Tools

⚙️ `update_configuration`

Dynamically update system configuration without restart.

{
  "tool": "update_configuration",
  "config_section": "cache",
  "updates": {"ttl": 7200, "max_size": "2GB"},
  "validate_before_apply": true
}

🧹 `cleanup_memories`

Clean up stale or redundant memories.

{
  "tool": "cleanup_memories",
  "agent_id": "tyra",
  "older_than_days": 90,
  "confidence_threshold": 0.3,
  "dry_run": true
}

💾 `backup_memories`

Create backup of agent memories.

{
  "tool": "backup_memories",
  "agent_id": "tyra",
  "include_graph": true,
  "compression": true,
  "backup_location": "/backups/tyra_memories.gz"
}

🏗️ Architecture

High-Level System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        CLIENT LAYER                            │
├─────────────────┬─────────────────┬─────────────────┬──────────┤
│     Claude      │      Tyra       │     Archon      │   n8n    │
│   (MCP Client)  │   (MCP Client)  │   (MCP Client)  │ (Webhook)│
└─────────────────┴─────────────────┴─────────────────┴──────────┘
         │                 │                 │            │
         └─────────────────┼─────────────────┘            │
                           │                              │
┌─────────────────────────────────────────────────────────────────┐
│                     INTERFACE LAYER                            │
├─────────────────────────────┬───────────────────────────────────┤
│        MCP Server           │         FastAPI Server           │
│   • Tool Handlers           │   • REST Endpoints               │
│   • Protocol Management     │   • WebSocket Support            │
│   • Agent Isolation         │   • OpenAPI Documentation        │
│   • Session Management      │   • Rate Limiting                │
└─────────────────────────────┴───────────────────────────────────┘
         │                                 │
         └─────────────────┬───────────────┘
                           │
┌─────────────────────────────────────────────────────────────────┐
│                      CORE ENGINE                               │
├─────────────┬─────────────┬─────────────┬─────────────┬────────┤
│   Memory    │ Document    │ Hallucination│   Graph    │ Learn  │
│  Manager    │ Processor   │  Detector    │  Engine    │ Engine │
│ • Storage   │ • 9 Formats │ • Confidence │ • Neo4j    │ • A/B  │
│ • Retrieval │ • Chunking  │ • Grounding  │ • Graphiti │ • Opt  │
│ • Caching   │ • LLM Enh   │ • Evidence   │ • Temporal │ • Ins  │
└─────────────┴─────────────┴─────────────┴─────────────┴────────┘
         │           │             │             │         │
         └───────────┼─────────────┼─────────────┼─────────┘
                     │             │             │
┌─────────────────────────────────────────────────────────────────┐
│                    PROVIDER LAYER                              │
├──────────┬──────────┬──────────┬──────────┬──────────┬─────────┤
│Embedding │ Vector   │  Graph   │Reranker  │  Cache   │  File   │
│Providers │  Store   │ Database │ Provider │ Provider │ Loaders │
│• HF E5   │• PG Vec  │• Neo4j   │• Cross-E │• Redis   │• PDF    │
│• MiniLM  │• HNSW    │• Cypher  │• vLLM    │• Memory  │• DOCX   │
│• OpenAI  │• Hybrid  │• Graphiti│• Custom  │• L1/L2/L3│• 7 More │
└──────────┴──────────┴──────────┴──────────┴──────────┴─────────┘
         │      │          │          │          │          │
         └──────┼──────────┼──────────┼──────────┼──────────┘
                │          │          │          │
┌─────────────────────────────────────────────────────────────────┐
│                      DATA LAYER                                │
├─────────────────┬─────────────────┬─────────────────┬──────────┤
│   PostgreSQL    │      Neo4j      │      Redis      │   Logs   │
│  + pgvector     │  + Graphiti     │   Multi-Layer   │ + Traces │
│ • Vector Store  │ • Knowledge     │ • Performance   │ • Metrics│
│ • Metadata      │ • Temporal      │ • Session       │ • Events │
│ • HNSW Index    │ • Relationships │ • Embedding     │ • Audit  │
└─────────────────┴─────────────────┴─────────────────┴──────────┘

Core Processing Pipelines

1. Memory Storage Pipeline

Input Content → Format Detection → Document Loading → Chunking Strategy Selection
     ↓                ↓               ↓                     ↓
Text Extraction → LLM Enhancement → Entity Extraction → Relationship Mapping
     ↓                ↓               ↓                     ↓
Embedding Generation → Vector Storage → Graph Storage → Cache Update
     ↓                ↓               ↓                     ↓
Metadata Recording → Performance Metrics → Success Response

2. Memory Search Pipeline

Search Query → Query Enhancement → Multi-Strategy Retrieval
     ↓              ↓                      ↓
Vector Search + Keyword Search + Graph Traversal
     ↓              ↓                      ↓
Result Fusion → Advanced Reranking → Confidence Scoring
     ↓              ↓                      ↓
Hallucination Detection → Evidence Collection → Response Formatting

3. Document Ingestion Pipeline

Document Input → Format Detection → Specialized Loader Selection
     ↓               ↓                      ↓
Content Extraction → Chunking Strategy → LLM Context Enhancement
     ↓               ↓                      ↓
Batch Processing → Memory Integration → Progress Tracking

4. Self-Learning Pipeline

Performance Data → Pattern Recognition → Experiment Design
     ↓                   ↓                    ↓
A/B Testing → Statistical Analysis → Configuration Updates
     ↓                   ↓                    ↓
Rollback Protection → Success Validation → Learning Storage
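
The "A/B Testing → Statistical Analysis" step can be illustrated with a two-proportion z-test. This is a generic textbook test offered as a sketch; the document does not say which test the server uses, and the function names and the 0.05 significance level are assumptions.

```python
import math
from typing import Tuple


def two_proportion_z_test(success_a: int, total_a: int,
                          success_b: int, total_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in success rates, B minus A."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0                  # identical, degenerate samples
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


def should_adopt(z: float, p_value: float, alpha: float = 0.05) -> bool:
    """Adopt variant B only if it is better and the difference is significant."""
    return z > 0 and p_value < alpha
```

When `should_adopt` returns False, the current configuration stays in place, which corresponds to the rollback-protection step in the pipeline.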

Technology Stack

Core Technologies

Python 3.11+: Primary development language
FastMCP: Model Context Protocol implementation
FastAPI: REST API framework with automatic documentation
Pydantic: Data validation and settings management
asyncio: Asynchronous programming for high performance

Databases & Storage

PostgreSQL 14+: Primary data store with JSON support
pgvector: Vector similarity search with HNSW indexing
Neo4j: High-performance graph database
Redis: Multi-layer caching and session storage

Machine Learning & NLP

HuggingFace Transformers: Embedding models (e5-large-v2, MiniLM)
Sentence Transformers: Optimized embedding inference
spaCy: Named entity recognition and text processing
NLTK: Natural language processing utilities

Observability & Monitoring

OpenTelemetry: Distributed tracing and metrics
Prometheus: Metrics collection and alerting
Grafana: Performance dashboards and visualization
Jaeger: Distributed tracing visualization

Development & Deployment

Docker: Containerization with multi-stage builds
Docker Compose: Local development environment
pytest: Comprehensive testing framework
Black/isort/flake8: Code formatting and linting

Performance Characteristics

Latency Targets (P95)

Memory Storage: <100ms per operation
Vector Search: <50ms for top-10 results
Hybrid Search: <150ms with reranking
Document Ingestion: <2s per PDF page
Graph Queries: <30ms for simple traversals
Hallucination Detection: <200ms per analysis

Throughput Capabilities

Concurrent Users: 50-100 depending on hardware
Document Processing: 10-20 documents/minute
Memory Operations: 1000+ operations/minute
Cache Hit Rate: >85% for frequently accessed data

Scalability Metrics

Memory Capacity: Bounded by PostgreSQL storage (default cap: 1,000,000 memories per agent)
Graph Complexity: Millions of entities/relationships
Vector Dimensions: 384-1024 (configurable)
Cache Size: 2-8GB recommended for optimal performance

⚙️ Configuration

Environment Variables

# Core Configuration
TYRA_ENV=development|production
LOG_LEVEL=DEBUG|INFO|WARNING|ERROR

# Database Configuration
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=tyra_memory
POSTGRES_USER=tyra
POSTGRES_PASSWORD=secure_password

REDIS_HOST=localhost
REDIS_PORT=6379

NEO4J_HOST=localhost
NEO4J_PORT=7687

# Optional: OpenAI for fallback embeddings
OPENAI_API_KEY=sk-...

Configuration Files

The system uses a layered configuration approach with multiple YAML files:

Main Configuration (`config/config.yaml`)

# Core application settings
app:
  name: "Tyra MCP Memory Server"
  version: "1.0.0"
  environment: ${TYRA_ENV:-development}

# Memory system configuration
memory:
  backend: "postgres"
  vector_dimensions: 1024
  chunk_size: 512
  chunk_overlap: 50
  max_memories_per_agent: 1000000

# API server settings
api:
  host: ${API_HOST:-0.0.0.0}
  port: ${API_PORT:-8000}
  enable_docs: ${API_ENABLE_DOCS:-true}
  cors_origins: ["*"]
  rate_limit: 1000  # requests per minute

Provider Configuration (`config/providers.yaml`)

# Embedding providers
embeddings:
  primary: "huggingface"
  fallback: "huggingface_light"
  providers:
    huggingface:
      model: "intfloat/e5-large-v2"
      device: "auto"
      batch_size: 32

RAG Configuration (`config/rag.yaml`)

# Retrieval and reranking settings
retrieval:
  hybrid_weight: 0.7  # weight of the vector score; the keyword score gets 1 - 0.7 = 0.3
  max_results: 20
  diversity_penalty: 0.1

reranking:
  enabled: true
  provider: "cross_encoder"
  top_k: 10

hallucination:
  enabled: true
  threshold: 75
  require_evidence: true
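
To make `hybrid_weight` concrete, here is a minimal sketch of weighted score fusion. It assumes both score sets are already normalized to the 0-1 range and that a document missing from one result set scores 0 there; the function names are hypothetical and the server's actual fusion logic may differ.

```python
from typing import Dict, List, Tuple


def hybrid_score(vector_score: float, keyword_score: float,
                 hybrid_weight: float = 0.7) -> float:
    """Weighted combination: hybrid_weight for vector, the remainder for keyword."""
    return hybrid_weight * vector_score + (1.0 - hybrid_weight) * keyword_score


def fuse(vector_hits: Dict[str, float], keyword_hits: Dict[str, float],
         hybrid_weight: float = 0.7, max_results: int = 20) -> List[Tuple[str, float]]:
    """Merge two {doc_id: score} maps and return the top results, best first."""
    doc_ids = set(vector_hits) | set(keyword_hits)
    scored = [
        (doc_id, hybrid_score(vector_hits.get(doc_id, 0.0),
                              keyword_hits.get(doc_id, 0.0),
                              hybrid_weight))
        for doc_id in doc_ids
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:max_results]
```

With the default weight, a document found only by vector search with score 1.0 (0.7) still outranks one found only by keyword search with score 1.0 (0.3).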

Document Ingestion (`config/ingestion.yaml`)

# File processing settings
ingestion:
  max_file_size: 104857600  # 100MB
  max_batch_size: 100
  concurrent_limit: 20
  supported_formats: ["pdf", "docx", "pptx", "txt", "md", "html", "json", "csv", "epub"]
  
  chunking:
    default_strategy: "auto"
    strategies:
      auto:
        file_type_mapping:
          pdf: "semantic"
          docx: "paragraph"
          pptx: "slide"
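
The `auto` strategy's file-type mapping can be sketched as a small lookup. The mapping and the six strategy names come from this document; the function name and the `paragraph` fallback for unmapped file types are assumptions.

```python
from pathlib import Path

# Mapping taken from the configuration above
FILE_TYPE_MAPPING = {"pdf": "semantic", "docx": "paragraph", "pptx": "slide"}
STRATEGIES = {"auto", "paragraph", "semantic", "slide", "line", "token"}


def resolve_strategy(file_path: str, requested: str = "auto",
                     default: str = "paragraph") -> str:
    """Return the concrete chunking strategy for a file.

    An explicit strategy wins; "auto" is resolved from the file extension.
    The fallback for unmapped extensions is an assumption of this sketch.
    """
    if requested not in STRATEGIES:
        raise ValueError(f"unknown chunking strategy: {requested!r}")
    if requested != "auto":
        return requested
    extension = Path(file_path).suffix.lower().lstrip(".")
    return FILE_TYPE_MAPPING.get(extension, default)
```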

Self-Learning Configuration (`config/self_learning.yaml`)

# Adaptive learning settings
self_learning:
  enabled: true
  analysis_interval: "1h"
  improvement_interval: "24h"
  auto_optimize: true
  
  quality_thresholds:
    memory_accuracy: 0.85
    performance_degradation: 0.1
    hallucination_rate: 0.05

Observability Configuration (`config/observability.yaml`)

# OpenTelemetry and monitoring
otel:
  enabled: true
  service_name: "tyra-mcp-memory-server"
  
tracing:
  enabled: true
  exporter: "console"  # or "jaeger", "otlp"
  sampler: "parentbased_traceidratio"
  sampler_arg: 1.0

metrics:
  enabled: true
  export_interval: 60000  # 60 seconds
  
logging:
  level: ${LOG_LEVEL:-INFO}
  format: "json"
  rotation:
    max_size: "100MB"
    max_files: 10

Environment Variables Reference

Core Settings

# Application Environment
TYRA_ENV=development|production
LOG_LEVEL=DEBUG|INFO|WARNING|ERROR
TYRA_DEBUG=true|false

# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
API_ENABLE_DOCS=true
API_RATE_LIMIT=1000

# Database URLs
DATABASE_URL=postgresql://user:pass@localhost:5432/tyra_memory
REDIS_URL=redis://localhost:6379/0
NEO4J_URL=neo4j://localhost:7687

Model Configuration

# Embedding Models
EMBEDDINGS_PRIMARY_MODEL=intfloat/e5-large-v2
EMBEDDINGS_FALLBACK_MODEL=sentence-transformers/all-MiniLM-L12-v2
EMBEDDINGS_DEVICE=auto|cpu|cuda

# Model Caching
MODEL_CACHE_DIR=/models/cache
EMBEDDING_CACHE_TTL=86400  # 24 hours

Performance Tuning

# Memory Management
MEMORY_MAX_CHUNK_SIZE=2048
MEMORY_CHUNK_OVERLAP=100
MEMORY_BATCH_SIZE=50

# Caching Configuration
CACHE_TTL_EMBEDDINGS=86400
CACHE_TTL_SEARCH=3600
CACHE_TTL_RERANK=1800
CACHE_MAX_SIZE=2GB

# Database Pools
POSTGRES_POOL_SIZE=20
REDIS_POOL_SIZE=50
NEO4J_POOL_SIZE=10

Security Settings

# Authentication (Optional)
API_KEY_ENABLED=false
API_KEY=your-secure-api-key
JWT_SECRET=your-jwt-secret

# Rate Limiting
RATE_LIMIT_REQUESTS=1000
RATE_LIMIT_WINDOW=60
RATE_LIMIT_BURST=50

# CORS Configuration
CORS_ORIGINS=*
CORS_METHODS=GET,POST,PUT,DELETE
CORS_HEADERS=*

Document Ingestion

# File Processing
INGESTION_MAX_FILE_SIZE=104857600
INGESTION_MAX_BATCH_SIZE=100
INGESTION_CONCURRENT_LIMIT=20
INGESTION_TIMEOUT=300

# LLM Enhancement
INGESTION_LLM_ENHANCEMENT=true
INGESTION_LLM_MODE=rule_based
VLLM_ENDPOINT=http://localhost:8000/v1
VLLM_MODEL=meta-llama/Llama-3.1-8B-Instruct

Observability

# OpenTelemetry
OTEL_ENABLED=true
OTEL_SERVICE_NAME=tyra-mcp-memory-server
OTEL_TRACES_ENABLED=true
OTEL_METRICS_ENABLED=true
OTEL_LOGS_ENABLED=true

# Exporters
OTEL_TRACES_EXPORTER=console
OTEL_METRICS_EXPORTER=console
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

🔍 Monitoring & Debugging

Health Checks

# Check system health
curl -X POST http://localhost:8000/tools/health_check \
  -H "Content-Type: application/json" \
  -d '{"detailed": true}'

Performance Analytics

# Get performance summary
curl -X POST http://localhost:8000/tools/get_memory_stats \
  -H "Content-Type: application/json" \
  -d '{"include_performance": true}'
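
The same calls can be made from Python using only the standard library. This sketch assumes the `/tools/<tool_name>` URL pattern shown in the curl commands and that the server accepts and returns JSON; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def build_tool_request(tool: str, arguments: Dict[str, Any],
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for /tools/<tool> with a JSON body."""
    body = json.dumps(arguments).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/tools/{tool}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_tool(tool: str, arguments: Dict[str, Any],
              base_url: str = "http://localhost:8000",
              timeout: float = 10.0) -> Any:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_tool_request(tool, arguments, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running locally, `call_tool("health_check", {"detailed": True})` mirrors the first curl command.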

Logs

# View real-time logs
tail -f logs/tyra-memory.log

# Search for specific events
grep "hallucination" logs/tyra-memory.log
grep "ERROR" logs/tyra-memory.log

🧪 Development

Testing

# Run unit tests
python -m pytest tests/unit/

# Run integration tests
python -m pytest tests/integration/

# Run end-to-end tests
python -m pytest tests/e2e/

Development Mode

# Enable hot reload and debug features
export TYRA_ENV=development
export TYRA_DEBUG=true
export TYRA_HOT_RELOAD=true

python main.py

Model Development

# Download and test models
python scripts/download_models.py

# Benchmark different models
python scripts/benchmark_models.py

🚀 Production Deployment

Docker Deployment

Quick Production Start

# Clone repository
git clone https://github.com/your-org/tyra-mcp-memory-server.git
cd tyra-mcp-memory-server

# Copy and configure environment
cp .env.example .env
# Edit .env with your settings

# Build and start all services
docker-compose -f docker-compose.prod.yml up -d

# Verify deployment
docker-compose -f docker-compose.prod.yml ps
curl http://localhost:8000/health

Multi-Stage Docker Build

# Build optimized production image
docker build -t tyra-memory-server:latest \
  --target production \
  --build-arg ENVIRONMENT=production .

# Run with custom configuration
docker run -d \
  --name tyra-memory \
  -p 8000:8000 \
  -v $(pwd)/config:/app/config:ro \
  -v $(pwd)/data:/app/data \
  --env-file .env \
  tyra-memory-server:latest

Container Orchestration

# Kubernetes deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tyra-memory-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tyra-memory-server
  template:
    metadata:
      labels:
        app: tyra-memory-server
    spec:
      containers:
      - name: tyra-memory
        image: tyra-memory-server:latest
        ports:
        - containerPort: 8000
        env:
        - name: TYRA_ENV
          value: "production"
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"

Systemd Service Deployment

Service Installation

# Install system service
sudo cp scripts/tyra-memory.service /etc/systemd/system/
sudo systemctl daemon-reload

# Enable auto-start on boot
sudo systemctl enable tyra-memory

# Start service
sudo systemctl start tyra-memory

# Check status and logs
sudo systemctl status tyra-memory
sudo journalctl -u tyra-memory -f

Service Configuration (`scripts/tyra-memory.service`)

[Unit]
Description=Tyra MCP Memory Server
After=network.target postgresql.service redis.service
Requires=postgresql.service redis.service

[Service]
Type=simple
User=tyra
Group=tyra
WorkingDirectory=/opt/tyra-memory-server
ExecStart=/opt/tyra-memory-server/venv/bin/python main.py
Restart=always
RestartSec=10
Environment=TYRA_ENV=production
Environment=LOG_LEVEL=INFO

# Security settings
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/tyra-memory-server/data /opt/tyra-memory-server/logs

[Install]
WantedBy=multi-user.target

Performance Tuning

Hardware Requirements

# Minimum Requirements
CPU: 4 cores (8 threads recommended)
RAM: 8GB (16GB recommended)
Storage: 100GB SSD (fast I/O critical)
Network: 1Gbps (for high-throughput scenarios)

# Optimal Configuration
CPU: 8+ cores with AVX2 support
RAM: 32GB+ (for large models and caching)
GPU: NVIDIA GPU with 8GB+ VRAM (optional, for GPU acceleration)
Storage: NVMe SSD with 1000+ IOPS

Database Optimization

# PostgreSQL performance tuning
# Add to postgresql.conf

# Memory settings
shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 256MB
maintenance_work_mem = 1GB

# Connection and extension settings
max_connections = 200
shared_preload_libraries = 'vector'

# Performance settings
random_page_cost = 1.1
checkpoint_completion_target = 0.9
wal_buffers = 64MB
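
The memory values above match a widely used rule of thumb on a 16GB host: roughly 25% of RAM for `shared_buffers` and roughly 75% for `effective_cache_size`. That reading is an inference, not something this README states. Note also that `work_mem` applies per sort or hash operation, so with `max_connections = 200` the worst case is far larger than the host's RAM. The hypothetical helpers below make the arithmetic explicit.

```python
def suggest_memory_settings(total_ram_gb: float) -> dict:
    """Rule-of-thumb sizing: 25% of RAM for shared_buffers, 75% for the planner's cache estimate."""
    return {
        "shared_buffers_gb": total_ram_gb * 0.25,
        "effective_cache_size_gb": total_ram_gb * 0.75,
    }


def worst_case_work_mem_gb(work_mem_mb: int, max_connections: int,
                           ops_per_query: int = 1) -> float:
    """Upper bound on sort/hash memory if every connection uses work_mem at once."""
    return work_mem_mb * max_connections * ops_per_query / 1024


print(suggest_memory_settings(16))       # {'shared_buffers_gb': 4.0, 'effective_cache_size_gb': 12.0}
print(worst_case_work_mem_gb(256, 200))  # 50.0
```

If that worst case is a concern, a lower `work_mem` or a connection pooler in front of PostgreSQL is the usual mitigation.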

Redis Configuration

# Redis optimization (redis.conf)
maxmemory 4gb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000

# Persistence settings
appendonly yes
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

Application Tuning

# config/config.yaml - Production optimizations
performance:
  # Database connection pools
  postgres_pool_size: 20
  redis_pool_size: 50
  neo4j_pool_size: 10
  
  # Memory management
  max_chunk_size: 2048
  batch_size: 50
  
  # Caching optimization
  cache_sizes:
    embeddings: "2GB"
    search_results: "1GB"
    rerank_cache: "512MB"
  
  # Async processing
  max_concurrent_requests: 100
  request_timeout: 30
  
  # GPU optimization (if available)
  gpu_enabled: true
  gpu_memory_fraction: 0.8
  cuda_device: 0
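
How the server enforces `max_concurrent_requests` and `request_timeout` is not shown here. One common way to apply such limits in an asyncio application is a semaphore combined with a per-request timeout. The sketch below is an assumption about the technique, not the project's implementation, and `RequestLimiter` is a hypothetical name.

```python
import asyncio


class RequestLimiter:
    """Caps the number of in-flight coroutines and applies a per-request timeout."""

    def __init__(self, max_concurrent: int = 100, timeout: float = 30.0) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._timeout = timeout

    async def run(self, coro):
        # Wait for a free slot first; the timeout only covers the work itself,
        # not the time spent queueing for the semaphore.
        async with self._semaphore:
            return await asyncio.wait_for(coro, timeout=self._timeout)
```

Create the limiter inside the running event loop (for example, during application startup) and wrap each handler call in `await limiter.run(...)`. A request that exceeds the timeout raises `asyncio.TimeoutError`.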

Monitoring & Observability

Prometheus Configuration

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'tyra-memory-server'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/metrics'
    scrape_interval: 30s

Grafana Dashboard

{
  "dashboard": {
    "title": "Tyra Memory Server",
    "panels": [
      {
        "title": "Request Latency",
        "type": "graph",
        "targets": [{
          "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))"
        }]
      },
      {
        "title": "Memory Usage",
        "type": "graph",
        "targets": [{
          "expr": "process_resident_memory_bytes"
        }]
      }
    ]
  }
}
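
The `histogram_quantile(0.95, ...)` expression in the latency panel estimates the 95th percentile from cumulative histogram buckets by interpolating linearly inside the bucket that contains the target rank. The sketch below reproduces that calculation on raw counts to show what the panel is plotting. The bucket bounds and counts are made-up illustration data, and Prometheus applies the same idea to per-second rates rather than raw counts.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    `buckets` must be sorted by upper bound and end with math.inf,
    mirroring the `le` labels of a Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for index, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[index - 1][0] if index > 0 else math.nan
            # Linear interpolation inside the bucket that holds the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Illustrative data: 100 requests, 90 of them at or under 0.5s.
example = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.95, example))
```

Because the result is interpolated, its accuracy depends on how finely the buckets are spaced around the latencies of interest.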

Load Balancing & High Availability

NGINX Configuration

upstream tyra_memory_servers {
    least_conn;
    server 127.0.0.1:8001 weight=1 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8002 weight=1 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8003 weight=1 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    server_name memory.tyra-ai.com;
    
    location / {
        proxy_pass http://tyra_memory_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_connect_timeout 30s;
        proxy_send_timeout 30s;
        proxy_read_timeout 30s;
    }
    
    location /health {
        proxy_pass http://tyra_memory_servers/health;
        access_log off;
    }
}
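
The `least_conn` directive in the upstream block sends each request to the backend with the fewest active connections, taking server weights into account. The sketch below illustrates only that selection rule. It is not how NGINX is implemented; ties are broken here by list order, whereas NGINX falls back to weighted round-robin. `Backend` and `pick_least_conn` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    address: str
    weight: int = 1
    active: int = 0  # currently open connections


def pick_least_conn(backends: list) -> Backend:
    """Choose the backend with the lowest active-connections-to-weight ratio."""
    return min(backends, key=lambda b: b.active / b.weight)


pool = [
    Backend("127.0.0.1:8001", active=3),
    Backend("127.0.0.1:8002", active=1),
    Backend("127.0.0.1:8003", active=2),
]
print(pick_least_conn(pool).address)  # 127.0.0.1:8002
```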

💼 Use Cases & Applications

🤖 AI Agent Memory Enhancement

Personal Assistants: Long-term conversation memory and context retention
Customer Support: Historical interaction tracking and personalized responses
Research Assistants: Literature review, citation tracking, and knowledge synthesis
Educational Tutors: Student progress tracking and adaptive learning paths

📊 Enterprise Knowledge Management

Document Processing: Automated ingestion of company documents, policies, and procedures
Institutional Memory: Preserve and access organizational knowledge across teams
Compliance Tracking: Audit trails and regulatory requirement monitoring
Decision Support: Historical data analysis and trend identification

🏦 Financial Services Applications

Trading Support: Market analysis with confidence-scored recommendations (95% threshold)
Risk Assessment: Historical pattern analysis and risk factor identification
Client Profiling: Investment preferences and trading behavior analysis
Regulatory Compliance: Transaction monitoring and compliance reporting

🔬 Research & Analytics

Scientific Research: Literature mining and hypothesis generation
Market Research: Consumer behavior analysis and trend prediction
Competitive Intelligence: Industry analysis and competitor monitoring
Data Mining: Pattern discovery in large document collections

🌐 Multi-Agent Orchestration

Agent Coordination: Shared knowledge base for multiple AI agents
Workflow Automation: n8n integration for document processing pipelines
Cross-Platform Integration: Unified memory across different AI systems
Session Management: Context preservation across agent interactions

🔒 Secure Deployment Scenarios

Air-Gapped Networks: Completely offline operation with local models
HIPAA Compliance: Healthcare data processing with audit trails
Financial Regulations: Trading compliance with mandatory confidence thresholds
Government Applications: Classified information processing with security controls

🔧 Integration Examples

Claude Desktop Integration

{
  "mcpServers": {
    "tyra-memory": {
      "command": "python",
      "args": ["/path/to/tyra-mcp-memory-server/main.py"],
      "env": {
        "TYRA_ENV": "production",
        "DATABASE_URL": "postgresql://user:pass@localhost:5432/tyra",
        "LOG_LEVEL": "INFO"
      }
    }
  }
}

n8n Workflow Integration

{
  "nodes": [
    {
      "name": "Document Upload",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "http://localhost:8000/v1/ingest/document",
        "method": "POST",
        "body": {
          "source_type": "base64",
          "file_name": "{{ $json.filename }}",
          "file_type": "pdf",
          "content": "{{ $json.base64_content }}"
        }
      }
    },
    {
      "name": "Memory Search",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "http://localhost:8000/v1/memory/search",
        "method": "POST",
        "body": {
          "query": "{{ $json.search_query }}",
          "agent_id": "n8n_workflow",
          "include_analysis": true,
          "min_confidence": 0.8
        }
      }
    }
  ]
}
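
The "Document Upload" node above posts a base64-encoded file to `/v1/ingest/document`. For callers outside n8n, the same request body can be assembled in Python. This sketch reuses only the field names shown in the workflow; `build_ingest_payload` is a hypothetical helper, and any additional fields the API may require are not covered here.

```python
import base64
import json


def build_ingest_payload(file_name: str, file_bytes: bytes,
                         file_type: str = "pdf") -> str:
    """Serialize a document into the JSON body used by the ingest node above."""
    return json.dumps({
        "source_type": "base64",
        "file_name": file_name,
        "file_type": file_type,
        # base64 keeps binary content safe inside a JSON string.
        "content": base64.b64encode(file_bytes).decode("ascii"),
    })
```

The returned string can be sent as the body of a `POST` with a `Content-Type: application/json` header using any HTTP client.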

Python Client Usage

from tyra_memory_client import MemoryClient
import asyncio

async def main():
    # Initialize client
    client = MemoryClient(
        base_url="http://localhost:8000",
        api_key="your-api-key"  # if authentication enabled
    )
    
    # Store a memory
    result = await client.store_memory(
        content="User prefers technical analysis for stock trading",
        agent_id="trading_bot",
        extract_entities=True,
        metadata={"category": "trading", "confidence": 95}
    )
    print(f"Stored memory: {result.memory_id}")
    
    # Search memories
    results = await client.search_memories(
        query="trading preferences",
        agent_id="trading_bot",
        min_confidence=0.8,
        include_analysis=True
    )
    
    for memory in results.memories:
        print(f"Memory: {memory.content}")
        print(f"Confidence: {memory.confidence_score}")
        print(f"Grounding: {memory.grounding_score}")
    
    # Ingest document
    with open("trading_strategy.pdf", "rb") as f:
        doc_result = await client.ingest_document(
            file_content=f.read(),
            file_name="trading_strategy.pdf",
            file_type="pdf",
            chunking_strategy="semantic",
            enable_llm_context=True
        )
    print(f"Ingested {doc_result.chunks_ingested} chunks")

if __name__ == "__main__":
    asyncio.run(main())

📈 Performance Benchmarks

Current Performance (Local Setup)

Memory Storage: ~100ms per document
Vector Search: ~50ms for top-10 results
Hybrid Search: ~150ms with reranking
Hallucination Analysis: ~200ms per response
Memory Usage: ~500MB base + ~2GB for models
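
These figures depend heavily on hardware and model choice, so it is worth measuring them on your own setup. The sketch below times any callable and reports median and nearest-rank 95th-percentile latency in milliseconds. `measure_latency_ms` is an illustrative helper, separate from the project's `scripts/benchmark_models.py`.

```python
import math
import statistics
import time


def measure_latency_ms(fn, runs: int = 50) -> dict:
    """Call `fn` repeatedly and summarize wall-clock latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank percentile: the smallest sample covering 95% of the runs.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "p50": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }
```

Pass a zero-argument function that performs one operation, such as a single search request, and compare the resulting `p50` and `p95` against the numbers listed above.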

Enhanced Performance Targets (Coming Soon)

Memory Synthesis: <200ms for intelligent deduplication
Predictive Preloading: 40%+ latency reduction through ML prediction
Multi-Modal Processing: <500ms for image/video analysis
Real-Time Streams: <50ms event propagation latency
Auto-Optimization: 25%+ performance improvement through self-learning

Scalability

Current: 10-50 concurrent requests
Enhanced: 100+ concurrent with auto-scaling (Phase 3)
Memory Capacity: No application-level limit; bounded by PostgreSQL storage
Graph Complexity: Optimized for millions of entities/relationships
Future: Federated networks for distributed deployment (Phase 3)

🤝 Contributing

Development Setup

# Setup development environment
./scripts/setup.sh --env development

# Install development dependencies
pip install -r requirements-dev.txt

# Setup pre-commit hooks
pre-commit install

Code Standards

Type Hints: All functions must have type annotations
Documentation: Docstrings for all public methods
Testing: Minimum 80% code coverage
Formatting: Black + isort + flake8

📜 License

MIT License - see the LICENSE file for details.

🆘 Support

Documentation

- Complete enhancement plan
- Implementation standards
- Modular component architecture

Community

Issues: GitHub Issues
Discussions: GitHub Discussions
Discord: Tyra AI Community

Commercial Support

For enterprise support, custom integrations, and professional services, contact:

🎯 Implementation Roadmap

✅ Current: Production Foundation

✅ Advanced memory system with PostgreSQL + pgvector
✅ Neo4j temporal knowledge graphs with Graphiti
✅ Basic hallucination detection and confidence scoring
✅ Multi-format document ingestion (9 file types)
✅ Cross-encoder reranking and hybrid search
✅ Redis multi-layer caching and performance optimization

🚀 Phase 1: AI Enhancement Implementation (Ready)

🏗️ Pydantic AI integration for structured outputs (dependency included, awaiting implementation)
🏗️ Multi-layer hallucination detection enhancement (beyond current grounding-based system)
🏗️ Advanced memory synthesis and deduplication
🏗️ Real-time streaming capabilities (WebSocket infrastructure)

📈 Phase 2: Intelligence Amplification (Planned)

📅 Predictive memory management and preloading
📅 Context-aware embeddings with fine-tuning
📅 Multi-modal support (images, audio, video)
📅 Advanced graph reasoning and causal inference

🎯 Phase 3: Operational Excellence (Future)

📅 Complete self-optimization engine
📅 Enhanced security and privacy features
📅 Advanced observability and analytics
📅 Federated memory networks

The current system is production-ready. Planned enhancements build on dependencies that are already included and are intended to maintain 100% backward compatibility.

Built with ❤️ by the Tyra AI Team

Transforming AI agents with genius-tier memory and intelligence capabilities