bigrock776/tyra-mcp
If you are the rightful owner of tyra-mcp and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.
The Tyra Advanced Memory MCP Server is a sophisticated Model Context Protocol server designed to enhance intelligent agent ecosystems with advanced memory capabilities.
store_memory
Store information with automatic entity extraction and metadata enrichment.
search_memory
Advanced hybrid search with confidence scoring and hallucination analysis.
get_all_memories
Retrieve all memories for an agent with filtering and pagination.
delete_memory
Remove specific memories with optional cascade deletion.
ingest_document
Ingest documents with automatic format detection and intelligent processing.
Tyra Advanced Memory MCP Server
A sophisticated Model Context Protocol (MCP) server providing advanced memory capabilities with RAG (Retrieval-Augmented Generation), hallucination detection, adaptive learning, and enterprise-grade AI infrastructure for intelligent agent ecosystems.
๐ AI Enhancement Roadmap
The Tyra Memory Server includes advanced AI dependencies and infrastructure, with planned enhancements for comprehensive intelligence platform capabilities:
๐ง AI-Powered Memory Synthesis (Implementation Ready)
- Intelligent Deduplication: 40%+ reduction in duplicate memories using local semantic analysis
- Automated Summarization: Multi-layer summarization with Pydantic AI validation and anti-hallucination
- Pattern Recognition: Cross-memory insights using local ML clustering and topic modeling
- Temporal Evolution: Track memory content changes and concept development over time
- Status: Dependencies included (pydantic-ai>=0.0.13), ready for implementation
๐ก๏ธ Pydantic AI & Anti-Hallucination Architecture (Dependency Ready)
- Pydantic AI Status: Dependency included (pydantic-ai>=0.0.13) but not yet implemented
- Current Implementation: Extensive Pydantic schemas throughout codebase for data validation
- Current Hallucination Detection: Grounding-based analysis with confidence scoring
- Ready for Enhancement: Pydantic AI integration for structured, validated AI outputs
- Planned Features: Multi-layer validation, real-time detection, mandatory citation validation
โก Advanced RAG Pipeline Enhancement (Phase 1-2)
- Multi-Modal Support: Images, videos, audio, and code with local CLIP and Whisper models
- Contextual Chunk Linking: Intelligent chunk relationships and coherence scoring
- Dynamic Reranking: Real-time ranking adaptation with local cross-encoder models
- Intent-Aware Retrieval: Query intent classification with strategy selection
๐ Real-Time Memory Streams (Phase 2)
- WebSocket Infrastructure: Live memory updates and collaborative search
- Event-Driven Triggers: Custom automation with local rule engines
- Progressive Search: Real-time query suggestions and result refinement
- Live Analytics: Streaming performance metrics and insights
๐ฎ Predictive Intelligence (Phase 2-3)
- Usage Pattern Analysis: ML-driven access pattern prediction
- Smart Auto-Archiving: Intelligent memory lifecycle management
- Predictive Preloading: 40%+ latency reduction through anticipation
- Context-Aware Embeddings: Session-adaptive embeddings with fine-tuning
๐ฏ Self-Optimization Engine (Phase 3)
- Continuous Learning: Online adaptation without manual intervention
- A/B Testing Framework: Statistical experimentation with local validation
- Hyperparameter Optimization: Bayesian optimization for 20%+ performance gains
- Memory Personality Profiles: Personalized user experience adaptation
๐ฅ All enhancements are 100% local and open source with zero external dependencies!
๐ Current Features
๐ง Implementation Status
- Current Data Validation: Extensive Pydantic usage throughout codebase (BaseModel, Field validation)
- Pydantic AI: Dependency included (pydantic-ai>=0.0.13) but not yet implemented in codebase
- Available for Integration: Ready to implement Pydantic AI for structured AI outputs and enhanced validation
- Current Validation: Comprehensive Pydantic schemas + confidence scoring + grounding-based hallucination detection
๐ง Advanced Memory System
- Multi-Modal Storage: Vector embeddings + temporal knowledge graphs with Neo4j + Graphiti integration
- Agent Isolation: Separate memory spaces for Tyra, Claude, Archon with session management
- Intelligent Chunking: 6 dynamic strategies (auto, paragraph, semantic, slide, line, token) with size optimization
- Entity Extraction: Automated NER with relationship mapping using temporal knowledge graphs
- Memory Versioning: Track memory evolution with temporal validity intervals
- Memory Health: Automated stale detection, redundancy removal, and consolidation
- Multi-Level Caching: L1 (in-memory), L2 (Redis), L3 (materialized views) for <100ms p95 latency
๐ Universal Document Ingestion
- 9 File Formats: PDF (PyMuPDF), DOCX (python-docx), PPTX (python-pptx), TXT/MD (encoding detection), HTML (html2text), JSON (nested objects), CSV (streaming), EPUB (chapters), and custom format support
- Smart Processing: Auto-format detection with specialized loaders and fallback mechanisms
- Dynamic Chunking: File-type-aware strategies (semantic for PDFs, paragraph for DOCX, slide for PPTX)
- LLM Enhancement: Context injection with rule-based templates + optional vLLM integration
- Batch Processing: Concurrent ingestion up to 100 documents with configurable concurrency (default: 20)
- Streaming Pipeline: Memory-efficient processing for large files (>10MB) with progress tracking
- Comprehensive Metadata: Document properties, chunk metadata, confidence scoring, hallucination detection
- Error Recovery: Graceful fallback with retry logic and detailed error reporting
๐ Sophisticated Search & RAG
- Hybrid Search: Weighted combination (0.7 vector + 0.3 keyword) with graph traversal enhancement
- Multi-Strategy Retrieval: Vector similarity, keyword matching, temporal queries, graph traversal
- Advanced Reranking: Cross-encoder models + optional vLLM-based reranking with caching
- Confidence Scoring: Multi-level assessment (๐ช Rock Solid 95%+, ๐ง High 80%+, ๐ค Fuzzy 60%+, โ ๏ธ Low <60%)
- Hallucination Detection: Real-time grounding analysis with evidence collection and consistency checking
- Trading Safety: Unbypassable 95% confidence requirement for financial operations with audit logging
- Context Enrichment: Graph-based context expansion with temporal relevance weighting
๐ธ๏ธ Temporal Knowledge Graph
- Neo4j Integration: High-performance graph database with Cypher query support
- Graphiti Framework: Advanced temporal knowledge management with validity intervals
- Entity Management: Automated extraction, typing, merging, and updates with conflict resolution
- Relationship Tracking: Temporal relationship extraction with time-based validity and evolution tracking
- Graph Traversal: Efficient path finding, subgraph extraction, and entity timeline queries
- Temporal Queries: Time-range filtering, relationship evolution, and temporal pattern matching
๐ Modular Provider System
- Hot-Swappable Components: Runtime provider switching without restart
- Embedding Providers: HuggingFace (intfloat/e5-large-v2 primary, all-MiniLM-L12-v2 fallback), OpenAI fallback
- Vector Stores: PostgreSQL + pgvector with HNSW indexing, future support for Weaviate, Qdrant
- Graph Engines: Neo4j + Graphiti (primary), extensible to other graph databases
- Rerankers: Cross-encoder models, vLLM integration, custom reranking strategies
- Cache Providers: Redis (multi-layer), in-memory LRU, future distributed caching
- File Loaders: Extensible loader registry with custom format support
- Fallback Mechanisms: Automatic failover with circuit breakers and health monitoring
๐ Performance Analytics & Observability
- Real-Time Monitoring: Response time, accuracy, memory usage, cache hit rates with configurable dashboards
- OpenTelemetry Integration: Complete distributed tracing, metrics collection, and structured logging
- Trend Analysis: Automated performance trend detection with statistical significance testing
- Smart Alerts: Configurable warning (response time >100ms) and critical thresholds with notification channels
- Performance Metrics: Request latency histograms, error rate tracking, resource utilization monitoring
- Health Checks: Comprehensive component health monitoring with automatic recovery
- Audit Logging: Complete operation audit trail with request correlation and compliance reporting
๐ฏ Adaptive Learning & Self-Optimization
- Self-Optimization: Automated parameter tuning based on performance data and user feedback
- A/B Testing Framework: Systematic experimentation with statistical significance testing and rollback protection
- Learning Insights: Pattern recognition from successful configurations and failure analysis
- Multi-Strategy Optimization: Gradient descent, Bayesian optimization, random search with ensemble methods
- Memory Health Management: Stale memory detection, redundancy identification, and automated cleanup
- Prompt Evolution: Continuous improvement of prompts based on success/failure patterns
- Configuration Adaptation: Dynamic configuration updates with safety constraints and rollback capabilities
- Performance Baselines: Automatic establishment and tracking of performance benchmarks
๐ Dual Interface Architecture
- MCP Protocol: Full Model Context Protocol support for Claude, Tyra, and other MCP-compatible agents
- REST API: Comprehensive HTTP API with OpenAPI documentation for web integrations and custom clients
- WebSocket Support: Real-time updates and streaming for long-running operations
- n8n Integration: Pre-built webhook endpoints and workflow templates for automation
- SDK Support: Python and JavaScript client libraries with async support and retry logic
๐ Enterprise Security & Safety
- Local Operation: 100% local deployment with no external API dependencies for data privacy
- Multi-Agent Isolation: Secure separation of agent memory spaces with access controls
- Trading Safety: Unbypassable confidence requirements for financial operations with multiple validation layers
- Input Validation: Comprehensive request validation with SQL injection and XSS protection
- Rate Limiting: Configurable limits (1000/min default) with burst protection (50/sec)
- Circuit Breakers: Automatic failure protection with configurable thresholds and recovery timeouts
- Audit Trail: Complete operation logging with compliance reporting and forensic capabilities
๐ Quick Start
Note: Current version is production-ready with advanced AI infrastructure. Enhanced features can be implemented using included dependencies with full backward compatibility.
Prerequisites
- Python 3.11+
- PostgreSQL with pgvector extension
- Redis (for caching)
- Neo4j (for knowledge graphs)
- HuggingFace CLI (for model downloads)
- Git LFS (for large model files)
Implementation Guide: For implementing advanced features, see for implementation patterns using included dependencies.
Automated Setup
# Run unified setup script
./setup.sh --env development
# Start the server
source venv/bin/activate
python main.py
Manual Installation
-
Clone and Setup
git clone https://github.com/your-org/tyra-mcp-memory-server.git cd tyra-mcp-memory-server python3 -m venv venv source venv/bin/activate pip install -r requirements.txt
-
Install Model Prerequisites
# Install HuggingFace CLI and Git LFS pip install huggingface-hub git lfs install
-
Download Required Models โ ๏ธ REQUIRED - No Automatic Downloads
# Create model directories mkdir -p ./models/embeddings ./models/cross-encoders # Download primary embedding model (~1.34GB) huggingface-cli download intfloat/e5-large-v2 \ --local-dir ./models/embeddings/e5-large-v2 \ --local-dir-use-symlinks False # Download fallback embedding model (~120MB) huggingface-cli download sentence-transformers/all-MiniLM-L12-v2 \ --local-dir ./models/embeddings/all-MiniLM-L12-v2 \ --local-dir-use-symlinks False # Download cross-encoder for reranking (~120MB) huggingface-cli download cross-encoder/ms-marco-MiniLM-L-6-v2 \ --local-dir ./models/cross-encoders/ms-marco-MiniLM-L-6-v2 \ --local-dir-use-symlinks False
-
Verify Model Installation
# Test all models are working python scripts/test_model_pipeline.py
-
Database Setup
# Start databases with Docker docker-compose -f docker-compose.dev.yml up -d # Or configure your own PostgreSQL, Redis, Neo4j instances
-
Configuration
cp .env.example .env # Edit .env with your database credentials
-
Start Server
python main.py
๐ง MCP Integration
AI Enhancement Available: MCP tools can be enhanced with Pydantic AI validation using included dependencies. Current tools include basic confidence scoring and hallucination detection.
Claude Desktop Configuration
Add to your Claude Desktop MCP settings:
{
"mcpServers": {
"tyra-memory": {
"command": "python",
"args": ["/path/to/tyra-mcp-memory-server/main.py"],
"env": {
"TYRA_ENV": "production"
}
}
}
}
Available Tools
๐ง Core Memory Tools
๐ store_memory
Store information with automatic entity extraction and metadata enrichment.
{
"tool": "store_memory",
"content": "User prefers morning trading sessions and uses technical analysis",
"agent_id": "tyra",
"session_id": "trading_session_001",
"extract_entities": true,
"chunk_content": false,
"metadata": {"category": "trading_preferences", "confidence": 95}
}
๐ search_memory
Advanced hybrid search with confidence scoring and hallucination analysis.
{
"tool": "search_memory",
"query": "What are the user's trading preferences?",
"agent_id": "tyra",
"search_type": "hybrid",
"top_k": 10,
"min_confidence": 0.7,
"include_analysis": true,
"rerank": true,
"temporal_weight": 0.3
}
๐ get_all_memories
Retrieve all memories for an agent with filtering and pagination.
{
"tool": "get_all_memories",
"agent_id": "tyra",
"limit": 50,
"offset": 0,
"include_metadata": true,
"filter_by_date": "2024-01-01",
"category": "trading"
}
๐๏ธ delete_memory
Remove specific memories with optional cascade deletion.
{
"tool": "delete_memory",
"memory_id": "mem_12345",
"agent_id": "tyra",
"cascade_delete": false
}
๐ Document Processing Tools
๐ ingest_document
Ingest documents with automatic format detection and intelligent processing.
{
"tool": "ingest_document",
"file_path": "/path/to/document.pdf",
"file_type": "pdf",
"chunking_strategy": "auto",
"chunk_size": 512,
"chunk_overlap": 50,
"enable_llm_context": true,
"extract_entities": true,
"metadata": {"source": "user_upload", "priority": "high"}
}
๐ batch_ingest
Process multiple documents concurrently with progress tracking.
{
"tool": "batch_ingest",
"documents": [
{"file_path": "/docs/report1.pdf", "chunking_strategy": "semantic"},
{"file_path": "/docs/data.csv", "chunking_strategy": "row"}
],
"max_concurrent": 10,
"progress_callback": true
}
๐ get_ingestion_status
Monitor document processing status and progress.
{
"tool": "get_ingestion_status",
"job_id": "batch_12345",
"include_details": true
}
๐ก๏ธ Analysis & Validation Tools
๐ก๏ธ analyze_response
Analyze any response for hallucinations and confidence scoring.
{
"tool": "analyze_response",
"response": "Based on your history, you prefer swing trading",
"query": "What's my trading style?",
"retrieved_memories": [...],
"detailed_analysis": true,
"include_evidence": true
}
๐ฏ validate_for_trading
Special validation for financial operations with 95% confidence requirement.
{
"tool": "validate_for_trading",
"query": "Should I buy AAPL stock?",
"response": "Based on your risk profile, AAPL looks good",
"context_memories": [...],
"require_rock_solid": true
}
๐ rerank_results
Improve search result relevance with advanced reranking.
{
"tool": "rerank_results",
"query": "trading strategies",
"results": [...],
"reranker_type": "cross_encoder",
"top_k": 5
}
๐ธ๏ธ Knowledge Graph Tools
๐ธ๏ธ query_graph
Execute graph traversal queries on the knowledge graph.
{
"tool": "query_graph",
"cypher_query": "MATCH (p:PERSON)-[r:TRADES]->(s:STOCK) RETURN p, r, s",
"agent_id": "tyra",
"include_temporal": true
}
๐ get_entity_relationships
Explore entity connections and relationship paths.
{
"tool": "get_entity_relationships",
"entity_name": "AAPL",
"relationship_types": ["CORRELATES_WITH", "COMPETES_WITH"],
"max_depth": 3,
"temporal_filter": "last_30_days"
}
๐ get_entity_timeline
View entity evolution over time.
{
"tool": "get_entity_timeline",
"entity_id": "entity_12345",
"start_date": "2024-01-01",
"end_date": "2024-12-31",
"include_relationships": true
}
๐ System Monitoring Tools
๐ get_memory_stats
Comprehensive system statistics and health metrics.
{
"tool": "get_memory_stats",
"agent_id": "tyra",
"include_performance": true,
"include_recommendations": true,
"include_cache_stats": true,
"time_range": "last_24_hours"
}
๐ฅ health_check
Complete system health assessment.
{
"tool": "health_check",
"detailed": true,
"include_components": ["vector_store", "graph_engine", "cache", "embeddings"],
"run_diagnostics": true
}
โก get_performance_metrics
Real-time performance and resource utilization.
{
"tool": "get_performance_metrics",
"metric_types": ["latency", "throughput", "error_rate"],
"time_window": "5m",
"include_predictions": true
}
๐ฏ Learning & Optimization Tools
๐ฏ get_learning_insights
Access adaptive learning insights and optimization recommendations.
{
"tool": "get_learning_insights",
"category": "parameter_optimization",
"days": 7,
"include_experiments": true,
"confidence_threshold": 0.8
}
๐ง optimize_configuration
Trigger configuration optimization based on usage patterns.
{
"tool": "optimize_configuration",
"components": ["embeddings", "cache", "reranking"],
"optimization_strategy": "bayesian",
"safety_constraints": true
}
๐ get_improvement_suggestions
AI-generated recommendations for system enhancement.
{
"tool": "get_improvement_suggestions",
"focus_areas": ["performance", "accuracy", "reliability"],
"priority": "high",
"include_implementation": true
}
๐ง Administrative Tools
โ๏ธ update_configuration
Dynamically update system configuration without restart.
{
"tool": "update_configuration",
"config_section": "cache",
"updates": {"ttl": 7200, "max_size": "2GB"},
"validate_before_apply": true
}
๐งน cleanup_memories
Clean up stale or redundant memories.
{
"tool": "cleanup_memories",
"agent_id": "tyra",
"older_than_days": 90,
"confidence_threshold": 0.3,
"dry_run": true
}
๐พ backup_memories
Create backup of agent memories.
{
"tool": "backup_memories",
"agent_id": "tyra",
"include_graph": true,
"compression": true,
"backup_location": "/backups/tyra_memories.gz"
}
๐๏ธ Architecture
High-Level System Architecture
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ CLIENT LAYER โ
โโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโค
โ Claude โ Tyra โ Archon โ n8n โ
โ (MCP Client) โ (MCP Client) โ (MCP Client) โ (Webhook)โ
โโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโ
โ โ โ โ
โโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ INTERFACE LAYER โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ MCP Server โ FastAPI Server โ
โ โข Tool Handlers โ โข REST Endpoints โ
โ โข Protocol Management โ โข WebSocket Support โ
โ โข Agent Isolation โ โข OpenAPI Documentation โ
โ โข Session Management โ โข Rate Limiting โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ
โโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ CORE ENGINE โ
โโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโฌโโโโโโโโโค
โ Memory โ Document โ Hallucinationโ Graph โ Learn โ
โ Manager โ Processor โ Detector โ Engine โ Engine โ
โ โข Storage โ โข 9 Formats โ โข Confidence โ โข Neo4j โ โข A/B โ
โ โข Retrieval โ โข Chunking โ โข Grounding โ โข Graphiti โ โข Opt โ
โ โข Caching โ โข LLM Enh โ โข Evidence โ โข Temporal โ โข Ins โ
โโโโโโโโโโโโโโโดโโโโโโโโโโโโโโดโโโโโโโโโโโโโโดโโโโโโโโโโโโโโดโโโโโโโโโ
โ โ โ โ โ
โโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโ
โ โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ PROVIDER LAYER โ
โโโโโโโโโโโโฌโโโโโโโโโโโฌโโโโโโโโโโโฌโโโโโโโโโโโฌโโโโโโโโโโโฌโโโโโโโโโโค
โEmbedding โ Vector โ Graph โReranker โ Cache โ File โ
โProviders โ Store โ Database โ Provider โ Provider โ Loaders โ
โโข HF E5 โโข PG Vec โโข Neo4jโโข Cross-E โโข Redis โโข PDF โ
โโข MiniLM โโข HNSW โโข Cypher โโข vLLM โโข Memory โโข DOCX โ
โโข OpenAI โโข Hybrid โโข Graphitiโโข Custom โโข L1/L2/L3โโข 7 More โ
โโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโ
โ โ โ โ โ โ
โโโโโโโโผโโโโโโโโโโโผโโโโโโโโโโโผโโโโโโโโโโโผโโโโโโโโโโโ
โ โ โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ DATA LAYER โ
โโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโค
โ PostgreSQL โ Neo4j โ Redis โ Logs โ
โ + pgvector โ + Graphiti โ Multi-Layer โ + Traces โ
โ โข Vector Store โ โข Knowledge โ โข Performance โ โข Metricsโ
โ โข Metadata โ โข Temporal โ โข Session โ โข Events โ
โ โข HNSW Index โ โข Relationships โ โข Embedding โ โข Audit โ
โโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโ
Core Processing Pipelines
1. Memory Storage Pipeline
Input Content โ Format Detection โ Document Loading โ Chunking Strategy Selection
โ โ โ โ
Text Extraction โ LLM Enhancement โ Entity Extraction โ Relationship Mapping
โ โ โ โ
Embedding Generation โ Vector Storage โ Graph Storage โ Cache Update
โ โ โ โ
Metadata Recording โ Performance Metrics โ Success Response
2. Memory Search Pipeline
Search Query โ Query Enhancement โ Multi-Strategy Retrieval
โ โ โ
Vector Search + Keyword Search + Graph Traversal
โ โ โ
Result Fusion โ Advanced Reranking โ Confidence Scoring
โ โ โ
Hallucination Detection โ Evidence Collection โ Response Formatting
3. Document Ingestion Pipeline
Document Input โ Format Detection โ Specialized Loader Selection
โ โ โ
Content Extraction โ Chunking Strategy โ LLM Context Enhancement
โ โ โ
Batch Processing โ Memory Integration โ Progress Tracking
4. Self-Learning Pipeline
Performance Data โ Pattern Recognition โ Experiment Design
โ โ โ
A/B Testing โ Statistical Analysis โ Configuration Updates
โ โ โ
Rollback Protection โ Success Validation โ Learning Storage
Technology Stack
Core Technologies
- Python 3.8+: Primary development language
- FastMCP: Model Context Protocol implementation
- FastAPI: REST API framework with automatic documentation
- Pydantic: Data validation and settings management
- asyncio: Asynchronous programming for high performance
Databases & Storage
- PostgreSQL 14+: Primary data store with JSON support
- pgvector: Vector similarity search with HNSW indexing
- Neo4j: High-performance graph database
- Redis: Multi-layer caching and session storage
Machine Learning & NLP
- HuggingFace Transformers: Embedding models (e5-large-v2, MiniLM)
- Sentence Transformers: Optimized embedding inference
- spaCy: Named entity recognition and text processing
- NLTK: Natural language processing utilities
Observability & Monitoring
- OpenTelemetry: Distributed tracing and metrics
- Prometheus: Metrics collection and alerting
- Grafana: Performance dashboards and visualization
- Jaeger: Distributed tracing visualization
Development & Deployment
- Docker: Containerization with multi-stage builds
- Docker Compose: Local development environment
- pytest: Comprehensive testing framework
- Black/isort/flake8: Code formatting and linting
Performance Characteristics
Latency Targets (P95)
- Memory Storage: <100ms per operation
- Vector Search: <50ms for top-10 results
- Hybrid Search: <150ms with reranking
- Document Ingestion: <2s per PDF page
- Graph Queries: <30ms for simple traversals
- Hallucination Detection: <200ms per analysis
Throughput Capabilities
- Concurrent Users: 50-100 depending on hardware
- Document Processing: 10-20 documents/minute
- Memory Operations: 1000+ operations/minute
- Cache Hit Rate: >85% for frequently accessed data
Scalability Metrics
- Memory Capacity: Unlimited (PostgreSQL-based)
- Graph Complexity: Millions of entities/relationships
- Vector Dimensions: 384-1024 (configurable)
- Cache Size: 2-8GB recommended for optimal performance
โ๏ธ Configuration
Environment Variables
# Core Configuration
TYRA_ENV=development|production
TYRA_LOG_LEVEL=DEBUG|INFO|WARNING|ERROR
# Database Configuration
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=tyra_memory
POSTGRES_USER=tyra
POSTGRES_PASSWORD=secure_password
REDIS_HOST=localhost
REDIS_PORT=6379
NEO4J_HOST=localhost
NEO4J_PORT=7687
# Optional: OpenAI for fallback embeddings
OPENAI_API_KEY=sk-...
Configuration Files
The system uses a layered configuration approach with multiple YAML files:
Main Configuration (config/config.yaml
)
# Core application settings
app:
name: "Tyra MCP Memory Server"
version: "1.0.0"
environment: ${TYRA_ENV:-development}
# Memory system configuration
memory:
backend: "postgres"
vector_dimensions: 1024
chunk_size: 512
chunk_overlap: 50
max_memories_per_agent: 1000000
# API server settings
api:
host: ${API_HOST:-0.0.0.0}
port: ${API_PORT:-8000}
enable_docs: ${API_ENABLE_DOCS:-true}
cors_origins: ["*"]
rate_limit: 1000 # requests per minute
Provider Configuration (config/providers.yaml
)
# Embedding providers
embeddings:
primary: "huggingface"
fallback: "huggingface_light"
providers:
huggingface:
model: "intfloat/e5-large-v2"
device: "auto"
batch_size: 32
RAG Configuration (config/rag.yaml
)
# Retrieval and reranking settings
retrieval:
hybrid_weight: 0.7 # vector vs keyword
max_results: 20
diversity_penalty: 0.1
reranking:
enabled: true
provider: "cross_encoder"
top_k: 10
hallucination:
enabled: true
threshold: 75
require_evidence: true
Document Ingestion (config/ingestion.yaml
)
# File processing settings
ingestion:
max_file_size: 104857600 # 100MB
max_batch_size: 100
concurrent_limit: 20
supported_formats: ["pdf", "docx", "pptx", "txt", "md", "html", "json", "csv", "epub"]
chunking:
default_strategy: "auto"
strategies:
auto:
file_type_mapping:
pdf: "semantic"
docx: "paragraph"
pptx: "slide"
Self-Learning Configuration (config/self_learning.yaml
)
# Adaptive learning settings
self_learning:
enabled: true
analysis_interval: "1h"
improvement_interval: "24h"
auto_optimize: true
quality_thresholds:
memory_accuracy: 0.85
performance_degradation: 0.1
hallucination_rate: 0.05
Observability Configuration (config/observability.yaml
)
# OpenTelemetry and monitoring
otel:
enabled: true
service_name: "tyra-mcp-memory-server"
tracing:
enabled: true
exporter: "console" # or "jaeger", "otlp"
sampler: "parentbased_traceidratio"
sampler_arg: 1.0
metrics:
enabled: true
export_interval: 60000 # 60 seconds
logging:
level: ${LOG_LEVEL:-INFO}
format: "json"
rotation:
max_size: "100MB"
max_files: 10
Environment Variables Reference
Core Settings
# Application Environment
TYRA_ENV=development|production
LOG_LEVEL=DEBUG|INFO|WARNING|ERROR
TYRA_DEBUG=true|false
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
API_ENABLE_DOCS=true
API_RATE_LIMIT=1000
# Database URLs
DATABASE_URL=postgresql://user:pass@localhost:5432/tyra_memory
REDIS_URL=redis://localhost:6379/0
NEO4J_URL=neo4j://localhost:7687
Model Configuration
# Embedding Models
EMBEDDINGS_PRIMARY_MODEL=intfloat/e5-large-v2
EMBEDDINGS_FALLBACK_MODEL=sentence-transformers/all-MiniLM-L12-v2
EMBEDDINGS_DEVICE=auto|cpu|cuda
# Model Caching
MODEL_CACHE_DIR=/models/cache
EMBEDDING_CACHE_TTL=86400 # 24 hours
Performance Tuning
# Memory Management
MEMORY_MAX_CHUNK_SIZE=2048
MEMORY_CHUNK_OVERLAP=100
MEMORY_BATCH_SIZE=50
# Caching Configuration
CACHE_TTL_EMBEDDINGS=86400
CACHE_TTL_SEARCH=3600
CACHE_TTL_RERANK=1800
CACHE_MAX_SIZE=2GB
# Database Pools
POSTGRES_POOL_SIZE=20
REDIS_POOL_SIZE=50
NEO4J_POOL_SIZE=10
Security Settings
# Authentication (Optional)
API_KEY_ENABLED=false
API_KEY=your-secure-api-key
JWT_SECRET=your-jwt-secret
# Rate Limiting
RATE_LIMIT_REQUESTS=1000
RATE_LIMIT_WINDOW=60
RATE_LIMIT_BURST=50
# CORS Configuration
CORS_ORIGINS=*
CORS_METHODS=GET,POST,PUT,DELETE
CORS_HEADERS=*
Document Ingestion
# File Processing
INGESTION_MAX_FILE_SIZE=104857600
INGESTION_MAX_BATCH_SIZE=100
INGESTION_CONCURRENT_LIMIT=20
INGESTION_TIMEOUT=300
# LLM Enhancement
INGESTION_LLM_ENHANCEMENT=true
INGESTION_LLM_MODE=rule_based
VLLM_ENDPOINT=http://localhost:8000/v1
VLLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
Observability
# OpenTelemetry
OTEL_ENABLED=true
OTEL_SERVICE_NAME=tyra-mcp-memory-server
OTEL_TRACES_ENABLED=true
OTEL_METRICS_ENABLED=true
OTEL_LOGS_ENABLED=true
# Exporters
OTEL_TRACES_EXPORTER=console
OTEL_METRICS_EXPORTER=console
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
๐ Monitoring & Debugging
Health Checks
# Check system health
curl -X POST http://localhost:8000/tools/health_check \
-d '{"detailed": true}'
Performance Analytics
# Get performance summary
curl -X POST http://localhost:8000/tools/get_memory_stats \
-d '{"include_performance": true}'
Logs
# View real-time logs
tail -f logs/tyra-memory.log
# Search for specific events
grep "hallucination" logs/tyra-memory.log
grep "ERROR" logs/tyra-memory.log
๐งช Development
Testing
# Run unit tests
python -m pytest tests/unit/
# Run integration tests
python -m pytest tests/integration/
# Run end-to-end tests
python -m pytest tests/e2e/
Development Mode
# Enable hot reload and debug features
export TYRA_ENV=development
export TYRA_DEBUG=true
export TYRA_HOT_RELOAD=true
python main.py
Model Development
# Download and test models
python scripts/download_models.py
# Benchmark different models
python scripts/benchmark_models.py
๐ Production Deployment
Docker Deployment
Quick Production Start
# Clone repository
git clone https://github.com/your-org/tyra-mcp-memory-server.git
cd tyra-mcp-memory-server
# Copy and configure environment
cp .env.example .env
# Edit .env with your settings
# Build and start all services
docker-compose -f docker-compose.prod.yml up -d
# Verify deployment
docker-compose -f docker-compose.prod.yml ps
curl http://localhost:8000/health
Multi-Stage Docker Build
# Build optimized production image
docker build -t tyra-memory-server:latest \
--target production \
--build-arg ENVIRONMENT=production .
# Run with custom configuration
docker run -d \
--name tyra-memory \
-p 8000:8000 \
-v $(pwd)/config:/app/config:ro \
-v $(pwd)/data:/app/data \
--env-file .env \
tyra-memory-server:latest
Container Orchestration
# kubernetes deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
name: tyra-memory-server
spec:
replicas: 3
selector:
matchLabels:
app: tyra-memory-server
template:
metadata:
labels:
app: tyra-memory-server
spec:
containers:
- name: tyra-memory
image: tyra-memory-server:latest
ports:
- containerPort: 8000
env:
- name: TYRA_ENV
value: "production"
resources:
requests:
memory: "2Gi"
cpu: "1000m"
limits:
memory: "4Gi"
cpu: "2000m"
Systemd Service Deployment
Service Installation
# Install system service
sudo cp scripts/tyra-memory.service /etc/systemd/system/
sudo systemctl daemon-reload
# Enable auto-start on boot
sudo systemctl enable tyra-memory
# Start service
sudo systemctl start tyra-memory
# Check status and logs
sudo systemctl status tyra-memory
sudo journalctl -u tyra-memory -f
Service Configuration (scripts/tyra-memory.service
)
[Unit]
Description=Tyra MCP Memory Server
After=network.target postgresql.service redis.service
Requires=postgresql.service redis.service
[Service]
Type=simple
User=tyra
Group=tyra
WorkingDirectory=/opt/tyra-memory-server
ExecStart=/opt/tyra-memory-server/venv/bin/python main.py
Restart=always
RestartSec=10
Environment=TYRA_ENV=production
Environment=LOG_LEVEL=INFO
# Security settings
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/tyra-memory-server/data /opt/tyra-memory-server/logs
[Install]
WantedBy=multi-user.target
Performance Tuning
Hardware Requirements
# Minimum Requirements
CPU: 4 cores (8 threads recommended)
RAM: 8GB (16GB recommended)
Storage: 100GB SSD (fast I/O critical)
Network: 1Gbps (for high-throughput scenarios)
# Optimal Configuration
CPU: 8+ cores with AVX2 support
RAM: 32GB+ (for large models and caching)
GPU: NVIDIA GPU with 8GB+ VRAM (optional, for GPU acceleration)
Storage: NVMe SSD with 1000+ IOPS
Database Optimization
-- PostgreSQL performance tuning
-- Add to postgresql.conf
# Memory settings
shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 256MB
maintenance_work_mem = 1GB
# Vector-specific settings
max_connections = 200
shared_preload_libraries = 'vector'
# Performance settings
random_page_cost = 1.1
checkpoint_completion_target = 0.9
wal_buffers = 64MB
Redis Configuration
# Redis optimization (redis.conf)
maxmemory 4gb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000
# Persistence settings
appendonly yes
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
Application Tuning
# config/config.yaml - Production optimizations
performance:
# Database connection pools
postgres_pool_size: 20
redis_pool_size: 50
neo4j_pool_size: 10
# Memory management
max_chunk_size: 2048
batch_size: 50
# Caching optimization
cache_sizes:
embeddings: "2GB"
search_results: "1GB"
rerank_cache: "512MB"
# Async processing
max_concurrent_requests: 100
request_timeout: 30
# GPU optimization (if available)
gpu_enabled: true
gpu_memory_fraction: 0.8
cuda_device: 0
Monitoring & Observability
Prometheus Configuration
# prometheus.yml
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'tyra-memory-server'
static_configs:
- targets: ['localhost:8000']
metrics_path: '/metrics'
scrape_interval: 30s
Grafana Dashboard
{
"dashboard": {
"title": "Tyra Memory Server",
"panels": [
{
"title": "Request Latency",
"type": "graph",
"targets": [{
"expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))"
}]
},
{
"title": "Memory Usage",
"type": "graph",
"targets": [{
"expr": "process_resident_memory_bytes"
}]
}
]
}
}
Load Balancing & High Availability
NGINX Configuration
upstream tyra_memory_servers {
least_conn;
server 127.0.0.1:8001 weight=1 max_fails=3 fail_timeout=30s;
server 127.0.0.1:8002 weight=1 max_fails=3 fail_timeout=30s;
server 127.0.0.1:8003 weight=1 max_fails=3 fail_timeout=30s;
}
server {
listen 80;
server_name memory.tyra-ai.com;
location / {
proxy_pass http://tyra_memory_servers;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_connect_timeout 30s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;
}
location /health {
proxy_pass http://tyra_memory_servers/health;
access_log off;
}
}
๐ผ Use Cases & Applications
๐ค AI Agent Memory Enhancement
- Personal Assistants: Long-term conversation memory and context retention
- Customer Support: Historical interaction tracking and personalized responses
- Research Assistants: Literature review, citation tracking, and knowledge synthesis
- Educational Tutors: Student progress tracking and adaptive learning paths
๐ Enterprise Knowledge Management
- Document Processing: Automated ingestion of company documents, policies, and procedures
- Institutional Memory: Preserve and access organizational knowledge across teams
- Compliance Tracking: Audit trails and regulatory requirement monitoring
- Decision Support: Historical data analysis and trend identification
๐ฆ Financial Services Applications
- Trading Support: Market analysis with confidence-scored recommendations (95% threshold)
- Risk Assessment: Historical pattern analysis and risk factor identification
- Client Profiling: Investment preferences and trading behavior analysis
- Regulatory Compliance: Transaction monitoring and compliance reporting
๐ฌ Research & Analytics
- Scientific Research: Literature mining and hypothesis generation
- Market Research: Consumer behavior analysis and trend prediction
- Competitive Intelligence: Industry analysis and competitor monitoring
- Data Mining: Pattern discovery in large document collections
๐ Multi-Agent Orchestration
- Agent Coordination: Shared knowledge base for multiple AI agents
- Workflow Automation: n8n integration for document processing pipelines
- Cross-Platform Integration: Unified memory across different AI systems
- Session Management: Context preservation across agent interactions
๐ Secure Deployment Scenarios
- Air-Gapped Networks: Completely offline operation with local models
- HIPAA Compliance: Healthcare data processing with audit trails
- Financial Regulations: Trading compliance with mandatory confidence thresholds
- Government Applications: Classified information processing with security controls
๐ง Integration Examples
Claude Desktop Integration
{
"mcpServers": {
"tyra-memory": {
"command": "python",
"args": ["/path/to/tyra-mcp-memory-server/main.py"],
"env": {
"TYRA_ENV": "production",
"DATABASE_URL": "postgresql://user:pass@localhost:5432/tyra",
"LOG_LEVEL": "INFO"
}
}
}
}
n8n Workflow Integration
{
"nodes": [
{
"name": "Document Upload",
"type": "n8n-nodes-base.httpRequest",
"parameters": {
"url": "http://localhost:8000/v1/ingest/document",
"method": "POST",
"body": {
"source_type": "base64",
"file_name": "{{ $json.filename }}",
"file_type": "pdf",
"content": "{{ $json.base64_content }}"
}
}
},
{
"name": "Memory Search",
"type": "n8n-nodes-base.httpRequest",
"parameters": {
"url": "http://localhost:8000/v1/memory/search",
"method": "POST",
"body": {
"query": "{{ $json.search_query }}",
"agent_id": "n8n_workflow",
"include_analysis": true,
"min_confidence": 0.8
}
}
}
]
}
Python Client Usage
from tyra_memory_client import MemoryClient
import asyncio
async def main():
# Initialize client
client = MemoryClient(
base_url="http://localhost:8000",
api_key="your-api-key" # if authentication enabled
)
# Store a memory
result = await client.store_memory(
content="User prefers technical analysis for stock trading",
agent_id="trading_bot",
extract_entities=True,
metadata={"category": "trading", "confidence": 95}
)
print(f"Stored memory: {result.memory_id}")
# Search memories
results = await client.search_memories(
query="trading preferences",
agent_id="trading_bot",
min_confidence=0.8,
include_analysis=True
)
for memory in results.memories:
print(f"Memory: {memory.content}")
print(f"Confidence: {memory.confidence_score}")
print(f"Grounding: {memory.grounding_score}")
# Ingest document
with open("trading_strategy.pdf", "rb") as f:
doc_result = await client.ingest_document(
file_content=f.read(),
file_name="trading_strategy.pdf",
file_type="pdf",
chunking_strategy="semantic",
enable_llm_context=True
)
print(f"Ingested {doc_result.chunks_ingested} chunks")
if __name__ == "__main__":
asyncio.run(main())
๐ Performance Benchmarks
Current Performance (Local Setup)
- Memory Storage: ~100ms per document
- Vector Search: ~50ms for top-10 results
- Hybrid Search: ~150ms with reranking
- Hallucination Analysis: ~200ms per response
- Memory Usage: ~500MB base + ~2GB for models
Enhanced Performance Targets (Coming Soon)
- Memory Synthesis: <200ms for intelligent deduplication
- Predictive Preloading: 40%+ latency reduction through ML prediction
- Multi-Modal Processing: <500ms for image/video analysis
- Real-Time Streams: <50ms event propagation latency
- Auto-Optimization: 25%+ performance improvement through self-learning
Scalability
- Current: 10-50 concurrent requests
- Enhanced: 100+ concurrent with auto-scaling (Phase 3)
- Memory Capacity: Unlimited (PostgreSQL-based)
- Graph Complexity: Optimized for millions of entities/relationships
- Future: Federated networks for distributed deployment (Phase 4)
๐ค Contributing
Development Setup
# Setup development environment
./scripts/setup.sh --env development
# Install development dependencies
pip install -r requirements-dev.txt
# Setup pre-commit hooks
pre-commit install
Code Standards
- Type Hints: All functions must have type annotations
- Documentation: Docstrings for all public methods
- Testing: Minimum 80% code coverage
- Formatting: Black + isort + flake8
๐ License
MIT License - see file for details.
๐ Support
Documentation
- - Complete enhancement plan
- - Implementation standards
- - Modular component architecture
Community
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Discord: Tyra AI Community
Commercial Support
For enterprise support, custom integrations, and professional services, contact:
๐ฏ Implementation Roadmap
โ Current: Production Foundation
- โ Advanced memory system with PostgreSQL + pgvector
- โ Neo4j temporal knowledge graphs with Graphiti
- โ Basic hallucination detection and confidence scoring
- โ Multi-format document ingestion (9 file types)
- โ Cross-encoder reranking and hybrid search
- โ Redis multi-layer caching and performance optimization
๐ Phase 1: AI Enhancement Implementation (Ready)
- ๐๏ธ Pydantic AI integration for structured outputs (dependency included, awaiting implementation)
- ๐๏ธ Multi-layer hallucination detection enhancement (beyond current grounding-based system)
- ๐๏ธ Advanced memory synthesis and deduplication
- ๐๏ธ Real-time streaming capabilities (WebSocket infrastructure)
๐ Phase 2: Intelligence Amplification (Planned)
- ๐ Predictive memory management and preloading
- ๐ Context-aware embeddings with fine-tuning
- ๐ Multi-modal support (images, audio, video)
- ๐ Advanced graph reasoning and causal inference
๐ฏ Phase 3: Operational Excellence (Future)
- ๐ Complete self-optimization engine
- ๐ Enhanced security and privacy features
- ๐ Advanced observability and analytics
- ๐ Federated memory networks
Current system is production-ready. Enhancements use included dependencies and maintain 100% backward compatibility.
Built with โค๏ธ by the Tyra AI Team
Transforming AI agents with genius-tier memory and intelligence capabilities