MCP Memory Server
An intelligent middleware server that adds memory and learning capabilities to Ollama by implementing the MCP (Model Context Protocol) pattern.
Overview
MCP Memory Server acts as a smart proxy between your applications (like ELVIS) and Ollama, automatically enriching prompts with relevant context and learning from every interaction.
Key Features
- Automatic Context Enrichment: Searches Brain memory and adds relevant context to prompts
- Learning from Experience: Tracks what works and improves over time
- Model-Specific Optimization: Learns each model's strengths and best practices
- Similar Task Recognition: Finds and applies lessons from similar past tasks
- Drop-in Ollama Replacement: Compatible with existing Ollama API clients
- MCP Tool Integration: Access to Brain, filesystem, and other MCP tools
Architecture
Your App (ELVIS) → MCP Memory Server → Ollama
                          ↓
                 Brain Memory System
                 Learning Engine
                 Context Enricher
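Conceptually, the server is a thin Express proxy around Ollama. The sketch below illustrates the flow only; it is not the actual server.ts, and enrichPrompt/recordOutcome are illustrative stand-ins for the real ContextEnricher and LearningEngine:

import express from 'express';

const app = express();
app.use(express.json());

// Illustrative stand-ins for the real ContextEnricher and LearningEngine.
async function enrichPrompt(model: string, prompt: string): Promise<string> {
  return prompt; // placeholder; see "How It Works" below
}
async function recordOutcome(model: string, prompt: string, result: unknown): Promise<void> {
  // placeholder; see "Learning Process" below
}

app.post('/api/generate', async (req, res) => {
  const { model, prompt, ...rest } = req.body;

  // 1. Enrich the prompt with memories, similar tasks, and model tips
  const enriched = await enrichPrompt(model, prompt);

  // 2. Forward the enriched request to the real Ollama instance
  const upstream = await fetch(`${process.env.OLLAMA_URL ?? 'http://localhost:11434'}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt: enriched, ...rest }),
  });
  const result = await upstream.json();

  // 3. Record the outcome so the server can learn from it
  await recordOutcome(model, prompt, result);
  res.json(result);
});

app.listen(Number(process.env.PORT ?? 8090));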
Installation
# Clone the repository
git clone [repository-url]
cd mcp-memory-server
# Install dependencies
npm install
# Build the TypeScript code
npm run build
# Copy environment template
cp .env.example .env
# Edit .env with your settings
Configuration
Create a .env file with:
# Server Configuration
PORT=8090 # Port for Memory Server
OLLAMA_URL=http://localhost:11434 # Ollama API endpoint
MCP_URL=http://localhost:3000 # MCP tools endpoint (optional)
# Memory Configuration
MAX_CONTEXT_TOKENS=2000 # Max tokens to add as context
SIMILARITY_LIMIT=10 # How many similar memories to search
RELEVANCE_THRESHOLD=0.3 # Min relevance score (0-1)
AUTO_ENRICH=true # Enable automatic enrichment
# Brain Integration
BRAIN_ENABLED=true # Enable Brain memory system
BRAIN_DATA_DIR=~/.brain # Brain data directory
# Cache Configuration
CACHE_TTL=3600 # Cache TTL in seconds
CACHE_MAX_SIZE=1000 # Max cache entries
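If you need to read these values programmatically (for example, in a custom entry point), a minimal sketch using dotenv might look like the following; the variable names match the template above, but the loading code itself is illustrative:

import 'dotenv/config'; // loads .env into process.env

const config = {
  port: Number(process.env.PORT ?? 8090),
  ollamaUrl: process.env.OLLAMA_URL ?? 'http://localhost:11434',
  maxContextTokens: Number(process.env.MAX_CONTEXT_TOKENS ?? 2000),
  relevanceThreshold: Number(process.env.RELEVANCE_THRESHOLD ?? 0.3),
  autoEnrich: process.env.AUTO_ENRICH !== 'false',
};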
Usage
Starting the Server
# Start the server
npm start
# Or with custom settings
PORT=9000 OLLAMA_URL=http://remote:11434 npm start
Using with ELVIS
Simply point ELVIS to the Memory Server instead of Ollama:
// Before (direct to Ollama)
const elvis = new ELVIS({
ollamaUrl: 'http://localhost:11434'
});
// After (through Memory Server)
const elvis = new ELVIS({
ollamaUrl: 'http://localhost:8090' // Memory Server port
});
API Endpoints
The server provides Ollama-compatible endpoints plus additional memory endpoints:
Ollama-Compatible Endpoints
- POST /api/generate - Generate text (with automatic enrichment)
- POST /api/generate/stream - Streaming generation
- GET /api/tags - List available models
Memory Management Endpoints
- GET /api/memory/stats - Get memory statistics
- POST /api/memory/search - Search memories
- GET /api/memory/insights - Get recent insights
Learning Endpoints
- POST /api/learning/feedback - Provide feedback on responses
- GET /api/learning/model-stats/:model - Get model performance stats
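For example, a client can call the generate endpoint exactly as it would call Ollama. The sketch below uses fetch from Node 18+, and the non-streaming request body mirrors Ollama's /api/generate:

// Call the Memory Server as a drop-in Ollama replacement.
const res = await fetch('http://localhost:8090/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'deepseek-r1',
    prompt: 'Analyze the performance bottlenecks in our Brain memory system',
    stream: false,
  }),
});
const data = await res.json(); // same response shape Ollama returns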
How It Works
1. Context Enrichment
When a request comes in, the server (see the sketch after this list):
- Extracts keywords from the prompt
- Searches Brain for relevant memories
- Finds similar past tasks
- Adds model-specific tips
- Includes recent insights
- Builds an enriched prompt with all context
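A condensed sketch of that pipeline follows; the helper names are illustrative stubs, not the actual ContextEnricher API:

// Illustrative helpers (not the real API), stubbed so the sketch runs.
const extractKeywords = (p: string): string[] => p.toLowerCase().split(/\W+/).filter(Boolean);
const searchBrain = async (keywords: string[]): Promise<string[]> => [];
const findSimilarTasks = async (prompt: string): Promise<string[]> => [];
const modelTips = (model: string): string[] => [];
const recentInsights = async (): Promise<string[]> => [];

async function enrichPrompt(model: string, prompt: string): Promise<string> {
  const keywords = extractKeywords(prompt);        // 1. extract keywords
  const context = await searchBrain(keywords);     // 2. search Brain memory
  const similar = await findSimilarTasks(prompt);  // 3. similar past tasks
  const insights = await recentInsights();         // 5. recent insights
  return [                                         // 6. assemble the sections
    '=== MODEL GUIDANCE ===', ...modelTips(model), // 4. model-specific tips
    '=== RELEVANT CONTEXT ===', ...context,
    '=== SIMILAR PAST TASKS ===', ...similar,
    '=== RECENT INSIGHTS ===', ...insights,
    '=== CURRENT TASK ===', prompt,
  ].join('\n');
}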
2. Learning Process
After each response, the server (sketched below):
- Assesses response quality
- Identifies the approach used
- Extracts insights and patterns
- Stores successful patterns
- Updates model performance metrics
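A sketch of that loop, with hypothetical helpers standing in for the LearningEngine internals:

// Hypothetical helpers, stubbed so the sketch type-checks; the real logic
// lives in LearningEngine.ts.
const assessQuality = (response: string): number => (response.trim() ? 0.8 : 0);
const classifyApproach = (response: string): string => 'unclassified';
const storeInsight = async (insight: Record<string, unknown>): Promise<void> => {};
const updateModelStats = (model: string, quality: number, durationMs: number): void => {};

interface Outcome {
  model: string;
  prompt: string;
  response: string;
  durationMs: number;
}

async function learnFrom(outcome: Outcome): Promise<void> {
  const quality = assessQuality(outcome.response);      // assess response quality
  const approach = classifyApproach(outcome.response);  // identify the approach used
  if (quality > 0.7) {
    // store only patterns that worked
    await storeInsight({ approach, quality, model: outcome.model });
  }
  updateModelStats(outcome.model, quality, outcome.durationMs);
}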
3. Memory Types
The server tracks four types of memory, sketched as record shapes after this list:
- Task Memories: Complete record of past tasks and outcomes
- Model Context: Performance stats and best practices per model
- Insights: Learned patterns and successful approaches
- Domain Knowledge: Subject-specific information
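Illustrative record shapes for these memory types; the real definitions live in src/types.ts and may differ:

interface TaskMemory {
  prompt: string;
  approach: string;
  quality: number;         // 0..1
  durationMs: number;
  learnings: string[];
}

interface ModelContext {
  strengths: string[];
  avgResponseTime: number; // milliseconds
  tips: string[];
}

interface Insight {
  text: string;
  observedAt: Date;
}

interface DomainKnowledge {
  domain: string;
  facts: string[];
}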
Example Enrichment
Original prompt:
Analyze the performance bottlenecks in our Brain memory system
Enriched prompt (automatically generated):
=== MODEL GUIDANCE ===
You are deepseek-r1, with these strengths:
- deep analysis
- complex reasoning
- step-by-step thinking
Tips for best results:
- Use "think step by step" in prompts
- Excellent for mathematical proofs
=== RELEVANT CONTEXT ===
Context 1 (relevance: 89%):
Previous Brain performance analysis showed query optimization...
Context 2 (relevance: 76%):
Memory indexing strategies that improved recall speed by 40%...
=== SIMILAR PAST TASKS ===
Task 1: Analyze todo-manager performance issues
Approach: Profiling-driven analysis
Quality: 92%
Duration: 38 minutes
Key learning: Identifying hotspots first saved significant time
=== RECENT INSIGHTS ===
1. Using profiler data improves analysis accuracy
2. Database queries are often the bottleneck
3. Caching strategies significantly impact performance
=== CURRENT TASK ===
Analyze the performance bottlenecks in our Brain memory system
Development
Project Structure
src/
├── types.ts             # TypeScript interfaces
├── server.ts            # Express server setup
├── MemoryManager.ts     # Memory storage and retrieval
├── ContextEnricher.ts   # Prompt enrichment logic
├── LearningEngine.ts    # Learning from interactions
├── clients/
│   ├── BrainClient.ts   # Brain memory integration
│   └── OllamaClient.ts  # Ollama API client
└── index.ts             # Entry point
Running Tests
npm test # Run all tests
npm run test:watch # Watch mode
npm run test:coverage # With coverage
Building
npm run build # Build TypeScript
npm run dev # Watch mode
Advanced Usage
Custom Memory Sources
You can extend the memory system by implementing custom memory providers:
class CustomMemoryProvider {
  async search(query: string): Promise<Memory[]> {
    // Your custom search logic; Memory is defined in src/types.ts
    return []; // return matching memories, ranked by relevance
  }
}
Model-Specific Configurations
Add model-specific configurations in the code:
modelStats.set('your-model', {
  strengths: ['domain expertise'],
  avgResponseTime: 20 * 60 * 1000, // 20 minutes, in milliseconds
  tips: ['Works best with examples']
});
Monitoring and Metrics
The server logs all interactions for analysis:
- Request/response times
- Memory hit rates
- Model performance trends
- Quality assessments
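The memory and learning endpoints above expose some of this data over HTTP. For example (the response shapes are not documented here, so treat them as opaque JSON):

// Pull current stats from the memory and learning endpoints.
const statsRes = await fetch('http://localhost:8090/api/memory/stats');
console.log(await statsRes.json());

const modelRes = await fetch('http://localhost:8090/api/learning/model-stats/deepseek-r1');
console.log(await modelRes.json());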
Troubleshooting
Common Issues
- Ollama Connection Failed
  - Ensure Ollama is running: curl http://localhost:11434
  - Check OLLAMA_URL in .env
- Brain Not Available
  - The server works without Brain, but with limited memory features
  - Check the MCP_URL configuration
- High Memory Usage
  - Adjust CACHE_MAX_SIZE
  - Implement cache cleanup
- Slow Enrichment
  - Reduce SIMILARITY_LIMIT
  - Increase RELEVANCE_THRESHOLD
Future Enhancements
- Vector database integration for semantic search
- Web UI for memory management
- Multi-user support with isolated memories
- Plugin system for custom enrichers
- Metrics dashboard
- Memory export/import
- A/B testing for enrichment strategies
Contributing
This is a proof-of-concept for the MCP middleware pattern. Contributions welcome!
Key areas for contribution:
- Better learning algorithms
- More sophisticated context selection
- Additional memory providers
- Performance optimizations
- Testing infrastructure
License
MIT
Built with curiosity by MikeyBeez & Claude 🧠✨