MCP Memory GPU
MCP Server providing semantic memory with FAISS + SQLite hybrid storage and optional GPU acceleration.
Features
- Semantic search via FAISS vector index
- Persistent storage via SQLite
- GPU bridge support for remote GPU computation
- Ollama embeddings (nomic-embed-text by default)
- Fallback to hash-based embeddings when Ollama is unavailable (see the sketch below)
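As a rough illustration of that fallback, a hash-based embedding can be produced by seeding a random generator from a digest of the text, so identical strings always map to the same vector. This is a hypothetical sketch (the function name and exact scheme are assumptions, not the server's actual code):

import hashlib
import numpy as np

def hash_embedding(text: str, dim: int = 768) -> np.ndarray:
    # Seed a PRNG from the SHA-256 of the text so the same string
    # always yields the same vector (no embedding model required).
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    vec = rng.standard_normal(dim).astype("float32")
    return vec / np.linalg.norm(vec)  # normalize for cosine / inner-product search

Such vectors carry no semantic signal, but they keep storage and retrieval working until a real embedding backend is reachable.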
Installation
# From PyPI (when published)
pip install mcp-memory-gpu
# From GitHub
pip install git+https://github.com/fvegiard/mcp-memory-gpu.git
# With GPU support (quotes keep shells like zsh from expanding the brackets)
pip install "mcp-memory-gpu[gpu]"
Configuration
Claude Desktop
Add to %APPDATA%\Claude\claude_desktop_config.json (Windows) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
  "mcpServers": {
    "memory": {
      "command": "mcp-memory-gpu",
      "env": {
        "MCP_EMBEDDING_URL": "http://localhost:11434",
        "MCP_EMBEDDING_MODEL": "nomic-embed-text",
        "MCP_GPU_BRIDGE": "http://your-gpu-server:5000",
        "MCP_GPU_TOKEN": "your-secret-token"
      }
    }
  }
}
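If you use the default Ollama backend, make sure Ollama is running and the model has been pulled first (ollama pull nomic-embed-text); otherwise the server falls back to hash-based embeddings.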
Environment Variables
| Variable | Default | Description |
|---|---|---|
| MCP_MEMORY_DB | ~/.mcp-memory/memory.db | SQLite database path |
| MCP_MEMORY_INDEX | ~/.mcp-memory/memory.faiss | FAISS index path |
| MCP_EMBEDDING_URL | http://localhost:11434 | Ollama API URL |
| MCP_EMBEDDING_MODEL | nomic-embed-text | Embedding model |
| MCP_EMBEDDING_DIM | 768 | Embedding dimension |
| MCP_GPU_BRIDGE | (none) | GPU bridge URL for remote computation |
| MCP_GPU_TOKEN | (none) | Bearer token for GPU bridge auth |
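The two paths above make up the hybrid store: SQLite keeps the raw entries, FAISS keeps their vectors. A minimal sketch of that pairing, assuming an IndexIDMap over inner product and a hypothetical schema (the server's actual layout may differ):

import sqlite3
import numpy as np
import faiss

DIM = 768  # matches the MCP_EMBEDDING_DIM default

db = sqlite3.connect("memory.db")
db.execute("CREATE TABLE IF NOT EXISTS memories "
           "(id INTEGER PRIMARY KEY, category TEXT, key TEXT, value TEXT)")
index = faiss.IndexIDMap(faiss.IndexFlatIP(DIM))  # SQLite rowid doubles as FAISS id

def store(category: str, key: str, value: str, vec: np.ndarray) -> None:
    cur = db.execute("INSERT INTO memories (category, key, value) VALUES (?, ?, ?)",
                     (category, key, value))
    db.commit()
    index.add_with_ids(vec.reshape(1, DIM).astype("float32"),
                       np.array([cur.lastrowid], dtype="int64"))

def search(query_vec: np.ndarray, limit: int = 5):
    scores, ids = index.search(query_vec.reshape(1, DIM).astype("float32"), limit)
    rows = [db.execute("SELECT category, key, value FROM memories WHERE id = ?",
                       (int(i),)).fetchone() for i in ids[0] if i != -1]
    return list(zip(scores[0].tolist(), rows))

faiss.write_index(index, "memory.faiss")  # persist alongside the SQLite file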
Tools
memory_store
Store information with category/key organization.
{"category": "config", "key": "api_url", "value": "https://api.example.com"}
memory_search
Semantic search across all memories.
{"query": "how to connect to the API", "limit": 5}
memory_get
Get specific memory by category and key.
{"category": "config", "key": "api_url"}
memory_delete
Delete a memory entry.
{"category": "config", "key": "api_url"}
memory_list
List all categories or items in a category.
{"category": "config"}
memory_stats
Get memory statistics.
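For reference, these tools can be invoked over stdio from any MCP client. A minimal sketch assuming the official mcp Python SDK; the tool names and arguments come from the list above, while the printed result shape is whatever the server returns:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the server over stdio; env vars are inherited from the shell.
    params = StdioServerParameters(command="mcp-memory-gpu")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            await session.call_tool("memory_store", arguments={
                "category": "config", "key": "api_url",
                "value": "https://api.example.com"})
            result = await session.call_tool("memory_search", arguments={
                "query": "how to connect to the API", "limit": 5})
            print(result.content)  # content blocks returned by the tool

asyncio.run(main())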
GPU Bridge Setup
For GPU-accelerated embeddings, run the bridge server on your GPU machine:
# bridge/server.py on GPU machine
from flask import Flask, request, jsonify
from sentence_transformers import SentenceTransformer

app = Flask(__name__)
# nomic-embed-text-v1 ships custom modeling code, so trust_remote_code is required
model = SentenceTransformer('nomic-ai/nomic-embed-text-v1',
                            device='cuda', trust_remote_code=True)
AUTH_TOKEN = 'your-secret-token'

@app.route('/embedding', methods=['POST'])
def embedding():
    # Reject requests that don't carry the expected bearer token
    auth = request.headers.get('Authorization', '')
    if auth != f'Bearer {AUTH_TOKEN}':
        return jsonify({'error': 'unauthorized'}), 401
    text = request.json.get('text', '')
    vec = model.encode(text).tolist()  # 768-dim embedding
    return jsonify({'embedding': vec})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
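To check the bridge from the client side, a small Python probe (URL and token are the same placeholders as in the config above):

import requests

resp = requests.post(
    'http://your-gpu-server:5000/embedding',
    headers={'Authorization': 'Bearer your-secret-token'},
    json={'text': 'hello world'},
    timeout=30,
)
print(len(resp.json()['embedding']))  # expect 768 for nomic-embed-text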
Architecture
Windows/macOS (CPU)              GPU Server (Pop!_OS, etc.)
┌─────────────────┐              ┌─────────────────────┐
│  Claude Code    │              │  GPU Bridge         │
│  MCP Server     │◄────────────►│  FAISS GPU          │
│  SQLite         │     HTTP     │  Sentence Transform │
└─────────────────┘              └─────────────────────┘
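In this setup the MCP server and SQLite/FAISS storage stay on the local machine; when MCP_GPU_BRIDGE is set, embedding requests are sent to the bridge's /embedding endpoint over HTTP with the bearer token, and otherwise the server talks to the local Ollama instance (falling back to hash-based embeddings if Ollama is unavailable).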
License
MIT