vespo92/chromadblocal-mcp-server
ChromaDB MCP Server provides AI assistants with persistent memory using ChromaDB vector storage, enabling the creation of a growing knowledge base with each interaction.
ChromaDB MCP Server 🧠
A Model Context Protocol (MCP) server that gives AI assistants persistent memory through ChromaDB vector storage. Now with EXIF extraction, Watch Folders, and Duplicate Detection - the ultimate tool for creators!
✨ Features
Core
- Persistent AI Memory: Your AI assistant remembers past conversations and solutions
- Vector Search: Find similar code patterns, configurations, and documentation instantly
- Local First: Run everything on your own hardware, no cloud dependencies
🚀 Batch Processing
- Fast Batch Ingest: Process entire directories in seconds (500+ files)
- 77 File Types: Photos, CAD, documents, data files, code
- Quick Load/Unload: Temporary collections for rapid workflows
- Export/Import: Backup and transfer collections as JSON
📸 Photo Features (NEW in v3.0)
- EXIF Extraction: Camera, lens, exposure, GPS location, date taken
- Search by Camera: "Find photos shot with my Canon 5D"
- Search by Location: GPS coordinates embedded and searchable
- Search by Date: "Find photos from vacation 2024"
👁️ Watch Folders (NEW in v3.0)
- Auto-Ingest: Drop files in watched folders, auto-add to ChromaDB
- Hands-Free: Perfect for incoming photo dumps, downloads
- Filter by Type: Watch only for specific file types
🔍 Duplicate Detection (NEW in v3.0)
- Find Duplicates: Hash-based detection across directories
- Reclaim Space: See exactly how much space duplicates waste
- Compare Files: Check if two files are identical
- Perceptual Hashing: Find similar (not just identical) images
🚀 Quick Start
Prerequisites
- Bun (JavaScript runtime)
- Docker (for ChromaDB)
- Claude Desktop (or any MCP client)
Installation
- Clone the repository

  ```bash
  git clone https://github.com/vespo92/chromadblocal-mcp-server.git chromadb-mcp-server
  cd chromadb-mcp-server
  ```

- Install dependencies

  ```bash
  bun install
  ```

- Start ChromaDB

  ```bash
  docker run -d \
    --name chromadb-local \
    -p 8001:8000 \
    -v ~/chromadb-data:/chroma/chroma \
    -e IS_PERSISTENT=TRUE \
    chromadb/chroma:latest
  ```

- Initialize collections

  ```bash
  bun run setup
  ```

- Configure Claude Desktop

  Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

  ```json
  {
    "mcpServers": {
      "chromadb-context": {
        "command": "bun",
        "args": ["run", "/path/to/chromadb-mcp-server/index.js"],
        "env": {
          "CHROMADB_URL": "http://localhost:8001"
        }
      }
    }
  }
  ```

- Restart Claude Desktop and start building your knowledge base!
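If `claude_desktop_config.json` already lists other MCP servers, the new entry has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge. The helper name `addChromaServer` is hypothetical and not part of this repository.

```javascript
// Hypothetical helper: add the "chromadb-context" entry to an existing
// Claude Desktop config object without touching other configured servers.
function addChromaServer(config, serverPath, chromaUrl) {
  const existing = config && typeof config === "object" ? config : {};
  return {
    ...existing,
    mcpServers: {
      ...(existing.mcpServers || {}),
      "chromadb-context": {
        command: "bun",
        args: ["run", serverPath],
        env: { CHROMADB_URL: chromaUrl },
      },
    },
  };
}

// Example: a config that already contains another (made-up) server.
const currentConfig = { mcpServers: { other: { command: "node", args: ["other.js"] } } };
const mergedConfig = addChromaServer(
  currentConfig,
  "/path/to/chromadb-mcp-server/index.js",
  "http://localhost:8001"
);
console.log(JSON.stringify(mergedConfig, null, 2));
```

The function returns a new object and leaves its input unchanged, so the result can be inspected before it is written back to disk.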
💬 Usage Examples
Once configured, interact naturally with your AI:
Store Knowledge
- "Store this Docker configuration in ChromaDB for future reference"
- "Save this React component pattern with tags: hooks, authentication"
- "Remember this solution for GPU passthrough issues"
Retrieve Information
- "Search ChromaDB for Python async examples"
- "Find similar component patterns to this one"
- "What solutions do we have for Docker networking issues?"
Build Context
- "Add this API documentation to the project_docs collection"
- "Store these test patterns for our testing suite"
🚀 Batch File Processing
The killer feature! Ingest hundreds of files in seconds so your AI assistant can search and retrieve them.
Quick Load Workflow (Fastest)
Perfect for "load, process, discard" workflows:
You: "Quick load my photos from /home/photos/vacation2024"
AI: Creates temp collection, ingests 500 photos in seconds
You: "Find photos with mountains or beaches"
AI: Returns matching photos with metadata
You: "Unload the collection"
AI: Cleans up, frees memory
Supported File Types
| Category | Extensions | Metadata Extracted |
|---|---|---|
| Images | .jpg, .jpeg, .png, .heic, .raw, .cr2, .nef, .arw, .tiff, .gif, .webp | Dimensions, size, format |
| CAD | .stl, .obj, .dxf, .dwg, .step, .iges, .fbx, .blend, .skp, .scad | Vertices, faces, format |
| Documents | .pdf, .txt, .md, .doc, .docx, .rtf | Full text content |
| Data | .json, .yaml, .xml, .csv, .toml, .ini | Parsed content |
| Code | .js, .ts, .py, .go, .rs, .java, .cpp, .c, .php, .rb + 20 more | Full source code |
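As an illustration of how a file could be mapped to one of these categories, here is a small sketch. The extension lists are copied from the table above; the function name `categorize` and the lookup approach are assumptions, and the actual logic in `batch-processor.js` may differ.

```javascript
// Extension lists copied from the README table. The "+ 20 more" code
// extensions are not enumerated there, so they are omitted here.
const CATEGORIES = {
  images: [".jpg", ".jpeg", ".png", ".heic", ".raw", ".cr2", ".nef", ".arw", ".tiff", ".gif", ".webp"],
  cad: [".stl", ".obj", ".dxf", ".dwg", ".step", ".iges", ".fbx", ".blend", ".skp", ".scad"],
  documents: [".pdf", ".txt", ".md", ".doc", ".docx", ".rtf"],
  data: [".json", ".yaml", ".xml", ".csv", ".toml", ".ini"],
  code: [".js", ".ts", ".py", ".go", ".rs", ".java", ".cpp", ".c", ".php", ".rb"],
};

// Return the category name for a file path, or null if it is unsupported.
function categorize(filePath) {
  const base = filePath.split(/[\\/]/).pop(); // strip directories
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return null; // no extension, or a dotfile such as ".env"
  const ext = base.slice(dot).toLowerCase();
  for (const [category, extensions] of Object.entries(CATEGORIES)) {
    if (extensions.includes(ext)) return category;
  }
  return null;
}

console.log(categorize("/3d-prints/bracket.STL"));
console.log(categorize("notes.md"));
```

Matching is case-insensitive because cameras commonly write upper-case extensions such as `.JPG`.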
Batch Processing Examples
"Scan /projects/cad-files to see what's there"
"Batch ingest all STL files from /3d-prints into the 'print_library' collection"
"Quick load my Downloads folder, find anything mentioning 'invoice'"
"Export the photo_archive collection to backup.json"
"Import backup.json into a new collection called 'restored_photos'"
Processing Speed
- Quick Load: ~200 files in 2-3 seconds
- Batch Ingest: ~500 files in 5-10 seconds (with full metadata)
- Concurrent Processing: 10-20 parallel file operations
- No external dependencies: Pure JavaScript/Bun processing
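The "10-20 parallel file operations" figure suggests a bounded worker pool. The sketch below shows one common way to cap concurrency in plain JavaScript. `mapWithLimit` is a hypothetical name, and this is not necessarily how `batch-processor.js` implements it.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as the input items.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function run() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}

// Example: simulate processing six files, three at a time.
mapWithLimit(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg", "f.jpg"], 3, async (name) => {
  await new Promise((resolve) => setTimeout(resolve, 10)); // stand-in for file I/O
  return name.toUpperCase();
}).then((names) => console.log(names));
```

A fixed pool keeps the number of open file handles predictable, which matters more than raw parallelism when a directory holds thousands of files.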
📚 Available Collections
| Collection | Description | Use Case |
|---|---|---|
| home_automation | Smart home configs & automations | Home Assistant, IoT scripts |
| code_snippets | Reusable code patterns | Functions, hooks, utilities |
| configurations | System & app configs | Docker, Kubernetes, services |
| troubleshooting | Problem solutions | Fixes, workarounds, debugging |
| project_docs | Project documentation | APIs, architecture, guides |
| learning_notes | Learning insights | Tutorials, concepts, notes |
🛠️ MCP Tools
search_context
Search for relevant information across collections
Parameters:
- query: Search query
- collection: (optional) Specific collection to search
- limit: (optional) Number of results
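MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below shows roughly what such a request for `search_context` would look like on the wire. The argument values are made up, and the exact types the server expects for each parameter are an assumption.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request in the shape MCP uses for tool invocation.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Parameter names (query, collection, limit) come from the list above.
const searchRequest = buildToolCall(1, "search_context", {
  query: "Python async examples",
  collection: "code_snippets",
  limit: 5,
});
console.log(JSON.stringify(searchRequest, null, 2));
```

In normal use Claude Desktop builds this request for you; seeing the shape is mainly useful when debugging the server with another MCP client.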
store_context
Store new information with metadata
Parameters:
- content: The content to store
- metadata: Tags, categories, descriptions
- collection: Target collection
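Chroma has historically accepted only scalar metadata values (strings, numbers, booleans), so list-like fields such as tags may need flattening before storage. The sketch below assumes that constraint; `toScalarMetadata` is a hypothetical helper, and the server may already handle this internally.

```javascript
// Hypothetical helper: coerce a metadata object into scalar-only values.
// Arrays become comma-separated strings, null/undefined entries are dropped,
// and any other non-scalar value is JSON-encoded.
function toScalarMetadata(metadata) {
  const out = {};
  for (const [key, value] of Object.entries(metadata)) {
    if (value === null || value === undefined) continue;
    if (Array.isArray(value)) {
      out[key] = value.join(",");
    } else if (["string", "number", "boolean"].includes(typeof value)) {
      out[key] = value;
    } else {
      out[key] = JSON.stringify(value);
    }
  }
  return out;
}

console.log(
  toScalarMetadata({
    tags: ["hooks", "authentication"],
    category: "react",
    description: "Login form component pattern",
  })
);
```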
list_collections
List all available collections and their metadata
find_similar_patterns
Find code patterns similar to provided example
Batch Processing Tools
scan_directory
Preview files in a directory before ingesting
Parameters:
- path: Directory to scan
- categories: Filter by type (images, cad, documents, data, code)
- extensions: Filter by extension (.jpg, .stl, etc.)
- recursive: Include subdirectories (default: true)
batch_ingest
Bulk ingest files into ChromaDB with full metadata
Parameters:
- path: Source directory
- collection: Target collection name
- categories: File types to include
- max_files: Limit number of files
quick_load
🚀 FAST: Rapidly load files for temporary processing
Parameters:
- path: Directory to load
- name: Collection name (auto-generated if omitted)
- categories: File types to include
unload_collection
Delete a collection (cleanup after quick_load)
Parameters:
- collection: Name of collection to delete
export_collection
Export collection to JSON file
Parameters:
- collection: Collection to export
- output_path: File path for JSON output
import_collection
Import collection from JSON file
Parameters:
- input_path: JSON file to import
- collection: Override collection name
- overwrite: Delete existing first (default: false)
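The README does not document the JSON layout used by export and import, so the following is only a sketch of one plausible format. It assumes the column-oriented result that the Chroma client returns from `collection.get()` (`ids`, `documents`, `metadatas`) and converts it to per-record objects and back. Both function names and the record layout are hypothetical.

```javascript
// Convert a column-oriented Chroma get() result into a record-oriented export object.
function toExport(collectionName, result) {
  return {
    collection: collectionName,
    count: result.ids.length,
    records: result.ids.map((id, i) => ({
      id,
      document: result.documents?.[i] ?? null,
      metadata: result.metadatas?.[i] ?? {},
    })),
  };
}

// Convert an export object back into the column-oriented arrays used when adding records.
function fromExport(exported) {
  return {
    ids: exported.records.map((r) => r.id),
    documents: exported.records.map((r) => r.document),
    metadatas: exported.records.map((r) => r.metadata),
  };
}

const sample = {
  ids: ["a1", "b2"],
  documents: ["docker compose notes", "nginx proxy config"],
  metadatas: [{ tags: "docker" }, { tags: "nginx" }],
};
const json = JSON.stringify(toExport("configurations", sample), null, 2);
console.log(json);
console.log(fromExport(JSON.parse(json)));
```

This sketch leaves embeddings out of the file, which keeps backups small but means documents would be re-embedded on import.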
get_collection_info
Get detailed stats about a collection
Parameters:
- collection: Collection name
ingest_file
Ingest a single file with metadata extraction
Parameters:
- path: File to ingest
- collection: Target collection
list_file_types
Show all supported file extensions
EXIF & Photo Tools
extract_exif
Extract detailed EXIF metadata from photos
Parameters:
- path: Path to JPEG or TIFF image
Returns: Camera, lens, exposure, GPS, date taken
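EXIF stores GPS coordinates as degrees, minutes, and seconds plus a hemisphere reference (`N`/`S`, `E`/`W`), while search and mapping tools usually want signed decimal degrees. The conversion is sketched below. The function name is illustrative, and `exif-extractor.js` may return coordinates in either form.

```javascript
// Convert an EXIF-style [degrees, minutes, seconds] triple to signed decimal degrees.
// Southern latitudes and western longitudes are negative.
function dmsToDecimal([degrees, minutes, seconds], ref) {
  const value = degrees + minutes / 60 + seconds / 3600;
  return ref === "S" || ref === "W" ? -value : value;
}

const latitude = dmsToDecimal([40, 26, 46], "N"); // approximately 40.4461
const longitude = dmsToDecimal([79, 58, 56], "W"); // approximately -79.9822
console.log(latitude.toFixed(4), longitude.toFixed(4));
```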
Watch Folder Tools
watch_folder
Start auto-ingesting new files from a folder
Parameters:
- path: Folder to watch
- collection: Target collection (default: auto_ingest)
- categories: File types to watch
- include_exif: Extract EXIF from photos (default: true)
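File watchers typically fire several events while a single file is still being copied, so an auto-ingest loop usually waits until a path has been quiet for a short time. The class below sketches that idea with explicit timestamps so it can be read and tested without a real watcher. `EventCoalescer` and the quiet-period approach are assumptions, not a description of `watch-folder.js`.

```javascript
// Collapse repeated change events per path and release a path only after
// it has seen no new events for `quietMs` milliseconds.
class EventCoalescer {
  constructor(quietMs) {
    this.quietMs = quietMs;
    this.lastSeen = new Map(); // path -> timestamp of the most recent event
  }

  // Record an event for `path` observed at time `now` (milliseconds).
  push(path, now) {
    this.lastSeen.set(path, now);
  }

  // Return and forget every path that has been quiet long enough at time `now`.
  drain(now) {
    const ready = [];
    for (const [path, seen] of this.lastSeen) {
      if (now - seen >= this.quietMs) {
        ready.push(path);
        this.lastSeen.delete(path);
      }
    }
    return ready;
  }
}

const coalescer = new EventCoalescer(300);
coalescer.push("/incoming/IMG_0001.jpg", 0);
coalescer.push("/incoming/IMG_0001.jpg", 100); // still being written
coalescer.push("/incoming/IMG_0002.jpg", 150);
console.log(coalescer.drain(400)); // only IMG_0001 has been quiet for 300 ms
```

In a real watcher, `push` would be called from the change callback with `Date.now()`, and `drain` from a periodic timer that hands the ready paths to the ingest step.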
stop_watch
Stop watching a folder
Parameters:
- path: Folder to stop watching
list_watchers
List all active folder watchers
Duplicate Detection Tools
find_duplicates
Scan directory for duplicate files
Parameters:
- path: Directory to scan
- hash_method: "partial" (fast), "full" (thorough), "perceptual" (images)
- categories: File types to check
Returns: Duplicate groups with wasted space info
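Once each file has a content hash, grouping duplicates and totalling the reclaimable space is straightforward. The sketch below assumes the hashes are already computed and that "wasted space" means the size of every copy beyond the first. The input shape and the function name `groupDuplicates` are hypothetical.

```javascript
// Group files by content hash and report how many bytes the extra copies use.
// `files` is an array of { path, size, hash } objects with precomputed hashes.
function groupDuplicates(files) {
  const byHash = new Map();
  for (const file of files) {
    const group = byHash.get(file.hash) ?? [];
    group.push(file);
    byHash.set(file.hash, group);
  }

  const groups = [];
  for (const [hash, members] of byHash) {
    if (members.length < 2) continue; // unique file, not a duplicate group
    // Keep one copy; every additional copy counts as wasted space.
    const wastedBytes = members.slice(1).reduce((sum, f) => sum + f.size, 0);
    groups.push({ hash, paths: members.map((f) => f.path), wastedBytes });
  }
  return groups;
}

console.log(
  groupDuplicates([
    { path: "/photos/a.jpg", size: 2048, hash: "h1" },
    { path: "/backup/a-copy.jpg", size: 2048, hash: "h1" },
    { path: "/photos/b.jpg", size: 4096, hash: "h2" },
  ])
);
```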
compare_files
Check if two files are duplicates
Parameters:
- file1: First file path
- file2: Second file path
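For images, two files can be visually the same without being byte-identical, which is what the "perceptual" hash method is for. A simple and widely used variant is the average hash: shrink the image to 8x8 grayscale, mark each pixel as above or below the mean, and compare hashes by Hamming distance. The sketch below assumes the 64 grayscale values are already available, since decoding and resizing are out of scope here. Whether `duplicate-detector.js` uses this particular algorithm is not stated in the README.

```javascript
// Average hash: one bit per pixel, "1" if the pixel is at or above the mean brightness.
// `pixels` is an array of grayscale values (0-255), e.g. 64 values from an 8x8 thumbnail.
function averageHash(pixels) {
  const mean = pixels.reduce((sum, p) => sum + p, 0) / pixels.length;
  return pixels.map((p) => (p >= mean ? "1" : "0")).join("");
}

// Number of positions at which two equal-length hashes differ.
function hammingDistance(a, b) {
  if (a.length !== b.length) throw new Error("hash lengths differ");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// A half-dark, half-bright image and a slightly brightened copy of it.
const original = [...Array(32).fill(0), ...Array(32).fill(255)];
const brightened = original.map((p) => Math.min(255, p + 10));
console.log(hammingDistance(averageHash(original), averageHash(brightened)));
```

A small distance relative to the hash length indicates similar images; the cut-off is a tuning choice rather than a fixed rule.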
find_collection_duplicates
Find duplicate entries in a ChromaDB collection
Parameters:
- collection: Collection name
🔧 Configuration
Environment Variables
CHROMADB_URL=http://localhost:8001 # ChromaDB server URL
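A server typically reads this variable once at startup. The sketch below shows one way to do that with validation. The fallback to `http://localhost:8001` when the variable is unset is an assumption based on the Docker command above, and `resolveChromaUrl` is a hypothetical name.

```javascript
// Resolve the ChromaDB base URL from an environment object.
// Falls back to the local Docker port when the variable is unset or empty,
// and throws a TypeError if the value is not a valid URL.
function resolveChromaUrl(env) {
  const raw = env.CHROMADB_URL || "http://localhost:8001";
  return new URL(raw).origin; // origin drops any trailing slash or path
}

console.log(resolveChromaUrl(process.env));
console.log(resolveChromaUrl({ CHROMADB_URL: "http://192.168.1.10:8000/" }));
```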
Custom Collections
Add new collections in setup-home-collections.js:
await createCollection('ml_experiments', {
description: 'Machine learning experiments and results'
});
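The README does not show how `createCollection` is defined. One plausible shape, sketched below, is a thin wrapper around the Chroma JavaScript client's `getOrCreateCollection`, which makes the setup script safe to run repeatedly. The explicit `client` parameter is an assumption made so the sketch can be exercised with a stand-in object; the real helper in `setup-home-collections.js` may differ.

```javascript
// Hypothetical wrapper: create the collection if it does not exist yet.
// `client` is assumed to expose getOrCreateCollection({ name, metadata }),
// as the chromadb JavaScript client does.
async function createCollection(client, name, metadata = {}) {
  // Only pass metadata when there is something to store.
  const options = Object.keys(metadata).length > 0 ? { name, metadata } : { name };
  return client.getOrCreateCollection(options);
}

// Stand-in client for illustration; a real run would pass a ChromaClient instance.
const fakeClient = {
  created: [],
  async getOrCreateCollection(options) {
    this.created.push(options);
    return { name: options.name };
  },
};

createCollection(fakeClient, "ml_experiments", {
  description: "Machine learning experiments and results",
}).then((collection) => console.log("ready:", collection.name));
```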
📦 Project Structure
chromadb-mcp-server/
├── index.js # MCP server with 22 tools
├── batch-processor.js # Fast batch file processing engine
├── exif-extractor.js # EXIF metadata extraction for photos
├── watch-folder.js # Auto-ingest watch folder system
├── duplicate-detector.js # Duplicate file detection
├── setup-home-collections.js # Collection initialization
├── test-chromadb.js # Connection test script
├── test-mcp.js # MCP functionality test
├── test-batch-processor.js # Batch processing tests
├── HOME-AI-SETUP.md # Detailed setup guide
├── package.json # Project dependencies
└── README.md # This file
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
See the contributing guidelines in the repository for more details.
📄 License
This project is licensed under the MIT License; see the license file in the repository for details.
🙏 Acknowledgments
- Anthropic for the MCP specification
- Chroma for the excellent vector database
- The open-source community for inspiration and support
🚀 What's Next?
- ✅ Export/import collections (DONE!)
- ✅ Batch file processing (DONE!)
- ✅ EXIF metadata extraction (DONE in v3.0!)
- ✅ Watch folders / auto-ingest (DONE in v3.0!)
- ✅ Duplicate detection (DONE in v3.0!)
- Cloud sync capabilities
- Multi-user support
- Web UI for collection management
- AI-powered image descriptions (what's in the photo)
- 3D print analysis (volume, time estimates)
Built with ❤️ for the Home AI Community