# Qi Knowledge MCP Server
A modern, high-performance knowledge management system built with the latest 2025 JavaScript tooling stack. This implementation provides universal knowledge storage, semantic search, web crawling, and Model Context Protocol (MCP) integration.
## 🚀 Features
- Universal Knowledge Storage: ChromaDB-powered vector database with semantic search
- Advanced Document Processing: LangChain integration with intelligent chunking and metadata extraction
- Modern Web Crawling: Puppeteer-based crawler with anti-detection and performance optimization
- Context Management: Intelligent context assembly and optimization for AI applications
- MCP Integration: Full Model Context Protocol support for Claude Code CLI and other clients
- Modern Tooling: Built with Bun, Biome, and Vitest for optimal developer experience
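Semantic search ranks stored documents by the similarity of their embedding vectors to the query's embedding. As a toy illustration of that ranking step (not ChromaDB's actual implementation; the vectors below are hypothetical stand-ins for model embeddings such as nomic-embed-text):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank candidate document vectors against a query vector, best match first.
function rank(query: number[], docs: number[][]): number[] {
  return docs
    .map((vec, index) => ({ index, score: cosineSimilarity(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.index);
}
```

A real vector store adds approximate-nearest-neighbor indexing on top of this idea so it scales past brute-force comparison.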
## 🏗️ Architecture

```
qi-v2-knowledge/
├── lib/                    # Core @qi/knowledge library
│   ├── src/
│   │   ├── types/          # TypeScript interfaces and types
│   │   ├── storage/        # ChromaDB integration layer
│   │   ├── processing/     # Document processing & web crawling
│   │   ├── context/        # Context management
│   │   └── mcp/            # MCP server implementation (TODO)
│   └── package.json        # Library dependencies
├── app/                    # Application layer
│   ├── src/
│   │   ├── cli/            # Command-line interface
│   │   ├── server/         # Server entry point
│   │   └── config/         # Configuration management
│   └── package.json        # App dependencies with @qi/knowledge alias
└── config/                 # Configuration files
    └── knowledge-config.yaml
```
## 🛠️ Technology Stack

- Runtime: Bun - ultra-fast JavaScript runtime (roughly 4x faster startup than Node.js)
- Linting/Formatting: Biome - up to 15x faster than ESLint
- Testing: Vitest - typically 2-5x faster than Jest, with native ESM support
- Vector Database: ChromaDB - high-performance vector storage
- Document Processing: LangChain - advanced text processing and chunking
- Web Crawling: Puppeteer - modern web scraping with JavaScript support
- Protocol: Model Context Protocol - AI integration standard
## 🚀 Quick Start

### Prerequisites

- Bun Runtime (recommended):

  ```bash
  curl -fsSL https://bun.sh/install | bash
  ```

- ChromaDB (Docker):

  ```bash
  docker run -d --name chromadb -p 8000:8000 chromadb/chroma:latest
  ```

- Ollama (for embeddings):

  ```bash
  curl -fsSL https://ollama.ai/install.sh | sh
  ollama pull nomic-embed-text
  ```

### Installation

1. Install dependencies:

   ```bash
   # Library dependencies
   cd lib && bun install
   # Application dependencies
   cd ../app && bun install
   ```

2. Build the library:

   ```bash
   cd lib && bun run build
   ```

3. Start the knowledge server:

   ```bash
   cd app && bun run cli serve
   ```
## 📖 Usage

### Command Line Interface

The CLI provides several commands for managing your knowledge base:

```bash
# Initialize a new knowledge base
bun run cli init --name my-knowledge-base

# Process a document
bun run cli process ./docs/README.md --title "Project Documentation"

# Search the knowledge base
bun run cli search "how to use vector databases"

# Crawl a website
bun run cli crawl https://docs.example.com --max-depth 3

# View statistics
bun run cli stats

# Start the MCP server
bun run cli serve --port 3000
```
### Library Usage

```typescript
import {
  ChromaDBKnowledgeStore,
  EnhancedDocumentProcessor,
  PuppeteerWebCrawler,
  ContextManager,
} from '@qi/knowledge';

// Initialize components
const knowledgeStore = new ChromaDBKnowledgeStore({
  collectionName: 'my-knowledge-base',
  url: 'http://localhost:8000',
  embeddingModel: 'nomic-embed-text',
});

const documentProcessor = new EnhancedDocumentProcessor({
  chunkSize: 1000,
  chunkOverlap: 200,
  enableEnrichment: true,
  topicExtraction: true,
});

const webCrawler = new PuppeteerWebCrawler({
  headless: true,
  defaultTimeout: 30000,
});

const contextManager = new ContextManager(knowledgeStore);

// Initialize all components
await knowledgeStore.initialize();
await webCrawler.initialize();

// Process and store a document
const source = {
  id: 'example-docs',
  name: 'Example Documentation',
  type: 'documentation_site',
  metadata: { priority: 10, trustScore: 0.9, isActive: true },
};
await knowledgeStore.addSource(source);

const content = '...'; // raw document text to ingest
const document = await documentProcessor.processDocument(content, source);
await knowledgeStore.storeDocument(document);

// Search for relevant information
const results = await knowledgeStore.search({
  query: 'vector databases semantic search',
  options: { maxResults: 5 },
});

// Assemble intelligent context
const context = await contextManager.assembleContext(
  'How do I implement RAG with ChromaDB?',
  { maxTokens: 4000 }
);
```
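The `chunkSize` and `chunkOverlap` settings above control how documents are split before embedding. A simplified character-window splitter illustrates the idea (a sketch only; LangChain's actual RecursiveCharacterTextSplitter additionally respects separators such as paragraphs and sentences):

```typescript
// Split text into fixed-size windows; each window shares `chunkOverlap`
// characters with the previous one so context survives chunk boundaries.
function chunkText(text: string, chunkSize = 1000, chunkOverlap = 200): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap; // each window starts this far after the last
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

With the defaults, a 2,500-character document yields three chunks, and the last 200 characters of each chunk repeat as the first 200 of the next.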
### Claude Code CLI Integration

Add to your `~/.config/claude-code/config.json`:

```json
{
  "mcpServers": {
    "knowledge-base": {
      "command": "node",
      "args": ["/path/to/qi-v2-knowledge/app/dist/server/index.js"],
      "env": {
        "CHROMADB_URL": "http://localhost:8000",
        "LOG_LEVEL": "info"
      }
    }
  }
}
```
## 🧪 Development

### Running Tests

```bash
# Run tests in the library
cd lib && bun test

# Run tests with UI
cd lib && bun run test:ui

# Run tests with coverage
cd lib && bun run test:coverage
```

### Code Quality

```bash
# Lint and format code
cd lib && bun run lint:fix
cd lib && bun run format

# Type checking
cd lib && bun run build
```

### Development Server

```bash
# Start development server with hot reload
cd app && bun run dev
```
## 📊 Performance

Based on published 2025 benchmarks, this stack provides:

- Roughly 4x faster startup than Node.js (thanks to Bun)
- Up to 15x faster linting than ESLint (thanks to Biome)
- 2-5x faster test runs than Jest (thanks to Vitest)
- Advanced document processing with LangChain's RecursiveCharacterTextSplitter
- Modern web crawling with Puppeteer's latest anti-detection features
- Efficient vector storage with ChromaDB's optimized similarity search
## 🔧 Configuration

The system is configured via `config/knowledge-config.yaml`:

```yaml
server:
  name: "qi-knowledge-server"
  version: "1.0.0"
  port: 3000

chromadb:
  url: "http://localhost:8000"
  collectionName: "qi-knowledge-base"
  embeddingModel: "nomic-embed-text"

processing:
  chunkSize: 1000
  chunkOverlap: 200
  enableEnrichment: true
  topicExtraction: true

crawling:
  maxDepth: 3
  maxPages: 100
  respectRobotsTxt: true
```

Environment variables can override configuration:

- `CHROMADB_URL` - ChromaDB connection URL
- `OLLAMA_URL` - Ollama server URL
- `LOG_LEVEL` - Logging level (debug, info, warn, error)
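The override precedence can be sketched as a small merge step at startup. This is a hypothetical helper with an illustrative config shape; the actual loader in `app/src/config` may differ:

```typescript
// Illustrative shape mirroring part of the YAML configuration.
interface AppConfig {
  chromadb: { url: string };
  logLevel: string;
}

// Environment variables win over file-based configuration when set.
function applyEnvOverrides(
  fileConfig: AppConfig,
  env: Record<string, string | undefined> = process.env
): AppConfig {
  return {
    ...fileConfig,
    chromadb: {
      ...fileConfig.chromadb,
      url: env.CHROMADB_URL ?? fileConfig.chromadb.url,
    },
    logLevel: env.LOG_LEVEL ?? fileConfig.logLevel,
  };
}
```

Unset variables fall through to the file values, so a partial environment (e.g. only `CHROMADB_URL` in Docker) leaves the rest of the configuration untouched.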
## 🐳 Docker Deployment

```dockerfile
FROM oven/bun:1 AS base
WORKDIR /app

# Copy and install dependencies
COPY package.json bun.lockb ./
RUN bun install --frozen-lockfile

# Copy source and build
COPY . .
RUN bun run build

# Start the server
CMD ["bun", "start"]
```

```yaml
# docker-compose.yml
version: '3.8'
services:
  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - chromadb_data:/chroma/chroma

  knowledge-server:
    build: .
    depends_on:
      - chromadb
    environment:
      - CHROMADB_URL=http://chromadb:8000
    ports:
      - "3000:3000"

volumes:
  chromadb_data:
```
## 🤝 Contributing

1. Fork the repository
2. Create a feature branch:

   ```bash
   git checkout -b feature/amazing-feature
   ```

3. Make your changes using the modern tooling:

   ```bash
   bun run lint:fix  # Format and lint
   bun test          # Run tests
   ```

4. Commit your changes:

   ```bash
   git commit -m 'Add amazing feature'
   ```

5. Push to the branch:

   ```bash
   git push origin feature/amazing-feature
   ```

6. Open a Pull Request
## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- ChromaDB team for the excellent vector database
- LangChain team for comprehensive document processing tools
- Anthropic for the Model Context Protocol specification
- Oven team for the amazing Bun runtime
- Biome team for next-generation linting and formatting
- Vitest team for the fast and modern testing framework
Built with ❤️ using 2025's best JavaScript tooling