# Qi Knowledge MCP Server
A modern, high-performance knowledge management system built with the latest 2025 JavaScript tooling stack. This implementation provides universal knowledge storage, semantic search, web crawling, and Model Context Protocol (MCP) integration.
## Features
- Universal Knowledge Storage: ChromaDB-powered vector database with semantic search
- Advanced Document Processing: LangChain integration with intelligent chunking and metadata extraction
- Modern Web Crawling: Puppeteer-based crawler with anti-detection and performance optimization
- Context Management: Intelligent context assembly and optimization for AI applications
- MCP Integration: Full Model Context Protocol support for Claude Code CLI and other clients
- Modern Tooling: Built with Bun, Biome, and Vitest for optimal developer experience
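The context-management feature above can be illustrated with a minimal sketch: given chunks scored by the vector search, greedily pack the highest-scoring ones under a token budget. The types and function here are hypothetical simplifications; the real implementation lives in `lib/src/context/`.

```typescript
// Minimal sketch of token-budgeted context assembly (illustrative only;
// the real ContextManager lives in lib/src/context/).
interface ScoredChunk {
  text: string;
  score: number;      // relevance score from vector search
  tokenCount: number; // pre-computed token length of the chunk
}

// Greedily pack the highest-scoring chunks without exceeding maxTokens.
function assembleContext(chunks: ScoredChunk[], maxTokens: number): string {
  const byScore = [...chunks].sort((a, b) => b.score - a.score);
  const selected: ScoredChunk[] = [];
  let used = 0;
  for (const chunk of byScore) {
    if (used + chunk.tokenCount <= maxTokens) {
      selected.push(chunk);
      used += chunk.tokenCount;
    }
  }
  return selected.map((c) => c.text).join("\n\n");
}
```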
## Architecture

```
qi-v2-knowledge/
├── lib/                      # Core @qi/knowledge library
│   ├── src/
│   │   ├── types/            # TypeScript interfaces and types
│   │   ├── storage/          # ChromaDB integration layer
│   │   ├── processing/       # Document processing & web crawling
│   │   ├── context/          # Context management
│   │   └── mcp/              # MCP server implementation (TODO)
│   └── package.json          # Library dependencies
├── app/                      # Application layer
│   ├── src/
│   │   ├── cli/              # Command-line interface
│   │   ├── server/           # Server entry point
│   │   └── config/           # Configuration management
│   └── package.json          # App dependencies with @qi/knowledge alias
└── config/                   # Configuration files
    └── knowledge-config.yaml
```
## Technology Stack
- Runtime: Bun - Ultra-fast JavaScript runtime (4x faster startup)
- Linting/Formatting: Biome - 15x faster than ESLint, type-aware linting
- Testing: Vitest - 2-5x faster than Jest, native ESM support
- Vector Database: ChromaDB - High-performance vector storage
- Document Processing: LangChain - Advanced text processing and chunking
- Web Crawling: Puppeteer - Modern web scraping with JavaScript support
- Protocol: Model Context Protocol - AI integration standard
## Quick Start

### Prerequisites

1. **Bun Runtime** (recommended):

   ```bash
   curl -fsSL https://bun.sh/install | bash
   ```

2. **ChromaDB** (Docker):

   ```bash
   docker run -d --name chromadb -p 8000:8000 chromadb/chroma:latest
   ```

3. **Ollama** (for embeddings):

   ```bash
   curl -fsSL https://ollama.ai/install.sh | sh
   ollama pull nomic-embed-text
   ```

### Installation

1. **Install dependencies**:

   ```bash
   # Library dependencies
   cd lib && bun install
   # Application dependencies
   cd ../app && bun install
   ```

2. **Build the library**:

   ```bash
   cd lib && bun run build
   ```

3. **Start the knowledge server**:

   ```bash
   cd app && bun run cli serve
   ```
## Usage

### Command Line Interface

The CLI provides several commands for managing your knowledge base:

```bash
# Initialize a new knowledge base
bun run cli init --name my-knowledge-base

# Process a document
bun run cli process ./docs/README.md --title "Project Documentation"

# Search the knowledge base
bun run cli search "how to use vector databases"

# Crawl a website
bun run cli crawl https://docs.example.com --max-depth 3

# View statistics
bun run cli stats

# Start the MCP server
bun run cli serve --port 3000
```
### Library Usage

```typescript
import {
  ChromaDBKnowledgeStore,
  EnhancedDocumentProcessor,
  PuppeteerWebCrawler,
  ContextManager,
} from '@qi/knowledge';

// Initialize components
const knowledgeStore = new ChromaDBKnowledgeStore({
  collectionName: 'my-knowledge-base',
  url: 'http://localhost:8000',
  embeddingModel: 'nomic-embed-text',
});

const documentProcessor = new EnhancedDocumentProcessor({
  chunkSize: 1000,
  chunkOverlap: 200,
  enableEnrichment: true,
  topicExtraction: true,
});

const webCrawler = new PuppeteerWebCrawler({
  headless: true,
  defaultTimeout: 30000,
});

const contextManager = new ContextManager(knowledgeStore);

// Initialize all components
await knowledgeStore.initialize();
await webCrawler.initialize();

// Process and store a document
const source = {
  id: 'example-docs',
  name: 'Example Documentation',
  type: 'documentation_site',
  metadata: { priority: 10, trustScore: 0.9, isActive: true },
};

await knowledgeStore.addSource(source);

// `content` is the raw document text, loaded elsewhere
const document = await documentProcessor.processDocument(content, source);
await knowledgeStore.storeDocument(document);

// Search for relevant information
const results = await knowledgeStore.search({
  query: 'vector databases semantic search',
  options: { maxResults: 5 },
});

// Assemble intelligent context
const context = await contextManager.assembleContext(
  'How do I implement RAG with ChromaDB?',
  { maxTokens: 4000 }
);
```
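The `chunkSize`/`chunkOverlap` options above follow the usual sliding-window scheme. As a simplified, character-based sketch (the actual processor delegates to LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers splitting on separators such as paragraphs and sentences):

```typescript
// Simplified character-window chunker illustrating chunkSize/chunkOverlap.
// Each chunk starts (chunkSize - chunkOverlap) characters after the last,
// so consecutive chunks share chunkOverlap characters of context.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkOverlap >= chunkSize) throw new Error("overlap must be smaller than chunk size");
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side.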
### Claude Code CLI Integration

Add to your `~/.config/claude-code/config.json`:

```json
{
  "mcpServers": {
    "knowledge-base": {
      "command": "node",
      "args": ["/path/to/qi-v2-knowledge/app/dist/server/index.js"],
      "env": {
        "CHROMADB_URL": "http://localhost:8000",
        "LOG_LEVEL": "info"
      }
    }
  }
}
```
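Once registered, the client talks to the server over stdio using JSON-RPC 2.0, as defined by the Model Context Protocol. A sketch of the `initialize` handshake a client sends on connect (field names per the MCP specification; the version strings here are illustrative):

```typescript
// Shape of the JSON-RPC 2.0 "initialize" request an MCP client sends over
// stdio when it first connects. Field names follow the MCP specification;
// the protocolVersion and clientInfo values are example placeholders.
const initializeRequest = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "initialize",
  params: {
    protocolVersion: "2024-11-05", // example protocol revision
    capabilities: {},
    clientInfo: { name: "example-client", version: "1.0.0" },
  },
};

// Messages are serialized as JSON and written to the server's stdin.
const wire = JSON.stringify(initializeRequest);
```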
## Development

### Running Tests

```bash
# Run tests in the library
cd lib && bun test

# Run tests with UI
cd lib && bun run test:ui

# Run tests with coverage
cd lib && bun run test:coverage
```

### Code Quality

```bash
# Lint and format code
cd lib && bun run lint:fix
cd lib && bun run format

# Type checking
cd lib && bun run build
```

### Development Server

```bash
# Start development server with hot reload
cd app && bun run dev
```
## Performance
Based on 2025 benchmarks, this implementation provides:
- 4x faster startup compared to Node.js (thanks to Bun)
- 15x faster linting compared to ESLint (thanks to Biome)
- 2-5x faster testing compared to Jest (thanks to Vitest)
- Advanced document processing with LangChain's RecursiveCharacterTextSplitter
- Modern web crawling with Puppeteer's latest anti-detection features
- Efficient vector storage with ChromaDB's optimized similarity search
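The similarity search mentioned above ultimately ranks chunks by how closely their embedding vectors point in the same direction as the query's. A minimal cosine-similarity sketch (illustrative only; ChromaDB implements the ranking internally with optimized index structures):

```typescript
// Cosine similarity between two embedding vectors: 1.0 means identical
// direction, 0 means orthogonal (unrelated). Vector stores use approximate
// indexes for speed, but the ranking criterion is this same quantity.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```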
## Configuration

The system is configured via `config/knowledge-config.yaml`:

```yaml
server:
  name: "qi-knowledge-server"
  version: "1.0.0"
  port: 3000

chromadb:
  url: "http://localhost:8000"
  collectionName: "qi-knowledge-base"
  embeddingModel: "nomic-embed-text"

processing:
  chunkSize: 1000
  chunkOverlap: 200
  enableEnrichment: true
  topicExtraction: true

crawling:
  maxDepth: 3
  maxPages: 100
  respectRobotsTxt: true
```

Environment variables can override configuration:

- `CHROMADB_URL` - ChromaDB connection URL
- `OLLAMA_URL` - Ollama server URL
- `LOG_LEVEL` - Logging level (debug, info, warn, error)
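The override behavior amounts to a simple merge in which environment values, when present, win over the YAML file. A sketch with a hypothetical helper (the real loader lives in `app/src/config/`):

```typescript
// Sketch of environment-over-file precedence for the settings listed above.
// Hypothetical helper and config shape; the real loader lives in app/src/config/.
interface KnowledgeConfig {
  chromadbUrl: string;
  ollamaUrl: string;
  logLevel: string;
}

function applyEnvOverrides(
  fileConfig: KnowledgeConfig,
  env: Record<string, string | undefined>
): KnowledgeConfig {
  return {
    chromadbUrl: env.CHROMADB_URL ?? fileConfig.chromadbUrl,
    ollamaUrl: env.OLLAMA_URL ?? fileConfig.ollamaUrl,
    logLevel: env.LOG_LEVEL ?? fileConfig.logLevel,
  };
}
```

In the application this would be called with `process.env` after parsing the YAML file.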
## Docker Deployment

```dockerfile
FROM oven/bun:1 as base
WORKDIR /app

# Copy and install dependencies
COPY package.json bun.lockb ./
RUN bun install --frozen-lockfile

# Copy source and build
COPY . .
RUN bun run build

# Start the server
CMD ["bun", "start"]
```

```yaml
# docker-compose.yml
version: '3.8'
services:
  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - chromadb_data:/chroma/chroma

  knowledge-server:
    build: .
    depends_on:
      - chromadb
    environment:
      - CHROMADB_URL=http://chromadb:8000
    ports:
      - "3000:3000"

volumes:
  chromadb_data:
```
## Contributing

1. Fork the repository
2. Create a feature branch:

   ```bash
   git checkout -b feature/amazing-feature
   ```

3. Make your changes using the modern tooling:

   ```bash
   bun run lint:fix  # Format and lint
   bun test          # Run tests
   ```

4. Commit your changes:

   ```bash
   git commit -m 'Add amazing feature'
   ```

5. Push to the branch:

   ```bash
   git push origin feature/amazing-feature
   ```

6. Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- ChromaDB team for the excellent vector database
- LangChain team for comprehensive document processing tools
- Anthropic for the Model Context Protocol specification
- Oven team for the amazing Bun runtime
- Biome team for next-generation linting and formatting
- Vitest team for the fast and modern testing framework
Built with ❤️ using 2025's best JavaScript tooling