Qi Knowledge MCP Server

A modern, high-performance knowledge management system built with the latest 2025 JavaScript tooling stack. This implementation provides universal knowledge storage, semantic search, web crawling, and Model Context Protocol (MCP) integration.

🚀 Features

  • Universal Knowledge Storage: ChromaDB-powered vector database with semantic search
  • Advanced Document Processing: LangChain integration with intelligent chunking and metadata extraction
  • Modern Web Crawling: Puppeteer-based crawler with anti-detection and performance optimization
  • Context Management: Intelligent context assembly and optimization for AI applications
  • MCP Integration: Full Model Context Protocol support for Claude Code CLI and other clients
  • Modern Tooling: Built with Bun, Biome, and Vitest for optimal developer experience

๐Ÿ—๏ธ Architecture

qi-v2-knowledge/
├── lib/                    # Core @qi/knowledge library
│   ├── src/
│   │   ├── types/         # TypeScript interfaces and types
│   │   ├── storage/       # ChromaDB integration layer
│   │   ├── processing/    # Document processing & web crawling
│   │   ├── context/       # Context management
│   │   └── mcp/           # MCP server implementation (TODO)
│   └── package.json       # Library dependencies
├── app/                   # Application layer
│   ├── src/
│   │   ├── cli/           # Command-line interface
│   │   ├── server/        # Server entry point
│   │   └── config/        # Configuration management
│   └── package.json       # App dependencies with @qi/knowledge alias
└── config/                # Configuration files
    └── knowledge-config.yaml
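
The @qi/knowledge alias noted above would typically be wired up as a local path dependency in app/package.json. A sketch, assuming Bun's standard file: protocol (the actual repo may use a workspace or link: form instead):

```json
{
  "name": "qi-knowledge-app",
  "dependencies": {
    "@qi/knowledge": "file:../lib"
  }
}
```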

๐Ÿ› ๏ธ Technology Stack

  • Runtime: Bun - Ultra-fast JavaScript runtime (4x faster startup)
  • Linting/Formatting: Biome - 15x faster than ESLint, type-aware linting
  • Testing: Vitest - 2-5x faster than Jest, native ESM support
  • Vector Database: ChromaDB - High-performance vector storage
  • Document Processing: LangChain - Advanced text processing and chunking
  • Web Crawling: Puppeteer - Modern web scraping with JavaScript support
  • Protocol: Model Context Protocol - AI integration standard

🚀 Quick Start

Prerequisites

  1. Bun Runtime (recommended):

    curl -fsSL https://bun.sh/install | bash
    
  2. ChromaDB (Docker):

    docker run -d --name chromadb -p 8000:8000 chromadb/chroma:latest
    
  3. Ollama (for embeddings):

    curl -fsSL https://ollama.ai/install.sh | sh
    ollama pull nomic-embed-text
    

Installation

  1. Install dependencies:

    # Library dependencies
    cd lib && bun install
    
    # Application dependencies  
    cd ../app && bun install
    
  2. Build the library:

    cd lib && bun run build
    
  3. Start the knowledge server:

    cd app && bun run cli serve
    

📖 Usage

Command Line Interface

The CLI provides several commands for managing your knowledge base:

# Initialize a new knowledge base
bun run cli init --name my-knowledge-base

# Process a document
bun run cli process ./docs/README.md --title "Project Documentation"

# Search the knowledge base
bun run cli search "how to use vector databases"

# Crawl a website
bun run cli crawl https://docs.example.com --max-depth 3

# View statistics
bun run cli stats

# Start the MCP server
bun run cli serve --port 3000

Library Usage

import {
  ChromaDBKnowledgeStore,
  EnhancedDocumentProcessor,
  PuppeteerWebCrawler,
  ContextManager,
} from '@qi/knowledge';

// Initialize components
const knowledgeStore = new ChromaDBKnowledgeStore({
  collectionName: 'my-knowledge-base',
  url: 'http://localhost:8000',
  embeddingModel: 'nomic-embed-text',
});

const documentProcessor = new EnhancedDocumentProcessor({
  chunkSize: 1000,
  chunkOverlap: 200,
  enableEnrichment: true,
  topicExtraction: true,
});

const webCrawler = new PuppeteerWebCrawler({
  headless: true,
  defaultTimeout: 30000,
});

const contextManager = new ContextManager(knowledgeStore);

// Initialize all components
await knowledgeStore.initialize();
await webCrawler.initialize();

// Process and store a document
const source = {
  id: 'example-docs',
  name: 'Example Documentation',
  type: 'documentation_site',
  metadata: { priority: 10, trustScore: 0.9, isActive: true },
};

await knowledgeStore.addSource(source);

// Load the raw document text first (path is illustrative)
const content = await Bun.file('./docs/example.md').text();
const document = await documentProcessor.processDocument(content, source);
await knowledgeStore.storeDocument(document);

// Search for relevant information
const results = await knowledgeStore.search({
  query: 'vector databases semantic search',
  options: { maxResults: 5 },
});

// Assemble intelligent context
const context = await contextManager.assembleContext(
  'How do I implement RAG with ChromaDB?',
  { maxTokens: 4000 }
);
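
The assembled context is then typically interpolated into the model prompt. A minimal sketch of that last step (buildPrompt is a hypothetical helper for illustration, not part of @qi/knowledge; the prompt layout is an assumption):

```typescript
// Hypothetical helper: combine retrieved context with the user question.
function buildPrompt(context: string, question: string): string {
  return [
    'Answer using only the context below.',
    '',
    '--- Context ---',
    context,
    '',
    '--- Question ---',
    question,
  ].join('\n');
}

const prompt = buildPrompt(
  'ChromaDB stores embeddings and supports similarity search.',
  'How do I implement RAG with ChromaDB?'
);
console.log(prompt);
```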

Claude Code CLI Integration

Add to your ~/.config/claude-code/config.json:

{
  "mcpServers": {
    "knowledge-base": {
      "command": "node",
      "args": ["/path/to/qi-v2-knowledge/app/dist/server/index.js"],
      "env": {
        "CHROMADB_URL": "http://localhost:8000",
        "LOG_LEVEL": "info"
      }
    }
  }
}
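
Under the hood, MCP clients speak JSON-RPC 2.0 over the configured transport, and the first message exchanged is an initialize request. A sketch of that framing (field names follow the MCP specification; the protocolVersion string and clientInfo values here are examples, not what Claude Code actually sends):

```typescript
// Minimal JSON-RPC 2.0 initialize request, as an MCP client would send it.
// Exact capability fields vary by client; this shape is illustrative.
const initializeRequest = {
  jsonrpc: '2.0' as const,
  id: 1,
  method: 'initialize',
  params: {
    protocolVersion: '2024-11-05', // example protocol revision
    capabilities: {},
    clientInfo: { name: 'example-client', version: '0.1.0' },
  },
};

console.log(JSON.stringify(initializeRequest));
```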

🧪 Development

Running Tests

# Run tests in the library
cd lib && bun test

# Run tests with UI
cd lib && bun run test:ui

# Run tests with coverage
cd lib && bun run test:coverage

Code Quality

# Lint and format code
cd lib && bun run lint:fix
cd lib && bun run format

# Type checking
cd lib && bun run build

Development Server

# Start development server with hot reload
cd app && bun run dev

📊 Performance

Based on 2025 benchmarks, this implementation provides:

  • 4x faster startup compared to Node.js (thanks to Bun)
  • 15x faster linting compared to ESLint (thanks to Biome)
  • 2-5x faster testing compared to Jest (thanks to Vitest)
  • Advanced document processing with LangChain's RecursiveCharacterTextSplitter
  • Modern web crawling with Puppeteer's latest anti-detection features
  • Efficient vector storage with ChromaDB's optimized similarity search

🔧 Configuration

The system is configured via config/knowledge-config.yaml:

server:
  name: "qi-knowledge-server"
  version: "1.0.0"
  port: 3000

chromadb:
  url: "http://localhost:8000"
  collectionName: "qi-knowledge-base"
  embeddingModel: "nomic-embed-text"

processing:
  chunkSize: 1000
  chunkOverlap: 200
  enableEnrichment: true
  topicExtraction: true

crawling:
  maxDepth: 3
  maxPages: 100
  respectRobotsTxt: true

Environment variables can override configuration:

  • CHROMADB_URL - ChromaDB connection URL
  • OLLAMA_URL - Ollama server URL
  • LOG_LEVEL - Logging level (debug, info, warn, error)
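
A sketch of how such overrides are commonly layered on top of file-based values (applyEnvOverrides is a hypothetical helper; the actual loader lives in app/src/config/ and may differ):

```typescript
interface KnowledgeConfig {
  chromadb: { url: string };
  logLevel: string;
}

// Hypothetical helper: environment variables win over file-based config.
function applyEnvOverrides(
  config: KnowledgeConfig,
  env: Record<string, string | undefined>
): KnowledgeConfig {
  return {
    ...config,
    chromadb: { ...config.chromadb, url: env.CHROMADB_URL ?? config.chromadb.url },
    logLevel: env.LOG_LEVEL ?? config.logLevel,
  };
}

const fileConfig: KnowledgeConfig = {
  chromadb: { url: 'http://localhost:8000' },
  logLevel: 'info',
};
const merged = applyEnvOverrides(fileConfig, { CHROMADB_URL: 'http://chromadb:8000' });
console.log(merged.chromadb.url); // http://chromadb:8000
console.log(merged.logLevel);     // info (no LOG_LEVEL override set)
```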

๐Ÿณ Docker Deployment

# Dockerfile
FROM oven/bun:1 AS base
WORKDIR /app

# Copy and install dependencies
COPY package.json bun.lockb ./
RUN bun install --frozen-lockfile

# Copy source and build
COPY . .
RUN bun run build

# Start the server
CMD ["bun", "start"]
# docker-compose.yml
version: '3.8'
services:
  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - chromadb_data:/chroma/chroma

  knowledge-server:
    build: .
    depends_on:
      - chromadb
    environment:
      - CHROMADB_URL=http://chromadb:8000
    ports:
      - "3000:3000"

volumes:
  chromadb_data:

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes using the modern tooling:
    bun run lint:fix  # Format and lint
    bun test         # Run tests
    
  4. Commit your changes: git commit -m 'Add amazing feature'
  5. Push to the branch: git push origin feature/amazing-feature
  6. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • ChromaDB team for the excellent vector database
  • LangChain team for comprehensive document processing tools
  • Anthropic for the Model Context Protocol specification
  • Oven team for the amazing Bun runtime
  • Biome team for next-generation linting and formatting
  • Vitest team for the fast and modern testing framework

Built with โค๏ธ using 2025's best JavaScript tooling