Qi Knowledge MCP Server

A modern, high-performance knowledge management system built with the latest 2025 JavaScript tooling stack. This implementation provides universal knowledge storage, semantic search, web crawling, and Model Context Protocol (MCP) integration.

🚀 Features

  • Universal Knowledge Storage: ChromaDB-powered vector database with semantic search
  • Advanced Document Processing: LangChain integration with intelligent chunking and metadata extraction
  • Modern Web Crawling: Puppeteer-based crawler with anti-detection and performance optimization
  • Context Management: Intelligent context assembly and optimization for AI applications
  • MCP Integration: Full Model Context Protocol support for Claude Code CLI and other clients
  • Modern Tooling: Built with Bun, Biome, and Vitest for optimal developer experience

๐Ÿ—๏ธ Architecture

qi-v2-knowledge/
├── lib/                    # Core @qi/knowledge library
│   ├── src/
│   │   ├── types/         # TypeScript interfaces and types
│   │   ├── storage/       # ChromaDB integration layer
│   │   ├── processing/    # Document processing & web crawling
│   │   ├── context/       # Context management
│   │   └── mcp/           # MCP server implementation (TODO)
│   └── package.json       # Library dependencies
├── app/                   # Application layer
│   ├── src/
│   │   ├── cli/           # Command-line interface
│   │   ├── server/        # Server entry point
│   │   └── config/        # Configuration management
│   └── package.json       # App dependencies with @qi/knowledge alias
└── config/                # Configuration files
    └── knowledge-config.yaml
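
The @qi/knowledge alias noted above would typically be wired up as a local path dependency in app/package.json. A sketch, assuming Bun's standard file: protocol (the actual repo may use a workspace or link: form instead):

```json
{
  "name": "qi-knowledge-app",
  "dependencies": {
    "@qi/knowledge": "file:../lib"
  }
}
```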

๐Ÿ› ๏ธ Technology Stack

  • Runtime: Bun - Ultra-fast JavaScript runtime (4x faster startup)
  • Linting/Formatting: Biome - 15x faster than ESLint, type-aware linting
  • Testing: Vitest - 2-5x faster than Jest, native ESM support
  • Vector Database: ChromaDB - High-performance vector storage
  • Document Processing: LangChain - Advanced text processing and chunking
  • Web Crawling: Puppeteer - Modern web scraping with JavaScript support
  • Protocol: Model Context Protocol - AI integration standard

🚀 Quick Start

Prerequisites

  1. Bun Runtime (recommended):

    curl -fsSL https://bun.sh/install | bash
    
  2. ChromaDB (Docker):

    docker run -d --name chromadb -p 8000:8000 chromadb/chroma:latest
    
  3. Ollama (for embeddings):

    curl -fsSL https://ollama.ai/install.sh | sh
    ollama pull nomic-embed-text
    

Installation

  1. Install dependencies:

    # Library dependencies
    cd lib && bun install
    
    # Application dependencies  
    cd ../app && bun install
    
  2. Build the library:

    cd lib && bun run build
    
  3. Start the knowledge server:

    cd app && bun run cli serve
    

📖 Usage

Command Line Interface

The CLI provides several commands for managing your knowledge base:

# Initialize a new knowledge base
bun run cli init --name my-knowledge-base

# Process a document
bun run cli process ./docs/README.md --title "Project Documentation"

# Search the knowledge base
bun run cli search "how to use vector databases"

# Crawl a website
bun run cli crawl https://docs.example.com --max-depth 3

# View statistics
bun run cli stats

# Start the MCP server
bun run cli serve --port 3000

Library Usage

import {
  ChromaDBKnowledgeStore,
  EnhancedDocumentProcessor,
  PuppeteerWebCrawler,
  ContextManager,
} from '@qi/knowledge';

// Initialize components
const knowledgeStore = new ChromaDBKnowledgeStore({
  collectionName: 'my-knowledge-base',
  url: 'http://localhost:8000',
  embeddingModel: 'nomic-embed-text',
});

const documentProcessor = new EnhancedDocumentProcessor({
  chunkSize: 1000,
  chunkOverlap: 200,
  enableEnrichment: true,
  topicExtraction: true,
});

const webCrawler = new PuppeteerWebCrawler({
  headless: true,
  defaultTimeout: 30000,
});

const contextManager = new ContextManager(knowledgeStore);

// Initialize all components
await knowledgeStore.initialize();
await webCrawler.initialize();

// Process and store a document
const source = {
  id: 'example-docs',
  name: 'Example Documentation',
  type: 'documentation_site',
  metadata: { priority: 10, trustScore: 0.9, isActive: true },
};

await knowledgeStore.addSource(source);

// Load the raw document text first (path is illustrative)
const content = await Bun.file('./docs/example.md').text();
const document = await documentProcessor.processDocument(content, source);
await knowledgeStore.storeDocument(document);

// Search for relevant information
const results = await knowledgeStore.search({
  query: 'vector databases semantic search',
  options: { maxResults: 5 },
});

// Assemble intelligent context
const context = await contextManager.assembleContext(
  'How do I implement RAG with ChromaDB?',
  { maxTokens: 4000 }
);
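
The assembled context is then typically interpolated into the model prompt. A minimal sketch of that last step (buildPrompt is a hypothetical helper for illustration, not part of @qi/knowledge; the prompt layout is an assumption):

```typescript
// Hypothetical helper: combine retrieved context with the user question.
function buildPrompt(context: string, question: string): string {
  return [
    'Answer using only the context below.',
    '',
    '--- Context ---',
    context,
    '',
    '--- Question ---',
    question,
  ].join('\n');
}

const prompt = buildPrompt(
  'ChromaDB stores embeddings and supports similarity search.',
  'How do I implement RAG with ChromaDB?'
);
console.log(prompt);
```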

Claude Code CLI Integration

Add to your ~/.config/claude-code/config.json:

{
  "mcpServers": {
    "knowledge-base": {
      "command": "node",
      "args": ["/path/to/qi-v2-knowledge/app/dist/server/index.js"],
      "env": {
        "CHROMADB_URL": "http://localhost:8000",
        "LOG_LEVEL": "info"
      }
    }
  }
}
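
Under the hood, MCP clients speak JSON-RPC 2.0 over the configured transport, and the first message exchanged is an initialize request. A sketch of that framing (field names follow the MCP specification; the protocolVersion string and clientInfo values here are examples, not what Claude Code actually sends):

```typescript
// Minimal JSON-RPC 2.0 initialize request, as an MCP client would send it.
// Exact capability fields vary by client; this shape is illustrative.
const initializeRequest = {
  jsonrpc: '2.0' as const,
  id: 1,
  method: 'initialize',
  params: {
    protocolVersion: '2024-11-05', // example protocol revision
    capabilities: {},
    clientInfo: { name: 'example-client', version: '0.1.0' },
  },
};

console.log(JSON.stringify(initializeRequest));
```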

🧪 Development

Running Tests

# Run tests in the library
cd lib && bun test

# Run tests with UI
cd lib && bun run test:ui

# Run tests with coverage
cd lib && bun run test:coverage

Code Quality

# Lint and format code
cd lib && bun run lint:fix
cd lib && bun run format

# Type checking
cd lib && bun run build

Development Server

# Start development server with hot reload
cd app && bun run dev

📊 Performance

Based on 2025 benchmarks, this implementation provides:

  • 4x faster startup compared to Node.js (thanks to Bun)
  • 15x faster linting compared to ESLint (thanks to Biome)
  • 2-5x faster testing compared to Jest (thanks to Vitest)
  • Advanced document processing with LangChain's RecursiveCharacterTextSplitter
  • Modern web crawling with Puppeteer's latest anti-detection features
  • Efficient vector storage with ChromaDB's optimized similarity search

🔧 Configuration

The system is configured via config/knowledge-config.yaml:

server:
  name: "qi-knowledge-server"
  version: "1.0.0"
  port: 3000

chromadb:
  url: "http://localhost:8000"
  collectionName: "qi-knowledge-base"
  embeddingModel: "nomic-embed-text"

processing:
  chunkSize: 1000
  chunkOverlap: 200
  enableEnrichment: true
  topicExtraction: true

crawling:
  maxDepth: 3
  maxPages: 100
  respectRobotsTxt: true

Environment variables can override configuration:

  • CHROMADB_URL - ChromaDB connection URL
  • OLLAMA_URL - Ollama server URL
  • LOG_LEVEL - Logging level (debug, info, warn, error)
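
A sketch of how such overrides are commonly layered on top of file-based values (applyEnvOverrides is a hypothetical helper; the actual loader lives in app/src/config/ and may differ):

```typescript
interface KnowledgeConfig {
  chromadb: { url: string };
  logLevel: string;
}

// Hypothetical helper: environment variables win over file-based config.
function applyEnvOverrides(
  config: KnowledgeConfig,
  env: Record<string, string | undefined>
): KnowledgeConfig {
  return {
    ...config,
    chromadb: { ...config.chromadb, url: env.CHROMADB_URL ?? config.chromadb.url },
    logLevel: env.LOG_LEVEL ?? config.logLevel,
  };
}

const fileConfig: KnowledgeConfig = {
  chromadb: { url: 'http://localhost:8000' },
  logLevel: 'info',
};
const merged = applyEnvOverrides(fileConfig, { CHROMADB_URL: 'http://chromadb:8000' });
console.log(merged.chromadb.url); // http://chromadb:8000
console.log(merged.logLevel);     // info (no LOG_LEVEL override set)
```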

๐Ÿณ Docker Deployment

# Dockerfile
FROM oven/bun:1 AS base
WORKDIR /app

# Copy and install dependencies
COPY package.json bun.lockb ./
RUN bun install --frozen-lockfile

# Copy source and build
COPY . .
RUN bun run build

# Start the server
CMD ["bun", "start"]
# docker-compose.yml
version: '3.8'
services:
  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - chromadb_data:/chroma/chroma

  knowledge-server:
    build: .
    depends_on:
      - chromadb
    environment:
      - CHROMADB_URL=http://chromadb:8000
    ports:
      - "3000:3000"

volumes:
  chromadb_data:

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes using the modern tooling:
    bun run lint:fix  # Format and lint
    bun test         # Run tests
    
  4. Commit your changes: git commit -m 'Add amazing feature'
  5. Push to the branch: git push origin feature/amazing-feature
  6. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • ChromaDB team for the excellent vector database
  • LangChain team for comprehensive document processing tools
  • Anthropic for the Model Context Protocol specification
  • Oven team for the amazing Bun runtime
  • Biome team for next-generation linting and formatting
  • Vitest team for the fast and modern testing framework

Built with โค๏ธ using 2025's best JavaScript tooling