PyContextify


One-line: Semantic search server with relationship-aware discovery across codebases and documents.

PyContextify is a Python-based MCP (Model Context Protocol) server that provides intelligent semantic search capabilities over diverse knowledge sources. It combines vector similarity search with basic relationship tracking to help developers, researchers, and technical writers discover contextually relevant information across codebases and documentation.

Main Features:

  • 🔍 Semantic Search: Vector similarity with FAISS + hybrid keyword search
  • 📚 Multi-Source: Index code and documents (PDF/MD/TXT)
  • 🧠 Smart Chunking: Content-aware processing (code boundaries, document hierarchy)
  • ⚡ Pre-loaded Models: Embedders initialize at startup for fast first requests
  • 🔗 Relationship Tracking: Basic relationship extraction (tags, references, code symbols)
  • 🛠️ MCP Protocol: 5 essential functions for seamless AI assistant integration

Quickstart

# Install UV and dependencies
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync

# Run MCP server
uv run pycontextify --verbose

Run with uvx

If you want to execute the published CLI without modifying your current environment, uvx can resolve pycontextify from PyPI and run its console entry point directly:

uvx pycontextify -- --help

The double dash (--) ensures any following arguments are forwarded to PyContextify itself. This requires a released version of the package to be available on PyPI, which the manual publishing workflow now provides.
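
For example, to forward the --verbose flag from the Quickstart:

uvx pycontextify -- --verbose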

System Requirements

  • Python: 3.10 or newer for the MCP server core (the full test suite currently targets Python 3.13+).
  • Package management: Ability to install dependencies via UV and resolve all runtime libraries, including FAISS, sentence-transformers, PDF processors, and supporting utilities.
  • CPU: 64-bit multi-core processor (4+ cores recommended) so FAISS vector search and sentence-transformers embedding generation can run locally without bottlenecks.
  • Memory: 8 GB RAM minimum (16 GB recommended for larger corpora) because embeddings and FAISS indexes reside in-process and scale with corpus size; switch to the lighter all-MiniLM-L6-v2 model if constrained.
  • Network access: Internet connectivity on first run to download sentence-transformers models and other remote assets.
  • Storage & filesystem: At least 5 GB of free disk space to install Python dependencies, download embedding models, and persist FAISS indexes in PYCONTEXTIFY_INDEX_DIR, along with write access for temporary working folders during indexing and testing.
  • Optional acceleration: CUDA-capable GPU support is available by installing the optional gpu dependency group (faiss-gpu) alongside the default CPU build.
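
For example, assuming the optional dependency group is published as a gpu extra (the extra name here is an assumption based on the description above), it could be installed with either tool:

# Assumes the optional GPU group is exposed as a "gpu" extra
uv sync --extra gpu
pip install "pycontextify[gpu]"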

Table of Contents

  • Installation
  • Usage
  • Chunking Techniques
  • Configuration
  • API Reference
  • Tests & CI
  • Publishing to PyPI
  • Changelog
  • Contributing
  • Security
  • License
  • Maintainers

Installation

PyPI (recommended)

pip install pycontextify

Extras are available for specific workflows:

  • pip install "pycontextify[dev]" – testing, linting, and packaging helpers
  • pip install "pycontextify[nlp]" – optional spaCy language model
  • pip install "pycontextify[ollama]" / [openai] – alternative embedding providers

From Source with UV

Requirements: Python 3.10+ and the UV package manager

# Install UV package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install dependencies
git clone https://github.com/pycontextify/pycontextify.git
cd pycontextify
uv sync

# Optional: Install development + release tooling
uv sync --extra dev

To reinstall all dependencies from scratch, use uv sync --reinstall.

Usage

Minimal example:

# Start the MCP server
uv run pycontextify

# Index content (via MCP client/AI assistant)
# The server exposes 5 MCP functions:
# - index_filebase(path, tags) - Unified indexing for code & docs
# - discover() - List indexed tags
# - search(query, top_k=5) - Semantic search
# - reset_index(remove_files=True, confirm=False) - Clear index data
# - status() - Get system status and statistics

Expected output:

Starting PyContextify MCP Server...
Server provides 5 essential MCP functions:
  - index_filebase(path, tags): Unified filebase indexing
  - discover(): List indexed tags
  - search(query, top_k): Basic semantic search
  - reset_index(confirm=True): Clear all indexed content
  - status(): Get system status and statistics
MCP server ready and listening for requests...
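
Because indexing and search run over MCP rather than HTTP, calls come from an MCP client. The sketch below uses the official mcp Python SDK over stdio; the tool names match the list above, while the exact argument shapes (for example, tags as a list of strings) are assumptions for illustration:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the server as a stdio subprocess (same command as above).
    params = StdioServerParameters(command="uv", args=["run", "pycontextify"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Index a directory, then search it; "tags" as a list is an assumption.
            await session.call_tool("index_filebase", {"path": "./src", "tags": ["code"]})
            result = await session.call_tool("search", {"query": "index persistence", "top_k": 5})
            print(result.content)

asyncio.run(main())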

Chunking Techniques

PyContextify employs a hierarchical chunking system with specialized processors for different content types, optimizing semantic search while preserving structural integrity.

Content-Aware Chunking Strategies

Code Chunking (CodeChunker)

  • Primary Strategy: Structure-aware splitting by function/class boundaries
  • Language Support: Python, JavaScript, TypeScript, Java, C/C++, Rust, Go, and more
  • Boundary Detection: def, class, function, const, var, let, public, private, protected
  • Relationship Extraction: Functions, classes, imports, variable assignments
  • Fallback: Token-based splitting when code blocks exceed size limits

Document Chunking (DocumentChunker)

  • Primary Strategy: Markdown header hierarchy preservation (#, ##, ###)
  • Section Tracking: Maintains parent-section relationships for context
  • Content Filtering: Requires a minimum of 50 characters per meaningful chunk
  • Relationship Extraction: Links [text](url), citations [1], (Smith 2020), emphasized terms
  • Fallback: Token-based splitting when no structure is detected

Simple Chunking (SimpleChunker)

  • Fallback Strategy: Pure token-based chunking for unstructured content
  • Basic Relationships: Capitalized word extraction for entity hints
  • Universal Compatibility: Handles any text format as a last resort
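
As a concrete illustration of the fallback path, here is a minimal token-based chunker with overlap, built on the words × 1.3 heuristic described below; the function names are hypothetical and this is a sketch, not PyContextify's actual implementation:

def estimate_tokens(text: str) -> int:
    # Heuristic from this README: English text averages ~1.3 tokens per word.
    return int(len(text.split()) * 1.3)

def simple_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 64):
    # Slide a word window sized to ~chunk_size tokens; adjacent windows
    # share roughly chunk_overlap tokens, mirroring the config below.
    words = text.split()
    window = max(1, int(chunk_size / 1.3))
    step = max(1, int((chunk_size - chunk_overlap) / 1.3))
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + window])
        if chunk:
            yield chunk
        if start + window >= len(words):
            break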

Technical Configuration

chunk_size: int = 512        # Target tokens per chunk (configurable)
chunk_overlap: int = 64      # Overlap between adjacent chunks  
enable_relationships: bool   # Extract lightweight knowledge graph data
max_relationships_per_chunk: int  # Limit relationships to avoid noise

Key Features

  • Smart Selection: Automatic chunker selection via ChunkerFactory based on content type
  • Token Estimation: words × 1.3 heuristic for English text with automatic oversized chunk splitting
  • Position Tracking: Maintains precise character start/end positions for all chunks
  • Metadata Preservation: Source path, embedding info, creation timestamps, and custom metadata
  • Relationship Graph: Lightweight knowledge extraction (imports, references, citations, links)

Bottom Line: PyContextify's chunking system intelligently adapts to content structure—respecting code boundaries and document hierarchy—while maintaining configurable token limits and extracting contextual relationships for enhanced semantic search.

Configuration

Required environment variables / config:

  • PYCONTEXTIFY_EMBEDDING_MODEL — string — default: all-MiniLM-L6-v2 — Embedding model for semantic search
  • PYCONTEXTIFY_EMBEDDING_PROVIDER — string — default: sentence_transformers — Embedding provider (sentence_transformers, ollama, openai)
  • PYCONTEXTIFY_INDEX_DIR — string — default: ./index_data — Directory for storing search indices
  • PYCONTEXTIFY_AUTO_PERSIST — boolean — default: true — Automatically save after indexing
  • PYCONTEXTIFY_AUTO_LOAD — boolean — default: true — Automatically load index on startup
  • PYCONTEXTIFY_CHUNK_SIZE — integer — default: 512 — Text chunk size for processing
  • PYCONTEXTIFY_USE_HYBRID_SEARCH — boolean — default: false — Enable hybrid vector + keyword search

Priority: CLI arguments > Environment variables > Defaults

Copy .env.example to .env and customize as needed.
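
For reference, a .env populated entirely with the documented defaults looks like this (set only the values you want to change):

PYCONTEXTIFY_EMBEDDING_MODEL=all-MiniLM-L6-v2
PYCONTEXTIFY_EMBEDDING_PROVIDER=sentence_transformers
PYCONTEXTIFY_INDEX_DIR=./index_data
PYCONTEXTIFY_AUTO_PERSIST=true
PYCONTEXTIFY_AUTO_LOAD=true
PYCONTEXTIFY_CHUNK_SIZE=512
PYCONTEXTIFY_USE_HYBRID_SEARCH=false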

API Reference

PyContextify exposes 5 MCP (Model Context Protocol) functions for semantic search and indexing:

  1. index_filebase(path, tags) - Unified indexing for codebases and documents with relationship extraction
  2. discover() - List indexed tags for browsing and filtering
  3. search(query, top_k=5) - Hybrid semantic + keyword search
  4. reset_index(remove_files=True, confirm=False) - Clear index data
  5. status() - Get system statistics and health

Full docs: see the project documentation for development guidance and architecture details.

Tests & CI

Run tests:

# Run all tests with coverage (requires uv >= 0.4.20 for dependency groups)
uv run --extra dev --group dev pytest --cov=pycontextify

# Run MCP-specific tests
uv run python scripts/run_mcp_tests.py

# Quick smoke test
uv run python scripts/run_mcp_tests.py --smoke

CI: see the Tests and Coverage badges in the repository for current status.

Publishing to PyPI

Use the dedicated release checklist when preparing a public build.

Quick reference:

  1. Bump the version in pyproject.toml (use python scripts/bump_version.py [major|minor|patch] for automation, and ensure changelog coverage)

  2. Run the full test suite, or use uv run python scripts/run_mcp_tests.py --smoke for a quick check

  3. Build distributables and run metadata checks:

    python scripts/build_package.py
    
  4. Upload to TestPyPI or PyPI with Twine once validation passes:

    twine upload dist/*
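
  To validate against TestPyPI before the real upload, Twine accepts an explicit repository URL:

    twine upload --repository-url https://test.pypi.org/legacy/ dist/*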
    

Changelog

Detailed release history lives in the changelog. Update it alongside any version bump so users can track notable changes between releases.

Contributing

Please read the contributing guidelines, or follow the short flow below:

  1. Fork the project
  2. Create a branch feature/your-feature
  3. Add tests and documentation
  4. Open a pull request

Security

Please report security issues by creating an issue in this repository.

License

MIT License
This project is licensed under the MIT License; see the LICENSE file for details.

Maintainers

  • PyContextify Project: for questions or support, create an issue in the repository.