# PyContextify

One-line: Semantic search server with relationship-aware discovery across codebases and documents.
PyContextify is a Python-based MCP (Model Context Protocol) server that provides intelligent semantic search capabilities over diverse knowledge sources. It combines vector similarity search with basic relationship tracking to help developers, researchers, and technical writers discover contextually relevant information across codebases and documentation.
Main Features:
- 🔍 Semantic Search: Vector similarity with FAISS + hybrid keyword search
- 📚 Multi-Source: Index code and documents (PDF/MD/TXT)
- 🧠 Smart Chunking: Content-aware processing (code boundaries, document hierarchy)
- ⚡ Pre-loaded Models: Embedders initialize at startup for fast first requests
- 🔗 Relationship Tracking: Basic relationship extraction (tags, references, code symbols)
- 🛠️ MCP Protocol: 5 essential functions for seamless AI assistant integration
## Quickstart

```bash
# Install UV and dependencies
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync

# Run MCP server
uv run pycontextify --verbose
```
### Run with uvx

If you want to execute the published CLI without modifying your current environment, uvx can resolve pycontextify from PyPI and run its console entry point directly:

```bash
uvx pycontextify -- --help
```

The double dash (`--`) ensures any following arguments are forwarded to PyContextify itself. This requires a released version of the package to be available on PyPI, which the manual publishing workflow now provides.
## System Requirements

- Python: Python 3.10 or newer for the MCP server core (the full test suite currently targets Python 3.13+).
- Package management: Ability to install dependencies via UV and resolve all runtime libraries, including FAISS, sentence-transformers, PDF processors, and supporting utilities.
- CPU: 64-bit multi-core processor (4+ cores recommended) so FAISS vector search and sentence-transformers embedding generation can run locally without bottlenecks.
- Memory: 8 GB RAM minimum (16 GB recommended for larger corpora) because embeddings and FAISS indexes reside in-process and scale with corpus size; switch to the lighter `all-MiniLM-L6-v2` model if constrained.
- Network access: Internet connectivity on first run to download sentence-transformers models and other remote assets.
- Storage & filesystem: At least 5 GB of free disk space to install Python dependencies, download embedding models, and persist FAISS indexes in `PYCONTEXTIFY_INDEX_DIR`, along with write access for temporary working folders during indexing and testing.
- Optional acceleration: CUDA-capable GPU support is available by installing the optional `gpu` dependency group (`faiss-gpu`) alongside the default CPU build.
## Table of Contents

- Quickstart
- Installation
- Usage
- Chunking Techniques
- Configuration
- API Reference
- Tests & CI
- Changelog
- Contributing
- License
- Security
- Maintainers
## Installation

### PyPI (recommended)

```bash
pip install pycontextify
```

Extras are available for specific workflows:

- `pip install "pycontextify[dev]"` – testing, linting, and packaging helpers
- `pip install "pycontextify[nlp]"` – optional spaCy language model
- `pip install "pycontextify[ollama]"` / `[openai]` – alternative embedding providers
### From Source with UV

Requirements: Python 3.10+ and the UV package manager

```bash
# Install UV package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install dependencies
git clone https://github.com/pycontextify/pycontextify.git
cd pycontextify
uv sync

# Optional: Install development + release tooling
uv sync --extra dev
```

To reinstall dependencies from scratch, use `uv sync --reinstall`.
## Usage

Minimal example:

```bash
# Start the MCP server
uv run pycontextify

# Index content (via MCP client/AI assistant)
# The server exposes 5 MCP functions:
# - index_filebase(path, tags) - Unified indexing for code & docs
# - discover() - List indexed tags
# - search(query, top_k=5) - Semantic search
# - reset_index(remove_files=True, confirm=False) - Clear index data
# - status() - Get system status and statistics
```

Expected output:

```text
Starting PyContextify MCP Server...
Server provides 5 essential MCP functions:
- index_filebase(path, tags): Unified filebase indexing
- discover(): List indexed tags
- search(query, top_k): Basic semantic search
- reset_index(confirm=True): Clear all indexed content
- status(): Get system status and statistics
MCP server ready and listening for requests...
```
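These functions are normally invoked by an MCP-capable AI assistant, but they can also be called programmatically. Below is a minimal sketch, assuming the official `mcp` Python SDK is installed and the server runs over stdio; the tool names come from the list above, while the path, query text, and the shape of the `tags` argument are placeholder assumptions:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the server as a stdio subprocess (same command as in the Quickstart).
server = StdioServerParameters(command="uv", args=["run", "pycontextify"])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Index a directory, then search it (argument shapes are assumptions).
            await session.call_tool("index_filebase", {"path": "./src", "tags": ["code"]})
            result = await session.call_tool(
                "search", {"query": "FAISS index persistence", "top_k": 5}
            )
            print(result.content)

asyncio.run(main())
```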
## Chunking Techniques

PyContextify employs a hierarchical chunking system with specialized processors for different content types, optimizing semantic search while preserving structural integrity.
### Content-Aware Chunking Strategies

#### Code Chunking (CodeChunker)

- Primary Strategy: Structure-aware splitting by function/class boundaries (see the sketch below)
- Language Support: Python, JavaScript, TypeScript, Java, C/C++, Rust, Go, and more
- Boundary Detection: `def`, `class`, `function`, `const`, `var`, `let`, `public`, `private`, `protected`
- Relationship Extraction: Functions, classes, imports, variable assignments
- Fallback: Token-based splitting when code blocks exceed size limits
#### Document Chunking (DocumentChunker)

- Primary Strategy: Markdown header hierarchy preservation (`#`, `##`, `###`), as sketched below
- Section Tracking: Maintains parent-section relationships for context
- Content Filtering: Requires a minimum of 50 characters per meaningful chunk
- Relationship Extraction: Links `[text](url)`, citations `[1]`, `(Smith 2020)`, emphasized terms
- Fallback: Token-based splitting when no structure is detected
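A rough sketch of the header-hierarchy strategy (again a simplified illustration, not the shipped DocumentChunker): each `#`-prefixed heading starts a new chunk, and the heading path is carried along as parent-section context:

```python
def split_markdown(text: str) -> list[dict]:
    """Split on #/##/### headings, tracking the parent-section path."""
    chunks, path, buf = [], [], []

    def flush() -> None:
        body = "\n".join(buf).strip()
        if len(body) >= 50:  # minimum meaningful chunk size, as noted above
            chunks.append({"section": " > ".join(path), "text": body})
        buf.clear()

    for line in text.splitlines():
        if line.startswith("#"):
            flush()  # close the previous section's chunk
            level = len(line) - len(line.lstrip("#"))
            path[:] = path[: level - 1] + [line.lstrip("# ").strip()]
        buf.append(line)
    flush()
    return chunks
```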
#### Simple Chunking (SimpleChunker)

- Fallback Strategy: Pure token-based chunking for unstructured content
- Basic Relationships: Capitalized-word extraction for entity hints
- Universal Compatibility: Handles any text format as a last resort
### Technical Configuration

```python
chunk_size: int = 512             # Target tokens per chunk (configurable)
chunk_overlap: int = 64           # Overlap between adjacent chunks
enable_relationships: bool        # Extract lightweight knowledge graph data
max_relationships_per_chunk: int  # Limit relationships to avoid noise
```
### Key Features

- Smart Selection: Automatic chunker selection via `ChunkerFactory` based on content type
- Token Estimation: `words × 1.3` heuristic for English text, with automatic splitting of oversized chunks (see the sketch below)
- Position Tracking: Maintains precise character start/end positions for all chunks
- Metadata Preservation: Source path, embedding info, creation timestamps, and custom metadata
- Relationship Graph: Lightweight knowledge extraction (imports, references, citations, links)
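The token heuristic and the `chunk_size`/`chunk_overlap` parameters from the Technical Configuration compose roughly as follows. This is a simplified model of the described behavior, not PyContextify's internals; tokens are approximated by whitespace-separated words:

```python
def estimate_tokens(text: str) -> int:
    return int(len(text.split()) * 1.3)  # words × 1.3 heuristic for English text

def split_oversized(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Token-based fallback: fixed-size windows with overlap between neighbors."""
    words = text.split()
    step = chunk_size - chunk_overlap  # advance less than a full chunk to overlap
    return [" ".join(words[i : i + chunk_size]) for i in range(0, len(words), step)]
```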
Bottom Line: PyContextify's chunking system intelligently adapts to content structure—respecting code boundaries and document hierarchy—while maintaining configurable token limits and extracting contextual relationships for enhanced semantic search.
## Configuration

Required environment variables / config:

- `PYCONTEXTIFY_EMBEDDING_MODEL` — string — default: `all-MiniLM-L6-v2` — Embedding model for semantic search
- `PYCONTEXTIFY_EMBEDDING_PROVIDER` — string — default: `sentence_transformers` — Embedding provider (`sentence_transformers`, `ollama`, `openai`)
- `PYCONTEXTIFY_INDEX_DIR` — string — default: `./index_data` — Directory for storing search indices
- `PYCONTEXTIFY_AUTO_PERSIST` — boolean — default: `true` — Automatically save after indexing
- `PYCONTEXTIFY_AUTO_LOAD` — boolean — default: `true` — Automatically load index on startup
- `PYCONTEXTIFY_CHUNK_SIZE` — integer — default: `512` — Text chunk size for processing
- `PYCONTEXTIFY_USE_HYBRID_SEARCH` — boolean — default: `false` — Enable hybrid vector + keyword search

Priority: CLI arguments > Environment variables > Defaults

Copy `.env.example` to `.env` and customize as needed.
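As an illustration of that precedence, here is a minimal resolution sketch; the loader inside PyContextify may be structured differently, and `DEFAULTS` mirrors only a few entries from the list above:

```python
import os

# Defaults copied from the configuration list above.
DEFAULTS = {
    "PYCONTEXTIFY_EMBEDDING_MODEL": "all-MiniLM-L6-v2",
    "PYCONTEXTIFY_INDEX_DIR": "./index_data",
    "PYCONTEXTIFY_CHUNK_SIZE": "512",
}

def setting(name: str, cli_value: str | None = None) -> str:
    # CLI arguments > environment variables > defaults
    return cli_value or os.environ.get(name) or DEFAULTS[name]

print(setting("PYCONTEXTIFY_CHUNK_SIZE"))        # "512" unless overridden
print(setting("PYCONTEXTIFY_INDEX_DIR", "./x"))  # CLI value wins
```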
## API Reference

PyContextify exposes 5 MCP (Model Context Protocol) functions for semantic search and indexing:

- `index_filebase(path, tags)` - Unified indexing for codebases and documents with relationship extraction
- `discover()` - List indexed tags for browsing and filtering
- `search(query, top_k=5)` - Hybrid semantic + keyword search
- `reset_index(remove_files=True, confirm=False)` - Clear index data
- `status()` - Get system statistics and health
Full docs: see the repository's development guidance and architecture documentation.
## Tests & CI

Run tests:

```bash
# Run all tests with coverage (requires uv >= 0.4.20 for dependency groups)
uv run --extra dev --group dev pytest --cov=pycontextify

# Run MCP-specific tests
uv run python scripts/run_mcp_tests.py

# Quick smoke test
uv run python scripts/run_mcp_tests.py --smoke
```
CI: tests are currently run manually.
## Publishing to PyPI

Use the dedicated release checklist when preparing a public build.

Quick reference:
- Bump `version` in `pyproject.toml` (use `python scripts/bump_version.py [major|minor|patch]` for automation and ensure changelog coverage)
- Run the full test suite or `uv run python scripts/run_mcp_tests.py --smoke`
- Build distributables and run metadata checks: `python scripts/build_package.py`
- Upload to TestPyPI or PyPI with Twine once validation passes: `twine upload dist/*`
## Changelog

Detailed release history lives in the changelog file. Update the changelog alongside any version bump so users can track notable changes between releases.
## Contributing

Please read the contributing guidelines (or follow the short flow below):

- Fork the project
- Create a branch `feature/your-feature`
- Add tests and documentation
- Open a pull request
## Security

Please report security issues by creating an issue in this repository.
## License

This project is licensed under the MIT License — see the `LICENSE` file for details.
## Maintainers

- PyContextify Project — contact: create an issue in this repository for questions or support