perryrobinson/internal-knowledge-mcp-server
KEP Knowledge Server
A semantic search system for Kubernetes Enhancement Proposals (KEPs) that provides intelligent retrieval of KEP documents using vector embeddings and FAISS.
Overview
This project implements a complete knowledge retrieval system for Kubernetes Enhancement Proposals, consisting of:
- Document Ingestion Pipeline: Discovers, extracts, and chunks KEP documents
- Vector Store: Uses FAISS for efficient similarity search with sentence embeddings
- REST API: FastAPI-based search interface
- MCP Integration: Model Context Protocol server for Claude Code integration
Features
- 🔍 Semantic Search: Find relevant KEPs based on meaning, not just keywords
- ⚡ Fast Retrieval: FAISS-powered vector search with sub-100ms query times
- 📊 Smart Chunking: Intelligent document splitting that preserves context
- 🔌 API Interface: RESTful API for easy integration
- 🤖 Claude Integration: MCP server for seamless Claude Code usage
Project Structure
.
├── ingestion/ # Document loading, extraction, and chunking
├── vector_store/ # FAISS index and search functionality
├── api/ # FastAPI REST API
├── data/ # KEP documents (cloned from kubernetes/enhancements)
├── storage/ # FAISS index storage
└── mcp-kep-knowledge/ # MCP server for Claude integration
Installation
Prerequisites
- Python 3.11+
- Git
- Node.js 18+ (for MCP server)
Setup
- Clone this repository:
git clone <repository-url>
cd internal-knowledge-mcp-server
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- The KEP repository will be cloned automatically on first run, or clone it manually:
git clone https://github.com/kubernetes/enhancements.git data/enhancements
Usage
Starting the Knowledge Server
First-time startup (runs document ingestion and builds the index):
python start_knowledge_server.py
Subsequent startups use the same command but load the existing index instead of re-ingesting.
Force re-indexing:
python start_knowledge_server.py --reindex
The server will start on http://localhost:8000
API Endpoints
- POST /search - Search for KEPs. Example request body:

  {
    "query": "How does Kubernetes handle CRD versioning?",
    "top_k": 5,
    "min_score": 0.0
  }

- GET /health - Health check and server status
- GET /stats - Index statistics
- GET / - API information
Example Search
curl -X POST "http://localhost:8000/search" \
-H "Content-Type: application/json" \
-d '{
"query": "pod security standards",
"top_k": 5
}'
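The same search can be issued from Python. The sketch below uses only the standard library to build the request; the shape of the JSON response is not documented here, so inspect what your server actually returns.

```python
# Sketch of a Python client for POST /search, using only the standard
# library. Field names in the payload follow the example request body
# above; the response shape is an assumption, not documented here.
import json
import urllib.request

API_URL = "http://localhost:8000/search"

def build_search_request(query, top_k=5, min_score=0.0):
    """Build a POST /search request carrying a JSON payload."""
    payload = json.dumps(
        {"query": query, "top_k": top_k, "min_score": min_score}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("pod security standards", top_k=5)
# With the server running: results = json.load(urllib.request.urlopen(req))
```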
Architecture
- Ingestion: KEP documents are discovered from the data/enhancements directory
- Chunking: Documents are split into semantic chunks with overlap
- Embedding: Text chunks are converted to 384-dimensional vectors using SentenceTransformers
- Indexing: Vectors are stored in a FAISS index for fast similarity search
- Search: Queries are embedded and matched against the index using cosine similarity
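The indexing and search steps above can be sketched with a NumPy stand-in for FAISS: an inner-product index over L2-normalized vectors is equivalent to cosine-similarity search. The random 384-dimensional vectors below are a placeholder for real SentenceTransformers embeddings; this is an illustration, not the project's actual implementation.

```python
import numpy as np

def build_index(embeddings):
    # FAISS's IndexFlatIP over L2-normalized vectors computes cosine
    # similarity; this NumPy stand-in normalizes rows the same way.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms

def search(index, query_vec, top_k=5):
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q                    # cosine similarity per chunk
    top = np.argsort(-scores)[:top_k]     # indices of best matches first
    return [(int(i), float(scores[i])) for i in top]

# Toy 384-dimensional "embeddings" standing in for SentenceTransformers output.
rng = np.random.default_rng(0)
chunks = rng.normal(size=(10, 384))
index = build_index(chunks)
results = search(index, chunks[3], top_k=3)  # query with chunk 3's own vector
```

Querying with a stored chunk's own vector returns that chunk first with a cosine score of 1.0, which is a quick sanity check for any cosine-similarity index.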
Configuration
Key configuration options (defined in config.py):
- EMBEDDING_MODEL: Default is "all-MiniLM-L6-v2"
- CHUNK_SIZE: Default is 512 characters
- CHUNK_OVERLAP: Default is 64 characters
- API_PORT: Default is 8000
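A minimal sketch of character-based chunking with these defaults follows; the real pipeline's "smart chunking" may additionally respect sentence or section boundaries, which this sketch does not.

```python
# Sketch of overlapping character chunking using the defaults above.
# Each chunk is at most CHUNK_SIZE characters, and consecutive chunks
# share CHUNK_OVERLAP characters so context at the boundary survives.
CHUNK_SIZE = 512
CHUNK_OVERLAP = 64

def chunk_text(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    step = size - overlap  # each new chunk starts 448 chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # last chunk reached the end of the text
    return chunks

# Deterministic sample text: 1000 characters cycling through A..Z.
text = "".join(chr(65 + i % 26) for i in range(1000))
parts = chunk_text(text)
```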
Development Status
This project is currently under active development. The roadmap below tracks implementation status.
Completed
- ✅ Phase 1: Project setup and dependencies
In Progress
- 🔄 Phase 2: Configuration system
- 🔄 Phase 3: Document ingestion pipeline
- 🔄 Phase 4: Vector store with FAISS
- 🔄 Phase 5: REST API
- 🔄 Phase 6: Main orchestration
Planned
- Phase 7: Testing & validation
- Phase 8: Docker support
- Phase 9: MCP server integration
- Phase 10: Documentation & cleanup
License
This project is for educational and development purposes.
Contributing
This is an internal development project. See the Development Status section above for the implementation roadmap.