Athenaeum
Give your LLM a library.
A RAG (Retrieval-Augmented Generation) system built with LlamaIndex and FastAPI that provides a REST API for document retrieval and question answering.
Features
- Markdown-Focused: Optimized for indexing markdown documents with structure-aware parsing
- Vector Search: FAISS-backed vector search using HuggingFace embeddings
- REST API: Three endpoints for different use cases (search, answer, chat with tool calling)
- CLI Tools: Build indices, query, and run the server
- AWS Lambda Deployment: Serverless deployment with CDK, OAuth authentication, and S3 index storage
- Reusable CDK Constructs: L3 constructs for dependencies layer and API server deployment
- Well-Tested: Comprehensive test suite with 12 passing tests
- Clean Architecture: Logical separation between indexing, retrieval, API, and CLI layers
Installation
From PyPI
pip install athenaeum
# With deployment extras for AWS CDK
pip install "athenaeum[deploy]"
From Source
This project uses uv for package management and requires Python 3.12+.
# Clone the repository
git clone https://github.com/matthewhanson/athenaeum.git
cd athenaeum
# Install dependencies (uv will automatically create/use .venv)
uv sync
# Or install in development mode
uv pip install -e ".[dev]"
Usage
CLI Commands
All commands are available via the athenaeum CLI:
# Show version
uv run athenaeum --version
# Get help
uv run athenaeum --help
Build an Index
# Basic indexing (defaults to *.md files)
uv run athenaeum index ./your_markdown_docs --output ./index
# Custom embedding model and chunk settings
uv run athenaeum index ./docs \
--output ./index \
--embed-model "sentence-transformers/all-MiniLM-L6-v2" \
--chunk-size 1024 \
--chunk-overlap 200
# Exclude specific patterns
uv run athenaeum index ./docs \
--output ./index \
--exclude "**/.git/**" "**/__pycache__/**"
Query the Index
# Basic query
uv run athenaeum query "What is the main topic?" --output ./index
# With more context
uv run athenaeum query "Explain the key concepts" \
--output ./index \
--top-k 10 \
--sources
Run the API Server
# Start server with default settings
uv run athenaeum serve --index ./index
# Custom host and port
uv run athenaeum serve --index ./index --host 0.0.0.0 --port 8000
# With auto-reload for development
uv run athenaeum serve --index ./index --reload
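Once the server is up, a quick smoke test from Python using only the standard library (a minimal sketch, assuming the server is listening on localhost:8000 as in the example above):

import json
import urllib.request

BASE_URL = "http://localhost:8000"  # adjust to match your --host/--port

# Hit the health endpoint; a running server returns {"status": "ok"}
with urllib.request.urlopen(f"{BASE_URL}/health") as resp:
    print(json.load(resp))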
AWS Lambda Deployment
Athenaeum provides example deployment configurations for AWS Lambda using Docker container images (required for PyTorch + ML dependencies).
Two Deployment Approaches
1. Application-Specific Deployment (Recommended)
Your application has its own Dockerfile that:
- Installs athenaeum as a dependency (from PyPI)
- Copies your application-specific index into the container image
- Configures application-specific settings
This is the recommended approach. See examples/deployment/README.md for:
- Complete Dockerfile template
- Step-by-step customization guide
- CDK deployment example
- Production best practices
Benefits:
- Index baked into Docker image (no S3 download latency)
- Simpler architecture (no S3 bucket needed)
- Faster cold starts
- Easier to version and deploy
2. Example Template Deployment
Athenaeum includes complete example deployment files in examples/deployment/:
- Dockerfile - Reference implementation
- requirements.txt - Lambda dependencies
- run.sh - Lambda Web Adapter startup script
- .dockerignore - Build optimization
Use the template:
# Copy the template to your project
cp -r athenaeum/examples/deployment/* my-project/
# Customize for your needs:
# - Add your index: COPY index/ /var/task/index
# - Update requirements if needed
# - Modify environment variables
Quick Start with CDK
from aws_cdk import Stack, CfnOutput, Duration
from athenaeum.infra import APIServerContainerConstruct
import os

class MyStack(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        server = APIServerContainerConstruct(
            self, "Server",
            dockerfile_path="./Dockerfile",  # Your Dockerfile
            docker_build_context=".",        # Build from current dir
            index_path=None,                 # Index baked into image
            environment={
                "OPENAI_API_KEY": os.environ["OPENAI_API_KEY"],
            },
            memory_size=2048,                # 2GB for ML workloads
            timeout=Duration.minutes(5),
        )

        CfnOutput(self, "ApiUrl", value=server.api_url)
Deploy:
export OPENAI_API_KEY=sk-...
cdk deploy
Deployment Architecture
Container Image Approach:
- Lambda function with Docker container (up to 10GB)
- Index baked into image at /var/task/index
- FastAPI + Lambda Web Adapter for HTTP handling
- API Gateway REST API with CORS
- CloudWatch Logs for monitoring
Resource Limits:
- Docker image: 10GB uncompressed, 10GB compressed in ECR
- Lambda memory: 128MB - 10GB (recommend 2GB for ML)
- Lambda storage: /tmp up to 10GB (ephemeral)
- Timeout: Up to 15 minutes (recommend 5 minutes)
Cost Estimate: ~$1-2/month for 10K requests with 2GB memory and 10MB index
Complete guides:
- examples/deployment/README.md - Deployment template and instructions
- examples/README.md - Examples overview
API Server
The server provides clean HTTP endpoints for RAG operations:
Endpoints
GET /
Landing page with API documentation
Response:
{
"service": "Athenaeum API Server",
"version": "0.1.0",
"endpoints": {
"/health": "Health check",
"/models": "List available models",
"/search": "Search for context chunks",
"/answer": "Single-search RAG answer",
"/chat": "Chat with tool calling (multi-search)"
}
}
GET /health
Health check endpoint
Response:
{"status": "ok"}
GET /models
List available retrieval models
Response:
{
"object": "list",
"data": [
{
"id": "athenaeum-index-retrieval",
"object": "model",
"created": 1234567890,
"owned_by": "athenaeum"
}
]
}
POST /search
Search for context chunks matching a query
Request:
{
"query": "What are the key concepts?",
"limit": 5
}
Response:
{
"object": "list",
"data": [
{
"id": "doc1.txt",
"content": "Context chunk content...",
"metadata": {
"path": "doc1.txt",
"score": 0.95
}
}
],
"model": "athenaeum-index-retrieval"
}
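For example, calling /search from Python with only the standard library (a sketch, assuming the server runs on localhost:8000):

import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8000/search",
    data=json.dumps({"query": "What are the key concepts?", "limit": 5}).encode(),
    headers={"Content-Type": "application/json"},
)
# Print the score and source id of each returned chunk
with urllib.request.urlopen(req) as resp:
    for chunk in json.load(resp)["data"]:
        print(chunk["metadata"]["score"], chunk["id"])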
POST /chat
Generate an answer using RAG with multi-search tool calling
Request:
{
"messages": [
{"role": "user", "content": "What are the main topics?"}
],
"model": "athenaeum-index-retrieval"
}
Response:
{
"id": "chat-abc123",
"object": "chat.completion",
"created": 1234567890,
"model": "athenaeum-index-retrieval",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The main topics are..."
},
"finish_reason": "stop"
}
]
}
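The request and response shapes follow the familiar chat-completions convention, so a plain HTTP call is enough; here is a minimal sketch (again assuming localhost:8000):

import json
import urllib.request

payload = {
    "messages": [{"role": "user", "content": "What are the main topics?"}],
    "model": "athenaeum-index-retrieval",
}
req = urllib.request.Request(
    "http://localhost:8000/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Extract the assistant's answer from the first choice
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])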
Project Structure
The codebase is organized by concern with clear separation between indexing, retrieval, and interface layers:
src/athenaeum/
├── utils.py # Shared utilities (~22 lines)
│ └── setup_settings() - Configure LlamaIndex with MarkdownNodeParser
│
├── indexer.py # Markdown indexing (~169 lines)
│ ├── build_index() - PUBLIC API - Build FAISS index from markdown
│ └── _validate_paths(), _build_document_reader(), etc. - Private helpers
│
├── retriever.py # Query & retrieval (~109 lines)
│ ├── query_index() - PUBLIC API - Query with answer generation
│ ├── retrieve_context() - PUBLIC API - Retrieve context chunks
│ └── _load_index_storage() - Private helper
│
├── api_server.py # FastAPI REST API server (~400 lines)
│ ├── GET / - Landing page with API docs
│ ├── GET /health - Health check
│ ├── GET /models - List models
│ ├── POST /search - Search for context (raw vector search)
│ ├── POST /answer - Single-search RAG (quick answers)
│ └── POST /chat - Multi-search with tool calling (interactive)
│
└── main_cli.py # Typer CLI (~160 lines)
├── index - Build markdown index
├── query - Query index
└── serve - Launch API server
tests/
├── test_utils.py # Test shared utilities
├── test_indexer.py # Test indexing functions
├── test_retriever.py # Test retrieval functions
├── test_api_server.py # Test all API endpoints
└── test_cli.py # Test CLI commands
Design Principles
- Markdown-First: Uses LlamaIndex's MarkdownNodeParser for structure-aware chunking
- Separation of Concerns: Indexing (indexer.py) vs Retrieval (retriever.py)
- Minimal Public API: Internal helpers prefixed with _
- Thin Interface Layers: CLI and API delegate to business logic
- No Duplication: Only truly shared code in utils.py
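Since build_index(), query_index(), and retrieve_context() form the public API, library use looks roughly like the following. This is a sketch only; the keyword argument names are assumptions inferred from the CLI flags above, not verified signatures:

from athenaeum.indexer import build_index
from athenaeum.retriever import query_index, retrieve_context

# Build a FAISS index from a directory of markdown files
# (argument names assumed to mirror the CLI: --output, --chunk-size, --chunk-overlap)
build_index("./docs", output="./index", chunk_size=1024, chunk_overlap=200)

# Query with answer generation (mirrors: athenaeum query ... --top-k)
answer = query_index("What is the main topic?", output="./index", top_k=5)
print(answer)

# Or retrieve raw context chunks without answer generation
for chunk in retrieve_context("key concepts", output="./index", limit=5):
    print(chunk)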
Key Dependencies
- LlamaIndex: Vector search, MarkdownNodeParser, and RAG orchestration
- FastAPI: HTTP API server
- FAISS: Efficient vector storage and similarity search
- HuggingFace Transformers: Local embeddings (all-MiniLM-L6-v2)
- Typer: CLI framework
- Pydantic: Data validation for API
Deployment Dependencies (optional)
- AWS CDK: Infrastructure as code for Lambda deployment
- Lambda Web Adapter: AWS's official adapter for running web apps on Lambda
- python-jose: JWT/OAuth token validation
Environment Variables
# Optional: Override default LLM for answer generation
export OPENAI_MODEL="gpt-4o-mini"
# For API server (set automatically by CLI)
export ATHENAEUM_INDEX_DIR="/path/to/index"
# Optional: Custom system prompt for chat/answer endpoints
export CHAT_SYSTEM_PROMPT="You are a helpful assistant..."
Markdown Indexing
Athenaeum uses LlamaIndex's MarkdownNodeParser for structure-aware chunking that respects:
- Heading hierarchy
- Code blocks
- Tables
- Blockquotes
Default chunk settings:
- Size: 1024 characters (~200 words)
- Overlap: 200 characters
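To see the structure-aware splitting in isolation, you can run the parser directly. A sketch against recent llama-index releases (import paths vary between versions):

from llama_index.core import Document
from llama_index.core.node_parser import MarkdownNodeParser

doc = Document(text="# Title\n\nIntro paragraph.\n\n## Section\n\nBody text.")
nodes = MarkdownNodeParser().get_nodes_from_documents([doc])
for node in nodes:
    print(repr(node.text))  # one node per heading-delimited section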
See the repository documentation for detailed guidance on optimizing markdown documents for RAG.
Contributing
We welcome contributions! See CONTRIBUTING.md for:
- Development setup and workflow
- Running tests and code quality checks
- Code style guidelines
- Pull request process
License
MIT