
Athenaeum

Give your LLM a library.

A RAG (Retrieval-Augmented Generation) system built with LlamaIndex and FastAPI that provides a REST API for document retrieval and question answering.

Features

  • Markdown-Focused: Optimized for indexing markdown documents with structure-aware parsing
  • Vector Search: FAISS-backed vector search using HuggingFace embeddings
  • REST API: Three endpoints for different use cases (search, answer, chat with tool calling)
  • CLI Tools: Build indices, query, and run the server
  • AWS Lambda Deployment: Serverless deployment with CDK, OAuth authentication, and S3 index storage
  • Reusable CDK Constructs: L3 constructs for dependencies layer and API server deployment
  • Well-Tested: Test suite covering the indexing, retrieval, API, and CLI layers (12 passing tests)
  • Clean Architecture: Logical separation between indexing, retrieval, API, and CLI layers

Installation

From PyPI

pip install athenaeum

# With deployment extras for AWS CDK
pip install "athenaeum[deploy]"

From Source

This project uses uv for package management and requires Python 3.12 or later.

# Clone the repository
git clone https://github.com/matthewhanson/athenaeum.git
cd athenaeum

# Install dependencies (uv will automatically create/use .venv)
uv sync

# Or install in development mode
uv pip install -e ".[dev]"

Usage

CLI Commands

All commands are available via the athenaeum CLI:

# Show version
uv run athenaeum --version

# Get help
uv run athenaeum --help

Build an Index

# Basic indexing (defaults to *.md files)
uv run athenaeum index ./your_markdown_docs --output ./index

# Custom embedding model and chunk settings
uv run athenaeum index ./docs \
  --output ./index \
  --embed-model "sentence-transformers/all-MiniLM-L6-v2" \
  --chunk-size 1024 \
  --chunk-overlap 200

# Exclude specific patterns
uv run athenaeum index ./docs \
  --output ./index \
  --exclude "**/.git/**" "**/__pycache__/**"

Query the Index

# Basic query
uv run athenaeum query "What is the main topic?" --output ./index

# With more context
uv run athenaeum query "Explain the key concepts" \
  --output ./index \
  --top-k 10 \
  --sources

Run the API Server

# Start server with default settings
uv run athenaeum serve --index ./index

# Custom host and port
uv run athenaeum serve --index ./index --host 0.0.0.0 --port 8000

# With auto-reload for development
uv run athenaeum serve --index ./index --reload
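
Once the server is running, you can sanity-check it from Python. A minimal sketch, assuming the server listens on localhost:8000 and the requests package is installed:

import requests

# Hit the /health endpoint documented below
resp = requests.get("http://127.0.0.1:8000/health")
resp.raise_for_status()
print(resp.json())  # {"status": "ok"}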

AWS Lambda Deployment

Athenaeum provides example deployment configurations for AWS Lambda using Docker container images (required for PyTorch + ML dependencies).

Two Deployment Approaches

1. Application-Specific Deployment (Recommended)

Your application has its own Dockerfile that:

  • Installs athenaeum as a dependency (from PyPI)
  • Copies your application-specific index into the container image
  • Configures application-specific settings

This is the recommended approach. See examples/deployment/README.md for:

  • Complete Dockerfile template
  • Step-by-step customization guide
  • CDK deployment example
  • Production best practices

Benefits:

  • Index baked into Docker image (no S3 download latency)
  • Simpler architecture (no S3 bucket needed)
  • Faster cold starts
  • Easier to version and deploy

2. Example Template Deployment

Athenaeum includes complete example deployment files in examples/deployment/:

  • Dockerfile - Reference implementation
  • requirements.txt - Lambda dependencies
  • run.sh - Lambda Web Adapter startup script
  • .dockerignore - Build optimization

Use the template:

# Copy the template to your project
cp -r athenaeum/examples/deployment/* my-project/

# Customize for your needs:
# - Add your index: COPY index/ /var/task/index
# - Update requirements if needed
# - Modify environment variables

Quick Start with CDK

import os

from aws_cdk import CfnOutput, Duration, Stack
from athenaeum.infra import APIServerContainerConstruct

class MyStack(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        server = APIServerContainerConstruct(
            self, "Server",
            dockerfile_path="./Dockerfile",  # Your Dockerfile
            docker_build_context=".",        # Build from current dir
            index_path=None,                 # Index baked into image
            environment={
                "OPENAI_API_KEY": os.environ["OPENAI_API_KEY"],
            },
            memory_size=2048,  # 2GB for ML workloads
            timeout=Duration.minutes(5),
        )

        CfnOutput(self, "ApiUrl", value=server.api_url)

Deploy:

export OPENAI_API_KEY=sk-...
cdk deploy
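
cdk deploy expects a CDK app entry point (commonly app.py, referenced from cdk.json). A minimal sketch; the module name my_stack and the stack id AthenaeumStack are illustrative assumptions:

import aws_cdk as cdk

from my_stack import MyStack  # hypothetical module holding the MyStack class above

app = cdk.App()
MyStack(app, "AthenaeumStack")  # illustrative stack id
app.synth()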

Deployment Architecture

Container Image Approach:

  • Lambda function with Docker container (up to 10GB)
  • Index baked into image at /var/task/index
  • FastAPI + Lambda Web Adapter for HTTP handling
  • API Gateway REST API with CORS
  • CloudWatch Logs for monitoring

Resource Limits:

  • Docker image: up to 10GB (Lambda container image limit)
  • Lambda memory: 128MB - 10GB (recommend 2GB for ML)
  • Lambda storage: /tmp up to 10GB (ephemeral)
  • Timeout: Up to 15 minutes (recommend 5 minutes)

Cost Estimate: ~$1-2/month for 10K requests with 2GB memory and 10MB index

Complete guides:

  • examples/deployment/README.md - Deployment template and instructions
  • examples/ - Examples overview

API Server

The server provides clean HTTP endpoints for RAG operations:

Endpoints

GET /

Landing page with API documentation

Response:

{
  "service": "Athenaeum API Server",
  "version": "0.1.0",
  "endpoints": {
    "/health": "Health check",
    "/models": "List available models",
    "/search": "Search for context chunks",
    "/answer": "Single-search RAG answer",
    "/chat": "Chat with tool calling (multi-search)"
  }
}

GET /health

Health check endpoint

Response:

{"status": "ok"}

GET /models

List available retrieval models

Response:

{
  "object": "list",
  "data": [
    {
      "id": "athenaeum-index-retrieval",
      "object": "model",
      "created": 1234567890,
      "owned_by": "athenaeum"
    }
  ]
}

POST /search

Search for context chunks matching a query

Request:

{
  "query": "What are the key concepts?",
  "limit": 5
}

Response:

{
  "object": "list",
  "data": [
    {
      "id": "doc1.txt",
      "content": "Context chunk content...",
      "metadata": {
        "path": "doc1.txt",
        "score": 0.95
      }
    }
  ],
  "model": "athenaeum-index-retrieval"
}
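
For illustration, a minimal Python client for this endpoint, mirroring the request and response shapes above (assuming the server runs on localhost:8000 and requests is installed):

import requests

payload = {"query": "What are the key concepts?", "limit": 5}
resp = requests.post("http://127.0.0.1:8000/search", json=payload)

# Each item in "data" carries the chunk content plus path/score metadata
for chunk in resp.json()["data"]:
    print(chunk["metadata"]["score"], chunk["id"])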

POST /chat

Generate an answer using RAG with tool calling; the model may issue multiple searches. For single-search answers, use POST /answer (listed on the landing page above).

Request:

{
  "messages": [
    {"role": "user", "content": "What are the main topics?"}
  ],
  "model": "athenaeum-index-retrieval"
}

Response:

{
  "id": "chat-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "athenaeum-index-retrieval",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The main topics are..."
      },
      "finish_reason": "stop"
    }
  ]
}
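
And the corresponding sketch for /chat, following the OpenAI-style shapes above (same localhost:8000 assumption):

import requests

payload = {
    "messages": [{"role": "user", "content": "What are the main topics?"}],
    "model": "athenaeum-index-retrieval",
}
resp = requests.post("http://127.0.0.1:8000/chat", json=payload)

# The answer lives in the first choice, as in the response documented above
print(resp.json()["choices"][0]["message"]["content"])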

Project Structure

The codebase is organized by concern with clear separation between indexing, retrieval, and interface layers:

src/athenaeum/
├── utils.py              # Shared utilities (~22 lines)
│   └── setup_settings() - Configure LlamaIndex with MarkdownNodeParser
├── indexer.py            # Markdown indexing (~169 lines)
│   ├── build_index() - PUBLIC API - Build FAISS index from markdown
│   └── _validate_paths(), _build_document_reader(), etc. - Private helpers
├── retriever.py          # Query & retrieval (~109 lines)
│   ├── query_index() - PUBLIC API - Query with answer generation
│   ├── retrieve_context() - PUBLIC API - Retrieve context chunks
│   └── _load_index_storage() - Private helper
├── api_server.py         # FastAPI REST API server (~400 lines)
│   ├── GET /            - Landing page with API docs
│   ├── GET /health      - Health check
│   ├── GET /models      - List models
│   ├── POST /search     - Search for context (raw vector search)
│   ├── POST /answer     - Single-search RAG (quick answers)
│   └── POST /chat       - Multi-search with tool calling (interactive)
└── main_cli.py           # Typer CLI (~160 lines)
    ├── index            - Build markdown index
    ├── query            - Query index
    └── serve            - Launch API server

tests/
├── test_utils.py         # Test shared utilities
├── test_indexer.py       # Test indexing functions
├── test_retriever.py     # Test retrieval functions
├── test_api_server.py    # Test all API endpoints
└── test_cli.py           # Test CLI commands
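
The public functions named in the tree above can also be called directly as a library. A sketch only: the parameter names below are assumptions inferred from the CLI flags, so check the source for exact signatures:

# Hypothetical library usage; parameter names are assumptions, not verified signatures
from athenaeum.indexer import build_index
from athenaeum.retriever import query_index, retrieve_context

build_index("./docs", output="./index", chunk_size=1024, chunk_overlap=200)

answer = query_index("What is the main topic?", output="./index", top_k=5)
chunks = retrieve_context("key concepts", output="./index")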

Design Principles

  1. Markdown-First: Uses LlamaIndex's MarkdownNodeParser for structure-aware chunking
  2. Separation of Concerns: Indexing (indexer.py) vs Retrieval (retriever.py)
  3. Minimal Public API: Internal helpers prefixed with _
  4. Thin Interface Layers: CLI and API delegate to business logic
  5. No Duplication: Only truly shared code in utils.py

Key Dependencies

  • LlamaIndex: Vector search, MarkdownNodeParser, and RAG orchestration
  • FastAPI: HTTP API server
  • FAISS: Efficient vector storage and similarity search
  • HuggingFace Transformers: Local embeddings (all-MiniLM-L6-v2)
  • Typer: CLI framework
  • Pydantic: Data validation for API

Deployment Dependencies (optional)

  • AWS CDK: Infrastructure as code for Lambda deployment
  • Lambda Web Adapter: AWS's official adapter for running web apps on Lambda
  • python-jose: JWT/OAuth token validation

Environment Variables

# Optional: Override default LLM for answer generation
export OPENAI_MODEL="gpt-4o-mini"

# For API server (set automatically by CLI)
export ATHENAEUM_INDEX_DIR="/path/to/index"

# Optional: Custom system prompt for chat/answer endpoints
export CHAT_SYSTEM_PROMPT="You are a helpful assistant..."

Markdown Indexing

Athenaeum uses LlamaIndex's MarkdownNodeParser for structure-aware chunking that respects:

  • Heading hierarchy
  • Code blocks
  • Tables
  • Blockquotes

Default chunk settings:

  • Size: 1024 characters (~200 words)
  • Overlap: 200 characters

See the project documentation for detailed guidance on optimizing markdown documents for RAG.
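
To see what structure-aware chunking looks like in isolation, here is a minimal sketch using LlamaIndex's public MarkdownNodeParser directly (independent of Athenaeum's own configuration):

from llama_index.core import Document
from llama_index.core.node_parser import MarkdownNodeParser

doc = Document(text="# Title\n\nIntro paragraph.\n\n## Section\n\nDetails here.")
parser = MarkdownNodeParser()
nodes = parser.get_nodes_from_documents([doc])

# Each node corresponds to a markdown section; heading context is kept in metadata
for node in nodes:
    print(node.metadata, node.text[:40])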

Contributing

We welcome contributions! See the contributing guide for:

  • Development setup and workflow
  • Running tests and code quality checks
  • Code style guidelines
  • Pull request process

License

MIT