Athenaeum
Give your LLM a library.
A RAG (Retrieval-Augmented Generation) system built with LlamaIndex and FastAPI that provides a REST API for document retrieval and question answering.
Features
- Markdown-Focused: Optimized for indexing markdown documents with structure-aware parsing
- Vector Search: FAISS-backed vector search using HuggingFace embeddings
- REST API: Three endpoints for different use cases (search, answer, chat with tool calling)
- CLI Tools: Build indices, query, and run the server
- AWS Lambda Deployment: Serverless deployment with CDK, OAuth authentication, and S3 index storage
- Reusable CDK Constructs: L3 constructs for dependencies layer and API server deployment
- Well-Tested: Comprehensive test suite with 12 passing tests
- Clean Architecture: Logical separation between indexing, retrieval, API, and CLI layers
Installation
From PyPI
pip install athenaeum
# With deployment extras for AWS CDK
pip install "athenaeum[deploy]"
From Source
This project uses uv for package management and requires Python 3.12+.
# Clone the repository
git clone https://github.com/matthewhanson/athenaeum.git
cd athenaeum
# Install dependencies (uv will automatically create/use .venv)
uv sync
# Or install in development mode
uv pip install -e ".[dev]"
Usage
CLI Commands
All commands are available via the athenaeum CLI:
# Show version
uv run athenaeum --version
# Get help
uv run athenaeum --help
Build an Index
# Basic indexing (defaults to *.md files)
uv run athenaeum index ./your_markdown_docs --output ./index
# Custom embedding model and chunk settings
uv run athenaeum index ./docs \
--output ./index \
--embed-model "sentence-transformers/all-MiniLM-L6-v2" \
--chunk-size 1024 \
--chunk-overlap 200
# Exclude specific patterns
uv run athenaeum index ./docs \
--output ./index \
--exclude "**/.git/**" "**/__pycache__/**"
Query the Index
# Basic query
uv run athenaeum query "What is the main topic?" --output ./index
# With more context
uv run athenaeum query "Explain the key concepts" \
--output ./index \
--top-k 10 \
--sources
Run the API Server
# Start server with default settings
uv run athenaeum serve --index ./index
# Custom host and port
uv run athenaeum serve --index ./index --host 0.0.0.0 --port 8000
# With auto-reload for development
uv run athenaeum serve --index ./index --reload
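Once the server is up, a quick smoke test from Python using only the standard library (a minimal sketch, assuming the server is listening on localhost:8000 as in the example above):

import json
import urllib.request

BASE_URL = "http://localhost:8000"  # adjust to match your --host/--port

# Hit the health endpoint; a running server returns {"status": "ok"}
with urllib.request.urlopen(f"{BASE_URL}/health") as resp:
    print(json.load(resp))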
AWS Lambda Deployment
Athenaeum provides example deployment configurations for AWS Lambda using Docker container images (required for PyTorch + ML dependencies).
Two Deployment Approaches
1. Application-Specific Deployment (Recommended)
Your application has its own Dockerfile that:
- Installs athenaeum as a dependency (from PyPI)
- Copies your application-specific index into the container image
- Configures application-specific settings
This is the recommended approach. See examples/deployment/README.md for:
- Complete Dockerfile template
- Step-by-step customization guide
- CDK deployment example
- Production best practices
Benefits:
- Index baked into Docker image (no S3 download latency)
- Simpler architecture (no S3 bucket needed)
- Faster cold starts
- Easier to version and deploy
2. Example Template Deployment
Athenaeum includes complete example deployment files in examples/deployment/:
- Dockerfile - Reference implementation
- requirements.txt - Lambda dependencies
- run.sh - Lambda Web Adapter startup script
- .dockerignore - Build optimization
Use the template:
# Copy the template to your project
cp -r athenaeum/examples/deployment/* my-project/
# Customize for your needs:
# - Add your index: COPY index/ /var/task/index
# - Update requirements if needed
# - Modify environment variables
Quick Start with CDK
from aws_cdk import Stack, CfnOutput, Duration
from athenaeum.infra import APIServerContainerConstruct
import os

class MyStack(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        server = APIServerContainerConstruct(
            self, "Server",
            dockerfile_path="./Dockerfile",  # Your Dockerfile
            docker_build_context=".",        # Build from current dir
            index_path=None,                 # Index baked into image
            environment={
                "OPENAI_API_KEY": os.environ["OPENAI_API_KEY"],
            },
            memory_size=2048,                # 2GB for ML workloads
            timeout=Duration.minutes(5),
        )

        CfnOutput(self, "ApiUrl", value=server.api_url)
Deploy:
export OPENAI_API_KEY=sk-...
cdk deploy
Deployment Architecture
Container Image Approach:
- Lambda function with Docker container (up to 10GB)
- Index baked into image at /var/task/index
- FastAPI + Lambda Web Adapter for HTTP handling
- API Gateway REST API with CORS
- CloudWatch Logs for monitoring
Resource Limits:
- Docker image: 10GB uncompressed, 10GB compressed in ECR
- Lambda memory: 128MB - 10GB (recommend 2GB for ML)
- Lambda storage: /tmp up to 10GB (ephemeral)
- Timeout: Up to 15 minutes (recommend 5 minutes)
Cost Estimate: ~$1-2/month for 10K requests with 2GB memory and 10MB index
Complete guides:
- examples/deployment/README.md - Deployment template and instructions
- examples/README.md - Examples overview
API Server
The server provides clean HTTP endpoints for RAG operations:
Endpoints
GET /
Landing page with API documentation
Response:
{
"service": "Athenaeum API Server",
"version": "0.1.0",
"endpoints": {
"/health": "Health check",
"/models": "List available models",
"/search": "Search for context chunks",
"/answer": "Single-search RAG answer",
"/chat": "Chat with tool calling (multi-search)"
}
}
GET /health
Health check endpoint
Response:
{"status": "ok"}
GET /models
List available retrieval models
Response:
{
"object": "list",
"data": [
{
"id": "athenaeum-index-retrieval",
"object": "model",
"created": 1234567890,
"owned_by": "athenaeum"
}
]
}
POST /search
Search for context chunks matching a query
Request:
{
"query": "What are the key concepts?",
"limit": 5
}
Response:
{
"object": "list",
"data": [
{
"id": "doc1.txt",
"content": "Context chunk content...",
"metadata": {
"path": "doc1.txt",
"score": 0.95
}
}
],
"model": "athenaeum-index-retrieval"
}
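For example, calling /search from Python with only the standard library (a sketch, assuming the server runs on localhost:8000):

import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8000/search",
    data=json.dumps({"query": "What are the key concepts?", "limit": 5}).encode(),
    headers={"Content-Type": "application/json"},
)
# Print the score and source id of each returned chunk
with urllib.request.urlopen(req) as resp:
    for chunk in json.load(resp)["data"]:
        print(chunk["metadata"]["score"], chunk["id"])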
POST /chat
Generate an answer using RAG with multi-search tool calling
Request:
{
"messages": [
{"role": "user", "content": "What are the main topics?"}
],
"model": "athenaeum-index-retrieval"
}
Response:
{
"id": "chat-abc123",
"object": "chat.completion",
"created": 1234567890,
"model": "athenaeum-index-retrieval",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The main topics are..."
},
"finish_reason": "stop"
}
]
}
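The request and response shapes follow the familiar chat-completions convention, so a plain HTTP call is enough; here is a minimal sketch (again assuming localhost:8000):

import json
import urllib.request

payload = {
    "messages": [{"role": "user", "content": "What are the main topics?"}],
    "model": "athenaeum-index-retrieval",
}
req = urllib.request.Request(
    "http://localhost:8000/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Extract the assistant's answer from the first choice
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])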
Project Structure
The codebase is organized by concern with clear separation between indexing, retrieval, and interface layers:
src/athenaeum/
├── utils.py # Shared utilities (~22 lines)
│ └── setup_settings() - Configure LlamaIndex with MarkdownNodeParser
│
├── indexer.py # Markdown indexing (~169 lines)
│ ├── build_index() - PUBLIC API - Build FAISS index from markdown
│ └── _validate_paths(), _build_document_reader(), etc. - Private helpers
│
├── retriever.py # Query & retrieval (~109 lines)
│ ├── query_index() - PUBLIC API - Query with answer generation
│ ├── retrieve_context() - PUBLIC API - Retrieve context chunks
│ └── _load_index_storage() - Private helper
│
├── api_server.py # FastAPI REST API server (~400 lines)
│ ├── GET / - Landing page with API docs
│ ├── GET /health - Health check
│ ├── GET /models - List models
│ ├── POST /search - Search for context (raw vector search)
│ ├── POST /answer - Single-search RAG (quick answers)
│ └── POST /chat - Multi-search with tool calling (interactive)
│
└── main_cli.py # Typer CLI (~160 lines)
├── index - Build markdown index
├── query - Query index
└── serve - Launch API server
tests/
├── test_utils.py # Test shared utilities
├── test_indexer.py # Test indexing functions
├── test_retriever.py # Test retrieval functions
├── test_api_server.py # Test all API endpoints
└── test_cli.py # Test CLI commands
Design Principles
- Markdown-First: Uses LlamaIndex's MarkdownNodeParser for structure-aware chunking
- Separation of Concerns: Indexing (indexer.py) vs Retrieval (retriever.py)
- Minimal Public API: Internal helpers prefixed with _
- Thin Interface Layers: CLI and API delegate to business logic
- No Duplication: Only truly shared code in utils.py
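Since build_index(), query_index(), and retrieve_context() form the public API, library use looks roughly like the following. This is a sketch only; the keyword argument names are assumptions inferred from the CLI flags above, not verified signatures:

from athenaeum.indexer import build_index
from athenaeum.retriever import query_index, retrieve_context

# Build a FAISS index from a directory of markdown files
# (argument names assumed to mirror the CLI: --output, --chunk-size, --chunk-overlap)
build_index("./docs", output="./index", chunk_size=1024, chunk_overlap=200)

# Query with answer generation (mirrors: athenaeum query ... --top-k)
answer = query_index("What is the main topic?", output="./index", top_k=5)
print(answer)

# Or retrieve raw context chunks without answer generation
for chunk in retrieve_context("key concepts", output="./index", limit=5):
    print(chunk)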
Key Dependencies
- LlamaIndex: Vector search, MarkdownNodeParser, and RAG orchestration
- FastAPI: HTTP API server
- FAISS: Efficient vector storage and similarity search
- HuggingFace Transformers: Local embeddings (all-MiniLM-L6-v2)
- Typer: CLI framework
- Pydantic: Data validation for API
Deployment Dependencies (optional)
- AWS CDK: Infrastructure as code for Lambda deployment
- Lambda Web Adapter: AWS's official adapter for running web apps on Lambda
- python-jose: JWT/OAuth token validation
Environment Variables
# Optional: Override default LLM for answer generation
export OPENAI_MODEL="gpt-4o-mini"
# For API server (set automatically by CLI)
export ATHENAEUM_INDEX_DIR="/path/to/index"
# Optional: Custom system prompt for chat/answer endpoints
export CHAT_SYSTEM_PROMPT="You are a helpful assistant..."
Markdown Indexing
Athenaeum uses LlamaIndex's MarkdownNodeParser for structure-aware chunking that respects:
- Heading hierarchy
- Code blocks
- Tables
- Blockquotes
Default chunk settings:
- Size: 1024 characters (~200 words)
- Overlap: 200 characters
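To see the structure-aware splitting in isolation, you can run the parser directly. A sketch against recent llama-index releases (import paths vary between versions):

from llama_index.core import Document
from llama_index.core.node_parser import MarkdownNodeParser

doc = Document(text="# Title\n\nIntro paragraph.\n\n## Section\n\nBody text.")
nodes = MarkdownNodeParser().get_nodes_from_documents([doc])
for node in nodes:
    print(repr(node.text))  # one node per heading-delimited section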
See the repository documentation for detailed guidance on optimizing markdown documents for RAG.
Contributing
We welcome contributions! See CONTRIBUTING.md for:
- Development setup and workflow
- Running tests and code quality checks
- Code style guidelines
- Pull request process
License
MIT