disler/nano-agent
The Nano Agent MCP Server is designed to facilitate small-scale engineering tasks by leveraging multiple LLM providers, allowing for performance, speed, and cost comparisons.
Nano Agent
Watch how we used GPT-5 and Claude Code with nano-agents here.
What? An MCP Server for experimental, small-scale engineering agents with multi-provider LLM support.
Why? To test and compare Agentic Capabilities of Cloud and Local LLMs across Performance, Speed, and Cost.
"It's not about a single prompt call anymore. It's about how well your agent chains together multiple tools to accomplish real engineering results on your behalf." - From our evaluation
See It In Action
Multi-Model Evaluation Flow - Watch 9 models (GPT-5, Claude Opus, Local GPT-OSS) running in parallel on the same M4 Max:
Model Comparison: GPT-5 vs Local Models - Surprising results: GPT-OSS 20B/120B running on-device with $0.00 cost:
Key Findings from Our Testing
- Surprising Winners: GPT-5 Nano/Mini often outperform larger models when factoring in speed and cost
- Local Revolution: GPT-OSS 20B/120B models complete real agentic coding tasks on M4 Max (128GB RAM)
- Cost Reality Check: Claude Opus 4.1 is extraordinarily expensive - performance isn't everything
- The Trade-off Triangle: Performance vs Speed vs Cost - you don't always need the most expensive model
Installation
Quick Install (Recommended)
- Install Astral UV
- Setup Claude Code
- Setup Ollama
- Get your OpenAI API key and Anthropic API key
- Setup dotenv
cp ./.env.sample ./.env and fill out variables
cp ./apps/nano_agent_mcp_server/.env.sample ./apps/nano_agent_mcp_server/.env and fill out variables
- Clone the repository
git clone https://github.com/disler/nano-agent
- Global Install nano-agent to expose it for Claude Code (any MCP client)
cd nano-agent/apps/nano_agent_mcp_server
./scripts/install.sh
uv tool install -e .
- cp .mcp.json.sample to .mcp.json to use nano-agent
- You should end up with a .mcp.json file that looks like this:
{
"mcpServers": {
"nano-agent": {
"command": "nano-agent",
"args": []
}
}
}
- You can also test without installing nano-agent globally by running it from this directory with:
{
"mcpServers": {
"nano-agent": {
"command": "uv",
"args": ["--directory", "apps/nano_agent_mcp_server", "run", "nano-agent"]
}
}
}
Now you can follow the Nano Agent Interaction section below to test out the nano agent.
Nano Agent Interaction
There are three ways to interact with the nano agent.
- Nano Agent Through the CLI (uv run nano-cli run) - Great for understanding agent capabilities
- Nano Agent Through Claude Code or any MCP client (.mcp.json or equivalent configuration) - Great for delegating work and scaling up compute in the field
- Nano Agent Through the Higher Order Prompt (HOP) and Lower Order Prompt (LOP) pattern - Great for testing and comparing models across providers
Through the CLI
Remember: when running directly, your current directory is wherever you run uv run nano-cli run from.
cd apps/nano_agent_mcp_server
# Test tools without API
uv run nano-cli test-tools
# Run with different models (provider auto-detected from model name)
uv run nano-cli run "List all Python files in the current directory" # gpt-5-mini (default)
uv run nano-cli run "Create a hello world script in python" --model gpt-5-nano
uv run nano-cli run "Summarize the README.md" --model gpt-5
# Test Anthropic models (requires ANTHROPIC_API_KEY)
uv run nano-cli run "Hello" --model claude-3-haiku-20240307 --provider anthropic
uv run nano-cli run "Hello" --model claude-sonnet-4-20250514 --provider anthropic
uv run nano-cli run "Hello" --model claude-opus-4-20250514 --provider anthropic
uv run nano-cli run "Hello" --model claude-opus-4-1-20250805 --provider anthropic
# Test local Ollama models (requires ollama service) (be sure to install the model first with `ollama pull gpt-oss:20b`)
uv run nano-cli run "List files" --model gpt-oss:20b --provider ollama
uv run nano-cli run "List files and count the total number of files and directories" --model gpt-oss:120b --provider ollama
# Verbose mode (shows token usage)
uv run nano-cli run "Create and edit a test file" --verbose
Through Claude Code
Call the MCP server directly
mcp nano-agent: prompt_nano_agent "Create a hello world script in python" --model gpt-5
mcp nano-agent: prompt_nano_agent "Summarize the README.md" --model claude-opus-4-1-20250805 --provider anthropic
mcp nano-agent: prompt_nano_agent "Read the first 10 lines and last 10 lines of the README.md" --verbose
etc...
Call the MCP server through a sub-agent
@agent-nano-agent-gpt-5-mini "Create a hello world script in python"
@agent-nano-agent-gpt-5 "Summarize the <file name>"
@agent-nano-agent-claude-opus-4-1 "<insert agentic prompt here>"
@agent-nano-agent-gpt-oss-20b "<insert agentic prompt here>"
@agent-nano-agent-gpt-oss-120b "<insert agentic prompt here>"
@agent-nano-agent-claude-sonnet-4 "<insert agentic prompt here>"
@agent-nano-agent-claude-3-haiku "<insert agentic prompt here>"
Through the Higher Order Prompt (HOP) and Lower Order Prompt (LOP) pattern
In Claude Code call
/perf:hop_evaluate_nano_agents .claude/commands/perf/lop_eval_1__dummy_test.md
/perf:hop_evaluate_nano_agents .claude/commands/perf/lop_eval_2__basic_read_test.md
/perf:hop_evaluate_nano_agents .claude/commands/perf/lop_eval_3__file_operations_test.md
/perf:hop_evaluate_nano_agents .claude/commands/perf/lop_eval_4__code_analysis_test.md
/perf:hop_evaluate_nano_agents .claude/commands/perf/lop_eval_5__complex_engineering_test.md
Understanding HOP/LOP: How It Works
The HOP/LOP pattern enables systematic parallel evaluation of multiple models:
- HOP (Higher Order Prompt): The orchestrator that reads test files, delegates to agents in parallel, and grades results
- LOP (Lower Order Prompt): Individual test definitions with prompts, expected outputs, and grading rubrics
- Execution Flow: HOP → reads LOP → calls 9 agents simultaneously → collects results → generates comparison tables
Example: When you run /perf:hop_evaluate_nano_agents lop_eval_3__file_operations_test.md:
- HOP reads the test specification from the LOP file
- Extracts the prompt and list of agents to test
- Executes all agents in parallel (GPT-5, Claude, Local models)
- Each agent runs in isolation via the nano-agent MCP server
- Results are graded on Performance, Speed, and Cost
- Output shows ranked comparison with surprising results (e.g., Claude-3-haiku often beats expensive models)
This architecture ensures fair comparison by using the same OpenAI Agent SDK for all providers, creating a true apples-to-apples benchmark.
Features
- Multi-Provider Support: Seamlessly switch between OpenAI (GPT-5), Anthropic (Claude), and Ollama (local models)
- File System Operations: Read, write, edit, and analyze files autonomously
- Nested Agent Architecture: MCP server spawns internal agents for task execution
- Unified Interface: All providers use the same OpenAI SDK for consistency
- Experiment Ready: Decent testing, error handling, and token tracking
- Easy Integration: Works with Claude Desktop or as a CLI
Nano-Agent Tools
Feel free to add/remove/improve tools as you see fit.
Nano-Agent tools are stored in nano_agent_tools.py.
Tools are:
- read_file - Read file contents
- list_directory - List directory contents (defaults to current working directory)
- write_file - Create or overwrite files
- get_file_info - Get file metadata (size, dates, type)
- edit_file - Edit files by replacing exact text matches
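For orientation, a tool in that module is just a plain Python function the inner agent can call. Below is a minimal, hypothetical sketch in the spirit of read_file; it is not the project's actual implementation (see nano_agent_tools.py for that):

```python
from pathlib import Path


def read_file(file_path: str) -> str:
    """Hypothetical sketch of a read_file-style tool.

    Resolves the path relative to the current working directory and returns the
    file contents, or an error string the agent can reason about.
    """
    path = Path(file_path).expanduser()
    if not path.is_file():
        return f"Error: '{file_path}' does not exist or is not a file"
    try:
        return path.read_text(encoding="utf-8")
    except OSError as exc:
        return f"Error reading '{file_path}': {exc}"
```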
Project Structure
nano-agent/
├── apps/                                        # ⚠️ ALL APPLICATION CODE GOES HERE
│   └── nano_agent_mcp_server/                   # Main MCP server application
│       ├── src/                                 # Source code
│       │   └── nano_agent/                      # Main package
│       │       ├── modules/                     # Core modules
│       │       │   ├── constants.py             # Model/provider constants & defaults
│       │       │   ├── data_types.py            # Pydantic models & type definitions
│       │       │   ├── files.py                 # File system operations
│       │       │   ├── nano_agent.py            # Main agent execution logic
│       │       │   ├── nano_agent_tools.py      # Internal agent tool implementations
│       │       │   ├── provider_config.py       # Multi-provider configuration
│       │       │   ├── token_tracking.py        # Token usage & cost tracking
│       │       │   └── typing_fix.py            # Type compatibility fixes
│       │       ├── __main__.py                  # MCP server entry point
│       │       └── cli.py                       # CLI interface (nano-cli)
│       ├── tests/                               # Test suite
│       │   ├── nano_agent/                      # Unit tests
│       │   └── isolated/                        # Provider integration tests
│       ├── scripts/                             # Installation & utility scripts
│       ├── pyproject.toml                       # Project configuration & dependencies
│       ├── uv.lock                              # Locked dependency versions
│       └── .env.sample                          # Environment variables template
├── .claude/                                     # Claude Code configuration
│   ├── agents/                                  # Sub-agent configurations (9 models)
│   │   ├── nano-agent-gpt-5-nano.md             # OpenAI GPT-5 Nano
│   │   ├── nano-agent-gpt-5-mini.md             # OpenAI GPT-5 Mini (default)
│   │   ├── nano-agent-gpt-5.md                  # OpenAI GPT-5
│   │   ├── nano-agent-claude-opus-4-1.md        # Claude Opus 4.1
│   │   ├── nano-agent-claude-opus-4.md          # Claude Opus 4
│   │   ├── nano-agent-claude-sonnet-4.md        # Claude Sonnet 4
│   │   ├── nano-agent-claude-3-haiku.md         # Claude 3 Haiku
│   │   ├── nano-agent-gpt-oss-20b.md            # Ollama 20B model
│   │   ├── nano-agent-gpt-oss-120b.md           # Ollama 120B model
│   │   └── hello-world.md                       # Simple greeting agent
│   ├── commands/                                # Claude Code commands
│   │   ├── perf/                                # Performance evaluation commands
│   │   │   ├── hop_evaluate_nano_agents.md      # Higher Order Prompt orchestrator
│   │   │   ├── lop_eval_1__dummy_test.md        # Simple Q&A test
│   │   │   ├── lop_eval_2__basic_read_test.md   # File reading test
│   │   │   ├── lop_eval_3__file_operations_test.md      # Complex I/O test
│   │   │   ├── lop_eval_4__code_analysis_test.md        # Code understanding
│   │   │   └── lop_eval_5__complex_engineering_test.md  # Full project test
│   │   ├── convert_paths_absolute.md            # Convert to absolute paths
│   │   ├── convert_paths_relative.md            # Convert to relative paths
│   │   ├── create_worktree.md                   # Git worktree management
│   │   ├── plan.md                              # Planning template
│   │   ├── prime.md                             # Codebase understanding
│   │   └── build.md                             # Build commands
│   ├── hooks/                                   # Development hooks
│   ├── settings.json                            # Portable settings (relative paths)
│   └── settings.local.json                      # Local settings (absolute paths)
├── eval_results_1_dummy_test.md                 # Q&A test benchmark results
├── eval_results_2_basic_read_test.md            # File reading benchmark results
├── eval_results_3_file_operations_test.md       # I/O benchmark results
├── eval_results_4_code_analysis_test.md         # Code analysis benchmark results
├── eval_results_5_complex_engineering_test.md   # Project creation benchmark results
├── images/                                      # Documentation images
│   └── nano-agent.png                           # Project logo/diagram
├── app_docs/                                    # Application-specific documentation
├── ai_docs/                                     # AI/LLM documentation & guides
│   ├── python_uv_mcp_server_cookbook.md         # MCP server development guide
│   ├── openai_agent_sdk_*.md                    # OpenAI SDK documentation
│   ├── anthropic_openai_compat.md               # Anthropic compatibility guide
│   ├── ollama_openai_compat.md                  # Ollama compatibility guide
│   └── new_openai_gpt_models.md                 # GPT-5 model specifications
└── specs/                                       # Technical specifications
Development Guidelines
Prerequisites
- Python 3.12+ (required for proper typing support)
- uv package manager
- OpenAI API key (for GPT-5 model tests)
Development Setup
cd apps/nano_agent_mcp_server
uv sync --extra test # Include test dependencies
Claude Code Hook Configuration
If you're using Claude Code to work on this codebase, the project includes hooks for enhanced development experience. The hooks use relative paths by default for portability.
To activate hooks with absolute paths for your local environment:
- Run this command in Claude Code: /convert_paths_absolute.md
- This converts the relative paths in .claude/settings.local.json to your machine's absolute paths, updating all hook paths
- A backup is automatically created at .claude/settings.json.backup
Note: The hooks are optional but provide useful features like:
- Pre/post tool use notifications
- Session tracking
- Event logging for debugging
For production use, see Installation section above.
UV Dependency Management
When working with UV and optional dependencies:
- uv sync - Installs only the main dependencies (mcp, typer, rich)
- uv sync --extra test - Installs main + test dependencies (includes pytest, openai, etc.)
- uv sync --all-extras - Installs main + all optional dependency groups
- uv pip list - Shows all installed packages in the virtual environment
Important: Always use --extra test when you need to run tests, as uv sync alone will remove test dependencies.
Configuration
- Copy the environment template:
cp .env.sample .env
- Add your OpenAI API key:
echo "OPENAI_API_KEY=sk-your-key-here" >> .env
Running the Server
cd apps/nano_agent_mcp_server
uv run nano-agent --help
The server communicates via stdin/stdout using the MCP protocol.
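For a rough mental model of that entry point, here is a minimal sketch of a single-tool stdio MCP server, assuming the FastMCP helper from the mcp Python package; the actual server code lives in __main__.py and nano_agent.py and may differ:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("nano-agent")


@mcp.tool()
async def prompt_nano_agent(prompt: str, model: str = "gpt-5-mini", provider: str = "openai") -> dict:
    """Sketch of the single exposed tool: the real server spawns an inner
    agent that works on the prompt and returns structured results."""
    return {"success": True, "result": f"Would run '{prompt}' with {provider}/{model}"}


if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport, i.e. stdin/stdout
```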
Nano Agent Architecture
Nested Agent Hierarchy
Key Concept: This is a nested agent system with two distinct agent layers.
┌────────────────────────────────────────────────────────────┐
│ OUTER AGENT (e.g., Claude Code, any MCP client)            │
│ • Communicates via MCP protocol                            │
│ • Sees ONE tool: prompt_nano_agent                         │
│ • Sends natural language prompts to nano-agent             │
└────────────────────────────────────────────────────────────┘
                              │
                              │ MCP Protocol
                              ▼
┌────────────────────────────────────────────────────────────┐
│ NANO-AGENT MCP SERVER (apps/nano_agent_mcp_server)         │
│ • Exposes SINGLE MCP tool: prompt_nano_agent               │
│ • Receives prompts from outer agent                        │
│ • Spawns internal OpenAI agent to handle request           │
└────────────────────────────────────────────────────────────┘
                              │
                              │ Creates & Manages
                              ▼
┌────────────────────────────────────────────────────────────┐
│ INNER AGENT (OpenAI GPT with function calling)             │
│ • Created fresh for each prompt_nano_agent call            │
│ • Has its OWN tools (not visible to outer agent):          │
│   - read_file: Read file contents                          │
│   - list_directory: List directory contents                │
│   - write_file: Create/overwrite files                     │
│   - get_file_info: Get file metadata                       │
│ • Runs autonomous loop (max 20 turns)                      │
│ • Returns final result to MCP server → outer agent         │
└────────────────────────────────────────────────────────────┘
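As a hedged sketch of how such an inner agent can be assembled with the OpenAI Agent SDK (assuming the openai-agents Python package; the project's real wiring lives in nano_agent.py and may differ):

```python
from agents import Agent, Runner, function_tool


@function_tool
def list_directory(path: str = ".") -> str:
    """Illustrative tool: list the contents of a directory for the inner agent."""
    import os
    return "\n".join(sorted(os.listdir(path)))


inner_agent = Agent(
    name="nano-agent-inner",  # illustrative name
    instructions="Complete the engineering task using your file tools.",
    model="gpt-5-mini",
    tools=[list_directory],  # the real agent also gets read_file, write_file, edit_file, get_file_info
)

# One autonomous run per prompt_nano_agent call, capped at 20 turns
result = Runner.run_sync(inner_agent, "List all Python files in the current directory", max_turns=20)
print(result.final_output)
```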
Validation & Testing
Unit Tests (Real API Calls)
# Run all integration tests
uv run pytest tests/ -v
# Test specific functionality
uv run pytest tests/nano_agent/modules/test_nano_agent.py::TestExecuteNanoAgent -v
# Quick validation
uv run pytest -k "test_execute_nano_agent_success" -v
CLI Validation
# Validate tools work (no API needed)
uv run nano-cli test-tools
# Quick agent test
export OPENAI_API_KEY=sk-your-key
uv run nano-cli run "What is 2+2?" # Uses DEFAULT_MODEL
Multi-Provider Support
The nano agent supports multiple LLM providers through a unified interface using the OpenAI SDK. All providers are accessed through OpenAI-compatible endpoints, providing a consistent API.
Available Providers & Models
Feel free to add/remove providers and models as you see fit.
OpenAI (Default)
- Models: gpt-5, gpt-5-mini (default), gpt-5-nano, gpt-4o
- Requirements: OPENAI_API_KEY environment variable
- Special Features:
  - GPT-5 models use max_completion_tokens instead of max_tokens
  - GPT-5 models only support temperature=1
  - Extended context windows (400K tokens)
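As a minimal sketch of that parameter difference (using the standard openai Python client, not code from this repo):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# GPT-5 family: pass max_completion_tokens (not max_tokens) and leave temperature at its default of 1
response = client.chat.completions.create(
    model="gpt-5-mini",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    max_completion_tokens=100,
)
print(response.choices[0].message.content)
```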
Anthropic
- Models: claude-opus-4-1-20250805, claude-opus-4-20250514, claude-sonnet-4-20250514, claude-3-haiku-20240307
- Requirements: ANTHROPIC_API_KEY environment variable
- Implementation: Uses Anthropic's OpenAI-compatible endpoint
- Base URL: https://api.anthropic.com/v1/
Ollama (Local Models)
- Models: gpt-oss:20b, gpt-oss:120b, or any model you've pulled locally
- Requirements: Ollama service running locally
- Implementation: Uses Ollama's OpenAI-compatible API
- Base URL: http://localhost:11434/v1
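Because each provider is reached through an OpenAI-compatible endpoint, switching providers is mostly a matter of changing the base URL and API key. Here is a hedged sketch using the openai Python client with the base URLs listed above; the client setup is illustrative, not the project's provider_config.py:

```python
import os
from openai import OpenAI

# OpenAI: default endpoint
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Anthropic: OpenAI-compatible endpoint
anthropic_client = OpenAI(
    api_key=os.environ["ANTHROPIC_API_KEY"],
    base_url="https://api.anthropic.com/v1/",
)

# Ollama: local OpenAI-compatible API (the key is unused but the client requires one)
ollama_client = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")

resp = ollama_client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "List files"}],
)
print(resp.choices[0].message.content)
```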
Using Different Providers
CLI Usage
# OpenAI (default)
uv run nano-cli run "Create a hello world script"
# Use specific OpenAI model
uv run nano-cli run "Analyze this code" --model gpt-5 --provider openai
# Anthropic
uv run nano-cli run "Write a test file" --model claude-3-haiku-20240307 --provider anthropic
# Ollama (local)
uv run nano-cli run "List files" --model gpt-oss:20b --provider ollama
Multi-Model Evaluation System
The nano-agent includes a sophisticated multi-layer evaluation system for comparing LLM performance across different providers and models. This creates a level playing field for benchmarking by using the same execution environment (OpenAI Agent SDK) regardless of the underlying provider.
"Don't trust any individual benchmark. You need to crack open the hood of all these models and say, where is the true value?" - Engineering is all about trade-offs.
The Bread and Butter: HOP/LOP Pattern
The evaluation system's core innovation is the HOP/LOP (Higher Order Prompt / Lower Order Prompt) pattern, which creates a hierarchical orchestration system for parallel model testing:
┌────────────────────────────────────────────────────────────┐
│ 1. HIGHER ORDER PROMPT (HOP)                               │
│ File: .claude/commands/perf/hop_evaluate_nano_agents.md    │
│ • Orchestrates entire evaluation process                   │
│ • Accepts test case files as $ARGUMENTS                    │
│ • Formats and grades results                               │
│ • Generates performance comparison tables                  │
└────────────────────────────────────────────────────────────┘
                              │
                              │ Reads & Executes
                              ▼
┌────────────────────────────────────────────────────────────┐
│ 2. LOWER ORDER PROMPT (LOP)                                │
│ Files: .claude/commands/perf/lop_eval_*.md                 │
│ • Defines test cases (prompts to evaluate)                 │
│ • Lists agents to test (@agent-nano-agent-*)               │
│ • Specifies expected outputs                               │
│ • Provides grading rubrics                                 │
└────────────────────────────────────────────────────────────┘
                              │
                              │ @agent References
                              ▼
┌────────────────────────────────────────────────────────────┐
│ 3. CLAUDE CODE SUB-AGENTS                                  │
│ Files: .claude/agents/nano-agent-*.md                      │
│ • Individual agent configurations                          │
│ • Each specifies model + provider combination              │
│ • Color-coded by model family:                             │
│   - green: GPT-5 series (nano, mini, standard)             │
│   - blue: GPT-OSS series (20b, 120b)                       │
│   - purple: Claude 4 Opus models                           │
│   - orange: Claude 4 Sonnet & Claude 3 Haiku               │
└────────────────────────────────────────────────────────────┘
                              │
                              │ Calls MCP Server
                              ▼
┌────────────────────────────────────────────────────────────┐
│ 4. NANO-AGENT MCP SERVER                                   │
│ Function: prompt_nano_agent(prompt, model, provider)       │
│ • Creates isolated agent instance per request              │
│ • Uses OpenAI Agent SDK for ALL providers                  │
│ • Ensures consistent execution environment                 │
│ • Returns structured results with metrics                  │
└────────────────────────────────────────────────────────────┘
Why This Architecture?
- Fair Comparison: All models use the same OpenAI Agent SDK, eliminating implementation differences
- Parallel Execution: Agents run simultaneously, reducing temporal variations
- Structured Metrics: Consistent tracking of time, tokens, and costs across all providers (a rough sketch of such a record follows this list)
- Extensibility: Easy to add new models, providers, or test cases
- Visual Hierarchy: Color-coded agents make results easy to scan in Claude Code
- Reproducibility: Same prompts and execution environment ensure consistent benchmarks
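Building on the structured-metrics point above, here is a rough sketch of what a per-run metrics record could look like; field names and example prices are hypothetical, and the project's real tracking lives in token_tracking.py:

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Hypothetical per-run record; the real tracking lives in token_tracking.py."""
    model: str
    execution_seconds: float
    input_tokens: int
    output_tokens: int
    input_price_per_mtok: float   # assumed USD per million input tokens
    output_price_per_mtok: float  # assumed USD per million output tokens

    @property
    def total_cost(self) -> float:
        return (self.input_tokens * self.input_price_per_mtok
                + self.output_tokens * self.output_price_per_mtok) / 1_000_000


# Compare two hypothetical runs on cost alone
runs = [
    RunMetrics("gpt-5-mini", 12.4, 8_000, 1_200, 0.25, 2.00),
    RunMetrics("claude-opus-4-1", 18.9, 8_000, 1_100, 15.00, 75.00),
]
for r in sorted(runs, key=lambda r: r.total_cost):
    print(f"{r.model}: ${r.total_cost:.4f} in {r.execution_seconds}s")
```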
License
MIT
Master AI Coding
And prepare for Agentic Engineering
Learn to code with AI using the foundational Principles of AI Coding
Follow the IndyDevDan YouTube channel for more AI coding tips and tricks.