locallama-mcp
LocalLama MCP Server optimizes costs by routing coding tasks between local LLMs and paid APIs.
LocalLama MCP Server is designed to reduce token usage and costs by dynamically deciding whether to offload a coding task to a local, less capable instruct LLM (e.g., LM Studio, Ollama) or send it to a paid API. It features a cost and token monitoring module, a decision engine, API integration, fallback mechanisms, and a benchmarking system. The server integrates with OpenRouter to access a variety of free and paid models, and provides a configuration interface for local instances along with robust error handling strategies. It is particularly useful for developers who want to balance cost and performance when working with AI models.
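To make the configuration interface concrete, here is a rough TypeScript sketch of what wiring up local endpoints and OpenRouter might look like. The field names, environment-variable names, and token threshold are assumptions for illustration rather than the server's documented settings; the LM Studio and Ollama URLs are simply those tools' usual local defaults.

```typescript
// Illustrative configuration shape only; names and defaults are assumptions,
// not the server's documented settings.
interface LocalLamaConfig {
  lmStudioEndpoint: string;  // LM Studio's OpenAI-compatible local server
  ollamaEndpoint: string;    // Ollama's local REST API
  openRouterApiKey: string;  // key used for OpenRouter's free and paid models
  tokenThreshold: number;    // estimated task size above which the paid API is considered
}

const config: LocalLamaConfig = {
  lmStudioEndpoint: process.env.LM_STUDIO_ENDPOINT ?? "http://localhost:1234/v1",
  ollamaEndpoint: process.env.OLLAMA_ENDPOINT ?? "http://localhost:11434",
  openRouterApiKey: process.env.OPENROUTER_API_KEY ?? "",
  tokenThreshold: Number(process.env.TOKEN_THRESHOLD ?? 1500),
};
```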
Features
- Cost & Token Monitoring Module: Monitors API usage, costs, and token prices to inform decisions.
- Decision Engine: Compares costs and quality trade-offs to decide between local and paid API usage (see the sketch after this list).
- API Integration & Configurability: Configures endpoints for local LLMs and integrates with OpenRouter.
- Fallback & Error Handling: Implements robust logging and error handling strategies.
- Benchmarking System: Compares local LLMs against paid APIs and generates detailed reports.
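As a rough illustration of the cost/quality trade-off the decision engine weighs, the sketch below routes a task based on its estimated token cost and complexity. The thresholds, field names, and routing rule are all hypothetical, not the server's actual implementation.

```typescript
// Hypothetical sketch of a cost/quality routing decision; values are illustrative.
interface TaskEstimate {
  promptTokens: number;         // estimated tokens in the coding task prompt
  expectedOutputTokens: number; // estimated tokens in the response
  complexity: number;           // 0..1, rough estimate of task difficulty
}

interface PaidModelPricing {
  promptCostPer1k: number;      // USD per 1K prompt tokens
  completionCostPer1k: number;  // USD per 1K completion tokens
}

function routeTask(task: TaskEstimate, pricing: PaidModelPricing): "local" | "paid" {
  const paidCost =
    (task.promptTokens / 1000) * pricing.promptCostPer1k +
    (task.expectedOutputTokens / 1000) * pricing.completionCostPer1k;

  // Simple tasks stay local; hard tasks go to the paid API unless the cost is negligible.
  if (task.complexity < 0.4) return "local";
  if (paidCost < 0.001) return "paid";
  return task.complexity > 0.7 ? "paid" : "local";
}

// Example: a medium-complexity refactoring task.
console.log(routeTask(
  { promptTokens: 1200, expectedOutputTokens: 800, complexity: 0.5 },
  { promptCostPer1k: 0.0005, completionCostPer1k: 0.0015 }
));
```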
Tools
clear_openrouter_tracking
Clear OpenRouter tracking data and force an update.
benchmark_free_models
Benchmark the performance of OpenRouter's free models.
get_free_models
Retrieve the list of free models from OpenRouter.
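Since these tools are exposed over the Model Context Protocol, they can be invoked from any MCP client. The sketch below uses the TypeScript SDK (@modelcontextprotocol/sdk) over stdio; the command and path are placeholders for however you run the server locally, and the exact client API shape may vary with SDK version.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the server as a subprocess; the command/args are placeholders for a local build.
const transport = new StdioClientTransport({
  command: "node",
  args: ["dist/index.js"],
});

const client = new Client({ name: "example-client", version: "1.0.0" }, { capabilities: {} });
await client.connect(transport);

// Discover the tools the server exposes, then call one of them.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name)); // e.g. ["clear_openrouter_tracking", "benchmark_free_models", "get_free_models"]

const result = await client.callTool({ name: "get_free_models", arguments: {} });
console.log(result);

await client.close();
```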