# Enhanced Local LLM Proxy MCP Server with LlamaIndex.TS

A powerful TypeScript-based MCP (Model Context Protocol) server that enhances local LLM capabilities with agentic behavior, RAG (Retrieval-Augmented Generation), and tool integration using LlamaIndex.TS. It implements speculative decoding with a local LLM, prioritizing local model responses and falling back to other models when necessary.

This server is compatible with Cursor and similar IDEs that support MCP client definitions like Cursor's `mcp.json`. Its goal is to take load (and budget) off the more powerful cloud-based LLMs in Cursor and similar IDEs, using them either to validate the locally generated answers or as a fallback system.

The project also aims to equip your LM Studio local agent with agentic tools such as RAG, memory graphs, and math calculations, so that the accuracy of locally generated answers increases further and the likelihood of needing the more expensive cloud-native models decreases.
## Architecture & Workflow

```mermaid
graph TB
    %% User and IDE Layer
    User[User] --> Cursor[Cursor IDE]
    Cursor --> MCPClient[MCP Client]

    %% MCP Communication Layer
    MCPClient --> MCPServer[Local LLM Proxy MCP Server]
    MCPServer --> AgenticService[Agentic Service]
    MCPServer --> RAGService[RAG Service]

    %% Agentic Capabilities
    AgenticService --> MathTool[Math Tool]
    AgenticService --> FileSystemTool[File System Tool]
    AgenticService --> RAGTool[RAG Tool]
    AgenticService --> SonarTool[Sonar API Tool]

    %% RAG Persistence Layer
    RAGService --> VectorIndex[Vector Store Index]
    RAGService --> DocumentStorage[Document Storage]
    DocumentStorage --> DiskStorage[(Persistent Storage)]

    %% LM Studio Integration
    AgenticService --> LLMConfig[LLM Configuration]
    RAGService --> LLMConfig
    LLMConfig --> LMStudio[LM Studio Server]
    LMStudio --> LocalModel[Local LLM Model]

    %% Embedding Integration
    RAGService --> EmbeddingModel[HuggingFace Embeddings]

    %% External API Integration
    SonarTool --> SonarAPI[Perplexity Sonar API]
    SonarAPI --> RealTimeData[Real-time Information]

    %% Data Flow
    User -->|"1. Query/Request"| Cursor
    Cursor -->|"2. MCP Protocol"| MCPClient
    MCPClient -->|"3. Tool Call"| MCPServer
    MCPServer -->|"4a. Agentic Processing"| AgenticService
    MCPServer -->|"4b. RAG Query"| RAGService
    AgenticService -->|"5a. Tool Selection"| MathTool
    AgenticService -->|"5b. Tool Selection"| FileSystemTool
    AgenticService -->|"5c. Tool Selection"| RAGTool
    AgenticService -->|"5d. Web Search Query"| SonarTool
    RAGService -->|"6a. Document Indexing"| VectorIndex
    RAGService -->|"6b. Auto-Save"| DocumentStorage
    DocumentStorage -->|"7. Persist to Disk"| DiskStorage
    VectorIndex -->|"8. Query Processing"| EmbeddingModel
    EmbeddingModel -->|"9. Vector Search"| VectorIndex
    SonarTool -->|"10. API Request"| SonarAPI
    SonarAPI -->|"11. Real-time Search"| RealTimeData
    RealTimeData -->|"12. Search Results"| SonarAPI
    SonarAPI -->|"13. API Response"| SonarTool
    AgenticService -->|"14. LLM Generation"| LLMConfig
    RAGService -->|"15. LLM Generation"| LLMConfig
    LLMConfig -->|"16. API Call"| LMStudio
    LMStudio -->|"17. Model Inference"| LocalModel
    LocalModel -->|"18. Response"| LMStudio
    LMStudio -->|"19. API Response"| LLMConfig
    LLMConfig -->|"20. Processed Response"| AgenticService
    LLMConfig -->|"21. Processed Response"| RAGService
    AgenticService -->|"22. Final Response"| MCPServer
    RAGService -->|"23. Final Response"| MCPServer
    SonarTool -->|"24. Final Response"| MCPServer
    MCPServer -->|"25. MCP Response"| MCPClient
    MCPClient -->|"26. Display Result"| Cursor
    Cursor -->|"27. Show to User"| User

    %% Persistence Flow
    DiskStorage -.->|"28. Load on Startup"| DocumentStorage
    DocumentStorage -.->|"29. Recreate Index"| VectorIndex

    %% Styling
    classDef userLayer fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef mcpLayer fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef serviceLayer fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px
    classDef toolLayer fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef storageLayer fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    classDef llmLayer fill:#e0f2f1,stroke:#004d40,stroke-width:2px

    class User,Cursor userLayer
    class MCPClient,MCPServer mcpLayer
    class AgenticService,RAGService serviceLayer
    class MathTool,FileSystemTool,RAGTool,SonarTool toolLayer
    class VectorIndex,DocumentStorage,DiskStorage storageLayer
    class LLMConfig,LMStudio,LocalModel,EmbeddingModel llmLayer
    class SonarAPI,RealTimeData toolLayer
```
## Features

### Agentic Capabilities
- Math Tool: Performs basic mathematical operations (add, subtract, multiply, divide)
- File System Tool: Read, write, and list files and directories
- RAG System: Document indexing and querying with natural language
### RAG (Retrieval-Augmented Generation)
- Index documents from files or direct text input
- Query indexed documents with natural language
- Source attribution for responses
- Persistent document storage across Cursor restarts
- Automatic document loading on server startup
- File-based persistence with configurable storage path
### MCP Orchestrator
- Tool Discovery: Automatically discovers and connects to other MCP servers
- Intelligent Tool Selection: Uses rule-based logic to select appropriate tools for queries (see the sketch after this list)
- Web Search Priority: Automatically routes web search queries to Sonar API
- Dual Rule System: Separates general functionality rules from personal preferences
- Fallback Communication: Provides targeted error reporting for Cursor fallback
- Context-Aware Processing: Gathers context from multiple MCP servers before tool execution
- Validation & Quality Control: Ensures response quality and accuracy
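A minimal sketch of the rule-based tool selection described above (the rule shapes, patterns, and names here are illustrative assumptions; the actual logic lives in the orchestrator and its rules files):

```typescript
// Hypothetical rule-based router: each rule maps a query pattern to a tool name.
interface OrchestrationRule {
  pattern: RegExp;
  tool: string;
}

const rules: OrchestrationRule[] = [
  // Web search priority: route time-sensitive queries to the Sonar API
  { pattern: /\b(latest|today|news|current)\b/i, tool: "sonar_query" },
  { pattern: /\b(calculate|sum|multiply|divide)\b/i, tool: "math" },
  { pattern: /\b(document|indexed|according to)\b/i, tool: "rag_query" },
];

function selectTool(query: string): string {
  const match = rules.find((rule) => rule.pattern.test(query));
  // No rule matched: fall through to plain local-LLM generation
  return match?.tool ?? "generate_text_v2";
}
```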
## Available MCP Tools
- `generate_text_v2` - Generate text with agentic capabilities
- `chat_completion` - Chat completion with tool integration
- `rag_query` - Query indexed documents using RAG
- `index_document` - Index documents for RAG queries
- `save_rag_storage` - Manually save RAG documents to disk
- `clear_rag_storage` - Clear all persistent RAG storage
- `rag_storage_status` - Get RAG storage status and persistence info
- `sonar_query` - Real-time information gathering with Perplexity Sonar API
- `delegate_to_local_llm` - Delegate requests to the local LLM orchestrator
- `orchestrator_status` - Get orchestrator status and connected tools
- `list_orchestrated_tools` - List all available orchestrated tools
- `call_orchestrated_tool` - Call specific orchestrated tools directly
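All of these are exposed through the standard MCP tool-call interface. A condensed sketch of how such a server registers a tool-call handler with the official TypeScript SDK (the real dispatch logic lives in `src/mcp/mcp-server.ts`; the `rag_query` branch below is a placeholder):

```typescript
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  { name: "local-llm-proxy", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

// Dispatch incoming tool calls such as rag_query or generate_text_v2
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;
  if (name === "rag_query") {
    // ...delegate to the RAG service here...
    return { content: [{ type: "text", text: `Answer for: ${String(args?.query ?? "")}` }] };
  }
  throw new Error(`Unknown tool: ${name}`);
});

// Communicate with the MCP client (e.g., Cursor) over stdio
await server.connect(new StdioServerTransport());
```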
### LM Studio Integration
- OpenAI-compatible API integration
- Support for Qwen3 and other local models
- Configurable base URL and model selection
- Environment variable configuration
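Because LM Studio exposes an OpenAI-compatible endpoint, the integration boils down to standard chat-completion requests. A self-contained sketch using Node 18+'s built-in `fetch` (model name and defaults follow this README):

```typescript
// Query a local LM Studio server through its OpenAI-compatible API
async function askLocalModel(prompt: string): Promise<string> {
  const baseUrl = process.env.LM_STUDIO_BASE_URL ?? "http://localhost:1234/v1";
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: process.env.LM_STUDIO_MODEL ?? "qwen3",
      messages: [{ role: "user", content: prompt }],
      max_tokens: 300,
    }),
  });
  if (!res.ok) throw new Error(`LM Studio request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```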
### MCP Orchestrator
- Tool Discovery: Automatically discovers and connects to other MCP servers
- Intelligent Routing: Routes queries to appropriate tools based on content analysis
- Web Search Integration: Prioritizes real-time information gathering with Sonar API
- Fallback System: Graceful fallback to Cursor when needed
- Rules-Based Selection: Configurable rules for tool selection and usage
### Intelligent Validation & Fallback System
- Heuristic-based validation for response quality assessment
- Automatic fallback to cloud LLMs when local responses are inadequate
- Smart tool selection with enhanced sequential thinking integration
- Real-time validation checking for errors, inability expressions, and response completeness
- Configurable thresholds for confidence scoring and fallback triggers
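A minimal sketch of what such heuristic validation can look like (the phrase lists and threshold are illustrative assumptions, not the shipped values):

```typescript
// Hypothetical validator: scores a local response and decides whether to
// trigger the fallback to a cloud LLM.
const INABILITY_PHRASES = ["i cannot", "i'm not sure", "i am unable"];
const CONFIDENCE_THRESHOLD = 0.6; // assumed default; configurable in the real system

function validateResponse(text: string): { confidence: number; fallback: boolean } {
  let confidence = 1.0;
  if (text.trim().length < 20) confidence -= 0.5; // likely incomplete
  if (INABILITY_PHRASES.some((p) => text.toLowerCase().includes(p))) confidence -= 0.4;
  if (/error|exception|traceback/i.test(text)) confidence -= 0.3; // error markers
  return { confidence, fallback: confidence < CONFIDENCE_THRESHOLD };
}
```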
### Configuration & Rules
- Cursor Delegation Rules: guidelines for IDE integration
- Delegation Implementation Guide: complete system setup instructions
- MCP Orchestrator Rules: server-side rule configuration
- Customizable rule engine for tool usage policies and validation behavior
- Environment-based configuration with sensible defaults
## Installation

### Prerequisites
- Node.js 18+
- LM Studio installed and running
- Git (for cloning the repository)

### Setup
1. Clone the repository:
   ```bash
   git clone https://github.com/Davz33/Cursor-Local-llm-MCP-proxy
   cd local-llm-proxy
   ```
2. Install dependencies:
   ```bash
   npm install
   ```
3. Build the TypeScript project:
   ```bash
   npm run build
   ```

**Important:** You must build the project before using it with MCP clients like Cursor.
## Usage

### 1. Start LM Studio
- Download and install LM Studio
- Load your preferred model (e.g., Qwen3, Llama, etc.)
- Start the server on `http://localhost:1234/v1`
### 2. Configure Environment (Optional)
```bash
export LM_STUDIO_BASE_URL="http://localhost:1234/v1"
export LM_STUDIO_MODEL="qwen3-coder-30b-a3b-instruct"
```
### 3. Configure MCP Client (Cursor/IDE)
Add the following configuration to your MCP client (e.g., Cursor's `mcp.json`):
```json
{
  "mcpServers": {
    "local-llm-proxy": {
      "command": "node",
      "args": ["/path/to/your/local-llm-proxy/dist/index.js"],
      "env": {
        "LM_STUDIO_BASE_URL": "http://localhost:1234/v1",
        "LM_STUDIO_MODEL": "qwen3-coder-30b-a3b-instruct"
      }
    }
  }
}
```
Replace `/path/to/your/local-llm-proxy` with the actual path to your cloned repository.
### 4. Start the MCP Server
Production:
```bash
npm start
```
Development (with hot reload):
```bash
npm run dev
```
Build TypeScript:
```bash
npm run build
```
Note: The MCP server runs automatically when called by your MCP client (like Cursor). You don't need to start it manually in most cases.
## Configuration
The server can be configured using environment variables:
- `LM_STUDIO_BASE_URL`: LM Studio API endpoint (default: `http://localhost:1234/v1`)
- `LM_STUDIO_MODEL`: Model name in LM Studio (default: `qwen3`)
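A sketch of how this environment-based configuration with sensible defaults can be read (the function and interface names are assumptions; the actual logic lives in `src/config/llm-config.ts`):

```typescript
// Hypothetical config reader; variable names and defaults match this README.
export interface LLMProxyConfig {
  baseUrl: string;
  model: string;
}

export function loadConfig(env: NodeJS.ProcessEnv = process.env): LLMProxyConfig {
  return {
    // Fall back to the documented defaults when the variables are unset
    baseUrl: env.LM_STUDIO_BASE_URL ?? "http://localhost:1234/v1",
    model: env.LM_STUDIO_MODEL ?? "qwen3",
  };
}
```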
### Real-time Information with Sonar API
For real-time information gathering, the server includes Perplexity Sonar API integration. See the dedicated setup documentation for detailed instructions, including:
- API key configuration
- Environment variable setup
- Usage examples
- Cost considerations
### Orchestration Rules System
The MCP orchestrator uses a dual-rule system to separate general functionality from personal preferences:

#### General Rules (Repository-tracked)
- Location: `src/orchestrator/general-orchestration-rules.txt`
- Purpose: Core MCP server functionality and tool orchestration logic
- Content: Tool selection, error handling, web search patterns, memory operations, thinking operations
- Maintenance: Version-controlled and shared across all users

#### Personal Rules (User-specific)
- Location: `$HOME/local-llm-proxy/personal-orchestration-rules.txt`
- Purpose: Subjective preferences and personal workflow rules
- Content: Git preferences, coding style, development workflow, personal context preferences
- Maintenance: User-specific and customizable
#### Environment Variables
- `MCP_PERSONAL_RULES_PATH`: Custom path for personal rules (optional)
- Default personal rules location: `$HOME/local-llm-proxy/personal-orchestration-rules.txt`

#### Rule Combination
The orchestrator automatically combines both rule sets (sketched below):
1. Loads general rules from the repository
2. Loads personal rules from the user directory (if it exists)
3. Combines them for comprehensive orchestration behavior
4. Falls back gracefully if either file is missing
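A sketch of this combination logic (the function name is an assumption; the paths follow the locations listed above):

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";

// Hypothetical loader for the dual-rule system described above.
export function loadOrchestrationRules(): string {
  const generalPath = join("src", "orchestrator", "general-orchestration-rules.txt");
  const personalPath =
    process.env.MCP_PERSONAL_RULES_PATH ??
    join(homedir(), "local-llm-proxy", "personal-orchestration-rules.txt");

  // Fall back gracefully if either file is missing
  const general = existsSync(generalPath) ? readFileSync(generalPath, "utf8") : "";
  const personal = existsSync(personalPath) ? readFileSync(personalPath, "utf8") : "";

  return [general, personal].filter(Boolean).join("\n\n");
}
```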
## API Examples

### Basic Text Generation
```json
{
  "name": "generate_text_v2",
  "arguments": {
    "prompt": "Explain quantum computing in simple terms",
    "use_agentic": true,
    "max_tokens": 500,
    "temperature": 0.7
  }
}
```
### Chat Completion
```json
{
  "name": "chat_completion",
  "arguments": {
    "messages": [
      {"role": "user", "content": "Can you help me calculate the area of a circle with radius 5?"}
    ],
    "use_agentic": true,
    "max_tokens": 300
  }
}
```
### RAG Document Indexing
```json
{
  "name": "index_document",
  "arguments": {
    "file_path": "/path/to/document.txt"
  }
}
```
Or index text content directly:
```json
{
  "name": "index_document",
  "arguments": {
    "text_content": "Your text content to index for RAG queries"
  }
}
```
### RAG Query
```json
{
  "name": "rag_query",
  "arguments": {
    "query": "What are the main concepts discussed?",
    "max_tokens": 300
  }
}
```
### RAG Storage Management
```json
{
  "name": "save_rag_storage",
  "arguments": {}
}
```
```json
{
  "name": "rag_storage_status",
  "arguments": {}
}
```
```json
{
  "name": "clear_rag_storage",
  "arguments": {}
}
```
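Conceptually, `save_rag_storage` and the automatic persistence amount to serializing the indexed documents to disk and reloading them on startup. A minimal sketch (the storage path, file format, and names here are assumptions, not the server's actual on-disk layout):

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical persisted-document shape
interface PersistedDocument { id: string; text: string; }

const STORAGE_PATH = "./rag-storage/documents.json"; // assumed location

export function saveDocuments(docs: PersistedDocument[]): void {
  mkdirSync(dirname(STORAGE_PATH), { recursive: true });
  writeFileSync(STORAGE_PATH, JSON.stringify(docs, null, 2), "utf8");
}

export function loadDocuments(): PersistedDocument[] {
  // Called on startup so indexed documents survive Cursor/server restarts
  return existsSync(STORAGE_PATH)
    ? JSON.parse(readFileSync(STORAGE_PATH, "utf8"))
    : [];
}
```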
## Testing
The server includes comprehensive testing capabilities:
```bash
# Build the project first
npm run build

# Test basic functionality
npm start

# Test with validation enabled
npm run start:with-validation

# Development mode with hot reload
npm run dev
```
### Testing MCP Tools
Once configured in your MCP client, you can test the tools:
- Generate Text: use `mcp_local-llm-proxy_generate_text_v2` in your IDE
- Chat Completion: use `mcp_local-llm-proxy_chat_completion`
- RAG Query: use `mcp_local-llm-proxy_rag_query`
- Index Document: use `mcp_local-llm-proxy_index_document`
## Architecture

### Modular Structure
```
src/
├── config/
│   └── llm-config.ts        # LLM and embedding model configuration
├── rag/
│   └── rag-service.ts       # RAG functionality and document indexing
├── agentic/
│   └── agentic-service.ts   # Agentic LLM interactions with tools
├── tools/
│   └── agentic-tools.ts     # Tool definitions (math, filesystem, RAG)
└── mcp/
    └── mcp-server.ts        # MCP server implementation
```
### Core Components
- MCP Server: TypeScript-based Model Context Protocol communication
- LlamaIndex.TS Integration: Modern v0.11.28 with Settings API
- LM Studio Adapter: OpenAI-compatible API integration
- RAG Service: Document indexing and querying with HuggingFace embeddings
- Agentic Service: Tool-integrated LLM interactions
- Modular Tools: Extensible tool architecture with full type safety
### Tool Architecture
```typescript
interface Tool {
  name: string;
  description: string;
  execute: (params: any, context?: ToolExecutionContext) => Promise<string>;
}

const tool: Tool = {
  name: "tool_name",
  description: "Tool description",
  execute: async (params, context) => {
    // Tool implementation with full type safety
    return "result";
  }
};
```
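For instance, the math tool from the features list could be expressed against this interface roughly as follows (a sketch: parameter names and messages are illustrative, not the shipped implementation in `src/tools/agentic-tools.ts`):

```typescript
const mathTool: Tool = {
  name: "math",
  description: "Performs basic mathematical operations (add, subtract, multiply, divide)",
  execute: async (params: { op: string; a: number; b: number }) => {
    switch (params.op) {
      case "add":      return String(params.a + params.b);
      case "subtract": return String(params.a - params.b);
      case "multiply": return String(params.a * params.b);
      case "divide":
        if (params.b === 0) throw new Error("Division by zero");
        return String(params.a / params.b);
      default:
        throw new Error(`Unknown operation: ${params.op}`);
    }
  }
};
```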
## RAG Workflow
1. Document Indexing: documents are processed and stored in a vector index
2. Automatic Persistence: documents are automatically saved to disk after indexing
3. Query Processing: natural language queries are converted to vector searches
4. Context Retrieval: relevant document chunks are retrieved
5. Response Generation: the LLM generates responses using the retrieved context
6. Source Attribution: responses include source document information
7. Cross-Session Persistence: documents persist across Cursor and server restarts
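With LlamaIndex.TS, the index-then-query core of this workflow fits in a few lines. A minimal sketch (exact API names vary slightly between llamaindex versions, and this omits the Settings wiring that points the server at LM Studio and HuggingFace embeddings):

```typescript
import { Document, VectorStoreIndex } from "llamaindex";

async function ragDemo(): Promise<void> {
  // Steps 1-2: index a document (the real server also persists it to disk)
  const docs = [new Document({ text: "LM Studio exposes an OpenAI-compatible API." })];
  const index = await VectorStoreIndex.fromDocuments(docs);

  // Steps 3-5: turn a natural-language query into a vector search and generate a response
  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query({ query: "What kind of API does LM Studio expose?" });
  console.log(response.toString());
}

ragDemo().catch(console.error);
```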
## Troubleshooting

### Common Issues
- Connection Refused: ensure the LM Studio server is running on `http://localhost:1234/v1`
- Model Not Found: verify the model name in LM Studio matches your configuration
- Port Conflicts: change the LM Studio port if needed and update the configuration
- Memory Issues: reduce model size or increase system memory
- Tool Not Found: ensure you've built the project with `npm run build`
- MCP Client Issues: restart your MCP client (Cursor) after configuration changes
### Debug Mode
```bash
DEBUG=* npm start
```
### MCP Configuration Issues
- Tool Grayed Out: this usually indicates a caching issue. Try:
  - Disable and re-enable the MCP integration in Cursor
  - Restart Cursor completely
  - Check that the path in `mcp.json` points to `dist/index.js`
## Performance Tips
- Use quantized models for better performance
- Adjust `max_tokens` based on your needs
- Enable streaming for long responses
- Use RAG for document-heavy queries
- Monitor memory usage with large documents
## Future Enhancements
- Multi-agent workflows with handoffs
- Advanced streaming with real-time updates
- Persistent document storage across sessions ✅
- Custom tool development framework
- Performance monitoring and metrics
- Integration with more LLM providers
## License
GPL 3.0 License - see the COPYING file for details
## Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
## Support
For support and questions:
- Create an issue in the repository
- Check the troubleshooting section
- Review the LM Studio configuration guide
## Quick Start Guide
1. Clone and Setup:
   ```bash
   git clone https://github.com/Davz33/Cursor-Local-llm-MCP-proxy
   cd local-llm-proxy
   npm install
   npm run build
   ```
2. Start LM Studio:
   - Download and install LM Studio
   - Load a model (e.g., Qwen3, Llama)
   - Start the server on `http://localhost:1234/v1`
3. Configure Cursor:
   - Add the MCP configuration to your `mcp.json`
   - Update the path to point to your `dist/index.js`
   - Restart Cursor
4. Test:
   - Use `mcp_local-llm-proxy_generate_text_v2` in Cursor
   - Try `mcp_local-llm-proxy_chat_completion` with agentic capabilities
   - Index documents with `mcp_local-llm-proxy_index_document`
   - Query with `mcp_local-llm-proxy_rag_query`
   - Test persistence: index documents, restart Cursor, and query again; your documents will persist!