SwarmUI_MCP_Agent
A simple prototype SwarmUI MCP server with an accompanying local LangChain Ollama agent. The prototype exposes only the SwarmUI /GenerateText2Image endpoint. It uses Python, Node.js, TypeScript, LangChain, @modelcontextprotocol, Axios, Express, and Pydantic (though Pydantic is mostly unused in this implementation).
SwarmUI MCP Server
The SwarmUI MCP Server prototype is declared in server.ts.
The /GenerateText2Image and /GetNewSession SwarmUI API endpoints are exposed via the generateImage() and getSessionId() methods, which are described in the MCP Server capabilities config when the Server is instantiated. The agent uses this capabilities config to discover the available tools' input and output schemas and descriptions, and the LLM uses each tool's text description to understand what the tool does. When the server.js Node server is started, the SwarmUIServer is initialized, an Express server is created with a POST endpoint at /generate-image that calls SwarmUIServer.generateImage, and then the Express server is started. An Axios instance, stored on the SwarmUIServer class, makes the HTTP requests to the local SwarmUI API.
DEV NOTE: You will likely need to change the port used by SwarmUI as well as the model hardcoded for image generation.
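The flow those two methods implement against the SwarmUI API looks roughly like the sketch below (shown in Python for brevity; the server itself uses Axios in TypeScript). The port, model name, and response field names are assumptions based on a default SwarmUI install and may not match yours:

```python
import requests

SWARMUI_URL = "http://localhost:7801"  # assumed default SwarmUI port; change to match your install

# What getSessionId() does: request a new session ID from SwarmUI
session = requests.post(f"{SWARMUI_URL}/API/GetNewSession", json={}).json()
session_id = session["session_id"]

# What generateImage() does: request an image using the session ID plus hardcoded settings
payload = {
    "session_id": session_id,
    "prompt": "a lighthouse at sunset",
    "model": "sd_xl_base_1.0",  # hypothetical; replace with the model hardcoded in server.ts
    "images": 1,
    "width": 1024,
    "height": 1024,
}
result = requests.post(f"{SWARMUI_URL}/API/GenerateText2Image", json=payload).json()
print(result.get("images"))  # paths to the generated images under SwarmUI's Output directory
```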
GenerateImage StructuredTool
A generate_image StructuredTool is declared in tools.py that POSTs to the /generate-image endpoint of the Express server started by the SwarmUIServer. The StructuredTool's description and its docstring are what the LLM uses to understand the tool. In principle, Pydantic models should validate the tool input, but the current setup does not yet handle multiple inputs.
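A minimal sketch of such a tool declaration, assuming the Express server listens on port 3000; the schema, function body, and response handling below are illustrative stand-ins for what tools.py actually defines:

```python
import requests
from langchain.tools import StructuredTool
from pydantic import BaseModel, Field

EXPRESS_URL = "http://localhost:3000"  # assumed Express server port

class GenerateImageInput(BaseModel):
    prompt: str = Field(description="Text prompt describing the image to generate")

def generate_image(prompt: str) -> str:
    """Generate an image from a text prompt via the SwarmUI MCP server."""
    response = requests.post(f"{EXPRESS_URL}/generate-image", json={"prompt": prompt})
    response.raise_for_status()
    return response.text

generate_image_tool = StructuredTool.from_function(
    func=generate_image,
    name="generate_image",
    description="Generates an image from a text prompt using SwarmUI.",
    args_schema=GenerateImageInput,  # Pydantic validates the single-input schema
)
```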
LangChain Ollama Agent
A local Ollama agent (currently a dolphin-mistral model) is instantiated with tool-calling instructions, a max_iterations of 1, an AgentType of ZERO_SHOT_REACT_DESCRIPTION, and the generate_image StructuredTool. User input is taken as the image prompt, and the agent is invoked with it. The agent uses the generate_image Action with the supplied prompt as the Action Input, calling the local SwarmUI API /GenerateText2Image endpoint with the prompt and other hardcoded image-generation settings.
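A minimal sketch of that agent setup, assuming the generate_image_tool from the previous sketch and a LangChain version that still provides initialize_agent (import paths vary across versions):

```python
from langchain_community.llms import Ollama
from langchain.agents import AgentType, initialize_agent

llm = Ollama(model="dolphin-mistral")  # local Ollama model

agent = initialize_agent(
    tools=[generate_image_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    max_iterations=1,
    verbose=True,
)

user_prompt = input("Describe the image to generate: ")
agent.invoke({"input": user_prompt})
```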
Starting the SwarmUI MCP Server
From the /SwarmUI_MCP_Agent/src directory, run: tsc server.ts && node server.js
Running the Agent
With the SwarmUI MCP Server and SwarmUI itself both running, from the /SwarmUI_MCP_Agent directory, run: python run.py
Viewing Generated Images
Generated images are available in the SwarmUI Image History or directly in the SwarmUI /Output directory.
Quickstart
1. Clone Repository
git clone https://github.com/MathewTomberlin/SwarmUI_MCP_Agent.git
2. Install Dependencies (Script)
Run setup.sh
2B. Install Dependencies (Manual)
npm install
pip install -r requirements.txt
3. Start the API server
tsc server.ts && node server.js
4. Start the UI
streamlit run run.py