PA_MCP_Web_Based_Tool_Automation_Server_V1 by adrianosancarneiro - MCP Server

PA_MCP_Web_Based_Tool_Automation_Server_V1

Model Context Protocol (MCP) web-based tool creator and executor server

I am going to use a "headless client/scraping" approach to set up workflows, in other words step-by-step instructions on how the agent or some tool should access a web-based tool that I normally use, so that it can perform some of the repetitive and simple actions I need to do daily (see other chat history and documentation). The idea is that:

- I will create two main flexible functions/MCP tools called "web-based tool creator" (WBTC) and "web-based tool executor" (WBTE).

- The WBTE will receive the name or id of the web-based tool to be executed. It will also receive a list or dictionary of key-value pairs, created dynamically by the LLM agent, with the parameters needed to execute the web-based tool.

- For this to work well, we will first create, and then use, an MCP tool called "web-based tool creator" (WBTC).

- The WBTC will also be called by the LLM agent. It will interact with the user to get a "web-based tool" (WBT) set up, created, and tested step by step.

- The WBTC + LLM agent should guide the user, step by step, through the process of creating and testing a functional and effective WBT registration, doing the following:

- When prompted to register a new WBT, the LLM agent should start by finding out which parameters are needed (in the correct order) to get a new WBT created by the WBTC, which will be an MCP server tool.

- Then the LLM agent will be the interface between the user and the WBTC tools to get the WBT created and tested at each step of its creation.

for example:

- User -> LLM agent: I need to create a new web-based tool or WBT

- LLM agent -> WBTC: What are the current steps/parameters needed to create a WBT? (This is probably available as part of the MCP server native features)

- WBTC -> LLM Agent: Here is a list of the needed steps/parameters and their testing points

1. The web-based tool URL or login page

2. Testing point: Check if the URL is accessible. If it is not, inform the user and ask for a valid URL.
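
The URL testing point could look roughly like the sketch below. It is only an illustration using the Python standard library; the function names `is_well_formed` and `is_accessible` are hypothetical, and in the real WBTC the check would more likely run through the same browser session used for the rest of the setup.

```python
from http.client import HTTPException
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def is_well_formed(url: str) -> bool:
    """Cheap syntactic check that runs before any network call."""
    try:
        parts = urlparse(url)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_accessible(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    if not is_well_formed(url):
        return False
    try:
        # HEAD avoids downloading the body. Some sites reject HEAD, so a
        # real implementation may need to fall back to GET.
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, HTTPException):
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False
```

When `is_accessible` returns False, the LLM agent would inform the user and ask for a valid URL, as described in the testing point.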

3. Does it need authentication (yes/no)?

4. If the web-based tool needs authentication:

4.1 Ask the user for the login and password HTML "input id", tag_id, placeholder (the text that is normally inside, above, or to the left of the user and password text boxes), or any other unique identifier (HTML or just UI based) that the WBTC can use to identify and fill in the login and password.

4.2 Testing point: Check if the user and password text boxes are accessible. If they are not, inform the user and ask for valid user and password identifiers. (The identifier request and its testing point should actually be two steps instead of one, one for the user field and one for the password field, especially because some platforms place user and password on different pages as part of an authentication workflow.)
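
One way to store the identifiers from 4.1 so that they can be tested in 4.2 is to normalize each of them into a CSS selector. The sketch below is an assumption, not a decided design: the `ElementIdentifier` class and the set of kinds (`id`, `name`, `placeholder`, `css`) are hypothetical, and label-based or purely visual identifiers would need a different lookup strategy.

```python
from dataclasses import dataclass


def _quote(value: str) -> str:
    """Escape a value for use inside a double-quoted CSS attribute selector."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


@dataclass(frozen=True)
class ElementIdentifier:
    kind: str   # "id", "name", "placeholder", or "css" (illustrative set)
    value: str

    def to_css(self) -> str:
        if self.kind == "css":
            return self.value  # the user already supplied a selector
        if self.kind in ("id", "name", "placeholder"):
            # Attribute selectors work for any value, including ids that
            # are not valid CSS identifiers.
            return f'[{self.kind}="{_quote(self.value)}"]'
        raise ValueError(f"unsupported identifier kind: {self.kind}")


# User and password are kept separate, since they may sit on different pages.
login_fields = {
    "user": ElementIdentifier("id", "username"),
    "password": ElementIdentifier("placeholder", "Password"),
}
print({name: ident.to_css() for name, ident in login_fields.items()})
```

Keeping one identifier per field makes it straightforward to run the testing point once for the user field and once for the password field.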

4.3 Ask the user for the username and password (this should be fully end-to-end encrypted communication, and the credentials should be saved encrypted in the database as well).

4.4 Testing point: Check if the username and password provided allow the WBTC to authenticate and access the system. If not, inform the user about the error and ask the user to try again with the correct credentials.

4.5 Does the web-based tool authentication require two-factor authentication (2FA)?

4.5.1 If yes, inform the user what two-factor authentication info you need to authenticate.

4.5.2 The user sends the pin number or approves the access request using the mobile app or email confirmation.

4.5.3 Before using the credentials, the LLM agent + WBTC make sure that the "Keep me logged in" or "Remember me" option (or any other descriptive text on the page with the same effect) is checked. This keeps the LLM agent + WBTC, and of course later the "web-based tool executor" (WBTE), logged in, so the user doesn't need to keep authenticating and/or using two-factor authentication all the time.

4.5.4 If the pin number is needed, the LLM agent + WBTC test the pin and inform the user of the result (accessed or not accessed, and why not).

4.5.5 If the access result is not accessed, the LLM agent shows the webpage that the LLM agent + WBTC have been using to get the WBT creation done. IMPORTANT NOTE: I know the whole idea is "headless client/scraping", but since this is all going to run locally (to increase security), I would like to know if there is a way to have this whole process done on my Ubuntu machine, with the LLM agent + WBTC working through those steps in my Google Chrome browser (where my credentials are already saved). Then I can manually intervene in the authentication process, for example by adding the login and password, completing my two-factor authentication, or even doing the “CAPTCHA” validation, since “CAPTCHA” validations are easy for humans to solve but hard for “bots”. If that is the case, I would just check the "Keep me logged in" (or similar) checkbox and then use that login session for, let's say, the next month or so, or for however long the website allows me to do so.

If that is even an option, I'd start with the LLM agent + WBTC back-and-forth conversation/setup described above. If the WBT has a simple authentication that is straightforward enough not to need my interaction, I'd just use my Telegram bot on the phone, for example, to get it done. Otherwise, if the LLM agent + WBTC struggle to get it done that way, they should let the user (me) know that for this WBT I have to log in to my Telegram chat on Ubuntu, interact in real time with the Google Chrome browser, and work in collaboration with the LLM agent + WBTC to get it set up and logged in. The perfect situation is that in those cases I just need to log in once and then use the configured WBT through the WBTE for as long as the website allows me to keep that connection. But even if the website asks me to log in every time and I need to watch the LLM agent + WBTE perform all the steps in the WBT configuration on my Ubuntu machine, I think it is worth it for the time I'll save by not needing to perform those actions manually.

5 If the authentication process was successfully completed

5.0 If the web-based tool automation requires authentication and the authentication process was successfully completed, the LLM agent + WBTC asks the user for the WBT_type: is it a form that the WBTE will later need to fill in and submit by clicking a button (WBT_type = submitting_form), or is it a page or set of pages from which the WBTE will later need to scrape some information/data and send it back to the MCP client (WBT_type = getting_data)?

5.1 The LLM agent + WBTC asks the user for the menu link item(s) that should be clicked to get the LLM agent + WBTC to the correct action needed page (ANP). If more than one step is needed, the LLM agent + WBTC keeps asking whether there is any other step to reach the correct action needed page.

IMPORTANT NOTE: At this point, if the user decides to provide a direct URL that takes the LLM agent + WBTC straight to the action needed page, the LLM agent + WBTC should test that link and, if it works, just move on to identifying the fields in the form (item 5.2).

5.2 The LLM agent + WBTC asks which fields in this action needed page (ANP) should be filled in and/or selected to get the action done on this page, e.g. text box(es), checkbox(es), dropdown list(s), and/or multi-selection checkboxes.

5.2.1 The LLM agent + WBTC will now start creating the list or dictionary of key-value pairs that the LLM agent + WBTE will later use to dynamically create the set of parameters needed to execute the web-based tool automation. For example:

action needed page(ANP)_name
action needed page(ANP)_description(optional)
action needed page(ANP)_url: Url or steps to get there
- Field1_Name: text
- Field1_Description(optional): text
- Field1_Identifier: Textbox1_id or Textbox1_label or Textbox1_input_id, Textbox1_tag
- Field1_Type: text_box or checkbox or dropdown or multi_selection_checkboxes
- Field1_Action:
  - text (Textbox_Text_To_Be_Added): "text"
    - if the Field1_Type=text_box just add the text into the text box input
  - state (checkbox_state_option_To_Be_Selected): "checked" or "unchecked"
    - if the Field1_Type=checkbox
      - check the checkbox if the text is "checked"
      - uncheck the checkbox if the text is "unchecked"
  - selectedOption (dropdown_Option_To_Be_Selected): "text"
    - if the Field1_Type=dropdown just search for the selectedOption text among the dropdown's possible selection options and select the matching one.
  - selectedOptions (multi_selection_checkboxes_Option_To_Be_Selected): [ "MultiSelectionCheckBoxesTextOption1",
    "MultiSelectionCheckBoxesTextOption2",
    "MultiSelectionCheckBoxesTextOption3" ]
    - if the Field1_Type=multi_selection_checkboxes iterate through the selectedOptions array and check or select each corresponding box or option.
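
The structure above maps naturally onto a pair of small data classes. The following is a minimal sketch under assumptions: the class names `FieldSpec` and `ActionNeededPage`, the `build_action` helper, and the example URL are hypothetical, while the field types and action keys are the ones listed above.

```python
from dataclasses import dataclass, field
from typing import Any

# Field types from the structure above, mapped to the action key each expects.
ACTION_KEY_BY_TYPE = {
    "text_box": "text",
    "checkbox": "state",
    "dropdown": "selectedOption",
    "multi_selection_checkboxes": "selectedOptions",
}


@dataclass
class FieldSpec:
    name: str
    identifier: str
    type: str
    description: str = ""

    def build_action(self, value: Any) -> dict[str, Any]:
        """Wrap a runtime value in the action key that matches the field type."""
        if self.type not in ACTION_KEY_BY_TYPE:
            raise ValueError(f"unknown field type: {self.type}")
        if self.type == "checkbox" and value not in ("checked", "unchecked"):
            raise ValueError("checkbox state must be 'checked' or 'unchecked'")
        if self.type == "multi_selection_checkboxes" and not isinstance(value, list):
            raise ValueError("selectedOptions must be a list")
        return {ACTION_KEY_BY_TYPE[self.type]: value}


@dataclass
class ActionNeededPage:
    name: str
    url: str  # URL or a description of the steps to get there
    description: str = ""
    fields: list[FieldSpec] = field(default_factory=list)


page = ActionNeededPage(
    name="new_ticket",
    url="https://example.com/tickets/new",  # placeholder URL
    fields=[FieldSpec("priority", "priority_select", "dropdown")],
)
print(page.fields[0].build_action("High"))
```

At execution time the LLM agent would supply only the values, and the stored field type decides which action key is used.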

5.2.2 Testing point: Check that each and every one of those field identifiers (Field1_Identifier), with its respective type (Field1_Type) and action (Field1_Action) sent by the user, is found and is fillable or selectable in the current page. If one is not, inform the user and ask for a valid field identifier along with the correct type and action parameters.
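
A sketch of how this testing point might report problems back to the user. It assumes the WBTC has already inspected the page and produced `page_elements`, a mapping from each identifier found on the page to the detected field type; that mapping, the dictionary keys, and the function name are all hypothetical.

```python
def find_field_problems(fields: list[dict], page_elements: dict[str, str]) -> list[str]:
    """Return one human-readable message per field that fails the check."""
    problems = []
    for spec in fields:
        found_type = page_elements.get(spec["identifier"])
        if found_type is None:
            problems.append(
                f"{spec['name']}: identifier '{spec['identifier']}' was not found on the page"
            )
        elif found_type != spec["type"]:
            problems.append(
                f"{spec['name']}: expected {spec['type']} but the page has {found_type}"
            )
    return problems


fields = [
    {"name": "subject", "identifier": "subject_input", "type": "text_box"},
    {"name": "priority", "identifier": "priority_select", "type": "checkbox"},
    {"name": "notify", "identifier": "notify_me", "type": "checkbox"},
]
page_elements = {"subject_input": "text_box", "priority_select": "dropdown"}
for problem in find_field_problems(fields, page_elements):
    print(problem)
```

An empty list means every field passed, and the workflow can move on to the submit button.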

5.3 When the current “action needed page” is correctly filled in and ready to be submitted, the LLM agent + WBTC asks the user to identify the “submit action button” in this “action needed page”.

5.3.1 The LLM agent + WBTC asks the user for the “submit action button” HTML "input id", tag_id, placeholder (the text that is normally inside the “submit action button”), or any other unique identifier (HTML or just UI based) that the WBTC can use to identify and press the “submit action button”.

5.3.2 Testing point: Check if the “submit action button” press went through. If the form sends back any validation error message at “web-based tool creation” time, something needs to be fixed in step 5.2.1.

(5.2.1 The LLM agent + WBTC will now start creating the list or dictionary of key-value pairs that the LLM agent + WBTE will later use to dynamically create the set of parameters needed to execute the web-based tool.)
If the form sends back a confirmation message or redirects the LLM agent + WBTC to a new page, the LLM agent + WBTC informs the user about the result of pressing the “submit action button” and asks if that is the end of the WBT automation workflow.
- If the user says yes, the LLM agent + WBTC closes the browser (all the configurations are saved and updated as needed at execution time, meaning as the LLM agent interacts with the user).
- If the user says no, the LLM agent + WBTC moves back to step/item 5.1 and keeps looping until the user says that it is the end of the WBT automation workflow creation. On each iteration a new action needed page (ANP) should be created, along with its field names, identifiers, types, and actions, and naturally its “submit action button” press action.
  - (5.1 The LLM agent + WBTC asks the user for the menu link item(s) that should be clicked to get the LLM agent + WBTC to the correct action needed page. If more than one step is needed, the LLM agent + WBTC keeps asking whether there is any other step to reach the correct action needed page.)
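
The loop between 5.1 and 5.3 can be sketched as below. This is only an outline under assumptions: `define_page` stands in for the whole conversation that produces one ANP, `user_says_done` stands in for the "is this the end of the workflow?" question, and `max_pages` is a hypothetical safety limit that is not part of the steps above.

```python
from typing import Callable


def build_workflow(
    define_page: Callable[[int], dict],
    user_says_done: Callable[[dict], bool],
    max_pages: int = 20,
) -> list[dict]:
    """Collect action needed pages until the user confirms the workflow is complete."""
    pages: list[dict] = []
    while len(pages) < max_pages:
        page = define_page(len(pages))  # steps 5.1 to 5.3 for one ANP
        pages.append(page)
        if user_says_done(page):  # asked after each submit result
            break
    return pages


# Scripted stand-ins for the real user conversation.
answers = iter([False, True])
workflow = build_workflow(
    define_page=lambda index: {"anp_name": f"page_{index + 1}"},
    user_says_done=lambda page: next(answers),
)
print([page["anp_name"] for page in workflow])
```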

5.4 When a "web-based tool" (WBT) automation process starts to be created, the "web-based tool creator" (WBTC) + LLM agent should always require from the user a description of what the "web-based tool" (WBT) is needed for. That description can be saved or returned to the orchestrator MCP client, which can use it to decide whether the problem it is currently trying to resolve can be resolved by triggering this newly added "web-based tool" (WBT). The list of action needed pages (ANP), along with their names and descriptions, should also be available to the orchestrator MCP client. In addition, for each action needed page (ANP), its list of field names and descriptions should also be available to the orchestrator MCP client, to be used to make decisions about when and how to use the newly added web-based tool.

IMPORTANT CLARIFICATION
Even though I have mentioned the orchestrator MCP client, keep in mind that this is going to be a Model Context Protocol (MCP) web-based tool creator and executor server. That said, not only the orchestrator MCP client should have access to the decision-making helper data mentioned in item 5.4, but also any other MCP client that consumes this “Model Context Protocol (MCP) web-based tool creator and executor server”.
This “Model Context Protocol (MCP) web-based tool creator and executor server” should have a way for the MCP client to get all the WBTs created, with all their decision-making helper data along with their sharable_unique_identifier_ids in the database. Because this information can change from time to time, the MCP client can save it in its own database (maybe in a vector DB for semantic search) and periodically fetch refreshed decision-making helper data for all the WBTs, either to save new WBTs or to update the decision-making helper data of the existing ones.
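
A possible shape for that listing is sketched below. Everything except `sharable_unique_identifier_id` is an assumption made for illustration: the record layout, the `updated_at` timestamp used for incremental refreshes, and the function names `helper_data` and `list_wbts`. The point of the sketch is that only decision-making helper data leaves the server, never credentials or element identifiers.

```python
from datetime import datetime, timezone
from typing import Optional


def helper_data(wbt: dict) -> dict:
    """Keep only what a client needs to decide when and how to use a WBT."""
    return {
        "sharable_unique_identifier_id": wbt["sharable_unique_identifier_id"],
        "description": wbt["description"],
        "pages": [
            {
                "name": page["name"],
                "description": page.get("description", ""),
                "fields": [
                    {"name": f["name"], "description": f.get("description", "")}
                    for f in page.get("fields", [])
                ],
            }
            for page in wbt.get("pages", [])
        ],
    }


def list_wbts(wbts: list[dict], changed_since: Optional[datetime] = None) -> list[dict]:
    """Return helper data for all WBTs, or only those updated after changed_since."""
    return [
        helper_data(wbt)
        for wbt in wbts
        if changed_since is None or wbt["updated_at"] > changed_since
    ]


registry = [
    {
        "sharable_unique_identifier_id": "wbt-001",
        "description": "Submit a timesheet entry",
        "updated_at": datetime(2025, 1, 10, tzinfo=timezone.utc),
        "credentials": "never exported",
        "pages": [{"name": "entry_form", "fields": [{"name": "hours", "identifier": "hours_input"}]}],
    },
    {
        "sharable_unique_identifier_id": "wbt-002",
        "description": "Read the account balance",
        "updated_at": datetime(2025, 3, 1, tzinfo=timezone.utc),
        "pages": [],
    },
]
refreshed = list_wbts(registry, changed_since=datetime(2025, 2, 1, tzinfo=timezone.utc))
print([entry["sharable_unique_identifier_id"] for entry in refreshed])
```

A client doing its periodic refresh would pass the time of its last sync as `changed_since` and upsert the returned entries by `sharable_unique_identifier_id`.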

5.5 When a "web-based tool" (WBT) automation process starts to be created, the "web-based tool creator" (WBTC) + LLM agent should always require from the user a wbt_automation_mode for this WBT.

wbt_automation_mode = supervised
- Human approval is needed before executing the WBT.
wbt_automation_mode = autonomous
- The LLM agent is allowed to execute the WBT without human authorization. For example, some extremely simple WBT automations, such as getting information/data from a website (WBT_type = getting_data), carry low risk and do not need approval.
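
The two modes could be represented as an enum with a single gate function, as in this sketch. The names `WbtAutomationMode` and `needs_human_approval` are hypothetical, and treating an unknown or missing mode as supervised is an assumption made here because it is the safer default.

```python
from enum import Enum
from typing import Optional


class WbtAutomationMode(str, Enum):
    SUPERVISED = "supervised"
    AUTONOMOUS = "autonomous"


def needs_human_approval(mode: Optional[str]) -> bool:
    """Return True unless the WBT is explicitly marked autonomous."""
    try:
        return WbtAutomationMode(mode) is WbtAutomationMode.SUPERVISED
    except ValueError:
        # Assumption: unknown or missing modes fall back to supervised.
        return True


print(needs_human_approval("autonomous"), needs_human_approval("supervised"))
```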

IMPORTANT CLARIFICATION
The difference between WBT_type = submitting_form and WBT_type = getting_data is well defined in item 5.0, but the current workflow/steps are focused on WBT_type = submitting_form. Make sure to change the workflow/steps to better accommodate WBT_type = getting_data, which may or may not need authentication but is simpler than submitting_form because it does not need to fill in any form field or submit (press a button). However, make sure that:

- The request for the identifier of each field whose data should be collected also exists.
- The testing point sections also exist, to make sure those fields and their content actually exist and can be retrieved from the website.
- The loop idea is kept: moving from one site (URL) or menu item click to the next, along with the looping part of the process that creates multiple action needed pages (ANP). Check item 5.2.1 for reference, using similar concepts, naming conventions, and patterns as needed, but focusing on WBT_type = getting_data as part of the overall Model Context Protocol (MCP) web-based tool creator and executor server.
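
For WBT_type = getting_data, the field request and its testing point could be sketched as follows. This is a stand-in using the standard library `html.parser` on reasonably well-formed HTML; in the real WBTC the lookup would more likely go through the live browser page. The names `TextById` and `extract_fields` are hypothetical, and only `id`-based identifiers are handled here.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not enter the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside elements whose id is one of the wanted ids."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}
        self._open = []  # wanted id (or None) for each currently open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        element_id = dict(attrs).get("id")
        if element_id in self.wanted:
            self.found.setdefault(element_id, "")
            self._open.append(element_id)
        else:
            self._open.append(None)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        for element_id in self._open:
            if element_id is not None:
                self.found[element_id] += data


def extract_fields(html: str, identifiers: list[str]):
    """Return (values found, identifiers missing from the page)."""
    parser = TextById(identifiers)
    parser.feed(html)
    parser.close()
    values = {key: text.strip() for key, text in parser.found.items()}
    missing = [ident for ident in identifiers if ident not in values]
    return values, missing


html = '<div><span id="balance">42.10<br> USD</span><p id="owner">Ana</p></div>'
print(extract_fields(html, ["balance", "owner", "due_date"]))
```

The `missing` list is what the testing point would report back to the user, asking for a valid identifier for each entry.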

"WEB-BASED TOOL EXECUTOR"(WBTE)

At this point it is probably very clear what the role of a web-based tool executor is.

The LLM agent identifies a problem to be solved in the user message (prompt), an email, a text message, or any other possible trigger, and searches for which of the available memories and/or tools would be able to solve it.

It will use its in-memory references to the current MCP server tools and their “decision making helper data” (from time to time, maybe once a day, a job will get all the MCP server tools and their “decision making helper data” and update the in-memory references of the LLM agent/orchestrator).

If a specific WBT is selected to be executed as part of the problem-solving process, the LLM agent will gather all the parameters needed to get that WBT executed. If it is a wbt_automation_mode = supervised WBT, the LLM agent should use its LLM memory, prioritizing the most recently added memories, to come up with the best way to fill in those parameters, then share the tool and the parameters it is planning to use to run the tool, and ask the user to approve, edit, or reject the tool run and its parameters.
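
The approve, edit, or reject step could be handled by a small function like the one sketched here. The proposal layout (`wbt_id` plus `params`), the decision strings, and the function name `resolve_proposal` are assumptions made for illustration.

```python
import copy
from typing import Optional


def resolve_proposal(proposal: dict, decision: str, edits: Optional[dict] = None) -> Optional[dict]:
    """Return the run to execute, or None if the user rejected it."""
    if decision == "reject":
        return None
    final = copy.deepcopy(proposal)  # never mutate what was shown to the user
    if decision == "edit":
        final["params"].update(edits or {})
    elif decision != "approve":
        raise ValueError(f"unknown decision: {decision}")
    return final


proposal = {"wbt_id": "timesheet_entry", "params": {"hours": "8", "project": "Alpha"}}
edited = resolve_proposal(proposal, "edit", {"hours": "6"})
print(edited["params"])
```

Copying the proposal keeps the original intact, so both the proposed and the final parameters can be recorded alongside the problem.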

The LLM agent and the user start a back-and-forth process to reach the final decision about which tool to call and which parameters to use. All of this is recorded in real time in the problem process tables and records that were created as soon as the problem was identified.

When the final MCP server tool and its parameters are defined and approved, the LLM agent executes the WBT using the WBTE process, and the WBTE returns the result to the LLM agent/MCP client. The result should be one of: a submission error message, a submission success message (which may just mean a change of page), the requested data, or a form validation error (meaning the WBT workflow is broken and needs to be updated/fixed).
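
The four possible results could be carried in one small result type, sketched below. The names `WbteOutcome`, `WbteResult`, and `workflow_needs_repair` and the string values are hypothetical; only the four outcome kinds come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class WbteOutcome(str, Enum):
    SUBMIT_ERROR = "submit_error"
    SUBMIT_SUCCESS = "submit_success"      # may just be a change of page
    REQUESTED_DATA = "requested_data"
    FORM_VALIDATION_ERROR = "form_validation_error"


@dataclass
class WbteResult:
    outcome: WbteOutcome
    message: str = ""
    data: Optional[dict[str, Any]] = None

    @property
    def workflow_needs_repair(self) -> bool:
        """A validation error at execution time means the stored WBT is out of date."""
        return self.outcome is WbteOutcome.FORM_VALIDATION_ERROR


result = WbteResult(WbteOutcome.FORM_VALIDATION_ERROR, message="Field 'hours' is required")
print(result.outcome.value, result.workflow_needs_repair)
```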

If the WBT_type is getting_data, the returned data should be stored in a jsonb column in the database, inside the problem tables structure. The problem then has its status changed to resolved and its solution summary set to something like "collected data saved in the jsonb column".
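
A runnable sketch of that last step is shown below. It uses the standard library `sqlite3` with a JSON text column purely as a stand-in, since jsonb implies PostgreSQL; the table name `problems` and the column names are hypothetical.

```python
import json
import sqlite3


def resolve_problem(conn: sqlite3.Connection, problem_id: int, collected: dict) -> None:
    """Store the collected data and mark the problem as resolved."""
    conn.execute(
        "UPDATE problems SET status = 'resolved', solution_summary = ?, collected_data = ? "
        "WHERE id = ?",
        ("collected data saved in collected_data", json.dumps(collected), problem_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE problems (id INTEGER PRIMARY KEY, status TEXT, "
    "solution_summary TEXT, collected_data TEXT)"
)
conn.execute("INSERT INTO problems (id, status) VALUES (1, 'open')")
resolve_problem(conn, 1, {"balance": "42.10 USD"})
row = conn.execute("SELECT status, collected_data FROM problems WHERE id = 1").fetchone()
print(row[0], json.loads(row[1]))
```

With PostgreSQL the same update would target a jsonb column instead of a text column, and the parameter placeholders would follow the chosen driver's style.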