mcp-playwright
mcp-playwright is hosted online, so all tools can be tested directly either in the Inspector tab or in the Online Client.
A Model Context Protocol server that provides browser automation capabilities using Playwright.
Evaluation
Total Score: 7/10
Evaluation Summary
- The agent demonstrates a solid understanding of its capabilities and limitations, particularly in identifying issues related to missing system dependencies required for running Playwright. It provides accurate and clear instructions for resolving these issues, such as installing necessary dependencies. However, the agent consistently fails to perform tasks due to environmental limitations, which highlights a gap between its advertised capabilities and practical execution.
- Strengths:
  - Accuracy/Correctness: The agent accurately diagnoses issues and provides correct solutions, such as identifying missing dependencies and suggesting installation commands.
  - Usefulness/Clarity: Instructions provided by the agent are clear and actionable, aiding users in troubleshooting.
- Weaknesses:
  - Completeness: The agent often lacks comprehensive troubleshooting guidance and alternative solutions for performing tasks without browser initialization.
  - Consistency with Claimed Capabilities: There is a notable inconsistency between the agent's claimed capabilities and its ability to execute tasks, as it frequently encounters environmental limitations.
- Suggestions for Improvement:
  - Enhance the agent's ability to provide alternative methods for task execution that do not rely on browser initialization.
  - Offer more detailed troubleshooting steps and guidance for users to independently resolve issues.
- Overall, while the agent is knowledgeable and provides useful guidance, its practical execution is limited by environmental constraints, affecting its overall effectiveness.
Test case 1
Score: 8/10
Can you navigate to https://example.com, fill out a form with sample data, submit it, and take a screenshot of the confirmation page?
It seems that the browser could not be initialized because the host system is missing the necessary dependencies to run browsers.
To proceed with the task of navigating to the website, filling out the form, submitting it, and taking a screenshot, the environment needs to have the required dependencies installed.
Here are the dependencies that need to be installed:
libnss3
libnspr4
libatk1.0-0
libatk-bridge2.0-0
libcups2
libxkbcommon0
libatspi2.0-0
libxdamage1
libpango-1.0-0
libcairo2
libasound2
You can install them using the following command:
npx playwright install-deps
Alternatively, you can use the apt-get command as follows:
apt-get install libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libxkbcommon0 libatspi2.0-0 libxdamage1 libpango-1.0-0 libcairo2 libasound2
Once these dependencies are installed, please try again, and I'll be able to assist you with the navigation and interaction with the webpage.
Accuracy/Correctness: The response correctly identifies the problem and offers valid solutions.
Completeness: It provides comprehensive instructions for installing the necessary dependencies.
Usefulness/Clarity: The instructions are clear and easy to follow, aiding users in resolving the issue.
Consistency with Claimed Capabilities: The agent claims to assist with web navigation and interaction tasks. However, it does not address the original request to navigate to a website, fill out a form, submit it, and take a screenshot. Instead, it focuses solely on the dependency issue without indicating whether it can perform the requested tasks once the dependencies are installed.
Test case 2
Score: 7/10
Can you perform an HTTP GET request to retrieve data from 'https://api.example.com/data', then parse the JSON response to extract a specific field named 'value', and display it?
It seems that the HTTP GET request to https://api.example.com/data failed because the domain could not be resolved (ENOTFOUND). This usually indicates that the server address is incorrect or the server is not accessible from the network.
If you have another valid URL or if you would like to provide more context or details, please let me know, and I can attempt the request again!
Accuracy/Correctness: The response correctly identifies the nature of the 'ENOTFOUND' error and its typical causes.
Usefulness/Clarity: The suggestion to check the URL or provide more context is clear and actionable.
Completeness: While the agent suggests verifying the URL, it does not offer additional troubleshooting steps, such as checking network connectivity, DNS server settings, or potential firewall restrictions. Including these steps would provide a more comprehensive guide for resolving the issue.
Consistency with Claimed Capabilities: The agent claims to perform HTTP requests and parse JSON responses. However, it does not address whether it can proceed with the task if a valid URL is provided, leaving uncertainty about its capabilities in this context.
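For reference, the request in this test case (GET a URL, parse the JSON body, read the 'value' field) can be sketched in a few lines. This is only an illustration, not the server's implementation: the names FetchLike, extractField, and fetchField are hypothetical, and the fetch function is passed in as a parameter so the sketch does not depend on any particular runtime.

```typescript
// Minimal shape of the fetch function this sketch relies on (an assumption);
// the global `fetch` in Node 18+ and in browsers is compatible with it.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Hypothetical helper: return a top-level field from an already parsed JSON body.
function extractField(body: unknown, field: string): unknown {
  if (typeof body !== "object" || body === null || !(field in body)) {
    throw new Error(`Field "${field}" not found in response`);
  }
  return (body as Record<string, unknown>)[field];
}

// Hypothetical helper: GET `url`, parse the JSON body, and return the named field.
async function fetchField(
  url: string,
  field: string,
  fetchImpl: FetchLike,
): Promise<unknown> {
  // Network-level failures (for example a DNS error such as ENOTFOUND)
  // arrive as a rejected promise, so they are wrapped with the URL for context.
  const response = await fetchImpl(url).catch((err: unknown) => {
    throw new Error(`Request to ${url} failed: ${String(err)}`);
  });
  if (!response.ok) {
    throw new Error(`Request to ${url} returned HTTP ${response.status}`);
  }
  return extractField(await response.json(), field);
}
```

With a reachable endpoint, a call such as fetchField("https://api.example.com/data", "value", fetch) would resolve to the field's value; with the unresolvable domain used in this test it would reject with a message that includes the underlying error.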
Test case 3
Score: 7/10
Can you generate a Playwright test script that navigates to a website, logs in with a username and password, and verifies that the login was successful by checking for a specific element on the page?
To generate a Playwright test script that performs the actions you described, we can use the code generation tools available. Here's the plan:
- Start a code generation session to record the actions.
- Navigate to the specified website.
- Fill in the username and password fields.
- Click the login button.
- Verify the login by checking for a specific element on the page.
- End the code generation session to generate the test script.
Let's start by initiating the code generation session. Could you please provide the following details:
- The URL of the website you want to test.
- The CSS selectors for the username and password fields.
- The CSS selector for the login button.
- The CSS selector for the element that confirms a successful login.
Accuracy/Correctness: The response correctly identifies the steps involved in creating a Playwright test script for login functionality.
Completeness: It covers all necessary actions, including navigation, form filling, submission, and verification.
Usefulness/Clarity: The response is clear and provides a logical sequence of steps, making it easy for the user to follow.
Consistency with Claimed Capabilities: While the agent claims to assist with code generation, it does not provide a sample script or demonstrate the use of Playwright's code generation tools, such as Codegen, which can automatically generate test scripts based on user interactions. This omission may lead to underutilization of Playwright's features.
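As an illustration of the kind of sample script the agent could have offered, the login flow it outlined can be sketched as follows. This is not output from the server or from Codegen. The LoginPage interface lists only the Playwright Page methods the sketch calls (goto, fill, click, waitForSelector), so a real Page object should be usable in its place; LoginScenario, loginAndVerify, and every selector or credential passed in are hypothetical placeholders for the details the agent asked the user to supply.

```typescript
// Subset of Playwright's Page API used below (assumption: a real Page is passed in practice).
interface LoginPage {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
}

// Hypothetical bundle of the details the agent requested from the user.
interface LoginScenario {
  url: string;
  usernameSelector: string;
  passwordSelector: string;
  submitSelector: string;
  successSelector: string;
  username: string;
  password: string;
}

// Navigate, fill in the credentials, submit, then report whether the
// success element appeared before the timeout.
async function loginAndVerify(
  page: LoginPage,
  scenario: LoginScenario,
  timeoutMs = 5000,
): Promise<boolean> {
  await page.goto(scenario.url);
  await page.fill(scenario.usernameSelector, scenario.username);
  await page.fill(scenario.passwordSelector, scenario.password);
  await page.click(scenario.submitSelector);
  try {
    // waitForSelector rejects if the element does not show up in time.
    await page.waitForSelector(scenario.successSelector, { timeout: timeoutMs });
    return true;
  } catch {
    return false;
  }
}
```

Inside a Playwright test, the boolean result could be asserted directly; keeping the flow in a plain function also lets it be exercised against a stub page when no browser is available, which is the situation the evaluation ran into.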
Test case 4
Score: 6/10
Can you retrieve the visible text content from the homepage of 'https://example.com' and display it?
It seems that there was an issue initializing the browser due to missing dependencies on the host system. This prevents me from retrieving the visible text content from the homepage of 'https://example.com'.
Unfortunately, I don't have the capability to install additional dependencies or modify the environment. If you have access to a local setup, you can try running a similar script using Playwright with the necessary dependencies installed.
If there's anything else I can assist you with or another task you'd like me to perform, please let me know!
Accuracy/Correctness: The response correctly identifies the problem of missing dependencies preventing browser initialization.
Usefulness/Clarity: The agent clearly communicates its limitations in modifying the environment to install the necessary dependencies.
Completeness: While the agent acknowledges its inability to install dependencies, it does not provide alternative solutions or guidance for the user to resolve the issue independently. For instance, suggesting the use of Playwright's built-in command to install dependencies (npx playwright install-deps) or providing a list of required packages for manual installation would be helpful.
Consistency with Claimed Capabilities: The agent claims to assist with web content retrieval but is unable to perform the requested task due to environmental limitations. This inconsistency highlights a gap between the agent's advertised capabilities and its practical execution.
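The agent's response points the user to a local Playwright setup without showing what such a script might look like. A minimal sketch follows, assuming a Playwright Page object is available once the dependencies are installed. TextPage lists only the two Page methods used (goto and innerText); visibleText is a hypothetical helper name, and the whitespace cleanup is a choice made for this example rather than anything the server does.

```typescript
// Subset of Playwright's Page API used below (assumption: a real Page is passed in practice).
interface TextPage {
  goto(url: string): Promise<unknown>;
  innerText(selector: string): Promise<string>;
}

// Load the page and return the rendered text of <body>, one non-empty line per row.
async function visibleText(page: TextPage, url: string): Promise<string> {
  await page.goto(url);
  // innerText reflects rendered text, so content hidden with CSS is generally left out.
  const raw = await page.innerText("body");
  return raw
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join("\n");
}
```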
Test case 5
Score: 6/10
Can you execute a JavaScript snippet on 'https://example.com' that retrieves and logs the titles of all articles on the page?
It seems that the attempt to navigate to the webpage https://example.com failed due to missing dependencies required to run the browser. Unfortunately, this means I cannot execute the JavaScript snippet to retrieve and log the titles of all articles on the page.
If you have access to a local environment where you can run Playwright, I can guide you on how to set it up and run the JavaScript snippet there. Would you like that?
Accuracy/Correctness: The response accurately identifies the problem of missing dependencies preventing browser initialization.
Usefulness/Clarity: The agent clearly communicates its limitations in modifying the environment to install the necessary dependencies.
Completeness: While the agent acknowledges its inability to install dependencies, it does not provide alternative solutions or guidance for the user to resolve the issue independently. For instance, suggesting the use of Playwright's built-in command to install dependencies (npx playwright install-deps) or providing a list of required packages for manual installation would be helpful.
Consistency with Claimed Capabilities: The agent claims to assist with web navigation and interaction tasks but is unable to perform the requested task due to environmental limitations. This inconsistency highlights a gap between the agent's advertised capabilities and its practical execution.
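For the local setup the agent offers to guide the user through, the requested snippet could look roughly like the sketch below. It assumes a Playwright Page object and relies on page.evaluate accepting a JavaScript expression as a string. EvalPage, TITLES_EXPRESSION, and articleTitles are hypothetical names, and the selector is a guess at how article titles might be marked up, since the original prompt does not specify the page structure.

```typescript
// Subset of Playwright's Page API used below (assumption: a real Page is passed in practice).
interface EvalPage {
  goto(url: string): Promise<unknown>;
  evaluate(expression: string): Promise<unknown>;
}

// Assumption: article titles are h1 or h2 headings inside <article> elements.
// The expression runs in the browser, so it is kept as a string here.
const TITLES_EXPRESSION = `
  Array.from(document.querySelectorAll("article h1, article h2"))
    .map((el) => (el.textContent || "").trim())
`;

// Load the page, run the expression in the browser context, and return the titles found.
async function articleTitles(page: EvalPage, url: string): Promise<string[]> {
  await page.goto(url);
  const result = await page.evaluate(TITLES_EXPRESSION);
  // The result is untyped once it leaves the browser, so validate it before use.
  if (!Array.isArray(result)) {
    return [];
  }
  return result.filter(
    (title): title is string => typeof title === "string" && title.length > 0,
  );
}
```

The returned array can then be logged with console.log. On a page without matching elements the function returns an empty array rather than failing.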