Playwright-MCP-Server

idavidov13/Playwright-MCP-Server


The Model Context Protocol (MCP) Server is a tool designed to enhance the capabilities of Playwright by providing a structured protocol for communication between the client and browser servers.

Set Up and Utilization of Playwright MCP (Model Context Protocol) Server in Windsurf IDE

Introduction

Playwright MCP is a Model Context Protocol (MCP) server that exposes Playwright's browser automation to LLM clients. MCP is an open protocol for connecting LLM applications to external tools and data. Through the server, a model can drive a real browser (such as Chromium, Firefox, or WebKit) and inspect the actual state of the page instead of guessing at it.

Main Goal

Playwright MCP (Model Context Protocol) Server lets LLMs generate better Playwright tests by giving them useful context and the ability to interact with the browser. In this repository I generate tests for the Conduit web app with several LLMs and compare the final results. An earlier repository of mine contains an MVP (Minimum Viable Product) of a Playwright automation framework for the same web app. It features TypeScript, the Page Object Model design pattern, custom fixtures, REST API testing and mocking, schema validation with Zod, environment utilization, and CI/CD integration with GitHub Actions and GitLab CI/CD.

Used LLMs

  • GPT-4.1
  • Claude 3.7 Sonnet (Thinking)
  • DeepSeek R1
  • SWE-1
  • xAI Grok-3
  • Claude 4 Sonnet
  • Claude 4 Opus

Prerequisites

  • Node.js (version 20.13.1 or later)
  • npm (version 10 or later)

Installation

  1. Clone the repository:

    git clone https://github.com/idavidov13/Playwright-MCP-Server.git
    cd Playwright-MCP-Server
    
  2. Install dependencies:

    npm install
    
  3. Install Playwright Browsers:

    npx playwright install --with-deps
    

    This command ensures all necessary browser binaries and dependencies are installed.

Configure Playwright MCP Server

Follow the instructions in the Playwright MCP Server documentation to configure the Playwright MCP Server.

Add Custom Instructions

Create a folder .windsurf/rules in the root of the project and add a file playwright-mcp-server-test-generator.md to it. Select the "Always On" option from the dropdown menu, then put the following content in the Content textarea:

1. You are a playwright test generator.
2. You are given a scenario and you need to generate a playwright test for it.
3. DO NOT generate test code based on the scenario alone.
4. DO run steps one by one using the tools provided by the Playwright MCP.
5. Only after all steps are completed, emit a Playwright TypeScript test that uses '@playwright/test'. Below are instructions for generating the test:
5.1. You are an expert in TypeScript, Frontend development, and Playwright end-to-end testing.
You write concise, technical TypeScript code with accurate examples and the correct types.
-   Create Class for all locators and methods for all actions
-   Avoid using `page.locator` and always use recommended built-in and role-based locators (`getByRole`, `getByLabel`, etc)
-   Prefer to use web-first assertions (`toBeVisible`, `toHaveText`, etc) whenever possible
-   Use built-in config objects like `devices` whenever possible
-   Avoid hardcoded timeouts
-   Reuse Playwright locators by using variables
-   Follow the guidance and the best practices described on playwright.dev
-   Avoid commenting the resulting code
6. Save generated test file in the tests directory
7. Execute the test file and iterate until the test passes

Test Data

Create a test file "yourTestName.spec.ts" and add the test data as follows:

const url = process.env.URL || 'https://conduit.bondaracademy.com/';
const email = process.env.EMAIL || 'yourEmail';
const password = process.env.PASSWORD || 'yourPassword';
const articleTitle = `Test Article for Playwright MCP Server ${Date.now()}`;
const articleAbout = 'This is a test article about Playwright MCP Server';
const articleContent = 'This article is created by Playwright MCP Server test automation';
const updatedArticleTitle = `Updated ${articleTitle}`;
const updatedArticleAbout = 'Updated article about';
const updatedArticleContent = 'This article has been updated';
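
The constants above combine two ideas: environment variables with fallback values, and a timestamp that presumably keeps article titles unique across runs. The following is a minimal sketch of that logic as a testable function. The names `buildTestData`, `Env`, and `TestData` are hypothetical and not part of the repository, which declares the values as top-level constants.

```typescript
// Hypothetical helper for illustration; the repository declares these values as top-level constants.
type Env = Record<string, string | undefined>;

interface TestData {
  url: string;
  email: string;
  password: string;
  articleTitle: string;
  updatedArticleTitle: string;
}

function buildTestData(env: Env, now: number = Date.now()): TestData {
  // `||` falls back when the variable is unset OR set to an empty string.
  const url = env['URL'] || 'https://conduit.bondaracademy.com/';
  const email = env['EMAIL'] || 'yourEmail';
  const password = env['PASSWORD'] || 'yourPassword';
  // A timestamp in the title makes each run produce a distinct article title.
  const articleTitle = `Test Article for Playwright MCP Server ${now}`;
  return {
    url,
    email,
    password,
    articleTitle,
    updatedArticleTitle: `Updated ${articleTitle}`,
  };
}

// EMAIL is set, PASSWORD is empty, URL is missing.
const sampleData = buildTestData({ EMAIL: 'qa@example.com', PASSWORD: '' }, 1700000000000);
// sampleData.email    is 'qa@example.com'
// sampleData.password is 'yourPassword' (the empty string falls back)
// sampleData.url      is 'https://conduit.bondaracademy.com/'
```

In a real spec file the first argument would be `process.env`. Passing the environment in as a parameter is only a choice made here so the fallback behavior can be checked without touching the real environment.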

Generate Test with Playwright MCP Server and GPT-4.1

  1. Select GPT-4.1 as a model
  2. Run the following prompt:
@gpt-4.1.spec.ts Create a test case utilizing provided constants for navigating to the web app, login, create/edit/delete an article. Try to verify the result after every major step. Use provided instructions

After completion of the test, you can run it with the following command:

npx playwright test yourTestName.spec.ts

Note: The provided example was generated on the first attempt. The only update needed was to add .first() to the delete button and edit button locators.

Generate Test with Playwright MCP Server and Claude 3.7 Sonnet (Thinking)

  1. Select Claude 3.7 Sonnet (Thinking) as a model
  2. Run the following prompt:
@claude-3.7-Sonnet(Thinking).spec.ts Create a test case utilizing provided constants for navigating to the web app, login, create/edit/delete an article. Try to verify the result after every major step. Use provided instructions

After completion of the test, you can run it with the following command:

npx playwright test yourTestName.spec.ts

Note: The provided example was generated on the first attempt. The only update needed was to remove the 'articleContent' locator, which behaved incorrectly.

Generate Test with Playwright MCP Server and DeepSeek R1

  1. Select DeepSeek R1 as a model
  2. Run the following prompt:
@deepseek-R1.spec.ts Create a test case utilizing provided constants for navigating to the web app, login, create/edit/delete an article. Try to verify the result after every major step. Use provided instructions

After completion of the test, you can run it with the following command:

npx playwright test yourTestName.spec.ts

Note: The provided example was generated on the first attempt. The only update needed was to change the button name from 'Update Article' to 'Publish Article' in the Edit Article step, because the locator was set up incorrectly.

Generate Test with Playwright MCP Server and SWE-1

  1. Select SWE-1 as a model
  2. Run the following prompt:
@swe-1.spec.ts Create a test case utilizing provided constants for navigating to the web app, login, create/edit/delete an article. Try to verify the result after every major step. Use provided instructions

After completion of the test, you can run it with the following command:

npx playwright test yourTestName.spec.ts

Note: The provided example was generated on the first attempt. The following updates were needed: combining all tests into one (there was a separate test for each action, and those tests failed because login happens only once), removing one assertion for the created/edited article ('await expect(app.articleAuthor).toContainText(email.split('@')[0]);'), adding .first() to the delete button and edit button locators, and deleting an incorrect navigation to the article page. Some of the defined locators are still unused.

Generate Test with Playwright MCP Server and xAI Grok-3

  1. Select xAI Grok-3 as a model
  2. Run the following prompt:
@xai-grok-3.spec.ts Create a test case utilizing provided constants for navigating to the web app, login, create/edit/delete an article. Try to verify the result after every major step. Use provided instructions

After completion of the test, you can run it with the following command:

npx playwright test yourTestName.spec.ts

Note: The provided example was generated on the first attempt. The following updates were needed: removing an incorrect assertion (await expect(this.page.getByRole('link', { name: 'Your Feed' })).toBeVisible();), adding .first() to the delete button and edit button locators, and updating the assertion after the delete article button click (await expect(this.page.getByText(updatedArticleTitle)).toBeVisible();).

Generate Test with Playwright MCP Server and Claude 4 Sonnet

  1. Select Claude 4 Sonnet as a model
  2. Run the following prompt:
@claude-4-sonnet.spec.ts Create a test case utilizing provided constants for navigating to the web app, login, create/edit/delete an article. Try to verify the result after every major step. Use provided instructions

After completion of the test, you can run it with the following command:

npx playwright test yourTestName.spec.ts

Note: The provided example was generated on the first attempt and no updates were made.

Generate Test with Playwright MCP Server and Claude 4 Opus

  1. Select Claude 4 Opus as a model
  2. Run the following prompt:
@claude-4-opus.spec.ts Create a test case utilizing provided constants for navigating to the web app, login, create/edit/delete an article. Try to verify the result after every major step. Use provided instructions

After completion of the test, you can run it with the following command:

npx playwright test yourTestName.spec.ts

Note: The provided example was generated on the first attempt and no updates were made.

Comparison of Generated POMs

Comparison of Page Object initialization patterns

  1. Using Getters for Locators - Claude 3.7 Sonnet (Thinking)
get usernameInput() { return this.page.getByLabel('Username'); }
  • Pros:

    • Lazy Evaluation: Locators are created only when accessed, ensuring up-to-date references.
    • Readability: Clean, property-like access (pageObject.usernameInput).
    • Encapsulation: Easy to add logic or assertions in the getter if needed.
    • IntelliSense: Good support in editors for auto-completion.
  • Cons:

    • Performance: Each access creates a new locator (though Playwright locators are lightweight).
    • Inheritance: Overriding getters in subclasses can be less straightforward than overriding fields.
  2. Using Private Getters - SWE-1
private get usernameInput() { return this.page.getByLabel('Username'); }
  • Pros:

    • Encapsulation: Prevents direct access from outside the class, enforcing usage only within class methods.
    • Cleaner API: Exposes only actions, not locators, to test code.
    • Reduces Misuse: Prevents test code from making direct assertions on locators.
  • Cons:

    • Test Flexibility: Makes it harder to write custom assertions or interact directly with elements from the test.
    • Discoverability: Less transparent for someone reading the test and wanting to know what elements are available.
  3. Using Objects for Locators - GPT-4.1
get loginForm() {
    return {
        email: this.page.getByRole('textbox', { name: 'Email' }),
        password: this.page.getByRole('textbox', { name: 'Password' }),
        submit: this.page.getByRole('button', { name: 'Sign in' }),
    };
}
  • Pros:

    • Lazy Evaluation: Locators are created only when accessed, ensuring up-to-date references.
    • Centralized: All locators are grouped in one object, making them easy to find and update.
    • Encapsulation: Easy to add logic or assertions in the getter if needed.
    • IntelliSense: Good support in editors for auto-completion.
  • Cons:

    • Performance: Each access creates a new locator (though Playwright locators are lightweight).
    • Inheritance: Overriding getters in subclasses can be less straightforward than overriding fields.
  4. Using Methods Directly - xAI Grok-3
async navigateToHome() {
    await this.page.goto(url);
    await expect(this.page.getByRole('heading', { name: 'conduit' })).toBeVisible();
}
  • Pros:

    • Encapsulation: Only exposes actions, not locators, enforcing the Page Object pattern strictly.
    • API Clarity: Test code reads like user actions (pageObject.fillUsername('foo')).
    • Maintainability: Easy to update selectors in one place, and logic can be added to methods.
  • Cons:

    • Reduced Flexibility: Cannot easily make custom assertions or interact with elements outside provided methods.
    • Verbosity: May require many methods for complex pages, leading to bloated classes.
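
The "each access creates a new locator" point listed for the getter patterns can be made concrete with a small sketch. It compares a getter with a readonly field that is assigned once in the constructor. `PageLike`, `LocatorLike`, and `CountingPage` are hypothetical stand-ins so the example runs without a browser; they are not Playwright's real classes. Playwright locators resolve the element again whenever an action or assertion runs, so both styles act on the current DOM, and the difference shown here is only in how many locator objects are created.

```typescript
// Hypothetical stand-ins for the small slice of Playwright's Page/Locator API used here.
interface LocatorLike {
  readonly selector: string;
}

interface PageLike {
  getByLabel(text: string): LocatorLike;
}

// Fake page that counts how many locator objects it hands out.
class CountingPage implements PageLike {
  created = 0;

  getByLabel(text: string): LocatorLike {
    this.created += 1;
    return { selector: `label=${text}` };
  }
}

// Getter pattern: a new locator object is built on every property access.
class GetterLoginPage {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  get usernameInput(): LocatorLike {
    return this.page.getByLabel('Username');
  }
}

// Field pattern, shown for contrast: the locator object is built once in the constructor.
class FieldLoginPage {
  readonly usernameInput: LocatorLike;

  constructor(page: PageLike) {
    this.usernameInput = page.getByLabel('Username');
  }
}

const getterBackingPage = new CountingPage();
const viaGetter = new GetterLoginPage(getterBackingPage);
const firstAccess = viaGetter.usernameInput;
const secondAccess = viaGetter.usernameInput;
// Two accesses produced two distinct objects.
const getterCreatesNewObjects = firstAccess !== secondAccess; // true, created === 2

const fieldBackingPage = new CountingPage();
const viaField = new FieldLoginPage(fieldBackingPage);
// Every access returns the same object.
const fieldReusesObject = viaField.usernameInput === viaField.usernameInput; // true, created === 1
```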

Summary Table

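| Pattern | Encapsulation | Flexibility | Readability | Staleness Risk | API Surface |
|---|---|---|---|---|---|
| Getters | Medium | High | High | Low | Medium |
| Private Getters | High | Medium | Medium | Low | Low |
| Objects for Locators | Low | High | Medium | Medium | High |
| Methods Directly | High | Low | High | Low | Low |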

When to Use Each

Getters: When you want readable code and flexibility, and are OK with exposing locators.

Private Getters: When you want to strictly encapsulate element access, exposing only actions.

Objects for Locators: When you want centralized, reusable locators and are not concerned with strict encapsulation.

Methods Directly: When you want the cleanest, most maintainable API and are OK with less flexibility in tests.

Comparison of Generated Tests

Comparison Criteria

  1. Code quality (structure, modularity, error handling)
  2. Readability (clarity, naming, comments, formatting)
  3. Adherence to Playwright and automation best practices (locator usage, assertions, reusability, maintainability)

Comparison Results

  1. Claude 3.7 Sonnet (Thinking)

    • Code Quality

      • Uses a well-structured Page Object Model (ConduitPage), grouping locators and actions as class methods/getters.
      • Good encapsulation and reusability; all page interactions are abstracted.
      • Uses role-based locators (getByRole) and avoids hardcoded selectors.
      • Handles login state check before logging in.
    • Readability

      • Clear and descriptive method names.
      • Consistent formatting and variable naming.
      • Minimal comments, but code is self-explanatory.
    • Best Practices

      • Follows Playwright best practices: web-first assertions, role-based locators, and modularity.
      • Uses environment variables for config.
      • No hardcoded timeouts.
      • Test scenario is end-to-end and readable.
  2. DeepSeek R1

    • Code Quality

      • No Page Object Model; all actions are inline within the test.
      • Uses constants for credentials and article data.
      • Directly uses Playwright locators in test steps.
      • Lacks abstraction and reusability.
    • Readability

      • Simple, readable, but less maintainable for larger tests.
      • Variable names are clear.
      • Test is split into logical sections (Create/Edit/Delete).
    • Best Practices

      • Uses role-based selectors and web-first assertions.
      • No modularization; not scalable for larger suites.
      • No comments, but the structure is easy to follow.
  3. GPT-4.1

    • Code Quality

      • Implements a Page Object Model (ArticlePage), with grouped locators and actions.
      • Uses nested objects for navigation, forms, and article actions.
      • Uses role-based and parameterized locators.
      • Good encapsulation and reusability.
    • Readability

      • Very clear structure, logical grouping, and descriptive method/variable names.
      • Minimal comments, but code is self-explanatory.
      • Consistent formatting.
    • Best Practices

      • Follows Playwright best practices: role-based selectors, web-first assertions, config via env vars.
      • No hardcoded timeouts.
      • Test covers all CRUD actions and checks for post-deletion state.
  4. SWE-1

    • Code Quality

      • Implements a Page Object Model (ConduitApp) with private getters for locators and methods for actions.
      • Good encapsulation and modularity.
      • Uses role-based selectors and web-first assertions.
      • Has beforeAll/afterAll hooks for setup/teardown.
    • Readability

      • Clear and descriptive method names.
      • Consistent formatting and logical structure.
    • Best Practices

      • Follows Playwright recommendations: modularity, role-based selectors, web-first assertions.
      • Uses environment variables.
      • Test is comprehensive and checks all CRUD operations.
  5. xAI Grok-3

    • Code Quality

      • Implements a Page Object Model (ConduitApp) but with all data hardcoded inside methods.
      • Each action is a single method; no parameterization.
      • Uses role-based selectors and web-first assertions.
      • Less flexible for reusability.
    • Readability

      • Simple and readable, but less scalable.
      • Method names are clear, but lack parameterization for reuse.
      • Minimal comments.
    • Best Practices

      • Uses Playwright best practices for selectors and assertions.
      • No modularization of test data.
      • Test steps are clear and sequential.
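
Several of the results above hinge on parameterization: a page object whose methods hardcode their data fits one scenario, while one that takes data as arguments can be reused for both creating and editing. The sketch below contrasts the two. `RecordingEditor`, the field names, and both page classes are hypothetical and are not taken from the generated tests. Real Playwright actions are asynchronous; the sketch is synchronous so that it stays self-contained.

```typescript
// Hypothetical data shape for an article form.
interface ArticleData {
  title: string;
  about: string;
  content: string;
}

// Hypothetical stand-in for a page: it only records which field received which value.
class RecordingEditor {
  readonly filled: Array<[string, string]> = [];

  fill(field: string, value: string): void {
    this.filled.push([field, value]);
  }
}

// Hardcoded variant: the data lives inside the method, so it serves exactly one scenario.
class HardcodedArticlePage {
  private readonly editor: RecordingEditor;

  constructor(editor: RecordingEditor) {
    this.editor = editor;
  }

  createArticle(): void {
    this.editor.fill('title', 'Test Article');
    this.editor.fill('about', 'About the test article');
    this.editor.fill('content', 'Body of the test article');
  }
}

// Parameterized variant: the caller supplies the data, so one method covers create and edit.
class ParameterizedArticlePage {
  private readonly editor: RecordingEditor;

  constructor(editor: RecordingEditor) {
    this.editor = editor;
  }

  fillArticle(data: ArticleData): void {
    this.editor.fill('title', data.title);
    this.editor.fill('about', data.about);
    this.editor.fill('content', data.content);
  }
}

const recordingEditor = new RecordingEditor();
const articlePage = new ParameterizedArticlePage(recordingEditor);
articlePage.fillArticle({ title: 'First', about: 'Intro', content: 'Draft' });
articlePage.fillArticle({ title: 'Updated First', about: 'Intro', content: 'Final' });
// recordingEditor.filled now holds six entries, three per call.
```

With the hardcoded variant, the edit step would need a second method that duplicates the same three fills with different literals.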

Comparative Table

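| File | POM Used | Abstraction | Readability | Best Practices | Scalability | Comments | Manual Updates |
|---|---|---|---|---|---|---|---|
| Claude 3.7 Sonnet (Thinking) | Yes | High | High | Yes | High | Well-structured | Low |
| DeepSeek R1 | No | Low | Medium | Partial | Low | Inline logic | Low |
| GPT-4.1 | Yes | High | High | Yes | High | Well-structured | Low |
| SWE-1 | Yes | High | High | Yes | High | Hooks used, old setup/teardown | High |
| xAI Grok-3 | Yes | Medium | Medium | Yes | Medium | No parameterization | Medium |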

Conclusion

  • Best Overall (Code Quality & Best Practices):
    • GPT-4.1 and Claude 3.7 Sonnet (Thinking) stand out for their structured Page Object Models, modularity, and adherence to Playwright best practices. Both are highly maintainable and readable, with good abstraction and scalability.
    • SWE-1 is also strong, but its hook-based browser setup/teardown follows an older style that could be improved, and the test and locators had to be updated manually after generation.
  • Most Readable for Small Tests:
    • DeepSeek R1 and xAI Grok-3 are readable and easy to follow for small, simple scenarios but lack abstraction and scalability for larger suites.
  • Best for Large Automation Suites:
    • GPT-4.1 and Claude 3.7 Sonnet (Thinking) are preferred due to their maintainability, modularity, and extensibility.

Comparison between Claude-4-Opus, Claude-4-Sonnet and GPT-4.1

  1. Common Patterns

    • Class-based Page Object Model: All three files define an ArticlePage class that encapsulates page interactions and locators using getters for navigation, forms, and article actions.

    • Lazy Locators: Locators are exposed as getters, ensuring up-to-date references to DOM elements (lazy evaluation).

    • Action Methods: Each class provides methods for login, article creation, editing, and deletion, using the encapsulated locators.

    • Test Flow:

      • Each test file has a single E2E test that:
      • Navigates to the app
      • Logs in
      • Creates an article
      • Edits the article
      • Deletes the article
      • Verifies deletion
    • Use of Playwright Best Practices:

      • Role-based locators (getByRole)
      • Web-first assertions (toBeVisible, toHaveURL, etc.)
      • No hardcoded timeouts except for explicit waits
  2. Differences

| Aspect | claude-4-Opus.spec.ts | claude-4-Sonnet.spec.ts | gpt-4.1.spec.ts |
|---|---|---|---|
| Test Description | Generic E2E for articles | E2E with more step-by-step checks | Generic E2E for articles |
| ArticlePage Structure | Has nav, loginForm, editorForm, article getters with nested functions for dynamic locators | Nearly identical to Opus, but with more explicit verification steps in the test | Slightly simplified; nav getter lacks home link, and some verification steps are omitted |
| Assertions | Checks URLs, visibility, and content after each action | More granular: checks for navigation, login, creation, editing, deletion with explicit comments | Fewer assertions, especially after login and during edit/delete |
| Test Data | Uses Playwright MCP Server branding in article content | Uses Claude Sonnet branding in article content | Uses generic/shorter content and updated fields |
| Locator Patterns | All use role-based locators, dynamic getter for username | Identical pattern | Identical, but slightly less comprehensive in navigation locators |
| Cleanup/Verification | After deletion, checks profile for absence of articles | Same as Opus | Same as Opus, but with less robust assertion (no timeout on final expect) |
| Comments/Readability | Moderate inline comments | More step-by-step comments | Fewer comments |
  3. Key Takeaways
    • All three tests use a modern, maintainable Playwright pattern with class-based encapsulation and lazy locators.
    • The main differences are in test data, assertion thoroughness, and the level of documentation/comments.
    • The claude-4-Sonnet test is the most explicit and robust in terms of stepwise verification and comments.
    • The gpt-4.1 test is the most minimal, which could make it less robust for regression testing but easier to maintain for simple flows.

Disclaimer

This repository is intended for informational and educational purposes only. It compares the capabilities of several Large Language Models (LLMs) in generating a single test case for Playwright, utilizing the Playwright MCP Server with the same input. The results and analyses presented do not imply any endorsement or disapproval of any specific LLM. Performance and outputs are subject to variation based on a range of factors, including but not limited to the specific input data and environment setup.

The information in this repository is provided "as is," and no warranty, express or implied, is made regarding its accuracy, reliability, or completeness. Users are advised to independently verify any results and exercise their discretion when interpreting and utilizing the findings. The authors and contributors assume no responsibility for any consequences arising from the use of the content provided within this repository.

For detailed information on the capabilities and limitations of the individual LLMs, please refer to their respective official documentation and licensing terms.