website-to-markdown-mcp

SunZhi-Will/website-to-markdown-mcp


The Website to Markdown MCP Server is a robust tool designed to fetch website content and convert it into Markdown format, enhancing AI's ability to process and understand web information.

Tools
  1. fetch_website

    Fetch any website and convert to Markdown

  2. list_configured_websites

    List all configured websites for easy access

Website to Markdown MCP Server

A powerful Model Context Protocol (MCP) server designed for fetching website content and converting it to Markdown format, making it easier for AI to understand and process website information.

Key Features

| Enhanced Processing | OpenAPI Support | Smart Analysis | Advanced Extraction |
| --- | --- | --- | --- |
| AI-powered content cleanup | OpenAPI 3.x/Swagger 2.0 | Reading time calculation | Main content detection |
| Auto ad removal | Professional validation | Word count statistics | Language detection |
| Content summarization | Structured API parsing | Smart retry mechanism | Multi-format support |

What's New in v1.2.0

Major Enhancements

| Feature | Status | Description |
| --- | --- | --- |
| Enhanced Content Processor | ✅ | AI-powered content cleaning and extraction |
| Smart Analytics | ✅ | Word count, reading time, content summary |
| Language Detection | ✅ | Automatic language identification |
| Intelligent Retry | ✅ | Smart retry mechanism with exponential backoff |
| Stealth Browser | ✅ | Anti-detection browsing capabilities |
| Rate Limiting | ✅ | Built-in rate limiting and concurrency control |
| Content Cleanup | ✅ | Removes ads, navigation, and irrelevant content |
| Enhanced Markdown | ✅ | Support for strikethrough, underline, highlights |

Quick Start

Method 1: NPX Installation (Recommended)

Easiest way: no local installation needed!

Step 1: Create Configuration File

Create a my-websites.json file:

{
  "websites": [
    {
      "name": "your_website",
      "url": "https://your-website.com",
      "description": "Your Project Website"
    },
    {
      "name": "api_docs",
      "url": "https://api.example.com/openapi.json",
      "description": "Your API Specification"
    }
  ]
}
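
Before pointing the server at a file like this, it can help to check its shape. The following is a minimal sketch, not the server's own validation: the `WebsiteEntry` interface and `parseWebsitesConfig` function are hypothetical names, and the rules that `name` contains only letters, digits, and underscores and that `url` starts with `http://` or `https://` are assumptions made for illustration.

```typescript
// Hypothetical shape of one entry; field names are taken from the example config.
interface WebsiteEntry {
  name: string;
  url: string;
  description: string;
}

// Parse the file contents and return the entries, throwing on malformed input.
// The name and URL rules below are illustrative assumptions, not documented behavior.
function parseWebsitesConfig(json: string): WebsiteEntry[] {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null ||
      !Array.isArray((data as { websites?: unknown }).websites)) {
    throw new Error('Expected a top-level "websites" array');
  }
  const seen = new Set<string>();
  return (data as { websites: unknown[] }).websites.map((raw, i) => {
    const entry = (raw ?? {}) as Partial<WebsiteEntry>;
    if (typeof entry.name !== "string" || !/^[A-Za-z0-9_]+$/.test(entry.name)) {
      throw new Error(`websites[${i}].name must contain only letters, digits, or underscores`);
    }
    if (seen.has(entry.name)) {
      throw new Error(`Duplicate website name: ${entry.name}`);
    }
    seen.add(entry.name);
    if (typeof entry.url !== "string" || !/^https?:\/\/\S+$/.test(entry.url)) {
      throw new Error(`websites[${i}].url must be an http(s) URL`);
    }
    return {
      name: entry.name,
      url: entry.url,
      description: typeof entry.description === "string" ? entry.description : "",
    };
  });
}
```

Running a check like this first surfaces typos such as a missing quote or a duplicated name before Cursor is restarted.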

Step 2: Configure MCP Server

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "website-to-markdown": {
      "command": "npx",
      "args": ["-y", "website-to-markdown-mcp"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-websites.json"
      }
    }
  }
}

Step 3: Restart and Test

  1. Restart Cursor
  2. Open Chat and use Agent mode
  3. Test command: Please list all configured websites

Done! No installation required.


Method 2: Local Installation

Best practice: use this method for development or customization.

Step 1: Clone and Build

git clone https://github.com/your-username/website-to-markdown-mcp.git
cd website-to-markdown-mcp
npm install
npm run build

Step 2: Configure MCP Server

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "website-to-markdown": {
      "command": "cmd",
      "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-websites.json"
      }
    }
  }
}

Enhanced Output Features

Rich Content Analysis

Every fetched page now includes:

  • Content Summary: AI-generated summary of the main content
  • Reading Time: Estimated reading time based on content length
  • Word Count: Word count covering both English and Chinese text
  • Language Detection: Automatic language identification
  • Content Quality Score: Assessment of content relevance
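
To make the word count and reading time items concrete, here is a minimal sketch of one way to compute them for mixed English and Chinese text. It is not the server's implementation: `countWords` and `readingTimeMinutes` are hypothetical names, counting each Chinese character as one word is an assumption, and so is the default of 250 words per minute.

```typescript
// Count words in mixed English/Chinese text.
// Assumption: each CJK ideograph counts as one word; each run of Latin
// letters/digits (optionally joined by ' or -) counts as one word.
function countWords(text: string): number {
  const cjk = text.match(/[\u4e00-\u9fff]/g) ?? [];
  const latin =
    text.replace(/[\u4e00-\u9fff]/g, " ").match(/[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*/g) ?? [];
  return cjk.length + latin.length;
}

// Estimate reading time in whole minutes, rounding up, with a 1-minute floor.
// The 250 words-per-minute default is an assumed value.
function readingTimeMinutes(wordCount: number, wordsPerMinute = 250): number {
  return Math.max(1, Math.ceil(wordCount / wordsPerMinute));
}
```

Under these assumptions, a 1,250-word page is estimated at 5 minutes and a 650-word page at 3 minutes.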

Enhanced Markdown Output

# Example Website

**Source**: https://example.com
**Website**: example_site - Example Website
**Reading Time**: 5 minutes
**Word Count**: 1,250 words
**Language**: English
**Summary**: This article discusses the latest developments in web technology...

---

[Enhanced Markdown content with better formatting...]

Complete OpenAPI/Swagger Support

Professional API Documentation

| Feature | OpenAPI 3.x | Swagger 2.0 | Description |
| --- | --- | --- | --- |
| Auto Detection | ✅ | ✅ | Supports JSON and YAML formats |
| Professional Validation | ✅ | ✅ | Uses @readme/openapi-parser |
| Structured Parsing | ✅ | ✅ | Endpoints, parameters, responses |
| Reference Resolution | ✅ | ✅ | Automatically resolves $ref references |
| Smart Summary | ✅ | ✅ | Generates an API overview |
| Formatted Output | ✅ | ✅ | Readable Markdown |
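
As an illustration of version detection and the summary step, the sketch below inspects an already-parsed spec object. It is an assumption-laden outline rather than the server's code, which relies on @readme/openapi-parser: `SpecSummary` and `summarizeSpec` are hypothetical names. It uses only the standard top-level fields: OpenAPI 3.x documents carry an `openapi` version string, and Swagger 2.0 documents carry `swagger: "2.0"`.

```typescript
interface SpecSummary {
  kind: "openapi3" | "swagger2" | "unknown";
  version: string;
  title: string;
  endpointCount: number;  // number of path entries
  operationCount: number; // number of HTTP operations across all paths
}

// HTTP methods that may appear as operation keys under a path item.
const HTTP_METHODS = ["get", "put", "post", "delete", "options", "head", "patch", "trace"];

// Summarize a parsed spec object (hypothetical helper; no validation or $ref resolution).
function summarizeSpec(spec: Record<string, unknown>): SpecSummary {
  const openapi = spec["openapi"];
  const swagger = spec["swagger"];
  let kind: SpecSummary["kind"] = "unknown";
  let version = "";
  if (typeof openapi === "string" && openapi.indexOf("3.") === 0) {
    kind = "openapi3";
    version = openapi;
  } else if (swagger === "2.0") {
    kind = "swagger2";
    version = "2.0";
  }

  const info = (spec["info"] ?? {}) as { title?: unknown };
  const paths = (spec["paths"] ?? {}) as Record<string, unknown>;
  let operationCount = 0;
  for (const path of Object.keys(paths)) {
    const item = paths[path];
    if (typeof item !== "object" || item === null) continue;
    // Only count keys that are HTTP methods; skip "parameters", "summary", etc.
    operationCount += HTTP_METHODS.filter((m) => m in item).length;
  }

  return {
    kind,
    version,
    title: typeof info.title === "string" ? info.title : "",
    endpointCount: Object.keys(paths).length,
    operationCount,
  };
}
```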

Pre-configured Example Websites

{
  "websites": [
    {
      "name": "petstore_openapi",
      "url": "https://petstore3.swagger.io/api/v3/openapi.json",
      "description": "Swagger Petstore OpenAPI 3.0 Spec (Demo)"
    },
    {
      "name": "petstore_swagger",
      "url": "https://petstore.swagger.io/v2/swagger.json",
      "description": "Swagger Petstore Swagger 2.0 Spec (Demo)"
    },
    {
      "name": "github_api",
      "url": "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json",
      "description": "GitHub REST API OpenAPI Spec"
    }
  ]
}

Installation & Setup

System Requirements

  • Node.js 20.18.1+ (Recommended: v22.15.0 LTS)
  • npm 10.0.0+ or yarn
  • Cursor Editor

Important: Some dependencies require Node.js v20.18.1 or higher. Please update your Node.js version if you encounter engine compatibility warnings.

NPM Package Installation

# Global installation
npm install -g website-to-markdown-mcp

# Or use directly with npx (recommended)
npx website-to-markdown-mcp

Development Setup

# 1. Clone repository
git clone https://github.com/your-username/website-to-markdown-mcp.git
cd website-to-markdown-mcp

# 2. Install dependencies
npm install

# 3. Build project
npm run build

Advanced Configuration Options

Configuration Priority Order
graph TD
    A[Check Environment Variable<br/>WEBSITES_CONFIG_PATH] --> B{File exists?}
    B -->|Yes| C[Load External Config File]
    B -->|No| D[Check Environment Variable<br/>WEBSITES_CONFIG]
    D --> E{Valid JSON?}
    E -->|Yes| F[Load Embedded Config]
    E -->|No| G[Check config.json]
    G --> H{File exists?}
    H -->|Yes| I[Load Local Config]
    H -->|No| J[Use Default Config]
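
The priority order in the diagram can be sketched as a small function. This is an illustrative outline under stated assumptions, not the server's actual loader: `ConfigSources`, `ResolvedConfig`, `resolveConfig`, and the empty `DEFAULT_CONFIG` placeholder are hypothetical, and the environment and file access are passed in as parameters so the logic stays easy to test.

```typescript
// Injected dependencies (hypothetical): in Node these would wrap process.env and fs.
interface ConfigSources {
  env: Record<string, string | undefined>;
  readFile: (path: string) => string | undefined; // undefined when the file does not exist
}

interface ResolvedConfig {
  source: "external-file" | "embedded-env" | "local-file" | "default";
  json: string;
}

// Placeholder only; the real built-in default configuration is not shown here.
const DEFAULT_CONFIG = '{"websites":[]}';

function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// Walk the sources in the order given by the diagram and return the first match.
function resolveConfig(src: ConfigSources): ResolvedConfig {
  // 1. External file named by WEBSITES_CONFIG_PATH
  const path = src.env["WEBSITES_CONFIG_PATH"];
  if (path) {
    const text = src.readFile(path);
    if (text !== undefined) return { source: "external-file", json: text };
  }
  // 2. Embedded JSON in WEBSITES_CONFIG
  const embedded = src.env["WEBSITES_CONFIG"];
  if (embedded && isValidJson(embedded)) return { source: "embedded-env", json: embedded };
  // 3. Local config.json
  const local = src.readFile("config.json");
  if (local !== undefined) return { source: "local-file", json: local };
  // 4. Built-in default
  return { source: "default", json: DEFAULT_CONFIG };
}
```

Note that a `WEBSITES_CONFIG_PATH` pointing at a missing file falls through to the next source rather than failing, which matches the "No" branch in the diagram.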

Configuration Method Details

Method 1: External Configuration File (Recommended)

Advantages: easy to edit, syntax highlighting, version control friendly

Detailed Setup Steps
  1. Create Configuration File

    # Can be placed anywhere
    touch my-api-configs.json
    
  2. Edit Configuration Content

    {
      "websites": [
        {
          "name": "my_docs",
          "url": "https://docs.example.com",
          "description": "My Documentation Website"
        }
      ]
    }
    
  3. Set Environment Variable

    {
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-api-configs.json"
      }
    }
    

Method 2: Embedded JSON (Backward Compatible)

Configuration Example

{
  "mcpServers": {
    "website-to-markdown": {
      "command": "cmd",
      "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG": "{\"websites\":[{\"name\":\"example\",\"url\":\"https://example.com\",\"description\":\"Example Website\"}]}"
      }
    }
  }
}
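
Escaping the embedded JSON by hand is error-prone. As a hedged sketch, the escaped `WEBSITES_CONFIG` value can be generated by serializing twice with `JSON.stringify`; the `buildEmbeddedMcpConfig` helper is hypothetical, and it uses the `npx` command from Method 1 rather than the `cmd` form shown above.

```typescript
interface Website {
  name: string;
  url: string;
  description: string;
}

// Build the text of an MCP config file whose WEBSITES_CONFIG env value is a JSON string.
// The inner stringify produces the embedded config; the outer one escapes its quotes.
function buildEmbeddedMcpConfig(websites: Website[]): string {
  const embedded = JSON.stringify({ websites });
  const config = {
    mcpServers: {
      "website-to-markdown": {
        command: "npx",
        args: ["-y", "website-to-markdown-mcp"],
        disabled: false,
        env: { WEBSITES_CONFIG: embedded },
      },
    },
  };
  return JSON.stringify(config, null, 2);
}
```

The returned string can be written to .cursor/mcp.json as-is.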

Method 3: Local config.json

Local Configuration

Directly edit config.json in the project root directory:

{
  "websites": [
    {
      "name": "local_site",
      "url": "https://local.example.com",
      "description": "Local Test Website"
    }
  ]
}

Available Tools

General Tools

| Tool Name | Function | Parameters | Example |
| --- | --- | --- | --- |
| fetch_website | Fetch any website | url: Website URL | Fetch OpenAPI spec files |
| list_configured_websites | List configured websites | None | View all available websites |

Dedicated Tools

Each configured website automatically gets its own dedicated tool, named after the website's name:

  • fetch_petstore_openapi - Fetch Petstore OpenAPI 3.0 spec
  • fetch_petstore_swagger - Fetch Petstore Swagger 2.0 spec
  • fetch_github_api - Fetch GitHub API spec
  • fetch_tailwind_css - Fetch Tailwind CSS documentation
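
The naming pattern visible in this list (`fetch_` followed by the configured `name`) can be sketched as follows. The helper names `dedicatedToolName` and `listToolNames` are hypothetical, and the pattern is inferred from the examples above rather than taken from the server's source.

```typescript
// Derive the dedicated tool name for one configured website (pattern inferred from the examples).
function dedicatedToolName(siteName: string): string {
  return `fetch_${siteName}`;
}

// List the two general tools followed by one dedicated tool per configured website.
function listToolNames(sites: { name: string }[]): string[] {
  return [
    "fetch_website",
    "list_configured_websites",
    ...sites.map((site) => dedicatedToolName(site.name)),
  ];
}
```

Because the website name becomes part of a tool name, short names made of lowercase letters and underscores are a reasonable convention.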

Enhanced Output Format Examples

General Website Content with Analytics

# Website Title

**Source**: https://example.com
**Website**: example_site - Example Website
**Reading Time**: 3 minutes
**Word Count**: 650 words
**Language**: English
**Summary**: This article provides a comprehensive overview of modern web development practices, covering frontend frameworks, backend technologies, and deployment strategies.

---

[Enhanced cleaned Markdown content with ads removed and main content extracted...]

OpenAPI 3.x Specification File

# Example API (v2.1.0)

**Source**: https://api.example.com/openapi.json
**OpenAPI Version**: 3.0.3
**Validation Status**: ✅ Valid
**Processing Time**: 1.2 seconds
**Endpoints**: 25 endpoints
**Server Locations**: 3 servers

---

## API Basic Information

- **API Name**: Example API
- **Version**: 2.1.0
- **OpenAPI Version**: 3.0.3
- **Description**: A powerful example API for modern applications

## Servers

1. **https://api.example.com**
   - Production server
2. **https://staging-api.example.com**
   - Testing server

## API Endpoints

Total of **25** endpoints:

### `/users`
- **GET**: Get user list
- **POST**: Create new user

### `/users/{id}`
- **GET**: Get specific user
- **PUT**: Update user information
- **DELETE**: Delete user

## Components

- **Schemas**: 12 data models
- **Parameters**: 8 reusable parameters
- **Responses**: 15 reusable responses
- **Security Schemes**: 3 security mechanisms

Usage Examples

Basic Usage

Please fetch the content from https://docs.example.com and convert it to Markdown

OpenAPI Specification Fetching

Please use the fetch_petstore_openapi tool to fetch the Petstore OpenAPI specification

Documentation Website Fetching

Please fetch the official React documentation content

Troubleshooting

Complete Troubleshooting Guide: see the project's troubleshooting guide for detailed solutions to common issues.

Quick Solutions

Node.js Version Issues

Error: npm WARN EBADENGINE Unsupported engine

  • Solution: Update Node.js to v20.18.1 or higher

Module Not Found Issues

Error: Cannot find module './db.json'

  • Solution 1: Clear npm cache: npm cache clean --force
  • Solution 2: Update Node.js version
  • Solution 3: Use local installation instead of npx

Configuration Issues

Q: Configuration changes not taking effect?

  • Confirm the JSON format is correct
  • Restart Cursor
  • Check environment variable names

Q: JSON format errors?

  • Use a JSON validator
  • Confirm strings use double quotes
  • Check for extra or trailing commas

Debug Mode

Detailed logs are output to stderr at startup:

# View debug messages
npm run dev 2> debug.log

Performance & Optimization

Performance Features

  • Smart Retry: Intelligent retry with exponential backoff
  • Rate Limiting: Built-in rate limiting to prevent overload
  • Content Filtering: Removes irrelevant content for faster processing
  • Ad Removal: Automatic ad and popup removal
  • Stealth Mode: Anti-detection browsing capabilities
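
For readers unfamiliar with exponential backoff, the sketch below shows the general technique: each retry waits longer than the previous one, up to a cap. It is a generic illustration, not the server's retry code. The `backoffDelays` and `withRetry` names, the 500 ms base, the factor of 2, the 8000 ms cap, and the default of 3 retries are all assumed values, and the `sleep` function is injected by the caller.

```typescript
// Compute the wait before each retry: baseMs, baseMs*factor, baseMs*factor^2, ... capped at maxMs.
function backoffDelays(retries: number, baseMs = 500, factor = 2, maxMs = 8000): number[] {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) {
    delays.push(Math.min(maxMs, baseMs * Math.pow(factor, i)));
  }
  return delays;
}

// Run `task`, retrying on failure with the delays above.
// `sleep` is supplied by the caller (for example, a setTimeout-based promise).
async function withRetry<T>(
  task: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  retries = 3,
): Promise<T> {
  const delays = backoffDelays(retries);
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Give up once all retries are used; rethrow the last error.
      if (attempt >= delays.length) throw err;
      await sleep(delays[attempt]);
    }
  }
}
```

Capping the delay keeps a long outage from turning into multi-minute waits, and injecting `sleep` lets tests run without real timers.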

Security Considerations

  • Use HTTPS websites where possible (recommended)
  • Automatically filters malicious scripts
  • Limits output content length
  • Uses stealth browsing to avoid detection

Dependencies

| Package | Version | Purpose |
| --- | --- | --- |
| @modelcontextprotocol/sdk | ^1.0.0 | MCP Core Framework |
| @readme/openapi-parser | ^4.1.0 | Professional OpenAPI Parsing |
| axios | ^1.6.0 | HTTP Request Handling |
| cheerio | ^1.0.0 | HTML Parsing Engine |
| turndown | ^7.1.2 | HTML to Markdown |
| yaml | ^2.8.0 | YAML Format Support |
| zod | ^3.22.0 | Data Validation Framework |
| playwright | ^1.40.0 | Browser Automation |

Changelog

v1.2.0 (Latest)

Major Feature Updates

  • Added: Enhanced content processing with AI-powered cleanup
  • Added: Smart analytics (word count, reading time, content summary)
  • Added: Language detection and multi-language support
  • Added: Stealth browser capabilities for anti-detection
  • Added: Built-in rate limiting and retry mechanisms
  • Added: Advanced content filtering and ad removal
  • Enhanced: Markdown processing with more HTML element support
  • Improved: Output format with rich metadata
  • Fixed: Various technical issues and dependencies

v1.1.0 (Previous)

Major Feature Updates

  • Added: Full OpenAPI 3.x/Swagger 2.0 support
  • Added: JSON/YAML format auto-detection
  • Added: Professional-grade spec validation and reference resolution
  • Added: Version auto-adaptation mechanism
  • Added: Structured API documentation summary
  • Pre-configured: Multiple OpenAPI/Swagger examples
  • Added: NPM package distribution with npx support
  • Enhanced: Installation methods for better user experience

v1.0.0 (Stable)

  • Initial release
  • Basic functions: Website content fetching
  • Core functions: Markdown conversion
  • Configuration support: Multi-website management

Contributing

How to Contribute

  1. Fork this project
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Issue Reporting

Report issues on the Issues page. Please include:

  • Issue description
  • Reproduction steps
  • Environment information
  • Screenshots or logs

License

This project is licensed under the MIT License; see the license file in the repository for details.


If this project helps you, please give it a Star!

Have questions or suggestions? Feel free to open an Issue!


Made by Sun ❤️ for the Developer Community