hublun/MDConverter
MDConverter - HTML to Markdown Converter
A comprehensive Python ecosystem for converting HTML webpage packages into clean, well-formatted Markdown files. This repository contains both a standalone Python package and an MCP (Model Context Protocol) server implementation.
Features
- Clean Conversion: Converts HTML to clean, well-formatted Markdown
- Image Processing: Handles and preserves images from webpage packages
- Metadata Extraction: Extracts and preserves article metadata (title, author, description, etc.)
- Content Cleaning: Removes ads, scripts, navigation, and other unwanted elements
- Code Block Preservation: Maintains syntax highlighting in code blocks
- Configurable Output: Extensive configuration options via files or CLI
- Multiple Interfaces: CLI, Python API, and MCP server
- Package Structure: Proper Python package with modular design
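To make the content-cleaning feature concrete, here is a minimal standard-library sketch. It is not MDConverter's actual implementation: the `UNWANTED_TAGS` set, the `TextCleaner` class, and the `clean_text` helper are hypothetical names, and the sketch only keeps visible text rather than producing Markdown.

```python
from html.parser import HTMLParser

# Hypothetical set of tags to drop; MDConverter's real list may differ.
UNWANTED_TAGS = {"script", "style", "nav", "aside", "footer"}


class TextCleaner(HTMLParser):
    """Collect text while skipping everything inside unwanted tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside an unwanted element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in UNWANTED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in UNWANTED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def clean_text(html):
    """Return the visible text of `html`, one chunk per line."""
    parser = TextCleaner()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Tracking a depth counter instead of a boolean keeps the sketch correct when unwanted elements are nested, for example a `script` inside a `nav`.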
Repository Structure
MDConverter/
├── README.md                          # This file
├── LICENSE                            # License file
│
├── standalone/                        # STANDALONE PACKAGE
│   ├── setup.py                       # Package setup
│   ├── requirements.txt               # Dependencies
│   ├── html_to_markdown_converter.py  # Main converter script
│   ├── src/mdconverter/               # Package source
│   ├── tests/                         # Test suite
│   ├── examples/                      # Usage examples
│   ├── docs/                          # Documentation
│   ├── assets/                        # Processed images and templates
│   └── output/                        # Default output directory
│
└── mcp-server/                        # MCP SERVER
    ├── pyproject.toml                 # Modern Python project config
    ├── src/                           # MCP server source
    ├── tests/                         # MCP server tests
    ├── converted_articles/            # Example conversions
    ├── test_output/                   # Test outputs
    └── config/                        # MCP configuration files
Installation
Standalone Package
# Clone the repository
git clone https://github.com/hublun/MDConverter.git
cd MDConverter/standalone
# Install the package
pip install -e .
# Or install dependencies directly
pip install -r requirements.txt
MCP Server
# Navigate to MCP server directory
cd MDConverter/mcp-server
# Install the MCP server
pip install -e .
Usage
Command Line Interface (Standalone)
# Basic usage
python html_to_markdown_converter.py input.html
# With custom output file
python html_to_markdown_converter.py input.html output.md
# Using the package CLI
mdconverter input.html -o output.md
# With configuration file
mdconverter input.html --config config.json
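The CLI commands above take one input file each. For a folder of pages, a small wrapper script is one option. The sketch below uses only the `mdconverter input.html -o output.md` form shown above; the helper names `build_command` and `convert_directory` are hypothetical, and it assumes (this is not confirmed by the project) that the CLI exits with a non-zero status when a conversion fails.

```python
import subprocess
from pathlib import Path


def build_command(html_path, output_dir):
    """Return the mdconverter argv for one input file."""
    html_path = Path(html_path)
    output_path = Path(output_dir) / (html_path.stem + ".md")
    return ["mdconverter", str(html_path), "-o", str(output_path)]


def convert_directory(input_dir, output_dir):
    """Run the CLI once per *.html file and return the inputs that failed."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for html_path in sorted(Path(input_dir).glob("*.html")):
        # Assumption: a non-zero exit status signals a failed conversion.
        result = subprocess.run(build_command(html_path, output_dir))
        if result.returncode != 0:
            failed.append(html_path)
    return failed
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.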
Python API (Standalone)
from mdconverter import HTMLToMarkdownConverter, Config
# Basic conversion
converter = HTMLToMarkdownConverter("input.html")
success = converter.convert()
# With custom configuration
config = Config({
    'output_dir': 'custom_output',
    'add_metadata': True,
    'log_level': 'DEBUG'
})
converter = HTMLToMarkdownConverter(
    "input.html",
    output_file="custom.md",
    config=config
)
success = converter.convert()
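Building on the API calls above, converting several files in one run might look like the following sketch. The `convert_all` helper is hypothetical, and it assumes `convert()` returns a truthy value on success, which the `success = converter.convert()` lines suggest but do not confirm. The converter class is passed in as a parameter so the helper itself has no hard dependency on the package.

```python
def convert_all(html_paths, make_converter):
    """Convert each file and return a {path: success flag} mapping.

    make_converter is any callable that takes a path string and returns an
    object with a convert() method, for example HTMLToMarkdownConverter.
    """
    results = {}
    for path in html_paths:
        try:
            # Assumption: convert() returns a truthy value on success.
            results[str(path)] = bool(make_converter(str(path)).convert())
        except Exception as exc:  # keep going if one file fails
            print(f"{path}: {exc}")
            results[str(path)] = False
    return results


# Usage with the real class (requires the standalone package):
#   from mdconverter import HTMLToMarkdownConverter
#   convert_all(["a.html", "b.html"], HTMLToMarkdownConverter)
```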
MCP Server Usage
Add to your MCP client configuration:
{
  "mcpServers": {
    "mdconverter": {
      "command": "mdconverter-mcp",
      "args": []
    }
  }
}
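If the client configuration file already lists other servers, the entry above can be merged in rather than pasted by hand. This is a minimal sketch: the `add_mdconverter_server` helper is hypothetical, and the location of the configuration file depends on the MCP client, so it is taken as a parameter.

```python
import json
from pathlib import Path


def add_mdconverter_server(config_path):
    """Add the mdconverter entry to an MCP client config, keeping other servers."""
    config_path = Path(config_path)
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["mdconverter"] = {"command": "mdconverter-mcp", "args": []}
    config_path.write_text(json.dumps(config, indent=2))
    return config
```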
Available MCP Tools
- convert_html_to_markdown: Full HTML to Markdown conversion
- validate_html_file: Validate HTML files before conversion
- get_html_metadata: Extract metadata without full conversion
- list_supported_formats: Show supported formats and features
- convert_html_content: Convert HTML content strings directly
Configuration
Standalone Package
Create a config.json file:
{
  "output_dir": "output",
  "images_dir": "assets/images",
  "preserve_images": true,
  "clean_html": true,
  "add_metadata": true,
  "log_level": "INFO"
}
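When calling the converter from Python, the same file can be read and checked before use. The sketch below is an illustration, not the package's own loader: `EXAMPLE_SETTINGS` and `load_settings` are hypothetical names, and the values are copied from the example file above rather than from confirmed package defaults.

```python
import json
from pathlib import Path

# Values copied from the example config.json above; they are not confirmed
# to be the package's built-in defaults.
EXAMPLE_SETTINGS = {
    "output_dir": "output",
    "images_dir": "assets/images",
    "preserve_images": True,
    "clean_html": True,
    "add_metadata": True,
    "log_level": "INFO",
}


def load_settings(config_path=None):
    """Merge a JSON config file over the example settings, rejecting unknown keys."""
    settings = dict(EXAMPLE_SETTINGS)
    if config_path is not None:
        overrides = json.loads(Path(config_path).read_text())
        unknown = set(overrides) - set(EXAMPLE_SETTINGS)
        if unknown:
            raise KeyError(f"unknown config keys: {sorted(unknown)}")
        settings.update(overrides)
    return settings
```

The resulting dictionary has the same shape as the one passed to `Config(...)` in the Python API example. Rejecting unknown keys is a design choice that surfaces typos such as `"ouput_dir"` early.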
MCP Server
The MCP server supports the same configuration options via tool parameters:
{
  "tool": "convert_html_to_markdown",
  "arguments": {
    "html_file_path": "/path/to/webpage.html",
    "output_dir": "converted_articles",
    "preserve_images": true,
    "add_metadata": true
  }
}
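MCP clients normally build the wire message themselves, but it can help to see how a tool and its arguments map onto it. In the Model Context Protocol, tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below wraps the shorthand shown above in that envelope; the `to_tools_call` helper is a hypothetical name.

```python
import json


def to_tools_call(tool_request, request_id=1):
    """Wrap {"tool": ..., "arguments": ...} in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_request["tool"],
            "arguments": tool_request.get("arguments", {}),
        },
    }


request = to_tools_call({
    "tool": "convert_html_to_markdown",
    "arguments": {"html_file_path": "/path/to/webpage.html"},
})
print(json.dumps(request, indent=2))
```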
Testing
Standalone Package
cd standalone
python -m pytest tests/
MCP Server
cd mcp-server
python test_conversion.py
Documentation
Migration Guide
From Legacy Script
If you were using the old html_to_markdown_converter.py script:
# Old way
python html_to_markdown_converter.py input.html
# New way (still supported)
cd standalone
python html_to_markdown_converter.py input.html
# Or use the package
mdconverter input.html
From Standalone to MCP
The MCP server provides the same functionality with additional features:
# Standalone
converter = HTMLToMarkdownConverter("file.html")
converter.convert()
# MCP
{
  "tool": "convert_html_to_markdown",
  "arguments": {"html_file_path": "file.html"}
}
Key Features
Content Processing
- Smart Content Extraction: Identifies and extracts main article content
- Metadata Preservation: YAML frontmatter with article information
- Image Organization: Copies and organizes image files
- Code Syntax Preservation: Maintains syntax highlighting
- Clean Output: Removes unwanted elements (ads, navigation, etc.)
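As an illustration of the metadata-preservation item, the sketch below renders a metadata dictionary as a YAML front matter block. It is not taken from MDConverter's source: the `render_front_matter` helper is hypothetical, and the field names follow the metadata listed under Features (title, author, description). Values are written as JSON strings, which are also valid YAML double-quoted scalars, so colons and quotes inside a title stay safe.

```python
import json


def render_front_matter(metadata):
    """Render a flat metadata dict as a YAML front matter block."""
    lines = ["---"]
    for key, value in metadata.items():
        if value is None or value == "":
            continue  # skip fields the page did not provide
        # A JSON string literal is a valid YAML double-quoted scalar.
        lines.append(f"{key}: {json.dumps(str(value), ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n"


print(render_front_matter({
    "title": "Example: a sample page",
    "author": "Jane Doe",
    "description": "",
}))
```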
Multiple Interfaces
- CLI Tool: Command-line interface for batch processing
- Python API: Programmatic access for integration
- MCP Server: Model Context Protocol for AI assistant integration
Advanced Features
- Configurable Processing: Extensive customization options
- Error Handling: Comprehensive validation and error reporting
- Multiple Formats: Support for various HTML structures
- Template System: Customizable output templates
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new features
- Update documentation
- Submit a pull request
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Support
- Issues: Report bugs and feature requests via GitHub Issues
- Documentation: Check the docs/ directories for detailed guides
- Examples: See the examples/ directories for usage examples
Note: This repository was extracted from the Leet_Vibe repository to create a focused, standalone HTML to Markdown conversion tool.