MarkItDown MCP Server

An MCP (Model Context Protocol) server that provides document conversion to Markdown using MarkItDown.

Features

  • Multi-Format Support: Convert PDF, Word, Excel, PowerPoint, images, audio, HTML, and more
  • Batch Processing: Convert multiple files in a single call
  • Azure AI Integration: Enhanced document processing with Azure Document Intelligence
  • OCR Capabilities: Extract text from images
  • Audio Transcription: Convert speech to text
  • Web Content: Convert live web pages to Markdown
  • LLM Enhancement: Use LLMs for image descriptions and content analysis

Tools Available

convert_file_to_markdown

Convert various file formats to Markdown.

Parameters:

  • file_path (string): Path to file to convert
  • output_path (string, optional): Output path for converted file
  • use_azure_ai (boolean, optional): Use Azure Document Intelligence
  • azure_endpoint (string, optional): Azure Document Intelligence endpoint
  • llm_client (string, optional): LLM client for image descriptions
  • llm_model (string, optional): LLM model for image descriptions
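
For clients that talk to the server directly rather than through a chat prompt, invoking this tool is an MCP `tools/call` request. The sketch below only builds the JSON-RPC payload; the helper name `build_convert_request` and the example paths are illustrative, while the argument names are the ones listed above. Transport details (stdio framing, the initialization handshake) are deliberately left out.

```python
import json
from typing import Any, Optional


def build_convert_request(
    file_path: str,
    request_id: int = 1,
    output_path: Optional[str] = None,
    use_azure_ai: bool = False,
    azure_endpoint: Optional[str] = None,
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` payload for convert_file_to_markdown.

    Hypothetical helper: it is not part of the server, it only shapes the request.
    """
    arguments: dict[str, Any] = {"file_path": file_path}
    # Include optional parameters only when the caller actually set them.
    if output_path is not None:
        arguments["output_path"] = output_path
    if use_azure_ai:
        arguments["use_azure_ai"] = True
        # The endpoint is only meaningful when Azure processing is requested.
        if azure_endpoint is not None:
            arguments["azure_endpoint"] = azure_endpoint
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "convert_file_to_markdown", "arguments": arguments},
    }


request = build_convert_request(
    "/path/to/document.pdf", output_path="/path/to/document.md"
)
print(json.dumps(request, indent=2))
```

Leaving unset options out of `arguments` lets the server apply its own defaults instead of receiving explicit nulls.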

convert_url_to_markdown

Convert web content to Markdown.

Parameters:

  • url (string): URL to convert to Markdown
  • save_to_file (string, optional): Path to save the converted content

batch_convert_files

Convert multiple files to Markdown in batch.

Parameters:

  • file_paths (array): List of file paths to convert
  • output_dir (string, optional): Directory to save converted files
  • file_formats (array, optional): Filter by specific file formats
  • use_azure_ai (boolean, optional): Use Azure Document Intelligence
  • azure_endpoint (string, optional): Azure Document Intelligence endpoint
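
How the server interprets `file_formats` is not spelled out here (for example, whether entries carry a leading dot). A client can sidestep the question by filtering paths itself before calling the tool. The following is a minimal sketch of that idea; `filter_by_format` is a hypothetical helper, not part of the server, and the paths are placeholders.

```python
from pathlib import PurePath


def filter_by_format(file_paths: list[str], formats: list[str]) -> list[str]:
    """Keep only paths whose extension matches one of `formats` (case-insensitive)."""
    # Normalize "PDF", "pdf" and ".pdf" to the same form: ".pdf".
    wanted = {"." + fmt.lower().lstrip(".") for fmt in formats}
    return [p for p in file_paths if PurePath(p).suffix.lower() in wanted]


paths = ["/documents/a.pdf", "/documents/b.DOCX", "/documents/notes.txt"]

# Arguments for batch_convert_files, built from the pre-filtered list.
arguments = {
    "file_paths": filter_by_format(paths, ["pdf", ".docx"]),
    "output_dir": "/documents/markdown",
}
print(arguments["file_paths"])
```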

extract_document_metadata

Extract metadata from a document without performing a full conversion.

Parameters:

  • file_path (string): Path to document file

convert_clipboard_content

Convert clipboard or pasted content to Markdown.

Parameters:

  • content_type (string): Type of content ("url", "html", "text")
  • content (string): The actual content to convert
  • save_to (string, optional): Path to save converted content

get_supported_formats

Get the list of all supported file formats for conversion.

Supported Formats

Documents

  • PDF (.pdf) - Portable Document Format
  • Word (.docx) - Microsoft Word documents
  • PowerPoint (.pptx) - Microsoft PowerPoint presentations
  • Excel (.xlsx, .xls) - Microsoft Excel spreadsheets

Web Content

  • HTML (.html, .htm) - Web pages
  • URLs - Live web content

Data Formats

  • CSV (.csv) - Comma-separated values
  • JSON (.json) - JavaScript Object Notation
  • XML (.xml) - Extensible Markup Language

Images (with OCR)

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • GIF (.gif)
  • BMP (.bmp)
  • TIFF (.tiff)

Audio (with transcription)

  • WAV (.wav)
  • MP3 (.mp3)

Archives

  • ZIP (.zip) - Extracts and converts contents
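
The format list above translates naturally into a lookup table, which is handy for deciding up front whether a file is worth sending to the server. This sketch simply mirrors the extensions listed in this README; the authoritative list is whatever `get_supported_formats` returns, and the names `FORMAT_CATEGORIES` and `category_for` are illustrative.

```python
from pathlib import PurePath
from typing import Optional

# Extensions copied from the "Supported Formats" list in this README.
FORMAT_CATEGORIES = {
    "documents": {".pdf", ".docx", ".pptx", ".xlsx", ".xls"},
    "web": {".html", ".htm"},
    "data": {".csv", ".json", ".xml"},
    "images": {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff"},
    "audio": {".wav", ".mp3"},
    "archives": {".zip"},
}


def category_for(path: str) -> Optional[str]:
    """Return the category for a path's extension, or None if it is not listed."""
    suffix = PurePath(path).suffix.lower()
    for category, extensions in FORMAT_CATEGORIES.items():
        if suffix in extensions:
            return category
    return None


for example in ["/path/to/audio.wav", "report.PDF", "notes.txt"]:
    print(example, "->", category_for(example))
```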

Installation

Using uv (recommended)

# Clone the repository
git clone https://github.com/digitalcyphrpnk/markitdown-mcp.git
cd markitdown-mcp

# Setup with uv
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
uv pip install -e .

Using pip

pip install markitdown-mcp

Configuration

OpenCode Configuration

Add to your opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "markitdown": {
      "type": "local",
      "command": ["markitdown-mcp"],
      "enabled": true,
      "environment": {
        "AZURE_DOC_INTEL_ENDPOINT": "${AZURE_DOC_INTEL_ENDPOINT}",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}"
      }
    }
  }
}

Claude Desktop Configuration

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "markitdown": {
      "command": "markitdown-mcp",
      "env": {
        "AZURE_DOC_INTEL_ENDPOINT": "your_azure_endpoint_here",
        "OPENAI_API_KEY": "your_openai_key_here"
      }
    }
  }
}
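
If the configuration file already lists other MCP servers, merging the entry is safer than pasting over the file. A minimal sketch, assuming the file is plain JSON with a top-level `mcpServers` object as shown above; the file location varies by platform, so the caller passes it in, and `add_markitdown_server` is a hypothetical helper.

```python
import json
from pathlib import Path


def add_markitdown_server(config_path: Path, env: dict[str, str]) -> dict:
    """Add or update the "markitdown" entry while keeping other servers intact."""
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        # Treat an empty file like a missing one instead of failing to parse it.
        if text.strip():
            config = json.loads(text)
    servers = config.setdefault("mcpServers", {})
    servers["markitdown"] = {"command": "markitdown-mcp", "env": env}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling it with the path to your Claude Desktop configuration and a dictionary such as `{"OPENAI_API_KEY": "your_openai_key_here"}` rewrites only the `markitdown` entry.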

Usage Examples

Basic Document Conversion

Convert this PDF file to Markdown: /path/to/document.pdf

Batch Processing

Convert all PDF files in the /documents folder to Markdown

Web Content Conversion

Convert the content from https://example.com to Markdown

Enhanced Processing

Convert this document using Azure Document Intelligence for better accuracy

Audio Transcription

Convert this audio file to text: /path/to/audio.wav

Image OCR

Extract text from this image: /path/to/image.png

Environment Variables

  • AZURE_DOC_INTEL_ENDPOINT: Azure Document Intelligence endpoint for enhanced processing
  • OPENAI_API_KEY: OpenAI API key for LLM-powered image descriptions
  • ANTHROPIC_API_KEY: Anthropic API key for Claude-powered features
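
Since each variable unlocks an optional feature, a quick pre-flight check can report which features are likely to be available before the server starts. A sketch, assuming the variable-to-feature mapping described in the list above; `available_features` is an illustrative name, and the server may perform its own checks differently.

```python
import os
from typing import Mapping, Optional

# Variable name -> feature it enables, as described in this README.
FEATURE_VARS = {
    "AZURE_DOC_INTEL_ENDPOINT": "Azure Document Intelligence processing",
    "OPENAI_API_KEY": "LLM-powered image descriptions (OpenAI)",
    "ANTHROPIC_API_KEY": "Claude-powered features",
}


def available_features(env: Optional[Mapping[str, str]] = None) -> dict[str, bool]:
    """Map each feature to True if its variable is set to a non-blank value."""
    source = os.environ if env is None else env
    return {
        feature: bool(source.get(var, "").strip())
        for var, feature in FEATURE_VARS.items()
    }


for feature, enabled in available_features().items():
    print(f"{'enabled ' if enabled else 'disabled'}  {feature}")
```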

Development

Setup Development Environment

# Clone and setup
git clone https://github.com/digitalcyphrpnk/markitdown-mcp.git
cd markitdown-mcp

# Install with development dependencies
uv pip install -e ".[dev]"

# Run tests
pytest

# Format and lint code
black src tests
ruff check src tests

# Type checking
mypy src

Testing

# Run all tests
pytest

# Test specific functionality
pytest tests/test_conversion.py

# Test with different file formats
pytest tests/test_formats.py

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite
  6. Submit a pull request

License

MIT License - see the license file in the repository for details.
