zeph-gh/Docx-Mcp-Server
DOCX MCP Server
A comprehensive Model Context Protocol (MCP) server for processing Microsoft Word (.docx) documents with full formatting support.
Features
This MCP server provides advanced DOCX document processing capabilities using the mammoth library:
- Text Extraction: Extract plain text with word count
- HTML Conversion: Convert to HTML with preserved formatting
- Structure Analysis: Analyze document structure, headings, and formatting elements
- Image Extraction: Extract embedded images (as base64 or save to files)
- Markdown Conversion: Convert to Markdown format
- Rich Formatting Support: Handles bold, italic, lists, headings, and more
Available Tools
1. extract_text
Extract plain text content from a DOCX file.
Parameters:
- file_path (string): Path to the .docx file
Returns:
- Plain text content
- Processing messages
- Word count
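The README does not say how the word count is computed. A minimal sketch, assuming a word is any run of non-whitespace characters; the `ExtractTextResult` shape and both function names are hypothetical, not the server's actual code:

```typescript
// Hypothetical result shape mirroring the three documented return values.
interface ExtractTextResult {
  text: string;
  messages: string[];
  wordCount: number;
}

// Counts whitespace-separated tokens; an empty or blank string has 0 words.
function countWords(text: string): number {
  const trimmed = text.trim();
  if (trimmed === "") return 0;
  return trimmed.split(/\s+/).length;
}

// Bundles extracted text with its processing messages and word count.
function buildExtractTextResult(
  text: string,
  messages: string[] = []
): ExtractTextResult {
  return { text, messages, wordCount: countWords(text) };
}
```

Splitting on whitespace treats hyphenated terms and punctuation-attached words as single words, which may differ from the count Word itself reports.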
2. convert_to_html
Convert DOCX file to HTML with formatting preserved.
Parameters:
- file_path (string): Path to the .docx file
- include_styles (boolean, optional): Include inline styles (default: true)
Returns:
- HTML content with formatting
- Processing messages
- Warnings and errors
3. analyze_structure
Analyze document structure, headings, and formatting elements.
Parameters:
- file_path (string): Path to the .docx file
Returns:
- Document statistics (characters, words, paragraphs, headings)
- Structure analysis (headings with levels)
- Formatting analysis (bold, italic, lists count)
- Processing messages
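The README does not describe how these statistics are derived. One plausible approach, sketched here as an assumption, is to count tags in the HTML that mammoth produces (by default it emits `<strong>` for bold and `<em>` for italic). All names are hypothetical, and regex matching is a simplification that a real implementation might replace with an HTML parser:

```typescript
interface HeadingInfo {
  level: number;
  text: string;
}

interface StructureSummary {
  headings: HeadingInfo[];
  paragraphs: number;
  bold: number;
  italic: number;
  listItems: number;
}

// Removes all tags, leaving only the text content.
function stripTags(html: string): string {
  return html.replace(/<[^>]+>/g, "").trim();
}

// Counts opening tags of one element, e.g. <p> or <p class="x"> but not <pre>.
function countTag(html: string, tag: string): number {
  const re = new RegExp(`<${tag}(\\s[^>]*)?>`, "gi");
  return (html.match(re) ?? []).length;
}

// Collects headings with their levels and counts basic formatting elements.
function summarizeHtml(html: string): StructureSummary {
  const headings: HeadingInfo[] = [];
  const headingRe = /<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi;
  let m: RegExpExecArray | null;
  while ((m = headingRe.exec(html)) !== null) {
    headings.push({ level: Number(m[1]), text: stripTags(m[2]) });
  }
  return {
    headings,
    paragraphs: countTag(html, "p"),
    bold: countTag(html, "strong"),
    italic: countTag(html, "em"),
    listItems: countTag(html, "li"),
  };
}
```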
4. extract_images
Extract and list images from a DOCX file.
Parameters:
- file_path (string): Path to the .docx file
- output_dir (string, optional): Directory to save extracted images
Returns:
- Total image count
- Image details (src, alt text, base64 status)
- Output directory information
- Processing messages
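When an image `src` is a base64 data URL, a client can recover its content type and size without decoding it. A minimal sketch, assuming the `src` values follow the standard `data:<type>;base64,<payload>` form; `parseImageDataUrl` and `ParsedImage` are hypothetical names, not part of the server:

```typescript
interface ParsedImage {
  contentType: string; // e.g. "image/png"
  extension: string;   // file extension derived from the content type
  base64: string;      // raw base64 payload
  byteLength: number;  // size of the decoded image in bytes
}

// Returns null when src is not a base64 data URL (e.g. a file path).
function parseImageDataUrl(src: string): ParsedImage | null {
  const match = /^data:([\w.+-]+\/[\w.+-]+);base64,([A-Za-z0-9+\/]*={0,2})$/.exec(src);
  if (match === null) return null;

  const contentType = match[1];
  const base64 = match[2];

  // Every 4 base64 characters encode 3 bytes; each '=' pad removes one byte.
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  const byteLength = Math.floor((base64.length * 3) / 4) - padding;

  // "image/jpeg" -> "jpg", "image/svg+xml" -> "svg", otherwise the subtype.
  const subtype = contentType.split("/")[1];
  const extension = subtype === "jpeg" ? "jpg" : subtype.replace(/\+.*$/, "");

  return { contentType, extension, base64, byteLength };
}
```

A `null` result corresponds to an image whose base64 status is false, such as one that was saved to the output directory instead.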
5. convert_to_markdown
Convert DOCX file to Markdown format.
Parameters:
- file_path (string): Path to the .docx file
Returns:
- Markdown content
- Word count
- Processing messages
Installation
npm install
npm run build
Usage
The server runs on stdio and communicates via JSON-RPC 2.0 protocol.
Example Usage with MCP Client
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "analyze_structure",
    "arguments": {
      "file_path": "/path/to/document.docx"
    }
  }
}
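The request above can also be built programmatically. A minimal sketch with hypothetical helper names; it assumes the MCP stdio framing of one JSON message per line, and it omits the `initialize` handshake that a real client must complete before calling tools:

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Builds a JSON-RPC 2.0 request that invokes one of the server's tools.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Serializes a request as a single newline-terminated line for stdio.
function frame(request: ToolCallRequest): string {
  return JSON.stringify(request) + "\n";
}

const request = buildToolCall(1, "analyze_structure", {
  file_path: "/path/to/document.docx",
});
const line = frame(request); // would be written to the server's stdin
```

`JSON.stringify` without an indent argument never emits raw newlines, so each framed message stays on one line.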
Example Usage with Roo
{
  "file_path": "/path/to/document.docx"
}
Supported Features
- ✅ Text Extraction: Plain text with word counting
- ✅ Rich Formatting: Bold, italic, underline, strikethrough
- ✅ Document Structure: Headings (H1-H6), paragraphs
- ✅ Lists: Ordered and unordered lists with items
- ✅ Images: Extraction as base64 or file export
- ✅ Tables: Basic table structure (via HTML conversion)
- ✅ Links: Hyperlinks preservation
- ✅ Styles: Custom style mapping support
- ✅ Error Handling: Comprehensive error reporting
- ✅ Multiple Formats: HTML, Markdown, plain text output
Advanced Features
Custom Style Mapping
The convert_to_html tool supports custom style mapping for better semantic HTML output:
// Example style mappings
"p[style-name='Heading 1'] => h1:fresh"
"r[style-name='Strong'] => strong"
"r[style-name='Emphasis'] => em"
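Mappings in this syntax are what mammoth accepts through the `styleMap` option of `convertToHtml`. The README does not show how the tool exposes them, so the sketch below only builds the strings and the options object; `StyleRule` and `toStyleMap` are hypothetical names:

```typescript
// One rule: a paragraph ("p") or run ("r") style mapped to an HTML target.
interface StyleRule {
  element: "p" | "r";
  styleName: string;
  target: string;
}

// Renders rules in mammoth's style-map syntax, one string per rule.
function toStyleMap(rules: StyleRule[]): string[] {
  return rules.map(
    (rule) => `${rule.element}[style-name='${rule.styleName}'] => ${rule.target}`
  );
}

const styleMap = toStyleMap([
  { element: "p", styleName: "Heading 1", target: "h1:fresh" },
  { element: "r", styleName: "Strong", target: "strong" },
  { element: "r", styleName: "Emphasis", target: "em" },
]);

// With mammoth installed, this object would be the second argument:
//   mammoth.convertToHtml({ path: "/path/to/document.docx" }, mammothOptions)
const mammothOptions = { styleMap };
```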
Image Handling
- Base64 Embedding: Images can be embedded as base64 data URLs
- File Export: Images can be extracted to a specified directory
- Metadata: Alt text and content type preservation
Document Analysis
Provides comprehensive document analysis including:
- Character and word counts
- Paragraph and heading counts
- Formatting element statistics
- Document structure hierarchy
Development
Install dependencies:
npm install
Build the server:
npm run build
For development with auto-rebuild:
npm run watch
Installation for Claude Desktop
To use with Claude Desktop, add the server config:
On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "docx-format-server": {
      "command": "/path/to/docx-format-server/build/index.js"
    }
  }
}
Dependencies
- @modelcontextprotocol/sdk: MCP protocol implementation
- mammoth: Advanced DOCX processing library
- zod: Schema validation
- typescript: TypeScript support
Error Handling
All tools include comprehensive error handling with detailed error messages for:
- File not found errors
- Invalid file format
- Processing errors
- Permission issues
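The four categories above could be distinguished along these lines. This is a sketch, not the server's implementation: it assumes file access fails with Node.js system error codes (`ENOENT`, `EACCES`, `EPERM`) and that a wrong file extension signals an invalid format. `classifyError` and `DocxErrorKind` are hypothetical names:

```typescript
type DocxErrorKind =
  | "file_not_found"
  | "permission_denied"
  | "invalid_format"
  | "processing_error";

// Maps a caught error and the requested path to one of the four categories.
function classifyError(
  filePath: string,
  err: { code?: string; message?: string }
): DocxErrorKind {
  // Node.js reports a missing file as ENOENT.
  if (err.code === "ENOENT") return "file_not_found";
  // EACCES and EPERM both indicate insufficient permissions.
  if (err.code === "EACCES" || err.code === "EPERM") return "permission_denied";
  // Anything that is not a .docx file is treated as an invalid format.
  if (!filePath.toLowerCase().endsWith(".docx")) return "invalid_format";
  // Everything else is assumed to have failed during conversion.
  return "processing_error";
}
```

Checking the error code before the extension means a missing `report.txt` is reported as not found instead of as an invalid format.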
Debugging
Since MCP servers communicate over stdio, debugging can be challenging. We recommend using the MCP Inspector, which is available as a package script:
npm run inspector
The Inspector will provide a URL to access debugging tools in your browser.
Version History
- v0.2.0: Complete rewrite with mammoth library, added 5 comprehensive tools
- v0.1.0: Basic text extraction with docx-parser (deprecated)
License
ISC License