shiosalt/powerpoint-analyzer-mcp
The PowerPoint Analyzer MCP Server is a tool designed to enhance the search and extraction capabilities of PowerPoint files by utilizing their structure and text formatting attributes.
PowerPoint Analyzer MCP Server
An MCP server that enables search and extraction using PowerPoint structure and text formatting attributes.
Background
Most AI agent search tools that claim PowerPoint support ignore the file's structure and extract only plain text for searching. This tool can also output structured information, such as which text is written in bold.
Features
- Text formatting detection: Bold, italic, underline, strikethrough, highlighting, hyperlinks
- Font analysis: Font sizes, colors, and styling information
- Slide querying: Query slides with flexible filtering criteria
- Table data extraction: Extract table data with formatting detection
- Testing suite for formatting detection validation
- Implementation using Python standard libraries (no external PowerPoint dependencies)
- Direct XML parsing for processing
- Built with FastMCP 2.0
Project Structure
powerpoint-analyzer/
├── main.py                           # Main FastMCP server entry point
├── powerpoint_mcp_server/            # Core server implementation
│   ├── server.py                     # Main MCP server implementation
│   ├── config.py                     # Configuration management
│   ├── core/                         # Core functionality
│   └── utils/                        # Utility modules
├── tests/                            # Test files
│   ├── test_powerpoint_fastmcp.py    # Main server tests
│   ├── test_formatting_detection.py  # Formatting detection tests
│   └── ...                           # Other test files
├── scripts/                          # Utility scripts
│   ├── health_check.py               # Server health check
│   └── start_server.py               # Alternative server startup
├── requirements.txt                  # Python dependencies
├── pytest.ini                        # Test configuration
└── README.md                         # Documentation
Installation
- Clone the repository:
git clone <repository-url>
cd powerpoint-analyzer
- Install dependencies:
pip install -r requirements.txt
- Configure your AI agent (Claude Desktop, etc.) by adding the following to your configuration file:
Location of mcp_settings.json:
- macOS:
~/Library/Application Support/Claude/mcp_settings.json
- Windows:
%APPDATA%\Claude\mcp_settings.json
{
  "mcpServers": {
    "powerpoint-analyzer-mcp": {
      "command": "python",
      "args": ["/path/to/your/powerpoint-analyzer/main.py"],
      "env": {
        "POWERPOINT_MCP_LOG_LEVEL": "INFO"
      }
    }
  }
}
Example with actual paths:
macOS/Linux:
{
  "mcpServers": {
    "powerpoint-analyzer-mcp": {
      "command": "python",
      "args": ["/Users/username/powerpoint-analyzer/main.py"],
      "env": {
        "POWERPOINT_MCP_LOG_LEVEL": "INFO"
      }
    }
  }
}
Windows:
{
  "mcpServers": {
    "powerpoint-analyzer-mcp": {
      "command": "python",
      "args": ["C:\\Users\\username\\powerpoint-analyzer\\main.py"],
      "env": {
        "POWERPOINT_MCP_LOG_LEVEL": "INFO"
      }
    }
  }
}
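Backslashes in Windows paths must be doubled inside JSON, which is easy to get wrong when editing the file by hand. The following is a minimal sketch that generates the entry with Python's json module so the escaping is handled automatically. The helper name server_entry is hypothetical and is not part of this project; the keys mirror the configuration shown above.

```python
import json
from pathlib import PureWindowsPath


def server_entry(main_py, log_level="INFO"):
    """Build the mcpServers entry shown above (hypothetical helper, not part of the project)."""
    return {
        "mcpServers": {
            "powerpoint-analyzer-mcp": {
                "command": "python",
                "args": [str(main_py)],
                "env": {"POWERPOINT_MCP_LOG_LEVEL": log_level},
            }
        }
    }


# Example path only; replace it with the real location of main.py.
windows_path = PureWindowsPath(r"C:\Users\username\powerpoint-analyzer\main.py")

# json.dumps doubles each backslash, producing valid JSON for the config file.
config_text = json.dumps(server_entry(windows_path), indent=2)
print(config_text)
```

The printed text can be pasted into the configuration file, or merged into an existing "mcpServers" object if other servers are already configured.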
Technical Approach
This server processes PowerPoint files using the following approach:
- Direct ZIP handling: .pptx files are processed as ZIP archives using Python's zipfile module
- XML parsing: Internal PowerPoint XML structure is parsed using xml.etree.ElementTree with namespace support
- Dual formatting detection: Supports both XML attribute and child element formats for text formatting properties
- No external dependencies: Uses only Python standard library modules for PowerPoint processing
- Processing: Extracts only the required information without loading entire presentations into memory
- Caching: Caching system for performance on repeated operations
Text Formatting Detection
The server provides text formatting detection capabilities:
Supported Formatting Types
- Bold text: Detects bold formatting in text elements
- Italic text: Identifies italic styling across slides
- Underlined text: Finds underlined text with underline styles
- Strikethrough text: Detects strikethrough formatting
- Highlighted text: Identifies highlighted/background colored text
- Hyperlinks: Extracts hyperlink information and relationship IDs
- Font properties: Analyzes font sizes, colors (RGB and scheme colors)
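As a rough illustration of where the font and hyperlink properties live in slide XML, the sketch below reads them from a run's a:rPr element. It assumes the usual DrawingML layout (sz in hundredths of a point, color under a:solidFill, hyperlink relationship ID on a:hlinkClick); the function name run_properties and the returned dictionary keys are hypothetical and do not reflect the server's output format.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
NS = {"a": A_NS, "r": R_NS}


def run_properties(run):
    """Collect font size, color, and hyperlink relationship ID for one <a:r> run."""
    props = {
        "text": run.findtext("a:t", default="", namespaces=NS),
        "size_pt": None,
        "color": None,
        "link_rid": None,
    }
    rpr = run.find("a:rPr", NS)
    if rpr is None:
        return props
    # sz is stored in hundredths of a point, so "2400" means 24 pt.
    if rpr.get("sz"):
        props["size_pt"] = int(rpr.get("sz")) / 100
    # A color is either an explicit RGB value or a reference to a theme (scheme) color.
    rgb = rpr.find("a:solidFill/a:srgbClr", NS)
    scheme = rpr.find("a:solidFill/a:schemeClr", NS)
    if rgb is not None:
        props["color"] = "#" + rgb.get("val", "")
    elif scheme is not None:
        props["color"] = "scheme:" + scheme.get("val", "")
    # The hyperlink target itself is stored in the slide's .rels part under this ID.
    link = rpr.find("a:hlinkClick", NS)
    if link is not None:
        props["link_rid"] = link.get(f"{{{R_NS}}}id")
    return props


PARAGRAPH_XML = (
    f'<a:p xmlns:a="{A_NS}" xmlns:r="{R_NS}">'
    '<a:r><a:rPr sz="2400"><a:solidFill><a:srgbClr val="FF0000"/></a:solidFill></a:rPr>'
    "<a:t>Red</a:t></a:r>"
    '<a:r><a:rPr><a:solidFill><a:schemeClr val="accent1"/></a:solidFill>'
    '<a:hlinkClick r:id="rId2"/></a:rPr><a:t>Link</a:t></a:r>'
    "</a:p>"
)

all_props = [run_properties(r) for r in ET.fromstring(PARAGRAPH_XML).findall("a:r", NS)]
for props in all_props:
    print(props)
```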
Technical Implementation
- Dual detection method: Checks both XML attributes (b="1") and child elements (<a:b val="1"/>)
- Namespace-aware parsing: Handling of Office Open XML namespaces
- Validation: Test suite for detection across different PowerPoint versions
- Debug capabilities: Debugging tools for troubleshooting formatting detection issues
Validation and Testing
- Test suite: tests/test_formatting_detection.py validates formatting types
- Debug tools: tests/debug_formatting_detection.py provides XML analysis
- Validation: Tested with PowerPoint files containing mixed formatting
Usage
Running the Server
python main.py
Available Tools
This MCP server provides three core tools:
- extract_formatted_text: Extract text with specific formatting types (bold, italic, underline, strikethrough, highlight, hyperlinks, font sizes, font colors)
- query_slides: Query slides with flexible filtering criteria
- extract_table_data: Extract table data with flexible selection and formatting detection
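This README does not list the tools' parameters, so the sketch below does not call them. Instead it illustrates, at the XML level, the kind of work table extraction involves: walking a:tbl, a:tr, and a:tc elements and joining the text of each cell. The names cell_text and extract_tables are hypothetical, and the demo XML is a stripped-down stand-in for a real slide part.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
NS = {"a": A_NS}


def cell_text(tc):
    """Join the text runs of a table cell, one line per paragraph."""
    paragraphs = []
    for p in tc.iterfind(".//a:p", NS):
        paragraphs.append("".join(t.text or "" for t in p.iterfind(".//a:t", NS)))
    return "\n".join(paragraphs)


def extract_tables(slide_root):
    """Return every table on a slide as a list of rows, each row a list of cell strings."""
    tables = []
    for tbl in slide_root.iter(f"{{{A_NS}}}tbl"):
        rows = [
            [cell_text(tc) for tc in tr.findall("a:tc", NS)]
            for tr in tbl.findall("a:tr", NS)
        ]
        tables.append(rows)
    return tables


def _cell(text):
    """Build the XML for one demo cell (illustration only)."""
    return f"<a:tc><a:txBody><a:p><a:r><a:t>{text}</a:t></a:r></a:p></a:txBody></a:tc>"


SLIDE_XML = (
    f'<root xmlns:a="{A_NS}"><a:tbl>'
    f'<a:tr>{_cell("Name")}{_cell("Qty")}</a:tr>'
    f'<a:tr>{_cell("Widget")}{_cell("3")}</a:tr>'
    "</a:tbl></root>"
)

tables = extract_tables(ET.fromstring(SLIDE_XML))
print(tables)
```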
Development
Requirements
- Python 3.8+
- MCP (Model Context Protocol)
- FastMCP 2.0
- Standard Python libraries (zipfile, xml.etree.ElementTree)
Recent Updates
Version 2.0 - Advanced Text Formatting Detection
- Fixed critical formatting detection bug: Bold, italic, underline, and strikethrough attributes now correctly detected
- Dual detection support: Handles both XML attribute and child element formats
- Comprehensive test suite: Added extensive tests for all formatting types
- Debug tools: New debugging utilities for troubleshooting formatting issues
- Enhanced MCP tools: New tools for formatted text extraction and analysis
Formatting Detection Validation
All formatting types have been thoroughly tested and validated:
- ✅ Bold text: 8 elements detected in test files
- ✅ Italic text: 6 elements detected in test files
- ✅ Underlined text: 6 elements detected in test files
- ✅ Strikethrough text: 6 elements detected in test files
- ✅ Highlighted text: 5 elements detected in test files
- ✅ Hyperlinks: 2 elements detected in test files
- ✅ Font colors: Multiple colors detected and analyzed
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.