ARJ999/sharepoint-docling-ocr-mcp-hostinger
The Enhanced SharePoint-Docling-OCR MCP Server is a production-ready server designed for seamless document processing with integrated SharePoint, Docling, and OCR workflows, optimized for deployment on Hostinger VPS.
🚀 Enhanced SharePoint-Docling-OCR MCP Server (Hostinger Deployment)
Production-ready MCP server with fully integrated SharePoint, Docling, and OCR workflows for seamless document processing with automatic image text extraction. Optimized for one-click deployment to Hostinger VPS.
🎯 What This MCP Server Does
This MCP server provides enterprise-grade document processing capabilities by integrating three powerful systems into a single, unified workflow. The server connects to your SharePoint environment, processes documents with Docling's advanced conversion engine, and automatically extracts text from images using OCR technology.
✨ Key Innovation: Fully Automatic Integration
The enhanced version automatically handles the complete document processing pipeline in a single command. When you request document processing, the server downloads the document from SharePoint, converts it with Docling while extracting all embedded images, performs OCR on every image to extract text, merges all results into a unified output, and returns a comprehensive document containing both the original text and all image-extracted content.
🆚 Comparison with Standard Versions
| Feature | Standard Version | Enhanced Version |
|---|---|---|
| SharePoint Tools | ✅ 10 | ✅ 10 |
| Docling Tools | ✅ 9 | ✅ 9 |
| OCR Tools | ✅ 4 (separate) | ✅ 4 (separate) |
| Integrated Tools | ⚠️ 3 (no OCR) | ✅ 5 (with OCR) |
| Docling Image Extraction | ❌ No | ✅ Yes |
| Automatic OCR Integration | ❌ Manual only | ✅ Fully Automatic |
| Unified Output | ❌ Fragmented | ✅ Seamless |
| Total Tools | 26 | 28 |
| Deployment Platform | Railway | Hostinger VPS |
📊 Complete Feature Set
🔥 Integrated Tools (5) - The Power Features
- process_sharepoint_document
  - Downloads document from SharePoint
  - Converts with Docling
  - Returns processed content
  - Supports markdown, HTML, JSON export
- convert_and_upload
  - Processes local or remote documents
  - Converts with Docling
  - Uploads results back to SharePoint
  - Automatic format handling
- batch_process_sharepoint_folder
  - Batch processes entire folders
  - Comprehensive statistics
  - Error handling per document
  - Progress tracking
- process_document_with_ocr 🔥 NEW
  - Fully integrated SharePoint + Docling + OCR workflow
  - Automatic image detection and OCR
  - Unified output with all content
  - Supports multiple OCR languages
- batch_process_with_ocr 🔥 NEW
  - Batch process with full OCR integration
  - Automatic OCR on all images in all documents
  - Complete statistics and summaries
  - Configurable language and preprocessing
📁 SharePoint Tools (10)
- list_documents - List files in SharePoint folder
- get_document_content - Download document content
- upload_document - Upload files to SharePoint
- delete_document - Remove documents
- create_folder - Create new folders
- search_documents - Search by keyword
- get_document_metadata - Retrieve file properties
- update_document_metadata - Modify file properties
- copy_document - Copy files within SharePoint
- move_document - Move files between folders
📄 Docling Tools (9)
- convert_document - Convert documents to structured format
- export_to_markdown - Export as Markdown
- export_to_html - Export as HTML
- export_to_json - Export as JSON
- extract_images - Extract images from documents
- get_document_structure - Analyze document structure
- get_tables - Extract tables from documents
- get_text_content - Extract plain text
- clear_cache - Clear processing cache
🖼️ OCR Tools (4)
- extract_text_from_image_file - OCR on single image
- batch_ocr_images - OCR on multiple images
- extract_text_from_scanned_pdf - OCR on scanned PDFs
- get_supported_ocr_languages - List available OCR languages
Total: 28 MCP Tools for comprehensive document processing
🚀 Deploy to Hostinger VPS
Prerequisites
Before deploying, ensure you have:
- A Hostinger VPS account with Docker support
- SSH access to your Hostinger VPS
- Azure AD application credentials for SharePoint access
- SharePoint site URL and document library name
Step 1: Prepare SharePoint Credentials
You need the following information from your Azure AD application:
- SHP_ID_APP - Azure Application (Client) ID
- SHP_ID_APP_SECRET - Azure Application Secret
- SHP_TENANT_ID - Azure Tenant ID
- SHP_SITE_URL - SharePoint Site URL (e.g., https://yourcompany.sharepoint.com/sites/YourSite)
- SHP_DOC_LIBRARY - Document Library name (default: Documents)
Step 2: SSH Deployment to Hostinger
Connect to your Hostinger VPS via SSH and execute the following commands:
# Clone the repository
git clone https://github.com/ARJ999/Enhanced-SharePoint-Docling-OCR-mcp-server.git
cd Enhanced-SharePoint-Docling-OCR-mcp-server
# Create environment file with your credentials
# (the quoted 'EOF' stops the shell from expanding $ characters inside secrets)
cat > .env << 'EOF'
SHP_ID_APP=your-azure-app-id
SHP_ID_APP_SECRET=your-azure-app-secret
SHP_TENANT_ID=your-azure-tenant-id
SHP_SITE_URL=https://yourcompany.sharepoint.com/sites/YourSite
SHP_DOC_LIBRARY=Documents
EOF
# Build and deploy with Docker Compose
docker compose up -d --build
# Verify deployment
docker compose ps
docker compose logs -f --tail=100
Step 3: Verify Deployment
After deployment, test the MCP endpoint:
curl http://localhost:8080/mcp
You should receive a response indicating the MCP server is running.
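The same check can be scripted, for example from a monitoring job. The following is a minimal sketch using only Python's standard library; `check_endpoint` is a hypothetical helper and not part of the server. Because the source does not say what a plain GET on /mcp returns, the sketch assumes that any HTTP response, including an error status, proves that something is listening.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def check_endpoint(url: str, timeout: float = 5.0) -> Tuple[bool, Optional[int]]:
    """Return (reachable, http_status) for the given URL.

    Hypothetical helper: any HTTP response counts as reachable, even 4xx/5xx.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, response.status
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx status.
        return True, exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False, None
```

Calling `check_endpoint("http://localhost:8080/mcp")` on the VPS returns a tuple whose first element is `True` when the container is accepting connections.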
Step 4: Configure MCP Clients
Once deployed, configure your MCP clients to connect to the server. Use your Hostinger VPS IP address and port 8080.
Claude Desktop Configuration
Edit the configuration file at:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
Add the following configuration:
{
"mcpServers": {
"sharepoint-docling-ocr": {
"url": "http://YOUR_VPS_IP:8080/mcp",
"description": "Enhanced SharePoint document processing with Docling and OCR"
}
}
}
Replace YOUR_VPS_IP with your actual Hostinger VPS IP address.
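If the configuration file already lists other servers, editing it by hand risks overwriting them. The sketch below shows one way to merge the entry programmatically; `add_mcp_server` is a hypothetical helper, and the IP address in the usage note is a documentation placeholder.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, url: str, description: str = "") -> dict:
    """Merge one server entry into an MCP client config file and return the result."""
    config = {}
    if config_path.exists():
        # An empty file is treated like an empty JSON object.
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")

    entry = {"url": url}
    if description:
        entry["description"] = description

    # setdefault keeps any servers that are already configured.
    config.setdefault("mcpServers", {})[name] = entry

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_mcp_server(Path("claude_desktop_config.json"), "sharepoint-docling-ocr", "http://203.0.113.10:8080/mcp")` adds the entry shown above and leaves existing entries in place.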
Cursor Configuration
Edit the configuration file at:
- macOS: ~/Library/Application Support/Cursor/cursor_desktop_config.json
- Windows: %APPDATA%\Cursor\cursor_desktop_config.json
Use the same JSON configuration as above.
Windsurf Configuration
Edit the configuration file at:
- macOS: ~/Library/Application Support/Windsurf/windsurf_desktop_config.json
- Windows: %APPDATA%\Windsurf\windsurf_desktop_config.json
Use the same JSON configuration as above.
💡 Usage Examples
Example 1: Integrated Document Processing with OCR
Ask your MCP client:
Extract all information including text from images from "quarterly-report.pdf" in SharePoint
What happens automatically:
- ✅ Downloads from SharePoint
- ✅ Converts with Docling
- ✅ Extracts all images
- ✅ Performs OCR on each image
- ✅ Returns unified output with both document text and image text
Example 2: Batch Processing with OCR
Ask your MCP client:
Process all documents in the "contracts" folder with full OCR integration
Result:
- All documents processed
- All images OCR'd automatically
- Complete statistics provided
- Unified outputs for each document
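To illustrate how a batch run can keep going when one document fails while still collecting statistics, here is a minimal sketch. It is not the server's implementation: `BatchSummary`, `batch_process`, and the `process_one` callable (which stands in for the per-document SharePoint + Docling + OCR step) are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Tuple


@dataclass
class BatchSummary:
    """Statistics for one batch run (hypothetical shape)."""
    processed: int = 0
    failed: int = 0
    images_ocrd: int = 0
    errors: Dict[str, str] = field(default_factory=dict)
    outputs: Dict[str, str] = field(default_factory=dict)


def batch_process(
    names: Iterable[str],
    process_one: Callable[[str], Tuple[str, int]],
) -> BatchSummary:
    """Run process_one on every document; it returns (unified_output, image_count)."""
    summary = BatchSummary()
    for name in names:
        try:
            output, image_count = process_one(name)
        except Exception as exc:
            # One bad document must not abort the whole batch.
            summary.failed += 1
            summary.errors[name] = f"{type(exc).__name__}: {exc}"
            continue
        summary.processed += 1
        summary.images_ocrd += image_count
        summary.outputs[name] = output
    return summary
```

Recording the error per document, rather than raising, is what lets the caller report both the successful outputs and the failures in a single summary.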
Example 3: Custom OCR Language
Ask your MCP client:
Process "french-document.pdf" with French OCR
The server passes the ocr_language="fra" parameter to the OCR engine for accurate French text extraction.
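Tesseract identifies languages by three-letter codes and accepts several of them joined with "+". The sketch below maps a few language names to those codes; `LANGUAGE_CODES` and `to_ocr_language` are hypothetical, and whether this server's ocr_language parameter accepts a combined value such as "eng+fra" is an assumption to verify against get_supported_ocr_languages.

```python
# A small, deliberately incomplete map of language names to Tesseract codes.
LANGUAGE_CODES = {
    "english": "eng",
    "french": "fra",
    "spanish": "spa",
    "german": "deu",
}


def to_ocr_language(*names: str) -> str:
    """Turn language names or codes into a Tesseract-style language string."""
    codes = []
    for name in names:
        key = name.strip().lower()
        if key in LANGUAGE_CODES:
            code = LANGUAGE_CODES[key]
        elif key in LANGUAGE_CODES.values():
            code = key  # already a Tesseract code such as "fra"
        else:
            raise ValueError(f"Unknown OCR language: {name!r}")
        if code not in codes:
            codes.append(code)
    return "+".join(codes)
```

With this helper, "French" resolves to the same "fra" value used in the example above.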
Example 4: SharePoint File Operations
Ask your MCP client:
List all PDF files in the "Reports" folder in SharePoint
The server searches SharePoint and returns a list of matching documents.
🔧 Technical Architecture
System Components
The MCP server integrates three major components into a unified workflow:
SharePoint Client handles authentication with Azure AD using MSAL, manages document operations (upload, download, search), and provides folder and metadata management capabilities.
Docling Processor converts documents to structured formats (Markdown, HTML, JSON), extracts images from documents with configurable resolution, analyzes document structure and tables, and provides caching for improved performance.
OCR Engine uses Tesseract OCR for text extraction, supports multiple languages (English, French, Spanish, etc.), includes preprocessing for better accuracy, and provides confidence scores for extracted text.
Image Extraction Configuration
The enhanced version configures Docling to extract images at high resolution:
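from docling.datamodel.pipeline_options import PdfPipelineOptions

pipeline_options = PdfPipelineOptions()
pipeline_options.generate_picture_images = True  # keep embedded pictures so they can be OCR'd
pipeline_options.images_scale = 2.0  # Higher resolution for better OCR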
Automatic OCR Integration Flow
When processing a document with OCR:
- Convert with Docling (images extracted automatically)
- Extract images from the converted document
- Perform OCR on each image with preprocessing
- Merge OCR results into the document structure
- Export with OCR results included in the output
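The first three steps of this flow can be sketched without committing to any particular library. In the sketch below, `convert` and `ocr` are injected callables that stand in for the Docling conversion and the Tesseract call; they, `OcrResult`, and `process_with_ocr` are hypothetical names, not the server's actual functions. Merging and export are left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class OcrResult:
    index: int         # 1-based position of the image in the document
    text: str          # text recognised in the image
    confidence: float  # confidence reported by the OCR engine, 0-100


def process_with_ocr(
    source: bytes,
    convert: Callable[[bytes], Tuple[str, Sequence[bytes]]],
    ocr: Callable[[bytes], Tuple[str, float]],
) -> Tuple[str, List[OcrResult]]:
    """Convert a document, then OCR every image the conversion produced."""
    # Convert the document and collect its extracted images.
    markdown, images = convert(source)

    # Run OCR on each image, keeping the original image order.
    results = []
    for index, image in enumerate(images, start=1):
        text, confidence = ocr(image)
        results.append(OcrResult(index, text.strip(), confidence))

    return markdown, results
```

Passing the converter and OCR engine as parameters keeps the control flow testable with simple stand-ins before wiring in the real components.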
Unified Output Format
The integrated tools return a unified output that combines document text and OCR-extracted image text:
# Document Title
[Document text content from Docling]
---
## 📷 OCR-Extracted Text from Images
### Image 1
**Confidence:** 94.5%
**Extracted Text:**
[OCR text from image 1]
---
### Image 2
**Confidence:** 89.2%
**Extracted Text:**
[OCR text from image 2]
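A formatter that produces this layout could look like the following sketch. `render_unified_output` is a hypothetical function written to match the example above; the server's real merge logic may differ, for instance in how it handles images with no recognisable text.

```python
from typing import Iterable, Tuple


def render_unified_output(
    document_markdown: str,
    ocr_results: Iterable[Tuple[str, float]],
) -> str:
    """Append an OCR section to the document Markdown.

    Each OCR result is a (text, confidence) pair, confidence on a 0-100 scale.
    """
    parts = [document_markdown.rstrip()]
    results = list(ocr_results)
    if results:
        parts.append("---")
        parts.append("## 📷 OCR-Extracted Text from Images")
        sections = []
        for number, (text, confidence) in enumerate(results, start=1):
            sections.append(
                f"### Image {number}\n"
                f"**Confidence:** {confidence:.1f}%\n"
                f"**Extracted Text:**\n"
                f"{text.strip()}"
            )
        # Images are separated from each other by a horizontal rule.
        parts.append("\n---\n".join(sections))
    return "\n".join(parts) + "\n"
```

A document without images passes through unchanged apart from a normalised trailing newline, so callers can use the same function for every document.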
📁 Supported File Types
| Type | Extensions | Docling | OCR | Integrated |
|---|---|---|---|---|
| PDFs with images | .pdf | ✅ | ✅ | ✅ Auto |
| Scanned PDFs | .pdf | ✅ | ✅ | ✅ Auto |
| Office Docs | .docx, .xlsx, .pptx | ✅ | ✅ | ✅ Auto |
| Images | .jpg, .png, .gif, .bmp | ✅ | ✅ | ✅ Auto |
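A client that wants to reject unsupported files before sending them to the server could mirror the table in a small lookup. This is a sketch only: `SUPPORTED` and `classify` are hypothetical names, and the category labels are chosen for illustration.

```python
from pathlib import PurePosixPath

# Extensions taken from the table above, mapped to illustrative category labels.
SUPPORTED = {
    ".pdf": "pdf",
    ".docx": "office", ".xlsx": "office", ".pptx": "office",
    ".jpg": "image", ".png": "image", ".gif": "image", ".bmp": "image",
}


def classify(filename: str) -> str:
    """Return the category for a file name, or raise ValueError if unsupported."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"Unsupported file type: {filename!r}")
    return SUPPORTED[suffix]
```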
🔒 Environment Variables
The MCP server reads the following environment variables. The first four are required; the last two are optional and fall back to the defaults shown:
| Variable | Description | Required | Default |
|---|---|---|---|
| SHP_ID_APP | Azure Application (Client) ID | ✅ Yes | - |
| SHP_ID_APP_SECRET | Azure Application Secret | ✅ Yes | - |
| SHP_TENANT_ID | Azure Tenant ID | ✅ Yes | - |
| SHP_SITE_URL | SharePoint Site URL | ✅ Yes | - |
| SHP_DOC_LIBRARY | Document Library name | ⚠️ Optional | Documents |
| PORT | Server port | ⚠️ Optional | 8080 |
Set these variables in the .env file or through your Hostinger environment configuration.
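Missing variables are a common reason for a container failing at startup, so it can help to validate them in one place. The sketch below shows one way to do that; `Settings` and `load_settings` are hypothetical names, and the server's own configuration code may be organised differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("SHP_ID_APP", "SHP_ID_APP_SECRET", "SHP_TENANT_ID", "SHP_SITE_URL")


@dataclass(frozen=True)
class Settings:
    app_id: str
    app_secret: str
    tenant_id: str
    site_url: str
    doc_library: str = "Documents"
    port: int = 8080


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, failing early with a clear message."""
    missing = [name for name in REQUIRED if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        app_id=env["SHP_ID_APP"],
        app_secret=env["SHP_ID_APP_SECRET"],
        tenant_id=env["SHP_TENANT_ID"],
        site_url=env["SHP_SITE_URL"].rstrip("/"),
        # Optional values fall back to the defaults from the table.
        doc_library=env.get("SHP_DOC_LIBRARY") or "Documents",
        port=int(env.get("PORT") or 8080),
    )
```

Reporting every missing name at once saves a redeploy cycle compared with failing on the first one.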
🐛 Troubleshooting
Container Won't Start
Check the Docker logs to identify the issue:
docker compose logs -f
Common causes include missing environment variables, incorrect SharePoint credentials, or port conflicts.
Images Not Being Extracted
Ensure the deployment includes Tesseract OCR system dependencies. Verify that generate_picture_images=True is configured in the Docling processor. Check that the PDF contains actual images rather than scanned pages.
Low OCR Confidence
Try enabling preprocessing with ocr_preprocessing=True. Increase image resolution using images_scale=2.0 or higher. Ensure the correct OCR language is specified (e.g., ocr_language="fra" for French).
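To decide when a result is worth retrying with different settings, a per-image score is useful. Tesseract reports word-level confidences on a 0 to 100 scale and uses -1 for layout boxes that contain no word. The sketch below averages the word scores; `mean_confidence`, `needs_retry`, and the threshold of 60 are illustrative assumptions, not values taken from the server.

```python
from typing import Iterable, Union

Number = Union[int, float, str]


def mean_confidence(word_confidences: Iterable[Number]) -> float:
    """Average per-word confidences (0-100), ignoring Tesseract's -1 placeholders."""
    scores = [float(value) for value in word_confidences]
    scores = [score for score in scores if score >= 0]
    return sum(scores) / len(scores) if scores else 0.0


def needs_retry(word_confidences: Iterable[Number], threshold: float = 60.0) -> bool:
    """Flag an image whose average confidence falls below the chosen threshold."""
    return mean_confidence(word_confidences) < threshold
```

An image flagged by `needs_retry` is a candidate for the remedies listed above: preprocessing, a higher images_scale, or a different ocr_language.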
MCP Client Can't Connect
Verify the URL includes the /mcp endpoint. Check that port 8080 is accessible on your VPS. Ensure the server is running using docker compose ps. Test the endpoint with curl http://YOUR_VPS_IP:8080/mcp.
SharePoint Authentication Fails
Verify all Azure AD credentials are correct. Ensure the Azure application has the necessary SharePoint permissions. Check that the SharePoint site URL is accessible. Review the server logs for specific authentication error messages.
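When checking credentials, it helps to see which values an app-only (client credentials) sign-in is built from. The sketch below derives them from SHP_TENANT_ID and SHP_SITE_URL. `auth_parameters` is a hypothetical helper; the source does not say whether the server calls the SharePoint REST API or Microsoft Graph, so both scopes are shown.

```python
from urllib.parse import urlparse


def auth_parameters(tenant_id: str, site_url: str) -> dict:
    """Derive the Azure AD authority and candidate scopes for app-only access."""
    parsed = urlparse(site_url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"SHP_SITE_URL must be an https URL, got {site_url!r}")
    return {
        # Tenant-specific token authority.
        "authority": f"https://login.microsoftonline.com/{tenant_id}",
        # Scope for the SharePoint REST API on this tenant's host.
        "sharepoint_scope": f"https://{parsed.hostname}/.default",
        # Scope for Microsoft Graph, if the server uses Graph instead.
        "graph_scope": "https://graph.microsoft.com/.default",
    }
```

A site URL that fails this check, for example one missing the https:// prefix, is worth fixing before looking into application permissions.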
🛠️ Local Development
For local testing and development before deploying to Hostinger:
Prerequisites
- Python 3.11 or higher
- Tesseract OCR installed on your system
- Azure AD application credentials
Setup
# Clone the repository
git clone https://github.com/ARJ999/Enhanced-SharePoint-Docling-OCR-mcp-server.git
cd Enhanced-SharePoint-Docling-OCR-mcp-server
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Create .env file with your credentials
cp .env.example .env
# Edit .env with your actual credentials
# Run the local server
python local-server.py
Testing Locally
The local server uses the stdio transport, so configure your MCP client to launch it as a local process rather than connecting over HTTP.
📚 Additional Resources
Documentation
- FastMCP Framework - MCP server framework
- Docling Documentation - Document processing
- Tesseract OCR - OCR engine
- Model Context Protocol - MCP specification
Related Projects
- Hostinger MCP Deployment Reference - Gold standard deployment guide
- Original Railway Version - Railway deployment version
📄 License
MIT License - See LICENSE file for details
🙏 Credits
Built on top of:
- Docling - Document processing framework
- Tesseract OCR - OCR engine
- FastMCP - MCP server framework
- MSAL Python - Microsoft authentication
Made with ❤️ for seamless document processing on Hostinger VPS
Enhanced Hostinger Version - November 15, 2025