property-document-classifier-mcp

deshitha-github/property-document-classifier-mcp


The Property Document Classifier MCP Server is a tool that connects Claude Desktop to local property documents for automatic classification and organization.

Property Document Classifier - MCP Server


An MCP (Model Context Protocol) server that automatically classifies property documents using Claude Desktop with OCR support.


What It Does

This MCP server connects Claude Desktop to your local property documents and enables:

  • Automatic classification into 20 property document categories
  • PDF text extraction with PyPDF2
  • OCR support for scanned documents using Tesseract
  • File organization into categorized folders
  • Metadata tracking with confidence scores
  • Search and statistics tools

Document Categories

The server classifies documents into 20 property-related categories:

  • Invoices, Receipts, Title Summary
  • Chain Sheet, Property Card(s), Tax Data
  • Mobile Home Data, Mortgage(s), Deeds
  • Covenants, Easements & Right of Ways
  • Leases & Lease Assignments, Plats
  • Liens, Judgments, Estates
  • Power of Attorney, UCC Filings
  • Miscellaneous, Index / Check Sheets
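If category labels arrive as free text (for example from Claude), one way to keep them inside the fixed set above is a simple lookup. The sketch below is hypothetical: CATEGORIES and normalize_category are illustrative names, not taken from the project, and it assumes the category strings match the list above exactly.

```python
# Hypothetical: the 20 categories exactly as listed in this README.
CATEGORIES = (
    "Invoices", "Receipts", "Title Summary",
    "Chain Sheet", "Property Card(s)", "Tax Data",
    "Mobile Home Data", "Mortgage(s)", "Deeds",
    "Covenants", "Easements & Right of Ways",
    "Leases & Lease Assignments", "Plats",
    "Liens", "Judgments", "Estates",
    "Power of Attorney", "UCC Filings",
    "Miscellaneous", "Index / Check Sheets",
)

# Case-insensitive index from a folded label to its canonical spelling.
_LOOKUP = {name.casefold(): name for name in CATEGORIES}


def normalize_category(label: str) -> str:
    """Map a free-form label onto one of the categories, else Miscellaneous."""
    return _LOOKUP.get(label.strip().casefold(), "Miscellaneous")
```

Falling back to Miscellaneous keeps every document inside the known set rather than creating ad-hoc folders for unexpected labels.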

Architecture

User → Claude Desktop (Host)
  ↓
MCP Client (Protocol Handler)
  ↓
MCP Server (This Project)
  ↓
Local Documents (Your PDFs)
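Claude Desktop launches the server with the command from its config and talks to it over standard input/output using JSON-RPC 2.0. As a rough illustration of the client-to-server hop in the diagram, the sketch below builds the kind of tools/call request the MCP client sends. The tool name classify_document and its filename argument are hypothetical, since this README does not list the server's actual tool names.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build one MCP 'tools/call' request as a single line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # The stdio transport frames messages as newline-delimited JSON, so the
    # serialized message itself must not contain raw newlines (no indent).
    return json.dumps(message) + "\n"


# Hypothetical tool name and argument; the server's real tools may differ.
line = make_tool_call(1, "classify_document", {"filename": "Sample-Deed.pdf"})
print(line, end="")
```

The server answers on its standard output with a JSON-RPC response carrying the same id, which is how the client matches replies to requests.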

🚀 Quick Start

Prerequisites

  • Python 3.10 or higher
  • Claude Desktop
  • Tesseract OCR
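After working through the installation steps, a quick way to confirm that these prerequisites are visible from Python is a check like the following. It is a minimal sketch; check_prerequisites is a hypothetical helper, not part of the project.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether the prerequisites listed above appear to be present."""
    return {
        # The README requires Python 3.10 or higher.
        "python_ok": sys.version_info >= (3, 10),
        # shutil.which returns the executable's full path, or None if the
        # tesseract binary is not on PATH.
        "tesseract": shutil.which("tesseract"),
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

Run it with the virtual environment's interpreter so the Python version reported is the one Claude Desktop will launch.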

Installation

  1. Clone the repository
git clone https://github.com/deshitha-github/property-document-classifier-mcp.git
cd property-document-classifier-mcp

  2. Create a virtual environment

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

  3. Install dependencies

pip install -r requirements.txt

  4. Install Tesseract OCR

macOS:

brew install tesseract poppler

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install tesseract-ocr poppler-utils

Windows:

Download the installer from https://github.com/UB-Mannheim/tesseract/wiki and add the Tesseract install directory to your PATH.

  5. Create directories

mkdir documents classified_documents

  6. Configure Claude Desktop

Get your absolute path:

pwd  # Copy this output

Edit the Claude Desktop config:

macOS:

code ~/Library/Application\ Support/Claude/claude_desktop_config.json

Windows:

notepad %APPDATA%\Claude\claude_desktop_config.json

Add this configuration, replacing the paths with your actual absolute paths (on Windows the interpreter is venv\Scripts\python.exe):

{
  "mcpServers": {
    "property-classifier": {
      "command": "/FULL/PATH/TO/venv/bin/python",
      "args": [
        "/FULL/PATH/TO/property_classifier.py"
      ]
    }
  }
}
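Hand-editing the JSON is error-prone, so here is a minimal sketch of a helper that writes the entry with absolute paths and keeps any other servers already configured. It assumes it is run for the project root, that the virtual environment is named venv, and that the entry point is property_classifier.py as in the config above; build_server_entry and merge_into_config are hypothetical names.

```python
import json
import os
from pathlib import Path


def build_server_entry(project_dir: Path) -> dict:
    """Build the 'property-classifier' entry using absolute paths."""
    project_dir = project_dir.resolve()
    # venv layout differs by platform: Scripts\python.exe on Windows.
    if os.name == "nt":
        python = project_dir / "venv" / "Scripts" / "python.exe"
    else:
        python = project_dir / "venv" / "bin" / "python"
    return {
        "command": str(python),
        "args": [str(project_dir / "property_classifier.py")],
    }


def merge_into_config(config_path: Path, project_dir: Path) -> dict:
    """Add or replace the entry while keeping any other configured servers."""
    config = {}
    if config_path.exists():
        # Treat an empty file as an empty config.
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")
    servers = config.setdefault("mcpServers", {})
    servers["property-classifier"] = build_server_entry(project_dir)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, on macOS: merge_into_config(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json", Path.cwd()). Back up the existing config first, since the file is rewritten in place.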

  7. Restart Claude Desktop

Quit Claude Desktop completely (Cmd+Q on macOS, Alt+F4 on Windows), then start it again.

  8. Test it!

In Claude Desktop, ask: "Are you connected to any MCP servers?" Claude should list property-classifier.

Usage Examples

Classify a Single Document

Can you classify Sample-Deed.pdf from the documents folder?

Batch Classification

Classify all unclassified documents

View Statistics

Show me classification statistics

Search by Category

Show me all Mortgage documents

Get All Classifications

List all classified documents grouped by category

How It Works

Two-Stage PDF Processing

Direct Text Extraction (PyPDF2)

  • Fast processing for text-based PDFs
  • Works for digitally created documents

OCR Fallback (Tesseract)

  • Triggered automatically when no text is found
  • Handles scanned documents and images
  • Converts PDF pages to images before running OCR
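The two-stage fallback can be sketched as follows. This is an illustration, not the project's code: the OCR stage assumes the pdf2image and pytesseract packages (suggested by the poppler-utils dependency and the page-to-image step, but not confirmed by this README), and the method labels "pypdf2" and "ocr" are made up. The extractors are parameters so the fallback logic can be exercised without any PDFs.

```python
from typing import Callable, Tuple


def extract_with_pypdf2(path: str) -> str:
    """Stage 1: read the embedded text layer of a digitally created PDF."""
    from PyPDF2 import PdfReader  # imported lazily so this module loads without it

    reader = PdfReader(path)
    # extract_text() yields little or nothing for image-only (scanned) pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_with_ocr(path: str) -> str:
    """Stage 2: render each page to an image, then OCR it with Tesseract."""
    from pdf2image import convert_from_path  # assumption: needs poppler installed
    import pytesseract  # assumption: wrapper that calls the tesseract binary

    images = convert_from_path(path)
    return "\n".join(pytesseract.image_to_string(img) for img in images)


def extract_text(
    path: str,
    direct: Callable[[str], str] = extract_with_pypdf2,
    ocr: Callable[[str], str] = extract_with_ocr,
) -> Tuple[str, str]:
    """Return (text, method), running OCR only when stage 1 finds no text."""
    text = direct(path)
    if text.strip():
        return text, "pypdf2"
    return ocr(path), "ocr"
```

Checking text.strip() rather than text treats whitespace-only output as "no text", which is what image-only pages commonly produce.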

Automatic Organization

Classified documents are copied to:

classified_documents/
├── Deeds/
├── Mortgages/
├── Tax Data/
└── ...

Original files remain in the documents/ folder.

Metadata Tracking

Each classification stores:

  • Document category
  • Confidence level (high/medium/low)
  • Extraction method used
  • Timestamp
  • Custom notes
  • Organized file path
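Copying a file into its category folder and recording these fields might look like the minimal sketch below. The JSON field names, keying records by file name, and replacing "/" in category names to form folder names are all assumptions; the project's actual classifications.json layout and folder naming (the tree above shows Mortgages/, not Mortgage(s)/) may differ. organize_and_record is a hypothetical name.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

CONFIDENCE_LEVELS = {"high", "medium", "low"}


def organize_and_record(src, category, confidence, method, notes,
                        out_root, index_path):
    """Copy src into out_root/<category folder>/ and store a metadata record."""
    if confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"unexpected confidence: {confidence!r}")
    src, out_root, index_path = Path(src), Path(out_root), Path(index_path)
    # Assumption: slashes (as in "Index / Check Sheets") must not create subfolders.
    folder = category.replace("/", "-").replace("\\", "-").strip()
    dest_dir = out_root / folder
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 leaves the original in place and preserves file timestamps.
    dest = Path(shutil.copy2(src, dest_dir / src.name))
    record = {
        "category": category,
        "confidence": confidence,
        "extraction_method": method,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "notes": notes,
        "organized_path": str(dest),
    }
    records = {}
    if index_path.exists():
        records = json.loads(index_path.read_text(encoding="utf-8"))
    records[src.name] = record  # re-classifying a file replaces its record
    index_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return record
```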

Stored in classifications.json.

Performance

Typical processing times:

  • Text-based PDFs: ~1 second per document
  • Scanned PDFs (OCR): ~3-5 seconds per document
  • 100 documents: ~3-4 minutes total

Security & Privacy

  • Local-first: text extraction, OCR, and file organization run on your machine
  • No file uploads: your PDFs are not uploaded, although any text a tool returns to Claude becomes part of the conversation Claude processes
  • User control: Claude asks permission before using tools
  • Transparent: all operations are visible in Claude Desktop
  • Open source: audit the code yourself

Troubleshooting

See TROUBLESHOOTING.md for common issues.

Quick Fixes

Server not connecting:

  • Verify that the paths in the config are absolute
  • Check that Python is in PATH
  • Restart Claude Desktop completely
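To check the first item without eyeballing the JSON, a small script can flag entries that are relative or missing. This is a hypothetical helper (check_server_paths is not part of the project), and it assumes, as in the configuration shown during setup, that command and every args entry are file paths.

```python
import json
from pathlib import Path


def check_server_paths(config_path, server="property-classifier"):
    """Return a list of problems with one server's command/args paths."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    entry = config.get("mcpServers", {}).get(server)
    if entry is None:
        return [f"no '{server}' entry under mcpServers"]
    problems = []
    # Assumption: every value here is a filesystem path, as in the README config.
    for value in [entry.get("command", "")] + list(entry.get("args", [])):
        path = Path(value)
        if not path.is_absolute():
            problems.append(f"not absolute: {value}")
        elif not path.exists():
            problems.append(f"does not exist: {value}")
    return problems
```

Pass it the config path from the setup step, for example check_server_paths(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json") on macOS; an empty list means both paths look fine.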

OCR not working:

# Check the Tesseract installation
tesseract --version

macOS

brew install tesseract

Ubuntu

sudo apt-get install tesseract-ocr

"Read-only file system" error:

  • Make sure the documents/ folder exists
  • Check file permissions

Documentation

  • Installation Guide
  • Architecture Overview
  • Troubleshooting Guide
  • Medium Article - Full tutorial

Learning Resources

About MCP:

  • Official MCP Documentation
  • MCP Specification
  • [Anthropic's Announcement](https://www.anthropic.com/news/