docsynthai by raahulrawat - MCP Server

DocSynthAI – Intelligent Document Processing MCP Server

A modular, extensible, next-generation document understanding engine powered by MCP (Model Context Protocol) and Gemini Vision.

Quick Overview

DocSynthAI is a modular document understanding platform built on MCP. It provides:

AI-powered document classification (Gemini Vision integration)
Rule-based & general LLM classification modes
STDIO MCP server and async STDIO client for local/dev integration
Roadmap: Extraction → Validation → Knowledge Graph creation → HTTP/SSE transport

Install & Setup

1. Clone

git clone https://github.com/raahulrawat/docsynthai.git

2. Python packages

Create a virtual environment first (recommended), then install the dependencies into it:

python -m venv .venv

Run — MCP Server (STDIO) — Current

This starts the MCP server in STDIO mode (default/current). Clients connect over stdio pipes.

Start server (local)

python server.py

Start server (explicit stdio mode)

DOCSYNTH_TRY_HTTP=0 python server.py

By default, the server loads rules from classifier_rules.json if the file is present and persists rule changes back to it. The server exposes the following MCP tools:

setup_classifier
create_rule
get_all_rules
delete_rule
classify_document
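
MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request using only the standard library. The `image_path` argument name is an assumption for illustration; the real argument names come from the tool schema the server returns for `tools/list`.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request needs its own id


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument name; check the schema reported by the server.
request = build_tool_call("classify_document", {"image_path": "invoice.png"})
```

In practice an MCP client library builds and sends these messages for you; the sketch only shows what travels over the pipe.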

Running the STDIO client demo

python mcp_stdio_client.py

The demo launches the server as a subprocess, performs the MCP initialization handshake, prompts for an API key, and lets you classify a local image file.
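
The initialization step can be pictured as two messages written to the server's stdin, each framed as one line of JSON. This is a minimal sketch of that framing, not the code in mcp_stdio_client.py. The helper names are hypothetical, and the protocol version string is an assumption; use the revision your MCP SDK targets.

```python
import json


def encode_message(message: dict) -> bytes:
    """Frame one JSON-RPC message for the stdio transport (newline-delimited JSON)."""
    return (json.dumps(message) + "\n").encode("utf-8")


def handshake_messages(client_name: str, client_version: str) -> list[dict]:
    """Return the two messages a client sends to open an MCP session."""
    initialize = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumption: match your SDK's revision
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": client_version},
        },
    }
    # Sent only after the server has answered `initialize`; notifications carry no id.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    return [initialize, initialized]
```

A client would write the first message, read the server's reply from stdout, and only then write the second.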

Run — HTTP & SSE (Next Release)

Planned in the next release:

HTTP Transport: mcp.run(transport="http", host="0.0.0.0", port=8000) — REST-like access to tools
SSE Transport: streaming support for long-running/extraction tasks

When HTTP is enabled you will be able to run:

python server.py   # will detect DOCSYNTH_TRY_HTTP=1 and bind to host/port

Client libraries will be updated to support HTTP tool discovery and SSE streaming.
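
Since HTTP mode is not released yet, the following is only a sketch of how the environment switch could be resolved into transport settings. The function name `select_transport` is hypothetical, and the defaults are taken from the `mcp.run(...)` example above.

```python
import os
from typing import Mapping, Optional


def select_transport(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick transport settings from DOCSYNTH_* variables, defaulting to stdio."""
    env = os.environ if env is None else env
    if env.get("DOCSYNTH_TRY_HTTP", "0") == "1":
        return {
            "transport": "http",
            "host": env.get("DOCSYNTH_HOST", "0.0.0.0"),
            "port": int(env.get("DOCSYNTH_PORT", "8000")),
        }
    return {"transport": "stdio"}
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the selection logic easy to test.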

MCP Server JSON (STDIO config)

Use this sample JSON for external orchestrators or MCP host configs (e.g., Cursor / IDE tool integrations):


{
  "mcpServers": {
    "docsynth": {
      "command": "python",
      "args": [
        "server.py"
      ],
      "transport": {
        "type": "stdio"
      },
      "env": {}
    }
  }
}

Save as .mcp/docsynth-mcp.json or include in your MCP host configuration. This tells an MCP host to spawn server.py and connect via stdio.
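
To illustrate what a host does with this file, the sketch below turns the `docsynth` entry into the command line and extra environment it would use to spawn the server. `launch_spec` is a hypothetical helper, not part of any MCP host.

```python
import json

CONFIG_TEXT = """
{
  "mcpServers": {
    "docsynth": {
      "command": "python",
      "args": ["server.py"],
      "transport": {"type": "stdio"},
      "env": {}
    }
  }
}
"""


def launch_spec(config_text: str, server_name: str) -> tuple[list[str], dict[str, str]]:
    """Return (argv, extra_env) for one server entry of an MCP host config."""
    entry = json.loads(config_text)["mcpServers"][server_name]
    argv = [entry["command"], *entry.get("args", [])]
    return argv, dict(entry.get("env", {}))


argv, extra_env = launch_spec(CONFIG_TEXT, "docsynth")
```

A host would then start the process with something like `subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)` and speak MCP over those pipes.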

Classification Roadmap (current support)

Core pipeline stages, implemented or planned. Each stage is exposed as one or more MCP tools.

Stage 1 — Classification (current)

Rule-based classification (user-defined rules)
General LLM classification (Gemini Vision)
Single-image & batch classification
Strict JSON response format for downstream parsing
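
A strict JSON contract lets downstream code reject malformed model replies early. The sketch below shows one way to enforce it. The field names `document_type` and `confidence` are assumptions for illustration; the actual response schema is defined by the server.

```python
import json

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"document_type": str, "confidence": (int, float)}


def parse_classification(raw: str) -> dict:
    """Parse a classifier reply, raising ValueError unless it matches the schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or mistyped field: {field}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return data
```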

Stage 2 — Extraction (next)

Key–Value pair extraction (KV)
Table extraction → CSV/JSON
Multi-page PDF → page images conversion (optional helper)
Tool: extract_document
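
Extraction is not implemented yet, so the following is only a sketch of the planned table output step. It assumes an extracted table arrives as a header row plus data rows of strings, which is a hypothetical representation.

```python
import csv
import io
import json


def table_to_csv(header: list[str], rows: list[list[str]]) -> str:
    """Render a table as CSV text, quoting cells that contain commas."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def table_to_json(header: list[str], rows: list[list[str]]) -> str:
    """Render a table as a JSON array with one object per row."""
    return json.dumps([dict(zip(header, row)) for row in rows])
```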

Stage 3 — Validation

Field-level validation (PAN/Aadhaar format, dates, totals)
Cross-document validation (e.g., PAN ↔ Bank Statement)
Rule-based & model-assisted validation
Tool: validate_document
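
As an example of field-level validation, the sketch below checks the shape of PAN and Aadhaar numbers. It assumes the commonly documented formats (PAN: five letters, four digits, one letter; Aadhaar: twelve digits that do not start with 0 or 1) and deliberately skips the Aadhaar checksum, so a passing value is well-formed, not verified. The function names are hypothetical.

```python
import re

PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")
AADHAAR_PATTERN = re.compile(r"[2-9][0-9]{11}")


def is_valid_pan(value: str) -> bool:
    """Check the PAN shape after trimming whitespace and upper-casing."""
    return PAN_PATTERN.fullmatch(value.strip().upper()) is not None


def is_valid_aadhaar_format(value: str) -> bool:
    """Check the Aadhaar shape, ignoring spaces and hyphens (no checksum test)."""
    digits = re.sub(r"[\s-]", "", value)
    return AADHAAR_PATTERN.fullmatch(digits) is not None
```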

Stage 4 — Knowledge Graph Creation

Triplet extraction (subject, predicate, object)
Ontology mapping & transformation
Neo4j / Memgraph integrations
Tools: kg_insert, kg_generate_triplets
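
The knowledge graph stage is still on the roadmap. As a rough sketch of what inserting a triplet could involve, the code below pairs a parameterized Cypher `MERGE` statement with its parameters. The `Entity` label, the `RELATED` relationship type, and the `Triplet` class are assumptions for illustration, not the project's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    obj: str  # named `obj` to avoid shadowing the built-in `object`


# Parameterized query: values are passed separately, never interpolated.
MERGE_QUERY = (
    "MERGE (s:Entity {name: $subject}) "
    "MERGE (o:Entity {name: $object}) "
    "MERGE (s)-[:RELATED {type: $predicate}]->(o)"
)


def to_cypher(triplet: Triplet) -> tuple[str, dict[str, str]]:
    """Return the query text and parameters for inserting one triplet."""
    params = {
        "subject": triplet.subject,
        "predicate": triplet.predicate,
        "object": triplet.obj,
    }
    return MERGE_QUERY, params
```

Using `MERGE` rather than `CREATE` keeps repeated inserts of the same triplet from duplicating nodes or edges.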

Testing & Development Tips

Use DocumentClassifier(mock_mode=True) for fast local tests without Gemini API calls.
Persist rules in classifier_rules.json to re-use definitions across restarts.
To test HTTP mode once it is implemented, set DOCSYNTH_TRY_HTTP=1 and provide DOCSYNTH_HOST and DOCSYNTH_PORT.
Create pytest tests that launch the server subprocess via stdio and call the client (mock_mode recommended).
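
As a starting point for such tests, here is a pytest-style sketch of a rule persistence round trip. `load_rules` and `save_rules` are hypothetical stand-ins for the server's own persistence code, and the rule shape is an assumption; only the classifier_rules.json file name comes from the project.

```python
import json
import tempfile
from pathlib import Path


def load_rules(path: Path) -> list[dict]:
    """Return the saved rules, or an empty list if the file does not exist yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_rules(path: Path, rules: list[dict]) -> None:
    """Write all rules to disk as indented JSON."""
    path.write_text(json.dumps(rules, indent=2), encoding="utf-8")


def test_rules_survive_restart() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        rules_file = Path(tmp) / "classifier_rules.json"
        assert load_rules(rules_file) == []  # first start: no file yet
        rule = {"name": "invoice", "keywords": ["invoice", "total due"]}  # hypothetical shape
        save_rules(rules_file, [rule])
        assert load_rules(rules_file) == [rule]  # simulated restart
```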

Contributing

PRs welcome. Suggested first issues:

HTTP transport adapter & docs
SSE streaming for long extraction jobs
PDF→image helper & multi-page handling
KG connector for Neo4j

Please follow the code style, add tests, and include changelog entries for breaking changes.