mcp-document-server

danielitus/mcp-document-server

MCP Document Server

A Model Context Protocol (MCP) server that enables Claude to interact with and analyze a collection of documents. This server provides tools for searching, reading, and extracting information from various document formats.

Features

  • Multi-format Support: Read and process various document formats including plain text, Markdown, PDF, and JSON
  • Full-text Search: Search across all documents with configurable case sensitivity
  • Document Management: List all documents with metadata (size, modification date, type)
  • Content Extraction: Extract specific sections from documents using regex patterns
  • Resource Access: Direct access to document contents through MCP resource URIs
  • Real-time Updates: New documents added to the documents directory are detected automatically, without restarting the server

Prerequisites

  • Node.js 18.0 or higher
  • npm or yarn package manager
  • Claude Desktop application

Installation

  1. Clone or download this repository:
    git clone <repository-url>
    cd mcp-document-server
  2. Install dependencies:
    npm install
  3. Create a documents directory (or prepare your existing directory):
    mkdir documents

Configuration

Claude Desktop Configuration

  1. Open Claude Desktop settings
  2. Navigate to the Developer settings
  3. Add the following to your MCP servers configuration file (claude_desktop_config.json):
{
  "mcpServers": {
    "document-server": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-document-server/src/index.js"],
      "env": {
        "DOCUMENTS_PATH": "/absolute/path/to/your/documents"
      }
    }
  }
}

Note: Replace /absolute/path/to/ with the actual paths on your system.

Configuration Options

  • DOCUMENTS_PATH: Environment variable to specify the directory containing your documents (defaults to ./documents)
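
A minimal sketch of how this default could be resolved. It assumes ./documents is taken relative to the directory the server is started from, which the README does not actually specify. resolveDocumentsPath is a hypothetical helper, not the server's own code.

```javascript
const path = require("node:path");

// Hypothetical helper: read DOCUMENTS_PATH, falling back to ./documents
// relative to the current working directory (assumption).
function resolveDocumentsPath(env = process.env) {
  const configured = env.DOCUMENTS_PATH;
  // Treat an unset or blank variable as "not configured".
  const raw = configured && configured.trim() !== "" ? configured : "./documents";
  return path.resolve(process.cwd(), raw);
}

console.log(resolveDocumentsPath({ DOCUMENTS_PATH: "/srv/docs" }));
console.log(resolveDocumentsPath({}));
```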

Usage

Starting the Server

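The server starts automatically when Claude Desktop launches, provided it is configured correctly. To test it manually, run one of the commands below. Because the server talks to its client over stdin/stdout, it keeps running and waits for MCP messages until you stop it with Ctrl+C: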

# Using default documents directory
node src/index.js

# Using custom documents directory
DOCUMENTS_PATH=/path/to/docs node src/index.js

Adding Documents

Simply place your documents in the configured documents directory. The server will automatically detect them when:

  • Listing documents
  • Performing searches
  • Accessing resources
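
Because new files are picked up at the moment documents are listed, searched, or accessed, the simplest model is a fresh directory scan on every request instead of a file watcher. The sketch below assumes exactly that. It also assumes only top-level files count (see the troubleshooting note about subdirectories) and that the four extensions under Supported File Formats are the full set. listSupportedFiles is a hypothetical name.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Extensions from the "Supported File Formats" list (assumed complete).
const SUPPORTED_EXTENSIONS = new Set([".txt", ".md", ".pdf", ".json"]);

// Hypothetical helper: re-reads the directory on every call, so files added
// since the previous request show up without a restart.
function listSupportedFiles(dir) {
  return fs
    .readdirSync(dir, { withFileTypes: true })
    // Only top-level regular files; subdirectories are not descended into.
    .filter((entry) => entry.isFile())
    .filter((entry) => SUPPORTED_EXTENSIONS.has(path.extname(entry.name).toLowerCase()))
    .map((entry) => path.join(dir, entry.name))
    .sort();
}
```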

Interacting Through Claude

Once configured, you can ask Claude to:

  1. List all documents:

    • "Show me all available documents"
    • "What documents do you have access to?"
  2. Search across documents:

    • "Search for 'machine learning' in all documents"
    • "Find all mentions of 'API' (case sensitive)"
  3. Read specific documents:

    • "Read the contents of example.txt"
    • "Show me what's in the README file"
  4. Extract sections:

    • "Extract all headings from the markdown file"
    • "Find all sections starting with '##' in documentation.md"

Available Tools

1. list_documents

Lists all available documents with metadata.

Parameters: None

Returns:

{
  "documents": [
    {
      "name": "example.txt",
      "path": "/full/path/to/example.txt",
      "size": 1234,
      "modified": "2024-01-15T10:30:00.000Z",
      "type": "txt"
    }
  ]
}
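
The response fields map naturally onto Node's fs.statSync results. A sketch of building one entry, assuming type is the lowercase extension without the dot (as "txt" in the sample suggests) and modified is the file's modification time in ISO 8601 form. describeDocument is hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper producing one entry of the list_documents response.
function describeDocument(filePath) {
  const stats = fs.statSync(filePath);
  return {
    name: path.basename(filePath),
    path: path.resolve(filePath),
    size: stats.size, // bytes
    modified: stats.mtime.toISOString(), // same format as "2024-01-15T10:30:00.000Z"
    type: path.extname(filePath).slice(1).toLowerCase(), // e.g. "txt", "md"
  };
}
```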

2. search_documents

Searches for text across all documents.

Parameters:

  • query (required): Text to search for
  • case_sensitive (optional): Boolean for case-sensitive search (default: false)

Returns:

{
  "results": [
    {
      "file": "example.txt",
      "path": "/full/path/to/example.txt",
      "matches": [
        {
          "line": 5,
          "text": "This line contains the search term"
        }
      ],
      "total_matches": 3
    }
  ]
}
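
One way to produce this shape is a line-by-line substring scan. The sketch assumes 1-based line numbers, plain substring matching rather than regex, and that total_matches counts matching lines. The sample response lists one match against a total of 3, so the real server may cap the matches array; this sketch does not. searchText is hypothetical and works on text that has already been loaded.

```javascript
// Hypothetical helper: find every line of `text` that contains `query`.
function searchText(text, query, caseSensitive = false) {
  // Lowercase the query once when the search is case-insensitive.
  const needle = caseSensitive ? query : query.toLowerCase();
  const matches = [];
  text.split(/\r?\n/).forEach((line, index) => {
    const haystack = caseSensitive ? line : line.toLowerCase();
    if (haystack.includes(needle)) {
      matches.push({ line: index + 1, text: line }); // 1-based line numbers
    }
  });
  return { matches, total_matches: matches.length };
}

const doc = "Intro\nThe API is stable.\nUse the api key.";
console.log(searchText(doc, "API"));
console.log(searchText(doc, "API", true));
```

With the default case_sensitive of false, both the "API" and "api" lines match; with true, only the first does.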

3. extract_sections

Extracts sections from a document based on regex patterns.

Parameters:

  • file_path (required): Path to the document
  • pattern (required): Regex pattern to match sections

Returns:

{
  "sections": [
    {
      "heading": "## Section Title",
      "line_start": 10,
      "line_end": 25,
      "content": "Section content..."
    }
  ]
}
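
A sketch of one plausible algorithm: every line matching pattern starts a new section, and that section runs until the line before the next match or the end of the file. The boundary rule, the 1-based line numbers, and excluding the heading from content are all assumptions, because the README only shows the output shape. extractSections is hypothetical.

```javascript
// Hypothetical helper: split `text` into sections that begin at lines matching `pattern`.
function extractSections(text, pattern) {
  const regex = new RegExp(pattern); // no flags: case-sensitive, tested line by line
  const lines = text.split(/\r?\n/);
  const sections = [];
  let current = null;

  lines.forEach((line, index) => {
    if (regex.test(line)) {
      // A new heading closes the previous section.
      if (current) sections.push(current);
      current = { heading: line, line_start: index + 1, line_end: index + 1, body: [] };
    } else if (current) {
      current.body.push(line);
      current.line_end = index + 1;
    }
    // Lines before the first matching heading are ignored.
  });
  if (current) sections.push(current);

  return sections.map(({ heading, line_start, line_end, body }) => ({
    heading,
    line_start,
    line_end,
    content: body.join("\n").trim(), // heading line excluded (assumption)
  }));
}

const sample = "# Title\nintro\n## Setup\nstep one\nstep two\n## Usage\nrun it";
console.log(extractSections(sample, "^## "));
```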

Supported File Formats

  • Plain Text (.txt): Direct text reading
  • Markdown (.md): Treated as plain text with full markdown syntax preserved
  • PDF (.pdf): Text extraction from PDF documents
  • JSON (.json): Pretty-printed JSON content
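
A sketch of per-format loading that is consistent with the list above. PDF extraction requires a third-party parser, and the README does not name the library the server uses, so that branch throws instead of guessing at an API. loadText is hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical loader mirroring the "Supported File Formats" list.
function loadText(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  switch (ext) {
    case ".txt":
    case ".md": // Markdown is returned verbatim, syntax included
      return fs.readFileSync(filePath, "utf8");
    case ".json": {
      // Parse and re-serialize with 2-space indentation to pretty-print.
      const raw = fs.readFileSync(filePath, "utf8");
      return JSON.stringify(JSON.parse(raw), null, 2);
    }
    case ".pdf":
      // Needs a PDF text-extraction library; deliberately not implemented here.
      throw new Error("PDF extraction requires a parser library");
    default:
      throw new Error(`Unsupported format: ${ext || "(none)"}`);
  }
}
```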

Examples

Example 1: Document Research Workflow

User: "I need to find all mentions of authentication in my documentation"

Claude will:
1. Use search_documents tool with query "authentication"
2. Present all matches with file names and line numbers
3. Offer to read specific documents for more context
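
Step 1 is an MCP tools/call request, and its arguments follow the parameter list documented for search_documents. The sketch below builds that JSON-RPC 2.0 message in JavaScript. The id value is arbitrary, and buildToolCall is a hypothetical helper.

```javascript
// Hypothetical helper: the JSON-RPC 2.0 message a client sends to invoke a tool.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Example 1: case_sensitive is omitted, so the documented default (false) applies.
const request = buildToolCall(1, "search_documents", { query: "authentication" });
console.log(JSON.stringify(request));
```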

Example 2: Extracting API Documentation

User: "Extract all API endpoints from my API.md file"

Claude will:
1. Use extract_sections with pattern "^###.*endpoint|^###.*api"
2. Return all matching sections with their content
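
Test the pattern before you rely on it. The sketch assumes the server compiles pattern with new RegExp(pattern) and no flags, which the README does not confirm. Under that assumption the match is case-sensitive, so a heading like "### API Keys" is missed. The widened variant is only an illustration of how character classes can cover both spellings.

```javascript
// Pattern from Example 2, compiled without flags (assumption about the server).
const original = new RegExp("^###.*endpoint|^###.*api");
// Illustrative variant that also accepts capitalized forms.
const widened = new RegExp("^###.*([Ee]ndpoint|[Aa][Pp][Ii])");

const headings = ["### Users endpoint", "### API Keys", "### list api", "## api overview"];
for (const h of headings) {
  console.log(h, "->", original.test(h), widened.test(h));
}
```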

Example 3: Document Overview

User: "Give me an overview of all technical documents"

Claude will:
1. Use list_documents to get all files
2. Filter for technical documentation
3. Provide summary with file sizes and last modified dates

Environment Variables

Variable         Description                        Default
DOCUMENTS_PATH   Path to the documents directory    ./documents

Troubleshooting

Common Issues

  1. Server not appearing in Claude:

    • Verify the configuration path is absolute, not relative
    • Check that Node.js is in your system PATH
    • Restart Claude Desktop after configuration changes
  2. Documents not found:

    • Ensure DOCUMENTS_PATH is set correctly
    • Check file permissions on the documents directory
    • Verify documents are in the root of the directory (not subdirectories)
  3. PDF reading errors:

    • Some PDFs (for example, scanned or image-only files without a text layer) may yield incomplete or empty text
    • Try converting to text format if problems persist
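
For the "Documents not found" checks, a small standalone script can confirm that the configured path exists, is a directory, and is readable by the current user. checkDocumentsDir is hypothetical and separate from the server itself.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical diagnostic: returns a list of problems with the documents directory.
function checkDocumentsDir(dir) {
  const problems = [];
  let stats;
  try {
    stats = fs.statSync(dir);
  } catch {
    problems.push(`Path does not exist: ${dir}`);
    return problems;
  }
  if (!stats.isDirectory()) {
    problems.push(`Not a directory: ${dir}`);
    return problems;
  }
  try {
    fs.accessSync(dir, fs.constants.R_OK); // read permission for the current user
  } catch {
    problems.push(`Directory is not readable: ${dir}`);
  }
  return problems;
}

const target = path.resolve(process.env.DOCUMENTS_PATH || "./documents");
console.log(checkDocumentsDir(target));
```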

Debug Mode

To see server logs:

node src/index.js 2> server.log
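
Redirecting stderr captures the logs because, over the stdio transport, stdout carries the MCP protocol messages. Anything else written there would corrupt that stream. Below is a sketch of a logging helper that respects this split, assuming the server logs to stderr as the redirect implies. log and formatLogLine are hypothetical.

```javascript
// Hypothetical logging helpers: keep stdout free for MCP protocol messages.
function formatLogLine(level, message, now = new Date()) {
  return `${now.toISOString()} [${level.toUpperCase()}] ${message}`;
}

function log(level, message) {
  // stderr only; `2> server.log` captures these lines.
  process.stderr.write(formatLogLine(level, message) + "\n");
}

log("info", "document-server started");
```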

Development

Project Structure

mcp-document-server/
├── src/
│   └── index.js        # Main server implementation
├── documents/          # Default documents directory
├── package.json        # Project dependencies
└── README.md          # This file

Adding New Features

To add support for new file formats:

  1. Add the file extension to the getMimeType() method
  2. Add parsing logic to the loadDocument() method
  3. Install any necessary parsing libraries
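
If getMimeType() is a lookup keyed by extension, step 1 might look like the sketch below. That structure is an assumption, because the README names the method but does not show its body, and .csv is only an illustrative new format.

```javascript
const path = require("node:path");

// Assumed shape: one entry per supported extension.
const MIME_TYPES = {
  ".txt": "text/plain",
  ".md": "text/markdown",
  ".pdf": "application/pdf",
  ".json": "application/json",
  ".csv": "text/csv", // illustrative new format (step 1)
};

// Standalone version of a getMimeType() lookup; unknown extensions fall back to octet-stream.
function getMimeType(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return MIME_TYPES[ext] ?? "application/octet-stream";
}
```

Step 2 would then need a matching branch in loadDocument() that turns the new format into text.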

Contributing

Feel free to submit issues or pull requests for:

  • New file format support
  • Additional search capabilities
  • Performance improvements
  • Bug fixes

License

[Your chosen license]

Acknowledgments

Built using the Model Context Protocol SDK by Anthropic.