mcp-document-server by danielitus - MCP Server

MCP Document Server

A Model Context Protocol (MCP) server that enables Claude to interact with and analyze a collection of documents. This server provides tools for searching, reading, and extracting information from various document formats.

Features
Prerequisites
Installation
Configuration
Usage
Available Tools
Supported File Formats
Examples
Environment Variables
Troubleshooting
Development

Features

Multi-format Support: Read and process various document formats including plain text, Markdown, PDF, and JSON
Full-text Search: Search across all documents with configurable case sensitivity
Document Management: List all documents with metadata (size, modification date, type)
Content Extraction: Extract specific sections from documents using regex patterns
Resource Access: Direct access to document contents through MCP resource URIs
Real-time Updates: Picks up new documents added to the monitored directory the next time documents are listed, searched, or accessed

Prerequisites

Node.js 18.0 or higher
npm or yarn package manager
Claude Desktop application

Installation

Clone or download this repository:

git clone <repository-url>
cd mcp-document-server

Install dependencies:

npm install

Create a documents directory (or prepare your existing directory):

mkdir documents

Configuration

Claude Desktop Configuration

Open Claude Desktop settings
Navigate to the Developer settings
Add the following to your MCP servers configuration:

{
  "mcpServers": {
    "document-server": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-document-server/src/index.js"],
      "env": {
        "DOCUMENTS_PATH": "/absolute/path/to/your/documents"
      }
    }
  }
}

Note: Replace /absolute/path/to/ with the actual paths on your system.
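
Because both paths must be absolute, it can help to generate the entry rather than type it by hand. The following is a minimal sketch, not part of this repository: `buildServerConfig` is a hypothetical helper that resolves a repository directory and a documents directory to absolute paths and prints the configuration entry shown above.

```javascript
const path = require("node:path");

// Hypothetical helper (not part of this repository): builds the "mcpServers"
// entry shown above, resolving both directories to absolute paths.
function buildServerConfig(repoDir, docsDir) {
  return {
    mcpServers: {
      "document-server": {
        command: "node",
        args: [path.resolve(repoDir, "src", "index.js")],
        env: { DOCUMENTS_PATH: path.resolve(docsDir) },
      },
    },
  };
}

// Print JSON that can be pasted into the MCP servers configuration.
console.log(JSON.stringify(buildServerConfig(".", "./documents"), null, 2));
```

Run it from the repository root so that `.` resolves to the cloned project directory.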

Configuration Options

DOCUMENTS_PATH: Environment variable that specifies the directory containing your documents (defaults to ./documents)
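
As an illustration of how such a default might be applied, here is a minimal sketch. The function name `resolveDocumentsPath` is hypothetical, and resolving the relative default against the current working directory is an assumption; the actual logic in src/index.js may differ.

```javascript
const path = require("node:path");

// Hypothetical sketch: use DOCUMENTS_PATH when it is set and non-empty,
// otherwise fall back to ./documents.
// Assumption: relative paths are resolved against the working directory.
function resolveDocumentsPath(env = process.env, cwd = process.cwd()) {
  const configured = env.DOCUMENTS_PATH;
  const target =
    configured && configured.trim() !== "" ? configured : "./documents";
  return path.resolve(cwd, target);
}

console.log(resolveDocumentsPath());
```

In this sketch a relative default depends on where the process was started, which is one reason to set DOCUMENTS_PATH to an absolute path in the Claude Desktop configuration.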

Usage

Starting the Server

Once configured correctly, Claude Desktop starts the server automatically on launch. To test it manually:

# Using default documents directory
node src/index.js

# Using custom documents directory
DOCUMENTS_PATH=/path/to/docs node src/index.js

Adding Documents

Place your documents in the configured documents directory. The server detects them automatically the next time it handles any of the following:

Listing documents
Performing searches
Accessing resources

Interacting Through Claude

Once configured, you can ask Claude to:

List all documents:
- "Show me all available documents"
- "What documents do you have access to?"
Search across documents:
- "Search for 'machine learning' in all documents"
- "Find all mentions of 'API' (case sensitive)"
Read specific documents:
- "Read the contents of example.txt"
- "Show me what's in the README file"
Extract sections:
- "Extract all headings from the markdown file"
- "Find all sections starting with '##' in documentation.md"

Available Tools

1. `list_documents`

Lists all available documents with metadata.

Parameters: None

Returns:

{
  "documents": [
    {
      "name": "example.txt",
      "path": "/full/path/to/example.txt",
      "size": 1234,
      "modified": "2024-01-15T10:30:00.000Z",
      "type": "txt"
    }
  ]
}
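
To make the returned shape concrete, the following sketch scans a directory and builds the same structure. It is an illustration under stated assumptions, not the server's actual code: `listDocuments` is a hypothetical function, only regular files in the root of the directory are listed, and `type` is taken to be the lowercase file extension.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical sketch of a directory scan that yields the shape shown above.
// Assumption: only regular files in the root of the directory are listed.
function listDocuments(dir) {
  const documents = fs
    .readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile())
    .map((entry) => {
      const fullPath = path.join(dir, entry.name);
      const stats = fs.statSync(fullPath);
      return {
        name: entry.name,
        path: fullPath,
        size: stats.size, // size in bytes
        modified: stats.mtime.toISOString(),
        type: path.extname(entry.name).slice(1).toLowerCase(),
      };
    });
  return { documents };
}

// Demo against a temporary directory so the example runs anywhere.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "mcp-docs-"));
fs.writeFileSync(path.join(demoDir, "example.txt"), "hello");
console.log(JSON.stringify(listDocuments(demoDir), null, 2));
```

Scanning the directory on every call is one simple way to make newly added files appear without restarting the server.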

2. `search_documents`

Searches for text across all documents.

Parameters:

query (required): Text to search for
case_sensitive (optional): Boolean for case-sensitive search (default: false)

Returns:

{
  "results": [
    {
      "file": "example.txt",
      "path": "/full/path/to/example.txt",
      "matches": [
        {
          "line": 5,
          "text": "This line contains the search term"
        }
      ],
      "total_matches": 3
    }
  ]
}
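
The sketch below shows one way a line-by-line search with optional case sensitivity can produce this structure. It is not the server's actual implementation: `searchDocuments` is a hypothetical function that works on documents already loaded into memory, line numbers are assumed to be 1-based, and `total_matches` is assumed to count matching lines.

```javascript
// Hypothetical sketch of a line-by-line substring search.
// Assumptions: `docs` is an array of { name, path, text }, line numbers are
// 1-based, and total_matches counts matching lines.
function searchDocuments(docs, query, caseSensitive = false) {
  const needle = caseSensitive ? query : query.toLowerCase();
  const results = [];
  for (const doc of docs) {
    const matches = [];
    doc.text.split(/\r?\n/).forEach((text, index) => {
      const haystack = caseSensitive ? text : text.toLowerCase();
      if (haystack.includes(needle)) {
        matches.push({ line: index + 1, text: text.trim() });
      }
    });
    if (matches.length > 0) {
      results.push({
        file: doc.name,
        path: doc.path,
        matches,
        total_matches: matches.length,
      });
    }
  }
  return { results };
}

const docs = [
  {
    name: "example.txt",
    path: "/docs/example.txt",
    text: "Intro\nThe API is stable\nSee the api guide",
  },
];
console.log(JSON.stringify(searchDocuments(docs, "API", true), null, 2));
```

With `case_sensitive` left at its default of false, the same query would also match the lowercase "api" on the third line.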

3. `extract_sections`

Extracts sections from a document based on regex patterns.

Parameters:

file_path (required): Path to the document
pattern (required): Regex pattern to match sections

Returns:

{
  "sections": [
    {
      "heading": "## Section Title",
      "line_start": 10,
      "line_end": 25,
      "content": "Section content..."
    }
  ]
}
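
One plausible reading of this output is that each line matching the pattern starts a section, which then runs until the line before the next match or the end of the document. The sketch below implements that reading; `extractSections` is a hypothetical function, and the boundary rule and 1-based line numbers are assumptions rather than documented behavior.

```javascript
// Hypothetical sketch: every line matching `pattern` opens a section that
// ends just before the next matching line (or at the end of the text).
// Assumption: line_start and line_end are 1-based and inclusive.
function extractSections(text, pattern) {
  const regex = new RegExp(pattern);
  const lines = text.split(/\r?\n/);

  // Collect the zero-based index of every line that matches the pattern.
  const starts = [];
  lines.forEach((line, index) => {
    if (regex.test(line)) starts.push(index);
  });

  const sections = starts.map((start, k) => {
    const end = k + 1 < starts.length ? starts[k + 1] - 1 : lines.length - 1;
    return {
      heading: lines[start],
      line_start: start + 1,
      line_end: end + 1,
      content: lines.slice(start + 1, end + 1).join("\n").trim(),
    };
  });
  return { sections };
}

const sample = "# Title\n\n## Setup\nInstall it.\n\n## Usage\nRun it.";
console.log(JSON.stringify(extractSections(sample, "^## "), null, 2));
```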

Supported File Formats

Plain Text (.txt): Direct text reading
Markdown (.md): Treated as plain text, with all Markdown syntax preserved
PDF (.pdf): Text extraction from PDF documents
JSON (.json): Pretty-printed JSON content
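
As a rough illustration of the per-format handling described above, the sketch below dispatches on the file extension. `renderContent` is a hypothetical function, not the server's actual code, and PDF is left out because text extraction requires a parsing library that this README does not name.

```javascript
const path = require("node:path");

// Hypothetical sketch: turn raw file contents into the text returned for
// each format. PDF is omitted because it needs a parsing library.
function renderContent(fileName, raw) {
  const ext = path.extname(fileName).toLowerCase();
  switch (ext) {
    case ".txt":
    case ".md":
      return raw; // returned unchanged, Markdown syntax preserved
    case ".json":
      return JSON.stringify(JSON.parse(raw), null, 2); // pretty-printed
    default:
      throw new Error(`Unsupported format: ${ext || "(none)"}`);
  }
}

console.log(renderContent("data.json", '{"a":1,"b":[2,3]}'));
```

Note that in this sketch a malformed JSON file makes `JSON.parse` throw, so a real implementation would likely want to catch that error and fall back to the raw text.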

Examples

Example 1: Document Research Workflow

User: "I need to find all mentions of authentication in my documentation"

Claude will:
1. Use the search_documents tool with query "authentication"
2. Present all matches with file names and line numbers
3. Offer to read specific documents for more context

Example 2: Extracting API Documentation

User: "Extract all API endpoints from my API.md file"

Claude will:
1. Use extract_sections with pattern "^###.*endpoint|^###.*api"
2. Return all matching sections with their content
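
Regex patterns are case sensitive unless a flag says otherwise, so it is worth checking how a pattern like the one above behaves on your headings. The sketch below compares the pattern with and without the `i` flag on a few made-up headings; whether the server applies any flags to the pattern is not stated here, so treat this only as a way to test patterns before using them.

```javascript
const pattern = "^###.*endpoint|^###.*api";

// Made-up headings for illustration only.
const headings = ["### List endpoint", "### API Reference", "## api notes"];

const strict = new RegExp(pattern); // case sensitive
const relaxed = new RegExp(pattern, "i"); // case insensitive

const strictHits = headings.filter((heading) => strict.test(heading));
const relaxedHits = headings.filter((heading) => relaxed.test(heading));

console.log("case sensitive:", strictHits);
console.log("case insensitive:", relaxedHits);
```

Here the case-sensitive pattern misses "### API Reference" because it only looks for lowercase "api". If matching is case sensitive, a pattern such as "^###.*(endpoint|[Aa][Pp][Ii])" covers both spellings.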

Example 3: Document Overview

User: "Give me an overview of all technical documents"

Claude will:
1. Use list_documents to get all files
2. Filter for technical documentation
3. Provide a summary with file sizes and last modified dates

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `DOCUMENTS_PATH` | Path to the documents directory | `./documents` |

Troubleshooting

Common Issues

Server not appearing in Claude:
- Verify that the paths in the configuration are absolute, not relative
- Check that Node.js is in your system PATH
- Restart Claude Desktop after configuration changes
Documents not found:
- Ensure DOCUMENTS_PATH is set correctly
- Check file permissions on the documents directory
- Verify documents are in the root of the directory (not subdirectories)
PDF reading errors:
- Text extraction can fail for some PDFs, such as scanned, image-only files
- If problems persist, convert the document to plain text

Debug Mode

To capture server logs, redirect stderr to a file:

node src/index.js 2> server.log
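
The redirect captures stderr because servers launched through a command typically exchange MCP messages with the client over stdin and stdout, which leaves stderr as the safe place for diagnostics. If you add your own log statements, a helper along these lines keeps them off stdout. Both `formatLog` and `log` are hypothetical names, not functions from this repository.

```javascript
// Hypothetical logging helper: formats a timestamped line.
function formatLog(level, message, now = new Date()) {
  return `[${now.toISOString()}] ${level.toUpperCase()} ${message}`;
}

// Writes to stderr via console.error, leaving stdout free for MCP
// protocol messages.
function log(level, message) {
  console.error(formatLog(level, message));
}

log("info", "document server started");
```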

Development

Project Structure

mcp-document-server/
├── src/
│   └── index.js        # Main server implementation
├── documents/          # Default documents directory
├── package.json        # Project dependencies
└── README.md          # This file

Adding New Features

To add support for new file formats:

Add the file extension to the getMimeType() method
Add parsing logic to the loadDocument() method
Install any necessary parsing libraries
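
The steps above can be sketched as follows, using CSV as a made-up example of a new format. The names getMimeType and loadDocument come from this README, but their real signatures and internals are not shown here, so the standalone functions and the shared `formats` table below are assumptions for illustration only.

```javascript
const path = require("node:path");

// Assumed shape: one table drives both MIME lookup and parsing.
const formats = {
  ".txt": { mime: "text/plain", parse: (raw) => raw },
};

// Steps 1 and 2: register the new extension with its MIME type and parser.
// This naive parser splits on commas and does not handle quoted fields;
// a real implementation would likely use a CSV library (step 3).
formats[".csv"] = {
  mime: "text/csv",
  parse: (raw) =>
    raw
      .trim()
      .split(/\r?\n/)
      .map((row) => row.split(",").join(" | "))
      .join("\n"),
};

function getMimeType(fileName) {
  const format = formats[path.extname(fileName).toLowerCase()];
  return format ? format.mime : "application/octet-stream";
}

function loadDocument(fileName, raw) {
  const format = formats[path.extname(fileName).toLowerCase()];
  if (!format) throw new Error(`Unsupported format: ${fileName}`);
  return format.parse(raw);
}

console.log(getMimeType("report.csv"));
console.log(loadDocument("report.csv", "a,b\n1,2"));
```

Keeping the MIME type and the parser in one table entry means a new format is added in a single place, which is a design choice of this sketch rather than a description of the existing code.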

Contributing

Feel free to submit issues or pull requests for:

New file format support
Additional search capabilities
Performance improvements
Bug fixes

License

[Your chosen license]

Acknowledgments

Built using the Model Context Protocol SDK by Anthropic.
