vedantparmar12/Document-Automation
A sophisticated Model Context Protocol (MCP) server that enables AI assistants to automatically analyze codebases and generate comprehensive, professional documentation.
Document Automation - Comprehensive Documentation
A powerful Python-based documentation automation tool that analyzes codebases and generates comprehensive documentation with multiple export formats.
Table of Contents
- Overview
- Features
- Architecture
- Prerequisites
- Installation
- Configuration
- Usage
- API Reference
- Project Structure
- How It Works
- Deployment
- Contributing
- Troubleshooting
- License
Overview
Document-Automation is a powerful Python-based tool designed to automatically analyze codebases and generate comprehensive documentation. It provides intelligent codebase analysis, multiple output formats, and professional-grade documentation generation capabilities.
Why Use Document-Automation?
- Comprehensive Analysis: Deep codebase inspection with AST parsing
- Multiple Formats: Generate HTML, PDF, Markdown, and interactive documentation
- Professional Quality: Enterprise-ready documentation with modern themes
- Automated Workflows: Reduce manual documentation overhead
- Framework Detection: Intelligent technology stack analysis
- Database Integration: Schema analysis and ER diagram generation
Features
Core Capabilities
- Codebase Analysis: Complete project structure analysis with metrics
- AST Parsing: Deep code analysis for Python and JavaScript
- Framework Detection: Automatic technology stack identification
- Database Schema Analysis: SQL schema extraction and visualization
- Security Analysis: Code security assessment and recommendations
- Interactive Documentation: Modern, searchable documentation interfaces
Output Formats
- Interactive HTML with search and navigation
- Professional PDF reports
- Markdown documentation
- Confluence-ready content
- JSON data exports
- LaTeX and academic formats
Advanced Features
- Mermaid Diagrams: Architecture and database relationship diagrams
- Concurrent Processing: Multi-threaded analysis for large codebases
- Pagination Support: Smart pagination for very large repositories
- Background Processing: Async processing to handle large codebases efficiently
- API Endpoint Discovery: Automatic REST API documentation
- Custom CSS Support: Inject custom styles for branding
- Multi-language Support: Internationalization capabilities
- Custom Themes: Modern, dark, corporate, and minimal themes
- Accessibility Compliance: WCAG 2.1 AA compliant output
- Responsive Design: Mobile-friendly documentation
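The Mermaid diagrams mentioned above are plain text, so generating one amounts to emitting Mermaid syntax from analysis results. A minimal sketch of ER-diagram generation — the function name and input shape are illustrative, not the project's actual API:

```python
def schema_to_mermaid(tables):
    """Render {table: [(column, type), ...]} as a Mermaid ER diagram."""
    lines = ["erDiagram"]
    for table, columns in tables.items():
        lines.append(f"    {table} {{")
        for name, col_type in columns:
            # Mermaid ER syntax puts the type before the column name
            lines.append(f"        {col_type} {name}")
        lines.append("    }")
    return "\n".join(lines)

schema = {
    "users": [("id", "int"), ("email", "string")],
    "posts": [("id", "int"), ("user_id", "int")],
}
print(schema_to_mermaid(schema))
```

Pasting the output into any Mermaid renderer produces the diagram, which is why text-based diagrams fit well into generated documentation.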
Architecture
System Components
1. Analyzers Module (src/analyzers/)
- BaseAnalyzer: Core analysis functionality
- CodebaseAnalyzer: Project structure and file analysis
- DatabaseAnalyzer: SQL schema and relationship analysis
- FrameworkDetector: Technology stack identification
2. Parsers Module (src/parsers/)
- ASTAnalyzer: Abstract syntax tree parsing
- PythonParser: Python-specific code analysis
- JavaScriptParser: JavaScript code analysis
- ParserFactory: Language-agnostic parser selection
System Architecture
flowchart TB
%% Document-Automation Architecture
subgraph "Input Layer"
A[Code Repository]
B[Configuration Files]
C[Custom Templates]
end
subgraph "Analysis Layer"
D[Codebase Analyzer]
E[AST Parser]
F[Framework Detector]
G[Database Analyzer]
H[Security Scanner]
end
subgraph "Processing Layer"
I[Concurrent Processor]
J[Background Tasks]
K[Token Estimator]
L[Pagination Manager]
end
subgraph "Generation Layer"
M[Documentation Generator]
N[Diagram Generator]
O[Template Engine]
P[Format Exporter]
end
subgraph "Output Layer"
Q[HTML Documentation]
R[PDF Reports]
S[Markdown Files]
T[Interactive Docs]
end
A --> D
B --> D
C --> O
D --> E
D --> F
D --> G
D --> H
E --> I
F --> I
G --> I
H --> I
I --> J
I --> K
I --> L
J --> M
K --> M
L --> M
M --> N
M --> O
M --> P
N --> Q
O --> Q
P --> R
P --> S
P --> T
Component Overview
Analyzers
- BaseAnalyzer: Core analysis functionality
- CodebaseAnalyzer: Repository structure analysis
- DatabaseAnalyzer: Database schema analysis
- FrameworkDetector: Technology stack detection
Parsers
- ASTAnalyzer: Abstract Syntax Tree parsing
- PythonParser: Python-specific parsing
- JavaScriptParser: JavaScript-specific parsing
- BaseParser: Generic parsing functionality
Generators
- DocumentationGenerator: Core documentation generation
- InteractiveDocGenerator: Interactive HTML generation
- ProfessionalDocGenerator: Professional format generation
Export & Processing
- FormatExporter: Multi-format export capability
- ConcurrentAnalyzer: Parallel processing
- BackgroundProcessor: Async task management
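The ParserFactory listed above typically maps file extensions to parser implementations. A hedged sketch of that pattern — class names mirror the modules above, but the method shapes are illustrative, not the project's real interfaces:

```python
import os

class BaseParser:
    """Generic parsing interface."""
    def parse(self, source: str) -> dict:
        raise NotImplementedError

class PythonParser(BaseParser):
    def parse(self, source: str) -> dict:
        return {"language": "python", "lines": source.count("\n") + 1}

class JavaScriptParser(BaseParser):
    def parse(self, source: str) -> dict:
        return {"language": "javascript", "lines": source.count("\n") + 1}

class ParserFactory:
    """Select a parser from the file extension; unknown types get None."""
    _registry = {".py": PythonParser, ".js": JavaScriptParser, ".jsx": JavaScriptParser}

    @classmethod
    def for_file(cls, filename: str):
        ext = os.path.splitext(filename)[1].lower()
        parser_cls = cls._registry.get(ext)
        return parser_cls() if parser_cls else None

parser = ParserFactory.for_file("app.py")
print(parser.parse("x = 1\ny = 2"))  # → {'language': 'python', 'lines': 2}
```

The registry approach keeps the analyzers language-agnostic: adding a language means registering one more extension-to-class entry.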
Prerequisites
System Requirements
- Python 3.8 or higher
- Git (for repository analysis)
- 4GB RAM minimum (8GB recommended for large projects)
- 1GB free disk space
Required Dependencies
# Core Dependencies
fastapi>=0.104.1
uvicorn[standard]>=0.24.0
pydantic>=2.5.0
starlette>=0.27.0
# Analysis & Parsing
tree-sitter>=0.20.4
tree-sitter-python>=0.20.4
tree-sitter-javascript>=0.20.3
gitpython>=3.1.40
# Documentation Generation
mkdocs>=1.5.3
markdown-it-py>=3.0.0
jinja2>=3.1.2
markdown>=3.5.1
# Export Formats
reportlab>=4.0.7
weasyprint>=60.2
python-docx>=1.1.0
openpyxl>=3.1.2
# Visualization
matplotlib>=3.8.2
plotly>=5.17.0
mermaid-py>=0.3.0
# Processing
pandas>=2.1.4
numpy>=1.24.4
sqlalchemy>=2.0.23
celery>=5.3.4
redis>=5.0.1
Installation
Method 1: pip Installation (Recommended)
# Install from PyPI (when available)
pip install document-automation
# Or install from source
git clone https://github.com/vedantparmar12/Document-Automation.git
cd Document-Automation
pip install -r requirements.txt
Method 2: Docker Installation
# Pull the Docker image
docker pull vedantparmar12/document-automation:latest
# Run with volume mounting
docker run -v /path/to/your/project:/app/input \
-v /path/to/output:/app/output \
vedantparmar12/document-automation:latest
Method 3: Development Setup
# Clone repository
git clone https://github.com/vedantparmar12/Document-Automation.git
cd Document-Automation
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run development server
python run_server.py
Configuration
Environment Variables
Create a .env file in the project root:
# Server Configuration
HOST=0.0.0.0
PORT=8000
DEBUG=True
WORKERS=4
# Processing Configuration
MAX_CONCURRENT_ANALYSES=3
DEFAULT_TIMEOUT=300
MAX_FILE_SIZE=10MB
# Export Configuration
DEFAULT_THEME=modern
DEFAULT_FORMAT=interactive
ENABLE_PDF_EXPORT=True
ENABLE_SEARCH=True
# Security Configuration
VALIDATE_PATHS=True
SANDBOX_MODE=False
MAX_ANALYSIS_TIME=3600
# External Services (Optional)
REDIS_URL=redis://localhost:6379
DATABASE_URL=sqlite:///./analysis.db
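Settings like these are usually read from os.environ with a default and a type cast. A minimal stdlib sketch — the variable names mirror the .env file above, but the helper itself is illustrative, not the project's config loader:

```python
import os

def env(name, default, cast=str):
    """Read an environment variable with a default and a type cast."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    if cast is bool:
        # Accept common truthy spellings such as "True" or "1"
        return raw.strip().lower() in ("1", "true", "yes", "on")
    return cast(raw)

HOST = env("HOST", "0.0.0.0")
DEBUG = env("DEBUG", False, bool)
MAX_CONCURRENT_ANALYSES = env("MAX_CONCURRENT_ANALYSES", 3, int)
```

The explicit bool branch matters: `bool("False")` is True in Python, so naive casting silently enables flags.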
Configuration File
Create config.yaml:
analysis:
  max_files: 1000
  include_patterns:
    - "*.py"
    - "*.js"
    - "*.ts"
    - "*.jsx"
    - "*.tsx"
    - "*.sql"
  exclude_patterns:
    - "node_modules"
    - "__pycache__"
    - ".git"
    - "*.pyc"
    - "dist"
    - "build"

documentation:
  title: "Auto-Generated Documentation"
  author: "Document Automation"
  version: "1.0.0"
  theme: "modern"
  include_toc: true
  include_search: true
  include_diagrams: true

export:
  formats:
    - html
    - pdf
    - markdown
  output_dir: "./docs"
  responsive_design: true
  accessibility_compliance: true

security:
  validate_inputs: true
  sanitize_paths: true
  max_analysis_depth: 10
  allowed_extensions:
    - .py
    - .js
    - .ts
    - .md
    - .sql
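The include/exclude patterns above can be applied with the stdlib fnmatch module. A hedged sketch of how such filtering might work — the function is illustrative, not the tool's actual code:

```python
from fnmatch import fnmatch

def should_analyze(path, include_patterns, exclude_patterns):
    """True if the filename matches an include pattern and no path
    segment (directory or filename) matches an exclude pattern."""
    parts = path.replace("\\", "/").split("/")
    if any(fnmatch(part, pat) for part in parts for pat in exclude_patterns):
        return False
    return any(fnmatch(parts[-1], pat) for pat in include_patterns)

include = ["*.py", "*.js"]
exclude = ["node_modules", "__pycache__", "*.pyc"]
print(should_analyze("src/app.py", include, exclude))                 # True
print(should_analyze("node_modules/lib/index.js", include, exclude))  # False
```

Checking every path segment against the exclude list is what lets a single `node_modules` pattern prune an entire subtree.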
Usage
Command Line Interface
# Basic usage
python -m document_automation analyze /path/to/project
# With custom output format
python -m document_automation analyze /path/to/project --format html --theme modern
# Multiple formats
python -m document_automation analyze /path/to/project --formats html,pdf,markdown
# With custom configuration
python -m document_automation analyze /path/to/project --config config.yaml
# GitHub repository analysis
python -m document_automation analyze-repo https://github.com/user/repo
# Server mode
python -m document_automation serve --host 0.0.0.0 --port 8000
Web Server
# Start the web server
python run_server.py
# Or using uvicorn directly
uvicorn src.server:app --host 0.0.0.0 --port 8000 --reload
Python API Usage
from src.analyzers import CodebaseAnalyzer
from src.generators import DocumentationGenerator
from src.export import FormatExporter
# Initialize components
analyzer = CodebaseAnalyzer()
generator = DocumentationGenerator()
exporter = FormatExporter()
# Analyze codebase
analysis_result = analyzer.analyze_repository("/path/to/project")
# Generate documentation
documentation = generator.generate(
    analysis_result,
    theme="modern",
    include_diagrams=True,
)
# Export to multiple formats
exporter.export_multiple(
    documentation,
    formats=["html", "pdf", "markdown"],
    output_dir="./docs",
)
REST API Usage
import requests
# Start analysis
response = requests.post("http://localhost:8000/analyze", json={
    "path": "/path/to/project",
    "include_ast_analysis": True,
    "include_security_analysis": True,
    "formats": ["html", "pdf"],
})
analysis_id = response.json()["analysis_id"]
# Check status
status = requests.get(f"http://localhost:8000/status/{analysis_id}")
# Download results
docs = requests.get(f"http://localhost:8000/download/{analysis_id}")
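Because /analyze runs asynchronously, a client normally polls /status until the job finishes. A small polling helper, written against an injected fetch function so it works with any HTTP client — the status shape follows the example above, but the helper itself is illustrative:

```python
import time

def wait_for_analysis(fetch_status, analysis_id, poll_interval=2.0, timeout=600):
    """Poll until the analysis completes or fails, or raise on timeout.

    fetch_status(analysis_id) must return a dict shaped like the
    /status response: {"status": ..., "progress": ..., "error": ...}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(analysis_id)
        if status["status"] == "completed":
            return status
        if status["status"] == "failed":
            raise RuntimeError(f"analysis failed: {status.get('error')}")
        time.sleep(poll_interval)
    raise TimeoutError(f"analysis {analysis_id} did not finish in {timeout}s")

# Example with a fake fetcher that completes on the third poll:
responses = iter([{"status": "running"}, {"status": "running"},
                  {"status": "completed", "progress": 100}])
result = wait_for_analysis(lambda _id: next(responses), "abc123", poll_interval=0)
print(result["progress"])  # → 100
```

In real use, `fetch_status` would be `lambda i: requests.get(f"http://localhost:8000/status/{i}").json()`.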
API Reference
Core Classes
CodebaseAnalyzer
class CodebaseAnalyzer:
    """Main analyzer for codebase analysis."""

    def analyze_repository(self, path: str, **options) -> AnalysisResult:
        """Analyze a repository and return structured results."""

    def analyze_files(self, files: List[str], **options) -> AnalysisResult:
        """Analyze specific files."""

    def get_metrics(self, analysis: AnalysisResult) -> Dict:
        """Extract metrics from analysis results."""
DocumentationGenerator
class DocumentationGenerator:
    """Generate documentation from analysis results."""

    def generate(self, analysis: AnalysisResult, **options) -> Documentation:
        """Generate documentation in specified format."""

    def generate_interactive(self, analysis: AnalysisResult) -> str:
        """Generate interactive HTML documentation."""

    def generate_api_docs(self, analysis: AnalysisResult) -> str:
        """Generate API documentation."""
FormatExporter
class FormatExporter:
    """Export documentation to various formats."""

    def export_html(self, content: str, output_path: str) -> bool:
        """Export to HTML format."""

    def export_pdf(self, content: str, output_path: str) -> bool:
        """Export to PDF format."""

    def export_multiple(self, content: str, formats: List[str], output_dir: str) -> Dict:
        """Export to multiple formats simultaneously."""
REST API Endpoints
Analysis Endpoints
POST /analyze
Content-Type: application/json
{
  "path": "/path/to/project",
  "source_type": "local",
  "include_ast_analysis": true,
  "include_security_analysis": true,
  "include_diagrams": true,
  "formats": ["html", "pdf"],
  "theme": "modern"
}
GET /status/{analysis_id}
Response:
{
  "status": "completed",
  "progress": 100,
  "results_available": true,
  "error": null
}
GET /download/{analysis_id}
Response: Binary content or redirect to download URL
Repository Analysis
POST /analyze-repo
Content-Type: application/json
{
  "repo_url": "https://github.com/user/repo",
  "branch": "main",
  "include_ast_analysis": true,
  "formats": ["html", "markdown"]
}
Project Structure
Document-Automation/
├── src/                             # Source code
│   ├── __init__.py
│   ├── server.py                    # FastAPI server
│   ├── schemas.py                   # Pydantic models
│   │
│   ├── analyzers/                   # Analysis components
│   │   ├── __init__.py
│   │   ├── base_analyzer.py         # Base analysis class
│   │   ├── codebase_analyzer.py     # Main codebase analyzer
│   │   ├── database_analyzer.py     # Database schema analysis
│   │   └── framework_detector.py    # Framework detection
│   │
│   ├── parsers/                     # Code parsers
│   │   ├── __init__.py
│   │   ├── base_parser.py           # Base parser class
│   │   ├── ast_analyzer.py          # AST analysis
│   │   ├── python_parser.py         # Python-specific parsing
│   │   ├── javascript_parser.py     # JavaScript parsing
│   │   └── parser_factory.py        # Parser factory
│   │
│   ├── generators/                  # Documentation generators
│   │   ├── __init__.py
│   │   ├── documentation_generator.py
│   │   ├── interactive_doc_generator.py
│   │   └── professional_doc_generator.py
│   │
│   ├── export/                      # Export functionality
│   │   └── format_exporter.py
│   │
│   ├── diagrams/                    # Diagram generation
│   │   ├── __init__.py
│   │   ├── mermaid_generator.py
│   │   ├── architecture_diagrams.py
│   │   └── database_diagrams.py
│   │
│   ├── processing/                  # Processing utilities
│   │   ├── __init__.py
│   │   ├── concurrent_analyzer.py
│   │   └── background_processor.py
│   │
│   ├── pagination/                  # Pagination handling
│   │   ├── __init__.py
│   │   ├── chunker.py
│   │   ├── strategies.py
│   │   ├── context.py
│   │   └── token_estimator.py
│   │
│   ├── security/                    # Security validation
│   │   ├── __init__.py
│   │   └── validation.py
│   │
│   └── tools/                       # Consolidated tools
│       ├── __init__.py
│       └── consolidated_documentation_tools.py
│
├── docs/                            # Generated documentation
├── tests/                           # Test files
├── templates/                       # Documentation templates
├── static/                          # Static assets
├── requirements.txt                 # Dependencies
├── pyproject.toml                   # Project configuration
├── package.json                     # Node.js dependencies (if any)
├── tsconfig.json                    # TypeScript configuration
├── wrangler.toml                    # Cloudflare Workers config
├── run_server.py                    # Server runner
└── README.md                        # Project README
How It Works
Analysis Process
- Repository Scanning: Recursively scans the target directory
- File Type Detection: Identifies file types and programming languages
- AST Parsing: Parses source code into Abstract Syntax Trees
- Framework Detection: Identifies frameworks and libraries used
- Dependency Analysis: Maps dependencies and their relationships
- Security Scanning: Identifies potential security issues
- Metric Calculation: Computes code metrics and complexity scores
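The AST parsing step can be illustrated with Python's stdlib ast module. This is a simplified stand-in for the project's ASTAnalyzer, not its actual implementation:

```python
import ast

def summarize_python_source(source: str) -> dict:
    """Count functions, classes, and imports in a Python module via its AST."""
    tree = ast.parse(source)
    summary = {"functions": [], "classes": [], "imports": 0}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            summary["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            summary["classes"].append(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            summary["imports"] += 1
    return summary

code = "import os\n\nclass Greeter:\n    def hello(self):\n        return 'hi'\n"
print(summarize_python_source(code))
# → {'functions': ['hello'], 'classes': ['Greeter'], 'imports': 1}
```

For JavaScript and other languages the same idea applies, but a parser such as tree-sitter (listed in the dependencies) is used instead of the Python-only ast module.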
Documentation Generation
- Template Selection: Chooses appropriate template based on theme
- Content Assembly: Assembles analyzed data into documentation structure
- Diagram Generation: Creates Mermaid diagrams for visualization
- Format Rendering: Renders content in requested formats
- Export Processing: Optimizes and exports final documentation
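The assembly steps above boil down to filling a template with analysis data. The project uses Jinja2; as a dependency-free illustration, stdlib string.Template shows the same content-assembly idea (the template and data shapes are invented for the example):

```python
from string import Template

page = Template("""# $title

Analyzed $file_count files across $language_count languages.

## Modules
$module_list
""")

analysis = {
    "title": "Auto-Generated Documentation",
    "files": ["app.py", "db.py", "main.js"],
    "languages": {"python", "javascript"},
}

rendered = page.substitute(
    title=analysis["title"],
    file_count=len(analysis["files"]),
    language_count=len(analysis["languages"]),
    module_list="\n".join(f"- {f}" for f in analysis["files"]),
)
print(rendered)
```

Jinja2 adds loops, conditionals, and template inheritance on top of this substitution model, which is what makes theme switching a matter of swapping template files.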
Supported Analysis Types
- Static Code Analysis: Function/class/variable analysis
- Dependency Mapping: Import/export relationships
- Architecture Analysis: High-level system architecture
- Database Schema: Table relationships and structures
- API Discovery: REST endpoint identification
- Security Scanning: Common vulnerability detection
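To give a flavor of the security-scanning step, a naive pattern-based check for hardcoded credentials and risky calls might look like this. The real scanner is more sophisticated; these patterns are purely illustrative:

```python
import re

SUSPICIOUS_PATTERNS = [
    (re.compile(r"""(password|passwd|secret|api_key)\s*=\s*['"][^'"]+['"]""", re.I),
     "possible hardcoded credential"),
    (re.compile(r"\beval\s*\("), "use of eval()"),
]

def scan_source(source: str):
    """Return (line_number, message) findings for suspicious lines."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, message in SUSPICIOUS_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, message))
    return findings

sample = "API_KEY = 'abc123'\nresult = eval(user_input)\n"
print(scan_source(sample))
# → [(1, 'possible hardcoded credential'), (2, 'use of eval()')]
```

Regex checks like this are fast but produce false positives; production scanners combine them with AST-level analysis.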
Deployment
Docker Deployment
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "src.server:app", "--host", "0.0.0.0", "--port", "8000"]
# Build and run
docker build -t document-automation .
docker run -p 8000:8000 document-automation
Cloud Deployment
AWS EC2
# Install on EC2 instance
sudo yum update -y
sudo yum install python3 python3-pip git -y
# Clone and setup
git clone https://github.com/vedantparmar12/Document-Automation.git
cd Document-Automation
pip3 install -r requirements.txt
# Run with systemd
sudo nano /etc/systemd/system/document-automation.service
sudo systemctl enable document-automation
sudo systemctl start document-automation
Heroku
# Heroku deployment
heroku create your-app-name
heroku buildpacks:set heroku/python
git push heroku main
Cloudflare Workers
The project includes wrangler.toml for Cloudflare Workers deployment:
# The legacy @cloudflare/wrangler package is deprecated; use the wrangler package
npm install -g wrangler
wrangler deploy
Contributing
We welcome contributions! Here's how to get started:
Development Setup
# Fork and clone the repository
git clone https://github.com/your-username/Document-Automation.git
cd Document-Automation
# Create feature branch
git checkout -b feature/your-feature-name
# Setup development environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
Running Tests
# Run all tests
python -m pytest
# Run with coverage
python -m pytest --cov=src
# Run specific test file
python -m pytest tests/test_analyzer.py
# Run with verbose output
python -m pytest -v
Code Style
We use:
- Black for code formatting
- isort for import sorting
- flake8 for linting
- mypy for type checking
# Format code
black src/ tests/
isort src/ tests/
# Check linting
flake8 src/ tests/
# Type checking
mypy src/
Contribution Guidelines
- Fork the repository and create a feature branch
- Write tests for new functionality
- Follow code style guidelines
- Update documentation as needed
- Submit a pull request with clear description
Reporting Issues
When reporting issues, please include:
- Python version and OS
- Error messages and stack traces
- Minimal reproducible example
- Expected vs actual behavior
Troubleshooting
Common Issues
Analysis Fails with Large Repositories
# Bound the work per run with pagination instead of trying to raise
# interpreter memory limits
python -m document_automation analyze /path/to/project --max-files 500
PDF Export Issues
# WeasyPrint (used for PDF export) needs system rendering libraries
# On Ubuntu/Debian:
sudo apt-get install libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0
# On macOS:
brew install pango
# On Windows: see the WeasyPrint installation documentation
Permission Errors
# Ensure proper permissions
chmod +x run_server.py
# Run with proper user permissions
sudo chown -R $(whoami):$(whoami) ./docs/
Performance Optimization
For Large Codebases
# Optimize analysis settings
analyzer = CodebaseAnalyzer(
    max_concurrent_files=10,
    enable_caching=True,
    skip_binary_files=True,
    max_file_size="10MB",
)
Memory Usage
# Reduce memory footprint
import gc
# Enable garbage collection
gc.enable()
# Use streaming for large files
analyzer.enable_streaming = True
analyzer.chunk_size = 1024
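The streaming approach above can be sketched with a plain generator that reads a file in fixed-size chunks, keeping memory bounded regardless of file size. The function name is illustrative, not the tool's API:

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=64 * 1024):
    """Yield successive chunks of a file instead of loading it whole."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return
            yield chunk

# Process a large source file without holding it all in memory:
tmp = tempfile.NamedTemporaryFile("w", suffix=".py", delete=False)
tmp.write("x = 1\n" * 1000)  # 6000 characters total
tmp.close()
total_chars = sum(len(chunk) for chunk in read_in_chunks(tmp.name, chunk_size=512))
print(total_chars)  # → 6000
os.unlink(tmp.name)
```

Because the generator yields one chunk at a time, peak memory stays near `chunk_size` even for multi-gigabyte files.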
Debug Mode
# Enable debug logging
export DEBUG=True
export LOG_LEVEL=DEBUG
# Run with verbose output
python -m document_automation analyze /path/to/project --verbose --debug
License
This project is licensed under the MIT License. See the LICENSE file for details.
MIT License
Copyright (c) 2024 Vedant Parmar
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Quick Start Summary
- Clone the repository:
git clone https://github.com/vedantparmar12/Document-Automation.git
- Install dependencies:
pip install -r requirements.txt
- Start the server:
python run_server.py
- Access the API: Navigate to http://localhost:8000
- Analyze your project: Use the web interface or REST API
- Download documentation: Get your generated docs in multiple formats
For more detailed information, please refer to the specific sections above or check the project's GitHub repository.
Generated by Document Automation v1.0.0 - Automated documentation generation tool