yt-obsidian-mcp

sumit1612/yt-obsidian-mcp

3.2

If you are the rightful owner of yt-obsidian-mcp and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.

The YouTube Archivist MCP Server is designed to archive YouTube videos into Obsidian with semantic hierarchical tagging, transforming streaming content into structured knowledge assets.

Tools
4
Resources
0
Prompts
0

YouTube Archivist MCP Server

A Model Context Protocol (MCP) server for archiving YouTube videos to Obsidian with semantic hierarchical tagging. Transform ephemeral streaming content into permanent, structured, discoverable knowledge assets for your Second Brain.

Features

  • Automatic Transcript Extraction: Uses youtube-transcript-api with fallback strategies (Manual → Auto → Translated)
  • Rich Metadata: Extracts comprehensive video metadata via yt-dlp
  • Hierarchical Tagging: Poly-hierarchical taxonomy system (e.g., MATHEMATICS/LINEAR_ALGEBRA/SVD)
  • Timestamped Transcripts: Generates clickable timestamps linked to video moments
  • Security First: Path traversal protection prevents unauthorized filesystem access
  • Taxonomy Management: Scans vault to maintain tag consistency and prevent drift
  • Obsidian Integration: Generates properly formatted Markdown with YAML frontmatter

Architecture

The project follows professional software engineering patterns:

  • Strategy Pattern: Flexible transcript extraction with multiple fallback strategies
  • Singleton Pattern: Centralized configuration management
  • Factory Pattern: Consistent Markdown note generation
  • Hexagonal Architecture: Clean separation between domain logic and adapters
src/archivist/
├── main.py              # FastMCP server implementation
├── config.py            # Singleton configuration
├── core/
│   ├── extractors.py    # YouTube transcript extraction strategies
│   ├── processors.py    # Transcript chunking and formatting
│   └── taxonomy.py      # Hierarchical tag validation
└── adapters/
    ├── youtube_api.py   # yt-dlp metadata wrapper
    └── filesystem.py    # Secure Obsidian vault I/O

Installation

Prerequisites

  • Python 3.10+
  • uv (recommended) or pip
  • Obsidian vault
  • Claude Desktop

Setup

  1. Clone the repository:
git clone <repository-url>
cd yt-archieve-mcp
  1. Install dependencies:

Using uv (recommended):

uv sync

Using pip:

pip install -e .
  1. Configure Claude Desktop:

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or the equivalent on your platform:

{
  "mcpServers": {
    "obsidian-archivist": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/yt-archieve-mcp",
        "run",
        "python",
        "src/archivist/main.py"
      ],
      "env": {
        "VAULT_ROOT": "/absolute/path/to/your/ObsidianVault",
        "DEFAULT_FOLDER": "Inbox",
        "DEFAULT_LANGUAGES": "en",
        "CHUNK_DURATION": "60"
      }
    }
  }
}

Environment Variables

VariableDescriptionDefaultRequired
VAULT_ROOTAbsolute path to Obsidian vault-Yes
DEFAULT_FOLDERDefault folder for new notesInboxNo
DEFAULT_LANGUAGESComma-separated language codesenNo
CHUNK_DURATIONTranscript chunk duration (seconds)60No
YOUTUBE_PROXYOptional proxy URL-No
DEBUG_MODEEnable debug loggingfalseNo

Usage

Basic Workflow

  1. Open Claude Desktop
  2. Provide a YouTube URL:
Archive this video: https://www.youtube.com/watch?v=VIDEO_ID

Claude will automatically:

  1. Extract video metadata and transcript
  2. Analyze the content
  3. Generate hierarchical tags following your vault's taxonomy
  4. Create a comprehensive summary
  5. Save a formatted Markdown note to your vault

Available MCP Tools

The server exposes these tools to Claude:

process_video

Extract complete video context including metadata and transcript.

Parameters:

  • url (required): YouTube video URL
  • languages (optional): Preferred transcript languages (e.g., ["en", "es"])
  • chunk_duration (optional): Duration for transcript chunks in seconds (default: 60)
save_note

Save a formatted Markdown note to the Obsidian vault.

Parameters:

  • filename: Sanitized filename (without extension)
  • title: Note title
  • url: Video URL
  • channel: Channel name
  • upload_date: Upload date (YYYY-MM-DD)
  • tags: List of hierarchical tags (e.g., ["MATHEMATICS/LINEAR_ALGEBRA"])
  • summary: Video summary
  • transcript: Formatted transcript
  • folder (optional): Target folder in vault
validate_tags

Validate tags against taxonomy rules.

Parameters:

  • tags: List of tags to validate
get_existing_tags

Scan the vault and return all existing hierarchical tags.

Parameters: None

MCP Resources

obsidian://vault/taxonomy

A JSON resource containing the complete hierarchical tag taxonomy from your vault.

Taxonomy System

Rules

  1. ALL UPPERCASE: Tags must be uppercase (e.g., MATHEMATICS, not mathematics)
  2. Hierarchy with /: Use forward slashes for hierarchy (e.g., MATHEMATICS/LINEAR_ALGEBRA/SVD)
  3. Allowed Roots: Must start with one of:
    • MATHEMATICS
    • COMPUTER_SCIENCE
    • PHYSICS
    • ENGINEERING
    • HUMANITIES
    • PRODUCTIVITY
    • GENERAL

Examples

Correct:

MATHEMATICS/LINEAR_ALGEBRA/SVD
MATHEMATICS/LINEAR_ALGEBRA/EIGENVALUES
COMPUTER_SCIENCE/MACHINE_LEARNING/DIMENSIONALITY_REDUCTION
PHYSICS/QUANTUM_MECHANICS/ENTANGLEMENT

Incorrect:

math/linearalgebra          # lowercase
Math/Linear Algebra         # mixed case, spaces
MATHEMATICS                 # too broad, needs hierarchy
SVD                         # missing root
MACHINE-LEARNING            # hyphens not allowed

Poly-Hierarchical Classification

Videos can have multiple hierarchical tags:

tags:
  - MATHEMATICS/LINEAR_ALGEBRA/SVD
  - COMPUTER_SCIENCE/MACHINE_LEARNING/RECOMMENDATION_SYSTEMS
  - DATA_SCIENCE/ALGORITHMS

This creates a Directed Acyclic Graph (DAG) of knowledge, making content discoverable from multiple perspectives.

Development

Running Tests

# Using uv
uv run pytest

# Using pytest directly
pytest tests/

# With coverage
pytest --cov=src/archivist tests/

Code Quality

# Format code
black src/ tests/

# Lint
ruff check src/ tests/

Project Structure

yt-archieve-mcp/
├── src/archivist/          # Main source code
│   ├── main.py             # MCP server entry point
│   ├── config.py           # Configuration management
│   ├── core/               # Domain logic
│   │   ├── extractors.py   # Transcript extraction
│   │   ├── processors.py   # Transcript processing
│   │   └── taxonomy.py     # Tag management
│   └── adapters/           # External interfaces
│       ├── youtube_api.py  # YouTube data fetching
│       └── filesystem.py   # Obsidian vault I/O
├── tests/                  # Unit tests
├── CLAUDE.md               # System prompt for Claude
├── pyproject.toml          # Project configuration
└── README.md               # This file

Troubleshooting

Claude Desktop doesn't see the server

  1. Check the config path:

    • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
    • Windows: %APPDATA%\Claude\claude_desktop_config.json
    • Linux: ~/.config/Claude/claude_desktop_config.json
  2. Verify the paths in the config are absolute, not relative

  3. Restart Claude Desktop completely

  4. Check Claude Desktop logs:

    • macOS: ~/Library/Logs/Claude/mcp*.log

Transcript extraction fails

  1. Check if the video has captions enabled
  2. Try specifying multiple languages: languages=["en", "es", "auto"]
  3. If the video is private or restricted, transcripts won't be available
  4. Consider using a proxy if you're rate-limited: Set YOUTUBE_PROXY environment variable

Tags are rejected

  1. Ensure tags are UPPERCASE
  2. Use underscores _, not spaces or hyphens
  3. Use forward slashes / for hierarchy
  4. Check that the root category is in the allowed list
  5. Use validate_tags tool to check tags before saving

Path security errors

The server restricts file operations to within your vault. If you see PathSecurityError:

  1. Verify VAULT_ROOT is set correctly
  2. Ensure you're not using .. in paths
  3. Check that the target folder exists

Security Considerations

  • Path Traversal Protection: All filesystem operations validate paths stay within vault
  • Stdio Transport: Server runs as subprocess, no network ports exposed
  • Sandboxed Environment: Cannot access files outside configured vault root
  • Input Validation: All tool inputs validated via Pydantic schemas

Related Projects

License

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Acknowledgments

Based on the architectural patterns described in the INSTRUCTION.txt document, implementing professional software engineering practices for personal knowledge management.