yt-obsidian-mcp by sumit1612 - MCP Server

YouTube Archivist MCP Server

A Model Context Protocol (MCP) server for archiving YouTube videos to Obsidian with semantic hierarchical tagging. Transform ephemeral streaming content into permanent, structured, discoverable knowledge assets for your Second Brain.

Features

Automatic Transcript Extraction: Uses youtube-transcript-api with fallback strategies (Manual → Auto → Translated)
Rich Metadata: Extracts comprehensive video metadata via yt-dlp
Hierarchical Tagging: Poly-hierarchical taxonomy system (e.g., MATHEMATICS/LINEAR_ALGEBRA/SVD)
Timestamped Transcripts: Generates clickable timestamps linked to video moments
Security First: Path traversal protection prevents unauthorized filesystem access
Taxonomy Management: Scans vault to maintain tag consistency and prevent drift
Obsidian Integration: Generates properly formatted Markdown with YAML frontmatter

Architecture

The project is organized around four established design patterns:

Strategy Pattern: Flexible transcript extraction with multiple fallback strategies
Singleton Pattern: Centralized configuration management
Factory Pattern: Consistent Markdown note generation
Hexagonal Architecture: Clean separation between domain logic and adapters

src/archivist/
├── main.py              # FastMCP server implementation
├── config.py            # Singleton configuration
├── core/
│   ├── extractors.py    # YouTube transcript extraction strategies
│   ├── processors.py    # Transcript chunking and formatting
│   └── taxonomy.py      # Hierarchical tag validation
└── adapters/
    ├── youtube_api.py   # yt-dlp metadata wrapper
    └── filesystem.py    # Secure Obsidian vault I/O
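
To make the Manual → Auto → Translated fallback concrete, here is a minimal sketch of a strategy chain. It is an illustration under stated assumptions, not the code in extractors.py: the names `extract_with_fallback`, `TranscriptResult`, and the three stand-in strategy functions are hypothetical, and real strategies would call youtube-transcript-api instead of returning canned data.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# A segment is (start_seconds, text); a strategy returns segments or None.
Segment = tuple[float, str]
Strategy = Callable[[str, Sequence[str]], Optional[list[Segment]]]


@dataclass
class TranscriptResult:
    source: str              # name of the strategy that succeeded
    segments: list[Segment]


def extract_with_fallback(
    video_id: str,
    languages: Sequence[str],
    strategies: Sequence[tuple[str, Strategy]],
) -> TranscriptResult:
    """Try each named strategy in order and return the first success."""
    errors: list[str] = []
    for name, strategy in strategies:
        try:
            segments = strategy(video_id, languages)
        except Exception as exc:  # a failing strategy must not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if segments:
            return TranscriptResult(source=name, segments=segments)
        errors.append(f"{name}: no transcript")
    raise LookupError("all strategies failed: " + "; ".join(errors))


# Hypothetical stand-ins for the three strategies named in the feature list.
def manual(video_id, languages):
    return None  # pretend no manually created captions exist


def auto(video_id, languages):
    return [(0.0, "hello"), (4.2, "world")]


def translated(video_id, languages):
    raise RuntimeError("translation unavailable")


result = extract_with_fallback(
    "VIDEO_ID",
    ["en"],
    [("manual", manual), ("auto", auto), ("translated", translated)],
)
print(result.source)  # auto
```

Because each strategy is just a callable, the order can be changed or a new source added without touching the loop.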

Installation

Prerequisites

Python 3.10+
uv (recommended) or pip
Obsidian vault
Claude Desktop

Setup

Clone the repository:

git clone <repository-url>
cd yt-archieve-mcp

Install dependencies:

Using uv (recommended):

uv sync

Using pip:

pip install -e .

Configure Claude Desktop:

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or the equivalent on your platform:

{
  "mcpServers": {
    "obsidian-archivist": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/yt-archieve-mcp",
        "run",
        "python",
        "src/archivist/main.py"
      ],
      "env": {
        "VAULT_ROOT": "/absolute/path/to/your/ObsidianVault",
        "DEFAULT_FOLDER": "Inbox",
        "DEFAULT_LANGUAGES": "en",
        "CHUNK_DURATION": "60"
      }
    }
  }
}

Environment Variables

| Variable | Description | Default | Required |
| --- | --- | --- | --- |
| `VAULT_ROOT` | Absolute path to Obsidian vault | - | Yes |
| `DEFAULT_FOLDER` | Default folder for new notes | `Inbox` | No |
| `DEFAULT_LANGUAGES` | Comma-separated language codes | `en` | No |
| `CHUNK_DURATION` | Transcript chunk duration (seconds) | `60` | No |
| `YOUTUBE_PROXY` | Optional proxy URL | - | No |
| `DEBUG_MODE` | Enable debug logging | `false` | No |
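
As a rough sketch of how these variables might be read into a single shared configuration object, the snippet below applies the documented defaults. The `Settings` class, `load_settings`, and `get_settings` are hypothetical names, and the set of values accepted as true for `DEBUG_MODE` is an assumption; config.py may differ.

```python
import os
from dataclasses import dataclass
from functools import lru_cache
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    vault_root: Path
    default_folder: str
    default_languages: tuple[str, ...]
    chunk_duration: int
    youtube_proxy: Optional[str]
    debug_mode: bool


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment mapping, applying the documented defaults."""
    vault_root = env.get("VAULT_ROOT", "").strip()
    if not vault_root:
        raise ValueError("VAULT_ROOT is required")
    raw_languages = env.get("DEFAULT_LANGUAGES", "en")
    languages = tuple(code.strip() for code in raw_languages.split(",") if code.strip())
    return Settings(
        vault_root=Path(vault_root),
        default_folder=env.get("DEFAULT_FOLDER", "Inbox"),
        default_languages=languages or ("en",),
        chunk_duration=int(env.get("CHUNK_DURATION", "60")),
        youtube_proxy=env.get("YOUTUBE_PROXY") or None,
        # Assumption: these spellings count as "true"; anything else is false.
        debug_mode=env.get("DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes"},
    )


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Return one shared Settings instance per process (a simple singleton)."""
    return load_settings(os.environ)


settings = load_settings({"VAULT_ROOT": "/vault", "DEFAULT_LANGUAGES": "en, es"})
print(settings.default_languages, settings.chunk_duration)  # ('en', 'es') 60
```

Taking the environment as a parameter keeps `load_settings` easy to test, while `get_settings` gives the rest of the code one cached instance.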

Usage

Basic Workflow

Open Claude Desktop
Provide a YouTube URL:

Archive this video: https://www.youtube.com/watch?v=VIDEO_ID

Claude will automatically:

Extract video metadata and transcript
Analyze the content
Generate hierarchical tags following your vault's taxonomy
Create a comprehensive summary
Save a formatted Markdown note to your vault

Available MCP Tools

The server exposes these tools to Claude:

`process_video`

Extract complete video context including metadata and transcript.

Parameters:

url (required): YouTube video URL
languages (optional): Preferred transcript languages (e.g., ["en", "es"])
chunk_duration (optional): Duration for transcript chunks in seconds (default: 60)
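
The effect of chunk_duration can be sketched as follows: segments are grouped into fixed-width time windows, and each window gets a timestamp link that opens the video at that offset. This is an illustration only. The function names, the `(start, text)` segment shape, and the Markdown link layout are assumptions rather than the output of processors.py; the `t=<seconds>s` URL parameter is standard YouTube behavior.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or H:MM:SS once the hour mark is passed."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def chunk_transcript(segments, video_id, chunk_duration=60):
    """Group (start, text) segments into time windows and emit one Markdown line each."""
    chunks: dict[int, list[str]] = {}
    for start, text in segments:
        window = int(start // chunk_duration)
        chunks.setdefault(window, []).append(text.strip())

    lines = []
    for window in sorted(chunks):
        offset = window * chunk_duration
        link = f"https://www.youtube.com/watch?v={video_id}&t={offset}s"
        body = " ".join(chunks[window])
        lines.append(f"[{format_timestamp(offset)}]({link}) {body}")
    return lines


segments = [
    (0.0, "Welcome."),
    (12.5, "Today: SVD."),
    (61.0, "First, recall eigenvalues."),
    (3725.0, "Thanks for watching."),
]
for line in chunk_transcript(segments, "VIDEO_ID"):
    print(line)
```

With the default of 60 seconds, the first two segments share one chunk and the segment at 61 seconds starts the next.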

`save_note`

Save a formatted Markdown note to the Obsidian vault.

Parameters:

filename: Sanitized filename (without extension)
title: Note title
url: Video URL
channel: Channel name
upload_date: Upload date (YYYY-MM-DD)
tags: List of hierarchical tags (e.g., ["MATHEMATICS/LINEAR_ALGEBRA"])
summary: Video summary
transcript: Formatted transcript
folder (optional): Target folder in vault
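
For orientation, a note built from these parameters might be rendered roughly like this. The sketch is hypothetical: the frontmatter keys simply mirror the parameter names, and the body layout, `sanitize_filename`, `yaml_quote`, and `render_note` are assumptions, not the server's actual template.

```python
import re


def sanitize_filename(title: str) -> str:
    """Replace characters that are unsafe in filenames or Obsidian links."""
    cleaned = re.sub(r'[\\/:*?"<>|#^\[\]]', " ", title)
    return re.sub(r"\s+", " ", cleaned).strip()


def yaml_quote(value: str) -> str:
    """Double-quote a YAML scalar, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_note(title, url, channel, upload_date, tags, summary, transcript) -> str:
    """Assemble YAML frontmatter followed by summary and transcript sections."""
    lines = [
        "---",
        f"title: {yaml_quote(title)}",
        f"url: {yaml_quote(url)}",
        f"channel: {yaml_quote(channel)}",
        f"upload_date: {upload_date}",
        "tags:",
    ]
    lines += [f"  - {tag}" for tag in tags]
    lines += ["---", "", f"# {title}", "", "## Summary", "", summary,
              "", "## Transcript", "", transcript, ""]
    return "\n".join(lines)


note = render_note(
    title="SVD: What/Why?",
    url="https://www.youtube.com/watch?v=VIDEO_ID",
    channel="Example Channel",
    upload_date="2024-01-15",
    tags=["MATHEMATICS/LINEAR_ALGEBRA/SVD"],
    summary="An introduction to the singular value decomposition.",
    transcript="[00:00] Welcome.",
)
print(sanitize_filename("SVD: What/Why?"))  # SVD What Why
print(note)
```

Quoting the free-text fields keeps titles containing colons or quotes from breaking the YAML block.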

`validate_tags`

Validate tags against taxonomy rules.

Parameters:

tags: List of tags to validate

`get_existing_tags`

Scan the vault and return all existing hierarchical tags.

Parameters: None
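
One plausible way to perform such a scan is to read the YAML frontmatter of every Markdown file and collect the entries under `tags:`. The sketch below is an assumption about the approach, not the project's implementation: `tags_from_frontmatter` and `scan_vault` are hypothetical names, and only block-style lists (one `- TAG` per line) are handled, not inline `tags: [A, B]` lists.

```python
from pathlib import Path


def tags_from_frontmatter(text: str) -> list[str]:
    """Return the items of a block-style `tags:` list in leading YAML frontmatter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no frontmatter at the top of the note
    tags: list[str] = []
    in_tags = False
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if line.startswith("tags:"):
            in_tags = True
        elif in_tags and line.lstrip().startswith("- "):
            tags.append(line.lstrip()[2:].strip().strip("\"'"))
        elif line.strip():
            in_tags = False  # another key ends the tag list
    return tags


def scan_vault(vault_root: Path) -> list[str]:
    """Collect the sorted, de-duplicated tags of every .md file under the vault."""
    found: set[str] = set()
    for note in vault_root.rglob("*.md"):
        found.update(tags_from_frontmatter(note.read_text(encoding="utf-8")))
    return sorted(found)


sample = """---
title: "SVD"
tags:
  - MATHEMATICS/LINEAR_ALGEBRA/SVD
  - COMPUTER_SCIENCE/MACHINE_LEARNING
url: "https://www.youtube.com/watch?v=VIDEO_ID"
---
- a bullet in the body is not a tag
"""
print(tags_from_frontmatter(sample))
```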

MCP Resources

`obsidian://vault/taxonomy`

A JSON resource containing the complete hierarchical tag taxonomy from your vault.
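
The exact JSON shape of this resource is not documented here. As one possible representation, flat tags can be folded into a nested object keyed by path segment; `build_taxonomy` below is a hypothetical helper shown only to illustrate the idea.

```python
import json


def build_taxonomy(tags):
    """Fold slash-separated tags into a nested dict, one level per path segment."""
    tree: dict = {}
    for tag in sorted(set(tags)):
        node = tree
        for part in tag.split("/"):
            node = node.setdefault(part, {})
    return tree


tags = [
    "MATHEMATICS/LINEAR_ALGEBRA/SVD",
    "MATHEMATICS/LINEAR_ALGEBRA/EIGENVALUES",
    "PHYSICS/QUANTUM_MECHANICS/ENTANGLEMENT",
]
print(json.dumps(build_taxonomy(tags), indent=2))
```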

Taxonomy System

Rules

ALL UPPERCASE: Tags must be uppercase (e.g., MATHEMATICS, not mathematics)
Hierarchy with /: Use forward slashes for hierarchy (e.g., MATHEMATICS/LINEAR_ALGEBRA/SVD)
Allowed Roots: Must start with one of:
- MATHEMATICS
- COMPUTER_SCIENCE
- PHYSICS
- ENGINEERING
- HUMANITIES
- PRODUCTIVITY
- GENERAL

Examples

Correct:

MATHEMATICS/LINEAR_ALGEBRA/SVD
MATHEMATICS/LINEAR_ALGEBRA/EIGENVALUES
COMPUTER_SCIENCE/MACHINE_LEARNING/DIMENSIONALITY_REDUCTION
PHYSICS/QUANTUM_MECHANICS/ENTANGLEMENT

Incorrect:

math/linearalgebra          # lowercase
Math/Linear Algebra         # mixed case, spaces
MATHEMATICS                 # too broad, needs hierarchy
SVD                         # missing root
MACHINE-LEARNING            # hyphens not allowed
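
The rules and examples above can be expressed as a small validator. This is a sketch, not the code in taxonomy.py: `validate_tag` is a hypothetical name, and allowing digits inside a segment is an assumption the rules do not state either way.

```python
import re

ALLOWED_ROOTS = {
    "MATHEMATICS", "COMPUTER_SCIENCE", "PHYSICS", "ENGINEERING",
    "HUMANITIES", "PRODUCTIVITY", "GENERAL",
}
# Assumption: a segment is letters, digits, and underscores, starting with a letter.
SEGMENT = re.compile(r"[A-Z][A-Z0-9_]*")


def validate_tag(tag: str) -> list[str]:
    """Return a list of problems; an empty list means the tag is valid."""
    problems = []
    parts = tag.split("/")
    if tag != tag.upper():
        problems.append("must be uppercase")
    # Check characters case-insensitively so casing is reported only once.
    if any(not SEGMENT.fullmatch(part.upper()) for part in parts):
        problems.append("segments may contain only letters, digits, and underscores")
    if parts[0].upper() not in ALLOWED_ROOTS:
        problems.append("root is not in the allowed list")
    if len(parts) < 2:
        problems.append("needs at least one level below the root")
    return problems


for tag in ["MATHEMATICS/LINEAR_ALGEBRA/SVD", "math/linearalgebra",
            "MATHEMATICS", "SVD", "MACHINE-LEARNING"]:
    print(tag, "->", validate_tag(tag) or "ok")
```

Returning every problem at once, rather than stopping at the first, lets a caller fix a rejected tag in a single pass.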

Poly-Hierarchical Classification

Videos can have multiple hierarchical tags:

tags:
  - MATHEMATICS/LINEAR_ALGEBRA/SVD
  - COMPUTER_SCIENCE/MACHINE_LEARNING/RECOMMENDATION_SYSTEMS
  - COMPUTER_SCIENCE/ALGORITHMS

This creates a Directed Acyclic Graph (DAG) of knowledge, making content discoverable from multiple perspectives.
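
To illustrate what "discoverable from multiple perspectives" can mean in practice, the sketch below indexes each note under every ancestor of each of its tags, so a lookup at any level of any hierarchy finds it. The helper names and the sample notes are hypothetical.

```python
from collections import defaultdict


def ancestors(tag: str) -> list[str]:
    """Return the tag and all of its parent paths, from the root down."""
    parts = tag.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def build_index(notes: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map every tag prefix to the set of note titles filed beneath it."""
    index: dict[str, set[str]] = defaultdict(set)
    for title, tags in notes.items():
        for tag in tags:
            for prefix in ancestors(tag):
                index[prefix].add(title)
    return index


notes = {
    "SVD explained": [
        "MATHEMATICS/LINEAR_ALGEBRA/SVD",
        "COMPUTER_SCIENCE/MACHINE_LEARNING/RECOMMENDATION_SYSTEMS",
    ],
    "PCA intro": ["COMPUTER_SCIENCE/MACHINE_LEARNING/DIMENSIONALITY_REDUCTION"],
}
index = build_index(notes)
print(sorted(index["COMPUTER_SCIENCE/MACHINE_LEARNING"]))  # ['PCA intro', 'SVD explained']
print(sorted(index["MATHEMATICS"]))                        # ['SVD explained']
```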

Development

Running Tests

# Using uv
uv run pytest

# Using pytest directly
pytest tests/

# With coverage
pytest --cov=src/archivist tests/

Code Quality

# Format code
black src/ tests/

# Lint
ruff check src/ tests/

Project Structure

yt-archieve-mcp/
├── src/archivist/          # Main source code
│   ├── main.py             # MCP server entry point
│   ├── config.py           # Configuration management
│   ├── core/               # Domain logic
│   │   ├── extractors.py   # Transcript extraction
│   │   ├── processors.py   # Transcript processing
│   │   └── taxonomy.py     # Tag management
│   └── adapters/           # External interfaces
│       ├── youtube_api.py  # YouTube data fetching
│       └── filesystem.py   # Obsidian vault I/O
├── tests/                  # Unit tests
├── CLAUDE.md               # System prompt for Claude
├── pyproject.toml          # Project configuration
└── README.md               # This file

Troubleshooting

Claude Desktop doesn't see the server

Check the config path:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Verify the paths in the config are absolute, not relative
Restart Claude Desktop completely
Check Claude Desktop logs:
- macOS: ~/Library/Logs/Claude/mcp*.log

Transcript extraction fails

Check if the video has captions enabled
Try specifying multiple languages: languages=["en", "es", "auto"]
If the video is private or restricted, transcripts won't be available
If you are being rate-limited, consider using a proxy by setting the YOUTUBE_PROXY environment variable

Tags are rejected

Ensure tags are UPPERCASE
Use underscores _, not spaces or hyphens
Use forward slashes / for hierarchy
Check that the root category is in the allowed list
Use the validate_tags tool to check tags before saving

Path security errors

The server restricts file operations to within your vault. If you see PathSecurityError:

Verify VAULT_ROOT is set correctly
Ensure you're not using .. in paths
Check that the target folder exists
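
A common way to enforce this kind of restriction is to resolve the requested path and confirm it still lies under the vault root. The sketch below shows that technique with `pathlib`. The `PathSecurityError` class here is a local stand-in for the server's exception, and `resolve_in_vault` is a hypothetical helper; filesystem.py may perform additional checks.

```python
from pathlib import Path


class PathSecurityError(Exception):
    """Raised when a requested path would escape the vault (stand-in class)."""


def resolve_in_vault(vault_root: str, folder: str, filename: str) -> Path:
    """Return the absolute note path, or raise if it falls outside the vault."""
    root = Path(vault_root).resolve()
    # resolve() collapses ".." and follows symlinks before the containment check.
    target = (root / folder / f"{filename}.md").resolve()
    if not target.is_relative_to(root):  # available since Python 3.9
        raise PathSecurityError(f"{target} is outside the vault")
    return target


print(resolve_in_vault("/tmp/vault", "Inbox", "note"))
try:
    resolve_in_vault("/tmp/vault", "../outside", "note")
except PathSecurityError as exc:
    print("rejected:", exc)
```

Checking the resolved path, instead of searching the input for `..`, also catches absolute folder names and symlinks that point outside the vault.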

Security Considerations

Path Traversal Protection: Every filesystem operation validates that the target path stays within the vault
Stdio Transport: Server runs as subprocess, no network ports exposed
Sandboxed Environment: Cannot access files outside configured vault root
Input Validation: All tool inputs validated via Pydantic schemas

Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request

Acknowledgments

This project is based on the architectural patterns described in the INSTRUCTION.txt document and applies professional software engineering practices to personal knowledge management.