sumit1612/yt-obsidian-mcp
If you are the rightful owner of yt-obsidian-mcp and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
The YouTube Archivist MCP Server is designed to archive YouTube videos into Obsidian with semantic hierarchical tagging, transforming streaming content into structured knowledge assets.
YouTube Archivist MCP Server
A Model Context Protocol (MCP) server for archiving YouTube videos to Obsidian with semantic hierarchical tagging. Transform ephemeral streaming content into permanent, structured, discoverable knowledge assets for your Second Brain.
Features
- Automatic Transcript Extraction: Uses
youtube-transcript-apiwith fallback strategies (Manual → Auto → Translated) - Rich Metadata: Extracts comprehensive video metadata via
yt-dlp - Hierarchical Tagging: Poly-hierarchical taxonomy system (e.g.,
MATHEMATICS/LINEAR_ALGEBRA/SVD) - Timestamped Transcripts: Generates clickable timestamps linked to video moments
- Security First: Path traversal protection prevents unauthorized filesystem access
- Taxonomy Management: Scans vault to maintain tag consistency and prevent drift
- Obsidian Integration: Generates properly formatted Markdown with YAML frontmatter
Architecture
The project follows professional software engineering patterns:
- Strategy Pattern: Flexible transcript extraction with multiple fallback strategies
- Singleton Pattern: Centralized configuration management
- Factory Pattern: Consistent Markdown note generation
- Hexagonal Architecture: Clean separation between domain logic and adapters
src/archivist/
├── main.py # FastMCP server implementation
├── config.py # Singleton configuration
├── core/
│ ├── extractors.py # YouTube transcript extraction strategies
│ ├── processors.py # Transcript chunking and formatting
│ └── taxonomy.py # Hierarchical tag validation
└── adapters/
├── youtube_api.py # yt-dlp metadata wrapper
└── filesystem.py # Secure Obsidian vault I/O
Installation
Prerequisites
- Python 3.10+
- uv (recommended) or pip
- Obsidian vault
- Claude Desktop
Setup
- Clone the repository:
git clone <repository-url>
cd yt-archieve-mcp
- Install dependencies:
Using uv (recommended):
uv sync
Using pip:
pip install -e .
- Configure Claude Desktop:
Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or the equivalent on your platform:
{
"mcpServers": {
"obsidian-archivist": {
"command": "uv",
"args": [
"--directory",
"/absolute/path/to/yt-archieve-mcp",
"run",
"python",
"src/archivist/main.py"
],
"env": {
"VAULT_ROOT": "/absolute/path/to/your/ObsidianVault",
"DEFAULT_FOLDER": "Inbox",
"DEFAULT_LANGUAGES": "en",
"CHUNK_DURATION": "60"
}
}
}
}
Environment Variables
| Variable | Description | Default | Required |
|---|---|---|---|
VAULT_ROOT | Absolute path to Obsidian vault | - | Yes |
DEFAULT_FOLDER | Default folder for new notes | Inbox | No |
DEFAULT_LANGUAGES | Comma-separated language codes | en | No |
CHUNK_DURATION | Transcript chunk duration (seconds) | 60 | No |
YOUTUBE_PROXY | Optional proxy URL | - | No |
DEBUG_MODE | Enable debug logging | false | No |
Usage
Basic Workflow
- Open Claude Desktop
- Provide a YouTube URL:
Archive this video: https://www.youtube.com/watch?v=VIDEO_ID
Claude will automatically:
- Extract video metadata and transcript
- Analyze the content
- Generate hierarchical tags following your vault's taxonomy
- Create a comprehensive summary
- Save a formatted Markdown note to your vault
Available MCP Tools
The server exposes these tools to Claude:
process_video
Extract complete video context including metadata and transcript.
Parameters:
url(required): YouTube video URLlanguages(optional): Preferred transcript languages (e.g.,["en", "es"])chunk_duration(optional): Duration for transcript chunks in seconds (default: 60)
save_note
Save a formatted Markdown note to the Obsidian vault.
Parameters:
filename: Sanitized filename (without extension)title: Note titleurl: Video URLchannel: Channel nameupload_date: Upload date (YYYY-MM-DD)tags: List of hierarchical tags (e.g.,["MATHEMATICS/LINEAR_ALGEBRA"])summary: Video summarytranscript: Formatted transcriptfolder(optional): Target folder in vault
validate_tags
Validate tags against taxonomy rules.
Parameters:
tags: List of tags to validate
get_existing_tags
Scan the vault and return all existing hierarchical tags.
Parameters: None
MCP Resources
obsidian://vault/taxonomy
A JSON resource containing the complete hierarchical tag taxonomy from your vault.
Taxonomy System
Rules
- ALL UPPERCASE: Tags must be uppercase (e.g.,
MATHEMATICS, notmathematics) - Hierarchy with
/: Use forward slashes for hierarchy (e.g.,MATHEMATICS/LINEAR_ALGEBRA/SVD) - Allowed Roots: Must start with one of:
MATHEMATICSCOMPUTER_SCIENCEPHYSICSENGINEERINGHUMANITIESPRODUCTIVITYGENERAL
Examples
Correct:
MATHEMATICS/LINEAR_ALGEBRA/SVD
MATHEMATICS/LINEAR_ALGEBRA/EIGENVALUES
COMPUTER_SCIENCE/MACHINE_LEARNING/DIMENSIONALITY_REDUCTION
PHYSICS/QUANTUM_MECHANICS/ENTANGLEMENT
Incorrect:
math/linearalgebra # lowercase
Math/Linear Algebra # mixed case, spaces
MATHEMATICS # too broad, needs hierarchy
SVD # missing root
MACHINE-LEARNING # hyphens not allowed
Poly-Hierarchical Classification
Videos can have multiple hierarchical tags:
tags:
- MATHEMATICS/LINEAR_ALGEBRA/SVD
- COMPUTER_SCIENCE/MACHINE_LEARNING/RECOMMENDATION_SYSTEMS
- DATA_SCIENCE/ALGORITHMS
This creates a Directed Acyclic Graph (DAG) of knowledge, making content discoverable from multiple perspectives.
Development
Running Tests
# Using uv
uv run pytest
# Using pytest directly
pytest tests/
# With coverage
pytest --cov=src/archivist tests/
Code Quality
# Format code
black src/ tests/
# Lint
ruff check src/ tests/
Project Structure
yt-archieve-mcp/
├── src/archivist/ # Main source code
│ ├── main.py # MCP server entry point
│ ├── config.py # Configuration management
│ ├── core/ # Domain logic
│ │ ├── extractors.py # Transcript extraction
│ │ ├── processors.py # Transcript processing
│ │ └── taxonomy.py # Tag management
│ └── adapters/ # External interfaces
│ ├── youtube_api.py # YouTube data fetching
│ └── filesystem.py # Obsidian vault I/O
├── tests/ # Unit tests
├── CLAUDE.md # System prompt for Claude
├── pyproject.toml # Project configuration
└── README.md # This file
Troubleshooting
Claude Desktop doesn't see the server
-
Check the config path:
- macOS:
~/Library/Application Support/Claude/claude_desktop_config.json - Windows:
%APPDATA%\Claude\claude_desktop_config.json - Linux:
~/.config/Claude/claude_desktop_config.json
- macOS:
-
Verify the paths in the config are absolute, not relative
-
Restart Claude Desktop completely
-
Check Claude Desktop logs:
- macOS:
~/Library/Logs/Claude/mcp*.log
- macOS:
Transcript extraction fails
- Check if the video has captions enabled
- Try specifying multiple languages:
languages=["en", "es", "auto"] - If the video is private or restricted, transcripts won't be available
- Consider using a proxy if you're rate-limited: Set
YOUTUBE_PROXYenvironment variable
Tags are rejected
- Ensure tags are UPPERCASE
- Use underscores
_, not spaces or hyphens - Use forward slashes
/for hierarchy - Check that the root category is in the allowed list
- Use
validate_tagstool to check tags before saving
Path security errors
The server restricts file operations to within your vault. If you see PathSecurityError:
- Verify
VAULT_ROOTis set correctly - Ensure you're not using
..in paths - Check that the target folder exists
Security Considerations
- Path Traversal Protection: All filesystem operations validate paths stay within vault
- Stdio Transport: Server runs as subprocess, no network ports exposed
- Sandboxed Environment: Cannot access files outside configured vault root
- Input Validation: All tool inputs validated via Pydantic schemas
Related Projects
License
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Acknowledgments
Based on the architectural patterns described in the INSTRUCTION.txt document, implementing professional software engineering practices for personal knowledge management.