jonastbrg/paper-intelligence
Paper Intelligence System (PIS)
A local-first database and assistant layer for organizing, analyzing, and retrieving research papers efficiently.
Features
- Paper Management: Add, query, and organize research papers with rich metadata
- Local SQLite Database: Fast, reliable, and fully offline-capable
- YAML Metadata: Human-readable metadata files for each paper
- Flexible Querying: Search by title, author, tags, year, importance, and more
- Export Capabilities: Export summaries and notes to Markdown
- MCP Server: Interact with your paper database through AI assistants (Claude, etc.)
- Extensible: Ready for AI integration, semantic search, and automation
Directory Structure
paper-intelligence/
│
├── papers.db # SQLite database (created on first run)
├── README.md # This file
├── MCP_SETUP.md # MCP server setup guide
├── requirements.txt # Python dependencies
├── pyproject.toml # Python project configuration
├── mcp_server.py # MCP server implementation
├── .gitignore # Git ignore rules
│
├── raw/ # PDF files (gitignored)
├── metadata/ # YAML metadata files (gitignored)
├── scripts/ # Python scripts
│ ├── init_db.py # Database initialization
│ ├── ingest_paper.py # Add new papers
│ ├── query_papers.py # Query and search
│ └── summarize_paper.py # Summarize and export
└── embeddings/ # (Future) Vector embeddings (gitignored)
Setup
1. Install Dependencies
pip install -r requirements.txt
Core dependencies: pyyaml, mcp (for MCP server). Additional dependencies are optional for future features.
2. MCP Server Setup (Optional)
If you want to use this system with AI assistants like Claude:
For Claude Code (CLI)
Add to your Claude Code MCP settings file (~/.config/claude-code/mcp_settings.json):
{
"mcpServers": {
"paper-intelligence": {
"command": "python3",
"args": [
"/path/to/paper-intelligence/mcp_server.py"
]
}
}
}
Replace /path/to/paper-intelligence/ with the actual path to your cloned repository.
Then restart Claude Code or reload the MCP servers.
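Since the `args` entry needs the absolute path to `mcp_server.py`, it can be easier to generate the snippet than to type it by hand. The following is a minimal sketch, not part of the repository: the helper name `build_mcp_config` is hypothetical, and it only prints the JSON so you can paste it into the settings file yourself.

```python
import json
from pathlib import Path


def build_mcp_config(repo_dir: str) -> dict:
    """Return the mcpServers entry shown above for a given clone location.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    # expanduser() handles "~", resolve() turns the result into an absolute path
    server_script = Path(repo_dir).expanduser().resolve() / "mcp_server.py"
    return {
        "mcpServers": {
            "paper-intelligence": {
                "command": "python3",
                "args": [str(server_script)],
            }
        }
    }


if __name__ == "__main__":
    # "." assumes this is run from the root of the cloned repository
    print(json.dumps(build_mcp_config("."), indent=2))
```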
For Claude Desktop
See MCP_SETUP.md for Claude Desktop configuration instructions.
3. Initialize Database
The database has already been initialized, but you can reinitialize it if needed:
python3 scripts/init_db.py
Usage
Add a New Paper
# Move PDF to database (removes original)
python3 scripts/ingest_paper.py path/to/paper.pdf
# Copy PDF to database (keeps original)
python3 scripts/ingest_paper.py path/to/paper.pdf --copy
You'll be prompted to enter:
- Title
- Authors
- Collaborators (optional)
- Publication date (YYYY-MM-DD)
- Summary/Abstract
- Key ideas
- Tags
- Importance rating (1-10)
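The prompted fields map naturally onto a small record type with validation. The sketch below is an assumption about how such input could be checked, not the actual logic of `ingest_paper.py`; the class name `PaperRecord` is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PaperRecord:
    """Hypothetical container for the values entered at the ingest prompt."""
    title: str
    authors: str
    date_published: str      # expected as YYYY-MM-DD
    importance: int          # expected in the range 1-10
    collaborators: str = ""  # optional
    summary: str = ""
    key_ideas: str = ""
    tags: str = ""

    def __post_init__(self) -> None:
        if not self.title.strip():
            raise ValueError("title must not be empty")
        # strptime raises ValueError if the string is not a year-month-day date
        datetime.strptime(self.date_published, "%Y-%m-%d")
        if not 1 <= self.importance <= 10:
            raise ValueError("importance must be between 1 and 10")


if __name__ == "__main__":
    record = PaperRecord(
        title="An Example Paper",
        authors="Smith, Jones",
        date_published="2024-03-01",
        importance=8,
        tags="ML/RL",
    )
    print(record)
```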
Query Papers
List all papers:
python3 scripts/query_papers.py list
List with filters:
# Filter by author
python3 scripts/query_papers.py list --author "Smith"
# Filter by tag
python3 scripts/query_papers.py list --tag "robotics"
# Filter by year
python3 scripts/query_papers.py list --year 2024
# Filter by minimum importance
python3 scripts/query_papers.py list --min-importance 8
# Combine filters
python3 scripts/query_papers.py list --tag "ML" --min-importance 7 --year 2024
# Show detailed view
python3 scripts/query_papers.py list --detailed
# Limit results
python3 scripts/query_papers.py list --limit 10
# Sort by importance, date, or title
python3 scripts/query_papers.py list --sort importance
Show specific paper:
python3 scripts/query_papers.py show <paper_id>
Search papers:
python3 scripts/query_papers.py search "adversarial attacks"
View statistics:
python3 scripts/query_papers.py stats
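To illustrate how the `list` filters could combine, here is a minimal sketch that turns optional filters into one parameterized SQL query against the `papers` table. It is an assumption about the approach, not the code in `query_papers.py`: the function name `build_list_query`, the substring matching via `LIKE`, and the sort directions are all illustrative choices.

```python
import sqlite3

# Column names cannot be bound as SQL parameters, so sort keys are whitelisted.
# The directions chosen here are assumptions.
SORT_CLAUSES = {
    "importance": "importance DESC",
    "date": "date_published DESC",
    "title": "title ASC",
}


def build_list_query(author=None, tag=None, year=None, min_importance=None,
                     sort="importance", limit=None):
    """Build a parameterized SELECT from optional filters (illustrative only)."""
    if sort not in SORT_CLAUSES:
        raise ValueError(f"unknown sort key: {sort}")
    clauses, params = [], []
    if author:
        clauses.append("authors LIKE ?")
        params.append(f"%{author}%")
    if tag:
        clauses.append("tags LIKE ?")
        params.append(f"%{tag}%")
    if year:
        # date_published is stored as YYYY-MM-DD text, so the year is a prefix
        clauses.append("date_published LIKE ?")
        params.append(f"{year}-%")
    if min_importance is not None:
        clauses.append("importance >= ?")
        params.append(min_importance)

    sql = "SELECT id, title, authors, importance FROM papers"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" ORDER BY {SORT_CLAUSES[sort]}"
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return sql, params


if __name__ == "__main__":
    sql, params = build_list_query(tag="ML", min_importance=7, year=2024)
    with sqlite3.connect("papers.db") as conn:
        for row in conn.execute(sql, params):
            print(row)
```

Binding values with `?` placeholders keeps user input out of the SQL string itself, which is why only the sort key needs a whitelist.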
Update Paper Summaries
Interactive update:
python3 scripts/summarize_paper.py update <paper_id>
You can update:
- Summary
- Key ideas
- Personal notes
Export to Markdown:
python3 scripts/summarize_paper.py export <paper_id>
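As a rough idea of what a Markdown export could contain, the sketch below renders one paper row to a Markdown string. The layout and the function name `paper_to_markdown` are assumptions; the real output of `summarize_paper.py export` may differ.

```python
def paper_to_markdown(paper: dict) -> str:
    """Render a paper row (as a dict keyed by column name) to Markdown.

    Illustrative layout only; not the format produced by summarize_paper.py.
    """
    lines = [
        f"# {paper['title']}",
        "",
        f"- Authors: {paper['authors']}",
        f"- Published: {paper['date_published']}",
        f"- Tags: {paper['tags']}",
        f"- Importance: {paper['importance']}/10",
        "",
        "## Summary",
        "",
        paper.get("summary") or "(none)",
        "",
        "## Key ideas",
        "",
        paper.get("key_ideas") or "(none)",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        "title": "An Example Paper",
        "authors": "Smith, Jones",
        "date_published": "2024-03-01",
        "tags": "ML/RL",
        "importance": 8,
        "summary": "A short summary.",
        "key_ideas": "",
    }
    print(paper_to_markdown(example))
```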
Database Schema
Table: papers
| Column | Type | Description |
|---|---|---|
| id | INTEGER | Auto-incrementing ID |
| title | TEXT | Paper title |
| authors | TEXT | Author list (comma-separated) |
| collaborators | TEXT | Key collaborators |
| date_published | TEXT | Publication date (YYYY-MM-DD) |
| summary | TEXT | Abstract + personal summary |
| key_ideas | TEXT | Key insights |
| tags | TEXT | Keywords/categories |
| importance | INTEGER | Rating (1-10) |
| file_path | TEXT | Path to PDF |
| metadata_path | TEXT | Path to YAML metadata |
| added_at | TEXT | Timestamp of ingestion |
Table: embeddings
(For future semantic search capabilities)
| Column | Type | Description |
|---|---|---|
| paper_id | INTEGER | Foreign key to papers |
| embedding | BLOB | Vector representation |
| model | TEXT | Embedding model name |
| created_at | TEXT | Timestamp |
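The two tables above can be expressed as SQLite DDL roughly as follows. This is a sketch derived from the column listing only: the actual statements in `init_db.py` may add constraints, defaults, or indexes that are not documented here, and the function name `init_db` is used purely for illustration.

```python
import sqlite3

# DDL reconstructed from the documented columns; constraints beyond the
# primary key and foreign key are deliberately left out because they are unknown.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    title          TEXT,
    authors        TEXT,
    collaborators  TEXT,
    date_published TEXT,
    summary        TEXT,
    key_ideas      TEXT,
    tags           TEXT,
    importance     INTEGER,
    file_path      TEXT,
    metadata_path  TEXT,
    added_at       TEXT
);
CREATE TABLE IF NOT EXISTS embeddings (
    paper_id   INTEGER REFERENCES papers(id),
    embedding  BLOB,
    model      TEXT,
    created_at TEXT
);
"""


def init_db(path: str = "papers.db") -> sqlite3.Connection:
    """Create both tables if they do not exist and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn


if __name__ == "__main__":
    connection = init_db(":memory:")  # in-memory database, nothing is written to disk
    print([row[1] for row in connection.execute("PRAGMA table_info(papers)")])
    connection.close()
```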
Examples
Example Workflow
# 1. Add a new paper
python3 scripts/ingest_paper.py ~/Downloads/new_paper.pdf
# 2. List all papers
python3 scripts/query_papers.py list
# 3. View a specific paper
python3 scripts/query_papers.py show 1
# 4. Update summary and notes
python3 scripts/summarize_paper.py update 1
# 5. Search for papers on a topic
python3 scripts/query_papers.py search "reinforcement learning"
# 6. Export paper to markdown
python3 scripts/summarize_paper.py export 1
# 7. View statistics
python3 scripts/query_papers.py stats
Future Enhancements
Phase 2: Automation
- Folder watcher for automatic ingestion
- PDF metadata extraction (PyPDF2, pdfplumber)
- API integration (CrossRef, Semantic Scholar)
- Embedding generation for semantic search
Phase 3: AI Integration
- Automatic summarization using LLMs
- Semantic search with vector embeddings
- Related paper recommendations
- REST API for LLM agents
Phase 4: Sync & Collaboration
- Google Drive sync
- Multi-user support
- Citation network visualization
- Obsidian/Notion integration
Tips
- Tags: Use consistent, hierarchical tags (e.g., ML/RL, CV/detection)
- Importance: Rate based on relevance to your research
- Metadata Files: You can manually edit YAML files in /metadata/
- Backup: Regularly back up papers.db and the /raw/ folder
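For the database part of a backup, Python's `sqlite3` module exposes SQLite's online backup API, which copies a consistent snapshot even while the database is open elsewhere. The sketch below is one possible approach; the function name `backup_database` and the `backups` directory are assumptions, and the `raw/` folder would still need to be copied separately.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_database(db_path: str = "papers.db", backup_dir: str = "backups") -> Path:
    """Copy the SQLite database to a timestamped file and return its path.

    Hypothetical helper; the repository does not include a backup script.
    """
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_dir) / f"papers-{stamp}.db"

    source = sqlite3.connect(db_path)
    destination = sqlite3.connect(target)
    try:
        # Connection.backup (Python 3.7+) copies the whole database page by page
        source.backup(destination)
    finally:
        destination.close()
        source.close()
    return target


if __name__ == "__main__":
    print(f"Backup written to {backup_database()}")
```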
Troubleshooting
Database locked error:
- Close any SQLite browser tools
- Only one script should write to the database at a time
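If you write your own scripts against `papers.db`, two habits reduce lock errors: give the connection a longer `timeout` so it waits for a competing writer instead of failing quickly, and keep each write transaction short. The sketch below shows both under stated assumptions; the function name `set_importance` is hypothetical and the bundled scripts may handle this differently.

```python
import sqlite3


def set_importance(db_path: str, paper_id: int, importance: int,
                   timeout_seconds: float = 10.0) -> None:
    """Update one row, waiting up to timeout_seconds for a competing lock."""
    # timeout is how long sqlite3 waits on a locked database before raising
    # sqlite3.OperationalError ("database is locked")
    conn = sqlite3.connect(db_path, timeout=timeout_seconds)
    try:
        # "with conn" commits on success and rolls back on an exception,
        # so the write lock is held only for the duration of this block
        with conn:
            conn.execute(
                "UPDATE papers SET importance = ? WHERE id = ?",
                (importance, paper_id),
            )
    finally:
        conn.close()


if __name__ == "__main__":
    set_importance("papers.db", paper_id=1, importance=9)
```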
Import error for yaml:
pip install pyyaml
Permission denied:
chmod +x scripts/*.py
License
Personal research tool. Use freely for academic and research purposes.
Contributing
This is a personal system, but feel free to fork and extend for your needs.
Version: 1.0.0
Last Updated: 2025-10-25