README - InsightsLibrary by v587d

Insights Knowledge Base (IKB) MCP Server

🍭 A free, plug-and-play knowledge base with 10,000+ high-quality insights reports built in, packaged as an MCP server with secure local data storage.

⚠️⚠️ All collected reports in this project come from free resources on official research report websites. ⚠️⚠️

Features

🍾 Zero configuration required, designed for plug-and-play usage.
🚀 Built-in Qwen3-Embedding-0.6B embedding model: related reports can be retrieved through vector search.
📢 Report details can also be searched by keyword.
🍥 Over 100 insights reports from well-known consulting firms such as McKinsey, PwC, and Bain have been collected, totaling 6,000+ report pages and covering 70+ topics.
💎 Real-time online browsing of full reports in the MCP client.
🎉 Ultra-fast response: function calls typically return in under 1 second, and keyword-based queries in under 150 ms.
🎨 Paste private local documents into the library_files folder (create it manually if absent; name must match). Configure VLM models/parameters in .env (e.g., VLM_MODEL_NAME=qwen2.5-vl-72b-instruct) for local document extraction, parsing, and recognition.
🦉 Permanently free—no wasted effort collecting reports. Share reliable, copyright-compliant resources via issues.
🔔 Committed to weekly report updates; bug fixes depend on personal whim (I'm not an engineer 🤭).

Optimizations as of June 30

Added 2000+ report pages.

Future Directions

Continuous report updates.
Prompt engineering optimization.

Newest Files Profile

{
    "statistics": {
        "total_files": 174,
        "total_pages": 9320,
        "unique_publishers": 9,
        "unique_topics": 93,
        "last_updated": "2025-06-30T10:08:35.928329"
    },
    "details": {
        "publishers": [
            "",
            "Accenture",
            "BAIN",
            "BCG",
            "CBS",
            "Deloite",
            "McKinsey",
            "PWC",
            "亿欧"
        ],
        "topics": [
            "",
            "AI",
            "AI Agent",
            "Africa",
            "Aftermarket",
            "Asian American",
            "Auto",
            "Aviation",
            "Beauty",
            "Business",
            "Chemical industry",
            "Chemicals",
            "Chinese banking",
            "Chinese securities",
            "Consumer Goods",
            "Decarbonation",
            "Decarbonization",
            "Digital",
            "ESG",
            "Economy",
            "Economy and Trade",
            "Education",
            "Electric two wheelers",
            "Employment",
            "Energy",
            "Europe",
            "FMCG",
            "Fashion",
            "Finance",
            "Financial Technology",
            "Financial service",
            "Fintech",
            "Food-meatless",
            "Gen Z",
            "Global banking",
            "Global energy",
            "Global insurance",
            "Global macroeconomic",
            "Global materials",
            "Global private market",
            "Global private markets",
            "Global trade",
            "Grocery",
            "Grocery retail",
            "Health",
            "Healthcare",
            "Human capital",
            "Hydrogen",
            "Insurance",
            "Investing",
            "Investment management",
            "Labor market",
            "Latinos",
            "Low-altitude Economy",
            "Luxury Goods",
            "Luxury goods",
            "M&A",
            "Maritime",
            "Media",
            "Medical Health",
            "Medtech",
            "Net zero",
            "New Energy Vehicle",
            "New era",
            "Packing",
            "Payments",
            "Pet Food",
            "Population",
            "Power",
            "Private Equity",
            "Private market",
            "Productivity",
            "Quantum",
            "Real estate",
            "Retail",
            "Retail Digitalization",
            "Retailers",
            "Risk",
            "Small business",
            "Smart Home",
            "Smart hospital",
            "Sporting goods",
            "Sustainability",
            "Sustainable",
            "Tax-free",
            "Technology",
            "Travel",
            "Truck",
            "United Kingdom",
            "VSOC",
            "Wealth management",
            "Workplace",
            "连锁经营"
        ]
    }
}
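The profile above can be sanity-checked with a few lines of Python. The sketch below is only an illustration: it uses a trimmed-down sample in the same shape (not the real data), and the helper names `check_counts` and `case_variants` are hypothetical, not part of the project. In the profile above, the counts appear to include the empty-string entries (9 publishers and 93 topics, each list starting with ""), so the sample follows the same convention. The second helper surfaces labels that differ only by letter case, such as "Luxury Goods" and "Luxury goods".

```python
import json
from collections import defaultdict

# A trimmed-down sample in the same shape as the profile above (not the full data).
SAMPLE_PROFILE = """
{
    "statistics": {"unique_publishers": 3, "unique_topics": 5},
    "details": {
        "publishers": ["", "BCG", "McKinsey"],
        "topics": ["", "Luxury Goods", "Luxury goods", "Fintech", "Retail"]
    }
}
"""


def check_counts(profile: dict) -> bool:
    """Return True when the statistics match the lengths of the detail lists."""
    stats, details = profile["statistics"], profile["details"]
    return (
        stats["unique_publishers"] == len(details["publishers"])
        and stats["unique_topics"] == len(details["topics"])
    )


def case_variants(labels: list[str]) -> list[list[str]]:
    """Group labels that differ only by letter case or surrounding whitespace."""
    groups = defaultdict(list)
    for label in labels:
        groups[label.strip().casefold()].append(label)
    return [sorted(group) for group in groups.values() if len(group) > 1]


profile = json.loads(SAMPLE_PROFILE)
print(check_counts(profile))
print(case_variants(profile["details"]["topics"]))
```

Run against the full profile, a check like this would flag pairs that probably describe the same topic under two spellings.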

Installation (Beginner-Friendly)

💡Pro tip: Stuck? Drag this page to an LLM client (like DeepSeek) for step-by-step guidance. Actually, these instructions were written by DeepSeek too...

Prerequisites: Python 3.12+ (download it from the official website and add it to your PATH environment variable)

Install UV:

pip install uv
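Before continuing, it can help to confirm that the prerequisites are in place: Python 3.12+, uv, and the Git and Git LFS tools needed by the clone step. The following is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes the tools are exposed on PATH under the executable names `uv`, `git`, and `git-lfs`.

```python
import shutil
import sys


def python_ok(version=sys.version_info, minimum=(3, 12)) -> bool:
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= minimum


def missing_tools(tools=("uv", "git", "git-lfs")) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python 3.12+:", "yes" if python_ok() else "no")
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, install it and reopen the terminal so the updated PATH is picked up.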

1. Clone the project (make sure Git and Git LFS are installed first)

git clone https://github.com/v587d/InsightsLibrary.git
cd InsightsLibrary
git lfs pull

2. Create virtual environment

uv venv .venv  # Create dedicated virtual environment

# Activate environment
# Windows:
.\.venv\Scripts\activate
# Mac/Linux:
source .venv/bin/activate

3. Install core dependencies

uv pip install .  # Note the trailing dot indicating current directory

4. Create environment variables (for future needs)

notepad .env  # Windows
# Or
nano .env     # Mac/Linux

5. Configure MCP Server

VS Code (Cline)

Note: Replace <Your Project Root Directory!!!> with actual root directory.

{
  "mcpServers": {
    "ikb-mcp-server": {
      "command": "uv",
      "args": [
        "--directory",
        "<Your Project Root Directory!!!>", 
        "run",
        "ikb_mcp_server.py"
      ]
    }
  }
}
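If editing the path by hand is error-prone (on Windows, backslashes in JSON strings must be escaped), the same configuration can be generated. This is a small sketch, not a tool shipped with the project; `build_mcp_config` is a hypothetical helper, and it assumes you run it from the project root.

```python
import json
from pathlib import Path


def build_mcp_config(project_root: str) -> dict:
    """Build the MCP server entry shown above for a given project root."""
    root = Path(project_root).expanduser().resolve()
    return {
        "mcpServers": {
            "ikb-mcp-server": {
                "command": "uv",
                "args": ["--directory", str(root), "run", "ikb_mcp_server.py"],
            }
        }
    }


if __name__ == "__main__":
    # "." assumes the script is run from the InsightsLibrary root directory.
    print(json.dumps(build_mcp_config("."), indent=2))
```

`json.dumps` takes care of escaping, so the printed block can be pasted into the client's MCP settings as-is.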

Cherry Studio
- Command: uv
- Arguments:

--directory
<Your Project Root Directory!!!>
run
ikb_mcp_server.py

Adding Private Documents to ikb_mcp_server

Configure VLM models and parameters in .env:

VLM_API_KEY=<API Key>
VLM_BASE_URL=<Base URL> # e.g. https://openrouter.ai/api/v1
VLM_MODEL_NAME=<Model Name> # e.g. qwen/qwen2.5-vl-72b-instruct:free
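A quick way to catch a forgotten or placeholder value before running the extraction is to check the .env file first. The sketch below is a simplified parser written for illustration only; it is not how the project itself loads .env, and the helper names are hypothetical. It treats a value that still starts with `<` as unset.

```python
from pathlib import Path

REQUIRED_KEYS = ("VLM_API_KEY", "VLM_BASE_URL", "VLM_MODEL_NAME")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and trailing ' # ...' notes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip().strip("\"'")
        values[key.strip()] = value
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still a <placeholder>."""
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("<")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    problems = missing_keys(parse_env(text))
    print("OK" if not problems else f"Missing or unset: {', '.join(problems)}")
```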

Copy your PDF documents into the library_files folder under the project root directory.
Then run main.py manually.

# Navigate to the project root directory
# Activate the virtual environment
uv run main.py
(InsightsLibrary) PS D:\Projects\mcp\InsightsLibrary> uv run main.py
[INFO] extractor: PDF extraction initialized | Files directory: library_files | Pages directory: library_pages
[INFO] extractor: Starting scan of directory: library_files
[INFO] extractor: Found 69 PDF files
[INFO] extractor: Scan completed | Total files: 69 | Processed: 0 | Failed: 0
[INFO] recognizer: No pages to process.
# Data has been updated to the database
============================================================
Confirm if you need to create text vector embeddings
⚠️ This process may take approximately 20 minutes
============================================================
Create embeddings? (Enter Y or N): 
# Y: create text vector embeddings
# N: Skip text vector embeddings and exit program

License

This project is licensed under the MIT License. See the LICENSE file for details.

Optimizations as of June 17

💡Optimized models.py: Improved data query efficiency by 1,000%
💡Optimized extractor.py: Slightly enhanced PDF extraction efficiency
💡Optimized recognizer.py: Boosted image comprehension efficiency by 50%
💡Optimized ikb_mcp_server.py:
- Added pagination functionality
- Displayed local paths of referenced files
💡Added MIT License (https://github.com/v587d/InsightsLibrary/pull/1#issuecomment-2969226661)
📦 Overall compressed project package size reduced by approximately 50%
💡Streamlined private document handling
💡Fixed other identified bugs

Optimizations as of June 22

Added embedder.py: Implements text vectorization indexing via local Qwen3-Embedding-0.6B model, stored in faiss_index.
Modified main.py: Closed-loop workflow PDFExtractor → IMGRecognizer → Embedder (optional).
New @mcp.tool(): get_similar_content_by_rag: Finds most similar document content via vector similarity (RAG).
All admin-uploaded reports now support online viewing → Removed library_files folder to reduce project size.
Added 2000+ report pages.
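For readers unfamiliar with vector similarity search, the idea behind get_similar_content_by_rag can be illustrated with a toy example. The sketch below is not the project's implementation: the project embeds text with Qwen3-Embedding-0.6B and stores vectors in a FAISS index, whereas this example uses hand-made 3-dimensional vectors, made-up page IDs, and a brute-force cosine-similarity ranking in pure Python.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], pages: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k pages whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), page_id) for page_id, vec in pages.items()]
    scored.sort(reverse=True)
    return [page_id for _, page_id in scored[:k]]


# Toy "embeddings" with made-up page IDs; real embeddings have many more dimensions.
TOY_PAGES = {
    "energy_report_p3": [0.9, 0.1, 0.0],
    "retail_report_p7": [0.1, 0.9, 0.1],
    "hydrogen_report_p2": [0.8, 0.2, 0.1],
}

print(top_k([1.0, 0.0, 0.0], TOY_PAGES))
# prints ['energy_report_p3', 'hydrogen_report_p2']
```

Brute force is fine for a handful of vectors; an index such as FAISS exists to keep this lookup fast when there are thousands of pages.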