CARTE-Toronto/alliance-docs-mcp
Alliance Documentation MCP Server
A Model Context Protocol (MCP) server that provides programmatic access to the Digital Research Alliance of Canada's technical documentation. This server mirrors the documentation from the MediaWiki site and exposes it through MCP resources and tools for use with MCP-compatible clients.
Features
- Documentation Mirroring: Syncs documentation from the Alliance MediaWiki site
- MCP Resources: Exposes individual documentation pages as MCP resources
- Full-Text Search: Whoosh-backed content and title search with highlights and scoring
- Related Pages: Embeddings-backed related-page discovery with heuristic fallback
- Search & Query Tools: Provides search, categorization, and querying capabilities
- Startup Refresh: Container entrypoint triggers an incremental sync on boot; schedule additional runs as needed
- Markdown Storage: Stores documentation as markdown files with metadata
Quick Start
Prerequisites
- Python 3.11+
- uv for package management
Installation
- Clone and set up the repository:
  git clone <repository-url>
  cd alliance-docs-mcp
- Install dependencies:
  uv sync
- Configure environment (optional): Create a .env file (or export the variables directly) if you want to override defaults. For example:
  MEDIAWIKI_API_URL=https://docs.alliancecan.ca/mediawiki/api.php
  DOCS_DIR=./docs
  USER_AGENT=AllianceDocsMCP/1.0
- Initial documentation sync:
  uv run python scripts/sync_docs.py
  Note: Docker images built from this repository automatically run this full sync during the image build so containers start with a warm cache.
- Start the MCP server:
  uv run python -m alliance_docs_mcp.server
Usage
MCP Resources
The server exposes documentation pages as MCP resources:
- Resource URI: alliance-docs://page/{slug}
- Content: Markdown content of the documentation page
Example:
alliance-docs://page/technical_documentation
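A client usually has to turn a page slug into a resource URI and back. The sketch below shows one way to do that with plain string handling. The helper names page_uri and slug_from_uri are hypothetical and are not part of the server; only the alliance-docs://page/{slug} pattern comes from this README.

```python
# Hypothetical helpers; only the URI pattern is taken from the README.
URI_PREFIX = "alliance-docs://page/"


def page_uri(slug: str) -> str:
    """Build the resource URI for a page slug."""
    if not slug:
        raise ValueError("slug must be a non-empty string")
    return URI_PREFIX + slug


def slug_from_uri(uri: str) -> str:
    """Extract the slug from a resource URI, rejecting other schemes."""
    if not uri.startswith(URI_PREFIX):
        raise ValueError(f"not an alliance-docs page URI: {uri!r}")
    slug = uri[len(URI_PREFIX):]
    if not slug:
        raise ValueError("URI does not contain a slug")
    return slug


if __name__ == "__main__":
    print(page_uri("technical_documentation"))
```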
MCP Tools
The server provides several tools for querying documentation:
search_docs(query: str, category: Optional[str] = None, limit: int = 20, search_content: bool = True, fuzzy: bool = False)
Search documentation pages using the full-text index when available, falling back to title matching otherwise. Full-text results include relevance scores and highlighted snippets.
Parameters:
query: Search query string
category: Optional category filter
limit: Maximum number of results
search_content: Use full-text index when available (default: True)
fuzzy: Enable fuzzy matching for typo tolerance (full-text only)
Returns: List of matching pages with metadata, highlights, and scores (when indexed)
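To illustrate the title-only fallback path, here is a minimal sketch of a case-insensitive substring search with the category and limit parameters. It is not the server's implementation: the PAGES list, its field names, and the search_titles function are assumptions made for the example.

```python
from typing import Optional

# Hypothetical in-memory metadata; the real server reads its own index.
PAGES = [
    {"slug": "getting_started", "title": "Getting started", "category": "Getting Started"},
    {"slug": "running_jobs", "title": "Running jobs", "category": "Technical Reference"},
    {"slug": "python", "title": "Python", "category": "Technical Reference"},
]


def search_titles(query: str, category: Optional[str] = None, limit: int = 20) -> list[dict]:
    """Return pages whose title contains the query, ignoring case."""
    needle = query.strip().casefold()
    matches = []
    for page in PAGES:
        # Apply the optional category filter before matching the title.
        if category is not None and page["category"] != category:
            continue
        if needle in page["title"].casefold():
            matches.append(page)
    return matches[:limit]


if __name__ == "__main__":
    for hit in search_titles("jobs"):
        print(hit["slug"], "-", hit["title"])
```

A fallback like this returns no scores or highlights, which matches the note that those fields appear only when the index is available.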
list_categories()
List all available documentation categories.
Returns: List of category names
get_page_by_title(title: str)
Find a specific page by its title.
Parameters:
title: Page title to search for
Returns: Page metadata or None if not found
list_recent_updates(limit: int = 10)
List recently updated pages.
Parameters:
limit: Maximum number of pages to return
Returns: List of recent pages with metadata
get_page_info(slug: str)
Get detailed information about a specific page.
Parameters:
slug: Page slug
Returns: Detailed page information including metadata
list_all_pages()
List all available documentation pages.
Returns: List of all pages with basic metadata
find_related_pages(slug: str, limit: int = 5)
Embeddings-backed related-pages helper (Chroma + sentence-transformers) with automatic fallback to lightweight heuristics.
Parameters:
slug: Source page slug
limit: Max related pages to return
min_score: Optional similarity threshold when embeddings are available
Returns: List of related pages with similarity scores (or heuristic scores when falling back)
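The README does not say which heuristics the fallback uses. As one plausible illustration only, the sketch below scores pages by the overlap of their title words (Jaccard similarity) plus a small bonus for sharing a category. The function name, the data layout, and the 0.25 bonus are all assumptions.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a title and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def related_by_heuristic(slug: str, pages: dict[str, dict], limit: int = 5,
                         min_score: float = 0.0) -> list[dict]:
    """Rank other pages by title-word overlap and shared category."""
    source = pages[slug]
    source_tokens = tokenize(source["title"])
    scored = []
    for other_slug, other in pages.items():
        if other_slug == slug:
            continue
        tokens = tokenize(other["title"])
        union = source_tokens | tokens
        jaccard = len(source_tokens & tokens) / len(union) if union else 0.0
        # Assumed bonus: pages in the same category are slightly related.
        bonus = 0.25 if other["category"] == source["category"] else 0.0
        score = jaccard + bonus
        if score > 0 and score >= min_score:
            scored.append((score, other_slug))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [{"slug": s, "score": round(sc, 3)} for sc, s in scored[:limit]]


if __name__ == "__main__":
    example = {
        "running_jobs": {"title": "Running jobs", "category": "Jobs"},
        "running_gpu_jobs": {"title": "Running GPU jobs", "category": "Jobs"},
        "python": {"title": "Python", "category": "Software"},
    }
    print(related_by_heuristic("running_jobs", example))
```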
MCP Prompts
The server provides reusable prompt templates that guide LLMs on how to effectively query and use the documentation system. These prompts can be used by MCP clients to structure queries and improve consistency.
documentation_search_guide(query: str, category: Optional[str] = None)
Guide for effectively searching Alliance documentation. Provides instructions on using the search_docs tool, interpreting search results, and filtering by category.
Parameters:
query: The user's search query
category: Optional category filter
Use Case: When an LLM needs to help a user search for documentation on a specific topic.
technical_question_template(question: str, context: Optional[str] = None)
Template for answering technical questions using documentation. Guides the LLM through searching, reading relevant pages, finding related content, and synthesizing information.
Parameters:
question: The technical question to answer
context: Additional context about what the user is trying to accomplish
Use Case: When an LLM needs to answer technical questions based on the documentation.
category_exploration_guide(category: str, purpose: Optional[str] = None)
Guide for exploring documentation by category. Helps discover pages within a specific category and understand the documentation structure.
Parameters:
category: The category to explore
purpose: What the user is trying to accomplish
Use Case: When an LLM needs to help users explore documentation in a specific category (e.g., "Getting Started", "Technical Reference").
related_content_discovery(topic: str, goal: Optional[str] = None)
Guide for finding related documentation pages. Provides instructions on using the find_related_pages tool and interpreting similarity scores.
Parameters:
topic: The topic or page slug to find related content for
goal: The user's goal (learning, troubleshooting, etc.)
Use Case: When an LLM needs to help users discover related documentation after finding a relevant page.
getting_started_helper(use_case: str)
Template for helping new users get started. Guides LLMs to point users to getting started documentation and common first steps.
Parameters:
use_case: What the user wants to do (e.g., "set up account", "run first job", "install software")
Use Case: When an LLM needs to help new users with onboarding and initial setup tasks.
Synchronization
Manual Sync
Run a full synchronization (with rich progress bars and visual feedback):
uv run python scripts/sync_docs.py
Run an incremental sync (only changed pages):
uv run python scripts/sync_docs.py --incremental
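How the script detects changed pages is not described here. One common approach with MediaWiki, sketched below as an assumption, is to compare each page's latest revision ID against the one recorded during the previous sync. The function name and dictionary shapes are hypothetical.

```python
def pages_needing_sync(local_revs: dict[str, int], remote_revs: dict[str, int]) -> list[str]:
    """Return titles that are new or whose revision ID differs from the local copy."""
    changed = [
        title
        for title, remote_rev in remote_revs.items()
        if local_revs.get(title) != remote_rev
    ]
    return sorted(changed)


if __name__ == "__main__":
    local = {"Python": 101, "Running jobs": 200}
    remote = {"Python": 101, "Running jobs": 207, "Storage": 12}
    print(pages_needing_sync(local, remote))
```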
Index controls:
uv run python scripts/sync_docs.py --rebuild-index # Rebuild Whoosh index
uv run python scripts/sync_docs.py --no-index # Skip indexing
uv run python scripts/sync_docs.py --index-dir /tmp/idx # Custom index location
uv run python scripts/sync_docs.py --rebuild-related-index # Rebuild related-page embeddings
uv run python scripts/sync_docs.py --no-related-index # Skip related-page embeddings
uv run python scripts/sync_docs.py --related-index-dir /tmp/rel # Custom related index location
uv run python scripts/sync_docs.py --related-model-name all-MiniLM-L6-v2
The related-page index downloads the configured sentence-transformer model (default: all-MiniLM-L6-v2, ~90 MB) the first time it runs.
For FastMCP Cloud deployments, run one of the sync commands above locally and commit the updated docs/ directory before pushing so the hosted server always mirrors the latest content.
The sync script provides:
- Colored output with rich formatting
- Progress bars for download and processing phases
- Real-time statistics including pages/second
- Summary table with detailed metrics
- Error tracking with warnings for failed pages
Note: Markdown pages larger than 10 MB are stored as .md.gz files. The server automatically decompresses them at runtime, so no additional configuration is required.
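The size-based storage rule can be sketched as a pair of read and write helpers. This is an illustration rather than the project's storage.py: the function names are hypothetical, and treating 10 MB as 10 * 1024 * 1024 bytes is an assumption.

```python
import gzip
from pathlib import Path

# Assumption: "10 MB" is interpreted as 10 MiB for this sketch.
COMPRESS_THRESHOLD_BYTES = 10 * 1024 * 1024


def write_page(directory: Path, slug: str, markdown: str,
               threshold: int = COMPRESS_THRESHOLD_BYTES) -> Path:
    """Store a page as .md, or as .md.gz when it exceeds the threshold."""
    data = markdown.encode("utf-8")
    if len(data) > threshold:
        path = directory / f"{slug}.md.gz"
        path.write_bytes(gzip.compress(data))
    else:
        path = directory / f"{slug}.md"
        path.write_bytes(data)
    return path


def read_page(directory: Path, slug: str) -> str:
    """Read a page, transparently decompressing the .md.gz variant."""
    plain = directory / f"{slug}.md"
    if plain.exists():
        return plain.read_text(encoding="utf-8")
    compressed = directory / f"{slug}.md.gz"
    if compressed.exists():
        return gzip.decompress(compressed.read_bytes()).decode("utf-8")
    raise FileNotFoundError(f"no stored page for slug {slug!r}")
```

Callers only ever pass a slug, so the choice of file extension stays an internal detail of the storage layer.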
LLM-Optimized Documentation Files
The sync process automatically generates two files for LLM consumption:
- docs/llms.txt: A simple directory listing all page names, categories, and URLs (~35 KB)
- docs/llms_full.txt.gz: Complete documentation content in a single compressed file (~2.6 MB compressed, ~393 MB uncompressed)
These files are regenerated on every sync (both full and incremental) and committed to the repository, making it easy for LLMs to access the entire documentation corpus.
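Because the full corpus expands to several hundred megabytes, it is usually safer to stream it than to decompress it into memory at once. The sketch below reads the gzip file line by line using only the standard library; the function name is hypothetical and the internal layout of llms_full.txt.gz is not assumed beyond it being UTF-8 text.

```python
import gzip
from pathlib import Path
from typing import Iterator


def iter_matching_lines(path: Path, needle: str) -> Iterator[str]:
    """Yield lines containing the needle (case-insensitive) without loading the whole file."""
    needle = needle.casefold()
    # Text mode decompresses incrementally as lines are consumed.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if needle in line.casefold():
                yield line.rstrip("\n")


if __name__ == "__main__":
    corpus = Path("docs/llms_full.txt.gz")
    if corpus.exists():
        for index, line in enumerate(iter_matching_lines(corpus, "slurm")):
            print(line)
            if index >= 9:  # show at most ten matches
                break
```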
Automated Sync
Set up a cron job for weekly updates:
# Add to crontab (runs every Sunday at 2 AM)
0 2 * * 0 cd /path/to/alliance-docs-mcp && uv run python scripts/sync_docs.py --incremental
This repository also ships with .github/workflows/weekly-sync.yml, which performs the same incremental sync on Sundays using GitHub Actions and pushes any changes back to main.
Configuration
Environment Variables
Set the following environment variables (via .env, shell exports, or your hosting platform's secret manager) to customize behavior:
- MEDIAWIKI_API_URL (default https://docs.alliancecan.ca/mediawiki/api.php)
- DOCS_DIR (default ./docs, or /data/docs in the container)
- USER_AGENT (default AllianceDocsMCP/1.0)
- SEARCH_INDEX_DIR (optional; overrides default DOCS_DIR/search_index)
- DISABLE_SEARCH_INDEX (set to 1/true/yes to force title-only fallback)
- RELATED_INDEX_DIR (optional; overrides default DOCS_DIR/related_index)
- RELATED_MODEL_NAME (sentence-transformer model, default all-MiniLM-L6-v2)
- RELATED_BACKEND (default chroma)
- DISABLE_RELATED_INDEX (set to 1/true/yes to skip related-page embeddings)
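The defaults listed above can be resolved with a small settings loader. This is a sketch under stated assumptions, not the project's code: the Settings class and load_settings function are hypothetical, and treating the 1/true/yes flags as case-insensitive is a guess.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping

TRUTHY = {"1", "true", "yes"}


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Interpret 1/true/yes as enabled (case-insensitivity is an assumption)."""
    return env.get(name, "").strip().lower() in TRUTHY


@dataclass(frozen=True)
class Settings:
    api_url: str
    docs_dir: Path
    user_agent: str
    search_index_dir: Path
    related_index_dir: Path
    related_model_name: str
    related_backend: str
    disable_search_index: bool
    disable_related_index: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from the environment, applying the documented defaults."""
    docs_dir = Path(env.get("DOCS_DIR", "./docs"))
    return Settings(
        api_url=env.get("MEDIAWIKI_API_URL", "https://docs.alliancecan.ca/mediawiki/api.php"),
        docs_dir=docs_dir,
        user_agent=env.get("USER_AGENT", "AllianceDocsMCP/1.0"),
        # Index directories default to subdirectories of DOCS_DIR.
        search_index_dir=Path(env.get("SEARCH_INDEX_DIR") or docs_dir / "search_index"),
        related_index_dir=Path(env.get("RELATED_INDEX_DIR") or docs_dir / "related_index"),
        related_model_name=env.get("RELATED_MODEL_NAME", "all-MiniLM-L6-v2"),
        related_backend=env.get("RELATED_BACKEND", "chroma"),
        disable_search_index=env_flag(env, "DISABLE_SEARCH_INDEX"),
        disable_related_index=env_flag(env, "DISABLE_RELATED_INDEX"),
    )


if __name__ == "__main__":
    print(load_settings())
```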
Server Configuration
The MCP server can be configured with command-line arguments:
uv run python -m alliance_docs_mcp.server --help
Options:
- --host: Host to bind to (default: localhost)
- --port: Port to bind to (default: 8000)
- --docs-dir: Documentation directory (default: ./docs)
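For reference, the three options map naturally onto an argparse parser. The sketch below mirrors the documented flags and defaults; the build_parser function is hypothetical, and the actual server may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented server options."""
    parser = argparse.ArgumentParser(
        prog="alliance_docs_mcp.server",
        description="Sketch of the documented command-line options.",
    )
    parser.add_argument("--host", default="localhost", help="Host to bind to")
    parser.add_argument("--port", type=int, default=8000, help="Port to bind to")
    parser.add_argument("--docs-dir", default="./docs", help="Documentation directory")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Would serve {args.docs_dir} on {args.host}:{args.port}")
```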
Docker Deployment
The provided Docker image ships with a pre-synced documentation cache baked into /app/docs_seed. When the container starts, the entrypoint primes the configured DOCS_DIR from this seed (if empty) and then launches the MediaWiki sync in the background so the MCP server begins accepting connections immediately. You can configure startup behavior with:
- RUN_SYNC_ON_START=0 to skip the background sync (useful when running in read-only environments)
- SYNC_MODE=full to force a full resync instead of the default incremental sync
- The container starts the server via fastmcp run server_entrypoint.py:mcp --transport http --path /mcp/ --port 8080, so any additional FastMCP CLI flags can be injected by overriding CMD in your own image if needed.
- A lightweight /health endpoint is exposed for platform probes; point load balancer checks there instead of MCP protocol paths.
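The startup behavior described above (prime an empty DOCS_DIR from the seed, then decide whether and how to sync) can be sketched in Python. The real entrypoint is not shown in this README, so the function names are hypothetical and the use of uv run inside the container is an assumption.

```python
import shutil
from pathlib import Path
from typing import Mapping, Optional


def prime_docs_dir(seed: Path, docs_dir: Path) -> bool:
    """Copy the baked-in seed into docs_dir only when docs_dir is empty."""
    docs_dir.mkdir(parents=True, exist_ok=True)
    if any(docs_dir.iterdir()):
        return False  # existing content is left untouched
    shutil.copytree(seed, docs_dir, dirs_exist_ok=True)
    return True


def sync_command(env: Mapping[str, str]) -> Optional[list[str]]:
    """Return the background sync command, or None when syncing is disabled."""
    if env.get("RUN_SYNC_ON_START", "1") == "0":
        return None
    # Assumption: the container invokes the sync script the same way as a local checkout.
    command = ["uv", "run", "python", "scripts/sync_docs.py"]
    if env.get("SYNC_MODE", "incremental") != "full":
        command.append("--incremental")
    return command
```

An entrypoint following this pattern would launch the returned command in the background (for example with subprocess.Popen) and then start the server, so the server accepts connections while the sync is still running.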
Project Structure
alliance-docs-mcp/
├── src/
│ └── alliance_docs_mcp/
│ ├── __init__.py
│ ├── server.py # FastMCP server implementation
│ ├── mirror.py # MediaWiki API client
│ ├── converter.py # WikiText to Markdown converter
│ └── storage.py # File storage and retrieval
├── docs/ # Mirrored markdown files
│ ├── pages/ # Organized by category
│ └── index.json # Page metadata index
├── scripts/
│ └── sync_docs.py # Synchronization script
├── tests/ # Test files
├── pyproject.toml # Project configuration
└── README.md
Development
Running Tests
uv run pytest
Code Formatting
uv run black src/
uv run ruff check src/
Deployment Options
FastMCP Cloud (managed)
- Sign in at fastmcp.cloud with your GitHub account and create a project that points at this repository.
- Use server_entrypoint.py:mcp as the entrypoint so the platform runs the exported FastMCP server instance.
- Configure environment variables (e.g., MEDIAWIKI_API_URL, DOCS_DIR, USER_AGENT) via the project settings; the service installs dependencies directly from pyproject.toml.
- Push to main to trigger deployments; each pull request automatically gets its own preview environment for testing changes.
Self-managed container/VM
- Build the Docker image in this repo and run it anywhere that can expose HTTP on port 8080.
- Provide the same environment variables via your scheduler or container runtime.
- Point load balancer health checks at /health and connect MCP clients to the /mcp/ path served by fastmcp run.
Adding New Features
- New MCP Tools: Add new tool functions to server.py
- Storage Enhancements: Extend storage.py for new functionality
- API Improvements: Modify mirror.py for different API interactions
Troubleshooting
Common Issues
- Sync Failures: Check API access and network connectivity
- Missing Pages: Verify MediaWiki API responses
- Conversion Errors: Ensure beautifulsoup4/wikitextparser are installed and valid HTML is being stripped (use --no-strip-html to disable)
Logs
Check the sync.log file for synchronization issues:
tail -f sync.log
Debug Mode
Run with verbose logging:
uv run python scripts/sync_docs.py --verbose
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Digital Research Alliance of Canada for providing the documentation
- FastMCP for the MCP server framework
- uv for Python package management