NachoG2000/gmat-docs-mcp-server
The GMAT Docs MCP Server provides semantic search capabilities over GMAT documentation using the Model Context Protocol (MCP). It scrapes, parses, embeds, and caches documentation to allow MCP-compatible clients to query them efficiently.
GMAT Docs MCP Server
Semantic search over the GMAT documentation via the Model Context Protocol (MCP). This server scrapes, parses, embeds, and caches GMAT docs so any MCP-compatible client (e.g., Cursor, Claude Desktop, custom apps) can query them with the searchDocs tool.
Features
- searchDocs tool: semantic search with OpenAI embeddings
- Local cache: embeddings stored in data/embeddings.json
- Deterministic pipeline: scrape → parse/chunk → embed → cache
Requirements
- Node.js 18+ (ESM, OpenAI SDK v5)
- pnpm (project uses pnpm@10 per package.json)
- An OpenAI API key with access to text-embedding-3-small
Quick Start
- Clone the repo
git clone https://github.com/your-org/gmat-docs-mcp-server.git
cd gmat-docs-mcp-server
- Install dependencies
pnpm install
- Configure environment
Create a .env.local (used at runtime) and/or .env (also read by setup) file at the repo root:
echo "OPENAI_API_KEY=your_api_key_here" > .env.local
Optional variables you can add (defaults shown):
- CACHE_DIR (default: ./data)
- BASE_URL (default: https://documentation.help/gmat/)
- Build the project
pnpm build
- Generate the local cache (scrape, parse/chunk, embed)
pnpm run setup
This produces data/embeddings.json (or ${CACHE_DIR}/embeddings.json).
- Start the MCP server
pnpm start
The server runs over stdio and exposes the searchDocs tool to your MCP client.
Scripts
- pnpm build: compile TypeScript to dist/
- pnpm start: run server from dist/index.js (loads .env.local)
- pnpm dev: run server in watch mode with ts-node
- pnpm run setup: build cache from live docs (requires OpenAI API key)
- pnpm run setup:test: build a smaller test cache using pages-test.json
Pass --force to setup to rebuild the cache from scratch:
pnpm run setup -- --force
Environment Variables
- OPENAI_API_KEY (required): used for embeddings
- CACHE_DIR (optional): directory for embeddings.json (default: ./data)
- BASE_URL (optional): docs base URL (default: https://documentation.help/gmat/)
- NODE_ENV (optional): set to test to use pages-test.json during setup
- MCP_PORT (optional): for wrappers/adapters that expose this stdio server via TCP/SSE. This server itself communicates over stdio and does not bind to a port; some clients or adapters may read MCP_PORT to decide which port to listen on.
Files read for env values:
- Setup reads both .env and .env.local
- Runtime reads .env.local (via pnpm start) or your shell env
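The variables and defaults listed above can be resolved into a single config object. The following is a minimal sketch, not the project's actual code: `resolveConfig`, `ServerConfig`, and the field names are hypothetical, and it assumes the env files have already been loaded into the environment object that is passed in.

```typescript
// Hypothetical helper: names and shape are illustrative, not taken from the repo.
type Env = Record<string, string | undefined>;

interface ServerConfig {
  openaiApiKey: string;
  cacheFile: string; // ${CACHE_DIR}/embeddings.json
  baseUrl: string;
  pagesFile: string; // pages-test.json when NODE_ENV is "test"
}

function resolveConfig(env: Env): ServerConfig {
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) {
    // Same message the Troubleshooting section mentions.
    throw new Error("OPENAI_API_KEY environment variable is required");
  }
  // Treat an empty CACHE_DIR as unset and drop trailing slashes.
  const cacheDir = (env.CACHE_DIR || "./data").replace(/\/+$/, "");
  return {
    openaiApiKey,
    cacheFile: `${cacheDir}/embeddings.json`,
    baseUrl: env.BASE_URL || "https://documentation.help/gmat/",
    pagesFile: env.NODE_ENV === "test" ? "pages-test.json" : "pages.json",
  };
}
```

In a Node process this would typically be called as `resolveConfig(process.env)` after the env files are read.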
Using with MCP Clients
This server communicates via stdio. Point your MCP client to execute the server in your project directory. Two common approaches:
Option A: Use the start script
pnpm start
Your MCP client should spawn this command in the repo root (ensures .env.local is picked up).
Option B: Use the wrapper
There is a convenience wrapper that ensures env loading, then starts the compiled server:
node start-mcp.js
Note: If you run the server behind an adapter that serves MCP over SSE/TCP, you can set MCP_PORT to guide that adapter. The server code here still talks over stdio.
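Many MCP clients are configured with a JSON entry that names the command to spawn. The sketch below builds such an entry for Option B. It assumes the `mcpServers` layout used by clients such as Claude Desktop and Cursor; the `gmat-docs` key, the `buildClientConfig` helper, and the example path are placeholders, so check your client's documentation for the exact file and fields it expects.

```typescript
// Hypothetical helper that produces a client config entry for this server.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

function buildClientConfig(
  repoDir: string,
  apiKey?: string,
): { mcpServers: Record<string, McpServerEntry> } {
  const dir = repoDir.replace(/\/+$/, "");
  // Use an absolute path to the wrapper so the client can spawn it from anywhere.
  const entry: McpServerEntry = { command: "node", args: [`${dir}/start-mcp.js`] };
  // Optionally pass the key through the client instead of relying on .env.local.
  if (apiKey) entry.env = { OPENAI_API_KEY: apiKey };
  return { mcpServers: { "gmat-docs": entry } };
}

// Print the JSON to paste into the client's config file.
console.log(JSON.stringify(buildClientConfig("/path/to/gmat-docs-mcp-server"), null, 2));
```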
Tool: searchDocs
Inputs:
- query (string, required)
- topK (number, default 10, 1–50)
- minScore (number, default 0.1, 0–1)
Output: formatted text with page name, source URL, similarity score, and extracted content.
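To illustrate how topK and minScore interact, here is a minimal ranking sketch. It assumes cosine similarity over the cached embeddings, which this README does not confirm, and it clamps out-of-range inputs where the real tool may reject them instead. `Chunk`, `Hit`, and `rankChunks` are hypothetical names.

```typescript
// Illustrative only: the cache entry shape and scoring function are assumptions.
interface Chunk { page: string; url: string; content: string; embedding: number[]; }
interface Hit { page: string; url: string; content: string; score: number; }

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

function rankChunks(
  queryEmbedding: number[],
  chunks: Chunk[],
  topK = 10,
  minScore = 0.1,
): Hit[] {
  // Clamp to the documented ranges: topK 1–50, minScore 0–1.
  const k = Math.min(50, Math.max(1, Math.floor(topK)));
  const threshold = Math.min(1, Math.max(0, minScore));
  return chunks
    .map((c) => ({
      page: c.page,
      url: c.url,
      content: c.content,
      score: cosineSimilarity(queryEmbedding, c.embedding),
    }))
    .filter((hit) => hit.score >= threshold) // drop weak matches first
    .sort((x, y) => y.score - x.score) // best match first
    .slice(0, k); // then cap the result count
}
```

Filtering before slicing means a high minScore can return fewer than topK results, or none at all.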
Data and Cache
- Cache file: data/embeddings.json (or ${CACHE_DIR}/embeddings.json)
- To rebuild: pnpm run setup -- --force
- To use a smaller test set: pnpm run setup:test
Customizing Pages
The list of pages to scrape is defined in:
- pages.json (full set)
- pages-test.json (smaller set for tests)
You can edit these files to change the crawl scope. The parser attempts to extract meaningful sections by headings and convert them to Markdown for embedding.
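Heading-based sectioning can be pictured with the sketch below. It is not the project's parser: it assumes the page has already been converted to Markdown with ATX (`#`) headings, and `splitByHeadings` and `Section` are hypothetical names. A real parser would also need to skip `#` lines inside fenced code blocks.

```typescript
// Illustrative chunker: one section per heading, empty sections dropped.
interface Section { heading: string; body: string; }

function splitByHeadings(markdown: string, pageTitle = "Introduction"): Section[] {
  const sections: Section[] = [];
  let heading = pageTitle; // text before the first heading is filed under the page title
  let buffer: string[] = [];

  const flush = (): void => {
    const body = buffer.join("\n").trim();
    if (body.length > 0) sections.push({ heading, body });
    buffer = [];
  };

  for (const line of markdown.split(/\r?\n/)) {
    const match = /^(#{1,6})\s+(.*\S)\s*$/.exec(line);
    if (match) {
      flush(); // close the previous section before starting a new one
      heading = match[2];
    } else {
      buffer.push(line);
    }
  }
  flush();
  return sections;
}
```

Each returned section could then be embedded on its own, so a search hit points at a specific part of a page instead of the whole page.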
Troubleshooting
- Error: OPENAI_API_KEY environment variable is required
  - Create .env.local (and optionally .env) with OPENAI_API_KEY
- Cache not found at data/embeddings.json. Run setup first.
  - Run pnpm build && pnpm run setup to generate the cache
- Network timeouts while scraping
  - The scraper retries with exponential backoff; rerun setup or adjust your network
- MCP client can’t see tools
  - Ensure the server is started from the project directory and connected via stdio
  - Confirm pnpm start logs show the server is running and the cache is loaded
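For readers unfamiliar with the retry behaviour mentioned under network timeouts, exponential backoff can be sketched as follows. The attempt count, base delay, and cap are assumed values, and `backoffDelay` and `withRetry` are hypothetical names; the scraper's real settings are not documented here.

```typescript
// Delay doubles with each failed attempt, up to a cap (values are assumptions).
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 4,
  // The sleep function is injectable so the waits can be skipped in tests.
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  const attempts = Math.max(1, maxAttempts);
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Wait before the next try; no wait after the final failure.
      if (attempt < attempts - 1) await sleep(backoffDelay(attempt));
    }
  }
  throw lastError;
}
```

With these defaults a failing request is tried four times, waiting 500 ms, 1000 ms, and 2000 ms between attempts, which is why a persistent outage still ends in an error and a rerun of setup.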
Project Structure
src/
  index.ts         # MCP server entry (stdio)
  setup.ts         # Setup pipeline: scrape → parse/chunk → embed → cache
  tools/           # MCP tool definitions and handlers
  utils/           # scraper, parser, embedder, cache, search
data/              # Default cache directory (embeddings.json)
pages.json         # Full list of pages to scrape
pages-test.json    # Smaller list for testing
dist/              # Compiled JavaScript (after pnpm build)
start-mcp.js       # Wrapper to load env and run the server
License
ISC