# yigitkonur/mcp-research-powerpack
MCP server that gives your AI assistant research tools. Google search, Reddit deep-dives, web scraping with LLM extraction, and multi-model deep research — all as MCP tools that chain into each other.
```bash
npx mcp-researchpowerpack
```
five tools, zero config to start. each API key you add unlocks more capabilities.
## tools

| tool | what it does | requires |
|---|---|---|
| web_search | parallel Google search across 3-100 keywords, CTR-weighted ranking, consensus detection | SERPER_API_KEY |
| search_reddit | same engine but filtered to reddit.com, 10-50 queries in parallel | SERPER_API_KEY |
| get_reddit_post | fetches 2-50 Reddit posts with full comment trees, optional LLM extraction | REDDIT_CLIENT_ID + REDDIT_CLIENT_SECRET |
| scrape_links | scrapes 1-50 URLs with JS rendering fallback, HTML-to-markdown conversion, optional LLM extraction | SCRAPEDO_API_KEY |
| deep_research | sends questions to research-capable models (Grok, Gemini) with web search enabled, supports local file attachments | OPENROUTER_API_KEY |
tools are designed to chain: web_search suggests calling scrape_links, which suggests search_reddit, which suggests get_reddit_post, which suggests deep_research for synthesis.
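the same flow can also be driven by hand from any MCP client. a minimal sketch using the official MCP TypeScript SDK, where the tool names match the table above but the argument key names (keywords, urls, questions) are assumptions for illustration, not the exact schemas:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "research-demo", version: "1.0.0" });
await client.connect(
  new StdioClientTransport({ command: "npx", args: ["mcp-researchpowerpack"] })
);

// 1. broad search returns ranked, deduplicated URLs
const hits = await client.callTool({
  name: "web_search",
  arguments: { keywords: ["tokio vs async-std 2025", "rust async runtime benchmark"] },
});

// 2. pull full content for the most promising hits
const pages = await client.callTool({
  name: "scrape_links",
  arguments: { urls: ["https://example.com/runtime-comparison"] },
});

// 3. hand everything to a research-capable model for synthesis
const answer = await client.callTool({
  name: "deep_research",
  arguments: { questions: ["which rust async runtime fits a high-throughput proxy?"] },
});
```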
## install

### Claude Desktop / Claude Code

add to your MCP config:
```json
{
  "mcpServers": {
    "research-powerpack": {
      "command": "npx",
      "args": ["mcp-researchpowerpack"],
      "env": {
        "SERPER_API_KEY": "...",
        "OPENROUTER_API_KEY": "...",
        "SCRAPEDO_API_KEY": "..."
      }
    }
  }
}
```
### from source

```bash
git clone https://github.com/yigitkonur/mcp-research-powerpack.git
cd mcp-research-powerpack
pnpm install && pnpm build
pnpm start
```
### HTTP mode

```bash
MCP_TRANSPORT=http MCP_PORT=3000 npx mcp-researchpowerpack
```

exposes `/mcp` (POST/GET/DELETE with session headers) and `/health`.
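a quick smoke test of the HTTP transport from TypeScript. the session handshake below follows the generic MCP streamable-HTTP spec (initialize, then echo back Mcp-Session-Id), so treat the exact details as an assumption about this server:

```typescript
const base = "http://localhost:3000";

// health probe
console.log(await (await fetch(`${base}/health`)).text());

// open a session with a JSON-RPC initialize request
const res = await fetch(`${base}/mcp`, {
  method: "POST",
  headers: {
    "content-type": "application/json",
    accept: "application/json, text/event-stream",
  },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26",
      capabilities: {},
      clientInfo: { name: "probe", version: "0.0.1" },
    },
  }),
});

// later requests to /mcp must send this header back
console.log("session:", res.headers.get("mcp-session-id"));
```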
## API keys
each key unlocks a capability. missing keys silently disable their tools — the server never crashes.
required keys — without these, core functionality is unavailable:
| variable | enables | free tier |
|---|---|---|
| SERPER_API_KEY | web_search, search_reddit | 2,500 searches/mo at serper.dev |
| SCRAPEDO_API_KEY | scrape_links (scraping pipeline with JS rendering fallback) | 1,000 credits/mo at scrape.do |
| OPENROUTER_API_KEY | deep_research, LLM extraction in scrape/reddit | pay-per-token at openrouter.ai |
optional keys — add these for Reddit-specific capabilities:
| variable | enables | free tier |
|---|---|---|
| REDDIT_CLIENT_ID + REDDIT_CLIENT_SECRET | get_reddit_post | unlimited (reddit.com/prefs/apps, "script" type) |
## configuration
optional tuning via environment variables:
| variable | default | description |
|---|---|---|
| RESEARCH_MODEL | x-ai/grok-4-fast | primary deep research model |
| RESEARCH_FALLBACK_MODEL | google/gemini-2.5-flash | fallback if the primary fails |
| LLM_EXTRACTION_MODEL | openai/gpt-oss-120b:nitro | default model for scrape/reddit LLM extraction (overridable per-request via the model parameter in scrape_links) |
| DEFAULT_REASONING_EFFORT | high | research depth (low, medium, high) |
| DEFAULT_MAX_URLS | 100 | max search results per research question (10-200) |
| API_TIMEOUT_MS | 1800000 | request timeout in ms (default 30 min) |
| MCP_TRANSPORT | stdio | stdio or http |
| MCP_PORT | 3000 | port for HTTP mode |
## how it works

### search ranking
results from multiple queries are deduplicated by normalized URL and scored using CTR-weighted position values (position 1 = 100.0, position 10 = 12.56). URLs appearing across multiple queries get a consensus marker; the consensus threshold starts at >= 3 distinct queries and falls back to >= 2, then >= 1.
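in code, the pass looks roughly like the sketch below. only the two anchor weights (position 1 = 100.0, position 10 = 12.56) come from this README; the intermediate CTR values and the URL normalization are illustrative assumptions:

```typescript
// illustrative CTR curve; only the first and last values are documented
const CTR_WEIGHT = [100.0, 58.5, 41.0, 30.5, 23.5, 19.0, 16.0, 14.3, 13.2, 12.56];

type Hit = { url: string; position: number; query: string };

function rank(hits: Hit[]) {
  const byUrl = new Map<string, { score: number; queries: Set<string> }>();
  for (const h of hits) {
    // dedupe by normalized URL (assumed: lowercase, no trailing slash)
    const key = new URL(h.url).href.toLowerCase().replace(/\/$/, "");
    const entry = byUrl.get(key) ?? { score: 0, queries: new Set<string>() };
    entry.score += CTR_WEIGHT[Math.min(h.position, 10) - 1];
    entry.queries.add(h.query);
    byUrl.set(key, entry);
  }
  // consensus: require 3 distinct queries, relaxing to 2, then 1
  for (const threshold of [3, 2, 1]) {
    const consensus = [...byUrl.entries()].filter(
      ([, e]) => e.queries.size >= threshold
    );
    if (consensus.length > 0)
      return consensus.sort((a, b) => b[1].score - a[1].score);
  }
  return [];
}
```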
### Reddit comment budget
global budget of 1,000 comments, max 200 per post. after the first pass, surplus from posts with fewer comments is redistributed to truncated posts in a second fetch pass.
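a sketch of that allocation, where the 1,000-comment global budget and 200-per-post cap are from this README but the even first-pass split is an assumption:

```typescript
const GLOBAL_BUDGET = 1_000;
const PER_POST_CAP = 200;

function allocate(commentCounts: number[]): number[] {
  const fair = Math.min(PER_POST_CAP, Math.floor(GLOBAL_BUDGET / commentCounts.length));
  // pass 1: each post gets up to an even share of the global budget
  const alloc = commentCounts.map((n) => Math.min(n, fair));
  let surplus = GLOBAL_BUDGET - alloc.reduce((a, b) => a + b, 0);
  // pass 2: redistribute the surplus to truncated posts, still capped at 200
  commentCounts.forEach((n, i) => {
    const want = Math.min(n, PER_POST_CAP) - alloc[i];
    const extra = Math.min(want, surplus);
    alloc[i] += extra;
    surplus -= extra;
  });
  return alloc;
}

// 8 posts, fair share = 125: small threads leave room for the big ones
console.log(allocate([50, 400, 10, 300, 220, 30, 90, 500]));
// → [50, 200, 10, 200, 200, 30, 90, 200]
```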
### scraping pipeline
three-mode fallback per URL: basic → JS rendering → JS + US geo-targeting. results go through HTML-to-markdown conversion (turndown), then optional LLM extraction with a 100k char input cap and 8,000 token output per URL. the extraction model defaults to openai/gpt-oss-120b:nitro (configurable via LLM_EXTRACTION_MODEL env var) and can be overridden per-request using the model parameter.
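the escalation per URL looks roughly like this; the mode flags passed to scrape.do (render, geo) are assumptions about the client's option names:

```typescript
type Mode = { label: string; render: boolean; geo?: string };

// cheapest first, escalating only when a mode fails or comes back empty
const MODES: Mode[] = [
  { label: "basic", render: false },
  { label: "js", render: true },
  { label: "js+us-geo", render: true, geo: "us" },
];

async function scrapeWithFallback(
  url: string,
  fetchPage: (url: string, mode: Mode) => Promise<string>
): Promise<{ html: string; mode: string }> {
  let lastError: unknown;
  for (const mode of MODES) {
    try {
      const html = await fetchPage(url, mode);
      if (html.trim().length > 0) return { html, mode: mode.label };
    } catch (err) {
      lastError = err; // try the next, more expensive mode
    }
  }
  throw lastError ?? new Error(`all scrape modes failed for ${url}`);
}
```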
### deep research
32,000 token budget divided across questions (1 question = 32k, 10 questions = 3.2k each). Gemini models get google_search tool access. Grok/Perplexity get search_parameters with citations. primary model fails → automatic fallback.
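the budget split is a plain division; the per-model wiring below only reflects what this README states (google_search for Gemini, search_parameters for Grok/Perplexity), and the exact field shapes are assumptions:

```typescript
const TOTAL_TOKEN_BUDGET = 32_000;

// 1 question → 32,000 tokens; 10 questions → 3,200 each
const perQuestionBudget = (questions: string[]) =>
  Math.floor(TOTAL_TOKEN_BUDGET / questions.length);

// hypothetical request shaping per model family
function searchConfig(model: string): Record<string, unknown> {
  if (model.startsWith("google/")) {
    return { tools: [{ google_search: {} }] }; // Gemini web-search tool
  }
  // Grok / Perplexity style search parameters with citations
  return { search_parameters: { mode: "auto", return_citations: true } };
}
```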
### file attachments
deep_research can read local files and include them as context. files over 600 lines are smart-truncated (first 500 + last 100 lines). line numbers preserved.
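a sketch of the truncation rule, assuming line numbers are prefixed 1-based (the 600/500/100 thresholds are from this README):

```typescript
function smartTruncate(text: string): string {
  const lines = text.split("\n").map((line, i) => `${i + 1}: ${line}`);
  if (lines.length <= 600) return lines.join("\n");
  // keep the first 500 and last 100 lines, original numbering intact
  return [
    ...lines.slice(0, 500),
    `... ${lines.length - 600} lines omitted ...`,
    ...lines.slice(-100),
  ].join("\n");
}
```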
### concurrency
| operation | parallel limit |
|---|---|
| web search keywords | 8 |
| Reddit search queries | 8 |
| Reddit post fetches per batch | 5 (batches of 10) |
| URL scraping per batch | 10 (batches of 30) |
| LLM extraction | 3 |
| deep research questions | 3 |
all clients use manual retry with exponential backoff and jitter. the OpenAI SDK's built-in retry is disabled (maxRetries: 0).
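a sketch of that retry loop; the base delay, cap, and attempt count are illustrative, only "exponential backoff with jitter" and maxRetries: 0 come from this README:

```typescript
async function withRetry<T>(fn: () => Promise<T>, attempts = 4): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === attempts - 1) break; // out of retries
      const backoff = Math.min(30_000, 500 * 2 ** attempt); // exponential, capped
      const jitter = Math.random() * backoff; // full jitter
      await new Promise((resolve) => setTimeout(resolve, jitter));
    }
  }
  throw lastError;
}

// the SDK's own retries are turned off so the two policies don't stack:
// new OpenAI({ apiKey, maxRetries: 0 })
```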
## project structure

```
src/
  index.ts              — entry point, STDIO + HTTP transport, signal handling
  worker.ts             — Cloudflare Workers entry (Durable Objects)
  config/
    index.ts            — env parsing (lazy Proxy objects), capability detection
    loader.ts           — YAML → Zod → JSON Schema pipeline, cached
    yaml/tools.yaml     — single source of truth for all tool definitions
  schemas/
    deep-research.ts    — Zod validation for research questions + file attachments
    scrape-links.ts     — Zod validation for URLs, timeout, LLM options
    web-search.ts       — Zod validation for keyword arrays
  tools/
    registry.ts         — tool lookup → capability check → validate → execute
    search.ts           — web_search handler
    reddit.ts           — search_reddit + get_reddit_post handlers
    scrape.ts           — scrape_links handler
    research.ts         — deep_research handler
  clients/
    search.ts           — Serper API client
    reddit.ts           — Reddit OAuth + comment fetching
    scraper.ts          — scrape.do client with fallback modes
    research.ts         — OpenRouter client with model-specific handling
  services/
    llm-processor.ts    — shared LLM extraction (singleton OpenAI client)
    markdown-cleaner.ts — HTML → markdown via turndown
    file-attachment.ts  — local file reading with line ranges
  utils/
    concurrency.ts      — bounded parallel execution (pMap, pMapSettled)
    url-aggregator.ts   — CTR-weighted scoring and consensus detection
    errors.ts           — error classification, fetchWithTimeout
    logger.ts           — MCP logging protocol
    response.ts         — standardized output formatting
```
## deploy

### Cloudflare Workers

```bash
npx wrangler deploy
```
uses Durable Objects with SQLite storage. YAML-based tool definitions are replaced with inline definitions in the worker entry since there's no filesystem.
## license
MIT