hmalfee/web-content-mcp
Web Content MCP is a server designed to efficiently retrieve web content through search-based and direct URL access.
Web Content MCP Server
A Model Context Protocol (MCP) server that provides web search and content extraction. It lets AI assistants search Google and fetch webpage content, converting the extracted content to clean, readable markdown.
🚀 Features
- Google Search Integration: Perform Google searches with customizable result counts
- Content Extraction: Automatically extracts main content from web pages
- Markdown Conversion: Converts HTML content to clean, readable markdown
- Smart Content Filtering: Removes ads, navigation, and irrelevant elements
- Multiple Content Strategies: Uses intelligent selectors to find the main content
- Error Handling: Robust error handling with fallback mechanisms
- Type Safety: Built with TypeScript and Zod validation
🛠️ Available Tools
1. Web Search (web_search)
Search Google and return formatted results with extracted content.
Parameters:
- query (string, required): The search query to execute
- numResults (number, optional): Number of results to return (1-20, default: 5)
Example:
{
  "name": "web_search",
  "arguments": {
    "query": "TypeScript best practices",
    "numResults": 3
  }
}
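The parameter rules above (a required query, an optional numResults between 1 and 20 that defaults to 5) could be enforced roughly as follows. This is a minimal sketch in plain TypeScript, not the server's actual Zod schema, which is not shown here. The names `WebSearchArgs` and `parseWebSearchArgs` are hypothetical, and rejecting empty queries and non-integer counts is an assumption.

```typescript
// Hypothetical shape of validated web_search arguments.
interface WebSearchArgs {
  query: string;
  numResults: number;
}

// Validate raw tool arguments. Rejecting empty queries and non-integer
// counts is an assumption; the README only states the 1-20 range and default.
function parseWebSearchArgs(raw: unknown): WebSearchArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("arguments must be an object");
  }
  // numResults falls back to 5 when the caller omits it.
  const { query, numResults = 5 } = raw as Record<string, unknown>;
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("query is required and must be a non-empty string");
  }
  if (
    typeof numResults !== "number" ||
    !Number.isInteger(numResults) ||
    numResults < 1 ||
    numResults > 20
  ) {
    throw new Error("numResults must be an integer between 1 and 20");
  }
  return { query, numResults };
}
```

Calling `parseWebSearchArgs({ query: "TypeScript best practices" })` would fill in the default of 5, while a numResults of 21 would be rejected before any search request is made.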
2. Fetch Webpage (fetch_webpage)
Fetch and extract content from a specific webpage URL.
Parameters:
- url (string, required): The URL of the webpage to fetch
Example:
{
  "name": "fetch_webpage",
  "arguments": {
    "url": "https://example.com/article"
  }
}
📋 Prerequisites
- Node.js 18+
- SerpAPI Key (for Google search functionality)
🔧 Installation
- Clone the repository:
  git clone <repository-url>
  cd web-content-mcp
- Install dependencies:
  npm install
- Set up environment variables: Create a .env file in the root directory:
  SERPAPI_KEY=your_serpapi_key_here
  Get your SerpAPI key from https://serpapi.com/
🚀 Usage
Running the Server
Run the server directly with npx tsx:
npx tsx --env-file=.env src/index.ts
This command:
- Uses tsx to run TypeScript directly without compilation
- Loads environment variables from the .env file
- Starts the MCP server immediately
As an MCP Server
This server is designed to be used with MCP-compatible clients. Configure your MCP client to connect to this server via stdio transport.
Example MCP client configuration:
{
  "mcpServers": {
    "web-content": {
      "command": "npx",
      "args": [
        "tsx",
        "--env-file=/path/to/web-content-mcp/.env",
        "/path/to/web-content-mcp/src/index.ts"
      ],
      "env": {
        "SERPAPI_KEY": "your_api_key"
      }
    }
  }
}
🏗️ Architecture
The server follows a clean, modular architecture:
src/
├── index.ts # Main server setup and request handling
├── tools.ts # Tool implementations (WebSearch, FetchWebpage)
├── types.ts # TypeScript types and Zod schemas
└── env.ts # Environment variable configuration
Key Components
- Server: Main MCP server instance with stdio transport
- Tools: Modular tool classes with abstract base class
- Content Extraction: Smart HTML-to-markdown conversion
- Type Safety: Comprehensive Zod validation schemas
- Error Handling: Graceful error handling with informative messages
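To illustrate the "modular tool classes with abstract base class" idea, here is a minimal sketch of how such a hierarchy might look. The names `BaseTool`, `ToolResult`, and `EchoTool` are hypothetical and the actual classes in tools.ts may be structured differently; the toy `EchoTool` exists only to show how a concrete tool would plug in.

```typescript
// Hypothetical result shape: MCP tool results carry a list of content items.
interface ToolResult {
  content: { type: "text"; text: string }[];
}

// Hypothetical base class; the actual class in tools.ts may differ.
abstract class BaseTool<Args> {
  abstract readonly name: string;
  abstract readonly description: string;

  // Turn untrusted input into typed arguments (Zod in the real server).
  protected abstract parse(raw: unknown): Args;

  // Do the actual work with validated arguments and return markdown.
  protected abstract execute(args: Args): Promise<string>;

  // Shared entry point: validate, run, wrap the markdown as text content.
  async run(raw: unknown): Promise<ToolResult> {
    const text = await this.execute(this.parse(raw));
    return { content: [{ type: "text", text }] };
  }
}

// Toy subclass used only to show how a concrete tool plugs in.
class EchoTool extends BaseTool<{ message: string }> {
  readonly name = "echo";
  readonly description = "Returns the message it was given";

  protected parse(raw: unknown): { message: string } {
    const message = (raw as { message?: unknown } | null)?.message;
    if (typeof message !== "string") {
      throw new Error("message must be a string");
    }
    return { message };
  }

  protected async execute(args: { message: string }): Promise<string> {
    return `# Echo\n${args.message}`;
  }
}
```

Keeping validation and result wrapping in the base class means each tool only has to describe its arguments and produce markdown.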
🔍 Content Extraction Strategy
The server uses an intelligent content extraction approach:
- Content Selectors: Tries multiple selectors to find the main content:
  - main, article, .content, .main-content
  - .post-content, .entry-content, #content
  - .container (as fallback)
- Content Cleaning: Removes unwanted elements:
  - Scripts, styles, navigation
  - Headers, footers, sidebars
  - Advertisements and promotional content
- Markdown Conversion:
  - Converts HTML to clean markdown
  - Preserves images with alt text
  - Maintains proper heading structure
  - Limits content length to prevent overflow
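The selector step of the strategy above can be sketched as a simple first-match loop. This is an illustration under assumptions, not the server's code: the HTML lookup is injected as a hypothetical `SelectFn` so the sketch needs no parser library, the selectors are tried in the order they are listed, and returning the whole document when nothing matches is a guess at a sensible last resort.

```typescript
// Selectors in the order the README lists them; ".container" is the fallback.
// Whether the server tries them in exactly this order is an assumption.
const CONTENT_SELECTORS = [
  "main",
  "article",
  ".content",
  ".main-content",
  ".post-content",
  ".entry-content",
  "#content",
  ".container",
];

// Hypothetical lookup: returns the inner HTML for a selector, or null when
// nothing matches. A real implementation would wrap an HTML parser.
type SelectFn = (selector: string) => string | null;

// Return the first non-empty match, or the whole document if none matches.
function pickMainContent(select: SelectFn, fullHtml: string): string {
  for (const selector of CONTENT_SELECTORS) {
    const html = select(selector);
    if (html !== null && html.trim() !== "") {
      return html;
    }
  }
  return fullHtml;
}
```

With a page that has both an article element and a .container wrapper, the loop would return the article markup because it appears earlier in the list.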
⚙️ Configuration
The server includes several configurable options in tools.ts:
const config = {
  serpApiKey: env.SERPAPI_KEY,
  timeout: 10000, // Request timeout (ms)
  maxContentLength: 5000, // Max content length per page
  userAgent: "Mozilla/5.0 (compatible; WebContentMCP/1.0)",
};
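As a rough illustration of how these options might be applied, the sketch below uses the fetch built into Node.js 18+ with an AbortController for the timeout. The HTTP client the server actually uses is not stated, so `fetchHtml` and `truncateContent` are hypothetical helpers; treating maxContentLength as a character count and appending "..." to truncated text are also assumptions.

```typescript
// Values copied from the config shown above.
const fetchConfig = {
  timeout: 10000,
  maxContentLength: 5000,
  userAgent: "Mozilla/5.0 (compatible; WebContentMCP/1.0)",
};

// Fetch raw HTML, aborting the request once the timeout elapses.
async function fetchHtml(url: string): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), fetchConfig.timeout);
  try {
    const response = await fetch(url, {
      headers: { "User-Agent": fetchConfig.userAgent },
      signal: controller.signal,
    });
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return await response.text();
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    clearTimeout(timer);
  }
}

// Cap extracted content at max characters; the "..." marker is an assumption.
function truncateContent(
  text: string,
  max: number = fetchConfig.maxContentLength,
): string {
  return text.length <= max ? text : text.slice(0, max) + "...";
}
```

Truncating after extraction rather than before keeps the limit applied to the cleaned markdown, not to raw HTML that is mostly markup.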
🔒 Environment Variables
Variable | Description | Required |
---|---|---|
SERPAPI_KEY | Your SerpAPI key for Google search | Yes |
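Because SERPAPI_KEY is required, it makes sense to fail at startup rather than on the first search. The following is a minimal sketch of such a check; `loadEnv` is a hypothetical name and the real env.ts is not shown in this README.

```typescript
// Hypothetical loader; the real env.ts is not shown in this README.
// Takes the environment as a parameter so it is easy to test.
function loadEnv(
  source: Record<string, string | undefined>,
): { SERPAPI_KEY: string } {
  const key = source.SERPAPI_KEY?.trim();
  if (!key) {
    throw new Error(
      "SERPAPI_KEY is required. Get a key from https://serpapi.com/",
    );
  }
  return { SERPAPI_KEY: key };
}
```

In a server this would typically be called once at startup, for example as `loadEnv(process.env)`.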
📄 Output Format
Both tools return content in markdown format:
Web Search Results
# Web Search Results for: "query"
Found X results:
## 1. Page Title
**URL:** https://example.com
**Snippet:** Brief description...
**Content:**
Main page content in markdown...
---
Webpage Content
# Webpage Content
**URL:** https://example.com
**Content:**
Extracted page content in markdown...
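The search result layout above could be produced by a small formatter like the one below. This is a sketch only: `SearchHit` and `formatSearchResults` are hypothetical names, and the exact blank-line spacing of the server's real output is an assumption.

```typescript
// Hypothetical shape of one search hit after content extraction.
interface SearchHit {
  title: string;
  url: string;
  snippet: string;
  content: string;
}

// Render hits into the markdown layout shown above.
function formatSearchResults(query: string, hits: SearchHit[]): string {
  const header = `# Web Search Results for: "${query}"\nFound ${hits.length} results:`;
  const sections = hits.map((hit, i) =>
    [
      `## ${i + 1}. ${hit.title}`,
      `**URL:** ${hit.url}`,
      `**Snippet:** ${hit.snippet}`,
      `**Content:**`,
      hit.content,
      `---`,
    ].join("\n"),
  );
  return [header, ...sections].join("\n");
}
```

Each hit becomes its own numbered section ending in a horizontal rule, so a client can split the response on `---` if it needs the results individually.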
🛡️ Error Handling
The server includes comprehensive error handling:
- Network errors: Timeout and connection failures
- API errors: SerpAPI rate limits and authentication
- Content errors: Invalid URLs and parsing failures
- Validation errors: Invalid input parameters
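One way to surface these failures to an MCP client is to return a tool result flagged with isError instead of letting the exception escape. The sketch below covers only timeouts and validation failures plus a generic fallback; `describeError` and `toToolError` are hypothetical helpers, and the messages are illustrative rather than the server's actual wording.

```typescript
// MCP tool results can flag failures with isError instead of throwing.
interface ToolErrorResult {
  content: { type: "text"; text: string }[];
  isError: true;
}

// Hypothetical classifier covering two of the categories listed above.
function describeError(err: unknown): string {
  // Read the error name defensively, since err can be any thrown value.
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "";
  const message = err instanceof Error ? err.message : String(err);

  if (name === "AbortError" || name === "TimeoutError") {
    return "Network error: the request timed out";
  }
  if (name === "ZodError") {
    return `Validation error: ${message}`;
  }
  return `Error: ${message}`;
}

// Wrap any thrown value as an MCP-style error result.
function toToolError(err: unknown): ToolErrorResult {
  return {
    content: [{ type: "text", text: describeError(err) }],
    isError: true,
  };
}
```

Returning the message as text content lets the assistant see what went wrong and decide whether to retry or rephrase the request.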
📞 Support
If you encounter any issues or have questions:
- Check the Issues page
- Create a new issue with detailed information
- Include error messages and environment details
Built with ❤️ using TypeScript and the Model Context Protocol