MCP-Server-OSHA

FINBYTES-algo/MCP-Server-OSHA


OSHA server for the construction and healthcare industries, providing compliance and safety management solutions.

MCP MongoDB OSHA Server

An MCP server that provides semantic search capabilities across two MongoDB collections containing OSHA rules and regulations.

Features

  • Semantic search across OSHA health and construction databases
  • Vector embeddings generated locally with SentenceTransformers (all-MiniLM-L6-v2), which replaced the xAI Grok API used initially
  • Cosine similarity for relevance scoring
  • Detailed rule retrieval

Setup

  1. Install dependencies:
```shell
pip install -e .
```
###################Documentation###
MCP OSHA Vector Search Server - Implementation Documentation
📋 Project Overview
Problem Statement
Create an MCP (Model Context Protocol) server that provides semantic search capabilities across OSHA regulations stored in MongoDB, using vector embeddings for natural language queries.
Key Objectives
Connect to MongoDB with OSHA health and construction regulations
Implement semantic search using vector embeddings
Create MCP-compliant server for integration with AI assistants
Deploy as accessible web service
🏗️ Architecture Summary
System Components
```text
MCP OSHA Server
├── MongoDB Connection (2 databases)
│   ├── osha_health_vector.rules
│   └── osha_construction_vector.rules
├── Vector Embedding Engine
│   └── SentenceTransformers (all-MiniLM-L6-v2)
├── MCP Protocol Layer
│   ├── Tools (search_osha_rules, search_by_identifier, get_rule_details)
│   └── Resources (info, sample-queries)
└── Deployment Target
    └── Google Cloud Run
```
Data Flow
User Query → Natural language input
Embedding Generation → SentenceTransformers creates query vector
Vector Search → Cosine similarity with stored document vectors
Results Ranking → Sort by similarity score
MCP Response → Formatted results via MCP protocol
🔧 Technical Implementation
Core Technologies
MCP Protocol: Standardized AI tool protocol
Python 3.10: Backend implementation
MongoDB: Document storage with vector embeddings
SentenceTransformers: Local embedding generation
FastAPI: HTTP wrapper for web deployment
Google Cloud Run: Serverless deployment platform
Key Code Components
1. MCP Server Core (server.py)
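```python
import os
import numpy as np
from pymongo import MongoClient
from sentence_transformers import SentenceTransformer

class MongoDBOSHAProcessor:
    def __init__(self):
        # MongoDB connections (MONGODB_URI is an assumed variable name)
        client = MongoClient(os.environ['MONGODB_URI'])
        self.health_rules = client['osha_health_vector']['rules']
        self.construction_rules = client['osha_construction_vector']['rules']
        # Embedding model
        self.model = SentenceTransformer('all-MiniLM-L6-v2')

    def get_embedding(self, text: str) -> np.ndarray:
        return self.model.encode(text)

    def cosine_similarity(self, a, b) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    async def search_rules_semantic(self, query: str, database: str, limit: int):
        # Generate query embedding
        query_embedding = self.get_embedding(query)
        rules = self.health_rules if database == 'health' else self.construction_rules
        # Calculate cosine similarity ('embedding' is an assumed field name)
        scored = [(self.cosine_similarity(query_embedding, doc['embedding']), doc)
                  for doc in rules.find({'embedding': {'$exists': True}})]
        # Return ranked results
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:limit]
```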
2. MCP Tools Implementation
search_osha_rules: Semantic search with natural language
search_by_identifier: Exact match search by regulation codes
get_rule_details: Retrieve full document details
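The identifier-based tool can be illustrated without a database. The following is a minimal sketch, not the server's actual implementation: it assumes each rule document has an `identifier` string field (a hypothetical field name, since the schema is not shown here) and treats a match as the code appearing as a whole numeric token in that identifier. The sample rules are illustrative data only.

```python
import re


def search_by_identifier(rules, code, limit=10):
    """Return rules whose identifier contains `code` as a whole numeric token."""
    matches = []
    for rule in rules:
        # Split "29 CFR 1926.501" into ["29", "1926", "501"]
        tokens = re.findall(r"\d+", rule.get("identifier", ""))
        if code in tokens:
            matches.append(rule)
    return matches[:limit]


# Illustrative sample data; 'identifier' and 'title' are assumed field names
sample_rules = [
    {"identifier": "29 CFR 1926.501", "title": "Duty to have fall protection"},
    {"identifier": "29 CFR 1910.1200", "title": "Hazard communication"},
]

construction_hits = search_by_identifier(sample_rules, "1926")
```

Token matching means a query for "192" does not match "1926", which is usually what a lookup by regulation code should do.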
Vector Search Strategy
Embedding Model: all-MiniLM-L6-v2 (384 dimensions)
Similarity Metric: Cosine similarity
Threshold: > 0.1 similarity score
Local Processing: No external API dependencies
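The scoring strategy above can be sketched in plain Python. This is a simplified illustration under stated assumptions: the vectors are tiny 3-dimensional stand-ins for the 384-dimensional embeddings, and the `id` and `embedding` keys are hypothetical names. Only the cosine metric and the 0.1 threshold come from the description above.

```python
import math

SIMILARITY_THRESHOLD = 0.1  # threshold stated in the strategy above


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for empty or all-zero vectors
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, documents, limit=5):
    """Score every document, drop weak matches, return the best `limit`."""
    scored = []
    for doc in documents:
        score = cosine_similarity(query_vector, doc["embedding"])
        if score > SIMILARITY_THRESHOLD:
            scored.append((score, doc["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy vectors standing in for real embeddings
docs = [
    {"id": "a", "embedding": [1.0, 0.0, 0.0]},
    {"id": "b", "embedding": [0.6, 0.8, 0.0]},
    {"id": "c", "embedding": [0.0, 0.0, 1.0]},
]
ranked = rank_documents([1.0, 0.0, 0.0], docs)
```

Document "c" is orthogonal to the query, so its score of 0 falls below the threshold and it is left out of the ranking.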
🚀 Deployment Journey
Development Phases
Phase 1: Local MCP Server
✅ Basic MCP server structure
✅ MongoDB connection testing
✅ Vector search implementation
✅ MCP Inspector integration
Phase 2: Cloud Deployment
✅ Docker containerization
✅ Cloud Run configuration
✅ HTTP/SSE transport layer
✅ Web client interface
Phase 3: Production Readiness
✅ Error handling and logging
✅ Health checks
✅ CORS configuration
✅ Resource optimization
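The health check and CORS items from Phase 3 might look roughly like the sketch below. It deliberately swaps in Python's standard-library `http.server` in place of the FastAPI wrapper the project uses, so that it runs with no dependencies. The response body, the wildcard origin, and port 8080 are assumptions, not values taken from the deployed service.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

ALLOWED_ORIGIN = "*"  # assumption: restrict to known origins in production


def build_health_response():
    """Return (status, headers, body) for GET /health."""
    body = json.dumps({"status": "healthy"}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),
        "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
    }
    return 200, headers, body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, headers, body = build_health_response()
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)

    def do_OPTIONS(self):
        # Answer the browser's CORS preflight request
        self.send_response(204)
        self.send_header("Access-Control-Allow-Origin", ALLOWED_ORIGIN)
        self.send_header("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")
        self.end_headers()


def serve(port=8080):
    """Start the server; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the listener. Keeping the response construction in a separate function makes it testable without opening a socket.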
Deployment Options Evaluated
Google Cloud Run (Selected)
Pros: Serverless, auto-scaling, cost-effective
Cons: Cold start times, execution time limits
Google Compute Engine
Pros: Full control, persistent storage
Cons: Manual scaling, higher maintenance
Google Kubernetes Engine
Pros: High scalability, container orchestration
Cons: Complex setup, overkill for this use case
📊 Key Challenges & Solutions
Challenge 1: MCP Protocol Understanding
Problem: Initial confusion between REST API and MCP stdio/SSE protocols
Solution: Implemented proper MCP server with stdio transport, added SSE for web compatibility
Challenge 2: Vector Search Integration
Problem: xAI API dependencies and reliability issues
Solution: Switched to local SentenceTransformers for consistent, fast embeddings
Challenge 3: Deployment Architecture
Problem: MCP servers traditionally use stdio, not HTTP
Solution: Created HTTP proxy and SSE bridge for cloud deployment
Challenge 4: CORS and Web Integration
Problem: Browser security blocking local MCP Inspector calls
Solution: Implemented HTTP server with CORS headers and web client interface
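To make the SSE bridge from Challenge 3 concrete, the sketch below shows how a JSON-RPC message can be framed as a Server-Sent Event. Each event consists of `event:` and `data:` lines followed by a blank line. The function name and the `message` event type are illustrative choices, not taken from the project's code.

```python
import json


def format_sse_event(message, event="message"):
    """Serialize a JSON-RPC message as one Server-Sent Events frame."""
    # Compact JSON stays on a single line, so one data: field is enough
    payload = json.dumps(message, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


frame = format_sse_event({"jsonrpc": "2.0", "id": 1, "result": {}})
```

A bridge would write frames like this to a response whose `Content-Type` is `text/event-stream`, one per outgoing MCP message.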
🎯 Key Features Implemented
Core Features
Dual Database Search: Simultaneous search across health and construction regulations
Semantic Understanding: Natural language query processing
Hybrid Search: Both semantic and identifier-based search
Local Embeddings: No external API dependencies
MCP Compliance: Works with Claude Desktop and other MCP clients
Search Capabilities
Natural Language: "construction safety regulations"
Specific Codes: "1926", "29", "1910" (as in 29 CFR 1926 for construction and 29 CFR 1910 for general industry)
Database Selection: Both, health-only, or construction-only
Result Ranking: By relevance score with similarity metrics
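Database selection and cross-database ranking could be handled along the following lines. This is a hedged sketch: the function names, the `score` key, and the shape of the per-database result lists are assumptions made for illustration.

```python
def select_databases(database):
    """Map the database argument to the list of databases to query."""
    options = {
        "health": ["health"],
        "construction": ["construction"],
        "both": ["health", "construction"],
    }
    try:
        return options[database]
    except KeyError:
        raise ValueError(f"unknown database: {database!r}") from None


def merge_results(results_by_database, limit):
    """Combine per-database hits into one list ranked by score."""
    merged = [
        {**hit, "database": name}
        for name, hits in results_by_database.items()
        for hit in hits
    ]
    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged[:limit]


top_hits = merge_results(
    {
        "health": [{"id": "h1", "score": 0.4}],
        "construction": [{"id": "c1", "score": 0.9}, {"id": "c2", "score": 0.2}],
    },
    limit=2,
)
```

Tagging each hit with its source database before merging keeps the origin visible after the lists are interleaved by score.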
🔌 Integration Points
MCP Clients Supported
Claude Desktop
MCP Inspector
Custom web clients
Direct HTTP API
API Endpoints
```text
GET  /health                # Service health check
GET  /sse                   # MCP over Server-Sent Events
POST /search                # Semantic search
POST /search-by-identifier  # Code-based search
POST /mcp-proxy             # MCP protocol proxy
```
📈 Performance Characteristics
Resource Requirements
Memory: 2GB recommended (embedding model loading)
CPU: 2 vCPUs for optimal performance
Storage: Minimal (stateless, connects to MongoDB)
Expected Performance
Embedding Generation: ~100ms per query
Vector Search: ~500ms for 15K documents
Cold Start: 10-15 seconds (model loading)
Warm Performance: <1 second response time
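The gap between cold-start and warm figures comes from loading the embedding model once per process. One common way to get that behavior is to cache the loader, as in the sketch below. `load_embedding_model` is a stand-in that only sleeps briefly. In the real server it would construct the SentenceTransformers model, and whether the project uses `lru_cache` for this is not stated.

```python
import functools
import time


def load_embedding_model():
    """Stand-in for an expensive model load (hypothetical placeholder)."""
    time.sleep(0.05)  # simulate load time
    return object()


@functools.lru_cache(maxsize=1)
def get_model():
    """Load the model on first use, then reuse the same instance."""
    return load_embedding_model()


first = get_model()   # pays the load cost
second = get_model()  # served from the cache
```

Only the first request on a fresh instance pays the load cost. Later requests on the same instance reuse the cached object.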
🛠️ Usage Examples
MCP Tool Calls
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_osha_rules",
    "arguments": {
      "query": "fall protection requirements",
      "database": "construction",
      "limit": 5
    }
  }
}
```
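On the server side, a `tools/call` request like the one above is routed to a handler by tool name, and the handler's output is wrapped in a JSON-RPC result. The sketch below shows that routing with the standard library only. It is an illustration, not the project's code: a real server would normally rely on an MCP SDK, and the handler here is a placeholder that performs no search.

```python
def search_osha_rules(query, database="both", limit=5):
    # Placeholder handler: a real one would run the vector search
    return f"{limit} result(s) requested for {query!r} in {database}"


TOOLS = {"search_osha_rules": search_osha_rules}


def handle_tools_call(request):
    """Dispatch a tools/call request and build the JSON-RPC response."""
    params = request.get("params", {})
    name = params.get("name")
    handler = TOOLS.get(name)
    if handler is None:
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32602, "message": f"Unknown tool: {name}"},
        }
    text = handler(**params.get("arguments", {}))
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    }


response = handle_tools_call({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_osha_rules",
        "arguments": {"query": "fall protection requirements",
                      "database": "construction", "limit": 5},
    },
})
```

The response echoes the request `id` so the client can pair it with the call it sent.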
Web Client Usage
```javascript
// Semantic search
await searchRules("hazardous materials handling", "both", 10);

// Identifier search
await searchByIdentifier("1926", "construction");
```
🔮 Future Enhancements
Planned Improvements
Caching Layer: Redis for frequent queries
Advanced Filtering: Date ranges, regulation types
Combined Search: Blend semantic and keyword matching in a single query
User Analytics: Search pattern tracking
API Rate Limiting: Production-grade throttling
Scalability Considerations
Sharding: Split databases by regulation type
CDN: Cache static resources and common queries
Load Balancing: Multiple region deployment
Monitoring: Comprehensive logging and metrics
💡 Key Learnings
Technical Insights
MCP Protocol: Powerful for AI tool standardization but requires proper transport layer understanding
Vector Search: Local embeddings provide reliability and cost savings over APIs
Cloud Deployment: Stateless design crucial for serverless platforms
Web Integration: SSE bridges gap between MCP stdio and web protocols
Development Process
Iterative Testing: MCP Inspector invaluable for protocol debugging
Modular Design: Separation of concerns enabled smooth cloud migration
Documentation: Comprehensive docs essential for complex protocol implementations
📋 Setup & Deployment Checklist
Prerequisites
MongoDB Atlas cluster with OSHA data
Google Cloud Project
Python 3.10+ environment
MCP client (Claude Desktop or Inspector)
Deployment Steps
Environment configuration (.env file)
Local testing with MCP Inspector
Docker image building and testing
Cloud Run deployment
Web client deployment
Integration testing with target clients
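For the environment configuration step, the sketch below parses `.env`-style text into a dictionary. It is a simplified stand-in for a library such as python-dotenv and handles only `KEY=VALUE` lines and `#` comments. The variable names `MONGODB_URI` and `PORT` are hypothetical, since the actual settings are not listed here.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical example content for a .env file
example = '# local settings\nMONGODB_URI="mongodb://localhost:27017"\nPORT=8080\n'
settings = parse_env(example)
```

All values come back as strings, so numeric settings such as the port need an explicit `int()` conversion before use.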
🎉 Success Metrics
Implementation Success
✅ MCP protocol compliance achieved
✅ Vector search functionality working
✅ Cloud deployment operational
✅ Web accessibility accomplished
✅ Performance requirements met
Business Value
Accessibility: Web-based access to OSHA regulations
Usability: Natural language query understanding
Integration: Seamless AI assistant integration
Scalability: Cloud-native, cost-effective deployment

This documentation covers the path from concept to a deployed MCP server, including the technical implementation and the design decisions made along the way.

##############Example for the MCP server
https://auth0.com/blog/build-python-mcp-server-for-blog-search/#Testing-the-MCP-Server-with-the-MCP-Inspector