# Enterprise API Context Engine (MCP)
A Model Context Protocol (MCP) server that acts as a "Context Engine" for e-commerce platforms, providing AI agents with precise, progressively disclosed API specifications through multi-layered hybrid search.
## 🚀 Features
- Multi-Layered Search Pipeline: Hard Filtering → Hybrid Retrieval → Reranking
- Hybrid Embeddings: Dense semantic + Sparse keyword vectors for comprehensive matching
- Progressive Disclosure: Find relevant endpoints first, then inspect full specifications
- Service-Oriented Architecture: Organized by microservices with rich metadata
- AI Agent Friendly: Natural language queries with intelligent result ranking
- Local Processing: Uses FastEmbed for efficient on-device embedding generation
## 🏗️ Architecture
### Search Pipeline
```
Query → Hard Filtering (Metadata) → Hybrid Search (Dense + Sparse) → Reranking → Results
```
- Hard Filtering: Metadata filters (service, method, path, tags) applied before retrieval
- Hybrid Retrieval: Combines dense embeddings (semantic) with sparse vectors (keywords)
- Reranking: Cross-encoder scoring for final relevance ranking
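A rough sketch of the first two layers with `qdrant-client` (the actual implementation lives in `app/engine.py` and may differ; `search_endpoints` and its arguments are illustrative, while the collection and vector names follow the Configuration section below):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

def search_endpoints(dense_vec: list[float], sparse_indices: list[int],
                     sparse_values: list[float], service: str | None = None,
                     top_k: int = 20):
    """Layers 1-2: hard metadata filter, then fused dense + sparse retrieval."""
    # Layer 1: hard filtering on payload metadata (e.g. restrict to one service)
    flt = None
    if service is not None:
        flt = models.Filter(must=[
            models.FieldCondition(key="service", match=models.MatchValue(value=service)),
        ])
    # Layer 2: hybrid retrieval over the two named vectors, fused with RRF
    return client.query_points(
        collection_name="api_endpoints",
        prefetch=[
            models.Prefetch(query=dense_vec, using="text-dense",
                            filter=flt, limit=top_k),
            models.Prefetch(query=models.SparseVector(indices=sparse_indices,
                                                      values=sparse_values),
                            using="text-sparse-new", filter=flt, limit=top_k),
        ],
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        limit=top_k,
    ).points
```

The third layer, cross-encoder reranking, is sketched under "How It Works" below.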
### Technology Stack
- Language: Python 3.11+
- Vector Database: Qdrant with named vectors for hybrid search
- Embeddings: FastEmbed (BAAI/bge-small-en-v1.5 + prithivida/Splade_PP_en_v1)
- Orchestration: LlamaIndex for RAG pipeline management
- Server: FastAPI with MCP SDK integration
- Reranking: BAAI/bge-reranker-base for result refinement
## 🚀 Quick Start
### Prerequisites
- Python 3.11+
- Docker (for Qdrant)
- UV package manager
### Setup

```bash
# Clone the repository and enter it
git clone <repository>
cd enterprise-api-context-engine

# Start the Qdrant vector database
docker compose up -d

# Install dependencies
uv sync

# Index the OpenAPI specifications
uv run python app/ingest.py

# Start the server
uv run main.py
```
### Test the System

```bash
# Run validation tests
uv run python scripts/test_agent.py

# Test specific search queries
uv run python scripts/test_agent.py --test-search "calculate shipping costs"

# Validate collection data integrity
uv run python scripts/test_collection.py
```
## 📋 API Endpoints
### Search Endpoints

- `POST /api/search` - Search for API endpoints
- `GET /api/services` - List available services
- `GET /api/specs/{service}` - Get the full OpenAPI specification
### MCP Tool Simulation

- `POST /tools/find-api-endpoints` - Find endpoints by intent
- `GET /tools/get-service-spec/{service}` - Get a service specification
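A minimal sketch of what the tool-simulation route could look like in `app/server.py`; the request model, handler body, and response shape are assumptions, only the route path comes from this README:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class FindEndpointsRequest(BaseModel):
    query: str                  # natural-language intent
    service: str | None = None  # optional hard filter on service name

@app.post("/tools/find-api-endpoints")
def find_api_endpoints(req: FindEndpointsRequest) -> dict:
    # The real server would delegate to the multi-layer search engine;
    # a canned result keeps this sketch self-contained and runnable.
    return {
        "query": req.query,
        "endpoints": [
            {"service": "logistics", "method": "POST", "path": "/shipping/rates",
             "summary": "Calculate shipping costs"},
        ],
    }
```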
### Example Queries

```bash
# Search for shipping-related endpoints
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "calculate shipping costs", "service": "logistics"}'

# List all services
curl http://localhost:8000/api/services

# Get the full checkout service specification
curl http://localhost:8000/api/specs/checkout-payment
```
## 📁 Project Structure

```
enterprise-api-context-engine/
├── app/                      # Core application modules
│   ├── engine.py             # Multi-layer search engine
│   ├── ingest.py             # OpenAPI ingestion pipeline
│   └── server.py             # FastAPI server with MCP tools
├── data/specs/               # OpenAPI specifications
│   ├── product-catalog.yaml
│   ├── cart-service.yaml
│   ├── checkout-payment.yaml
│   └── logistics-shipping.yaml
├── scripts/                  # Utility scripts
│   ├── test_agent.py         # Validation test script
│   └── test_collection.py    # Collection validation
├── docker-compose.yml        # Qdrant configuration
└── pyproject.toml            # Project dependencies
```
## 🔍 How It Works
### 1. Data Ingestion
- Parses OpenAPI YAML specifications
- Creates one document per API endpoint
- Extracts rich metadata (service, method, path, parameters, responses)
- Generates both dense and sparse embeddings
- Stores in Qdrant with metadata filtering support
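A condensed sketch of this flow using `fastembed` and `qdrant-client` directly (the real `app/ingest.py` may route this through LlamaIndex instead; the payload fields mirror the metadata listed above):

```python
import yaml
from fastembed import TextEmbedding, SparseTextEmbedding
from qdrant_client import QdrantClient, models

dense_model = TextEmbedding("BAAI/bge-small-en-v1.5")             # 384-dim dense
sparse_model = SparseTextEmbedding("prithivida/Splade_PP_en_v1")  # SPLADE sparse
client = QdrantClient(url="http://localhost:6333")

with open("data/specs/logistics-shipping.yaml") as f:
    spec = yaml.safe_load(f)

points = []
for path, operations in spec.get("paths", {}).items():
    for method, op in operations.items():
        if method not in {"get", "post", "put", "patch", "delete"}:
            continue  # skip non-operation keys such as "parameters"
        # One document per endpoint: summary + description as embedding text
        text = f"{method.upper()} {path}: {op.get('summary', '')} {op.get('description', '')}"
        dense = next(iter(dense_model.embed([text]))).tolist()
        sparse = next(iter(sparse_model.embed([text])))
        points.append(models.PointStruct(
            id=len(points),
            vector={
                "text-dense": dense,
                "text-sparse-new": models.SparseVector(
                    indices=sparse.indices.tolist(),
                    values=sparse.values.tolist()),
            },
            # Rich metadata payload powers the hard-filtering layer
            payload={"service": spec.get("x-service-name"),
                     "method": method.upper(), "path": path,
                     "tags": op.get("tags", [])},
        ))

client.upsert(collection_name="api_endpoints", points=points)
```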
### 2. Search Process
- Intent Recognition: Natural language query processing
- Service Filtering: Optional service-specific search
- Hybrid Matching: Combines semantic similarity with keyword matching
- Result Ranking: Reranks based on cross-encoder scoring
- Progressive Disclosure: Returns endpoint summaries, then full specs on demand
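The reranking step might look like this, assuming FastEmbed's cross-encoder support is used (the actual reranking code in `app/engine.py` may differ; `candidates` is a hypothetical list of endpoint dicts produced by hybrid retrieval):

```python
from fastembed.rerank.cross_encoder import TextCrossEncoder

reranker = TextCrossEncoder("BAAI/bge-reranker-base")

def rerank(query: str, candidates: list[dict], top_k: int = 5) -> list[dict]:
    """Score each (query, endpoint summary) pair; keep the top_k best."""
    texts = [c["summary"] for c in candidates]      # e.g. the top 20 retrieved hits
    scores = list(reranker.rerank(query, texts))    # one relevance score per candidate
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in ranked[:top_k]]
```

The top-20-in, top-5-out shape matches the Performance Tuning defaults below.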
### 3. Agent Interaction Pattern
Agent: "I need to calculate shipping costs"
System: Returns POST /shipping/rates endpoint
Agent: Gets full specification for implementation details
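In client code, this two-step progressive disclosure could look like the following sketch using `requests` (the JSON field names `results` and `service` are assumptions about the response shape):

```python
import requests

BASE = "http://localhost:8000"

# Step 1: find candidate endpoints from a natural-language intent
hits = requests.post(f"{BASE}/api/search",
                     json={"query": "calculate shipping costs"}).json()

# Step 2: fetch the full OpenAPI spec only for the winning service
service = hits["results"][0]["service"]  # assumed response shape
spec = requests.get(f"{BASE}/api/specs/{service}").json()
print(f"Fetched full spec for {service!r}")
```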
## 🛠️ Development
### Adding New API Specifications

- Create a YAML file in `data/specs/` with `x-service-name` metadata (see the example below)
- Run ingestion: `uv run python app/ingest.py`
- Test the search functionality
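A minimal spec skeleton for a new service might look like this; `returns-service` and its single path are made up for illustration, and the root-level `x-service-name` field is what the ingestion step keys on:

```yaml
openapi: 3.0.3
x-service-name: returns-service   # hypothetical new microservice
info:
  title: Returns Service API
  version: "1.0.0"
paths:
  /returns:
    post:
      summary: Create a return request
      tags: [returns]
      responses:
        "201":
          description: Return request created
```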
### Modifying Search Behavior

- Adjust retrieval parameters in `app/engine.py`
- Change embedding models in `app/ingest.py`
- Modify reranking logic in `app/engine.py`
### Performance Tuning
- Vector dimensions: 384 (dense) for memory efficiency
- Retrieval: Top 20 results for high recall
- Reranking: Top 5 results for precision
- Batch processing for ingestion efficiency
## 🔧 Configuration
### Qdrant Settings

- Collection: `api_endpoints`
- Dense Vectors: `text-dense` (384 dims, cosine distance)
- Sparse Vectors: `text-sparse-new` (SPLADE encoding)
- Ports: 6333 (HTTP), 6334 (gRPC)
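Those settings correspond to a Qdrant collection created roughly as follows (a sketch with `qdrant-client`; the repo may create the collection through its ingestion pipeline instead):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")  # HTTP port from docker-compose.yml

client.create_collection(
    collection_name="api_endpoints",
    vectors_config={
        # Named dense vector: 384-dim bge-small embeddings, cosine distance
        "text-dense": models.VectorParams(size=384, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={
        # Named sparse vector holding SPLADE keyword weights
        "text-sparse-new": models.SparseVectorParams(),
    },
)
```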
### Embedding Models
- Dense: BAAI/bge-small-en-v1.5 (semantic similarity)
- Sparse: prithivida/Splade_PP_en_v1 (keyword matching)
- Reranker: BAAI/bge-reranker-base (result scoring)
## 🐛 Troubleshooting
### Common Issues

- Qdrant Connection: Ensure the Docker container is running (`docker compose ps`)
- Model Downloads: Network issues may affect the initial model downloads
- Sparse Encoder: Custom FastEmbed integration bypasses the torch dependency
- Collection Structure: Validate data integrity with `scripts/test_collection.py`
### Debug Commands

```bash
# Check Qdrant status
docker compose logs qdrant

# Test ingestion with verbose logging
uv run python -u app/ingest.py

# Validate collection data integrity
uv run python scripts/test_collection.py

# Validate search functionality
uv run python scripts/test_agent.py --verbose
```
## 📊 Performance Metrics
- Indexing Speed: ~22 documents/second (after the initial model download)
- Search Latency: <100ms for typical queries
- Memory Usage: Efficient 384-dim vectors
- Accuracy: Hybrid approach balances precision and recall
## 🤝 Contributing
- Follow the existing code structure and patterns
- Add comprehensive logging for debugging
- Test with `scripts/test_agent.py` and `scripts/test_collection.py` before submitting changes
- Update documentation for new features
## 📄 License
[Add your license information here]
## 🙏 Acknowledgments
- LlamaIndex for the RAG framework
- Qdrant for vector database capabilities
- FastEmbed for efficient local embeddings
- FastAPI for the web framework