Technology Extractor MCP Server
A lightweight MCP (Model Context Protocol) server that extracts required technologies from job descriptions and resume files and provides difficulty ratings with alternatives.
Features
- Technology Detection: Automatically detects technologies mentioned in text or resume files
- Resume File Support: Parse .txt, .docx, .pdf, and .doc resume files
- Difficulty Ratings: Provides difficulty scores (1-10 scale) for each technology
- Experience Tracking: Three-tier experience validation:
  - `experience_mentioned_in_prompt`: Years specified in job requirements
  - `experience_accounted_for_in_resume`: Years extracted from the resume
  - `experience_validated_via_github`: GitHub-based verification (placeholder for future)
- Smart Alternatives: Suggests alternative technologies with their difficulty ratings
- Simple Schema: Returns clean, structured JSON with no overhead
- Request Logging: Automatic per-request logging with timing, CPU, and memory metrics
Installation
```bash
# Clone the repository
git clone https://github.com/ajay1133/mcp-server-job-description-complexity-score.git
cd mcp-server-job-description-complexity-score

# Install dependencies (includes resume parsing libraries)
pip install -e .
# OR with uv:
uv pip install -e .
```
The following resume parsing dependencies are installed automatically:
- `python-docx` for .docx files
- `PyPDF2` for .pdf files
- `python-magic-bin` for Windows file type detection
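For context, extracting raw text with those libraries looks roughly like this (a sketch, not the server's actual parser; `read_resume_text` is a hypothetical helper):
```python
from docx import Document     # python-docx
from PyPDF2 import PdfReader

def read_resume_text(path: str) -> str:
    """Extract plain text from a .txt, .docx, or .pdf resume."""
    if path.endswith(".docx"):
        return "\n".join(p.text for p in Document(path).paragraphs)
    if path.endswith(".pdf"):
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    with open(path, encoding="utf-8", errors="ignore") as fh:  # plain-text fallback
        return fh.read()
```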
Usage
As MCP Server
python mcp_server/server.py
Self-Test
python mcp_server/server.py --self-test
As Python Module
```python
from mcp_server.simple_tech_extractor import SimpleTechExtractor

extractor = SimpleTechExtractor()
result = extractor.extract_technologies("Senior Full-Stack Engineer with React and Node.js")
print(result)
# {
#   "technologies": {
#     "react": {
#       "difficulty": 5.2,
#       "experience_required": 2.5,
#       "mentioned_in_prompt": true,
#       "category": "frontend",
#       "alternatives": {
#         "vue": {"difficulty": 4.8, "experience_required": 2.0},
#         "angular": {"difficulty": 6.5, "experience_required": 3.0}
#       }
#     },
#     "node": {
#       "difficulty": 5.0,
#       "experience_required": 2.5,
#       "mentioned_in_prompt": true,
#       "category": "backend",
#       "alternatives": {
#         "python_fastapi": {"difficulty": 4.5, "experience_required": 2.0}
#       }
#     }
#   }
# }
```
Response Schema
```jsonc
{
  "technologies": {
    "<tech_name>": {
      "difficulty": 5.2,                // 1-10 scale
      "category": "frontend",           // category
      "mentioned_explicitly": true,     // true if tech name appears in prompt/resume
      "experience_estimate": "5 years", // estimated from resume (priority) or prompt; can be years or a level like "senior"/"junior"
      "alternatives": {
        "<alt_tech_name>": {
          "difficulty": 4.8,
          "experience_estimate": "5 years" // propagated global estimate when present
        }
      },
      "experience_mentioned_in_prompt": 5.0,      // explicit years for this tech in the prompt (legacy compatibility)
      "experience_accounted_for_in_resume": 3.0,  // explicit years for this tech in the resume (legacy compatibility)
      "experience_validated_via_github": null     // placeholder for future
    }
  }
}
```
Notes:
- `experience_estimate` prefers the resume over the prompt. If explicit years are not available, seniority terms like "senior", "mid", or "junior" are used when present.
- Alternatives receive the global estimate (overall years or seniority) when available; tech-specific years are not applied to alternatives.
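In code terms, the documented priority order looks roughly like this (an illustrative sketch, not the extractor's actual implementation):
```python
SENIORITY_TERMS = ("senior", "mid", "junior")

def resolve_experience_estimate(resume_years, prompt_years, text):
    """Prefer explicit resume years, then prompt years, then a seniority term."""
    if resume_years is not None:
        return f"{resume_years} years"
    if prompt_years is not None:
        return f"{prompt_years} years"
    lowered = text.lower()
    for term in SENIORITY_TERMS:
        if term in lowered:
            return term
    return None
```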
Supported Technologies
Frontend
- React, Vue, Angular, Next.js, Svelte, TypeScript
Backend
- Node.js, FastAPI, Flask, Django, Golang, Java Spring, Ruby on Rails
Database
- PostgreSQL, MySQL, MongoDB, Redis, DynamoDB, Cassandra
Infrastructure
- Docker, Kubernetes, AWS, Lambda
Messaging
- Kafka, RabbitMQ
Search
- Elasticsearch
License
MIT

Training Data Workflow
3) (Optional) Run the active learning loop
```powershell
python active_learning_loop.py
```
4) (Optional) Analyze real GitHub repos for actual LOC/tech/hours
```powershell
python analyze_github_repos.py
```
5) Merge all sources and de-duplicate by text
```powershell
python merge_training_data.py --out data\merged_training_data.jsonl
```
6) Train models on the merged dataset
```powershell
python train_software_models.py --data data\merged_training_data.jsonl --out models\software
```
7) Validate and run the server
```powershell
python test_new_schema.py
python mcp_server\server.py --self-test
python mcp_server\server.py   # long-running MCP server
```
Training with System Design Patterns (Recommended)
NEW: Generate high-quality training data from system design patterns:
```bash
# 1. Generate training data from patterns (47 examples covering 10 major apps)
python generate_training_from_patterns.py --output data/training_from_patterns.jsonl

# 2. Merge with existing data
python merge_training_data.py --inputs data/training_from_patterns.jsonl data/software_training_data.jsonl --out data/merged_training_data.jsonl

# 3. Train models with enriched dataset
python train_software_models.py --data data/merged_training_data.jsonl --out models/software

# 4. Test the improved models
python run_requirements_cli.py --text "Build a Twitter clone with real-time feeds"
```
This approach uses the comprehensive system design knowledge base to generate realistic training examples for:
- Twitter, Instagram, YouTube, WhatsApp, Uber, Netflix, Airbnb, E-commerce, Slack, TikTok patterns
- Comprehensive tech stacks (15-22 technologies per example)
- Production-ready microservice architectures (10-15 services)
- Accurate complexity estimates (500-1200 hours for platforms)
See the system design knowledge base documentation for complete details.
Current Status
✅ Models trained with system design knowledge - Ready to use!
- 401 training examples (pattern-based + existing data)
- 49 technology labels (including infrastructure: kafka, redis, docker, cdn, monitoring, etc.)
- 10 application patterns recognized (Twitter, YouTube, Uber, etc.)
- Production-ready architecture recommendations
Test the system:
python test_software_scorer.py
The MCP server (server.py) is already configured to use SoftwareComplexityScorer but will fail until models are trained.
Optional: Hiring vs Build classifier (binary)
To cleanly separate output schemas for job descriptions vs build requirements, you can train a small binary classifier:
Dataset (JSONL): one object per line with fields { "text": str, "label": int } where label=1 for hiring/job-description and label=0 for build/implementation. An example is in data/hiring_build_training_data.example.jsonl.
Train (PowerShell):
```powershell
$env:HIRING_BUILD_DATA = "data/hiring_build_training_data.jsonl"
python train_hiring_classifier.py
```
This writes models/software/hiring_build_classifier.joblib. When present, SoftwareComplexityScorer will use it to detect hiring prompts with confidence thresholds (>=0.65 → hiring, <=0.35 → build) and fall back to the existing heuristic when confidence is low.
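The decision rule is a confidence band around the model's probability. A minimal sketch of that logic (assuming the saved artifact is a scikit-learn pipeline exposing `predict_proba`; `looks_like_hiring` is a stand-in for the scorer's real heuristic):
```python
import joblib

HIRING_THRESHOLD = 0.65  # >= 0.65 -> classify as hiring
BUILD_THRESHOLD = 0.35   # <= 0.35 -> classify as build

def looks_like_hiring(text: str) -> bool:
    """Stand-in keyword heuristic used when the model is unsure."""
    keywords = ("hiring", "job description", "candidate", "years of experience")
    return any(k in text.lower() for k in keywords)

def predict_is_hiring(text: str):
    """Return (is_hiring, probability, method) following the documented thresholds."""
    clf = joblib.load("models/software/hiring_build_classifier.joblib")
    proba = clf.predict_proba([text])[0][1]  # P(label == 1, i.e. hiring)
    if proba >= HIRING_THRESHOLD:
        return True, proba, "model"
    if proba <= BUILD_THRESHOLD:
        return False, proba, "model"
    return looks_like_hiring(text), proba, "heuristic"  # low confidence: fall back
```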
Recommended: curate ~500–1,000 labeled examples for good separation. You can bootstrap with heuristics and then hand-correct.
Workflow: evaluation, active learning, and threshold tuning
1. Evaluate classifier vs heuristic baseline:
After training with at least 100–200 examples, measure performance:
```powershell
$env:HIRING_BUILD_DATA = "data/hiring_build_training_data.jsonl"
python evaluate_hiring_classifier.py --test-size 0.2
```
This generates:
- Precision/recall/F1 report for model and heuristic
- `logs/hiring_classifier_pr_curve.png` and `logs/hiring_classifier_roc_curve.png`
- `logs/hiring_classifier_evaluation.json` with AUC, AP, and recommended thresholds
2. Active learning to grow dataset efficiently:
Surface uncertain examples (probabilities 0.35–0.65) for manual labeling:
python active_learning_hiring.py --unlabeled data/unlabeled_prompts.txt --limit 50 --out data/uncertain_samples.jsonl
Manually edit data/uncertain_samples.jsonl to add "label": 0 or "label": 1, then merge:
```powershell
type data\hiring_build_training_data.jsonl data\uncertain_samples.jsonl > data\merged.jsonl
$env:HIRING_BUILD_DATA = "data\merged.jsonl"
python train_hiring_classifier.py
```
3. Tune decision threshold to minimize misclassification cost:
If false negatives (hiring → build) are more expensive than false positives (build → hiring), tune the threshold:
python tune_hiring_threshold.py --data data/hiring_build_training_data.jsonl --cost-fp 1.0 --cost-fn 2.0 --write-config
This finds the optimal threshold and writes it to config/hiring_threshold.json. Update _predict_is_hiring in the scorer to use the new threshold (default is 0.65 for hiring, 0.35 for build).
Example: if tuning suggests 0.72 for hiring, change:
```python
if proba >= 0.72:  # was 0.65
    return True, proba, "model"
```
Iterative improvement loop:
- Train initial model with ~100 examples
- Evaluate and identify weak areas
- Use active learning to label uncertain examples
- Retrain with expanded data
- Tune threshold for production cost function
- Repeat until precision/recall targets are met
Legacy Multi-Profession Scorer (deprecated)
The original ComplexityScorer handled both software and non-software jobs with online search heuristics and profession categorization. This approach is being phased out.
An MCP (Model Context Protocol) server that predicts the complexity of programming tasks and job requirements using a machine learning model. Scores are calibrated around a baseline of 100 (roughly "Replit Agent 3" difficulty) and include estimated completion time.
Features (legacy scorer)
- ML-based predictions for complexity score and time-to-complete
- Calibrated scoring (baseline 100) with human-friendly difficulty labels
- Detected factor hints (frontend, backend, database, real-time, etc.)
- Time estimates in hours/days/weeks with uncertainty range
- Duration extraction: Automatically detects time requirements from user prompts (e.g., "couple of days", "3 weeks")
- Smart time calculation: Distinguishes between project deadlines (8-hour workdays) and continuous care (24/7)
- Job categorization: Automatically deduces job category and sub-category from requirements
- Extended profession database: Falls back to comprehensive profession database (100+ professions) when primary categorization fails
- MCP tool integration for assistants that support MCP
Prerequisites
- Python: Version 3.11 or higher
- pip: Python package installer (usually comes with Python)
- uv (recommended): Fast Python package installer and resolver
  - Install via PowerShell:
    ```powershell
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
    ```
  - Or via pip:
    ```bash
    pip install uv
    ```
Installation
1. Clone the repository (if not already done):
   ```bash
   git clone https://github.com/ajay1133/mcp-server-job-description-complexity-score.git
   cd mcp-server-job-description-complexity-score
   ```
2. Install dependencies using uv (recommended):
   ```bash
   uv pip install -e .
   ```
   Or using pip:
   ```bash
   pip install -e .
   ```
Usage
1) Train the model (first-time or after updating data)
Models are not committed; you must train locally before use:
uv run train-model
Alternative:
python train_model.py
This creates the following files under models/:
- `tfidf_vectorizer.joblib`
- `score_model.joblib`
- `time_model.joblib`
2) Running the MCP Server
Start the MCP server by running:
uv run mcp-server
Alternative:
python -m mcp_server.server
If models are missing, the server will warn you to run train_model.py first.
2b) Inspect the server with MCP Inspector (recommended)
Requires Node.js with npx:
uv run mcp-inspect-server
This launches the MCP Inspector and spawns the server via uv (stdio), ensuring it uses the same Python environment and dependencies managed by uv. Try the score_complexity tool interactively.
3) Using as a Standalone Tool
You can also import and use the complexity scorer in your Python code:
```python
from mcp_server.complexity_scorer import ComplexityScorer

scorer = ComplexityScorer()
result = scorer.analyze_text("Build a full-stack web application with React frontend and Django backend")
print(f"Complexity Score: {result['complexity_score']}")
print(f"Difficulty: {result['difficulty_rating']}")
print(f"Summary: {result['summary']}")
```
4) Running Tests
Run the test suite:
python test_scoring.py
5) Running the Demo
See the time estimation feature in action:
uv run demo
This prints several examples from simple to expert-level and shows predicted time ranges.
MCP Tool: score_complexity
Description
Analyzes programming requirements or job descriptions and provides a complexity score calibrated against Replit Agent 3's capabilities.
Parameters
- `requirement` (string): A text description of the programming requirement or job description
Returns
A dictionary containing:
- `complexity_score`: Numerical score (baseline reference = 100)
- `detected_factors`: Map of factors and relevance signals (e.g., matches, relevance)
- `task_size`: simple | moderate | complex | very_complex | expert
- `difficulty_rating`: Human-friendly description
- `job_category`: Deduced job category (e.g., "Software Developer", "Doctor", "Plumber")
- `job_sub_category`: Deduced job specialization (e.g., "Full Stack Developer (React + Node.js)", "Gastroenterologist", "General Plumber")
- `category_lookup_method`: How the category was determined: "primary_pattern", "extended_database", "online_search", or "default_fallback"
- `estimated_completion_time`: Object with `hours`, `days`, `weeks`, `best_estimate`, `time_range`, `assumptions`
- `summary`: Brief summary including time
- `model_type`: Always `"machine_learning"` for this version
Example Request
```json
{
  "requirement": "Create a RESTful API with PostgreSQL database, user authentication, and real-time notifications"
}
```
Example Response
```json
{
  "complexity_score": 109.8,
  "baseline_reference": 100,
  "detected_factors": {
    "database": {"matches": 1, "relevance": 0.22},
    "api_integration": {"matches": 1, "relevance": 0.18},
    "security": {"matches": 1, "relevance": 0.14}
  },
  "task_size": "complex",
  "difficulty_rating": "Similar to Replit Agent 3 capabilities",
  "job_category": "Software Developer",
  "job_sub_category": "Backend Developer",
  "category_lookup_method": "primary_pattern",
  "estimated_completion_time": {
    "hours": 9.1,
    "days": 1.14,
    "weeks": 0.23,
    "best_estimate": "1.1 days",
    "time_range": "1.1-1.4 days",
    "assumptions": "Time estimate based on task complexity and typical completion times for similar requirements"
  },
  "summary": "Complexity score: 109.80. Complex task. Primary complexity factors: database, api integration, security. Estimated completion time: 1.1 days.",
  "model_type": "machine_learning"
}
```
Complexity Factors
The model reports hints for the following factor categories (non-exhaustive examples):
- Basic Web: HTML, CSS, static sites
- Database: PostgreSQL, MySQL, MongoDB, ORMs
- API Integration: REST, GraphQL, webhooks, OAuth
- Frontend: React, Vue, Angular, TypeScript
- Backend: Django, Flask, FastAPI, Node.js
- Real-time: WebSockets, streaming, collaborative features
- AI/ML: ML pipelines, model training, OpenAI, NLP
- Deployment: CI/CD, Docker, Kubernetes, cloud platforms
- Security: Auth, encryption, JWT, RBAC
- Testing: Unit, integration, E2E, coverage
- Scalability: Caching, queues, load balancing, distributed systems
Configure factor categories (no hardcoded lists)
You can customize the factor categories without code changes:
- Edit `config/complexity_factors.json` to add/remove categories or keywords
- Or set `MCP_COMPLEXITY_FACTORS` to point to a custom JSON file
- Or pass a mapping directly when constructing the scorer: `ComplexityScorer(complexity_factors=...)`
If no config is found, sensible built-in defaults are used.
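For illustration, the config plausibly maps category names to keyword lists (an assumed schema, based on the factor categories listed above; check the bundled `config/complexity_factors.json` for the authoritative shape):
```json
{
  "database": ["postgresql", "mysql", "mongodb", "orm"],
  "api_integration": ["rest", "graphql", "webhooks", "oauth"],
  "frontend": ["react", "vue", "angular", "typescript"],
  "security": ["auth", "encryption", "jwt", "rbac"]
}
```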
Job Categorization
The scorer automatically deduces the job category and sub-category from the requirement text. This feature works for both software development roles and general professions.
Supported Categories
Software Development:
- Full Stack Developer (React + Node.js, Vue.js, Angular, MERN Stack)
- Frontend Developer (React, Vue, Angular)
- Backend Developer (Node.js, Django, Flask, FastAPI)
- Mobile Developer (React Native, Flutter, iOS, Android)
- AI/ML Developer
- Data Scientist
- DevOps Engineer
Healthcare:
- Doctor (Gastroenterologist, Cardiologist, Neurologist, Orthopedic Surgeon, Dermatologist, Pediatrician, Ophthalmologist, General Physician)
- Nurse (Registered Nurse, Licensed Practical Nurse)
Trades & Services:
- Plumber (Emergency Plumber, General Plumber)
- Electrician
- Carpenter
Child & Home Care:
- Child Care Provider (Nanny, Babysitter, Child Care Specialist)
- Housekeeper
- Caregiver (Home Health Aide, Home Health Aide with Housekeeping)
Professional Services:
- Lawyer (Criminal Defense, Corporate, General Practice)
- Teacher (Mathematics, Science, Language Arts)
- Accountant (CPA, General)
- Driver (Ride-share, Commercial, Personal)
Extended Professions (100+ supported via fallback database):
- Medical: Veterinarian, Dentist, Therapist, Psychologist, Pharmacist, Paramedic
- Creative: Photographer, Videographer, Graphic Designer, Writer, Musician, Artist
- Culinary: Chef, Cook, Baker, Bartender
- Trades: Mechanic, HVAC Technician, Welder, Mason, Roofer, Painter
- Services: Hairdresser, Barber, Massage Therapist, Personal Trainer
- Real Estate: Architect, Engineer, Surveyor, Contractor, Realtor
- Business: Consultant, Analyst, Banker, Broker
- Security: Security Guard, Firefighter, Police Officer
- Transportation: Pilot, Flight Attendant, Delivery Driver
- Agriculture: Farmer, Gardener, Landscaper, Florist
- And many more...
Example Job Category Deductions
| Requirement | Job Category | Job Sub-Category |
|---|---|---|
| "I need a software developer who can develop a video streaming application in React Js and Node Js" | Software Developer | Full Stack Developer (React + Node.js) |
| "I have problems with my liver" | Doctor | Gastroenterologist |
| "I need someone who can look at my child while I am gone for work" | Child Care Provider | Child Care Specialist |
| "Looking for a data scientist with machine learning experience" | Data Scientist | Machine Learning Specialist |
| "Need a mobile app developer for iOS and Android using Flutter" | Software Developer | Mobile Developer (Flutter) |
| "I need someone to look after my dad, he can barely walk due to diabetes and cannot cook his meals" | Caregiver | Home Health Aide with Housekeeping |
Category Lookup Method
The category_lookup_method field tracks how the job category was determined, providing transparency and auditability.
Lookup Methods
| Method | Description | Icon | Example |
|---|---|---|---|
| `primary_pattern` | Detected using primary pattern matching for common professions with context-aware subcategories | 🎯 | Software Developer → "Full Stack Developer (React + Node.js)" |
| `extended_database` | Found in extended profession database containing 100+ professions across multiple domains | 📚 | Veterinarian, Photographer, Chef, Mechanic |
| `online_search` | Retrieved via online keyword matching fallback when primary patterns and extended database don't match | 🌐 | Uncommon professions detected via action keywords (repair, design, install, etc.) |
| `default_fallback` | No match found in any database or online search, using generic categorization | ❓ | Very uncommon or vague requirements |
Online Search Capability
When the primary patterns and extended database fail to categorize a profession, the system automatically uses online search logic:
Triggers:
- When no primary pattern matches the requirement
- When `detected_factors` is empty (no technical keywords found)
- For default fallback cases before settling on "General Professional"
Search Strategy:
- Primary Search: Attempts DuckDuckGo API instant answer lookup (when available)
- Keyword Fallback: Matches action-based keywords in the text:
- Action verbs: repair, fix, install, design, build, create, organize, plan, manage, coordinate, help, service
- Rare professions: sommelier, curator, librarian, auctioneer, appraiser, jeweler, locksmith, upholsterer, taxidermist, mortician, interpreter, translator, stenographer, actuary, statistician, meteorologist, geologist, astronomer, botanist, zoologist, ecologist
Examples:
- "I need someone to repair my antique clock" →
Repair Technician(via "repair" keyword) ✅ online_search - "Need a sommelier for my restaurant" →
Sommelier(via profession keyword) ✅ online_search - "Help me organize my event" →
Event Organizer(via "organize" keyword) ✅ online_search - "I need to design a logo" →
Designer(via "design" keyword) ✅ online_search
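A minimal sketch of the keyword-fallback idea (illustrative only; the table contents and function name here are assumptions, not the scorer's actual internals):
```python
RARE_PROFESSIONS = {"sommelier": "Sommelier", "curator": "Curator", "locksmith": "Locksmith"}
ACTION_KEYWORDS = {
    "repair": "Repair Technician",
    "fix": "Repair Technician",
    "design": "Designer",
    "organize": "Event Organizer",
    "install": "Installation Technician",
}

def keyword_fallback_category(text: str):
    """Return a category from rare-profession or action-verb keywords, else None."""
    lowered = text.lower()
    for word, category in RARE_PROFESSIONS.items():
        if word in lowered:
            return category
    for verb, category in ACTION_KEYWORDS.items():
        if verb in lowered:
            return category
    return None
```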
Usage
The category_lookup_method field helps you:
- Track categorization performance: See which method was used for each request
- Identify gaps: Find professions frequently hitting extended database or fallback
- Audit accuracy: Verify categorization logic is working as expected
- Improve coverage: Add frequently-requested professions to primary patterns
Example
```json
{
  "job_category": "Veterinarian",
  "job_sub_category": "Animal Healthcare Professional",
  "category_lookup_method": "extended_database"
}
```
This indicates the profession was found in the extended database (Tier 2), not through primary pattern matching.
Duration Extraction
The scorer automatically detects duration requirements mentioned in the user's prompt and adjusts time estimates accordingly.
Supported Duration Patterns
Numeric Patterns:
- "3 hours", "2 days", "4 weeks", "2 months"
Word-based Patterns:
- "couple of days" = 2 days
- "few days" = 3 days
- "couple of weeks" = 2 weeks
- "weekend" = 2 days
Special Cases:
- "overnight" = 12 hours
- "all day" / "full day" = 8 hours
- "half day" = 4 hours
Project Deadlines vs Continuous Care
The system intelligently distinguishes between:
- Project Deadlines (8-hour workdays):
  - Detected when phrases like "needs to be done in", "deadline", "complete in" are present
  - Example: "Build a React app, needs to be done in 3 days" = 24 work hours (3 days × 8 hours)
- Continuous Care (24/7):
  - Applied to the Caregiver, Nurse, Child Care Provider, and Housekeeper categories
  - Example: "Need someone to look after my dad for a couple of days" = 48 hours (2 days × 24 hours)
Duration Extraction Examples
| Prompt | Detected Duration | Job Category | Time Estimate |
|---|---|---|---|
| "I need someone to look after my dad... I will be gone for a couple of days" | "couple of day" | Caregiver | 2.0 days (continuous care) |
| "Build a React web app, needs to be done in 3 days" | "3 day" | Software Developer | 3.0 days |
| "Need a babysitter for the weekend" | "weekend" | Child Care Provider | 2.0 days (continuous care) |
| "Need a nurse for my mother's recovery, will need help for 2 weeks" | "2 week" | Nurse | 2.0 weeks (continuous care) |
Scoring Interpretation
- < 50: Much easier than Replit Agent 3 capabilities
- 50-80: Easier than Replit Agent 3 capabilities
- 80-120: Similar to Replit Agent 3 capabilities (baseline)
- 120-150: More challenging than Replit Agent 3 capabilities
- > 150: Significantly more challenging than Replit Agent 3 capabilities
Time Estimation
The scorer provides completion time estimates based on the complexity score, assuming the developer is skilled in using AI coding agents like Replit.
Estimation Logic
- Baseline: A task with a complexity score of 100 is estimated at 8 hours (1 working day)
- Linear Scaling: Time scales proportionally with complexity score
- Task Size Adjustments: Non-linear adjustments based on task complexity:
- Simple tasks: 0.6x multiplier (complete faster than linear)
- Moderate tasks: 0.8x multiplier
- Complex tasks: 1.0x multiplier (linear)
- Very complex tasks: 1.3x multiplier (extra coordination overhead)
- Expert tasks: 1.6x multiplier (significant architectural overhead)
- Time Range: Provides a range (best estimate to 1.3x) to account for uncertainty
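Put together, the documented rule is linear scaling plus a size multiplier (a sketch; the shipped regression models learn these relationships rather than hard-coding them):
```python
SIZE_MULTIPLIERS = {
    "simple": 0.6, "moderate": 0.8, "complex": 1.0,
    "very_complex": 1.3, "expert": 1.6,
}
BASELINE_HOURS = 8.0  # a score of 100 ~ one working day

def estimate_hours(score: float, task_size: str):
    """Return (best_estimate, upper_bound) in hours."""
    best = (score / 100.0) * BASELINE_HOURS * SIZE_MULTIPLIERS[task_size]
    return best, best * 1.3  # 1.3x upper bound for uncertainty

print(estimate_hours(30, "simple"))    # (1.44, ~1.87) hours
print(estimate_hours(100, "complex"))  # (8.0, 10.4) hours
```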
Time Format
The estimate automatically selects the most appropriate time unit:
- Minutes: For tasks under 1 hour
- Hours: For tasks 1-8 hours
- Days: For tasks 1-5 days (8-hour workdays)
- Weeks: For tasks over 5 days (5-day work weeks)
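Those cutoffs translate directly into a formatting rule (a sketch):
```python
def format_estimate(hours: float) -> str:
    """Pick the most readable unit for a duration given in work hours."""
    if hours < 1:
        return f"{hours * 60:.0f} minutes"
    if hours <= 8:
        return f"{hours:.1f} hours"
    days = hours / 8                 # 8-hour workdays
    if days <= 5:
        return f"{days:.1f} days"
    return f"{days / 5:.1f} weeks"   # 5-day work weeks

print(format_estimate(9.1))  # "1.1 days", matching the example response above
```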
Example Time Estimates
| Complexity Score | Task Size | Estimated Time |
|---|---|---|
| 30 | Simple | ~1.4 hours |
| 75 | Moderate | ~4.8 hours |
| 100 | Complex | ~8 hours (1 day) |
| 150 | Very Complex | ~2.4 days |
| 200 | Expert | ~3.2 days |
Project Structure
```
mcp_complexity_scorer/
├── mcp_server/
│   ├── __init__.py
│   ├── server.py                 # MCP server implementation
│   └── complexity_scorer.py      # Core ML-based scoring logic
├── complexity_mcp_project/
│   ├── __init__.py
│   ├── settings.py               # Django settings
│   ├── urls.py
│   ├── wsgi.py
│   └── asgi.py
├── models/                       # Trained artifacts (created by training)
│   ├── tfidf_vectorizer.joblib
│   ├── score_model.joblib
│   └── time_model.joblib
├── train_model.py                # Train TF-IDF + regressors
├── training_data.py              # Labeled examples and validation ranges
├── demo_time_estimation.py       # Demo runner printing examples
├── pyproject.toml                # Project dependencies
├── test_scoring.py               # Scripted tests (ranges)
└── README.md                     # This file
```
Development
Add or refine training data
- Edit `training_data.py` and append new labeled examples.
- Retrain models: `uv run train-model`
- Re-run the demo/tests.
License
See repository for license information.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Repository
https://github.com/ajay1133/mcp-server-job-description-complexity-score
Docker Deployment
The project includes Docker support for containerized deployment with both MCP server and Flask API modes.
Building the Docker Image
```bash
# Build the image
docker build -t mcp-complexity-scorer:latest .

# Or build with a specific tag
docker build -t your-dockerhub-username/mcp-complexity-scorer:dev .
```
Running with Docker
MCP Server Mode (default):
docker run -p 8000:8000 mcp-complexity-scorer:latest
Flask API Mode:
docker run -p 8000:8000 -e FLASK_MODE=1 mcp-complexity-scorer:latest
With custom configuration:
```powershell
docker run -p 8000:8000 `
  -e FLASK_MODE=1 `
  -e HOST=0.0.0.0 `
  -e PORT=8000 `
  -v ${PWD}/logs:/app/logs `
  mcp-complexity-scorer:latest
```
Docker Compose
For local development with logs mounted:
docker-compose up
The docker-compose.yml includes:
- Volume mounts for logs persistence
- Environment variables for configuration
- Port mapping to localhost:8000
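As a reference point, a minimal compose file consistent with the docker run examples above might look like this (a sketch; the repository's docker-compose.yml is authoritative):
```yaml
services:
  complexity-scorer:
    build: .
    image: mcp-complexity-scorer:latest
    ports:
      - "8000:8000"        # map to localhost:8000
    environment:
      - FLASK_MODE=1       # serve the Flask API instead of stdio MCP
      - HOST=0.0.0.0
      - PORT=8000
    volumes:
      - ./logs:/app/logs   # persist request logs
```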
Flask API Endpoints
When running in Flask mode (FLASK_MODE=1):
- GET `/health` - Health check endpoint
  ```bash
  curl http://localhost:8000/health
  ```
- POST `/score` - Analyze complexity
  ```bash
  curl -X POST http://localhost:8000/score \
    -H "Content-Type: application/json" \
    -d '{"requirement": "Build a React dashboard with Stripe payments"}'
  ```
CI/CD Pipeline
The project uses GitHub Actions for continuous integration and deployment across multiple environments.
Pipeline Overview
Triggers:
- Push to: `master`, `development`, `qa`, `uat` branches
- Pull requests to any of these branches
Jobs:
- Test - Runs on Python 3.10, 3.11, 3.12
- Lint - Code quality checks (flake8, black, isort)
- Docker - Build and push images (on push only)
- Deploy - Environment-specific deployment (placeholder)
Branch → Environment Mapping
| Branch | Environment | Docker Tag | Description |
|---|---|---|---|
| `master` | Production | `prod-{sha}`, `prod-latest` | Stable production releases |
| `development` | Development | `dev-{sha}`, `dev-latest` | Active development |
| `qa` | QA | `qa-{sha}`, `qa-latest` | Quality assurance testing |
| `uat` | UAT | `uat-{sha}`, `uat-latest` | User acceptance testing |
Setting Up CI/CD
1. Configure Docker Hub Secrets
Add these secrets to your GitHub repository (Settings → Secrets and variables → Actions):
- `DOCKER_USERNAME`: Your Docker Hub username
- `DOCKER_PASSWORD`: Docker Hub access token (recommended) or password
To create a Docker Hub access token:
- Log in to https://hub.docker.com
- Go to Account Settings → Security → Access Tokens
- Click "New Access Token"
- Set permissions: Read, Write, Delete
- Copy the token and add it as the `DOCKER_PASSWORD` secret
2. Workflow Steps
The pipeline automatically:
- ✅ Runs tests across Python 3.10, 3.11, 3.12
- ✅ Checks code style with flake8, black, isort
- ✅ Builds Docker images with environment-specific tags
- ✅ Pushes to Docker Hub on successful builds
- ℹ️ Notifies deployment (customize for your infrastructure)
3. Running Locally
Test the CI steps locally before pushing:
```bash
# Install dev dependencies
pip install pytest pytest-cov flake8 black isort

# Run tests
pytest tests/ -v --cov=mcp_server

# Check linting
flake8 mcp_server/
black --check mcp_server/
isort --check-only mcp_server/

# Build Docker
docker build -t mcp-complexity-scorer:test .
```
Deployment
After Docker images are pushed, customize the deploy job in .github/workflows/ci-cd.yml to:
- Deploy to Kubernetes clusters
- Update ECS task definitions
- Trigger Azure Container Instances
- Or your preferred container orchestration platform
Example Kubernetes deployment:
```yaml
- name: Deploy to Kubernetes
  run: |
    kubectl set image deployment/complexity-scorer \
      app=${{ secrets.DOCKER_USERNAME }}/mcp-complexity-scorer:${{ env.IMAGE_TAG }} \
      --namespace=${{ env.ENV_NAME }}
```
Monitoring CI/CD
- View workflow runs: Repository → Actions tab
- Check build logs: Click on any workflow run
- Docker images: https://hub.docker.com/r/YOUR_USERNAME/mcp-complexity-scorer