cv-mcp-server-demo

nudro/cv-mcp-server-demo

3.1

If you are the rightful owner of cv-mcp-server-demo and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.

The Computer Vision MCP Demo is a comprehensive showcase of real-time person detection, tracking, and activity recognition using YOLO models, designed for integration with the GitHub MCP server and as a standalone demo for local computer vision tasks.

Computer Vision MCP Demo

A comprehensive computer vision demo showcasing real-time person detection, tracking, and activity recognition using YOLO models. This repo is designed for use with the GitHub MCP server (as provided by Cursor IDE) and as a standalone demo for local computer vision tasks.

šŸš€ Features

  • Real-time Person Detection: YOLO-based person detection with webcam integration
  • Activity Recognition: X-CLIP powered action recognition for 15+ activities
  • Object Tracking: Persistent person tracking across video frames
  • Multiple Demo Scripts: Standalone scripts for different use cases

šŸ“‹ Prerequisites

  • Python 3.8+
  • Apple Silicon Mac (M1/M2/M3) for optimal performance
  • Webcam access
  • Git

šŸ› ļø Installation

  1. Clone the repository:

    git clone https://github.com/nudro/cv-mcp-server-demo.git
    cd cv-mcp-server-demo
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download YOLO models (automatically on first run):

    • YOLO11s.pt (high accuracy)
    • YOLO11n.pt (fast inference)

šŸŽÆ Quick Start

Person Detection Only

python direct_webcam_person_detection.py

Activity Recognition

python direct_activity_recognition.py

MCP Server (for advanced users)

If you want to run your own MCP server (not required for most Cursor users):

python cv_mcp_server_advanced.py

Note: For most users, the MCP server is provided by Cursor IDE and you do not need to run your own server. Use the local scripts for direct computer vision demos.

šŸ“ Project Structure

cv-mcp-server-demo/
ā”œā”€ā”€ README.md                           # This file
ā”œā”€ā”€ requirements.txt                    # Python dependencies
ā”œā”€ā”€ cv_mcp_server_advanced.py           # (Optional) Standalone MCP server implementation
ā”œā”€ā”€ direct_webcam_person_detection.py   # Person detection demo
ā”œā”€ā”€ direct_activity_recognition.py      # Activity recognition demo
ā”œā”€ā”€ ultralytics_action_recognition.py   # Ultralytics action recognition example
ā”œā”€ā”€ ultralytics_interactive_tracker.py  # Ultralytics interactive tracker example
ā”œā”€ā”€ cursor_mcp_config.json              # Cursor IDE MCP configuration
ā”œā”€ā”€ mcp_config.json                     # Standard MCP configuration
ā”œā”€ā”€ LICENSE                             # MIT License
ā”œā”€ā”€ .gitignore                          # Git ignore file

šŸŽ® Usage

Person Detection

  • Purpose: Real-time person detection using YOLO
  • Features:
    • Person-only detection (class 0)
    • Bounding box visualization
    • Confidence scores
    • Center point tracking
  • Controls: Press 'q' to quit

Activity Recognition

  • Purpose: Detect and classify human activities
  • Supported Actions:
    • walking, running, sitting, standing, dancing
    • jumping, waving, clapping, lying down, cooking
    • reading, writing, typing, exercising, stretching
  • Features:
    • Person tracking with unique IDs
    • Action classification with confidence scores
    • Temporal analysis (8-frame sequences)
  • Controls: Press 'q' to quit

MCP Server (Advanced)

  • Purpose: Model Context Protocol server for AI tool integration
  • Features:
    • Tool registration for computer vision tasks
    • Resource management
    • Async processing capabilities
  • Configuration: Use provided config files for IDE integration
  • Note: Most users should use the MCP server provided by Cursor IDE.

šŸ”§ Configuration

Cursor IDE Integration

Copy cursor_mcp_config.json to your Cursor configuration directory if you want to use your own server:

cp cursor_mcp_config.json ~/.cursor/mcp_servers/

Standard MCP Configuration

Use mcp_config.json for standard MCP client integration.

šŸ“Š Performance

  • Detection Speed: ~15ms per frame (60+ FPS)
  • Action Recognition: Every 10 frames for efficiency
  • Device: Optimized for Apple Silicon (MPS)
  • Memory: Efficient GPU memory usage

šŸ› ļø Development

Adding New Actions

Edit direct_activity_recognition.py and modify the ACTION_LABELS list:

ACTION_LABELS = [
    "walking", "running", "sitting", "standing", "dancing", 
    "jumping", "waving", "clapping", "lying down", "cooking",
    "reading", "writing", "typing", "exercising", "stretching",
    "your_new_action"  # Add here
]

Customizing Detection

Modify confidence thresholds and detection parameters in the demo scripts:

# Person detection confidence
results = yolo.track(frame, persist=True, classes=[0], conf=0.5)

# Action recognition confidence
if confidence > 0.3:  # Adjust threshold

šŸ¤ Contributing & Extending

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

Extending with MCP Tools

If you want to register your own tools with the GitHub MCP server (as used by Cursor), see the Cursor documentation or GitHub MCP server docs for plugin/tool API details.

šŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

šŸ™ Acknowledgments

  • Ultralytics: YOLO models and examples
  • Microsoft: X-CLIP action recognition model
  • OpenCV: Computer vision framework
  • PyTorch: Deep learning framework

šŸ› Troubleshooting

Common Issues

  1. Webcam not found:

    • Ensure webcam permissions are granted
    • Check if another application is using the webcam
  2. Model download issues:

    • Check internet connection
    • Models download automatically on first run
  3. Performance issues:

    • Ensure you're using Apple Silicon for optimal performance
    • Close other GPU-intensive applications
  4. MCP server errors:

    • If using Cursor, ensure the GitHub MCP server is running (usually automatic)
    • If running your own server, check MCP library version compatibility and configuration files

Support

For issues and questions:

  1. Check the troubleshooting section
  2. Review the code comments
  3. Open an issue on GitHub

šŸ“ˆ Future Enhancements

  • Multi-person activity recognition
  • Gesture recognition
  • Pose estimation integration
  • Video file processing
  • Real-time analytics dashboard
  • Mobile app integration
  • Cloud deployment support