Add GPU monitoring & forecast for VMs #7221

@josepselga

Description
This feature adds GPU-level telemetry collection inside VMs by leveraging the QEMU Guest Agent to run nvidia-smi. It enables OpenNebula to monitor critical GPU metrics such as utilization, memory usage, temperature, and power consumption.
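As a rough illustration of the collection path described above, the sketch below builds the QEMU Guest Agent requests needed to run nvidia-smi inside a guest and decodes the reply. It is not the implementation proposed in this issue: `guest-exec` and `guest-exec-status` are standard guest agent commands, but the helper names and the `/usr/bin/nvidia-smi` path are assumptions made for the example.

```python
import base64
import json

# Assumed in-guest location of nvidia-smi; adjust for the guest image.
NVIDIA_SMI_PATH = "/usr/bin/nvidia-smi"


def build_guest_exec(args):
    """Build a QEMU Guest Agent `guest-exec` request that runs nvidia-smi."""
    return json.dumps({
        "execute": "guest-exec",
        "arguments": {
            "path": NVIDIA_SMI_PATH,
            "arg": list(args),
            "capture-output": True,
        },
    })


def build_guest_exec_status(pid):
    """Build the follow-up `guest-exec-status` request for a given PID."""
    return json.dumps({"execute": "guest-exec-status", "arguments": {"pid": pid}})


def decode_exec_status(reply):
    """Return the command's stdout, or None if it is still running or failed."""
    result = json.loads(reply)["return"]
    if not result.get("exited") or result.get("exitcode", 1) != 0:
        return None
    # The guest agent returns captured stdout base64-encoded in `out-data`.
    return base64.b64decode(result.get("out-data", "")).decode("utf-8")
```

On a libvirt host, a request like this could be sent with `virsh qemu-agent-command DOMAIN 'JSON'`; the PID returned by `guest-exec` is then passed to `guest-exec-status`. Note that some guest agent configurations may restrict `guest-exec`, in which case it would need to be allowed explicitly.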

Use case
Users running AI/ML workloads in OpenNebula need visibility into GPU performance and health to ensure efficient scheduling, workload balancing, and troubleshooting.

Interface Changes
CLI: GPU metrics will be included in the output of onevm show.

(Pending) Sunstone: GPU monitoring panel/tab in the VM view.

Metrics collection will skip VMs without assigned NVIDIA devices.
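One way the skip check could work is to look at the PCI vendor ID of each device assigned to the VM. The sketch below assumes each device is available as a dictionary with a hexadecimal `VENDOR` string; that shape and the function name are hypothetical, chosen only for illustration. The vendor ID `10de` is the one assigned to NVIDIA.

```python
NVIDIA_VENDOR_ID = 0x10DE  # PCI vendor ID assigned to NVIDIA


def has_nvidia_device(pci_devices):
    """Return True if any assigned PCI device reports the NVIDIA vendor ID."""
    for device in pci_devices:
        vendor = str(device.get("VENDOR", "")).strip()
        try:
            if int(vendor, 16) == NVIDIA_VENDOR_ID:
                return True
        except ValueError:
            continue  # missing or malformed vendor field
    return False
```

A monitoring loop would call this once per VM and issue guest agent requests only when it returns True, so VMs without NVIDIA devices incur no extra probe traffic.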

Additional Context
Default metrics gathered:

  • gpu_count – Number of GPUs
  • utilization.gpu – GPU core usage (%)
  • utilization.memory – Memory bandwidth utilization (%)
  • memory.free – Free GPU memory (MiB)
  • power.draw – Power draw (Watts)
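The per-GPU metrics listed above correspond to nvidia-smi query fields, so they can be requested in CSV form and parsed row by row. The following is a minimal sketch under that assumption, not the parser used by the feature: `--query-gpu` and `--format=csv,noheader,nounits` are existing nvidia-smi options, while the function names and the shape of the returned dictionary are made up for the example. `gpu_count` is derived from the number of rows rather than queried.

```python
# Fields taken from the metric list; gpu_count is derived from the row count.
QUERY_FIELDS = ["utilization.gpu", "utilization.memory", "memory.free", "power.draw"]

NVIDIA_SMI_ARGS = [
    "--query-gpu=" + ",".join(QUERY_FIELDS),
    "--format=csv,noheader,nounits",
]


def parse_value(text):
    """Convert one CSV cell to float; unsupported readings become None."""
    try:
        return float(text)
    except ValueError:
        return None  # e.g. "[N/A]" or "[Not Supported]"


def parse_gpu_metrics(output):
    """Parse nvidia-smi CSV output (one row per GPU) into a metrics dict."""
    gpus = []
    for line in output.splitlines():
        if not line.strip():
            continue
        cells = [cell.strip() for cell in line.split(",")]
        if len(cells) != len(QUERY_FIELDS):
            continue  # skip malformed rows rather than guess
        gpus.append(dict(zip(QUERY_FIELDS, map(parse_value, cells))))
    return {"gpu_count": len(gpus), "gpus": gpus}
```

Mapping unsupported readings to None rather than zero keeps a missing sensor distinguishable from an idle GPU, which matters if the values later feed a forecast.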

Progress Status

  • Code committed
  • Testing - QA
  • Documentation (Release notes - resolved issues, compatibility, known issues)
