[Bug 2065927] [NEW] cpu power management can fail with OSError: [Errno 16] Device or resource busy

 

Public bug reported:

As reported downstream in https://issues.redhat.com/browse/OSPRH-7103

If you create a VM, reboot the host, start the VM, and finally delete it,
the delete may fail with:

May 16 15:54:26 edpm-compute-0 nova_compute[3396]: Traceback (most recent call last):
May 16 15:54:26 edpm-compute-0 nova_compute[3396]:   File "/usr/lib/python3.9/site-packages/nova/filesystem.py", line 57, in write_sys
May 16 15:54:26 edpm-compute-0 nova_compute[3396]:     fd.write(data)
May 16 15:54:26 edpm-compute-0 nova_compute[3396]: OSError: [Errno 16] Device or resource busy

This prevents the VM from being deleted on the initial request, but it can
then be deleted if you try again.

This race condition with the kernel is unlikely to happen and appears
to be timing related, i.e. there is a short period of time where onlining
or offlining of a CPU may not be possible.


To mitigate this, nova should retry the operation with a backoff and then eventually squash the error, allowing the VM delete to proceed without failing if we can't offline the core.


Power management of the core should never block the VM delete or cause it to fail.
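
A minimal sketch of the retry-then-squash behaviour described above, not nova's actual fix. The helper name write_with_retry, its parameters and defaults, and the injected write callable are all hypothetical; only the EBUSY errno (16) comes from the traceback.

```python
import errno
import logging
import time

LOG = logging.getLogger(__name__)


def write_with_retry(write, path, data, attempts=5, delay=0.1, backoff=2.0,
                     sleep=time.sleep):
    """Call write(path, data), retrying on EBUSY with exponential backoff.

    Hypothetical helper: returns True if a write succeeded and False if
    every attempt hit EBUSY. Any other OSError is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            write(path, data)
            return True
        except OSError as exc:
            if exc.errno != errno.EBUSY:
                # Only the transient "Device or resource busy" case is retried.
                raise
            if attempt == attempts:
                # Squash the error so the caller (e.g. VM delete) can continue.
                LOG.warning("Giving up writing %s after %d attempts: %s",
                            path, attempts, exc)
                return False
            sleep(delay)
            delay *= backoff
    return False
```

A caller would pass whatever function performs the sysfs write as the first argument and treat a False return as "could not change the core's power state, carry on anyway". Squashing only EBUSY keeps other failures, such as a permission error, visible.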

** Affects: nova
     Importance: Low
     Assignee: sean mooney (sean-k-mooney)
         Status: Triaged

** Affects: nova/2024.1
     Importance: Low
         Status: Triaged

** Affects: nova/antelope
     Importance: Low
         Status: Triaged

** Affects: nova/bobcat
     Importance: Low
         Status: Triaged


** Tags: libvirt

** Changed in: nova
     Assignee: (unassigned) => sean mooney (sean-k-mooney)

** Also affects: nova/bobcat
   Importance: Undecided
       Status: New

** Also affects: nova/antelope
   Importance: Undecided
       Status: New

** Also affects: nova/2024.1
   Importance: Undecided
       Status: New

** Changed in: nova/2024.1
       Status: New => Triaged

** Changed in: nova/2024.1
   Importance: Undecided => Low

** Changed in: nova/antelope
       Status: New => Triaged

** Changed in: nova/antelope
   Importance: Undecided => Low

** Changed in: nova/bobcat
       Status: New => Triaged

** Changed in: nova/bobcat
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2065927

Title:
  cpu power management can fail with OSError: [Errno 16] Device or
  resource busy

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) 2024.1 series:
  Triaged
Status in OpenStack Compute (nova) antelope series:
  Triaged
Status in OpenStack Compute (nova) bobcat series:
  Triaged


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2065927/+subscriptions


