yahoo-eng-team team mailing list archive
Message #94173
[Bug 2065927] Re: cpu power management can fail with OSError: [Errno 16] Device or resource busy
Reviewed: https://review.opendev.org/c/openstack/nova/+/920203
Committed: https://opendev.org/openstack/nova/commit/44c1b48b3121682cf959c90b3adaf2a3f92e318c
Submitter: "Zuul (22348)"
Branch: master
commit 44c1b48b3121682cf959c90b3adaf2a3f92e318c
Author: Sean Mooney <work@xxxxxxxxxxxxxxx>
Date: Wed May 22 18:59:02 2024 +0100
retry write_sys call on device busy
This change adds a retry_if_busy decorator
to the read_sys and write_sys functions in the filesystem
module that will retry reads and writes up to 5 times with
a linear backoff.
This allows nova to tolerate short periods of time where
sysfs returns device busy. If the retries are exhausted
and offlining a core fails, a warning is logged and the failure is
ignored. Onlining a core is always treated as a hard error if
retries are exhausted.
Closes-Bug: #2065927
Change-Id: I2a6a9f243cb403167620405e167a8dd2bbf3fa79
** Changed in: nova
Status: In Progress => Fix Released
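The commit message describes the retry behaviour only in prose. Below is a minimal Python sketch of such a decorator. It assumes that "up to 5 times" means five attempts in total, that only errno.EBUSY (errno 16 on Linux, the error in the traceback below) is retried, and that the linear backoff sleeps delay * attempt seconds between tries. The delay value and the flaky_write helper are hypothetical and are not taken from the nova change.

```python
import errno
import functools
import time


def retry_if_busy(attempts=5, delay=0.1):
    """Hypothetical sketch: retry a call while it raises OSError(EBUSY).

    attempts is the total number of calls; delay is an assumed base
    backoff in seconds, multiplied by the attempt number (linear backoff).
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except OSError as exc:
                    # Only "Device or resource busy" is retried; any other
                    # error, or a busy error on the last attempt, propagates.
                    if exc.errno != errno.EBUSY or attempt == attempts:
                        raise
                    # Linear backoff: delay, 2*delay, 3*delay, ...
                    time.sleep(delay * attempt)
        return wrapper
    return decorator


# Hypothetical stand-in for a sysfs write that reports busy twice.
calls = {"n": 0}


@retry_if_busy(attempts=5, delay=0.01)
def flaky_write(path, data):
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError(errno.EBUSY, "Device or resource busy", path)
    return len(data)
```

Retrying only on EBUSY keeps genuine failures, such as a missing sysfs path, from being hidden behind the backoff loop.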
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2065927
Title:
cpu power management can fail with OSError: [Errno 16] Device or
resource busy
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) 2024.1 series:
Triaged
Status in OpenStack Compute (nova) antelope series:
Triaged
Status in OpenStack Compute (nova) bobcat series:
Triaged
Bug description:
As reported downstream in https://issues.redhat.com/browse/OSPRH-7103,
if you create a VM, reboot the host, start the VM,
and finally delete it,
the delete may fail:
May 16 15:54:26 edpm-compute-0 nova_compute[3396]: Traceback (most recent call last):
May 16 15:54:26 edpm-compute-0 nova_compute[3396]: File "/usr/lib/python3.9/site-packages/nova/filesystem.py", line 57, in write_sys
May 16 15:54:26 edpm-compute-0 nova_compute[3396]: fd.write(data)
May 16 15:54:26 edpm-compute-0 nova_compute[3396]: OSError: [Errno 16] Device or resource busy
This prevents the VM from being deleted on the initial request, but it
can then be deleted if you try again.
This race condition with the kernel is unlikely to happen and appears
to be timing related, i.e. there is a short period of time where onlining or offlining a
CPU may not be possible.
To mitigate this, nova should retry the operation with a backoff and then eventually squash the error, allowing the VM to be deleted without failing if we can't offline the core.
Power management of the core should never block or cause the VM delete to fail.
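The policy described here (tolerate a failed offline so the delete proceeds, but treat a failed online as a hard error) can be sketched in Python as follows. The names write_online, power_down_core, power_up_core and delete_instance are hypothetical. write_online is assumed to already retry on EBUSY, and this sketch swallows any OSError when offlining; the message does not say whether nova narrows that to EBUSY only.

```python
import errno
import logging

LOG = logging.getLogger(__name__)


def power_down_core(write_online, core):
    """Hypothetical: offline a core, logging and ignoring any OSError.

    Returns True if the core was offlined, False if the failure was squashed.
    """
    try:
        write_online(core, "0")
    except OSError as exc:
        # Best effort: a core that stays online must not fail the delete.
        LOG.warning("Unable to offline CPU %d, ignoring: %s", core, exc)
        return False
    return True


def power_up_core(write_online, core):
    """Hypothetical: online a core; any OSError propagates as a hard error."""
    write_online(core, "1")


def delete_instance(write_online, cores):
    """Hypothetical delete path: power management never blocks the delete."""
    for core in cores:
        power_down_core(write_online, core)
    return "deleted"
```

Keeping the online path strict avoids starting a VM on a core that the host could not bring back online, while the offline path only costs some power savings when it fails.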
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2065927/+subscriptions