yahoo-eng-team team mailing list archive
Message #87233
[Bug 1944759] [NEW] confirm resize fails with CPUUnpinningInvalid
Public bug reported:
Nova has a race condition between the resize_instance() compute manager
call and the update_available_resources periodic job. If they overlap at
the right point, when resize_instance calls finish_resize, the periodic
job tracks neither the migration nor the instance on the source host. As
a result, the PCPU allocation on the source host is dropped in the
resource tracker (but not in placement). When the resize is later
confirmed, nova tries to free the pinned CPUs on the source host again
and fails with CPUUnpinningInvalid because they are already freed.
I've pushed a reproduction test:
https://review.opendev.org/c/openstack/nova/+/810763
It is reproducible at least on master, xena, wallaby, and victoria.
** Affects: nova
Importance: Medium
Assignee: Balazs Gibizer (balazs-gibizer)
Status: New
** Tags: compute numa race-condition resize
** Changed in: nova
Assignee: (unassigned) => Balazs Gibizer (balazs-gibizer)
** Changed in: nova
Importance: Undecided => Medium
** Description changed:
Nova has a race condition between resize_instance() compute manager call
and the update_available_resources periodic job. If they overlap at the
right place, when resize_instance calls finish_resize, then periodic job
will not track the migration nor the instance on the source host. It
causes that the PCPU allocation on the source host is dropped in the
resource tracker (not in placement). Then when the resize is confirmed
nova tries to free the pinned cpus again on the source host and fails
with CPUUnpinningInvalid as they are already freed.
I will push a reproduction test soon.
+
+ It is reproducible at least on master, xena, wallaby, and victoria
** Tags added: compute numa race-condition resize
** Description changed:
Nova has a race condition between resize_instance() compute manager call
and the update_available_resources periodic job. If they overlap at the
right place, when resize_instance calls finish_resize, then periodic job
will not track the migration nor the instance on the source host. It
causes that the PCPU allocation on the source host is dropped in the
resource tracker (not in placement). Then when the resize is confirmed
nova tries to free the pinned cpus again on the source host and fails
with CPUUnpinningInvalid as they are already freed.
- I will push a reproduction test soon.
+ I've pushed a reproduction test:
+ https://review.opendev.org/c/openstack/nova/+/810763
It is reproducible at least on master, xena, wallaby, and victoria
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1944759
Title:
confirm resize fails with CPUUnpinningInvalid
Status in OpenStack Compute (nova):
New
Bug description:
Nova has a race condition between the resize_instance() compute manager
call and the update_available_resources periodic job. If they overlap
at the right point, when resize_instance calls finish_resize, the
periodic job tracks neither the migration nor the instance on the
source host. As a result, the PCPU allocation on the source host is
dropped in the resource tracker (but not in placement). When the
resize is later confirmed, nova tries to free the pinned CPUs on the
source host again and fails with CPUUnpinningInvalid because they are
already freed.
I've pushed a reproduction test:
https://review.opendev.org/c/openstack/nova/+/810763
It is reproducible at least on master, xena, wallaby, and victoria.
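
The failure mode described above can be illustrated with a toy model.
This is only a sketch under stated assumptions, not nova's code:
ToyHostCell, periodic_update and simulate are hypothetical names, and
the CPUUnpinningInvalid class here is a local stand-in for the nova
exception. It assumes the periodic job rebuilds pinned-CPU usage from
whatever instances and migrations it currently tracks, so if it sees
neither during the race, the pinning is dropped and the later unpin at
confirm time fails.

```python
class CPUUnpinningInvalid(Exception):
    """Local stand-in for the nova exception of the same name."""


class ToyHostCell:
    """Hypothetical, simplified model of pinned-CPU tracking on one host."""

    def __init__(self, pcpus):
        self.pcpus = set(pcpus)
        self.pinned = set()

    def pin(self, cpus):
        cpus = set(cpus)
        if not cpus <= self.pcpus - self.pinned:
            raise ValueError(f"cannot pin {sorted(cpus)}")
        self.pinned |= cpus

    def unpin(self, cpus):
        cpus = set(cpus)
        # Unpinning CPUs that are not currently pinned is an error.
        if not cpus <= self.pinned:
            raise CPUUnpinningInvalid(
                f"cannot unpin {sorted(cpus)}; pinned: {sorted(self.pinned)}"
            )
        self.pinned -= cpus


def periodic_update(cell, tracked_pinnings):
    """Rebuild pinned usage from scratch from what is currently tracked."""
    cell.pinned = set()
    for cpus in tracked_pinnings:
        cell.pin(cpus)


def simulate(periodic_misses_instance):
    """Run one resize/confirm cycle on the source host and report the result."""
    cell = ToyHostCell(pcpus=range(4))
    instance_cpus = {0, 1}
    cell.pin(instance_cpus)  # instance is running on the source host

    # The periodic job runs during the resize. In the racy case it tracks
    # neither the instance nor the migration, so it sees nothing to pin.
    tracked = [] if periodic_misses_instance else [instance_cpus]
    periodic_update(cell, tracked)

    # Confirming the resize frees the instance's CPUs on the source host.
    try:
        cell.unpin(instance_cpus)
    except CPUUnpinningInvalid:
        return "CPUUnpinningInvalid"
    return "confirmed"


if __name__ == "__main__":
    print("no race:", simulate(periodic_misses_instance=False))
    print("race:   ", simulate(periodic_misses_instance=True))
```

In this model the non-racy run confirms cleanly, while the racy run
raises on confirm because the periodic rebuild already cleared the
pinning. The real reproduction is the functional test linked above.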
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1944759/+subscriptions