yahoo-eng-team team mailing list archive

[Bug 1774252] [NEW] Resize confirm fails if nova-compute is restarted after resize

 

Public bug reported:

Originally reported in RH bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1584315

Reproduced on OSP12 (Pike).

After an instance is resized but before the resize is confirmed,
update_available_resource fails on the source compute due to bug
1774249. If nova-compute is restarted at this point, the
update_available_resource periodic task will never have succeeded since
the restart, and therefore ResourceTracker's compute_nodes dict will not
be populated at all.

When confirm calls _delete_allocation_after_move() it will fail with
ComputeHostNotFound because there is no entry for the current node in
ResourceTracker. The error looks like:

2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [req-4f7d5d63-fc05-46ed-b505-41050d889752 09abbd4893bb45eea8fb1d5e40635339 d4483d13a6ef41b2ae575ddbd0c59141 - default default] [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] Setting instance vm_state to ERROR: ComputeHostNotFound: Compute host compute-1.localdomain could not be found.
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] Traceback (most recent call last):
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7445, in _error_out_instance_on_exception
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]     yield
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3757, in _confirm_resize
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]     migration.source_node)
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3790, in _delete_allocation_after_move
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]     cn_uuid = rt.get_node_uuid(nodename)
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]   File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 155, in get_node_uuid
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0]     raise exception.ComputeHostNotFound(host=nodename)
2018-05-30 13:42:19.239 1 ERROR nova.compute.manager [instance: 1374133a-2c08-4a8f-94f6-729d4e58d7e0] ComputeHostNotFound: Compute host compute-1.localdomain could not be found.
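
The failure path in the traceback above can be modelled with a small
standalone sketch. This is not the nova source: ResourceTrackerSketch,
ComputeNode, the fail flag and confirm_resize_after_restart are
hypothetical stand-ins, and only the names compute_nodes, get_node_uuid,
update_available_resource and ComputeHostNotFound are taken from the
report. It assumes the periodic task is the only thing that populates
compute_nodes and that its failure is swallowed by the caller.

```python
class ComputeHostNotFound(Exception):
    """Stand-in for the exception named in the traceback."""

    def __init__(self, host):
        super().__init__("Compute host %s could not be found." % host)
        self.host = host


class ComputeNode:
    """Minimal placeholder for a compute node record."""

    def __init__(self, uuid):
        self.uuid = uuid


class ResourceTrackerSketch:
    """Simplified model of the tracker state described in the report."""

    def __init__(self):
        # Empty after every service start; filled only by the periodic task.
        self.compute_nodes = {}

    def update_available_resource(self, nodename, fail=False):
        # fail=True simulates the periodic task erroring out (bug 1774249)
        # before it gets the chance to record the node.
        if fail:
            raise RuntimeError("simulated periodic task failure")
        self.compute_nodes[nodename] = ComputeNode(uuid="hypothetical-uuid")

    def get_node_uuid(self, nodename):
        try:
            return self.compute_nodes[nodename].uuid
        except KeyError:
            raise ComputeHostNotFound(host=nodename)


def confirm_resize_after_restart(nodename):
    """Model a restart followed by a resize confirm on the source host."""
    rt = ResourceTrackerSketch()  # fresh, empty tracker after the restart
    try:
        rt.update_available_resource(nodename, fail=True)
    except RuntimeError:
        pass  # assumed: the periodic task failure does not stop the service
    # The confirm path then looks up the node, which was never recorded.
    return rt.get_node_uuid(nodename)
```

Calling confirm_resize_after_restart("compute-1.localdomain") raises
ComputeHostNotFound in this model, while a tracker whose periodic task
has succeeded at least once returns the node's UUID.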

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1774252

Title:
  Resize confirm fails if nova-compute is restarted after resize

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1774252/+subscriptions