yahoo-eng-team team mailing list archive
Message #73743
[Bug 1754360] Re: no unquiesce for volume backed on quiesce failure
** Changed in: nova
Assignee: Matt Riedemann (mriedem) => Eric M Gonzalez (egrh3)
** Also affects: nova/pike
Importance: Undecided
Status: New
** Also affects: nova/ocata
Importance: Undecided
Status: New
** Also affects: nova/queens
Importance: Undecided
Status: New
** Changed in: nova/ocata
Status: New => Confirmed
** Changed in: nova/pike
Status: New => Confirmed
** Changed in: nova/queens
Status: New => Confirmed
** Changed in: nova/ocata
Importance: Undecided => Medium
** Changed in: nova/queens
Importance: Undecided => Medium
** Changed in: nova/pike
Importance: Undecided => Medium
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1754360
Title:
no unquiesce for volume backed on quiesce failure
Status in OpenStack Compute (nova):
In Progress
Status in OpenStack Compute (nova) ocata series:
Confirmed
Status in OpenStack Compute (nova) pike series:
Confirmed
Status in OpenStack Compute (nova) queens series:
In Progress
Bug description:
Extension of bug #1731986;
The fix for that bug catches errors that occur during the snapshot of
an instance's volumes. I later discovered that a failure can also occur
during the call to quiesce_instance(), raising an uncaught exception
through snapshot_volume_backed() that can leave the instance frozen /
quiesced.
Reproducing this is tricky; my failures occur during the RPC call to
the compute host, which times out with a MessagingTimeout while waiting
for a reply. I have not found a way to reproduce this reliably. My
compute combination is: Nova Mitaka, libvirt 1.3.1, and Ceph Jewel.
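A hedged sketch of how the failure can at least be simulated, since a
real broker outage is hard to arrange: stub the compute RPC client so
quiesce_instance raises the same MessagingTimeout seen in the trace
below. The Mock stand-ins are illustrative only; in a real nova unit
test one would patch nova.compute.rpcapi.ComputeAPI.quiesce_instance
instead.

    from unittest import mock

    import oslo_messaging as messaging

    # Stand-in for the compute RPC client used by the API service.
    compute_rpcapi = mock.Mock()
    compute_rpcapi.quiesce_instance.side_effect = \
        messaging.MessagingTimeout('Timed out waiting for a reply')

    try:
        # Any caller that invokes quiesce_instance (e.g.
        # snapshot_volume_backed) now hits the failure path without
        # any real RPC traffic.
        compute_rpcapi.quiesce_instance(mock.sentinel.context,
                                        mock.sentinel.instance)
    except messaging.MessagingTimeout as exc:
        print('simulated failure: %s' % exc)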
Similar to the above bug, this condition was discovered in Mitaka and
the issue remains in Queens.
My proposed patch adds a blanket Exception catch around the call to
rpcapi.quiesce_instance(), logs the caught exception, and issues an
immediate rpcapi.unquiesce_instance() in order to thaw the instance.
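A minimal sketch of that proposed behaviour (not the merged change):
wrap the quiesce call so any failure, including a MessagingTimeout,
triggers an immediate best-effort unquiesce. In nova this logic would
live inside nova/compute/api.py::snapshot_volume_backed(); the
free-standing function and logging shown here are illustrative
assumptions.

    import logging

    LOG = logging.getLogger(__name__)

    def quiesce_or_thaw(compute_rpcapi, context, instance):
        """Quiesce the guest; on any failure, try to thaw it before re-raising."""
        try:
            compute_rpcapi.quiesce_instance(context, instance)
        except Exception as exc:
            LOG.warning('Failed to quiesce instance %s: %s; unquiescing.',
                        getattr(instance, 'uuid', instance), exc)
            try:
                # Best effort: a second failure must not mask the original.
                compute_rpcapi.unquiesce_instance(context, instance)
            except Exception:
                LOG.exception('Failed to unquiesce instance %s',
                              getattr(instance, 'uuid', instance))
            raise

Re-raising preserves the original error, so the snapshot request still
fails at the API, but the instance itself is no longer left frozen.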
Stack trace from the nova-api-os container, which is responsible for
the quiesce / unquiesce of the instance during the snapshot:
[req-6229d689-dcc3-41ca-99b5-3dfc04e1e994 50505ffa89754660b4e6f7ebf69532b5 24bfcdab70714b85b5cb9f5f8270a414 - - -] Unexpected exception in API method
Traceback (most recent call last):
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/api/openstack/extensions.py", line 478, in wrapped
return f(*args, **kwargs)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/api/openstack/common.py", line 391, in inner
return f(*args, **kwargs)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 73, in wrapper
return func(*args, **kwargs)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 73, in wrapper
return func(*args, **kwargs)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/api/openstack/compute/servers.py", line 1108, in _action_create_image
metadata)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/compute/api.py", line 140, in inner
return f(self, context, instance, *args, **kw)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/compute/api.py", line 2389, in snapshot_volume_backed
mapping=None)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
self.force_reraise()
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
six.reraise(self.type_, self.value, self.tb)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/compute/api.py", line 2368, in snapshot_volume_backed
self.compute_rpcapi.quiesce_instance(context, instance)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/compute/rpcapi.py", line 1041, in quiesce_instance
return cctxt.call(ctxt, 'quiesce_instance', instance=instance)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 158, in call
retry=self.retry)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
timeout=timeout, retry=retry)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
retry=retry)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in _send
result = self._waiter.wait(msg_id, timeout)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 342, in wait
message = self.waiters.get(msg_id, timeout=timeout)
File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 244, in get
'to message ID %s' % msg_id)
MessagingTimeout: Timed out waiting for a reply to message ID 70ee5f80284b4b68a289bf232b89325c
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1754360/+subscriptions