yahoo-eng-team team mailing list archive
Message #48791
[Bug 1276214] Re: Live migration failure in API doesn't revert task_state to None
Reviewed: https://review.openstack.org/168916
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f2a1f00829e849e78f850a73489864e57cbd86b3
Submitter: Jenkins
Branch: master
commit f2a1f00829e849e78f850a73489864e57cbd86b3
Author: Maciej Szankin <maciej.szankin@xxxxxxxxx>
Date: Fri Feb 26 11:18:51 2016 +0100
Live migration failure in API leaves VM in MIGRATING state
When nova-api calls nova-conductor, an RPC MessagingTimeout might
occur. In that case we shouldn't leave the VM in the MIGRATING state.
Possible scenarios are:
* nova-conductor received message but failed to respond, no additional
exceptions raised - live migration will start, VM will be moved to
destination host
* nova-conductor received message but failed to respond, additional
exception raised (e.g., LibvirtError) - LM will not start
* nova-api couldn't reach nova-conductor - LM will not start
Because we can't predict in the API layer what happened below, this
patch writes an instance fault to the database when MessagingTimeout is
caught.
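
The approach described above can be sketched roughly as follows. This is an
illustrative in-memory model, not the code from the patch: MessagingTimeout
here is a local stand-in class, and FakeInstance, record_fault, live_migrate
and hanging_conductor are hypothetical names invented for the example. The
sketch only records a fault and re-raises, because the API layer cannot tell
whether the migration actually started.

```python
class MessagingTimeout(Exception):
    """Local stand-in for the RPC timeout exception (assumption)."""


class FakeInstance:
    """Hypothetical in-memory substitute for an instance record."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.task_state = None
        self.faults = []  # stands in for the instance fault table


def record_fault(instance, exc):
    """Store a fault entry describing the exception (hypothetical helper)."""
    instance.faults.append({
        "instance_uuid": instance.uuid,
        "message": f"{type(exc).__name__}: {exc}",
    })


def live_migrate(instance, call_conductor):
    """Start a live migration and record a fault if the RPC times out."""
    instance.task_state = "migrating"
    try:
        call_conductor(instance)
    except MessagingTimeout as exc:
        # The migration may or may not be running at this point, so the
        # task_state is left alone; the fault makes the timeout visible.
        record_fault(instance, exc)
        raise


def hanging_conductor(instance):
    """Simulate a conductor call that never answers in time."""
    raise MessagingTimeout("timed out waiting for a reply")
```

With this model, calling live_migrate(FakeInstance("abc"), hanging_conductor)
re-raises the timeout to the caller and leaves exactly one fault entry on the
instance.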
Co-Authored-By: Pawel Koniszewski <pawel.koniszewski@xxxxxxxxx>
Bartosz Fic <bartosz.fic@xxxxxxxxx>
Closes-Bug: #1276214
Change-Id: Id800e925fbb689d20e7907b698b67c92fd3da979
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1276214
Title:
Live migration failure in API doesn't revert task_state to None
Status in OpenStack Compute (nova):
Fix Released
Bug description:
If the API times out on an RPC while processing a migrate_server
request, it does not revert the task_state back to NULL, either before
or after sending the error response to the user. This can block further
API operations on the VM and leave an otherwise healthy VM in a
non-operable state, with the possible exception of a delete.
This is one possible reproducer. I'm not sure if this is always true,
and I'd appreciate it if someone else could confirm it.
1. Somehow make RPC requests hang
2. Issue a live migration request
3. The call should return an HTTP error (409 perhaps)
4. Check the VM. It should be in a good state, but with task_state stuck at 'migrating'
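
The reported behaviour can be modelled in a few lines to make the consequence
concrete. This is a toy simulation under stated assumptions, not Nova code:
Vm, InstanceBusy, RpcTimeout, require_idle, api_live_migrate, reboot, delete,
hung_rpc and reproduce are all hypothetical names, and the rule that a
non-empty task_state blocks every operation except delete is a simplification
of the description above.

```python
class InstanceBusy(Exception):
    """Hypothetical error for an operation that hits a busy instance."""


class RpcTimeout(Exception):
    """Hypothetical stand-in for an RPC timeout."""


class Vm:
    """Toy VM record with the two state fields discussed in the bug."""

    def __init__(self):
        self.vm_state = "active"
        self.task_state = None
        self.deleted = False


def require_idle(vm):
    """Reject the operation when another task is marked as in progress."""
    if vm.task_state is not None:
        raise InstanceBusy(f"task_state is {vm.task_state!r}")


def api_live_migrate(vm, rpc_call):
    """Buggy flow: task_state is set first and never reverted on error."""
    require_idle(vm)
    vm.task_state = "migrating"
    rpc_call()  # raises RpcTimeout; nothing below resets task_state


def reboot(vm):
    """Example of a later operation that needs an idle instance."""
    require_idle(vm)
    vm.task_state = "rebooting"
    vm.task_state = None


def delete(vm):
    """Delete is modelled as allowed regardless of task_state."""
    vm.deleted = True


def hung_rpc():
    """Step 1 of the reproducer: the RPC request hangs and times out."""
    raise RpcTimeout("no reply from conductor")


def reproduce():
    """Run steps 1 to 4 and report whether a later operation is blocked."""
    vm = Vm()
    try:
        api_live_migrate(vm, hung_rpc)  # steps 2 and 3: request fails
    except RpcTimeout:
        pass
    try:
        reboot(vm)
        blocked = False
    except InstanceBusy:
        blocked = True
    return vm, blocked
```

In this model reproduce() returns a VM whose vm_state is still "active" while
task_state remains "migrating", and the follow-up reboot is rejected; only
delete(vm) still goes through.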
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1276214/+subscriptions