yahoo-eng-team team mailing list archive
Message #50753
[Bug 1567336] Re: instance_info_cache_update() is not retried on deadlock
Reviewed: https://review.openstack.org/302714
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=51d46d38a1d0b5ee7023acc627d4694a0d67cce3
Submitter: Jenkins
Branch: master
commit 51d46d38a1d0b5ee7023acc627d4694a0d67cce3
Author: Roman Podoliaka <rpodolyaka@xxxxxxxxxxxx>
Date: Thu Apr 7 13:25:03 2016 +0300
db: retry instance_info_cache_update() on deadlock
If a Galera cluster is used in multi-writer mode, it's possible that
instance_info_cache_update() will be executed concurrently on two
different MySQL hosts for the very same row, which causes a deadlock
exception for one of the callers due to how Galera works internally.
This can affect operations like association or disassociation of
floating IPs, which will fail if instance_info_cache_update() does
not handle deadlocks gracefully, i.e. is not retried.
Closes-Bug: #1567336
Change-Id: Ib5abffd94d2480dfbcc8b6cca7b1c73ce39e7d10
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1567336
Title:
instance_info_cache_update() is not retried on deadlock
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
=========
When Galera is used in multi-writer mode, it's possible that the
instance_info_cache_update() DB API method will be called for the
very same database row concurrently on two different MySQL servers.
Due to how Galera works internally, this will cause a deadlock exception
for one of the callers (see
http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/
for details).
instance_info_cache_update() is not currently retried on deadlock.
If a deadlock occurs, the operation in question may fail, e.g. association
of a floating IP.
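To illustrate what "retried on deadlock" means here, below is a minimal, self-contained sketch of the pattern. It is not the Nova code: the DBDeadlock class is a hypothetical stand-in for oslo_db.exception.DBDeadlock, and the retry_on_deadlock decorator, the FlakyCache class and the retry limits are assumptions made up for the example. In Nova itself the same effect would presumably be achieved with oslo.db's wrap_db_retry decorator and its retry_on_deadlock option rather than with hand-written code.

```python
import functools
import random
import time


class DBDeadlock(Exception):
    """Hypothetical stand-in for oslo_db.exception.DBDeadlock."""


def retry_on_deadlock(max_retries=5, base_delay=0.01):
    """Retry the wrapped function when it raises DBDeadlock.

    Illustrative helper only; names and defaults are assumptions.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        raise  # retries exhausted: surface the error
                    # Back off with jitter so competing writers
                    # are less likely to collide again.
                    delay = base_delay * (2 ** attempt) * random.random()
                    time.sleep(delay)
        return wrapper
    return decorator


class FlakyCache:
    """Toy in-memory "table" that deadlocks a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures  # deadlocks to simulate before success
        self.calls = 0            # attempts made, including failed ones
        self.rows = {}

    @retry_on_deadlock(max_retries=5)
    def instance_info_cache_update(self, instance_uuid, values):
        self.calls += 1
        if self.failures > 0:
            self.failures -= 1
            raise DBDeadlock("simulated deadlock on concurrent update")
        self.rows.setdefault(instance_uuid, {}).update(values)
        return self.rows[instance_uuid]
```

With this wrapper, a caller that loses the race simply re-runs the update instead of propagating DBDeadlock up to the API layer. The retry is only safe because the update is idempotent: applying the same values twice leaves the row in the same state.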
Steps to reproduce
===============
1. Deploy Galera cluster in multi-writer mode.
2. Ensure there are at least two nova-conductor services using two different MySQL servers in the Galera cluster.
3. Create an instance.
4. Associate / disassociate floating IPs concurrently (e.g. via Rally).
Expected result
=============
All associate / disassociate operations succeed.
Actual result
==========
One or more operations fail with an exception in python-novaclient:
File "/usr/lib/python2.7/site-packages/novaclient/v2/servers.py", line 662, in remove_floating_ip
self._action('removeFloatingIp', server, {'address': address})
File "/usr/lib/python2.7/site-packages/novaclient/v2/servers.py", line 1279, in _action
return self.api.client.post(url, body=body)
File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 449, in post
return self._cs_request(url, 'POST', **kwargs)
File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 424, in _cs_request
resp, body = self._time_request(url, method, **kwargs)
File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 397, in _time_request
resp, body = self.request(url, method, **kwargs)
File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 391, in request
raise exceptions.from_response(resp, body, url, method)
ClientException: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_db.exception.DBDeadlock'> (HTTP 500) (Request-ID: req-ac412e1c-afcf-4ef3-accc-b5463805ca74)
Environment
==========
OpenStack Liberty
Galera cluster (3 nodes) running in multi-writer mode
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1567336/+subscriptions