yahoo-eng-team team mailing list archive

[Bug 1567336] [NEW] instance_info_cache_update() is not retried on deadlock

 

Public bug reported:

Description
===========

When Galera is used in multi-writer mode, the
instance_info_cache_update() DB API method may be called for the very
same database row concurrently on two different MySQL servers. Because
of how Galera works internally, one of the callers will then get a
deadlock exception (see
http://www.joinfu.com/2015/01/understanding-reservations-concurrency-locking-in-nova/
for details).
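
To illustrate the conflict described above, here is a toy model of
first-committer-wins replication. It is only a sketch under stated
assumptions, not Galera's actual mechanism: the ToyCluster class, its
methods, and the local DBDeadlock stand-in are all hypothetical names
made up for this example.

```python
class DBDeadlock(Exception):
    """Local stand-in for oslo_db.exception.DBDeadlock (toy model only)."""


class ToyCluster:
    """Hypothetical toy model of optimistic, first-committer-wins writes."""

    def __init__(self):
        self.rows = {}      # row key -> committed value
        self.versions = {}  # row key -> number of committed writes

    def begin(self, key):
        # A transaction remembers the row version it started from.
        return {"key": key, "seen": self.versions.get(key, 0)}

    def commit(self, txn, value):
        key = txn["key"]
        # Fail if another writer committed the same row in the meantime.
        if self.versions.get(key, 0) != txn["seen"]:
            raise DBDeadlock("conflicting write for row %r" % (key,))
        self.rows[key] = value
        self.versions[key] = txn["seen"] + 1


cluster = ToyCluster()

# Two writers (think: two nova-conductor services talking to different
# MySQL servers) start from the same version of the same row.
txn_a = cluster.begin("instance-1")
txn_b = cluster.begin("instance-1")

cluster.commit(txn_a, "cache written by A")  # first committer wins
try:
    cluster.commit(txn_b, "cache written by B")
    outcome = "committed"
except DBDeadlock:
    outcome = "deadlock"
```

In this model the second writer did nothing wrong; it simply lost the
race, which is why re-running its transaction is a reasonable response.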

instance_info_cache_update() is not currently retried on deadlock.
If a deadlock occurs, the operation in progress may fail, e.g.
association of a floating IP.
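
A minimal sketch of what "retried on deadlock" could look like, using
only the standard library. The retry_on_deadlock decorator, the local
DBDeadlock class, and the toy body of instance_info_cache_update() are
assumptions for illustration and are not nova's code. oslo.db provides
a wrap_db_retry decorator with a retry_on_deadlock option for this
purpose; whether and how an actual fix uses it is not stated here.

```python
import functools
import time


class DBDeadlock(Exception):
    """Local stand-in for oslo_db.exception.DBDeadlock."""


def retry_on_deadlock(max_retries=5, interval=0.01):
    """Hypothetical decorator: re-run the wrapped call when it deadlocks."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        raise  # out of retries: let the caller see the error
                    # Back off a little longer after each failed attempt.
                    time.sleep(interval * (2 ** attempt))
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on_deadlock(max_retries=5)
def instance_info_cache_update(instance_uuid, values):
    # Toy body: fail twice with a deadlock, then succeed.
    calls["count"] += 1
    if calls["count"] <= 2:
        raise DBDeadlock()
    return dict(values, instance_uuid=instance_uuid)


result = instance_info_cache_update("fake-uuid", {"network_info": "[]"})
```

Here the caller never sees the first two deadlocks. Retrying is only
safe if the wrapped function runs its whole transaction again on each
attempt.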


Steps to reproduce
==================

1. Deploy Galera cluster in multi-writer mode.
2. Ensure there are at least two nova-conductor services using two different MySQL servers in the Galera cluster.
3. Create an instance.
4. Associate / disassociate floating IPs concurrently (e.g. via Rally).


Expected result
===============

All associate / disassociate operations succeed.


Actual result
=============

One or more operations fail with an exception in python-novaclient:

  File "/usr/lib/python2.7/site-packages/novaclient/v2/servers.py", line 662, in remove_floating_ip
    self._action('removeFloatingIp', server, {'address': address})
  File "/usr/lib/python2.7/site-packages/novaclient/v2/servers.py", line 1279, in _action
    return self.api.client.post(url, body=body)
  File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 449, in post
    return self._cs_request(url, 'POST', **kwargs)
  File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 424, in _cs_request
    resp, body = self._time_request(url, method, **kwargs)
  File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 397, in _time_request
    resp, body = self.request(url, method, **kwargs)
  File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 391, in request
    raise exceptions.from_response(resp, body, url, method)
ClientException: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'oslo_db.exception.DBDeadlock'> (HTTP 500) (Request-ID: req-ac412e1c-afcf-4ef3-accc-b5463805ca74)


Environment
===========

OpenStack Liberty
Galera cluster (3 nodes) running in multi-writer mode

** Affects: nova
     Importance: Medium
     Assignee: Roman Podoliaka (rpodolyaka)
         Status: In Progress


** Tags: db

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1567336

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1567336/+subscriptions

