yahoo-eng-team team mailing list archive

[Bug 1253455] Re: Compute node stats update may lead to DBDeadlock

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Thierry Carrez <thierry.carrez+lp@xxxxxxxxx>
Date: Wed, 05 Mar 2014 13:07:50 -0000
Reply-to: Bug 1253455 <1253455@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

** Changed in: nova
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1253455

Title:
  Compute node stats update may lead to DBDeadlock

Status in OpenStack Compute (Nova):
  Fix Released

Bug description:
  During a tempest run, when a compute node's usage stats are updated in
  the DB as part of resource claiming for an instance spawn, we hit a
  DBDeadlock exception:

  File ".../nova/compute/manager.py", line 1002, in _build_instance
    with rt.instance_claim(context, instance, limits):
  File ".../nova/openstack/common/lockutils.py", line 248, in inner
    return f(*args, **kwargs)
  File ".../nova/compute/resource_tracker.py", line 126, in instance_claim
    self._update(elevated, self.compute_node)
  File ".../nova/compute/resource_tracker.py", line 429, in _update
    context, self.compute_node, values, prune_stats)
  File ".../nova/conductor/api.py", line 240, in compute_node_update
    prune_stats)
  File ".../nova/conductor/rpcapi.py", line 363, in compute_node_update
    prune_stats=prune_stats)
  File ".../nova/rpcclient.py", line 85, in call
    return self._invoke(self.proxy.call, ctxt, method, **kwargs)
  File ".../nova/rpcclient.py", line 63, in _invoke
    return cast_or_call(ctxt, msg, **self.kwargs)
  File ".../nova/openstack/common/rpc/proxy.py", line 126, in call
    result = rpc.call(context, real_topic, msg, timeout)
  File ".../nova/openstack/common/rpc/__init__.py", line 139, in call
    return _get_impl().call(CONF, context, topic, msg, timeout)
  File ".../nova/openstack/common/rpc/impl_kombu.py", line 816, in call
    rpc_amqp.get_connection_pool(conf, Connection))
  File ".../nova/openstack/common/rpc/amqp.py", line 574, in call
    rv = list(rv)
  File ".../nova/openstack/common/rpc/amqp.py", line 539, in __iter__
    raise result
  RemoteError: Remote error: DBDeadlock (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE compute_nodes SET updated_at=%s, hypervisor_version=%s WHERE compute_nodes.id = %s' (datetime.datetime(2013, 11, 20, 18, 28, 19, 525920), u'5.1.0 1)

  (A more complete log is at http://paste.openstack.org/raw/53702/)

  Can someone characterize the conditions under which this type of
  error can occur?

  Perhaps sqlalchemy.api.compute_node_update() needs the
  @_retry_on_deadlock treatment?

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1253455/+subscriptions