yahoo-eng-team team mailing list archive
Message #10650
[Bug 1253455] Re: Compute node stats update may lead to DBDeadlock
** Changed in: nova
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1253455
Title:
Compute node stats update may lead to DBDeadlock
Status in OpenStack Compute (Nova):
Fix Released
Bug description:
During a Tempest run, when a compute node's usage stats are updated in
the DB as part of claiming resources for an instance spawn, we hit a
DBDeadlock exception:
  File ".../nova/compute/manager.py", line 1002, in _build_instance
    with rt.instance_claim(context, instance, limits):
  File ".../nova/openstack/common/lockutils.py", line 248, in inner
    return f(*args, **kwargs)
  File ".../nova/compute/resource_tracker.py", line 126, in instance_claim
    self._update(elevated, self.compute_node)
  File ".../nova/compute/resource_tracker.py", line 429, in _update
    context, self.compute_node, values, prune_stats)
  File ".../nova/conductor/api.py", line 240, in compute_node_update
    prune_stats)
  File ".../nova/conductor/rpcapi.py", line 363, in compute_node_update
    prune_stats=prune_stats)
  File ".../nova/rpcclient.py", line 85, in call
    return self._invoke(self.proxy.call, ctxt, method, **kwargs)
  File ".../nova/rpcclient.py", line 63, in _invoke
    return cast_or_call(ctxt, msg, **self.kwargs)
  File ".../nova/openstack/common/rpc/proxy.py", line 126, in call
    result = rpc.call(context, real_topic, msg, timeout)
  File ".../nova/openstack/common/rpc/__init__.py", line 139, in call
    return _get_impl().call(CONF, context, topic, msg, timeout)
  File ".../nova/openstack/common/rpc/impl_kombu.py", line 816, in call
    rpc_amqp.get_connection_pool(conf, Connection))
  File ".../nova/openstack/common/rpc/amqp.py", line 574, in call
    rv = list(rv)
  File ".../nova/openstack/common/rpc/amqp.py", line 539, in __iter__
    raise result
RemoteError: Remote error: DBDeadlock (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE compute_nodes SET updated_at=%s, hypervisor_version=%s WHERE compute_nodes.id = %s' (datetime.datetime(2013, 11, 20, 18, 28, 19, 525920), u'5.1.0 1)
(A more complete log is at http://paste.openstack.org/raw/53702/)
Can someone characterize the conditions under which this type of
error can occur?
Perhaps sqlalchemy.api.compute_node_update() needs the
@_retry_on_deadlock treatment?
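For readers unfamiliar with the idea, the following is a minimal sketch of
what a retry-on-deadlock decorator could look like. It is not nova's actual
_retry_on_deadlock implementation: the DBDeadlock class, the
retry_on_deadlock name, its max_attempts and delay parameters, and the
stand-in compute_node_update function are all hypothetical and exist only
for illustration.

```python
import functools
import time


class DBDeadlock(Exception):
    """Hypothetical stand-in for the DBDeadlock error seen in the traceback."""


def retry_on_deadlock(max_attempts=5, delay=0.0):
    """Re-run the wrapped function when it raises DBDeadlock.

    Assumption: the wrapped function runs in its own transaction, so
    calling it again is safe once the failed transaction has rolled back.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    # Give up and propagate after the final attempt.
                    if attempt == max_attempts:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


# Counts invocations so the retry behaviour is observable.
calls = {"count": 0}


@retry_on_deadlock(max_attempts=3)
def compute_node_update(node_id, values):
    """Hypothetical update that deadlocks on its first two invocations."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise DBDeadlock("Deadlock found when trying to get lock")
    return dict(values, id=node_id)


result = compute_node_update(1, {"hypervisor_version": 5001000})
print(result, "after", calls["count"], "attempts")
```

Whether retrying is appropriate depends on the update being safe to repeat;
a retry hides the symptom but does not explain which concurrent statements
contend for the compute_nodes row.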
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1253455/+subscriptions