yahoo-eng-team team mailing list archive

[Bug 1283522] Re: DB lock timeout errors with parallel operations

 

** Changed in: neutron
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1283522

Title:
  DB lock timeout errors with parallel operations

Status in OpenStack Neutron (virtual network service):
  Fix Released

Bug description:
  Since the neutron full job was enabled in non-voting mode, a worrying number of lock timeout errors have been appearing.
  An analysis of 60 random failures revealed that these errors are responsible for 15 (25%) of the full job's failures.

  Some examples here:
  http://paste.openstack.org/show/68417/
  http://paste.openstack.org/show/68413/

  It is worth noting that the offending queries are seldom the same, and
  that the root cause apparently lies in the well-known eventlet/mysql
  deadlock condition, which is exacerbated by the fact that there is
  now a considerable number of agents reporting to the neutron server.

  This bug should be regarded as an "umbrella bug" whose main purpose is to track failure frequency with elastic-recheck.
  Feel free to submit new bugs for specific lock timeout issues, or reference this bug report by specifying "partial-bug" in the commit message.

  A rough logstash query is here:
  http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiKE9wZXJhdGlvbmFsRXJyb3IpICgxMjA1LCAnTG9jayB3YWl0IHRpbWVvdXQgZXhjZWVkZWQ7IHRyeSByZXN0YXJ0aW5nIHRyYW5zYWN0aW9uJylcIiBBTkQgTk9UIG1lc3NhZ2U6XCJUcmFjZWJhY2sgKG1vc3QgcmVjZW50IGNhbGwgbGFzdFwiIEFORCBidWlsZF9uYW1lOlwiY2hlY2stdGVtcGVzdC1kc3ZtLW5ldXRyb24tZnVsbFwiIEFORCBidWlsZF9icmFuY2g6XCJtYXN0ZXJcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiMTcyODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM5MzA5NDgyOTUzMiwibW9kZSI6IiIsImFuYWx5emVfZmllbGQiOiIifQ==
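
  The fragment after "#" in the URL above appears to be base64-encoded
  JSON holding the search string and its time window. A minimal sketch
  for inspecting it offline, assuming that encoding (the variable names
  are illustrative only):

```python
import base64
import json

# URL copied verbatim from the bug description.
url = "http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiKE9wZXJhdGlvbmFsRXJyb3IpICgxMjA1LCAnTG9jayB3YWl0IHRpbWVvdXQgZXhjZWVkZWQ7IHRyeSByZXN0YXJ0aW5nIHRyYW5zYWN0aW9uJylcIiBBTkQgTk9UIG1lc3NhZ2U6XCJUcmFjZWJhY2sgKG1vc3QgcmVjZW50IGNhbGwgbGFzdFwiIEFORCBidWlsZF9uYW1lOlwiY2hlY2stdGVtcGVzdC1kc3ZtLW5ldXRyb24tZnVsbFwiIEFORCBidWlsZF9icmFuY2g6XCJtYXN0ZXJcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiMTcyODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM5MzA5NDgyOTUzMiwibW9kZSI6IiIsImFuYWx5emVfZmllbGQiOiIifQ=="

# Assumption: everything after '#' is a base64-encoded JSON document.
fragment = url.split("#", 1)[1]
decoded = json.loads(base64.b64decode(fragment).decode("utf-8"))

# The search string and the time window (in seconds) used by the query.
search = decoded["search"]
timeframe_hours = int(decoded["timeframe"]) / 3600

print(search)
print("time window (hours):", timeframe_hours)
```

  The decoded "timeframe" field is given in seconds, so it can be
  checked against the 48-hour window quoted in this report.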

  The query (as of now) reports 106 hits in 48 hours. In some tests the
  failure happens multiple times; scoring by build_uuid reveals that
  there are 25 failing builds, which is still a lot.
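
  The difference between hits and failing builds comes from collapsing
  hits that share a build_uuid. A minimal sketch of that grouping, using
  made-up sample hits (the uuids and the record layout are hypothetical,
  not data from logstash):

```python
from collections import Counter

# Hypothetical hits: one record per matching log line.
hits = [
    {"build_uuid": "build-a", "message": "Lock wait timeout exceeded"},
    {"build_uuid": "build-a", "message": "Lock wait timeout exceeded"},
    {"build_uuid": "build-b", "message": "Lock wait timeout exceeded"},
    {"build_uuid": "build-c", "message": "Lock wait timeout exceeded"},
    {"build_uuid": "build-c", "message": "Lock wait timeout exceeded"},
    {"build_uuid": "build-c", "message": "Lock wait timeout exceeded"},
]

def hits_per_build(records):
    """Count how many hits each build_uuid produced."""
    return Counter(record["build_uuid"] for record in records)

per_build = hits_per_build(hits)

# Total hits versus distinct failing builds.
print(len(hits), "hits in", len(per_build), "failing builds")
```

  A build that logs the error several times contributes several hits
  but counts as a single failing build.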

  This bug needs an elastic-recheck query.

  Here is a flow-chart describing the eventlet/mysql deadlock.
  https://docs.google.com/drawings/d/13A2x4AWbf8zmzeGApUmYVlBrW8CMTPFTCBGSP_nTzDA

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1283522/+subscriptions

