yahoo-eng-team team mailing list archive
Message #12719
[Bug 1283522] Re: DB lock timeout errors with parallel operations
** Changed in: neutron
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1283522
Title:
DB lock timeout errors with parallel operations
Status in OpenStack Neutron (virtual network service):
Fix Released
Bug description:
Since the neutron full job was enabled in non-voting mode, a worrying number of lock timeout errors have been appearing.
An analysis of 60 random failures revealed that these errors account for 15 of them (25%) in the full job.
Some examples here:
http://paste.openstack.org/show/68417/
http://paste.openstack.org/show/68413/
It is worth noting that the offending queries are seldom the same, and
that the root cause apparently lies in the well-known eventlet/MySQL
deadlock condition, which is exacerbated by the fact that a substantial
number of agents now report to the neutron server.
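One commonly described form of the eventlet/MySQL deadlock can be modelled in a few lines. This is a toy sketch under stated assumptions, not neutron or eventlet code: generators stand in for green threads sharing one OS thread, a `threading.Lock` stands in for an InnoDB row lock, and all names (`simulate_deadlock`, `LockWaitTimeout`) are hypothetical. Green thread A takes a row lock inside a transaction and then yields; green thread B queries through a C driver that eventlet cannot monkey-patch, so its wait blocks the whole OS thread and A never gets resumed to commit. In MySQL, B's wait ends only when `innodb_lock_wait_timeout` (50 seconds by default) expires with error 1205.

```python
import threading


class LockWaitTimeout(Exception):
    """Hypothetical stand-in for MySQL error 1205 ("Lock wait timeout exceeded")."""


def simulate_deadlock(lock_wait_timeout=0.2):
    """Run two cooperative 'green threads' (generators) on a single OS thread."""
    # Models an InnoDB row lock. threading.Lock is non-reentrant, so a second
    # acquire from the same OS thread blocks, just as MySQL makes B wait.
    row_lock = threading.Lock()
    events = []

    def green_a():
        row_lock.acquire()  # A: UPDATE inside an open transaction
        events.append("A holds row lock")
        yield  # A waits on green I/O (e.g. an RPC) before committing
        row_lock.release()  # A: COMMIT, only possible once A is resumed
        events.append("A committed")

    def green_b():
        events.append("B queries via blocking C driver")
        # The C driver is not monkey-patched, so this wait blocks the whole
        # OS thread: the hub never gets a chance to resume A.
        if not row_lock.acquire(timeout=lock_wait_timeout):
            raise LockWaitTimeout("(1205, 'Lock wait timeout exceeded')")
        row_lock.release()
        yield

    a, b = green_a(), green_b()
    next(a)  # hub runs A until it yields mid-transaction
    try:
        next(b)  # hub switches to B, which blocks the only OS thread
    except LockWaitTimeout as exc:
        events.append(f"B failed: {exc}")
    next(a, None)  # only after B gives up can the hub resume A and commit
    return events


for line in simulate_deadlock():
    print(line)
```

The ordering of events is the point: B fails with the lock timeout before A commits, even though A would have released the lock almost immediately had it been scheduled. More agents reporting concurrently means more chances for this interleaving.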
This bug should be regarded as an "umbrella bug" whose main purpose is to track failure frequency with elastic recheck.
Feel free to file new bugs for specific lock timeout issues, or reference this bug report by specifying "partial-bug" in the commit message.
A rough logstash query is here:
http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiKE9wZXJhdGlvbmFsRXJyb3IpICgxMjA1LCAnTG9jayB3YWl0IHRpbWVvdXQgZXhjZWVkZWQ7IHRyeSByZXN0YXJ0aW5nIHRyYW5zYWN0aW9uJylcIiBBTkQgTk9UIG1lc3NhZ2U6XCJUcmFjZWJhY2sgKG1vc3QgcmVjZW50IGNhbGwgbGFzdFwiIEFORCBidWlsZF9uYW1lOlwiY2hlY2stdGVtcGVzdC1kc3ZtLW5ldXRyb24tZnVsbFwiIEFORCBidWlsZF9icmFuY2g6XCJtYXN0ZXJcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiMTcyODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM5MzA5NDgyOTUzMiwibW9kZSI6IiIsImFuYWx5emVfZmllbGQiOiIifQ==
The query (as of now) reports 106 hits in 48 hours. In some tests the
failure happens multiple times; grouping the hits by build_uuid shows
that there are 25 failing builds, which is still a lot.
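The logstash link above keeps its query state in the URL fragment as base64-encoded JSON. The short Python sketch below decodes it so the actual search string can be read or reused; the helper name `decode_logstash_fragment` is hypothetical, and the fragment format is inferred from this particular URL rather than from any logstash documentation.

```python
import base64
import json
from urllib.parse import urlsplit

# The logstash URL from this bug report, split into chunks for readability.
LOGSTASH_URL = (
    "http://logstash.openstack.org/#"
    "eyJzZWFyY2giOiJtZXNzYWdlOlwiKE9wZXJhdGlvbmFsRXJyb3IpICgxMjA1LCAnTG9jayB3YWl0"
    "IHRpbWVvdXQgZXhjZWVkZWQ7IHRyeSByZXN0YXJ0aW5nIHRyYW5zYWN0aW9uJylcIiBBTkQgTk9U"
    "IG1lc3NhZ2U6XCJUcmFjZWJhY2sgKG1vc3QgcmVjZW50IGNhbGwgbGFzdFwiIEFORCBidWlsZF9u"
    "YW1lOlwiY2hlY2stdGVtcGVzdC1kc3ZtLW5ldXRyb24tZnVsbFwiIEFORCBidWlsZF9icmFuY2g6"
    "XCJtYXN0ZXJcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiMTcyODAwIiwi"
    "Z3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM5"
    "MzA5NDgyOTUzMiwibW9kZSI6IiIsImFuYWx5emVfZmllbGQiOiIifQ=="
)


def decode_logstash_fragment(url):
    """Return the JSON object stored base64-encoded after '#' in the URL."""
    fragment = urlsplit(url).fragment
    return json.loads(base64.b64decode(fragment).decode("utf-8"))


query = decode_logstash_fragment(LOGSTASH_URL)
print(query["search"])
print(int(query["timeframe"]) // 3600, "hours")  # timeframe appears to be in seconds
```

Decoded, the search is `message:"(OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction')" AND NOT message:"Traceback (most recent call last" AND build_name:"check-tempest-dsvm-neutron-full" AND build_branch:"master"`, and the `timeframe` of 172800 seconds matches the 48-hour window quoted above.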
This bug needs an elastic-recheck query.
Here is a flowchart describing the eventlet/MySQL deadlock:
https://docs.google.com/drawings/d/13A2x4AWbf8zmzeGApUmYVlBrW8CMTPFTCBGSP_nTzDA
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1283522/+subscriptions