
yahoo-eng-team team mailing list archive

[Bug 1283522] [NEW] DB lock timeout errors with parallel operations

 

Public bug reported:

Since the neutron full job was enabled in non-voting mode, a worrying number of lock timeout errors have been appearing.
An analysis of 60 random failures revealed that these errors are responsible for 15 (25%) of the full job failures.

Some examples here:
http://paste.openstack.org/show/68417/
http://paste.openstack.org/show/68413/

It is worth noting that the offending queries are seldom the same, and that
the root cause apparently lies in the well-known eventlet/mysql deadlock
condition, which is exacerbated by the fact that there is now a
considerable number of agents reporting to the neutron server.

This bug should be regarded as an "umbrella bug" whose main purpose is to track failure frequency with elastic-recheck.
Feel free to file new bugs for specific lock timeout issues, or to reference this bug report by specifying "partial-bug" in the commit message.


A rough logstash query is here: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiKE9wZXJhdGlvbmFsRXJyb3IpICgxMjA1LCAnTG9jayB3YWl0IHRpbWVvdXQgZXhjZWVkZWQ7IHRyeSByZXN0YXJ0aW5nIHRyYW5zYWN0aW9uJylcIiBBTkQgTk9UIG1lc3NhZ2U6XCJUcmFjZWJhY2sgKG1vc3QgcmVjZW50IGNhbGwgbGFzdFwiIEFORCBidWlsZF9uYW1lOlwiY2hlY2stdGVtcGVzdC1kc3ZtLW5ldXRyb24tZnVsbFwiIEFORCBidWlsZF9icmFuY2g6XCJtYXN0ZXJcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiMTcyODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM5MzA5NDgyOTUzMiwibW9kZSI6IiIsImFuYWx5emVfZmllbGQiOiIifQ==
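
The link above appears to carry its search parameters as base64-encoded JSON in the URL fragment (the part after "#"). The following is a minimal sketch for reading them back, for example to reuse the search string elsewhere. The helper name decode_logstash_fragment is hypothetical, and the sketch assumes the fragment is plain base64 of a single JSON object.

```python
import base64
import json
from urllib.parse import urlsplit


def decode_logstash_fragment(url):
    """Return the JSON object stored in the URL fragment (hypothetical helper).

    Assumption: the fragment is standard base64 wrapping one JSON object.
    """
    fragment = urlsplit(url).fragment
    # Restore any '=' padding that may have been stripped from the base64 text.
    padded = fragment + "=" * (-len(fragment) % 4)
    return json.loads(base64.b64decode(padded).decode("utf-8"))
```

Calling decode_logstash_fragment() on the link and reading the "search" key of the result should give the query string itself; the "timeframe" key should hold the search window in seconds.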

The query (as of now) reports 106 hits in 48 hours. In some builds the
failure happens multiple times; grouping by build_uuid reveals that there
are 25 failing builds, which is still a lot.
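
The difference between hits and failing builds comes from counting distinct build_uuid values rather than individual log lines. A minimal sketch of that grouping follows. The sample records and the helper name hits_per_build are made up for illustration; only the build_uuid field name comes from the report.

```python
from collections import Counter


def hits_per_build(hits):
    """Count how many hits each build produced (hypothetical helper)."""
    return Counter(hit["build_uuid"] for hit in hits)


# Made-up sample records; real hits would come from the logstash results.
sample_hits = [
    {"build_uuid": "aaa", "message": "Lock wait timeout exceeded"},
    {"build_uuid": "aaa", "message": "Lock wait timeout exceeded"},
    {"build_uuid": "bbb", "message": "Lock wait timeout exceeded"},
]

per_build = hits_per_build(sample_hits)
total_hits = sum(per_build.values())   # number of matching log lines
failing_builds = len(per_build)        # number of distinct builds affected
print(total_hits, "hits in", failing_builds, "builds")
```

Applied to the real result set, the same two numbers would correspond to the hit count and the failing build count quoted above.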

This bug needs an elastic-recheck query.

** Affects: neutron
     Importance: High
     Assignee: Salvatore Orlando (salvatore-orlando)
         Status: New


** Tags: neutron-core

** Changed in: neutron
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1283522

Title:
  DB lock timeout errors with parallel operations

Status in OpenStack Neutron (virtual network service):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1283522/+subscriptions

