
yahoo-eng-team team mailing list archive

[Bug 1494886] [NEW] Neutron DBDeadlocks a ridiculous amount in successful CI runs


Public bug reported:

This came up in the -qa channel when we were trying to figure out why a
neutron test failed and found a big fat DBDeadlock in the q-svc logs:

http://logs.openstack.org/18/220218/5/check/gate-tempest-dsvm-neutron-dvr/3899ebf/logs/screen-q-svc.txt.gz?level=ERROR#_2015-09-11_17_22_42_284

It turns out this shows up a ton across 7 days of check and gate runs:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiX2dldF9kbnNfbmFtZXNfZm9yX3BvcnRcIiBBTkQgbWVzc2FnZTpcIkRCRGVhZGxvY2tcIiBBTkQgbWVzc2FnZTpcImlwYXZhaWxhYmlsaXR5cmFuZ2VzXCIgQU5EIHRhZ3M6XCJzY3JlZW4tcS1zdmMudHh0XCIgQU5EIChidWlsZF9xdWV1ZTpcImNoZWNrXCIgT1IgYnVpbGRfcXVldWU6XCJnYXRlXCIpIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDQxOTk2Mjk2ODcxfQ==

498 hits in 7 days, check and gate.

The interesting thing is that 85% of those hits come from successful runs.

For example, this run succeeded and the DBDeadlock still shows up:

http://logs.openstack.org/20/195820/11/gate/gate-tempest-dsvm-neutron-full/35f6716/logs/screen-q-svc.txt.gz?level=TRACE

This is a serviceability / QA issue for anyone trying to deploy neutron
at scale. When things go bad, how is an operator supposed to cut
through the noise in the logs to determine what's a real failure and
what can be ignored?

If these DBDeadlocks are just getting retried by a retry decorator,
we should only trace when the retries are exhausted and the DBDeadlock
is raised up; we shouldn't be logging a trace on every attempt. For
example, if we hit a DBDeadlock, retry, and then it's OK, don't trace
that first DB error. If we retry something like 5 times and eventually
punt, then trace the error.
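A minimal sketch of the logging behaviour proposed above, assuming a hypothetical `DBDeadlock` exception class and a hypothetical `retry_on_deadlock` decorator. This is not neutron's or oslo.db's actual retry code. Deadlocks that clear on a later attempt only produce a DEBUG line without a traceback, and a full trace at ERROR level is emitted only once the attempts run out and the error is re-raised.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


class DBDeadlock(Exception):
    """Hypothetical stand-in for a database deadlock error."""


def retry_on_deadlock(max_attempts=5, interval=0.0):
    """Hypothetical decorator: retry on DBDeadlock, trace only the final failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_attempts:
                        # Out of attempts: this is a real failure, so log the
                        # traceback at ERROR and let the exception propagate.
                        LOG.exception("%s failed after %d attempts due to DB deadlock",
                                      func.__name__, attempt)
                        raise
                    # Transient deadlock: note it quietly (no traceback) and retry.
                    LOG.debug("%s hit a DB deadlock (attempt %d/%d), retrying",
                              func.__name__, attempt, max_attempts)
                    time.sleep(interval)
        return wrapper
    return decorator
```

Applied as `@retry_on_deadlock(max_attempts=5)` on a DB-facing method, a deadlock that clears on the second attempt leaves only one DEBUG line in the q-svc log, while a call that deadlocks on all five attempts leaves exactly one ERROR trace, which is the signal an operator actually needs.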

** Affects: neutron
     Importance: Critical
     Assignee: Kevin Benton (kevinbenton)
         Status: Confirmed


** Tags: db

** Changed in: neutron
       Status: New => Confirmed

** Tags added: db

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1494886