yahoo-eng-team team mailing list archive

[Bug 1494886] Re: Neutron DBDeadlocks a ridiculous amount in successful CI runs

 

** Changed in: neutron
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1494886

Title:
  Neutron DBDeadlocks a ridiculous amount in successful CI runs

Status in neutron:
  Fix Released

Bug description:
  This came up in the -qa channel when trying to figure out why a
  neutron test failed and there is a big fat DBDeadlock in the q-svc
  logs:

  http://logs.openstack.org/18/220218/5/check/gate-tempest-dsvm-neutron-dvr/3899ebf/logs/screen-q-svc.txt.gz?level=ERROR#_2015-09-11_17_22_42_284

  We find that this shows up a ton in a 7 day check/gate run:

  http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiX2dldF9kbnNfbmFtZXNfZm9yX3BvcnRcIiBBTkQgbWVzc2FnZTpcIkRCRGVhZGxvY2tcIiBBTkQgbWVzc2FnZTpcImlwYXZhaWxhYmlsaXR5cmFuZ2VzXCIgQU5EIHRhZ3M6XCJzY3JlZW4tcS1zdmMudHh0XCIgQU5EIChidWlsZF9xdWV1ZTpcImNoZWNrXCIgT1IgYnVpbGRfcXVldWU6XCJnYXRlXCIpIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDQxOTk2Mjk2ODcxfQ==

  498 hits in 7 days, check and gate.

  The interesting thing is that 85% of those are successful runs.

  For example, this was a successful run where the DBDeadlock shows up:

  http://logs.openstack.org/20/195820/11/gate/gate-tempest-dsvm-neutron-full/35f6716/logs/screen-q-svc.txt.gz?level=TRACE

  This is a serviceability / QA issue for anyone trying to deploy
  neutron at scale - when things go bad, how is an operator supposed to
  be able to cut through the noise in the logs to determine what's
  actually a real failure and what can be ignored?

  If these DBDeadlocks are just getting retried with a retry decorator,
  there should be a way to only trace when we fail and raise up the
  DBDeadlock error; we shouldn't be logging a trace each time.  For
  example, if we DBDeadlock, retry, and then it's OK, don't trace that
  first DB error.  If we retry like 5 times and eventually punt, then
  trace the error.
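
  A minimal sketch of the behavior asked for above, not neutron's actual
  code: the DBDeadlock class, the retry_on_deadlock decorator, and the
  max_attempts and delay parameters are all hypothetical stand-ins for
  whatever the real retry decorator uses.  Intermediate deadlocks are
  logged at DEBUG without a traceback; only the final, unrecovered
  failure is logged at ERROR with the trace.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


class DBDeadlock(Exception):
    """Hypothetical stand-in for the deadlock error raised by the DB layer."""


def retry_on_deadlock(max_attempts=5, delay=0.0):
    """Retry the wrapped call on DBDeadlock; trace only when giving up."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_attempts:
                        # Out of attempts: this is a real failure, so log
                        # the full traceback and re-raise to the caller.
                        LOG.exception(
                            "DBDeadlock in %s; giving up after %d attempts",
                            func.__name__, attempt)
                        raise
                    # Recoverable so far: note it quietly, no traceback.
                    LOG.debug(
                        "DBDeadlock in %s (attempt %d of %d); retrying",
                        func.__name__, attempt, max_attempts)
                    time.sleep(delay)
        return wrapper
    return decorator
```

  With this shape, a run that deadlocks once or twice and then succeeds
  leaves nothing at ERROR level in the log, so an operator filtering on
  ERROR or TRACE only sees deadlocks that were never recovered.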

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1494886/+subscriptions

