yahoo-eng-team team mailing list archive
Message #38398
[Bug 1494886] [NEW] Neutron DBDeadlocks a ridiculous amount in successful CI runs
Public bug reported:
This came up in the -qa channel when trying to figure out why a neutron
test failed and there is a big fat DBDeadlock in the q-svc logs:
http://logs.openstack.org/18/220218/5/check/gate-tempest-dsvm-neutron-dvr/3899ebf/logs/screen-q-svc.txt.gz?level=ERROR#_2015-09-11_17_22_42_284
We find that this shows up a ton in a 7 day check/gate run:
http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiX2dldF9kbnNfbmFtZXNfZm9yX3BvcnRcIiBBTkQgbWVzc2FnZTpcIkRCRGVhZGxvY2tcIiBBTkQgbWVzc2FnZTpcImlwYXZhaWxhYmlsaXR5cmFuZ2VzXCIgQU5EIHRhZ3M6XCJzY3JlZW4tcS1zdmMudHh0XCIgQU5EIChidWlsZF9xdWV1ZTpcImNoZWNrXCIgT1IgYnVpbGRfcXVldWU6XCJnYXRlXCIpIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDQxOTk2Mjk2ODcxfQ==
498 hits in 7 days, check and gate.
The interesting thing is that 85% of those are successful runs.
Like this was a successful run where the DBDeadlock shows up:
http://logs.openstack.org/20/195820/11/gate/gate-tempest-dsvm-neutron-full/35f6716/logs/screen-q-svc.txt.gz?level=TRACE
This is a serviceability / QA issue for anyone trying to deploy neutron
at scale - when things go bad, how is an operator supposed to be able
to cut through the noise in the logs to determine what's actually a real
failure and what can be ignored?
If these DBDeadlocks are just getting retried with a retry decorator,
there should be a way to trace only when we fail and raise the
DBDeadlock error; we shouldn't be logging each time. For example, if we
hit a DBDeadlock, retry, and then it's OK, don't trace that first DB
error. If we retry, say, 5 times and eventually punt, then trace the error.
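
A rough sketch of the behaviour being asked for, not neutron's actual
retry code: the DBDeadlock class below is a local stand-in for the real
exception, and retry_on_deadlock, max_attempts and delay are hypothetical
names made up for illustration. Intermediate deadlocks are noted at debug
level without a traceback; only the final failure is traced and re-raised.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


class DBDeadlock(Exception):
    """Local stand-in for the real deadlock exception (assumption)."""


def retry_on_deadlock(max_attempts=5, delay=0.0):
    """Retry the wrapped call on DBDeadlock; trace only when giving up.

    Hypothetical helper: the name and parameters are illustrative.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_attempts:
                        # Out of attempts: log the traceback once, re-raise.
                        LOG.exception(
                            "DBDeadlock persisted after %d attempts",
                            max_attempts)
                        raise
                    # Still recoverable: one quiet line, no traceback.
                    LOG.debug("DBDeadlock on attempt %d of %d, retrying",
                              attempt, max_attempts)
                    time.sleep(delay)
        return wrapper
    return decorator
```

With something like this, a call that deadlocks twice and then succeeds
leaves only two debug lines in the log, while a call that deadlocks on
all five attempts leaves four debug lines and a single traced error.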
** Affects: neutron
Importance: Critical
Assignee: Kevin Benton (kevinbenton)
Status: Confirmed
** Tags: db
** Changed in: neutron
Status: New => Confirmed
** Tags added: db
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1494886