yahoo-eng-team team mailing list archive
Message #90340
[Bug 1844929] Fix included in openstack/nova rocky-eol
This issue was fixed in the openstack/nova rocky-eol release.
** Changed in: nova/rocky
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1844929
Title:
grenade jobs failing due to "Timed out waiting for response from cell"
in scheduler
Status in grenade:
Invalid
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) queens series:
Fix Released
Status in OpenStack Compute (nova) rocky series:
Fix Released
Status in OpenStack Compute (nova) stein series:
Fix Released
Status in OpenStack Compute (nova) train series:
Fix Released
Bug description:
Seen here:
https://zuul.opendev.org/t/openstack/build/d53346210978403f888b85b82b2fe0c7/log/logs/screen-n-sch.txt.gz?severity=3#2368
Sep 22 00:50:54.174385 ubuntu-bionic-ovh-gra1-0011664420 nova-scheduler[18043]: WARNING nova.context [None req-1929039e-1517-4326-9700-738d4b570ba6 tempest-AttachInterfacesUnderV243Test-2009753731 tempest-AttachInterfacesUnderV243Test-2009753731] Timed out waiting for response from cell 8acfb79b-2e40-4e1c-bc3d-d404dac6db90
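For readers unfamiliar with where a warning like this comes from, here is a minimal sketch of the general scatter-gather-with-timeout pattern: query every cell in parallel, wait a bounded time, and log any cell that has not answered. This is an illustration only, not nova's implementation; the names scatter_gather, slow_query and DID_NOT_RESPOND are hypothetical, and the timeout values are arbitrary.

```python
import concurrent.futures
import time

# Hypothetical sentinel for a cell that did not answer in time.
DID_NOT_RESPOND = object()


def scatter_gather(cells, query, timeout):
    """Run query(cell) for every cell in parallel, waiting at most
    `timeout` seconds overall. Cells that have not answered by then
    keep the DID_NOT_RESPOND sentinel and a warning is printed."""
    results = {cell: DID_NOT_RESPOND for cell in cells}
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(len(cells), 1))
    futures = {pool.submit(query, cell): cell for cell in cells}
    done, not_done = concurrent.futures.wait(futures, timeout=timeout)
    for fut in done:
        # Note: fut.result() re-raises any exception raised by query().
        results[futures[fut]] = fut.result()
    for fut in not_done:
        print("Timed out waiting for response from cell %s" % futures[fut])
    # Do not block on stragglers; their threads finish in the background.
    pool.shutdown(wait=False)
    return results


def slow_query(cell):
    """Hypothetical query: 'cell1' is slow, every other cell is fast."""
    if cell == "cell1":
        time.sleep(0.5)
    return cell + "-ok"
```

Calling scatter_gather(["cell0", "cell1"], slow_query, timeout=0.1) returns a result for cell0 and the sentinel for cell1. A slow database or message queue on the cell side would show up the same way.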
Looks like something is causing timeouts reaching cell1 during grenade
runs. The only errors I see in the rabbit logs are these for the uwsgi
(API) servers:
=ERROR REPORT==== 22-Sep-2019::00:35:30 ===
closing AMQP connection <0.1511.0> (217.182.141.188:48492 -> 217.182.141.188:5672 - uwsgi:19453:72e08501-61ca-4ade-865e-f0605979ed7d):
missed heartbeats from client, timeout: 60s
--
It looks like we don't have MySQL logs in this grenade run. Maybe we
need a fix like this somewhere for grenade:
https://github.com/openstack/devstack/commit/f92c346131db2c89b930b1a23f8489419a2217dc
logstash shows 1101 hits in the last 7 days, since Sept 17 actually:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Timed%20out%20waiting%20for%20response%20from%20cell%5C%22%20AND%20tags%3A%5C%22screen-n-sch.txt%5C%22&from=7d
The hits are in both the check and gate queues, and all are failures.
They also appear to show up only on fortnebula and OVH nodes, primarily
fortnebula. I wonder if there is a performance/timing issue: if those
nodes are slower, we may not be waiting for something during the
grenade upgrade before proceeding.
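One rough way to check the per-provider observation against downloaded scheduler logs is to tally the warning by the provider embedded in the node hostname. The sketch below assumes hostnames follow the shape seen in the log line above (ubuntu-<release>-<provider>-<region>-<node id>); that shape is inferred from a single example and may not hold for every node. The function name hits_by_provider is hypothetical.

```python
import re
from collections import Counter

WARNING = "Timed out waiting for response from cell"

# Assumed hostname shape: ubuntu-<release>-<provider>-<region>-<node id>
HOST_RE = re.compile(r"\bubuntu-[a-z]+-(?P<provider>[a-z0-9]+)-\S+-\d+\b")


def hits_by_provider(lines):
    """Count warning lines per provider; lines whose hostname does not
    match the assumed shape are counted under 'unknown'."""
    counts = Counter()
    for line in lines:
        if WARNING not in line:
            continue
        match = HOST_RE.search(line)
        counts[match.group("provider") if match else "unknown"] += 1
    return counts
```

Feeding it the lines of one or more screen-n-sch.txt files gives a Counter keyed by provider name, which can then be compared with the logstash totals.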
To manage notifications about this bug go to:
https://bugs.launchpad.net/grenade/+bug/1844929/+subscriptions