yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #88807
[Bug 1971563] Re: evacuation failure causes POST_FAILURE in nova-ovs-hybrid-plug job
Reviewed: https://review.opendev.org/c/openstack/nova/+/840446
Committed: https://opendev.org/openstack/nova/commit/a3a593ad55480d240d0dd17b0dcf1185ea1498b8
Submitter: "Zuul (22348)"
Branch: master
commit a3a593ad55480d240d0dd17b0dcf1185ea1498b8
Author: Balazs Gibizer <gibi@xxxxxxxxxx>
Date: Wed May 4 14:20:56 2022 +0200
Enable live_migration_events in nova-ovs-hybrid-plug
The flag was added to prevent the source host neutron agents to trigger
a vif-plugged event too early during a live migration. But actually it
can be used in a more generic sense as the code filter on migration_to
binding profile attribute.
We saw too early vif-plugged events from neutron during evacuation in
post part of the nova-ovs-hybrid-plug job. So this patch enables the
workaround flag for this job too.
Closes-Bug: #1971563
Change-Id: Ifd20ece3a4f126da16f077247c2f1e072edb7163
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1971563
Title:
evacuation failure causes POST_FAILURE in nova-ovs-hybrid-plug job
Status in neutron:
Confirmed
Status in OpenStack Compute (nova):
Fix Released
Bug description:
It is similar on the surface to
https://bugs.launchpad.net/nova/+bug/1970642 but in that cinder
returned http 500 but here neutron vif plug timed out.
This is the instance that failed: https://zuul.opendev.org/t/openstack/build/518f8641b9a7419391b0f99f795f26bd/log/job-output.txt#57033
And this is the vif timeout: https://zuul.opendev.org/t/openstack/build/518f8641b9a7419391b0f99f795f26bd/log/controller/logs/screen-n-cpu.txt#15986
May 03 16:53:41.879039 ubuntu-focal-ovh-bhs1-0029531414 nova-
compute[96742]: WARNING nova.compute.manager [None
req-6fcb7ce5-c9d7-4f0c-8547-c71bd0f09496 demo admin] [instance:
3a81145d-d263-4e1d-8ec3-faf38fed34f2] Timeout waiting for ['network-
vif-plugged-b6dc2b79-ed38-4907-86e2-bdff1c5a9b9f'] for instance with
vm_state error and task_state rebuild_spawning. Event states are:
network-vif-plugged-b6dc2b79-ed38-4907-86e2-bdff1c5a9b9f: timed out
after 300.00 seconds: eventlet.timeout.Timeout: 300 seconds
I see an ACTIVE -> ACTIVE port status transition in the neutron log
https://zuul.opendev.org/t/openstack/build/518f8641b9a7419391b0f99f795f26bd/log/controller/logs/screen-q-svc.txt#17035 that seems to fit the timing when the plug happened based on
https://zuul.opendev.org/t/openstack/build/518f8641b9a7419391b0f99f795f26bd/log/controller/logs/screen-n-cpu.txt#15700
So I think neutron failed to sent the vif plug.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1971563/+subscriptions
References