← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1971563] Re: evacuation failure causes POST_FAILURE in nova-ovs-hybrid-plug job

 

Reviewed:  https://review.opendev.org/c/openstack/nova/+/840446
Committed: https://opendev.org/openstack/nova/commit/a3a593ad55480d240d0dd17b0dcf1185ea1498b8
Submitter: "Zuul (22348)"
Branch:    master

commit a3a593ad55480d240d0dd17b0dcf1185ea1498b8
Author: Balazs Gibizer <gibi@xxxxxxxxxx>
Date:   Wed May 4 14:20:56 2022 +0200

    Enable live_migration_events in nova-ovs-hybrid-plug
    
    The flag was added to prevent the source host neutron agents to trigger
    a vif-plugged event too early during a live migration. But actually it
    can be used in a more generic sense as the code filter on migration_to
    binding profile attribute.
    
    We saw too early vif-plugged events from neutron during evacuation in
    post part of the nova-ovs-hybrid-plug job. So this patch enables the
    workaround flag for this job too.
    
    Closes-Bug: #1971563
    Change-Id: Ifd20ece3a4f126da16f077247c2f1e072edb7163


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1971563

Title:
  evacuation failure causes POST_FAILURE in nova-ovs-hybrid-plug job

Status in neutron:
  Confirmed
Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  It is similar on the surface to
  https://bugs.launchpad.net/nova/+bug/1970642 but in that cinder
  returned http 500 but here neutron vif plug timed out.

  This is the instance that failed: https://zuul.opendev.org/t/openstack/build/518f8641b9a7419391b0f99f795f26bd/log/job-output.txt#57033
  And this is the vif timeout: https://zuul.opendev.org/t/openstack/build/518f8641b9a7419391b0f99f795f26bd/log/controller/logs/screen-n-cpu.txt#15986

  May 03 16:53:41.879039 ubuntu-focal-ovh-bhs1-0029531414 nova-
  compute[96742]: WARNING nova.compute.manager [None
  req-6fcb7ce5-c9d7-4f0c-8547-c71bd0f09496 demo admin] [instance:
  3a81145d-d263-4e1d-8ec3-faf38fed34f2] Timeout waiting for ['network-
  vif-plugged-b6dc2b79-ed38-4907-86e2-bdff1c5a9b9f'] for instance with
  vm_state error and task_state rebuild_spawning. Event states are:
  network-vif-plugged-b6dc2b79-ed38-4907-86e2-bdff1c5a9b9f: timed out
  after 300.00 seconds: eventlet.timeout.Timeout: 300 seconds

  I see an ACTIVE -> ACTIVE port status transition in the neutron log
  https://zuul.opendev.org/t/openstack/build/518f8641b9a7419391b0f99f795f26bd/log/controller/logs/screen-q-svc.txt#17035 that seems to fit the timing when the plug happened based on 
  https://zuul.opendev.org/t/openstack/build/518f8641b9a7419391b0f99f795f26bd/log/controller/logs/screen-n-cpu.txt#15700

  So I think neutron failed to sent the vif plug.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1971563/+subscriptions



References