
[Bug 1963899] Re: Rebuild server - notification about port UP isn't sent to nova properly

 

Reviewed:  https://review.opendev.org/c/openstack/neutron/+/834852
Committed: https://opendev.org/openstack/neutron/commit/e7edcec2608fe1cb30d7458a882eb81e058eb76e
Submitter: "Zuul (22348)"
Branch:    master

commit e7edcec2608fe1cb30d7458a882eb81e058eb76e
Author: Slawek Kaplonski <skaplons@xxxxxxxxxx>
Date:   Wed Mar 23 12:25:26 2022 +0100

    Ensure that re_added ports are DOWN before being set back to UP
    
    During e.g. a rebuild of a server by Nova, ports plugged into that
    server are quickly removed and added back into br-int. In that case
    the ports end up in the "re_added" ports set in the neutron-ovs-agent.
    It seems that in some cases such a port is not switched to DOWN first,
    so when the neutron-ovs-agent later treats the port as added/updated
    and reports to the server that the port is UP, no notification is sent
    to nova-compute (because the port's status was UP and the new status
    is still UP in the Neutron DB).
    As Nova waits for that notification from Neutron, in such a case the
    server can end up in the ERROR state.
    
    To avoid this issue, all ports which are treated as "re_added" by the
    neutron-ovs-agent are now first switched to DOWN on the server side.
    That way, when those ports are treated as added/updated in the same
    rpc_loop iteration, switching their status to UP is guaranteed to
    trigger a notification to nova.
    
    Closes-Bug: #1963899
    Change-Id: I0df376a80140ead7ff1fbf7f5ffef08a999dbe0b
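
A minimal Python sketch of the idea behind this change (illustrative only:
the function name process_re_added_ports is made up here, and although
update_device_list() is a standard agent-to-server RPC for reporting device
status, the exact call and call site used by the real patch may differ):

    def process_re_added_ports(plugin_rpc, context, agent_id, host, re_added):
        """Force quickly re-plugged ports DOWN before they are re-bound."""
        if not re_added:
            return
        # Mark the re-added ports DOWN on the server side first ...
        plugin_rpc.update_device_list(
            context,
            devices_up=[],
            devices_down=list(re_added),
            agent_id=agent_id,
            host=host,
        )
        # ... so that when the same ports are bound and reported UP later in
        # the same rpc_loop iteration, the DOWN -> UP change is a real status
        # transition in the Neutron DB and triggers the notification that
        # nova-compute is waiting for.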


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1963899

Title:
  Rebuild server - notification about port UP isn't sent to nova
  properly

Status in neutron:
  Fix Released

Bug description:
  It happened at least in the gate:
  https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_275/819147/3/check/neutron-ovs-tempest-dvr-ha-multinode-full/275f51f/testr_results.html
  - the test failed due to an instance rebuild timeout.

  After some investigation it seems there was a race condition: the port
  which belongs to the VM wasn't set to DOWN and then back to UP during
  the rebuild. Because it stayed UP the whole time, neutron didn't send a
  notification to nova-compute, and nova timed out while waiting for that
  notification from neutron. Details:

  Rebuild started around 20:11:04

  Feb 25 20:11:04.155853 ubuntu-focal-rax-ord-0028617419 nova-compute[45640]: INFO nova.virt.libvirt.driver [-] [instance: f11fa638-af9d-44a4-88d5-a5a9211a67ce] Instance destroyed successfully.
  Feb 25 20:11:04.157143 ubuntu-focal-rax-ord-0028617419 nova-compute[45640]: DEBUG nova.virt.libvirt.vif [None req-39913996-9bcb-40fe-96c4-46805d82a11c tempest-ServerDiskConfigTestJSON-565705927 tempest-ServerDiskConfigTestJSON-565705927-project] vif_type=ovs instance=Instance(access_ip_v4=None,access_ip_v6=None,architecture=None,auto_disk_config=True,availability_zone='nova',cell_name=None,cleaned=False,config_drive='',created_at=2022-02-25T20:10:47Z,default_ephemeral_device=None,default_swap_device=None,deleted=False,deleted_at=None,device_metadata=<?>,disable_terminate=False,display_description='tempest-ServerDiskConfigTestJSON-s
  erver-1146230707',display_name='tempest-ServerDiskConfigTestJSON-server-1146230707',ec2_ids=<?>,ephemeral_gb=0,ephemeral_key_uuid=None,fault=<?>,flavor=Flavor(11),hidden=False,host='ubuntu-focal-rax-ord-0028617419',hostname='tempest-serverdiskconfigtestjson-server-1146230707',id=87,image_ref='ec702e94-7def-4279-8cf5-1b874dc76a6d',info_cache=InstanceInfoCache,instance_type_id=11,kernel_id='',key_data=None,key_name=None,keypairs=<?>,launch_index=0,launched_at=2022-02-25T20:11:01Z,launched_on='ubuntu-focal-rax-ord-0028617419',locked=False,locked_by=None,memory_mb=128,metadata={},migration_context=None,new_flavor=None,node='ubuntu-focal-rax-ord-0028617419',numa_topology=None,old_flavor=None,os_type=None,pci_devices=PciDeviceList,pci_requests=InstancePCIRequests,power_state=1,progress=0,project_id='a75e3dd9c1444a11bfa3db1be433820c',ramdisk_id='',reservation_id='r-ak002cpg',resources=None,root_device_name='/dev/vda',root_gb=1,security_groups=SecurityGroupList,services=<?>,shutdown_terminate=False,system_metadata={boot_roles='member,reader',image_base_image_ref='ec702e94-7def-4279-8cf5-1b874dc76a6d',image_container_format='bare',image_disk_format='qcow2',image_hw_rng_model='virtio',image_min_disk='1',image_min_ram='0',image_owner_specified.openstack.md5='',image_owner_specified.openstack.object='images/cirros-0.5.2-x86_64-disk',image_owner_specified.openstack.sha256='',owner_project_name='tempest-ServerDiskConfigTestJSON-565705927',owner_user_name='tempest-ServerDiskConfigTestJSON-565705927-project'},tags=<?>,task_state='rebuilding',terminated_at=None,trusted_certs=None,updated_at=2022-02-25T20:11:03Z,user_data=None,user_id='a99989e1c306447a80393de985672536',uuid=f11fa638-af9d-44a4-88d5-a5a9211a67ce,vcpu_model=<?>,vcpus=1,vm_mode=None,vm_state='active') vif={"id": "7017c67f-ba63-4787-80d5-18db52567ef2", "address": "fa:16:3e:66:b3:80", "network": {"id": "1eff860a-a017-4839-a4d2-8891b4327cde", "bridge": "br-int", "label": "tempest-ServerDiskConfigTestJSON-1413670902-network", "subnets": [{"cidr": "10.1.0.0/28", "dns": [], "gateway": {"address": "10.1.0.1", "type": "gateway", "version": 4, "meta": {}}, "ips": [{"address": "10.1.0.11", "type": "fixed", "version": 4, "meta": {}, "floating_ips": []}], "routes": [], "version": 4, "meta": {}}], "meta": {"injected": false, "tenant_id": "a75e3dd9c1444a11bfa3db1be433820c", "mtu": 1380, "physical_network": null, "tunneled": true}}, "type": "ovs", "details": {"connectivity": "l2", "port_filter": true, "ovs_hybrid_plug": false, "datapath_type": "system", "bridge_name": "br-int", "bound_drivers": {"0": "openvswitch"}}, "devname": "tap7017c67f-ba", "ovs_interfaceid": "7017c67f-ba63-4787-80d5-18db52567ef2", "qbh_params": null, "qbg_params": null, "active": false, "vnic_type": "normal", "profile": {}, "preserve_on_delete": false, "delegate_create": true, "meta": {}} {{(pid=45640) unplug /opt/stack/nova/nova/virt/libvirt/vif.py:828}}

  
  On the OVS agent's side:

  Feb 25 20:11:04.184822 ubuntu-focal-rax-ord-0028617419 neutron-openvswitch-agent[43371]: DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["11f71fd0-900a-4350-b2b6-720d41775059","delete","tap7017c67f-ba",-1,["map",[["attached-mac","fa:16:3e:66:b3:80"],["iface-id","7017c67f-ba63-4787-80d5-18db52567ef2"],["iface-status","active"],["vm-uuid","f11fa638-af9d-44a4-88d5-a5a9211a67ce"]]]]],"headings":["row","action","name","ofport","external_ids"]} {{(pid=43371) _read_stdout /opt/stack/neutron/neutron/agent/common/async_pro
  cess.py:264}}
  Feb 25 20:11:04.756254 ubuntu-focal-rax-ord-0028617419 neutron-openvswitch-agent[43371]: DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["07c1170a-686e-4e30-9b9e-60fd64c21b2c","insert","tap7017c67f-ba",["set",[]],["map",[["attached-mac","fa:16:3e:66:b3:80"],["iface-id","7017c67f-ba63-4787-80d5-18db52567ef2"],["iface-status","active"],["vm-uuid","f11fa638-af9d-44a4-88d5-a5a9211a67ce"]]]]],"headings":["row","action","name","ofport","external_ids"]} {{(pid=43371) _read_stdout /opt/stack/neutron/neutron/agent/common/a
  sync_process.py:264}}
  Feb 25 20:11:04.761952 ubuntu-focal-rax-ord-0028617419 neutron-openvswitch-agent[43371]: DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["07c1170a-686e-4e30-9b9e-60fd64c21b2c","old",null,["set",[]],null],["","new","tap7017c67f-ba",-1,["map",[["attached-mac","fa:16:3e:66:b3:80"],["iface-id","7017c67f-ba63-4787-80d5-18db52567ef2"],["iface-status","active"],["vm-uuid","f11fa638-af9d-44a4-88d5-a5a9211a67ce"]]]]],"headings":["row","action","name","ofport","external_ids"]} {{(pid=43371) _read_stdout /opt/stack/neutron/n
  eutron/agent/common/async_process.py:264}}
  Feb 25 20:11:04.829126 ubuntu-focal-rax-ord-0028617419 neutron-openvswitch-agent[43371]: DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["07c1170a-686e-4e30-9b9e-60fd64c21b2c","old",null,-1,null],["","new","tap7017c67f-ba",169,["map",[["attached-mac","fa:16:3e:66:b3:80"],["iface-id","7017c67f-ba63-4787-80d5-18db52567ef2"],["iface-status","active"],["vm-uuid","f11fa638-af9d-44a4-88d5-a5a9211a67ce"]]]]],"headings":["row","action","name","ofport","external_ids"]} {{(pid=43371) _read_stdout /opt/stack/neutron/neutron/
  agent/common/async_process.py:264}}

  
  And there was only a report that the device is UP:

  Feb 25 20:11:05.796177 ubuntu-focal-rax-ord-0028617419 neutron-
  openvswitch-agent[43371]: DEBUG
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None
  req-7c5a8fe9-a203-4f1d-a655-18381910397e None None] Setting status for
  7017c67f-ba63-4787-80d5-18db52567ef2 to UP {{(pid=43371) _bind_devices
  /opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1251}}
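
  That lone "Setting status ... to UP" message is exactly the case in which
  no notification reaches nova-compute, because the server side only
  notifies nova when a port's status actually changes. A rough sketch of
  that gate (illustrative only; the real logic lives in neutron's ML2
  plugin and its nova notifier, and the db and nova_notifier objects here
  are hypothetical stand-ins):

    def update_port_status(db, nova_notifier, port_id, new_status):
        port = db.get_port(port_id)
        if port["status"] == new_status:
            # UP -> UP: nothing changes in the DB, no event is emitted, and
            # nova-compute keeps waiting for network-vif-plugged until the
            # rebuild times out.
            return
        db.set_port_status(port_id, new_status)
        if new_status == "ACTIVE":
            # DOWN -> UP: this transition produces the network-vif-plugged
            # external event that nova-compute is waiting for.
            nova_notifier.send_network_vif_plugged(port["device_id"], port_id)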

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1963899/+subscriptions


