
yahoo-eng-team team mailing list archive

[Bug 1892405] Re: Removing router interface causes router to stop routing between all

 

Verified on groovy and focal; the test case passes.
The verification logs are attached.

** Tags removed: verification-needed-focal verification-needed-groovy
** Tags added: verification-done-focal verification-done-groovy

** Also affects: cloud-archive/train
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/stein
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1892405

Title:
  Removing router interface causes router to stop routing between all

Status in Ubuntu Cloud Archive:
  Triaged
Status in Ubuntu Cloud Archive stein series:
  New
Status in Ubuntu Cloud Archive train series:
  New
Status in Ubuntu Cloud Archive ussuri series:
  Triaged
Status in Ubuntu Cloud Archive victoria series:
  Triaged
Status in neutron:
  In Progress
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  Fix Committed
Status in neutron source package in Groovy:
  Fix Committed
Status in neutron source package in Hirsute:
  Fix Released

Bug description:
  [Impact]
  Removing an interface from a DVR HA router causes all other subnets connected to that router to stop routing. VMs can no longer ping the HA port (IP) of the router.

  Worked around this by:
  openstack router set --disable <ROUTER>
  openstack router set --enable <ROUTER>
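
  The workaround above can be wrapped in a small helper so it is applied the same way each time. This is only a sketch: the function name bounce_router and the OPENSTACK override variable are illustrative and not part of the bug report. It assumes the --disable and --enable options of "openstack router set".

```shell
#!/bin/sh
# Sketch of the workaround: toggle a router's admin state off and on again.
# OPENSTACK is a hypothetical override so the commands can be dry-run
# (for example OPENSTACK=echo); it defaults to the real client.
OPENSTACK="${OPENSTACK:-openstack}"

bounce_router() {
    router="$1"
    if [ -z "$router" ]; then
        echo "usage: bounce_router <ROUTER>" >&2
        return 2
    fi
    # Disable first; only re-enable if the disable call succeeded.
    $OPENSTACK router set --disable "$router" || return 1
    $OPENSTACK router set --enable "$router"
}
```

  Running it with OPENSTACK=echo prints the two commands instead of executing them, which is a cheap way to check the router name before touching a live deployment.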

  This has happened more than once in the current deployment
   - cloud:bionic-stein
   - neutron 2:14.0.4-0ubuntu1~cloud1

  [Test Case]
  1. Reproducing the issue

  1a. Deploy openstack using stsstack-bundles
      https://launchpad.net/stsstack-bundles

  1b. Run the test script lp1892405_reproducer from comment #10

      The script does the following (Detailed steps in comment #4)
      - Create 3 projects P1, P2, P3
      - Create a router and network in each project, say R1,R2,R3 and
        N1,N2,N3
      - Cross-connect networks by adding ports to router.
      - Launch VMs on N1 and N2 (ensure the VMs land on 2 different
        compute nodes)
      - ping from VM1 -> VM2 should be successful
      - Detach leg from N1 -> N3
      - Check for any packet loss during ping from VM1 -> VM2

      The script prints the ping output from VM1 -> VM2, which shows
      packet loss
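
      The packet-loss check can also be automated by parsing the summary line that ping prints. The helpers below are a sketch and are not taken from the lp1892405_reproducer script; the function names are illustrative, and the parsing assumes a summary of the form "N% packet loss".

```shell
#!/bin/sh
# Read ping output on stdin and print the packet-loss percentage
# taken from the summary line (e.g. "... 20% packet loss ...").
packet_loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Succeed only if the ping output on stdin reports 0% packet loss.
# Returns 2 if no summary line was found at all.
check_no_loss() {
    loss="$(packet_loss_pct)"
    if [ -z "$loss" ]; then
        echo "no ping summary found" >&2
        return 2
    fi
    # Compare numerically so that "0" and "0.0" are both accepted.
    awk -v l="$loss" 'BEGIN { exit !(l == 0) }'
}
```

      Piping the captured ping output through check_no_loss gives a non-zero exit status before the fix (step 1b) and a zero exit status after it (step 3b).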

  2. Install the package containing the fix

  3. Confirm the bug has been fixed

  3a. Clean up projects P1, P2, P3 and the associated resources created in 1b,
      and re-enable the hypervisor that the 1b script disabled.
      Commands for the cleanup:
      openstack server list --all-projects -c ID -f value | xargs openstack server delete
      openstack router remove port P2-router to-n2
      openstack router remove port P1-router from-n2
      openstack router remove port P1-router from-n3
      for i in P1 P2 P3; do openstack subnet list --project $i -c ID -f value | xargs openstack router remove subnet $i-router; done
      for i in P1 P2 P3; do openstack router delete $i-router; done
      for i in P1 P2 P3; do openstack network list --project $i -c ID -f value | xargs openstack network delete; done
      openstack floating ip list -c ID -f value | xargs openstack floating ip delete
      for i in P1 P2 P3; do openstack project delete $i; done
      openstack compute service list --service nova-compute | grep disabled | awk '{print $6}' | xargs -I {} openstack compute service set --enable {} nova-compute
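
      The last cleanup command depends on the column positions of the table output. A variant that uses the value formatter, as the other cleanup commands do, could look like the sketch below. The function names and the OPENSTACK override are illustrative, and the sketch assumes the listing exposes "Host" and "Status" columns in that order.

```shell
#!/bin/sh
# OPENSTACK is a hypothetical override for dry runs; defaults to the real client.
OPENSTACK="${OPENSTACK:-openstack}"

# Print the hosts whose nova-compute service is currently disabled.
# Assumes the value formatter prints one "<host> <status>" pair per line.
disabled_compute_hosts() {
    $OPENSTACK compute service list --service nova-compute \
        -f value -c Host -c Status |
        awk '$2 == "disabled" { print $1 }'
}

# Re-enable nova-compute on every host reported as disabled.
reenable_compute_hosts() {
    disabled_compute_hosts | while read -r host; do
        $OPENSTACK compute service set --enable "$host" nova-compute
    done
}
```

      Calling disabled_compute_hosts on its own first shows which hypervisors would be touched before reenable_compute_hosts changes anything.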

  
  3b. Re-run the script from 1b

      The script output shows the ping output from VM1 -> VM2 and there
      should not be any packet loss

  [Where problems could occur]

  Upstream CI ran all the functional and tempest test cases that involve deleting a DVR port connected to a router, which should cover the scenarios affected by the code change.
  Installing the new package restarts the neutron-openvswitch service, which will take a few milliseconds to repopulate all the OVS flows.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1892405/+subscriptions

