yahoo-eng-team team mailing list archive

[Bug 1975674] Re: [SRU] Neutron agent blocks during VM deletion when a remote security group is involved

 

** Also affects: neutron (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/ussuri
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/victoria
   Importance: Undecided
       Status: New

** Changed in: cloud-archive
       Status: New => Invalid

** Changed in: cloud-archive/ussuri
       Status: New => Triaged

** Changed in: cloud-archive/victoria
       Status: New => Triaged

** Changed in: neutron (Ubuntu Focal)
       Status: New => Triaged

** Changed in: neutron (Ubuntu)
       Status: New => Invalid

** Changed in: neutron (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: cloud-archive/ussuri
   Importance: Undecided => Medium

** Changed in: cloud-archive/victoria
   Importance: Undecided => Medium

** Description changed:

  When deleting a VM that has a security group referring to a remote
  security group, the neutron agent will block for as long as it takes to
  remove the respective flows. This happens when the remote security group
  contains many (thousands of) ports referring to other VMs.
  
  Steps to reproduce:
    - Create a VM with security group A
    - Add a rule to security group A allowing access from a remote security group B
    - Add a large number of ports to security group B (e.g. 2000)
      - The respective ovs flows will be added
    - Delete the VM
      - The ovs flows will be removed
  
  Expected:
    - VM and flow to be deleted within seconds
    - No impact to other VMs on the same hypervisor
  
  Actual:
    - Flow deletion takes a long time, sometimes up to 10 minutes
    - While flows are being deleted, no VMs can be created on the same hypervisor
  
  The reason for this behavior is that under the hood the agent calls ovs-
  ofctl (via execve()) once for each port in the remote security group.
  These calls quickly add up to minutes if there are many ports.
  
  The proposed solution would be to use deferred execution for the flow
  deletion. In that case it becomes a bulk operation and around 400 flows
  are deleted in one call. In addition it runs in the background and does
  not block the agent for other operations.
  
- 
  [Impact]
  Please see LP bug description for full details.
  
  [Test Plan]
  Please see the section 'Steps to reproduce' in the LP bug description.
  
  [Regression Potential]
- The fix[1] is already in the upstream stable/ussuri, here's just SRU into 16.4.2, so it's a clean backport.
+ This is fixed in Ubuntu Jammy and in the cloud archive wallaby+ releases. The SRU will include fixes for the ussuri/victoria cloud archives and Ubuntu Focal. The fix[1] is already in the upstream stable branches.
  
  [1]
  https://opendev.org/openstack/neutron/commit/30ef996f8aa0b0bc57a280690871f1081946ffee

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1975674

Title:
  [SRU] Neutron agent blocks during VM deletion when a remote security
  group is involved

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive ussuri series:
  Triaged
Status in Ubuntu Cloud Archive victoria series:
  Triaged
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Invalid
Status in neutron source package in Focal:
  Triaged

Bug description:
  When deleting a VM that has a security group referring to a remote
  security group, the neutron agent will block for as long as it takes
  to remove the respective flows. This happens when the remote security
  group contains many (thousands of) ports referring to other VMs.

  Steps to reproduce:
    - Create a VM with security group A
    - Add a rule to security group A allowing access from a remote security group B
    - Add a large number of ports to security group B (e.g. 2000)
      - The respective ovs flows will be added
    - Delete the VM
      - The ovs flows will be removed

  Expected:
    - VM and flow to be deleted within seconds
    - No impact to other VMs on the same hypervisor

  Actual:
    - Flow deletion takes a long time, sometimes up to 10 minutes
    - While flows are being deleted, no VMs can be created on the same hypervisor

  The reason for this behavior is that under the hood the agent calls
  ovs-ofctl (via execve()) once for each port in the remote security
  group. These calls quickly add up to minutes if there are many ports.

  The proposed solution would be to use deferred execution for the flow
  deletion. In that case it becomes a bulk operation and around 400
  flows are deleted in one call. In addition it runs in the background
  and does not block the agent for other operations.
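
To illustrate why batching helps, here is a minimal sketch that only counts how many times the external tool would be spawned. It is not the neutron code: FakeOfctl, delete_per_port and DeferredDeleter are hypothetical names made up for this example, no real ovs-ofctl process is started, and the batch size of 400 is simply the approximate figure mentioned above. The background execution part of the proposal is not modelled.

```python
from typing import List


class FakeOfctl:
    """Hypothetical stand-in for spawning ovs-ofctl; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[List[str]] = []

    def del_flows(self, matches: List[str]) -> None:
        # One recorded entry models one process spawn (one execve()).
        self.calls.append(list(matches))


def delete_per_port(ofctl: FakeOfctl, matches: List[str]) -> None:
    """Current behaviour as described above: one call per flow match."""
    for match in matches:
        ofctl.del_flows([match])


class DeferredDeleter:
    """Queue deletions, then flush them in batches (assumed size: 400)."""

    def __init__(self, ofctl: FakeOfctl, batch_size: int = 400) -> None:
        self.ofctl = ofctl
        self.batch_size = batch_size
        self.pending: List[str] = []

    def del_flow(self, match: str) -> None:
        # Nothing is executed yet; the match is only queued.
        self.pending.append(match)

    def apply(self) -> None:
        # Flush the queue, one call per batch of up to batch_size matches.
        for start in range(0, len(self.pending), self.batch_size):
            self.ofctl.del_flows(self.pending[start:start + self.batch_size])
        self.pending.clear()


# 2000 made-up flow matches, one per port in remote security group B.
matches = [f"reg_port={n}" for n in range(2000)]

eager = FakeOfctl()
delete_per_port(eager, matches)

batched = FakeOfctl()
deferred = DeferredDeleter(batched)
for match in matches:
    deferred.del_flow(match)
deferred.apply()

print(len(eager.calls), len(batched.calls))  # prints "2000 5"
```

With 2000 ports the per-port path models 2000 process spawns, while the deferred path models 5, which is where the reduction from minutes to seconds would be expected to come from.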

  [Impact]
  Please see LP bug description for full details.

  [Test Plan]
  Please see the section 'Steps to reproduce' in the LP bug description.

  [Regression Potential]
  This is fixed in Ubuntu Jammy and in the cloud archive wallaby+ releases. The SRU will include fixes for the ussuri/victoria cloud archives and Ubuntu Focal. The fix[1] is already in the upstream stable branches.

  [1]
  https://opendev.org/openstack/neutron/commit/30ef996f8aa0b0bc57a280690871f1081946ffee

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1975674/+subscriptions


