yahoo-eng-team team mailing list archive

[Bug 1501735] [NEW] Network interface allocation corrupts instance info cache

Public bug reported:

Allocation of network interfaces for an instance can result in
corruption of the instance info cache in Nova. The result is that the
cache may contain duplicate entries for network interfaces. This can
cause failure to boot nodes, as seen with the Libvirt driver.

Seen on Ubuntu / devstack / commit
b0013d93ffeaed53bc28d9558def26bdb7041ed7.

The issue can be reproduced using an instance with a large number of
interfaces, for example using the heat stack in the attached YAML file
heat-stack-many-interfaces.yaml. For improved reproducibility, add a
short sleep in nova.network.neutronv2.api.API.allocate_for_instance,
just before the call to self.get_instance_nw_info.
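
To confirm a reproduction, it can help to check a dump of the cached
network info for repeated port IDs. The following is a minimal sketch,
not Nova code: it assumes each cache entry can be read as a mapping with
an "id" key, and both the helper name find_duplicate_port_ids and the
sample data are made up for illustration.

```python
from collections import Counter


def find_duplicate_port_ids(nw_info):
    """Return the port IDs that appear more than once, sorted.

    Assumption: each entry in nw_info is a mapping with an "id" key.
    """
    counts = Counter(vif["id"] for vif in nw_info)
    return sorted(port_id for port_id, n in counts.items() if n > 1)


# Invented sample resembling a corrupted cache: "port-a" is listed twice.
sample_cache = [{"id": "port-a"}, {"id": "port-b"}, {"id": "port-a"}]
duplicates = find_duplicate_port_ids(sample_cache)
print(duplicates)
```

An empty result means every interface is listed once; any returned ID
points at an interface that was added to the cache more than once.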

This issue was found by SecurityFun23 when testing the fix for bug
#1467581.

The problem appears to be a race in
nova.network.neutronv2.api.API.allocate_for_instance. After the Neutron
API calls to create/update ports, but before the instance info cache is
updated in get_instance_nw_info, another request can refresh the
instance info cache. That refresh adds the new/updated ports to the
cache as it discovers them in Neutron. The original request then
resumes and unconditionally adds the same interfaces to the cache,
which results in duplicate entries. The most likely source of the
competing request is a Neutron network-changed notification, which is
triggered by the port create/update operation. Allocating multiple
interfaces makes the problem more likely to occur, because the Neutron
API requests are made serially, one per port, which leaves time for the
notifications to arrive.

The suspected sequence of events, in a more visual form:

Request:
- Allocate interfaces for an instance (nova.network.neutronv2.api.API.allocate_for_instance)
- n x Neutron API port create/updates
------------------------------
Notification:
- External event notification from Neutron - network-changed (nova.compute.manager.ComputeManager.external_instance_event)
- Refresh instance network cache (network_api.get_instance_nw_info)
- Query ports for device in Neutron
- Add new ports to instance info cache
-------------------------------
Request:
- Refresh instance network cache with new interfaces (get_instance_nw_info)
- Unconditionally add duplicate interfaces to cache.
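
The interleaving above can be replayed deterministically with a toy
model. This is only a sketch under stated assumptions: plain Python
lists stand in for Neutron and for the instance info cache, the three
function names are hypothetical, and none of this is the actual Nova
implementation. It only shows why an unconditional add after a
concurrent refresh leaves duplicates.

```python
# Toy model: plain lists stand in for Neutron and the instance info cache.
neutron_ports = []  # ports Neutron knows about for this instance
info_cache = []     # Nova's cached view of the instance's interfaces


def create_ports(port_ids):
    """Request, part 1: create the ports in Neutron, one call per port."""
    for port_id in port_ids:
        neutron_ports.append(port_id)


def handle_network_changed():
    """Notification: add ports found in Neutron that the cache lacks."""
    for port_id in neutron_ports:
        if port_id not in info_cache:
            info_cache.append(port_id)


def finish_allocation(port_ids):
    """Request, part 2: add the new ports to the cache without checking."""
    info_cache.extend(port_ids)


new_ports = ["port-1", "port-2", "port-3"]
create_ports(new_ports)       # request creates the ports
handle_network_changed()      # notification refreshes the cache first
finish_allocation(new_ports)  # request resumes and adds them again
print(info_cache)
```

With the notification step removed, the same run leaves each port in the
cache exactly once, which matches the observation that the failure
depends on timing.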

** Affects: nova
     Importance: Undecided
     Assignee: Mark Goddard (mgoddard)
         Status: New

** Attachment added: "Heat stack with many network interfaces"
   https://bugs.launchpad.net/bugs/1501735/+attachment/4480839/+files/heat-stack-many-interfaces.yaml

** Changed in: nova
     Assignee: (unassigned) => Mark Goddard (mgoddard)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1501735
