yahoo-eng-team team mailing list archive

[Bug 1733861] Re: VIFs not always detached from ironic nodes during termination

** Changed in: nova
       Status: In Progress => Invalid

** Changed in: nova
       Status: Invalid => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1733861

Title:
  VIFs not always detached from ironic nodes during termination

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========

  Sometimes when a baremetal instance is terminated, some VIFs are not
  detached from the node. This can leave the node unusable: subsequent
  attempts to provision it fail during VIF attachment because there are
  not enough free ironic ports to attach the VIF to.

  Steps to reproduce
  ==================

  No reproduction procedure has been identified yet, but it is likely to
  be something like:

  * boot one baremetal instance
  * do something to trigger the bug
  * delete the instance
  * boot a second instance on the same ironic node

  Expected results
  ================

  The second instance should boot successfully.

  Actual results
  ==============

  The second instance fails to boot, and the following error message is
  emitted by nova-compute:

  VirtualInterfacePlugException: Cannot attach VIF
  409830a5-b4de-4d1d-be22-5e6fe4ccd65b to the node
  3aaaf79e-99fb-42a3-b22e-b1a7fae44272 due to error: Unable to attach
  VIF 409830a5-b4de-4d1d-be22-5e6fe4ccd65b, not enough free physical
  ports. (HTTP 400)

  The neutron port has been deleted:

  $ openstack port show 7e567468-53a2-4fad-8bc9-a30a0e7218a0
  ResourceNotFound: No Port found for 7e567468-53a2-4fad-8bc9-a30a0e7218a0

  The ironic node's VIF is still attached:

  $ openstack baremetal node vif list <node>
  +--------------------------------------+
  | ID                                   |
  +--------------------------------------+
  | 7e567468-53a2-4fad-8bc9-a30a0e7218a0 |
  +--------------------------------------+

  Workaround
  ==========

  The VIF can be manually detached via ironic:

  $ openstack baremetal node vif detach <node> 7e567468-53a2-4fad-8bc9-a30a0e7218a0

  This allows instances to be deployed on the node.

  Environment
  ===========

  RDO Pike, deployed on CentOS 7 using kayobe & kolla-ansible.

  openstack-nova-api-16.0.0-1.el7.noarch

  Notes
  =====

  I've seen this happen on a number of occasions, and have spent some
  time investigating a few of them. Although they all have similarities,
  no two have been the same, so far as I can tell.

  Some things I've worked out along the way:

  * the VIF detach code in ironic is very simple, and just removes the
  tenant_vif_port_id field from the internal_info attribute of the
  ironic port to which the VIF is attached. This leads me to believe
  that nova is *not* calling this API during instance termination.

  * the nova ironic virt driver's terminate method always ends up
  calling _unplug_vifs, so either terminate has not been called, it has
  not completed successfully, or the VIF was not present in the provided
  network_info object. So far my investigations have suggested the
  latter - network_info does not contain the VIF.

  * there seems to be some level of raciness when deleting instances and
  their ports (VIFs) at similar times. The neutron vif unplugged event
  may not always call detach_interface[1] on the virt driver, but will
  remove the port from the instance info cache. This would cause the VIF
  to be absent from network_info during terminate.
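
  The suspected sequence in the last point can be illustrated with a toy
  model. This is only a sketch of the hypothesis, not nova or ironic
  code: FakeIronicNode, unplug_vifs and info_cache are hypothetical
  stand-ins, and the real info cache holds richer objects than bare VIF
  IDs.

```python
# Toy model of the suspected race. Every name here is hypothetical;
# none of this is nova or ironic code.


class FakeIronicNode:
    """Stands in for an ironic node and the VIFs attached to it."""

    def __init__(self):
        self.attached_vifs = set()

    def vif_attach(self, vif_id):
        self.attached_vifs.add(vif_id)

    def vif_detach(self, vif_id):
        self.attached_vifs.discard(vif_id)


def unplug_vifs(node, network_info):
    """Detach only the VIFs that are listed in network_info."""
    for vif_id in network_info:
        node.vif_detach(vif_id)


node = FakeIronicNode()
vif = "7e567468-53a2-4fad-8bc9-a30a0e7218a0"
node.vif_attach(vif)

# The instance info cache initially knows about the VIF.
info_cache = [vif]

# Port deletion races with instance deletion: the VIF is dropped from
# the info cache, but the driver is never asked to detach it.
info_cache.remove(vif)

# Termination then unplugs whatever is left in network_info: nothing.
unplug_vifs(node, network_info=info_cache)

# The VIF is still attached to the node although the instance is gone.
print(sorted(node.attached_vifs))
```

  In this model the node ends up with the VIF still attached, matching
  the state shown under "Actual results".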

  Given that there seem to be multiple causes for this issue, one way to
  avoid the node becoming unusable would be for nova to query ironic for
  the node's attached VIFs when terminating an instance, in addition to
  using those in network_info. Any VIFs attached to the node but absent
  from network_info could then be detached as well.
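
  A minimal sketch of that idea, assuming both sources can be reduced to
  plain lists of VIF IDs. The function name vifs_to_detach and its
  parameters are hypothetical and are not part of nova's ironic driver;
  how the two lists would be obtained is left out.

```python
def vifs_to_detach(network_info_vif_ids, ironic_vif_ids):
    """Return every VIF ID that should be detached on termination.

    VIFs known from network_info come first, followed by any VIFs that
    ironic reports as attached but that network_info does not contain.
    Duplicates are dropped and the input order is preserved.
    """
    expected = list(dict.fromkeys(network_info_vif_ids))
    known = set(expected)
    unexpected = [
        vif_id
        for vif_id in dict.fromkeys(ironic_vif_ids)
        if vif_id not in known
    ]
    return expected + unexpected


# The case from this report: network_info is empty, but ironic still
# lists the VIF of the deleted neutron port.
stale = vifs_to_detach([], ["7e567468-53a2-4fad-8bc9-a30a0e7218a0"])
print(stale)
```

  Taking the union rather than only ironic's list keeps the existing
  behaviour for VIFs that network_info does know about.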

  References
  ==========

  [1]
  https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1481

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1733861/+subscriptions


