yahoo-eng-team team mailing list archive
Message #92368
[Bug 1733861] Re: VIFs not always detached from ironic nodes during termination
** Changed in: nova
Status: In Progress => Invalid
** Changed in: nova
Status: Invalid => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1733861
Title:
VIFs not always detached from ironic nodes during termination
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Description
===========
Sometimes when a baremetal instance is terminated, some VIFs are not
detached from the node. This can leave the node unusable: subsequent
attempts to provision it fail during VIF attachment because there are
not enough free ironic ports to attach the VIF to.
Steps to reproduce
==================
No reproduction procedure has been identified as yet, but it will be
something like:
* boot one baremetal instance
* do something to trigger the bug
* delete the instance
* boot a second instance on the same ironic node
Expected results
================
The second instance should boot successfully.
Actual results
==============
The second instance fails to boot, and the following error message is
emitted by nova-compute:
VirtualInterfacePlugException: Cannot attach VIF
409830a5-b4de-4d1d-be22-5e6fe4ccd65b to the node
3aaaf79e-99fb-42a3-b22e-b1a7fae44272 due to error: Unable to attach
VIF 409830a5-b4de-4d1d-be22-5e6fe4ccd65b, not enough free physical
ports. (HTTP 400)
The neutron port of the previous instance's VIF has been deleted:
$ openstack port show 7e567468-53a2-4fad-8bc9-a30a0e7218a0
ResourceNotFound: No Port found for 7e567468-53a2-4fad-8bc9-a30a0e7218a0
The ironic node's VIF is still attached:
$ openstack baremetal node vif list <node>
+--------------------------------------+
| ID                                   |
+--------------------------------------+
| 7e567468-53a2-4fad-8bc9-a30a0e7218a0 |
+--------------------------------------+
Workaround
==========
The VIF can be manually detached via ironic:
$ openstack baremetal node vif detach <node> 7e567468-53a2-4fad-8bc9-a30a0e7218a0
This allows instances to be deployed on the node.
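When several nodes are affected, the check behind this workaround (a VIF still listed by ironic whose neutron port no longer exists) can be scripted. The sketch below is a hypothetical helper, not part of nova or ironic: it assumes you have already collected the node's VIF IDs (e.g. from `openstack baremetal node vif list`) and the set of existing neutron port IDs, and it only builds the detach commands for review rather than running them.

```python
from typing import Iterable, List, Set


def stale_vifs(attached_vifs: Iterable[str], existing_ports: Set[str]) -> List[str]:
    """Return VIF IDs still attached to the node whose neutron port is gone.

    Hypothetical helper: both inputs are gathered by the operator beforehand.
    """
    return [vif for vif in attached_vifs if vif not in existing_ports]


def detach_commands(node: str, attached_vifs: Iterable[str],
                    existing_ports: Set[str]) -> List[str]:
    """Build (but do not run) the workaround command for each stale VIF."""
    return [
        f"openstack baremetal node vif detach {node} {vif}"
        for vif in stale_vifs(attached_vifs, existing_ports)
    ]


if __name__ == "__main__":
    # Values from this report: the VIF is still attached, but its port is gone.
    attached = ["7e567468-53a2-4fad-8bc9-a30a0e7218a0"]
    existing: Set[str] = set()
    for cmd in detach_commands("<node>", attached, existing):
        print(cmd)
```

Printing instead of executing keeps a human in the loop, since a VIF whose port still exists may legitimately belong to a running instance.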
Environment
===========
RDO Pike, deployed on CentOS 7 using kayobe & kolla-ansible.
openstack-nova-api-16.0.0-1.el7.noarch
Notes
=====
I've seen this happen on a number of occasions, and have spent some
time investigating a few of them. Although they all have similarities,
no two have been the same, so far as I can tell.
Some things I've worked out along the way:
* the VIF detach code in ironic is very simple: it just removes the
tenant_vif_port_id field from the internal_info attribute of the
ironic port to which the VIF is attached. Since there is little in it
that could fail, this leads me to believe that nova is *not* calling
this API during instance termination.
* the nova ironic virt driver's terminate method always ends up
calling _unplug_vifs, so either terminate has not been called, it has
not completed successfully, or the VIF was not present in the provided
network_info object. So far my investigations have suggested the last
of these: network_info does not contain the VIF.
* there seems to be a race when an instance and its ports (VIFs) are
deleted at around the same time. The handling of the neutron vif
unplugged event may not always call detach_interface[1] on the virt
driver, but will remove the port from the instance info cache. The
VIF would then be absent from network_info during terminate.
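The failure mode described in these notes can be illustrated with a small, self-contained model. This is a simplified sketch, not nova's or ironic's actual code: `IronicNode`, `terminate` and `handle_vif_event` are hypothetical names, ironic ports are plain dicts whose `internal_info` may hold `tenant_vif_port_id` (the field named above), and `network_info` is reduced to a list of VIF IDs.

```python
from typing import Dict, List


class IronicNode:
    """Toy model of an ironic node and its physical ports (hypothetical)."""

    def __init__(self, num_ports: int) -> None:
        self.ports: List[Dict] = [{"internal_info": {}} for _ in range(num_ports)]

    def vif_list(self) -> List[str]:
        return [p["internal_info"]["tenant_vif_port_id"]
                for p in self.ports if "tenant_vif_port_id" in p["internal_info"]]

    def vif_attach(self, vif_id: str) -> None:
        # Use the first port that has no tenant VIF recorded on it.
        for port in self.ports:
            if "tenant_vif_port_id" not in port["internal_info"]:
                port["internal_info"]["tenant_vif_port_id"] = vif_id
                return
        # Message text mirrors the error quoted in this report.
        raise RuntimeError(
            f"Unable to attach VIF {vif_id}, not enough free physical ports.")

    def vif_detach(self, vif_id: str) -> None:
        # Detaching just removes the field; this toy version ignores unknown VIFs.
        for port in self.ports:
            if port["internal_info"].get("tenant_vif_port_id") == vif_id:
                del port["internal_info"]["tenant_vif_port_id"]
                return


def terminate(node: IronicNode, network_info: List[str]) -> None:
    # Like _unplug_vifs, only VIFs present in network_info are detached.
    for vif_id in network_info:
        node.vif_detach(vif_id)


def handle_vif_event(network_info: List[str], vif_id: str) -> None:
    # The racy path: the VIF leaves the info cache, but the driver's
    # detach_interface is never called, so ironic still has it attached.
    network_info.remove(vif_id)


if __name__ == "__main__":
    node = IronicNode(num_ports=1)
    network_info = ["vif-a"]
    node.vif_attach("vif-a")
    handle_vif_event(network_info, "vif-a")  # port deleted during instance delete
    terminate(node, network_info)            # nothing left in network_info to unplug
    print(node.vif_list())                   # the stale VIF is left behind
    try:
        node.vif_attach("vif-b")             # next instance on the same node
    except RuntimeError as exc:
        print(exc)
```

Without the event, `terminate` detaches `vif-a` and the port is free again; with it, the port stays occupied and the next attach fails, matching the symptom in this report.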
Given that there seem to be multiple causes for this issue, one way to
avoid the node becoming unusable would be, when terminating an
instance, to query ironic for the VIFs attached to the node in
addition to using those in network_info. Any unexpected VIFs (attached
in ironic but absent from network_info) could then be detached too.
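The proposal amounts to detaching the union of the VIFs in network_info and the VIFs that ironic reports as attached. A minimal sketch of the idea, assuming hypothetical `vif_list`/`vif_detach` callables that stand in for ironic's node VIF list and detach operations (the ones behind `openstack baremetal node vif list` and `vif detach`); it is not necessarily how nova implements the fix.

```python
from typing import Callable, Iterable, List


def vifs_to_detach(network_info_vifs: Iterable[str],
                   ironic_vifs: Iterable[str]) -> List[str]:
    """VIFs from network_info first, then any extra VIFs ironic still reports."""
    result: List[str] = []
    for vif in list(network_info_vifs) + list(ironic_vifs):
        if vif not in result:  # de-duplicate, keeping first-seen order
            result.append(vif)
    return result


def unplug_all_vifs(network_info_vifs: Iterable[str],
                    vif_list: Callable[[], List[str]],
                    vif_detach: Callable[[str], None]) -> List[str]:
    """Detach expected and unexpected VIFs; return the IDs that were detached.

    vif_list and vif_detach are hypothetical stand-ins for the ironic calls.
    """
    attached = list(vif_list())  # snapshot of what ironic reports right now
    detached: List[str] = []
    for vif in vifs_to_detach(network_info_vifs, attached):
        # Skip VIFs ironic no longer has, e.g. ones already cleaned up.
        if vif in attached:
            vif_detach(vif)
            detached.append(vif)
    return detached


if __name__ == "__main__":
    ironic_vifs = ["vif-a", "vif-stale"]  # vif-stale is missing from network_info
    done = unplug_all_vifs(["vif-a"], lambda: list(ironic_vifs), ironic_vifs.remove)
    print(done)
    print(ironic_vifs)
```

Taking a snapshot of ironic's list before detaching keeps the loop independent of the list being modified, and filtering on it avoids detach calls for VIFs that ironic no longer has.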
References
==========
[1] https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1481