Message #69382
[Bug 1733861] [NEW] VIFs not always detached from ironic nodes during termination
Public bug reported:
Description
===========
Sometimes when a baremetal instance is terminated, some VIFs are not
detached from the node. This can leave the node unusable: subsequent
attempts to provision it fail during VIF attachment because there are
not enough free ironic ports to attach the VIF to.
Steps to reproduce
==================
No reproduction procedure identified as yet, but will be something like:
* boot one baremetal instance
* do something to trigger the bug
* delete the instance
* boot a second instance on the same ironic node
Expected results
================
The second instance should boot successfully.
Actual results
==============
The second instance fails to boot, and the following error message is
emitted by nova-compute:
VirtualInterfacePlugException: Cannot attach VIF
409830a5-b4de-4d1d-be22-5e6fe4ccd65b to the node
3aaaf79e-99fb-42a3-b22e-b1a7fae44272 due to error: Unable to attach VIF
409830a5-b4de-4d1d-be22-5e6fe4ccd65b, not enough free physical ports.
(HTTP 400)
The neutron port has been deleted:
$ openstack port show 7e567468-53a2-4fad-8bc9-a30a0e7218a0
ResourceNotFound: No Port found for 7e567468-53a2-4fad-8bc9-a30a0e7218a0
The ironic node's VIF is still attached:
$ openstack baremetal node vif list <node>
+--------------------------------------+
| ID                                   |
+--------------------------------------+
| 7e567468-53a2-4fad-8bc9-a30a0e7218a0 |
+--------------------------------------+
Workaround
==========
The VIF can be manually detached via ironic:
$ openstack baremetal node vif detach <node> 7e567468-53a2-4fad-8bc9-a30a0e7218a0
This allows instances to be deployed on the node.
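To find candidates for this manual detach on a node, something like the
following sketch could help. It is not part of nova or ironic: the helper
names openstack() and stale_vifs() are made up for illustration, and it
assumes the openstack CLI is installed and credentials are loaded in the
environment. It only reports VIFs and leaves the detach to the operator.

```python
import subprocess


def openstack(*args):
    """Run the openstack CLI and return (exit code, stdout)."""
    proc = subprocess.run(["openstack", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout


def stale_vifs(node, run=openstack):
    """Return VIF IDs attached to the node whose neutron port cannot be shown."""
    code, out = run("baremetal", "node", "vif", "list", node,
                    "-f", "value", "-c", "ID")
    if code != 0:
        raise RuntimeError("could not list VIFs for node %s" % node)
    stale = []
    for vif_id in out.split():
        # Caution: any failure of `port show` counts as stale here, not only
        # a missing port, so check the result before detaching anything.
        port_code, _ = run("port", "show", vif_id)
        if port_code != 0:
            stale.append(vif_id)
    return stale
```

Each ID returned by stale_vifs("<node>") can then be passed to the
"openstack baremetal node vif detach" command shown above, after checking
that the port really is gone.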
Environment
===========
RDO Pike, deployed on CentOS 7 using kayobe & kolla-ansible.
openstack-nova-api-16.0.0-1.el7.noarch
Notes
=====
I've seen this happen on a number of occasions, and have spent some time
investigating a few of them. Although they all have similarities, no two
have been the same, so far as I can tell.
Some things I've worked out along the way:
* the VIF detach code in ironic is very simple, and just removes the
tenant_vif_port_id field from the internal_info attribute of the ironic
port to which the VIF is attached. This leads me to believe that nova is
*not* calling this API during instance termination.
* the nova ironic virt driver's terminate method always ends up calling
_unplug_vifs, so either terminate has not been called, it has not
completed successfully, or the VIF was not present in the provided
network_info object. So far my investigations point to the last of
these: network_info does not contain the VIF.
* there seems to be some level of raciness when deleting instances and
their ports (VIFs) at similar times. The neutron vif unplugged event may
not always call detach_interface[1] on the virt driver, but will remove
the port from the instance info cache. This would cause the VIF to be
absent from network_info during terminate.
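
The suspected sequence in the points above can be shown with a toy model.
This is not nova or ironic code: the dicts and functions are simplified
stand-ins, and only the tenant_vif_port_id / internal_info names come from
the description above.

```python
# Toy model only: illustrative stand-ins, not nova or ironic code.

def detach_vif(ironic_ports, vif_id):
    """Mimic ironic's detach: drop tenant_vif_port_id from the matching port."""
    for port in ironic_ports:
        if port["internal_info"].get("tenant_vif_port_id") == vif_id:
            del port["internal_info"]["tenant_vif_port_id"]
            return True
    return False


def terminate(ironic_ports, network_info):
    """Mimic terminate: only VIFs listed in network_info are unplugged."""
    for vif in network_info:
        detach_vif(ironic_ports, vif["id"])


def free_ports(ironic_ports):
    """Return the ports that have no VIF attached."""
    return [p for p in ironic_ports
            if "tenant_vif_port_id" not in p["internal_info"]]


vif_id = "7e567468-53a2-4fad-8bc9-a30a0e7218a0"
ironic_ports = [
    {"uuid": "port-1", "internal_info": {"tenant_vif_port_id": vif_id}},
]

# The port deletion removed the VIF from the info cache before terminate
# ran, so terminate sees an empty network_info and detaches nothing.
stale_network_info = []
terminate(ironic_ports, stale_network_info)
leaked = len(free_ports(ironic_ports)) == 0
```

In this model the node ends up with no free ports even though terminate
completed, which matches the "not enough free physical ports" error seen
on the next deployment.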
Given that there seem to be multiple causes for this issue, one way to
avoid the node becoming unusable would be to query the attached VIFs
from ironic, as well as those in network_info when terminating an
instance. Any unexpected VIFs could then be detached.
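
A minimal sketch of that idea follows, under stated assumptions. The
function detach_all_vifs and its list_vifs / detach_vif parameters are
hypothetical; they stand in for whatever ironic client calls the virt
driver would really use, and this is not a proposed patch.

```python
def detach_all_vifs(node_uuid, network_info, list_vifs, detach_vif):
    """Detach every VIF ironic reports on the node during termination.

    list_vifs(node_uuid) -> iterable of VIF IDs attached in ironic.
    detach_vif(node_uuid, vif_id) -> detaches one VIF.
    Both callables are hypothetical stand-ins for ironic client calls.

    Returns the VIF IDs that were attached in ironic but missing from
    network_info, so that the caller can log them.
    """
    known = {vif["id"] for vif in network_info}
    unexpected = []
    for vif_id in list_vifs(node_uuid):
        if vif_id not in known:
            unexpected.append(vif_id)
        detach_vif(node_uuid, vif_id)
    return unexpected


# Usage with in-memory fakes in place of ironic.
attached = {"node-1": ["vif-a", "vif-b"]}


def fake_list_vifs(node_uuid):
    return list(attached[node_uuid])


def fake_detach_vif(node_uuid, vif_id):
    attached[node_uuid].remove(vif_id)


leftover = detach_all_vifs(
    "node-1", [{"id": "vif-a"}], fake_list_vifs, fake_detach_vif)
```

Driving the loop from ironic's view rather than from network_info means a
VIF that has dropped out of the info cache is still cleaned up. A VIF that
is in network_info but not attached in ironic is simply skipped.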
References
==========
[1]
https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1481
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1733861
Title:
VIFs not always detached from ironic nodes during termination
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1733861/+subscriptions