yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #94309
[Bug 2019977] Re: Ironic VIF unplug in destroy may fail due to node locked by cleaning
form a design point of view it is invalid for ironic to manage value
attachment or neutron ports assorcated with a nova instance. we even
locked down cinder to require a service token due to a cve about a year
ago where we reinforced the fact that only nova should ever manage
volume attachmes for a nova instance.
at the time we discussed that ironic was violating novas expectionat of
what a virt driver is allowed to do.
at some point we need to fix ironic to not modify neuton prot or volumes
attach to a nova instance when its acting as a virt driver. its fine
when its in standarlon mode but we need to have different code paths
when its is not.
** Also affects: ironic
Importance: Undecided
Status: New
** Changed in: nova
Status: In Progress => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2019977
Title:
Ironic VIF unplug in destroy may fail due to node locked by cleaning
Status in Ironic:
New
Status in OpenStack Compute (nova):
Invalid
Bug description:
This bug describes a race condition during deletion of a bare metal
(Ironic) instance when automated cleaning is enabled in Ironic (which
is the default).
# Steps to reproduce
As a race condition, this one is not easy to reproduce, although it
has been seen in the wild on several occasions. The easiest way to
reproduce it is in a development environment (e.g. devstack), tweaking
some configuration and adding some sleeps to particular operations in
nova and ironic.
nova.conf configuration: Reduce retries for ironic API polling. Needs
to be long enough to allow operations to complete.
[ironic]
api_max_retries = 10
sleep #1: nova-compute waits for the node to move to cleaning
(nova/virt/ironic/driver.py):
try:
if node.provision_state in _UNPROVISION_STATES:
self._unprovision(instance, node)
else:
# NOTE(hshiina): if spawn() fails before ironic starts
# provisioning, instance information should be
# removed from ironic node.
self._remove_instance_info_from_node(node, instance)
finally:
##### HERE #######
time.sleep(20)
##### HERE #######
# NOTE(mgoddard): We don't need to remove instance info at this
# point since we will have already done it. The destroy will only
# succeed if this method returns without error, so we will end up
# removing the instance info eventually.
self._cleanup_deploy(node, instance, network_info,
remove_instance_info=False)
sleep #2: ironic conductor holds onto the node lock:
@task_manager.require_exclusive_lock
def do_node_clean(task, clean_steps=None, disable_ramdisk=False):
"""Internal RPC method to perform cleaning of a node.
:param task: a TaskManager instance with an exclusive lock on its node
:param clean_steps: For a manual clean, the list of clean steps to
perform. Is None For automated cleaning (default).
For more information, see the clean_steps parameter
of :func:`ConductorManager.do_node_clean`.
:param disable_ramdisk: Whether to skip booting ramdisk for cleaning.
"""
##### HERE #####
import time
time.sleep(120)
##### HERE #####
node = task.node
manual_clean = clean_steps is not None
Next, create, then destroy a bare metal instance.
# Expected results
The node is deleted
# Actual results
The node moves to an ERROR state.
Logs from nova-compute:
2023-05-17 14:30:30.493 7 ERROR ironicclient.common.http [req-c8d567dc-7088-4bad-9a09-6ba33b6c7b4d 5ef38591b78a42928e8dfbdbaa90acd9 60d7786e349549eeb5a309b96251c789 - default default] Error contacting Ir
onic server: Node 2666c347-ff75-4c04-9059-9b890ad14b08 is locked by host mark-ironic.novalocal, please retry after the current operation is completed. (HTTP 409). Attempt 6 of 6: ironicclient.common.apic
lient.exceptions.Conflict: Node 2666c347-ff75-4c04-9059-9b890ad14b08 is locked by host mark-ironic.novalocal, please retry after the current operation is completed. (HTTP 409)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server [req-c8d567dc-7088-4bad-9a09-6ba33b6c7b4d 5ef38591b78a42928e8dfbdbaa90acd9 60d7786e349549eeb5a309b96251c789 - default default] Exception during m
essage handling: ironicclient.common.apiclient.exceptions.Conflict: Node 2666c347-ff75-4c04-9059-9b890ad14b08 is locked by host mark-ironic.novalocal, please retry after the current operation is complete
d. (HTTP 409)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/exception_wrapper.py", line 72, in wrapped
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server context, exc, binary)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server raise self.value
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/exception_wrapper.py", line 63, in wrapped
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py", line 10693, in external_instance_event
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server event.tag)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py", line 10498, in _process_instance_vif_deleted_event
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server self.driver.detach_interface(context, instance, vif)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1630, in detach_interface
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server self.unplug_vifs(instance, [vif])
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1591, in unplug_vifs
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server self._unplug_vifs(node, instance, network_info)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1558, in _unplug_vifs
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server port_id)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/client_wrapper.py", line 180, in call
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server return self._multi_getattr(client, method)(*args, **kwargs)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/v1/node.py", line 512, in vif_detach
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server global_request_id=global_request_id)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/v1/node.py", line 408, in delete
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server global_request_id=global_request_id)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/common/base.py", line 292, in _delete
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server headers=headers)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/common/http.py", line 426, in raw_request
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server return self._http_request(url, method, **kwargs)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/common/http.py", line 287, in wrapper
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server return func(self, url, method, **kwargs)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/common/http.py", line 387, in _http_request
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server error_json.get('debuginfo'), method, url)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server ironicclient.common.apiclient.exceptions.Conflict: Node 2666c347-ff75-4c04-9059-9b890ad14b08 is locked by host mark-ironic.novalocal, please retry after the current operation is completed. (HTTP 409)
# Analysis
This issue is similar to https://bugs.launchpad.net/nova/+bug/1815799,
where instance_uuid and instance_info were being cleaned up. Ironic
deletes VIFs during tear down since Rocky [1], so we should also be
safe to skip this in the case where the node has been deprovisioned.
Debug logs confirm that the VIF cleanup is a noop:
2023-05-17 14:24:11.791 7 DEBUG nova.virt.ironic.driver [req-e419f330-fe14-4c4b-8d5a-b7a40bb77651 5ef38591b78a42928e8dfbdbaa90acd9 60d7786e349549eeb5a309b96251c789 - default default] VIF a5c1622b-eb13-43
d3-a46d-aa5781a85cdb isn't attached to Ironic node 3152ff2d-a0cb-4e7a-bcc7-81266106fef2 _unplug_vifs /var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/driver.py:1561
[1] https://review.opendev.org/q/I8d683d2d506c97535b5a8f9a5de4c070c7e887df
To manage notifications about this bug go to:
https://bugs.launchpad.net/ironic/+bug/2019977/+subscriptions
References