yahoo-eng-team team mailing list archive
Message #39715
[Bug 1503453] [NEW] unavailable ironic nodes being scheduled to
Public bug reported:
When the compute resource tracker checks nodes, the ironic driver checks
each node against a list of states for which it should report resources.
This prevents nova from scheduling to nodes that are unavailable in
ironic, such as nodes going through our cleaning process.
The logic around this check (
https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L334-L351
) looks for existing instances on the node, and if they aren't found it
then looks at the conditions for returning the node as unavailable.
The problem arises when a node has an orphaned instance: one that ironic
sees as present but nova does not (usually nova lists it as having been
deleted).
The instance detection will return true, causing the memory_mb_used and
memory_mb values to be set to the retrieved value from
instance_info['memory_mb'].
The check for _node_resources_unavailable will not run, because it is in
an elif branch. This means that even if the node is in maintenance
state, we won't notice, and we won't return all zeros for its resources
as we normally would.
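The branch structure described above can be sketched roughly as follows.
This is a simplified model, not the driver's actual code: the Node
class, node_resources_unavailable and report_memory are hypothetical
stand-ins, and the unavailable check is reduced to the maintenance flag
alone.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an ironic node record."""
    instance_uuid: Optional[str]
    maintenance: bool
    memory_mb: int


def node_resources_unavailable(node: Node) -> bool:
    # Assumption: reduced to the maintenance flag for illustration.
    return node.maintenance


def report_memory(node: Node) -> Tuple[int, int]:
    """Return (memory_mb, memory_mb_used) in the if/elif shape described."""
    memory_mb = node.memory_mb
    memory_mb_used = 0
    if node.instance_uuid:
        # An instance is present: report the node as fully used.
        memory_mb_used = memory_mb
    elif node_resources_unavailable(node):
        # Only reached when no instance is present on the node.
        memory_mb = 0
    return memory_mb, memory_mb_used


# A maintenance node with an orphaned instance skips the elif branch.
orphaned = Node(instance_uuid="orphan-uuid", maintenance=True, memory_mb=4096)
# A maintenance node with no instance is zeroed out as intended.
idle = Node(instance_uuid=None, maintenance=True, memory_mb=4096)

print(report_memory(orphaned))  # (4096, 4096)
print(report_memory(idle))      # (0, 0)
```

In this sketch the orphaned node still reports a non-zero memory_mb,
even though it is in maintenance, because the first branch matched.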
Once the resource tracker calls _update_usage_from_instance, it will not
find an instance associated with the node from nova's point of view and
will return all of the memory as available instead, causing builds to be
scheduled to this node.
Ironic will then fail the build attempt, because it still shows an
instance associated with the node.
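The effect on the resource tracker can be illustrated with a small
sketch. It assumes, as described above, that usage is recomputed from
the instances nova itself knows about; free_memory_after_tracker is a
hypothetical helper, not a nova function, and the real
_update_usage_from_instance does considerably more.

```python
from typing import Iterable


def free_memory_after_tracker(reported_memory_mb: int,
                              nova_instance_memory_mb: Iterable[int]) -> int:
    """Recompute free memory from nova's own view of instances on the node.

    Assumption: driver-reported usage is replaced by the sum over the
    instances nova has recorded for the node.
    """
    used = sum(nova_instance_memory_mb)
    return reported_memory_mb - used


# Orphaned case: the driver reported memory_mb=4096 and nova knows of no
# instance on the node, so all of the memory appears free.
print(free_memory_after_tracker(4096, []))  # 4096

# Had the unavailable check run, memory_mb would have been reported as 0
# and nothing would appear free.
print(free_memory_after_tracker(0, []))  # 0
```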
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1503453
Title:
unavailable ironic nodes being scheduled to
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1503453/+subscriptions