yahoo-eng-team mailing list archive, Message #67042
[Bug 1713786] [NEW] Allocations are not managed properly in all evacuate scenarios
Public bug reported:
Evacuate has some gaps with respect to dealing with resource allocations
in Placement:
1. If the user specifies a host with evacuate, conductor bypasses the
scheduler and we don't create allocations on the destination host:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/conductor/manager.py#L749
This could eventually lead to the claim failing on the destination
compute:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2795
This is similar to bug 1712008, where forcing a host during live
migration bypasses the scheduler, so allocations are not created in
placement on the destination node. Before Pike this was handled by the
update_available_resource periodic task in the compute service, which
would 'heal' the allocations for instances tracked on a given node.
That no longer happens once all computes are running Pike code, due to
this change: https://review.openstack.org/#/c/491012/
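For illustration, here is a minimal sketch of what creating the missing destination allocation could look like if conductor did it when the scheduler is bypassed. The payload shape follows Placement's PUT /allocations/{consumer_uuid} request body (the dict-keyed form from API microversion 1.12); the helper name, UUIDs, and resource values are all hypothetical, not Nova code.

```python
# Hypothetical helper (not Nova's API): builds the request body for
# PUT /allocations/{consumer_uuid} against Placement, using the
# dict-keyed "allocations" format (Placement microversion 1.12+).
def build_allocation_payload(rp_uuid, project_id, user_id,
                             vcpus, memory_mb, disk_gb):
    return {
        "allocations": {
            rp_uuid: {
                "resources": {
                    "VCPU": vcpus,
                    "MEMORY_MB": memory_mb,
                    "DISK_GB": disk_gb,
                },
            },
        },
        "project_id": project_id,
        "user_id": user_id,
    }

# Example: claim the evacuated instance's flavor against the forced
# destination's resource provider (illustrative values).
payload = build_allocation_payload(
    "4e8e23ff-0c52-4cf7-8356-d9fa88536a4f", "demo-project", "demo-user",
    vcpus=2, memory_mb=2048, disk_gb=20)
```

Conductor would then PUT this body to Placement for the instance's consumer UUID before casting the rebuild to the destination compute, mirroring what the scheduler does in the non-forced case.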
2. If the user does not specify a host with evacuate, conductor will ask
the scheduler to pick a host, and the scheduler will also create
allocations for that host. If the claim (or rebuild) fails on the
destination node, we don't clean up the allocation on the destination
node even though the instance isn't spawned on it:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2795
In the claim-failure case above it's pretty obvious that we should clean
up, because the claim for resources failed.
With the generic exception handling it's harder to know whether we
should clean up, since we'd need to know whether the guest was actually
spawned on the host:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2812
But since we don't set instance.host/node to the current host/node, the
instance won't be reported there anyway:
https://github.com/openstack/nova/blob/16.0.0.0rc2/nova/compute/manager.py#L2822-L2824
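A sketch of the cleanup logic the rebuild path could apply, assuming hypothetical callables for the claim, spawn, and allocation-delete steps (none of these names are Nova's): on a claim failure the destination allocation can always be removed, while the generic handler only knows it is safe to remove if the guest never made it onto the destination.

```python
class ClaimFailed(Exception):
    """Stand-in for the resource claim failing on the destination."""

def rebuild_with_cleanup(claim, spawn, delete_dest_allocation):
    guest_spawned = False
    try:
        claim()                  # claim resources on the destination
        spawn()                  # rebuild/spawn the guest there
        guest_spawned = True
    except ClaimFailed:
        # The claim failed, so the instance consumes nothing on this
        # host: the allocation the scheduler created is safe to drop.
        delete_dest_allocation()
        raise
    except Exception:
        # Generic failure: only drop the allocation if we know the guest
        # never spawned on the destination. A failure partway through
        # spawn() is exactly the ambiguous case this bug describes.
        if not guest_spawned:
            delete_dest_allocation()
        raise
```

This is only a shape, not a fix: in real code the "was it spawned" question is murkier than a boolean, which is why the generic handler is the hard part.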
** Affects: nova
Importance: High
Status: Triaged
** Tags: compute evacuate placement
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1713786