yahoo-eng-team team mailing list archive
Message #63371
[Bug 1683858] [NEW] Allocation records do not contain overhead information
Public bug reported:
Some virt drivers report additional per-instance overhead for memory and
disk usage on a compute node. However, that overhead is not included in
the allocation records for a given instance on a resource provider
(compute node):
https://github.com/openstack/nova/blob/15.0.0/nova/scheduler/client/report.py#L157
The overhead is used, though, as part of the claim test on the compute
node when creating or moving an instance. For creating an instance,
that's done here:
https://github.com/openstack/nova/blob/15.0.0/nova/compute/resource_tracker.py#L144-L156
https://github.com/openstack/nova/blob/15.0.0/nova/compute/claims.py#L165
Where Claim.memory_mb is the instance.flavor.memory_mb + overhead:
https://github.com/openstack/nova/blob/15.0.0/nova/compute/claims.py#L106
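To make the mismatch concrete, here is a minimal sketch. It is not
nova's actual code: the function names, the dict layout, and the 64 MB
overhead are assumptions chosen for illustration. It compares the amount
the claim tests with the amount reported as an allocation:

```python
# Hypothetical illustration; none of these functions are nova APIs.

def claimed_memory_mb(flavor_memory_mb, overhead_memory_mb):
    """Memory the compute-side claim tests: flavor memory plus overhead."""
    return flavor_memory_mb + overhead_memory_mb


def reported_allocation(flavor_memory_mb, flavor_disk_gb):
    """Allocation sent to placement: flavor values only, no overhead."""
    return {"MEMORY_MB": flavor_memory_mb, "DISK_GB": flavor_disk_gb}


flavor_memory_mb = 2048
overhead_memory_mb = 64  # assumed value; real overhead depends on the driver

claim = claimed_memory_mb(flavor_memory_mb, overhead_memory_mb)
alloc = reported_allocation(flavor_memory_mb, flavor_disk_gb=20)

# The difference is the memory the host accounts for but placement never sees.
unreported_mb = claim - alloc["MEMORY_MB"]
print(claim, alloc["MEMORY_MB"], unreported_mb)
```

With these assumed numbers, the host claims more memory per instance
than placement records, and the gap grows with every instance on the
node.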
So ultimately what we claim on the compute node is not what we report to
placement as allocations for that instance. This matters because when
the filter scheduler asks placement for a list of resource providers
that can fit a given request's memory_mb and disk_gb, placement relies
on the inventory for the compute node resource provider and the existing
usage (allocations) for that provider, and we aren't reporting the full
story to placement.
This could lead to placement telling the filter scheduler there is room
to place an instance on a given compute node when in fact the instance
could fail the claim once we get to the host, which would result in a
retry of the build on another host (which can be expensive).
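The failure mode described above can be sketched with a toy model of the
two capacity checks. This is a simplified assumption, not nova or
placement code: it ignores allocation ratios and reserved resources, and
all names and numbers are hypothetical.

```python
# Hypothetical model of the two capacity checks; not nova or placement code.

def placement_has_room(total_mb, allocations_mb, requested_mb):
    """Placement-style check: sees only flavor memory in allocations."""
    return sum(allocations_mb) + requested_mb <= total_mb


def claim_succeeds(total_mb, flavors_mb, overhead_mb, requested_mb):
    """Compute-style claim: every instance also costs per-instance overhead."""
    used_mb = sum(flavor_mb + overhead_mb for flavor_mb in flavors_mb)
    return used_mb + requested_mb + overhead_mb <= total_mb


total_mb = 8192
existing = [2048, 2048, 2048]  # flavor memory of instances already on the host
overhead_mb = 64               # assumed per-instance overhead
requested_mb = 2048

scheduler_view = placement_has_room(total_mb, existing, requested_mb)
host_view = claim_succeeds(total_mb, existing, overhead_mb, requested_mb)
print(scheduler_view, host_view)
```

In this example the placement-style check passes, because the flavor
memory exactly fills the host, while the claim fails once overhead is
counted. That is the case that forces a retry on another host.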
Also, once we have multi-cell support with a top-level conductor that
the computes can't reach, we won't have build retries anymore: a failed
claim would end the build and the instance would go to ERROR state. So
it's critical that the placement service has the information it needs to
make the correct decision on the first try.
** Affects: nova
Importance: High
Status: Triaged
** Tags: placement resource-tracker scheduler
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1683858
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1683858/+subscriptions