yahoo-eng-team team mailing list archive
Message #80358
[Bug 1847999] [NEW] vmware virt driver's report of VCPU can be inaccurate in some cases
Public bug reported:
caveat lector: This is a placeholder bug to record an issue with the
vmware virt driver so that if a reasonable solution is determined it can
be contributed upstream. The challenge is that no solution is going to
be perfect, so it may be easier to just leave things as they are, but I
wanted to get this in place to remember it. If a patch does happen, I'll
be doing it.
In the downstream version of the vmware driver more features are
exposed, based on various settings made on the individual ESXi hosts and
the vCenter cluster manager. Some of these features consume available
resources (cpu, disk, memory) that need to be accounted for as overhead,
per ESXi host. However, because the vmware driver has chosen to expose
the vCenter cluster as the unit of hypervisor, per-host differences
are difficult to manage in nova and placement. In some cases
compensation can be done by tweaking max_unit of a resource class (see
update_provider_tree in nova/virt/vmwareapi/driver.py for existing
examples) to have a value of the maximum available slice on any host (or
datastore) and regularly updating this (in the periodic job or after a
workload lands).
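As a rough sketch of that compensation, assuming the per-host figures
have already been gathered into plain dicts: the function name and the
'threads' and 'usable_threads' keys are hypothetical and are not what
the driver uses; only the shape of the returned inventory follows the
usual placement fields.

```python
def cluster_vcpu_inventory(hosts, allocation_ratio=16.0):
    """Build a placement-style VCPU inventory for a whole cluster.

    hosts: list of dicts with the hypothetical keys 'threads' (all
    hardware threads on the host) and 'usable_threads' (threads left
    after per-host overhead). The allocation_ratio default is only an
    example value.
    """
    # The cluster is one resource provider, so total is summed over hosts.
    total = sum(h["threads"] for h in hosts)
    # A single instance must fit on one host, so max_unit is the largest
    # slice available on any single host, not the cluster-wide sum.
    max_unit = max((h["usable_threads"] for h in hosts), default=0)
    return {
        "VCPU": {
            "total": total,
            "reserved": 0,
            "min_unit": 1,
            "max_unit": max_unit,
            "step_size": 1,
            "allocation_ratio": allocation_ratio,
        }
    }
```

Recomputing this in the periodic job keeps max_unit in step with
whichever host currently has the most room.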
For VCPU resources there is a mismatch between how the ESXi host reports
overhead and how nova and placement think of it. VMware talks in Hz,
while nova and placement think in whole CPUs. For some NFV-related
features, reserving a "core" for network management (things which help a
workload but are not the workload itself) will lower the value of
available Hz, but will not affect 'summary.hardware.numCpuThreads', the
attribute currently used to calculate total and max_unit for the VCPU
resource class.
A more accurate picture of available resources can be created by doing
some math across several hardware summary attributes: numCpuThreads,
cpuMhz, and numCpuCores, probably with some "what features are turned
on" magic for extra accuracy.
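Purely to illustrate the kind of math meant here (this is not the
researched answer), one hypothetical approach converts the Hz consumed
by overhead back into whole cores. The function name and the
reserved_mhz input are assumptions; how that figure would be obtained
from the host is not covered. It also assumes cpuMhz is the speed of a
single core and that a partly consumed core is unusable.

```python
import math


def effective_vcpus(num_cpu_threads, num_cpu_cores, cpu_mhz, reserved_mhz):
    """Estimate usable hardware threads on one host after Hz overhead.

    num_cpu_threads, num_cpu_cores, cpu_mhz mirror the hardware summary
    attributes; reserved_mhz is a hypothetical figure for the capacity
    taken by host-side features.
    """
    if num_cpu_cores <= 0 or cpu_mhz <= 0:
        return 0
    threads_per_core = num_cpu_threads // num_cpu_cores
    # ASSUMPTION: round up, so a partially consumed core counts as gone.
    reserved_cores = math.ceil(reserved_mhz / cpu_mhz)
    usable_cores = max(num_cpu_cores - reserved_cores, 0)
    return usable_cores * threads_per_core
```

With 16 cores, 32 threads, and one core's worth of Hz reserved, this
reports 30 rather than the 32 that numCpuThreads alone would give.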
The correct math is being researched; I'll hang it on this bug when it
is figured out.
** Affects: nova
Importance: Low
Assignee: Chris Dent (cdent)
Status: Triaged
** Changed in: nova
Status: New => Triaged
** Changed in: nova
Importance: Undecided => Low
** Changed in: nova
Assignee: (unassigned) => Chris Dent (cdent)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1847999