yahoo-eng-team team mailing list archive
Message #74306
[Bug 1787298] [NEW] Numa object string representations patch
Public bug reported:
The nova scheduler includes a number of very complicated filters with
non-obvious failure modes (the NUMATopologyFilter in particular comes to mind).
It is possible for a given instance to fail to schedule without it being
immediately apparent what exactly caused the failure.
Accordingly, we propose allowing optional detailed messages that explain
precisely *why* a scheduler filter is failing.
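One way to picture the proposal is a filter that returns a reason alongside its pass/fail decision, with that reason logged only when detailed messages are switched on. The sketch below is illustrative only. `FilterResult`, `ram_filter`, `run_filter` and the `detailed` switch are hypothetical names and are not nova's filter API.

```python
import logging
from dataclasses import dataclass
from typing import Optional

LOG = logging.getLogger("scheduler_sketch")


@dataclass
class FilterResult:
    """Hypothetical result type: pass/fail plus an optional human-readable reason."""
    passed: bool
    reason: Optional[str] = None


def ram_filter(free_ram_mb: int, requested_ram_mb: int) -> FilterResult:
    """Hypothetical RAM check that explains *why* it rejects a host."""
    if requested_ram_mb > free_ram_mb:
        return FilterResult(
            False,
            f"requested {requested_ram_mb} MB RAM but only {free_ram_mb} MB free",
        )
    return FilterResult(True)


def run_filter(host: str, result: FilterResult, detailed: bool) -> bool:
    """Emit the rejection reason at info level only when detailed messages are enabled."""
    if not result.passed and detailed:
        LOG.info("Host %s rejected: %s", host, result.reason)
    return result.passed
```

With this shape, a single info-level line such as "Host node1 rejected: requested 2048 MB RAM but only 1024 MB free" replaces the choice between near-silent info logs and a flood of debug output.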
Currently, when logging at "info" level, very little information is
emitted if a scheduler operation fails. By comparison, logging at
"debug" level generates a large volume of logs, most of which are
not important.
If the nova scheduler fails to find a suitable compute node for an
instance, it is sometimes tricky to figure out what the problem was. For
simple filters (CPU/RAM/disk) the checks are fairly straightforward, but
for more complicated filters (PCI, IO ops, and especially NUMA-related
features like CPU pinning, huge pages, and hyperthreading) it's difficult
to determine manually from the logs why things failed. Even with debug
logging enabled, there are scenarios where the NUMATopologyFilter can
fail without emitting any useful logs.
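The bug title refers to string representations for NUMA objects. The sketch below shows why they help. A readable `__str__` lets a CPU-pinning rejection message include the cell's state directly. It assumes a simplified, hypothetical `NUMACellSketch` class, which is not nova's actual NUMA cell object, and a hypothetical `explain_pinning_fit` helper. The field names are illustrative only.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class NUMACellSketch:
    """Hypothetical, simplified stand-in for a host NUMA cell (not nova's object)."""
    id: int
    cpuset: FrozenSet[int]
    pinned_cpus: FrozenSet[int] = frozenset()
    memory_mb: int = 0
    memory_usage_mb: int = 0

    def __str__(self) -> str:
        # Show total, pinned and free CPUs plus memory usage in one line.
        free = sorted(self.cpuset - self.pinned_cpus)
        return (
            f"NUMACell(id={self.id}, cpus={sorted(self.cpuset)}, "
            f"pinned={sorted(self.pinned_cpus)}, free={free}, "
            f"mem={self.memory_usage_mb}/{self.memory_mb} MB)"
        )


def explain_pinning_fit(cell: NUMACellSketch, wanted_cpus: int) -> str:
    """Return '' if the cell can host `wanted_cpus` dedicated CPUs, else a reason."""
    free = len(cell.cpuset - cell.pinned_cpus)
    if free >= wanted_cpus:
        return ""
    return f"needs {wanted_cpus} dedicated CPUs, only {free} free on {cell}"
```

For a cell with CPUs 0-3 where 0-2 are already pinned, asking for two dedicated CPUs yields "needs 2 dedicated CPUs, only 1 free on NUMACell(id=0, cpus=[0, 1, 2, 3], pinned=[0, 1, 2], free=[3], mem=1024/4096 MB)". That message states the cause outright instead of leaving it to be reconstructed from debug logs.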
Partially implements blueprint improve-sched-logging
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1787298
Title:
Numa object string representations patch
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1787298/+subscriptions