yahoo-eng-team team mailing list archive

[Bug 1787298] [NEW] Numa object string representations patch

 

Public bug reported:

The nova scheduler includes a number of very complicated filters with
non-obvious failure modes (the NUMATopologyFilter in particular comes to mind).
It is possible for a given instance to fail to schedule without it being
immediately apparent what exactly caused the failure.
Accordingly, it is proposed that we allow for optional detailed messages about
precisely *why* a scheduler filter is failing.
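
As a rough illustration of the proposal, the sketch below has each filter
return a reason string alongside its pass/fail result, and only logs that
reason when a "detailed" switch is on. Everything here (HostState,
ram_filter, vcpu_filter, filter_hosts, the "detailed" flag) is hypothetical
and is not nova's actual filter API; it only shows the general shape of the
idea.

```python
import logging

LOG = logging.getLogger("sched_sketch")


class HostState:
    """Hypothetical stand-in for a compute host's free resources."""

    def __init__(self, name, free_ram_mb, free_vcpus):
        self.name = name
        self.free_ram_mb = free_ram_mb
        self.free_vcpus = free_vcpus


def ram_filter(host, request):
    """Return (passed, reason); reason is None when the host passes."""
    if host.free_ram_mb < request["ram_mb"]:
        return False, "need %d MB RAM, only %d MB free" % (
            request["ram_mb"], host.free_ram_mb)
    return True, None


def vcpu_filter(host, request):
    """Return (passed, reason); reason is None when the host passes."""
    if host.free_vcpus < request["vcpus"]:
        return False, "need %d vCPUs, only %d free" % (
            request["vcpus"], host.free_vcpus)
    return True, None


def filter_hosts(hosts, request, filters, detailed=False):
    """Run every host through the filters, recording why each one failed.

    'detailed' is an assumed opt-in switch: when set, each rejection is
    logged at info level, so debug logging is not needed to see the reason.
    """
    passed = []
    rejections = []
    for host in hosts:
        for flt in filters:
            ok, reason = flt(host, request)
            if not ok:
                rejections.append((host.name, flt.__name__, reason))
                if detailed:
                    LOG.info("%s rejected by %s: %s",
                             host.name, flt.__name__, reason)
                break  # first failing filter decides the rejection
        else:
            passed.append(host)
    return passed, rejections
```

With this shape, a failed scheduling attempt can report one short line per
rejected host instead of requiring a full debug trace.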

Currently, if we log at "info" level, very little information is
emitted when a scheduler operation fails. By comparison, logging at
"debug" level generates a large volume of log output, most of which is
irrelevant to the failure.

If the nova scheduler fails to find a suitable compute node for an
instance, it is sometimes tricky to figure out what the problem was. For
simple filters (CPU/RAM/disk) the checks are fairly straightforward, but
for more complicated filters (PCI, IO ops, and especially NUMA-related
things like CPU pinning, huge pages, and hyperthreading) it's difficult
to manually determine why things failed from looking at the logs. Even
with debug logging enabled, there are scenarios where the
NUMATopologyFilter can fail with no useful logs.
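
The bug title refers to string representations for NUMA objects. As a
minimal sketch of why that helps, the class below gives a NUMA cell a
compact one-line __str__ that could be dropped into a rejection message.
NUMACellSketch and its fields are assumptions made for illustration; they
do not mirror nova's real NUMA objects.

```python
class NUMACellSketch:
    """Hypothetical NUMA cell with just enough state to describe itself."""

    def __init__(self, cell_id, cpuset, pinned_cpus, memory_mb, memory_used_mb):
        self.cell_id = cell_id
        self.cpuset = set(cpuset)
        self.pinned_cpus = set(pinned_cpus)
        self.memory_mb = memory_mb
        self.memory_used_mb = memory_used_mb

    @property
    def free_cpus(self):
        # CPUs in this cell that are not already pinned to an instance.
        return self.cpuset - self.pinned_cpus

    def __str__(self):
        # One line, sorted for stable output, suitable for a log message.
        return "cell %d: free_cpus=%s pinned_cpus=%s free_mem=%dMB/%dMB" % (
            self.cell_id,
            sorted(self.free_cpus),
            sorted(self.pinned_cpus),
            self.memory_mb - self.memory_used_mb,
            self.memory_mb,
        )
```

A filter that rejects a host could then include str(cell) for each host
cell in its message, so the log shows the state the decision was based on.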

Partially implements blueprint improve-sched-logging

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1787298

Title:
   Numa object string representations patch

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1787298/+subscriptions