[Bug 1792985] [NEW] strict NUMA memory allocation for 4K pages leads to OOM-killer
Public bug reported:
We've seen a case on a resource-constrained compute node where booting
multiple instances succeeded, but then led to the following error
messages from the host kernel:
[ 731.911731] Out of memory: Kill process 133047 (nova-api) score 4 or sacrifice child
[ 731.920377] Killed process 133047 (nova-api) total-vm:374456kB, anon-rss:144708kB, file-rss:1892kB, shmem-rss:0kB
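For anyone trying to reproduce or diagnose this, a quick way to see how
much memory is left on each host NUMA node is to read the standard Linux
sysfs meminfo files. A minimal sketch (plain Python, nothing
nova-specific):

    import glob
    import re

    # Each host NUMA node exposes its own meminfo file under sysfs.
    for path in sorted(glob.glob('/sys/devices/system/node/node*/meminfo')):
        node = re.search(r'node(\d+)', path).group(1)
        with open(path) as f:
            for line in f:
                # Lines look like "Node 0 MemFree:   123456 kB"
                if 'MemFree' in line:
                    print('node %s free: %s' % (node, line.split(':', 1)[1].strip()))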
The problem appears to be that, with libvirt, an instance which does not
specify a NUMA topology (which implies "shared" CPUs and the default
memory page size) is currently allowed to float across the whole compute
node. As such, we do not know which host NUMA node its memory will be
allocated from, and therefore we don't know how much memory remains on
each host NUMA node.
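To illustrate the floating case, here is a minimal sketch (using the
libvirt Python bindings, which nova wraps; the domain name
'instance-00000001' is only an example) that checks whether a running
guest has any NUMA memory pinning in its XML. When no NUMA topology was
requested there is no <numatune> element, so the kernel is free to
satisfy the guest's allocations from any host node:

    import libvirt
    from xml.etree import ElementTree

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')
    root = ElementTree.fromstring(dom.XMLDesc(0))

    # <numatune><memory mode='...' nodeset='...'/></numatune> is only
    # present for guests whose memory is pinned to host NUMA nodes.
    mem = root.find('./numatune/memory')
    if mem is None:
        print('no <numatune>: guest memory floats across all host NUMA nodes')
    else:
        print('mode=%s nodeset=%s' % (mem.get('mode'), mem.get('nodeset')))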
If we have a similar instance which *is* limited to a particular NUMA
node (due to adding a PCI device for example, or in the future by
specifying dedicated CPUs) then that allocation will currently use
"strict" NUMA affinity. This allocation can fail if there isn't enough
memory available on that NUMA node (due to being "stolen" by a floating
instance, for example).
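For reference, the memory pinning that ends up in the guest XML in that
case looks roughly like the element built below (nodeset "0" is just an
example). With mode="strict" the kernel will not fall back to another
host node, so the allocation either fits on that node or fails / ends in
the OOM killer:

    from xml.etree import ElementTree

    numatune = ElementTree.Element('numatune')
    ElementTree.SubElement(numatune, 'memory', mode='strict', nodeset='0')
    print(ElementTree.tostring(numatune).decode())
    # -> <numatune><memory mode="strict" nodeset="0" /></numatune>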
I think this means that we cannot use "strict" affinity for the default
page size even when we do have a numa_topology, because we cannot do
accurate per-NUMA-node accounting: we don't know which NUMA node
floating instances allocated their memory from.
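To make the idea concrete, one possible direction (purely a sketch, not
nova's actual code; the helper name and argument are hypothetical) would
be to only request strict kernel-level affinity when an explicit page
size was asked for, and use the kernel's "preferred" policy for the
default 4K case so a shortfall on one node spills over instead of
becoming fatal:

    # Hypothetical helper, for illustration only.
    def numa_memory_mode(wants_hugepages):
        if wants_hugepages:
            # Hugepage-backed guests are accounted per host NUMA node,
            # so strict affinity is both safe and required.
            return 'strict'
        # 4K-backed guests can't be accounted accurately per node while
        # floating instances exist, so don't let a shortfall be fatal.
        return 'preferred'

    print(numa_memory_mode(wants_hugepages=False))  # -> preferred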
** Affects: nova
Importance: Undecided
Status: New
** Tags: compute
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1792985
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1792985/+subscriptions