yahoo-eng-team team mailing list archive
Message #80072
[Bug 1844721] [NEW] Need NUMA aware RAM reservation to avoid OOM killing host processes
Public bug reported:
Description:
===========
CPU pinning is widely used in VNFs. When a VM's CPUs are pinned, there is
currently no way to reserve memory on NUMA 0 for host processes:
> ram_allocation_ratio is ignored by the nova scheduler when VM CPUs are pinned
> reserved_host_memory_mb is a global reservation: as long as memory is available globally (on any NUMA node), the VM is scheduled.
As a result, many VMs are scheduled on NUMA 0 (CPUs pinned to NUMA 0)
while their memory needs are met only "globally".
When the system comes under load, the VMs' memory is allocated on NUMA 0
(because they are pinned to NUMA 0), to the extent that NUMA 0 runs short
of memory and the OOM killer kicks in and kills host processes.
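The gap between a host-wide check and a per-node check can be illustrated
with a toy model. This is only a sketch and not nova's scheduler code: the
node sizes, the VM size, the 20 GiB reservation, the limit of four pinned VMs
and all names below are assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical numbers, chosen only to make the effect visible.
NODE_MB = 65536          # 64 GiB per NUMA node
VM_MB = 16384            # 16 GiB per VM
RESERVED_MB = 20480      # 20 GiB meant to stay free for host processes
MAX_PINNED_VMS = 4       # node 0 has cores for four pinned VMs


@dataclass
class NumaNode:
    total_mb: int
    used_mb: int = 0

    @property
    def free_mb(self) -> int:
        return self.total_mb - self.used_mb


def fits_globally(nodes, vm_mb, reserved_mb):
    """Host-wide check: only the sum of free memory over all nodes matters."""
    return sum(n.free_mb for n in nodes) - reserved_mb >= vm_mb


def fits_on_node(node, vm_mb, node_reserved_mb):
    """Per-node check: the reservation is held back on this node itself."""
    return node.free_mb - node_reserved_mb >= vm_mb


def place_on_node0(per_node_check):
    """Place VMs pinned to node 0 until the chosen check rejects one.

    Returns (number of VMs placed, MB left free on node 0).
    """
    nodes = [NumaNode(NODE_MB), NumaNode(NODE_MB)]
    placed = 0
    for _ in range(MAX_PINNED_VMS):
        if per_node_check:
            ok = fits_on_node(nodes[0], VM_MB, RESERVED_MB)
        else:
            ok = fits_globally(nodes, VM_MB, RESERVED_MB)
        if not ok:
            break
        # Pinned CPUs on node 0 mean the guest memory ends up on node 0.
        nodes[0].used_mb += VM_MB
        placed += 1
    return placed, nodes[0].free_mb


print("global check:  ", place_on_node0(per_node_check=False))
print("per-node check:", place_on_node0(per_node_check=True))
```

In this model the global check accepts all four VMs and leaves nothing free
on node 0, while the per-node check stops after two and keeps the reserved
amount available there.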
Many mitigations have been "invented", but they all come with some form
of technical or operational "difficulties". One mitigation, for example,
is to enable huge pages and back the VMs with huge pages.
The right solution is for nova to support NUMA-aware RAM reservation, as
it already does for huge pages, e.g.
reserved_host_memory=node:0, 20G
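For comparison, the per-node huge page reservation that nova already supports
is set in nova.conf roughly as follows. The page size and count are
illustrative values only; the reserved_host_memory line above is the syntax
proposed in this report and does not exist today.

```ini
[DEFAULT]
# Existing option: reserve 64 huge pages of 2048 KiB on NUMA node 0.
reserved_huge_pages = node:0,size:2048,count:64
```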
Steps to reproduce
==================
Create CPU-pinned VMs. The VMs crowd onto NUMA 0 until no more CPU cores are available there, and only then are they scheduled on NUMA 1. Stress the system.
Expected result
===============
The system stays operational.
Actual result
=============
The OOM killer kicks in and kills host processes because NUMA 0 runs out of memory, even though plenty of memory is still free on NUMA 1.
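One way to observe the imbalance on a compute host is to compare free memory
per NUMA node. The sketch below assumes a Linux host that exposes per-node
counters under /sys/devices/system/node/node*/meminfo; the function names are
made up for this example, and only the kB-valued lines are parsed.

```python
import glob
import re

# Matches lines such as "Node 0 MemFree:        1048576 kB".
_LINE = re.compile(r"^Node\s+(\d+)\s+(\w+):\s+(\d+)\s+kB", re.MULTILINE)


def parse_node_meminfo(text):
    """Return {node_id: {field: kB}} for the kB-valued lines in the text."""
    result = {}
    for node, field, kb in _LINE.findall(text):
        result.setdefault(int(node), {})[field] = int(kb)
    return result


def read_all_nodes(pattern="/sys/devices/system/node/node*/meminfo"):
    """Read and merge the meminfo file of every NUMA node found on this host."""
    merged = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            merged.update(parse_node_meminfo(fh.read()))
    return merged


if __name__ == "__main__":
    for node, fields in sorted(read_all_nodes().items()):
        free_mib = fields.get("MemFree", 0) // 1024
        total_mib = fields.get("MemTotal", 0) // 1024
        print(f"node {node}: MemFree={free_mib} MiB of {total_mib} MiB")
```

On a host showing this problem, node 0 would be expected to report very
little free memory while node 1 still has plenty.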
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1844721
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1844721/+subscriptions