yahoo-eng-team team mailing list archive
Message #81637
[Bug 1863757] [NEW] Insufficient memory for guest pages when using NUMA
Public bug reported:
This is a Queens / Bionic OpenStack deployment.
Compute nodes are using hugepages for nova instances (reserved at boot
time):
root@compute1:~# cat /proc/meminfo | grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 332
HugePages_Free: 184
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
There are two NUMA nodes, as follows:
root@compute1:~# lscpu | grep -i numa
NUMA node(s): 2
NUMA node0 CPU(s): 0-19,40-59
NUMA node1 CPU(s): 20-39,60-79
Compute nodes are using DPDK, and memory for it has been reserved with
the following directive:
reserved-huge-pages: "node:0,size:1GB,count:8;node:1,size:1GB,count:8"
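To make the reservation above concrete, here is a minimal sketch that parses the directive string into per-node entries and totals the reserved pages. The function name parse_reserved_huge_pages is hypothetical (it is not part of nova or any charm); the sketch only assumes the "node:N,size:S,count:C" layout shown in the directive.

```python
def parse_reserved_huge_pages(directive):
    """Parse 'node:0,size:1GB,count:8;node:1,...' into a list of dicts.

    Hypothetical helper for illustration only; it assumes the exact
    key:value layout shown in the directive above.
    """
    entries = []
    for chunk in directive.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Each chunk looks like "node:0,size:1GB,count:8".
        fields = dict(item.split(":", 1) for item in chunk.split(","))
        entries.append({
            "node": int(fields["node"]),
            "size": fields["size"],
            "count": int(fields["count"]),
        })
    return entries


DIRECTIVE = "node:0,size:1GB,count:8;node:1,size:1GB,count:8"
reserved = parse_reserved_huge_pages(DIRECTIVE)
total_reserved = sum(entry["count"] for entry in reserved)
print(reserved)
print(total_reserved)  # 16 pages of 1 GB across both nodes
```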
Several instances have already been created on node "compute1", and
current hugepage usage per NUMA node is as follows:
root@compute1:~# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 0 kB
Node 0 ShmemHugePages: 0 kB
Node 0 HugePages_Total: 166
Node 0 HugePages_Free: 26
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 0 kB
Node 1 ShmemHugePages: 0 kB
Node 1 HugePages_Total: 166
Node 1 HugePages_Free: 158
Node 1 HugePages_Surp: 0
Problem:
When a new instance is created (8 cores and 32 GB of RAM), nova tries to
schedule it on NUMA node 0 (only 26 free 1 GB pages) and fails with
"Insufficient free host memory pages available to allocate guest RAM",
even though there is enough memory available on NUMA node 1 (158 free
pages).
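The arithmetic behind the failure can be sketched as follows. This is only an illustration, assuming the guest's RAM must be backed entirely by 1 GB pages from a single host NUMA node; the helper names are hypothetical and this is not nova's actual placement logic. The sample text is taken from the per-node meminfo output above.

```python
import re


def free_hugepages_per_node(meminfo_text):
    """Return {node_id: free_pages} parsed from per-node meminfo text."""
    free = {}
    pattern = r"Node\s+(\d+)\s+HugePages_Free:\s+(\d+)"
    for match in re.finditer(pattern, meminfo_text):
        free[int(match.group(1))] = int(match.group(2))
    return free


def nodes_that_fit(free_pages, guest_ram_gb, page_size_gb=1):
    """List the NUMA nodes with enough free pages for the whole guest.

    Assumption: the guest is not split across host NUMA nodes.
    """
    pages_needed = -(-guest_ram_gb // page_size_gb)  # ceiling division
    return [node for node, free in sorted(free_pages.items())
            if free >= pages_needed]


SAMPLE = """\
Node 0 HugePages_Total: 166
Node 0 HugePages_Free: 26
Node 1 HugePages_Total: 166
Node 1 HugePages_Free: 158
"""

free = free_hugepages_per_node(SAMPLE)
print(free)                                   # {0: 26, 1: 158}
print(nodes_that_fit(free, guest_ram_gb=32))  # [1]
```

A 32 GB guest needs 32 pages of 1 GB, so under this assumption only node 1 can host it, which matches the reported symptom when node 0 is tried.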
This behavior has also been seen by other users here (although the
solution on that bug seems to be more a coincidence than a proper
solution, and it was then classified as not a bug, which I don't
believe is the case):
https://bugzilla.redhat.com/show_bug.cgi?id=1517004
The flavor being used has nothing special except the property
hw:mem_page_size='large'.
The instance is being forced onto "zone1::compute1"; otherwise there is
no pinning of CPUs or other resources. Placing the VM on node 0 seems
to be nova's own decision when instantiating it.
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1863757
Title:
Insufficient memory for guest pages when using NUMA
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1863757/+subscriptions