yahoo-eng-team team mailing list archive

[Bug 1464286] Re: NumaTopololgyFilter Not behaving as expected (returns 0 hosts)

 

So I have a system with 40 logical CPUs (2 sockets, 10 cores each, hyperthreading enabled).
The NUMA topology is as follows:

    $ numactl --hardware
    available: 2 nodes (0-1)
    node 0 cpus: 0 1 2 3 4 5 6 7 8 9 20 21 22 23 24 25 26 27 28 29
    node 0 size: 32083 MB
    node 0 free: 16652 MB
    node 1 cpus: 10 11 12 13 14 15 16 17 18 19 30 31 32 33 34 35 36 37 38 39
    node 1 size: 32237 MB
    node 1 free: 25386 MB
    node distances:
    node   0   1
      0:  10  21
      1:  21  10
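To make the interleaved CPU numbering above easier to reason about, here's a quick sketch (mine, not part of the original report) that parses the `numactl --hardware` output into a node-to-CPU-set mapping:

```python
# Parse numactl-style output into {node_id: set_of_cpu_ids}.
# The sample text is copied from the output above.
NUMACTL_OUTPUT = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 20 21 22 23 24 25 26 27 28 29
node 0 size: 32083 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 30 31 32 33 34 35 36 37 38 39
node 1 size: 32237 MB
"""

def parse_node_cpus(text):
    """Return {node_id: set_of_cpu_ids} from `numactl --hardware` output."""
    nodes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "node" and parts[2] == "cpus:":
            nodes[int(parts[1])] = {int(c) for c in parts[3:]}
    return nodes

nodes = parse_node_cpus(NUMACTL_OUTPUT)
# Each node exposes 20 logical CPUs (10 cores x 2 hyperthreads).
assert all(len(cpus) == 20 for cpus in nodes.values())
```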

I'm using OpenStack provisioned by DevStack on a Fedora 23 host:

    $ cat /etc/*-release*
    Fedora release 23 (Twenty Three)
    ...
    $ uname -r
    4.3.5-300.fc23.x86_64

    $ cd /opt/stack/nova
    $ git show --oneline
    8bafc99 Merge "remove the unnecessary parem of set_vm_state_and_notify"

I defined a flavor similar to yours, but without the swap and disk allocations
(which aren't relevant here) and with less RAM, to keep things simple.

    $ openstack flavor create bug.1464286 --id 100 --ram 8192 --disk 0 \
        --vcpus 12

    $ openstack flavor set bug.1464286 \
        --property "hw:cpu_policy=dedicated" \
        --property "hw:numa_nodes=1"

    $ openstack flavor show bug.1464286
    +----------------------------+----------------------------------------------+
    | Field                      | Value                                        |
    +----------------------------+----------------------------------------------+
    | OS-FLV-DISABLED:disabled   | False                                        |
    | OS-FLV-EXT-DATA:ephemeral  | 0                                            |
    | disk                       | 0                                            |
    | id                         | 100                                          |
    | name                       | bug.1464286                                  |
    | os-flavor-access:is_public | True                                         |
    | properties                 | hw:cpu_policy='dedicated', hw:numa_nodes='1' |
    | ram                        | 8192                                         |
    | rxtx_factor                | 1.0                                          |
    | swap                       |                                              |
    | vcpus                      | 12                                           |
    +----------------------------+----------------------------------------------+
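With hw:cpu_policy=dedicated and hw:numa_nodes=1, every vCPU needs a dedicated pCPU on a single host NUMA node. A hypothetical capacity check (my illustration, not nova's actual filter code) shows why only two such instances fit on this host, one per node:

```python
# Hypothetical fits-check, not nova's actual NUMATopologyFilter code.
# Node figures come from the numactl output earlier in this report.
def fits_on_one_node(vcpus, ram_mb, node_pcpus=20, node_ram_mb=32083):
    """True if a dedicated-CPU guest fits entirely within one NUMA node."""
    return vcpus <= node_pcpus and ram_mb <= node_ram_mb

# One 12-vCPU / 8 GB instance fits on a 20-pCPU node...
assert fits_on_one_node(12, 8192)
# ...but two cannot share a node (24 pCPUs > 20), so the scheduler
# must place the second instance on the other node.
assert not fits_on_one_node(12 + 12, 8192 + 8192)
```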

I also modified the default quotas to allow allocation of more than 20
cores:

    $ openstack quota set --cores 40 demo

I boot one instance...

    $ openstack server create --flavor=bug.1464286 \
        --image=cirros-0.3.4-x86_64-uec --wait test1

    $ sudo virsh list
     Id    Name                           State
    ----------------------------------------------------
     20    instance-00000010              running

    $ sudo virsh dumpxml 20
    <domain type='kvm' id='20'>
      <name>instance-00000010</name>
      ...
      <vcpu placement='static'>12</vcpu>
      <cputune>
        <shares>12288</shares>
        <vcpupin vcpu='0' cpuset='1'/>
        <vcpupin vcpu='1' cpuset='21'/>
        <vcpupin vcpu='2' cpuset='0'/>
        <vcpupin vcpu='3' cpuset='20'/>
        <vcpupin vcpu='4' cpuset='25'/>
        <vcpupin vcpu='5' cpuset='5'/>
        <vcpupin vcpu='6' cpuset='8'/>
        <vcpupin vcpu='7' cpuset='28'/>
        <vcpupin vcpu='8' cpuset='9'/>
        <vcpupin vcpu='9' cpuset='29'/>
        <vcpupin vcpu='10' cpuset='24'/>
        <vcpupin vcpu='11' cpuset='4'/>
        <emulatorpin cpuset='0-1,4-5,8-9,20-21,24-25,28-29'/>
      </cputune>
      <numatune>
        <memory mode='strict' nodeset='0'/>
        <memnode cellid='0' mode='strict' nodeset='0'/>
      </numatune>
      ...
      <cpu>
        <topology sockets='6' cores='1' threads='2'/>
        <numa>
          <cell id='0' cpus='0-11' memory='8388608' unit='KiB'/>
        </numa>
      </cpu>
      ...
    </domain>
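The dump can be sanity-checked mechanically: every `<vcpupin>` cpuset should be a distinct pCPU on host NUMA node 0, matching the `<numatune>` nodeset. A small sketch (mine, using the `<cputune>` block copied from the dump above):

```python
import xml.etree.ElementTree as ET

# <cputune> element copied verbatim from the dump above.
CPUTUNE_XML = """\
<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='21'/>
  <vcpupin vcpu='2' cpuset='0'/>
  <vcpupin vcpu='3' cpuset='20'/>
  <vcpupin vcpu='4' cpuset='25'/>
  <vcpupin vcpu='5' cpuset='5'/>
  <vcpupin vcpu='6' cpuset='8'/>
  <vcpupin vcpu='7' cpuset='28'/>
  <vcpupin vcpu='8' cpuset='9'/>
  <vcpupin vcpu='9' cpuset='29'/>
  <vcpupin vcpu='10' cpuset='24'/>
  <vcpupin vcpu='11' cpuset='4'/>
</cputune>
"""

# Node 0's logical CPUs, per the numactl output earlier in this report.
NODE0_CPUS = set(range(0, 10)) | set(range(20, 30))

root = ET.fromstring(CPUTUNE_XML)
pinned = {int(pin.get('cpuset')) for pin in root.iter('vcpupin')}

# All 12 vCPUs are pinned to distinct pCPUs, all on host node 0,
# consistent with <numatune><memory nodeset='0'/>.
assert len(pinned) == 12 and pinned <= NODE0_CPUS
```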

Then I boot another...

    $ openstack server create --flavor=bug.1464286 \
        --image=cirros-0.3.4-x86_64-uec --wait test2

    $ sudo virsh list
     Id    Name                           State
    ----------------------------------------------------
     20    instance-00000010              running
     21    instance-00000011              running

    $ sudo virsh dumpxml 21
    <domain type='kvm' id='21'>
      <name>instance-00000011</name>
      ...
      <vcpu placement='static'>12</vcpu>
      <cputune>
        <shares>12288</shares>
        <vcpupin vcpu='0' cpuset='35'/>
        <vcpupin vcpu='1' cpuset='15'/>
        <vcpupin vcpu='2' cpuset='10'/>
        <vcpupin vcpu='3' cpuset='30'/>
        <vcpupin vcpu='4' cpuset='16'/>
        <vcpupin vcpu='5' cpuset='36'/>
        <vcpupin vcpu='6' cpuset='11'/>
        <vcpupin vcpu='7' cpuset='31'/>
        <vcpupin vcpu='8' cpuset='32'/>
        <vcpupin vcpu='9' cpuset='12'/>
        <vcpupin vcpu='10' cpuset='17'/>
        <vcpupin vcpu='11' cpuset='37'/>
        <emulatorpin cpuset='10-12,15-17,30-32,35-37'/>
      </cputune>
      <numatune>
        <memory mode='strict' nodeset='1'/>
        <memnode cellid='0' mode='strict' nodeset='1'/>
      </numatune>
      ...
      <cpu>
        <topology sockets='6' cores='1' threads='2'/>
        <numa>
          <cell id='0' cpus='0-11' memory='8388608' unit='KiB'/>
        </numa>
      </cpu>
      ...
    </domain>
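Comparing the two dumps confirms the expected placement: each instance is confined to one NUMA node and they never share a pCPU. A quick check (mine, with the pCPU sets transcribed from the dumps above):

```python
# pCPU sets transcribed from the two virsh dumps above.
TEST1_PINS = {1, 21, 0, 20, 25, 5, 8, 28, 9, 29, 24, 4}        # nodeset='0'
TEST2_PINS = {35, 15, 10, 30, 16, 36, 11, 31, 32, 12, 17, 37}  # nodeset='1'

# Node CPU sets from the numactl output earlier in this report.
NODE0 = set(range(0, 10)) | set(range(20, 30))
NODE1 = set(range(10, 20)) | set(range(30, 40))

# Each instance sits entirely on one node, and the two instances are
# pinned to disjoint pCPUs -- exactly what hw:numa_nodes=1 with
# dedicated CPU pinning should produce.
assert TEST1_PINS <= NODE0
assert TEST2_PINS <= NODE1
assert TEST1_PINS.isdisjoint(TEST2_PINS)
```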

For completeness, I tried to boot a third instance to confirm that bug #1438253
is still in effect. It is:

    $ openstack server create --flavor=bug.1464286 --image=cirros-0.3.4-x86_64-uec --wait test3
    Error creating server: test3

    Error creating server

    $ openstack server delete test3

So, based on the above, this bug appears to have been resolved in Mitaka and no
longer applies. I have a rough idea which patches fixed it, and since they were
backported to Liberty (and Kilo) there's a good chance things are fixed there
too. For now, I'm going to close this as "fixed". We can track down the exact
fix later if necessary.


** Changed in: nova
       Status: Incomplete => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1464286

Title:
  NumaTopololgyFilter Not behaving as expected (returns 0 hosts)

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  I have a system with 32 logical CPUs (2 sockets, 8 cores each, hyperthreading enabled).
  The NUMA topology is as follows:

  numactl --hardware

  available: 2 nodes (0-1)
  node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
  node 0 size: 65501 MB
  node 0 free: 38562 MB
  node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
  node 1 size: 65535 MB
  node 1 free: 63846 MB
  node distances:
  node   0   1
    0:  10  20
    1:  20  10

  I have defined a flavor in OpenStack with 12 vCPUs as follows:
  nova flavor-show c4.3xlarge
  +----------------------------+------------------------------------------------------+
  | Property                   | Value                                                |
  +----------------------------+------------------------------------------------------+
  | OS-FLV-DISABLED:disabled   | False                                                |
  | OS-FLV-EXT-DATA:ephemeral  | 0                                                    |
  | disk                       | 40                                                   |
  | extra_specs                | {"hw:cpu_policy": "dedicated", "hw:numa_nodes": "1"} |
  | id                         | 1d76a225-90c1-4f6f-a59b-000795c33e63                 |
  | name                       | c4.3xlarge                                           |
  | os-flavor-access:is_public | True                                                 |
  | ram                        | 24576                                                |
  | rxtx_factor                | 1.0                                                  |
  | swap                       | 8192                                                 |
  | vcpus                      | 12                                                   |
  +----------------------------+------------------------------------------------------+

  I expect to be able to launch two instances of this flavor on the 32
  core host, one contained within each NUMA node.

  When I launch two instances, the first succeeds, but the second fails.
  The instance xml is attached, along with the system capabilities.

  If I change hw:numa_nodes = 2, then I can launch two copies of the
  instance.

  N.B. for the purposes of testing I have disabled all vcpu_pin_set and
  isolcpus settings.

  
  This was tested on RDO Kilo running on CentOS 7.
  I had to upgrade the hypervisor with packages from the ovirt master branch in order to support NUMA pinning.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1464286/+subscriptions
