yahoo-eng-team team mailing list archive

[Bug 1898272] [NEW] "mixed" policy calculations don't account for host cells with no free shared CPUs

Public bug reported:

The 'mixed' CPU policy allows us to use both shared and dedicated CPUs
(VCPU and PCPU) in the same instance. The expectation is that both sets
of CPUs will use host cores from the same NUMA node(s). The current code
does appear to do this, at least for single NUMA nodes. However, it does
not account for host NUMA nodes without any shared CPUs.

# Steps to reproduce

Configure a dual NUMA node host so that all cores from one node are
assigned to '[compute] cpu_shared_set', while all the cores from the
other node are assigned to '[compute] cpu_dedicated_set'. For example,
on a host where cores 0-5 are on node 0, while cores 6-11 are on node 1:

  [compute]
  cpu_shared_set = 0-5
  cpu_dedicated_set = 6-11
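
With this layout, neither host node offers both kinds of CPU. The following is a minimal sketch of that observation, not nova code: the helper name `split_per_node` and the dictionary layout are assumptions made for illustration, and the core numbers are taken from the example configuration.

```python
# Host topology from the example: node 0 holds cores 0-5, node 1 cores 6-11.
host_nodes = {0: set(range(0, 6)), 1: set(range(6, 12))}

# Values of '[compute] cpu_shared_set' and '[compute] cpu_dedicated_set'.
cpu_shared_set = set(range(0, 6))
cpu_dedicated_set = set(range(6, 12))


def split_per_node(nodes, shared, dedicated):
    """Return the shared and dedicated cores available on each NUMA node.

    Hypothetical helper for illustration only.
    """
    return {
        node: {
            "shared": sorted(cores & shared),
            "dedicated": sorted(cores & dedicated),
        }
        for node, cores in nodes.items()
    }


split = split_per_node(host_nodes, cpu_shared_set, cpu_dedicated_set)
for node, cpus in split.items():
    print(node, cpus)
```

Node 0 ends up with no dedicated cores and node 1 with no shared cores, so a single-node guest that needs both cannot be placed on either.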

Now attempt to boot a guest using the mixed policy, e.g.:

  $ openstack flavor create --vcpus 4 --ram 512 --disk 1 \
      --property 'hw:cpu_policy=mixed' --property 'hw:cpu_dedicated_mask=^0' \
      test.mixed
  $ openstack server create --os-compute-api-version=2.latest \
      --flavor test.mixed --image cirros-0.5.1-x86_64-disk --nic none --wait \
      test-server
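
The 'hw:cpu_dedicated_mask=^0' property above marks every guest CPU except vCPU 0 as dedicated, which leaves vCPU 0 as the only shared one. Below is a rough sketch of how such a mask could be resolved. The functions `parse_spec` and `split_vcpus` are hypothetical and simplified, and the rule that a mask starting with '^' is applied against the full vCPU range is an assumption about the intended semantics rather than a copy of nova's implementation.

```python
def parse_spec(spec):
    """Parse a spec like '0-3,^0' into a set of CPU IDs.

    Simplified, hypothetical parser: supports single IDs, 'a-b' ranges
    and '^' exclusions, with no validation of malformed input.
    """
    include, exclude = set(), set()
    for part in spec.split(","):
        part = part.strip()
        target = include
        if part.startswith("^"):
            target, part = exclude, part[1:]
        if "-" in part:
            low, high = (int(x) for x in part.split("-"))
            target.update(range(low, high + 1))
        else:
            target.add(int(part))
    return include - exclude


def split_vcpus(vcpus, dedicated_mask):
    """Return (shared, dedicated) guest CPU ID sets for a flavor."""
    if dedicated_mask.strip().startswith("^"):
        # Assumption: an exclusion-only mask is applied to all guest CPUs.
        dedicated = parse_spec("0-%d,%s" % (vcpus - 1, dedicated_mask))
    else:
        dedicated = parse_spec(dedicated_mask)
    shared = set(range(vcpus)) - dedicated
    return shared, dedicated


shared, dedicated = split_vcpus(4, "^0")
print("shared:", sorted(shared), "dedicated:", sorted(dedicated))
```

For the 4 vCPU flavor in this report, that gives one shared guest CPU and three dedicated ones, so the host node chosen for the guest needs at least one shared core and three free dedicated cores.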

# Expected result

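The instance should fail to schedule because the 'NUMATopologyFilter'
should reject the host: no single host NUMA node provides both the
shared CPU needed for vCPU 0 and the dedicated CPUs needed for vCPUs 1-3.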
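
The kind of check that would produce this rejection can be sketched as follows. This is an illustration under stated assumptions, not the 'NUMATopologyFilter' implementation: `cell_fits_mixed` and `host_fits_mixed` are hypothetical names, the guest is assumed to have a single NUMA node, and shared CPUs are assumed to be overcommittable so only their presence is checked.

```python
def cell_fits_mixed(free_shared, free_dedicated, shared_vcpus, dedicated_vcpus):
    """Return True if one host NUMA cell can hold a mixed-policy guest cell.

    Hypothetical check: shared guest CPUs float over the cell's shared
    cores, so at least one shared core must exist; each dedicated guest
    CPU needs its own free dedicated core.
    """
    if shared_vcpus > 0 and not free_shared:
        return False
    return len(free_dedicated) >= dedicated_vcpus


def host_fits_mixed(cells, shared_vcpus, dedicated_vcpus):
    """Return True if any host cell can hold the (single-node) guest."""
    return any(
        cell_fits_mixed(c["shared"], c["dedicated"], shared_vcpus, dedicated_vcpus)
        for c in cells
    )


# The host from this report: each node has only one kind of CPU.
example_host = [
    {"shared": set(range(0, 6)), "dedicated": set()},
    {"shared": set(), "dedicated": set(range(6, 12))},
]

# A contrasting host where each node has both kinds.
balanced_host = [
    {"shared": {0, 1, 2}, "dedicated": {3, 4, 5}},
    {"shared": {6, 7, 8}, "dedicated": {9, 10, 11}},
]

print(host_fits_mixed(example_host, shared_vcpus=1, dedicated_vcpus=3))
print(host_fits_mixed(balanced_host, shared_vcpus=1, dedicated_vcpus=3))
```

Under these assumptions the host from the report is rejected, while the contrasting host with both CPU types on each node is accepted.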

# Actual result

The instance is scheduled but fails to boot since the following invalid
XML snippet is generated:

  <cputune>
    <shares>4096</shares>
    <emulatorpin cpuset="0-1,4"/>
    <vcpupin vcpu="0" cpuset=""/>  <!-- empty cpuset: this is the invalid element -->
    <vcpupin vcpu="1" cpuset="0"/>
    <vcpupin vcpu="2" cpuset="1"/>
    <vcpupin vcpu="3" cpuset="4"/>
  </cputune>

This results in the following traceback in the nova-compute logs:

  ERROR nova.compute.manager [instance: ...] Traceback (most recent call last):
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/compute/manager.py", line 2625, in _build_resources
  ERROR nova.compute.manager [instance: ...]     yield resources
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/compute/manager.py", line 2398, in _build_and_run_instance
  ERROR nova.compute.manager [instance: ...]     accel_info=accel_info)
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3752, in spawn
  ERROR nova.compute.manager [instance: ...]     cleanup_instance_disks=created_disks)
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6749, in _create_guest_with_network
  ERROR nova.compute.manager [instance: ...]     cleanup_instance_disks=cleanup_instance_disks)
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  ERROR nova.compute.manager [instance: ...]     self.force_reraise()
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  ERROR nova.compute.manager [instance: ...]     six.reraise(self.type_, self.value, self.tb)
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
  ERROR nova.compute.manager [instance: ...]     raise value
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6718, in _create_guest_with_network
  ERROR nova.compute.manager [instance: ...]     post_xml_callback=post_xml_callback)
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6643, in _create_guest
  ERROR nova.compute.manager [instance: ...]     guest = libvirt_guest.Guest.create(xml, self._host)
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 145, in create
  ERROR nova.compute.manager [instance: ...]     encodeutils.safe_decode(xml))
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
  ERROR nova.compute.manager [instance: ...]     self.force_reraise()
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
  ERROR nova.compute.manager [instance: ...]     six.reraise(self.type_, self.value, self.tb)
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
  ERROR nova.compute.manager [instance: ...]     raise value
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 141, in create
  ERROR nova.compute.manager [instance: ...]     guest = host.write_instance_config(xml)
  ERROR nova.compute.manager [instance: ...]   File "/opt/stack/nova/nova/virt/libvirt/host.py", line 1144, in write_instance_config
  ERROR nova.compute.manager [instance: ...]     domain = self.get_connection().defineXML(xml)
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 190, in doit
  ERROR nova.compute.manager [instance: ...]     result = proxy_call(self._autowrap, f, *args, **kwargs)
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 148, in proxy_call
  ERROR nova.compute.manager [instance: ...]     rv = execute(f, *args, **kwargs)
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 129, in execute
  ERROR nova.compute.manager [instance: ...]     six.reraise(c, e, tb)
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/six.py", line 703, in reraise
  ERROR nova.compute.manager [instance: ...]     raise value
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/eventlet/tpool.py", line 83, in tworker
  ERROR nova.compute.manager [instance: ...]     rv = meth(*args, **kwargs)
  ERROR nova.compute.manager [instance: ...]   File "/usr/local/lib/python3.6/dist-packages/libvirt.py", line 3703, in defineXML
  ERROR nova.compute.manager [instance: ...]     if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
  ERROR nova.compute.manager [instance: ...] libvirt.libvirtError: invalid argument: Failed to parse bitmap ''
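
The empty string in the final error message matches the empty 'cpuset' attribute on vCPU 0 in the generated XML. As an illustration only, a guard like the one below could flag such elements before the XML is handed to libvirt. The function name `find_empty_pins` is hypothetical, the snippet is the one from this report with the annotation removed, and nothing here reflects how nova actually validates its guest configuration.

```python
import xml.etree.ElementTree as ET

# The <cputune> snippet from this report, without the inline annotation.
CPUTUNE_XML = """
<cputune>
  <shares>4096</shares>
  <emulatorpin cpuset="0-1,4"/>
  <vcpupin vcpu="0" cpuset=""/>
  <vcpupin vcpu="1" cpuset="0"/>
  <vcpupin vcpu="2" cpuset="1"/>
  <vcpupin vcpu="3" cpuset="4"/>
</cputune>
"""


def find_empty_pins(xml_text):
    """Return the guest CPU IDs whose <vcpupin> has an empty cpuset."""
    root = ET.fromstring(xml_text.strip())
    return [
        int(pin.get("vcpu"))
        for pin in root.iter("vcpupin")
        if not pin.get("cpuset", "").strip()
    ]


empty = find_empty_pins(CPUTUNE_XML)
if empty:
    print("vCPUs with no host CPUs to run on:", empty)
```

A check of this sort would only surface the problem earlier on the compute node. The fix requested in this report is still for the scheduler to reject the host in the first place.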

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1898272

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1898272/+subscriptions