
yahoo-eng-team team mailing list archive

[Bug 1829349] [NEW] Resource Tracker fails to update usage when a NUMA topology conflict happens

 

Public bug reported:

Let me first describe when this bug happens.

Assume there are 2 running VMs, both booted from a flavor that contains
the extra spec 'hw:cpu_policy=dedicated'.

Also assume that some of these 2 VMs' vCPUs are pinned to the same physical CPUs.
Let's say VM1 is pinned to {"0": 50, "1": 22, "2": 49, "3": 21}
and VM2 is pinned to {"0": 27, "1": 55, "2": 50, "3": 22}.
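
To make the overlap explicit, here is a trivial check over the example pinning maps above (illustration only, not nova code):

    # Example pinning maps from above (guest vCPU id -> host pCPU id)
    vm1_pinning = {"0": 50, "1": 22, "2": 49, "3": 21}
    vm2_pinning = {"0": 27, "1": 55, "2": 50, "3": 22}

    # Physical CPUs claimed by both VMs
    overlap = set(vm1_pinning.values()) & set(vm2_pinning.values())
    print(sorted(overlap))   # [22, 50]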

Refer to the patch https://opendev.org/openstack/nova/commit/52b89734426253f64b6d4797ba4d849c3020fb52 merged in the Rocky release:
by default, live migration is disabled if the instance has a NUMA topology, but it can still be enabled by configuring CONF.workarounds.enable_numa_live_migration. We enable it because live migration is very important for our daily operations.
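
For reference, enabling that workaround looks roughly like this in nova.conf (the group and option names follow from the CONF.workarounds.enable_numa_live_migration path above; treat the exact snippet as an illustration):

    [workarounds]
    # Assumption about exact formatting; this is the
    # CONF.workarounds.enable_numa_live_migration knob mentioned above.
    enable_numa_live_migration = True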

So what happens when I live migrate these 2 VMs at the same time?

In my case, I ran into 3 problems.
#1. Because NUMA-related information is not reported to placement, the placement API will return the same candidates, and since scheduling is asynchronous there is a chance that VM1 and VM2 will pick the same destination host. In my case, both VMs passed the scheduler and picked the same host.

#2. Then, since the numa-aware-live-migration BP
[https://review.opendev.org/#/q/topic:bp/numa-aware-live-migration]
is not completely implemented yet, VM1 and VM2 will reuse the NUMA
topology from their source hosts. As a result, after VM1 and VM2 start
up, a conflict happens on host CPUs 50 and 22. A related
numa-aware-live-migration bug can be found at
https://bugs.launchpad.net/nova/+bug/1289064

#3. And since VM1 and VM2 now have a NUMA-topology conflict, we hit the
third problem: as the title says, the resource tracker fails to update
usage. That is because when _update_usage is called in the RT, it
eventually calls numa_usage_from_instances.

nova.compute.resource_tracker:_update_usage
  -> nova.virt.hardware:get_host_numa_usage_from_instance
     -> nova.virt.hardware:numa_usage_from_instances

And numa_usage_from_instances eventually runs this (un)pinning block (excerpted from nova/virt/hardware.py):

                    if free:
                        if (instancecell.cpu_thread_policy ==
                                fields.CPUThreadAllocationPolicy.ISOLATE):
                            newcell.unpin_cpus_with_siblings(pinned_cpus)
                        else:
                            newcell.unpin_cpus(pinned_cpus)
                    else:
                        if (instancecell.cpu_thread_policy ==
                                fields.CPUThreadAllocationPolicy.ISOLATE):
                            newcell.pin_cpus_with_siblings(pinned_cpus)
                        else:
                            newcell.pin_cpus(pinned_cpus)

And pin_cpus, pin_cpus_with_siblings, unpin_cpus and
unpin_cpus_with_siblings all raise an exception if there is a
NUMA-topology conflict. The result is that the RT fails to report
usage to the scheduler, which eventually causes the scheduler to keep
thinking this host has enough resources to boot new VMs. The result is
a disaster.
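
To make the failure mode concrete, here is a minimal, self-contained sketch of what happens when the host cell tries to account for both conflicting pinnings during a usage update. The class and exception names here are made up for illustration; they only mirror the idea of nova's NUMACell.pin_cpus(), not its actual implementation:

    class CPUPinningConflict(Exception):
        pass

    class ToyHostCell:
        """Toy stand-in for nova's NUMACell; illustration only."""

        def __init__(self, cpus):
            self.cpus = set(cpus)
            self.pinned = set()

        def pin_cpus(self, cpuset):
            conflict = self.pinned & set(cpuset)
            if conflict:
                # nova's real pin/unpin methods also raise on a conflict,
                # and that exception aborts the whole RT usage update.
                raise CPUPinningConflict("already pinned: %s" % sorted(conflict))
            self.pinned |= set(cpuset)

    cell = ToyHostCell(range(64))
    cell.pin_cpus({50, 22, 49, 21})       # VM1's pinning: ok
    try:
        cell.pin_cpus({27, 55, 50, 22})   # VM2's pinning: conflicts on 22 and 50
    except CPUPinningConflict as exc:
        print("usage update aborted:", exc)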


So I think, to completely solve the problems with VMs that have a NUMA topology:

For problem #1, we need to report the NUMA topology to the placement API as well, and take it into account when getting candidates from placement.

For problem #2, we need to finish implementing the numa-aware-live-migration BP.

For problem #3, numa_usage_from_instances is used by both the RT and the scheduler.
In the scheduler, numa_usage_from_instances will not hit this problem because it is called right after virt.hardware.numa_fit_instance_to_host. So I think raising an exception there is meaningless; we can just change the exception to an error log instead (see the sketch below).
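
Roughly what that would mean, as a standalone toy (not the actual nova patch; pin_or_log is a made-up stand-in for the pin/unpin calls inside numa_usage_from_instances): on a conflict, log an error and keep going, so the RT usage update can complete.

    import logging

    LOG = logging.getLogger(__name__)

    def pin_or_log(pinned, cpuset):
        # Toy stand-in for the pin_cpus() call inside
        # numa_usage_from_instances: on a conflict, log an error and
        # continue instead of raising.
        conflict = pinned & cpuset
        if conflict:
            LOG.error("NUMA pinning conflict on host CPUs %s; logging "
                      "instead of raising", sorted(conflict))
            return pinned
        return pinned | cpuset

    pinned = set()
    pinned = pin_or_log(pinned, {50, 22, 49, 21})   # VM1
    pinned = pin_or_log(pinned, {27, 55, 50, 22})   # VM2: error logged, no crash
    print(sorted(pinned))                           # usage update still completes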


The above is my summary of the live migration issue.
This bug is focused on solving problem #3.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1829349

Title:
  Resource Tracker fails to update usage when a NUMA topology conflict
  happens

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1829349/+subscriptions