yahoo-eng-team team mailing list archive

[Bug 1638889] [NEW] Libvirt get_available_resource is reporting incorrect vcpus_used data for QEMU/LXC instances

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Daniel Berrange <1638889@xxxxxxxxxxxxxxxxxx>
Date: Thu, 03 Nov 2016 11:31:23 -0000
Reply-to: Bug 1638889 <1638889@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
Public bug reported:

Currently, if Nova is using the libvirt LXC driver, it is hardcoded to report 1 vCPU used on the host, regardless of how many containers are running.

Meanwhile, for QEMU (aka TCG) guests, the guest.get_vcpu_info method throws an exception, because QEMU does not currently use a dedicated thread per vCPU. The effect is that on QEMU hosts we report 0 vCPUs used, regardless of how many guests are running.

This causes the 'get_available_resource' method to report incorrect
'vcpus_used' values for the compute node:

e.g. with 2 instances running:

$ nova list
+--------------------------------------+-------+--------+------------+-------------+--------------------------------------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                                               |
+--------------------------------------+-------+--------+------------+-------------+--------------------------------------------------------+
| deee00d9-3903-43aa-aa33-40e869b61bf6 | demo1 | ACTIVE | -          | Running     | private=10.0.0.4, 2001:db8:8000:0:f816:3eff:fe8f:135d  |
| 3d160f7c-18fb-4c62-8464-5477be7432d0 | demo2 | ACTIVE | -          | Running     | private=10.0.0.13, 2001:db8:8000:0:f816:3eff:fef6:58d9 |
+--------------------------------------+-------+--------+------------+-------------+--------------------------------------------------------+

We're correctly recording that 2 vCPUs are used against the compute node:

$ nova hypervisor-show 1 | grep vcpus
| vcpus                     | 12                                       |
| vcpus_used                | 2                                        |

but when reporting the hypervisor's view of available vCPUs, the value
stays at 12. It should be reporting 10, but it reports 12:

$ grep 'Hypervisor: free VCPUs' ../logs/n-cpu.log | tail
2016-11-03 11:17:24.003 19647 DEBUG nova.compute.resource_tracker [req-559dcffd-b4c8-494b-b1f2-f936346132cf - -] Hypervisor: free VCPUs: 12 _report_hypervisor_resource_view /home/ds-f23-master/openstack/nova/nova/compute/resource_tracker.py:623
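The expected figure is a plain subtraction. A minimal Python sketch, assuming the logged "free VCPUs" value is the host's total vCPUs minus the driver-reported vcpus_used (which is consistent with the numbers above); `free_vcpus` and `HOST_VCPUS` are hypothetical names, not Nova code:

```python
def free_vcpus(total_vcpus: int, vcpus_used: int) -> int:
    """Free vCPUs from the hypervisor's point of view (hypothetical helper)."""
    return total_vcpus - vcpus_used


HOST_VCPUS = 12  # 'vcpus' as shown by nova hypervisor-show

if __name__ == "__main__":
    # Correct report: the two running instances account for 2 vCPUs.
    print(free_vcpus(HOST_VCPUS, 2))  # 10
    # QEMU bug: get_vcpu_info fails, so 0 vCPUs are counted as used.
    print(free_vcpus(HOST_VCPUS, 0))  # 12
    # LXC bug: the used count is hardcoded to 1.
    print(free_vcpus(HOST_VCPUS, 1))  # 11
```

The 12 in the log is exactly what the QEMU case produces; an LXC host with the same two guests would instead log 11.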


The resource tracker ignores the vcpus_used value reported by the hypervisor (which is arguably a bug in itself, because it causes it to incorrectly over-count QEMU CPU usage), but at least this means it is not affected by this libvirt bug; the bug merely causes misleading log messages to be emitted. Nonetheless, we should fix the libvirt reporting so that it is possible to have the resource tracker honour this data in the future.
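One driver-independent way to produce vcpus_used is to sum each running guest's configured vCPU count, rather than hardcoding a value (LXC) or inspecting per-vCPU threads (which fails for QEMU/TCG). The sketch below is an illustration under stated assumptions, not the actual Nova fix: it relies only on the shape of libvirt-python's `virDomain.info()` return value, `[state, maxMem, memory, nrVirtCpu, cpuTime]`, assumes the driver fills in nrVirtCpu for QEMU and LXC guests, and assumes the caller passes only running domains. `count_vcpus_used`, `DomainLike` and `FakeDomain` are hypothetical names.

```python
from typing import Iterable, Protocol, Sequence

# Index of nrVirtCpu in the list returned by libvirt-python's virDomain.info(),
# i.e. [state, maxMem, memory, nrVirtCpu, cpuTime].
NR_VIRT_CPU = 3


class DomainLike(Protocol):
    """Anything exposing an info() method shaped like virDomain.info()."""

    def info(self) -> Sequence[int]: ...


def count_vcpus_used(domains: Iterable[DomainLike]) -> int:
    """Sum the vCPU count of every domain passed in (hypothetical helper).

    Only the domain-level vCPU count is used, so no per-vCPU thread
    information is needed. The caller is assumed to pass running domains only.
    """
    return sum(dom.info()[NR_VIRT_CPU] for dom in domains)


class FakeDomain:
    """Hypothetical stand-in for a libvirt domain, for illustration only."""

    def __init__(self, vcpus: int) -> None:
        self._vcpus = vcpus

    def info(self) -> list:
        # state=1 (running), maxMem/memory in KiB, cpuTime in ns
        return [1, 524288, 524288, self._vcpus, 0]


if __name__ == "__main__":
    # Two single-vCPU guests, as in the nova list output above.
    print(count_vcpus_used([FakeDomain(1), FakeDomain(1)]))  # 2
```

With a real connection, the running domains could be obtained from libvirt-python's `conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE)`; whether Nova should count this way for every virt type is a design decision the sketch does not settle.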

** Affects: nova
     Importance: Undecided
     Assignee: Daniel Berrange (berrange)
         Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1638889

Follow ups

[Bug 1638889] Re: Libvirt get_available_resource is reporting incorrect vcpus_used data for QEMU/LXC instances
From: OpenStack Infra, 2017-01-27