yahoo-eng-team team mailing list archive

[Bug 1692397] Re: hypervisor statistics could be incorrect

 

Reviewed:  https://review.openstack.org/467220
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3d3e9cdd774efe96f468f2bcba6c09a40f5e71d3
Submitter: Jenkins
Branch:    master

commit 3d3e9cdd774efe96f468f2bcba6c09a40f5e71d3
Author: Kevin_Zheng <zhengzhenyu@xxxxxxxxxx>
Date:   Tue May 23 20:28:28 2017 +0800

    Exclude deleted service records when calling hypervisor statistics
    
    Hypervisor statistics could be incorrect if deleted
    service records are not excluded from the DB query.
    
    A user may stop the 'nova-compute' service on some
    compute nodes and delete the service from nova.
    Deleting a 'nova-compute' service soft-deletes the
    corresponding DB records in both the 'services' table
    and the 'compute_nodes' table if the compute_nodes
    record is old, i.e. it is linked to the services
    record. Modern compute_nodes records aren't linked
    to the services table, so deleting the services
    record will not delete the compute_nodes record, and
    the ResourceTracker won't recreate the compute_nodes
    record if the host and hypervisor_hostname still
    match the existing record. Restarting the process
    after deleting the service will, however, create a
    new services table record with the same
    host/binary/topic.
    
    If the 'nova-compute' service on that server
    restarts, it will automatically add a record to the
    'compute_nodes' table (assuming it was deleted
    because it was an old-style record) and also a
    corresponding record to the 'services' table. If the
    host name of the compute node did not change, the
    newly created records in the 'services' and
    'compute_nodes' tables will be identical to the
    previously soft-deleted records except for the
    'deleted' column.
    
    When hypervisor statistics are requested, the DB
    layer joins records across the whole deployment by
    comparing the host field from the services table
    with the host field from the compute_nodes table.
    The calculated results can be multiplied if several
    records in the services table have the same host
    field, which happens if a user performs the actions
    above.
    
    Co-Authored-By: Matt Riedemann <mriedem.os@xxxxxxxxx>
    
    Change-Id: I9dfa15f69f8ef9c6cb36b2734a8601bd73e9d6b3
    Closes-Bug: #1692397
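
The row multiplication described in the commit message can be reproduced outside nova. The following is a minimal sketch, not nova's actual schema or query: the table layouts are heavily simplified, the `build_db` and `hypervisor_stats` helpers are hypothetical, and it assumes that a non-zero `deleted` value marks a soft-deleted row. The service IDs, host name, and resource values are taken from the bug description.

```python
import sqlite3


def build_db():
    """Build an in-memory DB that mimics the state after Step 4 of the bug report."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE services (
            id INTEGER PRIMARY KEY, host TEXT, "binary" TEXT, deleted INTEGER);
        CREATE TABLE compute_nodes (
            id INTEGER PRIMARY KEY, host TEXT, vcpus INTEGER,
            memory_mb INTEGER, deleted INTEGER);

        -- Service 8 was removed with 'nova service-delete 8' (soft delete,
        -- assumed here to be recorded as a non-zero 'deleted' value).
        INSERT INTO services VALUES (8, 'SZX1000291919', 'nova-compute', 8);
        -- Service 10 was created when nova-compute restarted on the same host.
        INSERT INTO services VALUES (10, 'SZX1000291919', 'nova-compute', 0);
        -- A single live compute node record for that host.
        INSERT INTO compute_nodes VALUES (1, 'SZX1000291919', 8, 7960, 0);
    """)
    return conn


def hypervisor_stats(conn, exclude_deleted_services):
    """Aggregate compute node resources, joining to services on host."""
    sql = """
        SELECT COUNT(cn.id), SUM(cn.vcpus), SUM(cn.memory_mb)
        FROM compute_nodes AS cn
        JOIN services AS s ON s.host = cn.host
        WHERE cn.deleted = 0 AND s."binary" = 'nova-compute'
    """
    if exclude_deleted_services:
        # The idea behind the fix: ignore soft-deleted service rows.
        sql += " AND s.deleted = 0"
    return conn.execute(sql).fetchone()


conn = build_db()
buggy = hypervisor_stats(conn, exclude_deleted_services=False)
fixed = hypervisor_stats(conn, exclude_deleted_services=True)
print("without filter:", buggy)  # (count, vcpus, memory_mb), doubled
print("with filter:   ", fixed)  # (count, vcpus, memory_mb), correct
```

Without the filter, the single compute node row matches both service rows for the host, so every aggregate is counted twice. With the filter, only the live service row takes part in the join.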


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1692397

Title:
  hypervisor statistics could be incorrect

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Confirmed

Bug description:
  Hypervisor statistics could be incorrect:

  When we kill a nova-compute service, delete the service from the nova DB, and then
  start the nova-compute service again, the result of the hypervisor statistics API (nova hypervisor-stats) is
  incorrect.

  How to reproduce:

  Step1. Check the correct statistics before we do anything:
  root@SZX1000291919:/opt/stack/nova# nova  hypervisor-stats
  +----------------------+-------+
  | Property             | Value |
  +----------------------+-------+
  | count                | 1     |
  | current_workload     | 0     |
  | disk_available_least | 14    |
  | free_disk_gb         | 34    |
  | free_ram_mb          | 6936  |
  | local_gb             | 35    |
  | local_gb_used        | 1     |
  | memory_mb            | 7960  |
  | memory_mb_used       | 1024  |
  | running_vms          | 1     |
  | vcpus                | 8     |
  | vcpus_used           | 1     |
  +----------------------+-------+

  Step2. Kill the compute service:
  root@SZX1000291919:/var/log/nova# ps -ef | grep nova-com
  root     120419 120411  0 11:06 pts/27   00:00:00 sg libvirtd /usr/local/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log
  root     120420 120419  0 11:06 pts/27   00:00:07 /usr/bin/python /usr/local/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log

  root@SZX1000291919:/var/log/nova# kill -9 120419
  root@SZX1000291919:/var/log/nova# /usr/local/bin/stack: line 19: 120419 Killed                  sg libvirtd '/usr/local/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log' > /dev/null 2>&1

  root@SZX1000291919:/var/log/nova# nova service-list
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | Id | Binary           | Host          | Zone     | Status  | State | Updated_at                 | Disabled Reason |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | 4  | nova-conductor   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:24:36.000000 | -               |
  | 6  | nova-scheduler   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:24:36.000000 | -               |
  | 7  | nova-consoleauth | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:24:37.000000 | -               |
  | 8  | nova-compute     | SZX1000291919 | nova     | enabled | down  | 2017-05-22T03:23:38.000000 | -               |
  | 9  | nova-cert        | SZX1000291919 | internal | enabled | down  | 2017-05-17T02:50:13.000000 | -               |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+

  Step3. Delete the service from DB:

  root@SZX1000291919:/var/log/nova# nova service-delete 8
  root@SZX1000291919:/var/log/nova# nova service-list
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | Id | Binary           | Host          | Zone     | Status  | State | Updated_at                 | Disabled Reason |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | 4  | nova-conductor   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:25:16.000000 | -               |
  | 6  | nova-scheduler   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:25:16.000000 | -               |
  | 7  | nova-consoleauth | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:25:17.000000 | -               |
  | 9  | nova-cert        | SZX1000291919 | internal | enabled | down  | 2017-05-17T02:50:13.000000 | -               |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+

  Step4. Start the compute service again:
  root@SZX1000291919:/var/log/nova# nova service-list
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | Id | Binary           | Host          | Zone     | Status  | State | Updated_at                 | Disabled Reason |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
  | 4  | nova-conductor   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:48:55.000000 | -               |
  | 6  | nova-scheduler   | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:48:56.000000 | -               |
  | 7  | nova-consoleauth | SZX1000291919 | internal | enabled | up    | 2017-05-22T03:48:56.000000 | -               |
  | 9  | nova-cert        | SZX1000291919 | internal | enabled | down  | 2017-05-17T02:50:13.000000 | -               |
  | 10 | nova-compute     | SZX1000291919 | nova     | enabled | up    | 2017-05-22T03:48:57.000000 | -               |
  +----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+

  Step5. Check the hypervisor statistics again; the result is incorrect:

  root@SZX1000291919:/var/log/nova# nova  hypervisor-stats
  +----------------------+-------+
  | Property             | Value |
  +----------------------+-------+
  | count                | 2     |
  | current_workload     | 0     |
  | disk_available_least | 28    |
  | free_disk_gb         | 68    |
  | free_ram_mb          | 13872 |
  | local_gb             | 70    |
  | local_gb_used        | 2     |
  | memory_mb            | 15920 |
  | memory_mb_used       | 2048  |
  | running_vms          | 2     |
  | vcpus                | 16    |
  | vcpus_used           | 2     |
  +----------------------+-------+
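
Comparing the two tables shows that every value reported in Step 5 is exactly twice the value from Step 1, which is consistent with the single hypervisor being counted twice. The short check below is only an illustration; the numbers are copied from the two `nova hypervisor-stats` outputs above, and the variable names are hypothetical.

```python
# Values from Step 1 (before) and Step 5 (after) of the reproduction.
before = {
    "count": 1, "current_workload": 0, "disk_available_least": 14,
    "free_disk_gb": 34, "free_ram_mb": 6936, "local_gb": 35,
    "local_gb_used": 1, "memory_mb": 7960, "memory_mb_used": 1024,
    "running_vms": 1, "vcpus": 8, "vcpus_used": 1,
}
after = {
    "count": 2, "current_workload": 0, "disk_available_least": 28,
    "free_disk_gb": 68, "free_ram_mb": 13872, "local_gb": 70,
    "local_gb_used": 2, "memory_mb": 15920, "memory_mb_used": 2048,
    "running_vms": 2, "vcpus": 16, "vcpus_used": 2,
}

# Collect any property whose Step 5 value is not exactly double the Step 1 value.
mismatches = [key for key in before if after[key] != 2 * before[key]]
doubled = not mismatches
print("all values doubled:", doubled)
```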

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1692397/+subscriptions

