yahoo-eng-team team mailing list archive
Message #64306
[Bug 1692397] Re: hypervisor statistics could be incorrect
Reviewed: https://review.openstack.org/467220
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3d3e9cdd774efe96f468f2bcba6c09a40f5e71d3
Submitter: Jenkins
Branch: master
commit 3d3e9cdd774efe96f468f2bcba6c09a40f5e71d3
Author: Kevin_Zheng <zhengzhenyu@xxxxxxxxxx>
Date: Tue May 23 20:28:28 2017 +0800
Exclude deleted service records when calling hypervisor statistics
Hypervisor statistics could be incorrect if deleted
service records are not excluded from the DB.
A user may stop the 'nova-compute' service on some
compute nodes and delete the service from nova.
Deleting a 'nova-compute' service soft-deletes the
corresponding DB records in both the 'services' and
'compute_nodes' tables if the compute_nodes record
is old, i.e. it is linked to the services record.
Modern compute_nodes records are not linked to the
services table, so deleting the services record does
not delete the compute_nodes record, and the
ResourceTracker will not recreate the compute_nodes
record if the host and hypervisor_hostname still
match the existing record; restarting the process
after deleting the service will, however, create a
new services table record with the same
host/binary/topic.
If the 'nova-compute' service on that server
restarts, it will automatically add a record to the
'compute_nodes' table (assuming the old record was
deleted because it was an old-style record) and a
corresponding record to the 'services' table, and if
the host name of the compute node did not change,
the newly created 'services' and 'compute_nodes'
records will be identical to the previously
soft-deleted records except for the 'deleted'
column.
When hypervisor statistics are requested, the DB
layer joins records across the whole deployment by
comparing the host field of records selected from
the services table with the host field of records
selected from the compute_nodes table. The
calculated results can therefore be multiplied if
multiple records in the services table share the
same host field, which is exactly what happens after
the actions above.
Co-Authored-By: Matt Riedemann <mriedem.os@xxxxxxxxx>
Change-Id: I9dfa15f69f8ef9c6cb36b2734a8601bd73e9d6b3
Closes-Bug: #1692397
** Changed in: nova
Status: In Progress => Fix Released
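As a rough, self-contained illustration (a simplified schema and query, not nova's actual tables or the statement the patch changes), the sketch below shows how a leftover soft-deleted services row sharing its host with the new row doubles every joined statistic, and how filtering deleted rows out of the join restores the correct totals:
import sqlite3
# Simplified stand-ins for the 'services' and 'compute_nodes' tables; in this
# sketch a soft-deleted row is marked with a non-zero 'deleted' value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE services (id INTEGER, host TEXT, deleted INTEGER);
    CREATE TABLE compute_nodes (id INTEGER, host TEXT, vcpus INTEGER, memory_mb INTEGER);
    -- the old soft-deleted record and the record created after the restart
    INSERT INTO services VALUES (8,  'SZX1000291919', 8);
    INSERT INTO services VALUES (10, 'SZX1000291919', 0);
    -- the single real compute node
    INSERT INTO compute_nodes VALUES (1, 'SZX1000291919', 8, 7960);
""")
# Host-based join without excluding deleted service records: the one
# compute node is matched twice, so every aggregate is doubled.
buggy = conn.execute("""
    SELECT COUNT(*), SUM(cn.vcpus), SUM(cn.memory_mb)
    FROM compute_nodes cn JOIN services s ON s.host = cn.host
""").fetchone()
# Same join, but deleted service records are excluded.
fixed = conn.execute("""
    SELECT COUNT(*), SUM(cn.vcpus), SUM(cn.memory_mb)
    FROM compute_nodes cn JOIN services s ON s.host = cn.host
    WHERE s.deleted = 0
""").fetchone()
print("without the filter:", buggy)  # (2, 16, 15920)
print("with s.deleted = 0:", fixed)  # (1, 8, 7960)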
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1692397
Title:
hypervisor statistics could be incorrect
Status in OpenStack Compute (nova):
Fix Released
Status in OpenStack Compute (nova) ocata series:
Confirmed
Bug description:
Hypervisor statistics could be incorrect:
When we kill a nova-compute service, delete the service from the nova DB, and then
start the nova-compute service again, the result of the hypervisor statistics API (nova hypervisor-stats) is
incorrect.
How to reproduce:
Step 1. Check the correct statistics before we do anything:
root@SZX1000291919:/opt/stack/nova# nova hypervisor-stats
+----------------------+-------+
| Property | Value |
+----------------------+-------+
| count | 1 |
| current_workload | 0 |
| disk_available_least | 14 |
| free_disk_gb | 34 |
| free_ram_mb | 6936 |
| local_gb | 35 |
| local_gb_used | 1 |
| memory_mb | 7960 |
| memory_mb_used | 1024 |
| running_vms | 1 |
| vcpus | 8 |
| vcpus_used | 1 |
+----------------------+-------+
Step 2. Kill the compute service:
root@SZX1000291919:/var/log/nova# ps -ef | grep nova-com
root 120419 120411 0 11:06 pts/27 00:00:00 sg libvirtd /usr/local/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log
root 120420 120419 0 11:06 pts/27 00:00:07 /usr/bin/python /usr/local/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log
root@SZX1000291919:/var/log/nova# kill -9 120419
root@SZX1000291919:/var/log/nova# /usr/local/bin/stack: line 19: 120419 Killed sg libvirtd '/usr/local/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log' > /dev/null 2>&1
root@SZX1000291919:/var/log/nova# nova service-list
+----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
| 4 | nova-conductor | SZX1000291919 | internal | enabled | up | 2017-05-22T03:24:36.000000 | - |
| 6 | nova-scheduler | SZX1000291919 | internal | enabled | up | 2017-05-22T03:24:36.000000 | - |
| 7 | nova-consoleauth | SZX1000291919 | internal | enabled | up | 2017-05-22T03:24:37.000000 | - |
| 8 | nova-compute | SZX1000291919 | nova | enabled | down | 2017-05-22T03:23:38.000000 | - |
| 9 | nova-cert | SZX1000291919 | internal | enabled | down | 2017-05-17T02:50:13.000000 | - |
+----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
Step 3. Delete the service from the DB:
root@SZX1000291919:/var/log/nova# nova service-delete 8
root@SZX1000291919:/var/log/nova# nova service-list
+----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
| 4 | nova-conductor | SZX1000291919 | internal | enabled | up | 2017-05-22T03:25:16.000000 | - |
| 6 | nova-scheduler | SZX1000291919 | internal | enabled | up | 2017-05-22T03:25:16.000000 | - |
| 7 | nova-consoleauth | SZX1000291919 | internal | enabled | up | 2017-05-22T03:25:17.000000 | - |
| 9 | nova-cert | SZX1000291919 | internal | enabled | down | 2017-05-17T02:50:13.000000 | - |
+----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
Step 4. Start the compute service again:
root@SZX1000291919:/var/log/nova# nova service-list
+----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
| 4 | nova-conductor | SZX1000291919 | internal | enabled | up | 2017-05-22T03:48:55.000000 | - |
| 6 | nova-scheduler | SZX1000291919 | internal | enabled | up | 2017-05-22T03:48:56.000000 | - |
| 7 | nova-consoleauth | SZX1000291919 | internal | enabled | up | 2017-05-22T03:48:56.000000 | - |
| 9 | nova-cert | SZX1000291919 | internal | enabled | down | 2017-05-17T02:50:13.000000 | - |
| 10 | nova-compute | SZX1000291919 | nova | enabled | up | 2017-05-22T03:48:57.000000 | - |
+----+------------------+---------------+----------+---------+-------+----------------------------+-----------------+
Step 5. Check the hypervisor statistics again; the result is incorrect:
root@SZX1000291919:/var/log/nova# nova hypervisor-stats
+----------------------+-------+
| Property | Value |
+----------------------+-------+
| count | 2 |
| current_workload | 0 |
| disk_available_least | 28 |
| free_disk_gb | 68 |
| free_ram_mb | 13872 |
| local_gb | 70 |
| local_gb_used | 2 |
| memory_mb | 15920 |
| memory_mb_used | 2048 |
| running_vms | 2 |
| vcpus | 16 |
| vcpus_used | 2 |
+----------------------+-------+
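As a quick sanity check on the numbers above (values copied from the Step 1 and Step 5 tables), every statistic in Step 5 is exactly double its Step 1 value, which is what a single compute node joined against two services rows for the same host would produce:
# Values copied from the two hypervisor-stats outputs above.
step1 = {"count": 1, "current_workload": 0, "disk_available_least": 14,
         "free_disk_gb": 34, "free_ram_mb": 6936, "local_gb": 35,
         "local_gb_used": 1, "memory_mb": 7960, "memory_mb_used": 1024,
         "running_vms": 1, "vcpus": 8, "vcpus_used": 1}
step5 = {"count": 2, "current_workload": 0, "disk_available_least": 28,
         "free_disk_gb": 68, "free_ram_mb": 13872, "local_gb": 70,
         "local_gb_used": 2, "memory_mb": 15920, "memory_mb_used": 2048,
         "running_vms": 2, "vcpus": 16, "vcpus_used": 2}
assert all(step5[k] == 2 * step1[k] for k in step1)  # every value is doubled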
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1692397/+subscriptions