Message #78029
[Bug 1824974] [NEW] Nova Compute Manager (Resource update) fails if a disk is missing
Public bug reported:
===Description===
We recently ran into an issue with the periodic resource update on a KVM
hypervisor.
If a disk is missing or unreadable for any reason, the periodic
resource updater fails with:
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager [req-a9ee69b3-6e90-4137-8a6a-c4a4a4426b5a 05fafdddb8f7495fbe162e748fe3f63f 13a3d2b57166496e86e9d25ec8967869 - - -] Error updating resources for node ock00043i2.frn00006.cni.ukcloud.com.
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager Traceback (most recent call last):
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6460, in update_available_resource_for_node
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager rt.update_available_resource(context)
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 511, in update_available_resource
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager resources = self.driver.get_available_resource(self.nodename)
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5599, in get_available_resource
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager disk_over_committed = self._get_disk_over_committed_size_total()
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7192, in _get_disk_over_committed_size_total
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager block_device_info=block_device_info)
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7101, in _get_instance_disk_info
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager dk_size = disk_api.get_allocated_disk_size(path)
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/disk/api.py", line 158, in get_allocated_disk_size
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager return images.qemu_img_info(path).disk_size
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/virt/images.py", line 57, in qemu_img_info
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager raise exception.DiskNotFound(location=path)
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager DiskNotFound: No disk at /var/lib/nova/instances/7af164a3-2eb9-4cb3-9a81-3fd57bf087e9/disk
2019-04-16 09:38:29.708 151890 ERROR nova.compute.manager
This error is expected, since the disk is missing. The problem is that
the nova.compute.manager periodic resource update then cannot update
ANY resource information for this hypervisor, which leaves its stats
stale.
It would be better if the missing disk were logged while the other
stats (CPU and memory) were still updated.
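The failure mode can be modelled in a few lines of Python. This is a hypothetical simplification, not nova's code: `DiskNotFound`, `FAKE_DISKS`, `get_allocated_disk_size`, `get_disk_total` and `get_available_resource` are illustrative stand-ins for the call chain in the traceback, and the disk figure is a plain sum of allocated sizes rather than the libvirt driver's real overcommit calculation. It shows how one exception raised while collecting disk data prevents the whole result, including CPU and memory, from being returned.

```python
class DiskNotFound(Exception):
    """Hypothetical stand-in for nova's DiskNotFound exception."""

    def __init__(self, location):
        super().__init__("No disk at %s" % location)
        self.location = location


# Hypothetical allocated sizes in bytes; None models a missing disk file.
FAKE_DISKS = {
    "/var/lib/nova/instances/aaa/disk": 10 * 1024**3,
    "/var/lib/nova/instances/7af164a3-2eb9-4cb3-9a81-3fd57bf087e9/disk": None,
}


def get_allocated_disk_size(path):
    size = FAKE_DISKS.get(path)
    if size is None:
        raise DiskNotFound(location=path)
    return size


def get_disk_total(paths):
    # No error handling: the first missing disk aborts the whole sum.
    return sum(get_allocated_disk_size(p) for p in paths)


def get_available_resource(paths):
    # CPU and memory are known here, but the dict is never returned
    # if the disk helper raises.
    return {
        "vcpus": 32,
        "memory_mb": 128 * 1024,
        "disk_allocated_bytes": get_disk_total(paths),
    }


try:
    print(get_available_resource(list(FAKE_DISKS)))
except DiskNotFound as exc:
    print("resource update aborted:", exc)
```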
Steps to reproduce
===================
1. Boot an instance on a hypervisor.
2. Shut the instance down.
3. Rename the directory that contains the instance's disk.
4. Watch the nova-compute logs for the error above.
5. Build additional instances on the hypervisor.
6. Check whether the hypervisor's stats are updated.
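To spot affected instances without waiting for the periodic task, a small check can look for each instance's disk file directly. This is a hypothetical helper (`find_missing_disks` is not part of nova) and it assumes the file-backed layout from the traceback, `<instances_path>/<uuid>/disk`, with the instance UUIDs supplied by the operator. Volume-backed or non-file-backed instances have no such file and would show up as false positives.

```python
import os


def find_missing_disks(instance_uuids, instances_path="/var/lib/nova/instances"):
    """Return expected disk paths that are missing or unreadable.

    Assumes the local layout <instances_path>/<uuid>/disk; instances
    without a local disk file are reported as false positives.
    """
    problems = []
    for uuid in instance_uuids:
        path = os.path.join(instances_path, uuid, "disk")
        if not os.path.isfile(path) or not os.access(path, os.R_OK):
            problems.append(path)
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python find_missing_disks.py <uuid> [<uuid> ...]
    for path in find_missing_disks(sys.argv[1:]):
        print("missing or unreadable:", path)
```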
Expected results
=================
CPU and memory stats should still be updated; disk stats could be left un-updated, or perhaps marked as stale.
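One way the expected behaviour could look is sketched below. It is a hypothetical sketch, not nova's implementation or an agreed fix: `DiskNotFound`, `get_disk_total`, `get_available_resource` and the `disk_stats_stale` / `missing_disks` keys are illustrative names. Each disk lookup is wrapped individually, a missing disk is logged and skipped, and the disk figure is flagged as stale while CPU and memory are still reported. Only `DiskNotFound` is caught here; an "unreadable" disk raising a different exception would need its own handling.

```python
import logging

LOG = logging.getLogger(__name__)


class DiskNotFound(Exception):
    """Hypothetical stand-in for nova's DiskNotFound exception."""

    def __init__(self, location):
        super().__init__("No disk at %s" % location)
        self.location = location


def get_disk_total(paths, get_size):
    """Sum allocated sizes, skipping missing disks instead of aborting.

    Returns (total_bytes, missing_paths).
    """
    total = 0
    missing = []
    for path in paths:
        try:
            total += get_size(path)
        except DiskNotFound as exc:
            # Log and carry on so one bad instance cannot block the update.
            LOG.warning("Skipping disk in resource update: %s", exc)
            missing.append(path)
    return total, missing


def get_available_resource(paths, get_size, vcpus, memory_mb):
    disk_total, missing = get_disk_total(paths, get_size)
    return {
        "vcpus": vcpus,
        "memory_mb": memory_mb,
        "disk_allocated_bytes": disk_total,
        # Flag the disk figure as incomplete rather than dropping every stat.
        "disk_stats_stale": bool(missing),
        "missing_disks": missing,
    }


# Usage with hypothetical data: one of three disks is missing.
logging.basicConfig(level=logging.WARNING)
sizes = {"/inst/a/disk": 5, "/inst/b/disk": 7}


def fake_get_size(path):
    # Unknown paths behave like missing disks.
    if path not in sizes:
        raise DiskNotFound(location=path)
    return sizes[path]


res = get_available_resource(
    ["/inst/a/disk", "/inst/missing/disk", "/inst/b/disk"],
    fake_get_size, vcpus=32, memory_mb=131072)
print(res)
```

The design choice is to degrade per resource: a single bad instance marks the disk figure as incomplete instead of discarding the CPU and memory data gathered in the same pass.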
Actual Results
=================
No stats are updated for the hypervisor.
Environment
===================
OpenStack Newton. Judging by the latest code, I think this is still an issue in the latest release.
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1824974
Title:
Nova Compute Manager (Resource update) fails if a disk is missing
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1824974/+subscriptions