Message #80751
[Bug 1853259] [NEW] performance gaps on detect crashed instance
Public bug reported:
Description
===========
If a QEMU process crashes (OOM, etc.), libvirt sends an event saying the instance stopped, with a detail field saying it stopped because of a failure. Nova handles only the stop event and does not check the detail.
When the event handler receives a stopped event, it waits 15 seconds to make sure the event was not caused by a reboot operation.
https://github.com/openstack/nova/blob/stable/train/nova/virt/libvirt/host.py#L352
As a result, nova takes a long time to detect the crashed instance.
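A minimal sketch of the idea, not nova's actual code: the handler could look at the detail of the stopped event and skip the reboot grace period when the detail indicates a failure or crash. The helper name `stop_event_delay` and the constant `REBOOT_GRACE_SECONDS` are hypothetical. The numeric values are copied from libvirt's virDomainEventType and virDomainEventStoppedDetailType enums so that the sketch runs without the libvirt bindings; real code would use the constants from the `libvirt` module instead.

```python
# Values mirror libvirt's virDomainEventType and
# virDomainEventStoppedDetailType enums (assumption: copied here so the
# sketch is runnable without importing libvirt).
VIR_DOMAIN_EVENT_STOPPED = 5
VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN = 0
VIR_DOMAIN_EVENT_STOPPED_CRASHED = 2
VIR_DOMAIN_EVENT_STOPPED_FAILED = 5

# The 15 second delay described in this report (hypothetical constant name).
REBOOT_GRACE_SECONDS = 15

# Stop details that cannot be part of a normal reboot sequence.
ABNORMAL_STOP_DETAILS = frozenset({
    VIR_DOMAIN_EVENT_STOPPED_CRASHED,
    VIR_DOMAIN_EVENT_STOPPED_FAILED,
})


def stop_event_delay(event, detail):
    """Return how many seconds to wait before handling a lifecycle event.

    Hypothetical helper: only a STOPPED event that may belong to a reboot
    is delayed; a failed or crashed stop is reported immediately.
    """
    if event != VIR_DOMAIN_EVENT_STOPPED:
        return 0
    if detail in ABNORMAL_STOP_DETAILS:
        return 0
    return REBOOT_GRACE_SECONDS
```

With this check, a stop caused by killing the QEMU process (detail "failed") would be handled without the delay, while an ordinary shutdown still waits in case a reboot follows.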
Steps to reproduce
==================
1. Launch a VM
2. Log in to the compute node, find the corresponding QEMU process, and kill it:
"kill -SIGBUS pid"
Expected result
===============
The nova service detects the crash within a second.
Actual result
=============
Nova needs more than 10 seconds to handle the event.
Environment
===========
1. OpenStack cluster version
master build 2019.11.11 (all-in-one)
2. Hypervisor
Libvirt + KVM
3. Storage type
Ceph
4. Networking type
Neutron with OVS
** Affects: nova
Importance: Undecided
Status: New
** Tags: libvirt
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1853259
Title:
performance gaps on detect crashed instance
Status in OpenStack Compute (nova):
New