yahoo-eng-team team mailing list archive
Message #43517
[Bug 1444630] Re: nova-compute should stop handling virt lifecycle events when it's shutting down
Libvirt event threads are not stopped when the nova-compute service
stops. That's why, when nova-compute is restarted with a SIGHUP signal,
we see this traceback:
2015-11-30 10:03:06.013 INFO nova.service [-] Starting compute node (version 13.0.0)
2015-11-30 10:03:06.013 DEBUG nova.virt.libvirt.host [-] Starting native event thread from (pid=17505) _init_events /opt/stack/nova/nova/virt/libvirt/host.py:452
2015-11-30 10:03:06.014 DEBUG nova.virt.libvirt.host [-] Starting green dispatch thread from (pid=17505) _init_events /opt/stack/nova/nova/virt/libvirt/host.py:458
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/poll.py", line 115, in wait
    listener.cb(fileno)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/opt/stack/nova/nova/utils.py", line 1158, in context_wrapper
    return func(*args, **kwargs)
  File "/opt/stack/nova/nova/virt/libvirt/host.py", line 248, in _dispatch_thread
    self._dispatch_events()
  File "/opt/stack/nova/nova/virt/libvirt/host.py", line 353, in _dispatch_events
    assert _c
AssertionError
Removing descriptor: 9
The threads that were started should be stopped when the nova-compute
service stops.
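As a rough illustration of the start/stop pairing being asked for, here
is a minimal sketch using only the Python standard library. The
EventDispatcher class and its methods are hypothetical names invented
for this example; this is not the nova or libvirt driver code, which
uses eventlet and a native libvirt event thread.

```python
import queue
import threading


class EventDispatcher:
    """Hypothetical stand-in for a driver's event dispatch thread."""

    _STOP = object()  # sentinel that tells the worker loop to exit

    def __init__(self, handler):
        self._handler = handler
        self._queue = queue.Queue()
        self._thread = None

    def start(self):
        # Called when the service starts (or restarts).
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def emit(self, event):
        self._queue.put(event)

    def stop(self, timeout=5.0):
        # Called when the service stops, so a later start() does not
        # find a stale thread still running.
        if self._thread is None:
            return
        self._queue.put(self._STOP)
        self._thread.join(timeout)
        self._thread = None

    def is_running(self):
        return self._thread is not None and self._thread.is_alive()

    def _run(self):
        while True:
            event = self._queue.get()
            if event is self._STOP:
                return
            self._handler(event)
```

With this shape, a restart is simply stop() followed by start(), and
each start() begins with no leftover worker from the previous run.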
** Changed in: nova
Status: Fix Released => In Progress
** Changed in: nova
Assignee: Matt Riedemann (mriedem) => Marian Horban (mhorban)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1444630
Title:
nova-compute should stop handling virt lifecycle events when it's
shutting down
Status in OpenStack Compute (nova):
In Progress
Status in OpenStack Compute (nova) juno series:
Fix Released
Status in OpenStack Compute (nova) kilo series:
Fix Released
Bug description:
This is a follow on to bug 1293480 and related to bug 1408176 and bug
1443186.
There can be a race when rebooting a compute host where libvirt is
shutting down guest VMs and sending STOPPED lifecycle events up to
nova compute which then tries to stop them via the stop API, which
sometimes works and sometimes doesn't - the compute service can go
down with a vm_state of ACTIVE and task_state of powering-off which
isn't resolved on host reboot.
Sometimes the stop API completes and the instance is stopped with
power_state=4 (shutdown) in the nova database. When the host comes
back up and libvirt restarts, it starts up the guest VMs which sends
the STARTED lifecycle event and nova handles that but because the
vm_state in the nova database is STOPPED and the power_state is 1
(running) from the hypervisor, nova thinks it started up unexpectedly
and stops it:
http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py?id=2015.1.0rc1#n6145
So nova shuts the running guest down.
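The decision described above can be reduced to a small illustrative
sketch. The function name sync_action and the string values are
assumptions made for this example; only the numeric power_state values
(1 = running, 4 = shutdown) come from the description, and the real
logic in the compute manager covers many more cases.

```python
RUNNING = 1   # power_state value as given in the bug description
SHUTDOWN = 4  # power_state value as given in the bug description


def sync_action(db_vm_state, hypervisor_power_state):
    """Return the action taken for one instance (simplified sketch).

    db_vm_state is the vm_state recorded in the database;
    hypervisor_power_state is what the hypervisor currently reports.
    """
    if db_vm_state == "stopped" and hypervisor_power_state == RUNNING:
        # The database says the guest should be off but it is running,
        # so the guest is treated as started unexpectedly and stopped.
        return "stop"
    return "none"
```

In the reboot scenario, the database holds a stopped vm_state from the
earlier STOPPED event while the hypervisor reports RUNNING, so this
sketch returns "stop".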
Actually the block in:
http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py?id=2015.1.0rc1#n6145
conflicts with the statement in power_state.py:
http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/power_state.py?id=2015.1.0rc1#n19
"The hypervisor is always considered the authority on the status
of a particular VM, and the power_state in the DB should be viewed as a
snapshot of the VMs's state in the (recent) past."
Anyway, that's a different issue but the point is when nova-compute is
shutting down it should stop accepting lifecycle events from the
hypervisor (virt driver code) since it can't really reliably act on
them anyway - we can leave any sync up that needs to happen in
init_host() in the compute manager.
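A minimal sketch of that idea, assuming a simple flag that is flipped
when shutdown begins. LifecycleGate, on_event and shutdown are
hypothetical names for illustration only and do not correspond to the
actual nova interfaces or to the fix that was merged.

```python
class LifecycleGate:
    """Hypothetical filter that drops lifecycle events once shutdown begins."""

    def __init__(self, handler):
        self._handler = handler
        self._accepting = True
        self.dropped = []  # events ignored during shutdown, kept for inspection

    def shutdown(self):
        # Called at the start of service shutdown; later events are ignored.
        self._accepting = False

    def on_event(self, instance_uuid, transition):
        """Forward the event to the handler, or drop it if shutting down."""
        if not self._accepting:
            self.dropped.append((instance_uuid, transition))
            return False
        self._handler(instance_uuid, transition)
        return True
```

Any state that diverges while events are being dropped would then be
reconciled on the next startup, as the description suggests for
init_host().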
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1444630/+subscriptions