yahoo-eng-team team mailing list archive
Message #71677
[Bug 1755981] [NEW] powering off and on an instance can result in instance boot failure due to serial port handling race
Public bug reported:
The following is specific to the libvirt driver.
When we call power_off() it calls _destroy(), which in turn calls
self._get_serial_ports_from_guest() and loops over all the serial ports
calling serial_console.release_port() on each. This removes the host
TCP port from ALLOCATED_PORTS (which is the set of allocated ports on
the host).
Then when we call power_on(), it again calls _destroy(), which again
calls self._get_serial_ports_from_guest(). This returns the same set of
ports that it did before, since the guest still has the same serial
devices defined. This is a problem, because those ports could have been
allocated to another instance in the meantime!
So in the case where one or more of those ports had been allocated to
another instance, we call serial_console.release_port() on them, and
remove them from ALLOCATED_PORTS.
Then as part of power_on() we create new XML with new serial ports, and
the allocation could select the ports that we just removed from
ALLOCATED_PORTS (which are actually in use by another instance). When
qemu tries to bind to such a port it will fail, causing the instance to
error out and stay in the SHUTOFF state.
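The sequence above can be reproduced with a toy model. This is only a sketch and not nova code: ToyInstance, PORT_RANGE, defined_ports and simulate_race() are hypothetical names, and the simplified acquire_port()/release_port() helpers only mimic the bookkeeping on ALLOCATED_PORTS described in this report.

```python
# Toy model of the race described above. Not nova code: the class, the
# port range and the helper signatures are assumptions for illustration.
ALLOCATED_PORTS = set()
PORT_RANGE = range(10000, 10003)  # hypothetical serial console port range


def acquire_port():
    """Return the lowest port not currently marked as allocated."""
    for port in PORT_RANGE:
        if port not in ALLOCATED_PORTS:
            ALLOCATED_PORTS.add(port)
            return port
    raise RuntimeError("no free serial ports")


def release_port(port):
    """Forget a port, without checking which instance holds it."""
    ALLOCATED_PORTS.discard(port)


class ToyInstance:
    def __init__(self):
        # Ports listed in the guest definition; they survive power_off().
        self.defined_ports = []

    def _destroy(self):
        # Mirrors the loop over the guest's serial ports: every port the
        # guest definition lists is released, even a stale one.
        for port in self.defined_ports:
            release_port(port)

    def power_off(self):
        self._destroy()

    def power_on(self):
        self._destroy()
        self.defined_ports = [acquire_port()]


def simulate_race():
    ALLOCATED_PORTS.clear()
    a, b = ToyInstance(), ToyInstance()
    a.power_on()   # a takes port 10000
    a.power_off()  # 10000 is released, but a's definition still lists it
    b.power_on()   # b takes port 10000
    a.power_on()   # a releases 10000 (now b's), then takes it again
    return a.defined_ports, b.defined_ports
```

In this model simulate_race() leaves both instances configured with the same port, which corresponds to the point where qemu would fail to bind.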
One possible solution would be to call guest.detach_device() on the
"serial" and "console" devices of the guest in the power_off()
routine. That way, when we call _destroy() in the power_on() routine,
_get_serial_ports_from_guest() wouldn't return any devices, so no ports
would be released a second time. This is a bit messy though, so if
anyone has any better ideas I'd like to hear them.
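The effect of that idea can be sketched in a toy model as well. Again this is not nova code: ToyInstance, PORT_RANGE, defined_ports and simulate_with_detach() are hypothetical, and clearing defined_ports in power_off() is only a stand-in for detaching the "serial" and "console" devices from the guest.

```python
# Toy model of the proposed workaround. Not nova code: all names and the
# port range are assumptions for illustration.
ALLOCATED_PORTS = set()
PORT_RANGE = range(10000, 10003)  # hypothetical serial console port range


def acquire_port():
    """Return the lowest port not currently marked as allocated."""
    for port in PORT_RANGE:
        if port not in ALLOCATED_PORTS:
            ALLOCATED_PORTS.add(port)
            return port
    raise RuntimeError("no free serial ports")


def release_port(port):
    ALLOCATED_PORTS.discard(port)


class ToyInstance:
    def __init__(self):
        self.defined_ports = []  # ports listed in the guest definition

    def _destroy(self):
        for port in self.defined_ports:
            release_port(port)

    def power_off(self):
        self._destroy()
        # Stand-in for detaching the serial/console devices: the guest
        # definition no longer lists ports it has already given back.
        self.defined_ports = []

    def power_on(self):
        self._destroy()  # nothing stale left to release after power_off()
        self.defined_ports = [acquire_port()]


def simulate_with_detach():
    ALLOCATED_PORTS.clear()
    a, b = ToyInstance(), ToyInstance()
    a.power_on()   # a takes port 10000
    a.power_off()  # 10000 is released and dropped from a's definition
    b.power_on()   # b takes port 10000
    a.power_on()   # a releases nothing and takes the next free port
    return a.defined_ports, b.defined_ports
```

Here the second power_on() of the first instance no longer touches the other instance's port. Whether detaching the devices is acceptable in the real driver is a separate question that this sketch does not answer.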
** Affects: nova
Importance: Undecided
Status: New
** Tags: compute libvirt
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1755981
Title:
powering off and on an instance can result in instance boot failure
due to serial port handling race
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1755981/+subscriptions