yahoo-eng-team team mailing list archive
Message #19793
[Bug 1364876] [NEW] Specifying both rpc_workers and api_workers makes stopping neutron-server fail
Public bug reported:
Hi,
When both rpc_workers and api_workers are set to a value greater than 1,
stopping the service (e.g. with upstart) doesn't kill all neutron-server
processes, which results in a failure when starting neutron-server
again.
Details:
======
neutron-server creates two openstack.common.service.ProcessLauncher
instances, one for each service (i.e. rpc and api). However,
ProcessLauncher wasn't meant to be instantiated more than once in a
single process, and here is why:
1. Each ProcessLauncher instance registers a callback to catch signals
like SIGTERM, SIGINT and SIGHUP. Having two instances of ProcessLauncher
means signal.signal is called twice with different callbacks, and only
the last one registered takes effect.
2. Each ProcessLauncher thinks that it owns all child processes of the
current process. For example, take a look at the "_wait_child" method,
which catches every child process that was killed.
3. Only one ProcessLauncher instance handles the process termination
while the other doesn't, which leads to a race condition between the
two:
3.1. Running "stop neutron-server" also kills the child processes, but
because we have 2 ProcessLaunchers, the one that didn't catch the kill
signal keeps respawning new child processes when it detects that they
died. The other won't, because its self.running was set to False.
3.2. When child processes die (i.e. on stop neutron-server), one of the
ProcessLaunchers catches that with os.waitpid(0, os.WNOHANG) (both do
that). If the death of a child process is caught by the wrong
ProcessLauncher, i.e. not the one that has it in its children instance
variable, the parent process hangs forever in this loop, because
self.children of the owning ProcessLauncher will always contain that
child process:
    if self.children:
        LOG.info(_LI('Waiting on %d children to exit'), len(self.children))
        while self.children:
            self._wait_child()
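
To make points 1 and 3.2 concrete, here is a minimal sketch that
reproduces both effects outside of neutron. ToyLauncher is a hypothetical
stand-in, not the real ProcessLauncher. It uses SIGUSR1 instead of
SIGTERM so that a mistake can't kill the interpreter, and a blocking
os.waitpid(0, 0) instead of os.WNOHANG so that the result is
deterministic. It assumes a POSIX system and must run in the main thread.

```python
import os
import signal


class ToyLauncher:
    """Hypothetical stand-in for ProcessLauncher; not the real class."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.children = set()
        # Each instance installs its own handler, silently replacing
        # whatever handler was registered for this signal before.
        signal.signal(signal.SIGUSR1, self._handle_signal)

    def _handle_signal(self, signo, frame):
        self.running = False

    def start_child(self):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately
        self.children.add(pid)
        return pid

    def wait_child(self):
        # pid 0 means "any child in my process group", not "a child
        # that this particular launcher forked".
        try:
            pid, _status = os.waitpid(0, 0)
        except ChildProcessError:
            return None  # no child left to reap
        self.children.discard(pid)  # no-op if another launcher owns pid
        return pid


rpc = ToyLauncher("rpc")
api = ToyLauncher("api")  # replaces the handler installed by rpc

# Point 1: only the launcher that registered last is notified.
os.kill(os.getpid(), signal.SIGUSR1)
print("rpc.running:", rpc.running, "/ api.running:", api.running)

# Point 3.2: rpc reaps both children, including the one api tracks.
rpc_pid = rpc.start_child()
api_pid = api.start_child()
reaped = {rpc.wait_child(), rpc.wait_child()}
print("rpc reaped api's child:", api_pid in reaped)
print("api still tracks it:", api.children == {api_pid})
print("api can still reap it:", api.wait_child() is not None)
```

After this runs, api.children still holds a pid that no waitpid call will
ever return again, which is the state where "while self.children" never
terminates.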
Hopefully I made this clear.
Cheers,
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1364876