yahoo-eng-team team mailing list archive
Message #40206
[Bug 1506021] [NEW] AsyncProcess.stop() can lead to deadlock
Public bug reported:
The bug occurs when calling stop() on an AsyncProcess instance that is
running a process which generates substantial amounts of output on
stdout/stderr and has a handler for some signal (SIGTERM, for example)
that makes the program exit gracefully.
Linux Pipes 101: when a process calls write() on a one-way pipe that is
already full of data [1], write() blocks until the other end read()s
from the pipe and frees up space.
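The pipe capacity described in [1] can be observed by filling a pipe that nobody reads. This minimal sketch makes the write end non-blocking, so that instead of hanging, write() raises BlockingIOError at exactly the point where a blocking writer would stall. The helper name measure_pipe_capacity is hypothetical.

```python
import os


def measure_pipe_capacity(chunk_size: int = 4096) -> int:
    """Fill a fresh pipe that nobody reads and return how many bytes it accepted."""
    read_fd, write_fd = os.pipe()
    # Non-blocking, so a full pipe raises BlockingIOError instead of hanging
    # the way a blocking write() would.
    os.set_blocking(write_fd, False)
    chunk = b"x" * chunk_size
    written = 0
    try:
        while True:
            try:
                written += os.write(write_fd, chunk)
            except BlockingIOError:
                # A blocking writer would now be stuck until read_fd is read.
                return written
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

On a default Linux configuration, print(measure_pipe_capacity()) typically reports 65536 bytes, which matches the default capacity documented in pipe(7).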
AsyncProcess uses eventlet.green.subprocess to create an eventlet-safe
subprocess, passing stdout=subprocess.PIPE and stderr=subprocess.PIPE.
In other words, the subprocess's stdout and stderr are redirected into
one-way Linux pipes that are read by the process running AsyncProcess.
When stopping the subprocess, the current code [2] first kills the
readers that drain stdout/stderr and only then sends the signal.
If SIGTERM is sent to the subprocess and the subprocess keeps generating
a lot of output on stdout/stderr AFTER the readers were killed (for
example, while shutting down in its signal handler), the pipe fills up
and the two processes deadlock: the parent process blocks in wait()
until the child exits, and the subprocess blocks in write() waiting for
someone to read from and empty the pipe.
Sending SIGKILL to the AsyncProcesses (the code's default) avoids this,
because SIGKILL cannot be caught and terminates the child even while it
is blocked in write(). Signals that userspace code can handle in order
to exit gracefully, such as SIGTERM, can still trigger this deadlock.
For example, I ran into this while trying to modify existing fullstack
tests to stop processes with SIGTERM instead of SIGKILL, and the OVS
agent deadlocked frequently.
[1]: http://linux.die.net/man/7/pipe (Section called "Pipe capacity")
[2]: https://github.com/openstack/neutron/blob/stable/liberty/neutron/agent/linux/async_process.py#L163
** Affects: neutron
Importance: Undecided
Assignee: John Schwarz (jschwarz)
Status: New
** Changed in: neutron
Assignee: (unassigned) => John Schwarz (jschwarz)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1506021
Title:
AsyncProcess.stop() can lead to deadlock
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1506021/+subscriptions