← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1506021] [NEW] AsyncProcess.stop() can lead to deadlock

 

Public bug reported:

The bug occurs when calling stop() on an AsyncProcess instance which is
running a progress generating substantial amounts of output to
stdout/stderr and that has a signal handler for some signal (SIGTERM for
example) that causes the program to exit gracefully.

Linux Pipes 101: when calling write() to some one-way pipe, if the pipe
is full of data [1], write() will block until the other end read()s from
the pipe.

AsyncProcess is using eventlet.green.subprocess to create an eventlet-
safe subprocess, using stdout=subprocess.PIPE and
stderr=subprocess.PIPE. In other words, stdout and stderr are redirected
to a one-way linux pipe to the executing AsyncProcess. When stopping the
subprocess, the current code [2] first kills the readers used to empty
stdout/stderr and only then sends the signal.

It is clear that if SIGTERM is sent to the subprocess, and if the
subprocess is generating a lot of output to stdout/stderr AFTER the
readers were killed, a deadlock is achieved: the parent process is
blocking on wait() and the subprocess is blocking on write() (waiting
for someone to read and empty the pipe).

This can be avoided by sending SIGKILL to the AsyncProcesses (this is
the code's default), but other signals such as SIGTERM, that can be
handled by the userspace code to cause the process to exit gracefully,
might trigger this deadlock. For example, I ran into this while trying
to modify existing fullstack tests to SIGTERM processes instead of
SIGKILL them, and the ovs agent got deadlocked a lot.

[1]: http://linux.die.net/man/7/pipe (Section called "Pipe capacity")
[2]: https://github.com/openstack/neutron/blob/stable/liberty/neutron/agent/linux/async_process.py#L163

** Affects: neutron
     Importance: Undecided
     Assignee: John Schwarz (jschwarz)
         Status: New

** Changed in: neutron
     Assignee: (unassigned) => John Schwarz (jschwarz)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1506021

Title:
  AsyncProcess.stop() can lead to deadlock

Status in neutron:
  New

Bug description:
  The bug occurs when calling stop() on an AsyncProcess instance which
  is running a progress generating substantial amounts of output to
  stdout/stderr and that has a signal handler for some signal (SIGTERM
  for example) that causes the program to exit gracefully.

  Linux Pipes 101: when calling write() to some one-way pipe, if the
  pipe is full of data [1], write() will block until the other end
  read()s from the pipe.

  AsyncProcess is using eventlet.green.subprocess to create an eventlet-
  safe subprocess, using stdout=subprocess.PIPE and
  stderr=subprocess.PIPE. In other words, stdout and stderr are
  redirected to a one-way linux pipe to the executing AsyncProcess. When
  stopping the subprocess, the current code [2] first kills the readers
  used to empty stdout/stderr and only then sends the signal.

  It is clear that if SIGTERM is sent to the subprocess, and if the
  subprocess is generating a lot of output to stdout/stderr AFTER the
  readers were killed, a deadlock is achieved: the parent process is
  blocking on wait() and the subprocess is blocking on write() (waiting
  for someone to read and empty the pipe).

  This can be avoided by sending SIGKILL to the AsyncProcesses (this is
  the code's default), but other signals such as SIGTERM, that can be
  handled by the userspace code to cause the process to exit gracefully,
  might trigger this deadlock. For example, I ran into this while trying
  to modify existing fullstack tests to SIGTERM processes instead of
  SIGKILL them, and the ovs agent got deadlocked a lot.

  [1]: http://linux.die.net/man/7/pipe (Section called "Pipe capacity")
  [2]: https://github.com/openstack/neutron/blob/stable/liberty/neutron/agent/linux/async_process.py#L163

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1506021/+subscriptions


Follow ups