yahoo-eng-team team mailing list archive

[Bug 1506021] Re: AsyncProcess.stop() can lead to deadlock

 

** Changed in: neutron
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1506021

Title:
  AsyncProcess.stop() can lead to deadlock

Status in neutron:
  Fix Released

Bug description:
  The bug occurs when calling stop() on an AsyncProcess instance that
  is running a process which generates substantial amounts of output on
  stdout/stderr and has a handler for some signal (SIGTERM, for
  example) that causes the program to exit gracefully.

  Linux Pipes 101: when calling write() to some one-way pipe, if the
  pipe is full of data [1], write() will block until the other end
  read()s from the pipe.
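
  As an illustration of the capacity limit, the following sketch fills
  an anonymous pipe that nobody reads from and counts how many bytes
  fit before a write would block. The function name
  measure_pipe_capacity is made up for this example, and the write end
  is switched to non-blocking mode only so that the script reports the
  limit instead of hanging.

```python
import os


def measure_pipe_capacity():
    """Return how many bytes fit in a fresh pipe before write() would block."""
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking mode turns "write() would block" into BlockingIOError.
        os.set_blocking(write_fd, False)
        chunk = b"x" * 4096
        total = 0
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # The pipe is full; a blocking writer would now hang here
                # until something read from read_fd.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("pipe capacity in bytes:", measure_pipe_capacity())
```

  The number printed depends on the kernel and its configuration; see
  [1] for the values documented for Linux.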

  AsyncProcess uses eventlet.green.subprocess to create an
  eventlet-safe subprocess, with stdout=subprocess.PIPE and
  stderr=subprocess.PIPE. In other words, the subprocess's stdout and
  stderr are each redirected to a one-way Linux pipe whose read end is
  held by the process running AsyncProcess. When stopping the
  subprocess, the current code [2] first kills the readers used to
  empty stdout/stderr and only then sends the signal.

  Consequently, if SIGTERM is sent to the subprocess and the subprocess
  writes a lot of output to stdout/stderr AFTER the readers were
  killed, a deadlock results: the parent process blocks in wait() and
  the subprocess blocks in write() (waiting for someone to read from
  and empty the pipe).

  This can be avoided by sending SIGKILL to the AsyncProcess (this is
  the code's default), but other signals such as SIGTERM, which
  userspace code can handle in order to exit gracefully, might trigger
  this deadlock. For example, I ran into this while trying to modify
  existing fullstack tests to send SIGTERM to processes instead of
  SIGKILL, and the ovs agent deadlocked frequently.

  [1]: http://linux.die.net/man/7/pipe (Section called "Pipe capacity")
  [2]: https://github.com/openstack/neutron/blob/stable/liberty/neutron/agent/linux/async_process.py#L163

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1506021/+subscriptions

