yahoo-eng-team team mailing list archive

[Bug 1798472] Re: Fullstack tests fails because process is not killed properly

 

Reviewed:  https://review.openstack.org/618024
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9b23abbdb68f7e0c80c305ec1874281f6dea7e9e
Submitter: Zuul
Branch:    master

commit 9b23abbdb68f7e0c80c305ec1874281f6dea7e9e
Author: Slawek Kaplonski <skaplons@xxxxxxxxxx>
Date:   Wed Nov 14 21:31:04 2018 +0100

    Add kill_timeout to AsyncProcess
    
    AsyncProcess.stop() now accepts an additional parameter,
    kill_timeout. If it is set to a value other than None,
    eventlet.green.subprocess.Popen.wait() is called with this
    timeout, so a TimeoutExpired exception is raised if the
    process has not exited within kill_timeout seconds.
    In that case the process is killed again with SIGKILL
    to make sure that it is gone.
    
    This should fix the failing fullstack tests, in which the
    ovs_agent process was sometimes not killed and the test
    timeout was reached while blocked in this wait() method.
    
    Change-Id: I1e12255e5e142c395adf4e67be9d9da0f7a3d4fd
    Closes-Bug: #1798472
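
The escalation described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the Neutron code: it uses the standard library subprocess module instead of eventlet.green.subprocess, assumes a POSIX system, and the function name stop_process and its return value are hypothetical.

```python
import signal
import subprocess


def stop_process(proc, kill_signal=signal.SIGTERM, kill_timeout=None):
    """Signal a child process and make sure it is gone (hypothetical helper).

    With kill_timeout=None this waits indefinitely, which is the
    behaviour that let the fullstack tests hang. With a timeout set,
    a process that survives the first signal is killed with SIGKILL.
    Returns True if the SIGKILL fallback was needed.
    """
    proc.send_signal(kill_signal)
    try:
        # Raises subprocess.TimeoutExpired if the process is still
        # alive after kill_timeout seconds.
        proc.wait(timeout=kill_timeout)
        return False
    except subprocess.TimeoutExpired:
        # SIGKILL cannot be caught or ignored, so this wait returns.
        proc.kill()
        proc.wait()
        return True
```

A caller that wants the old behaviour leaves kill_timeout unset. A test harness would pass a small value so that a stuck agent cannot consume the whole test timeout.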


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1798472

Title:
  Fullstack tests fails because process is not killed properly

Status in neutron:
  Fix Released

Bug description:
  Fullstack tests have been failing quite often recently. Different
  tests fail in different CI runs, but the culprit appears to be the
  same each time: one of the processes spawned during the test is not
  killed properly and hangs, so the test hits a timeout exception.

  Examples:
  http://logs.openstack.org/97/602497/5/check/neutron-fullstack/f110a1f/logs/testr_results.html.gz

  http://logs.openstack.org/68/564668/7/check/neutron-fullstack-python36/c4223c2/logs/testr_results.html.gz

  In the second example it looks like some process did not exit properly: http://logs.openstack.org/68/564668/7/check/neutron-fullstack-python36/c4223c2/logs/dsvm-fullstack-logs/TestOvsConnectivitySameNetwork.test_connectivity_GRE-l2pop-arp_responder,openflow-native_.txt.gz#_2018-10-16_02_43_49_755
  and in this case it appears to be the openvswitch-agent: http://logs.openstack.org/68/564668/7/check/neutron-fullstack-python36/c4223c2/logs/dsvm-fullstack-logs/TestOvsConnectivitySameNetwork.test_connectivity_GRE-l2pop-arp_responder,openflow-native_/neutron-openvswitch-agent--2018-10-16--02-42-43-987526.txt.gz

  The logs of this ovs agent contain no message like
  "Agent caught SIGTERM, quitting daemon loop." at the end.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1798472/+subscriptions

