
yahoo-eng-team team mailing list archive

[Bug 1788865] Re: neutron-openvswitch-agent interface monitor does not work if ovsdb-client generates warnings (ovs 2.10)

 

Reviewed:  https://review.openstack.org/596717
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f6d98a747b03e4da5109b2ee0e3c1bd7e88aee49
Submitter: Zuul
Branch:    master

commit f6d98a747b03e4da5109b2ee0e3c1bd7e88aee49
Author: Bernard Cafarelli <bcafarel@xxxxxxxxxx>
Date:   Mon Aug 27 14:37:15 2018 +0200

    ovsdb monitor: do not die on ovsdb-client stderr output
    
    That process may generate stderr output (ovs 2.10 with dpdk support will
    log about missing optional libraries for example), in which case the
    agent will loop forever respawning the ovsdb-client processes.
    
    AsyncProcess already handles processes exiting uncleanly, and logs
    stderr output with log_output=True (which is the case for OvsdbMonitor).
    
    As the monitors work on stdout output, disabling die_on_error is enough
    to make them work with this behaviour.
    
    Change-Id: I8f2e5b93b9c16f9b288046911b5aeb4938845233
    Closes-Bug: #1788865


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1788865

Title:
  neutron-openvswitch-agent interface monitor does not work if
  ovsdb-client generates warnings (ovs 2.10)

Status in neutron:
  Fix Released

Bug description:
  This was found while testing with ovs 2.10

  openvswitch has all drivers built in, and unfortunately the Mellanox driver needs extra libs, so it can't be initialized if those libs are missing. On a system without these extra libs, the visible result is a warning on stderr when calling ovsdb-client.
  A typical call, as made by OvsdbMonitor, is [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]
  It results in:
  PMD: net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory # on stderr
  [...]
  {"data":[...]} # Proper JSON output on stdout

  But OvsdbMonitor is an AsyncProcess(die_on_error=True), so the process
  is killed as soon as any stderr output is found. With the libibverbs
  warning, the agent keeps respawning ovsdb-client and the interface
  monitor never works.

  There are possible workarounds and fixes on the ovs side of course,
  but the agent should be more robust to this kind of event (stderr
  output is not always fatal).
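
  To illustrate the point, here is a minimal Python sketch (not the neutron code) of a child process that warns on stderr yet still produces valid JSON on stdout. FAKE_CLIENT and run_tolerating_stderr are hypothetical names used only for this example; the child is a stand-in for ovsdb-client, not the real tool.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for ovsdb-client: one warning on stderr,
# one JSON document on stdout, then a clean exit.
FAKE_CLIENT = (
    "import sys, json;"
    "sys.stderr.write('PMD: net_mlx5: cannot load glue library\\n');"
    "sys.stdout.write(json.dumps({'data': []}) + '\\n')"
)


def run_tolerating_stderr(cmd):
    """Run cmd; return (parsed stdout JSON lines, stderr lines, return code)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # Only stdout carries events; stderr is kept for logging, not treated as fatal.
    events = [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]
    warnings = proc.stderr.splitlines()
    return events, warnings, proc.returncode


events, warnings, returncode = run_tolerating_stderr(
    [sys.executable, "-c", FAKE_CLIENT]
)
```

  Here the warning ends up in warnings, the JSON document in events, and the exit status stays zero, so a consumer that only reads stdout is unaffected.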

  Initial fix ideas:
  * Disable die_on_error in OvsdbMonitor and update the sub-classes' process_events() to filter out non-JSON output. Log error lines at debug level or similar (the output may be quite verbose, as in this ovs 2.10 warning case). This is a short-term fix, but we may miss actual errors and react to them more slowly (only once we hit a timeout).
  * Update the OvsdbMonitor/AsyncProcess logic to check the process return code. This allows stderr output to be ignored or logged at a low level, relying on the process to report success. But it is a bigger change and is still vulnerable to CLI changes.
  * Use the native ovsdb implementation. No more subprocess and no vulnerability to CLI changes, but this is a longer-term solution.
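
  The filtering step from the first idea could look roughly like the sketch below. This is an assumption about the shape of such a filter, not the actual process_events() code; parse_monitor_lines is a hypothetical helper name.

```python
import json
import logging

LOG = logging.getLogger(__name__)


def parse_monitor_lines(lines):
    """Return the JSON objects found in lines, skipping anything else."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            parsed = json.loads(line)
        except ValueError:
            # json.JSONDecodeError is a ValueError subclass.
            LOG.debug("Ignoring non-JSON monitor output: %s", line)
            continue
        # Assumption: monitor events are JSON objects, so other JSON
        # values (bare numbers, strings) are skipped as well.
        if isinstance(parsed, dict):
            events.append(parsed)
        else:
            LOG.debug("Ignoring unexpected JSON value: %s", line)
    return events


sample = [
    "PMD: net_mlx5: cannot load glue library: libibverbs.so.1",
    "",
    '{"data": [["br-int", 1]]}',
]
parsed_events = parse_monitor_lines(sample)
```

  Logging at debug level keeps a repeated warning from flooding the logs, at the cost noted above: a genuine error line would be skipped just as quietly.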

  Original downstream bug with some additional info and workarounds:
  https://bugzilla.redhat.com/show_bug.cgi?id=1619387

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1788865/+subscriptions

