
yahoo-eng-team team mailing list archive

[Bug 1788865] [NEW] neutron-openvswitch-agent interface monitor does not work if ovsdb-client generates warnings (ovs 2.10)

 

Public bug reported:

This was found while testing with OVS 2.10.

In OVS 2.10, openvswitch has all drivers built in. Unfortunately, the Mellanox driver needs extra libraries, so it cannot be initialized when those libraries are missing. On such systems, the visible result is a warning on stderr whenever ovsdb-client is called.
A typical call, as made by OvsdbMonitor, is:
  ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json
It results in:
  PMD: net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory   # on stderr
  [...]
  {"data":[...]}   # proper JSON output on stdout
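Because the warning goes to stderr and the monitor rows go to stdout, the two streams can be told apart. The minimal sketch below reproduces this without OVS: a hypothetical stand-in script, run with the current Python interpreter, prints a warning shaped like the libibverbs one to stderr and a placeholder `{"data": []}` document to stdout. The payload is illustrative only and is not real ovsdb-client output.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for `ovsdb-client monitor ... --format=json`:
# a warning on stderr, then a JSON document on stdout, exit status 0.
STAND_IN = r'''
import sys
sys.stderr.write("PMD: net_mlx5: cannot load glue library: libibverbs.so.1: "
                 "cannot open shared object file: No such file or directory\n")
sys.stdout.write('{"data": []}\n')
'''


def run_stand_in():
    """Run the stand-in and return (stdout, stderr, returncode)."""
    result = subprocess.run(
        [sys.executable, "-c", STAND_IN],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout, result.stderr, result.returncode


if __name__ == "__main__":
    out, err, rc = run_stand_in()
    print("stderr:", err.strip())
    print("stdout parsed as JSON:", json.loads(out))  # stdout is still usable
    print("exit code:", rc)
```

The stderr noise does not corrupt stdout, and the process still exits successfully.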

But OvsdbMonitor is an AsyncProcess(die_on_error=True), so if any stderr
output is found, the process is killed. With the libibverbs warning, this
leaves the agent effectively non-working.
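The failure mode can be modeled in a few lines. This sketch is not neutron's AsyncProcess. `watch` and its `die_on_error` parameter are hypothetical names that mimic the behavior described above: the child is killed as soon as anything appears on stderr, even though its stdout carries valid JSON. The stand-in child prints the warning, then the JSON, then keeps running the way a monitor does.

```python
import subprocess
import sys

# Hypothetical stand-in for a long-running `ovsdb-client monitor`:
# warning on stderr, JSON on stdout, then it stays alive.
STAND_IN = r'''
import sys, time
sys.stderr.write("PMD: net_mlx5: cannot load glue library\n")
sys.stderr.flush()
sys.stdout.write('{"data": []}\n')
sys.stdout.flush()
time.sleep(30)
'''


def watch(die_on_error):
    """Hypothetical, simplified model of a stderr-sensitive watcher.

    Returns (killed_because_of_stderr, first_stderr_line).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", STAND_IN],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        line = proc.stderr.readline()  # blocks until the warning arrives
        if line and die_on_error:
            # Any stderr output is treated as fatal: the child is killed
            # even though its stdout is perfectly usable.
            proc.kill()
            proc.wait()
            return True, line.strip()
        return False, line.strip()
    finally:
        # Cleanup only: stop the sleeping stand-in if it is still running.
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdout.close()
        proc.stderr.close()


if __name__ == "__main__":
    print(watch(die_on_error=True))
    print(watch(die_on_error=False))
```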

There are, of course, possible workarounds and fixes on the OVS side, but
the agent should be more robust to this kind of event (output on stderr is
not always fatal).

Initial fix ideas:
* Disable die_on_error in OvsdbMonitor and update the sub-classes' process_events() to filter out non-JSON output, logging those lines at debug level or similar (they may be quite verbose in this OVS 2.10 warning case). This is a short-term fix: we may miss actual errors, and react to them more slowly (only once a timeout is hit).
* Update the OvsdbMonitor/AsyncProcess logic to check the process return code. This makes it possible to ignore stderr output (or log it at a low level) and rely on the process reporting success. However, it is a bigger change and is still vulnerable to CLI changes.
* Use the native ovsdb implementation. This removes the subprocess and the exposure to CLI changes, but is a longer-term solution.
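The first two ideas can be sketched together. This is a hedged illustration, not a proposed patch: `split_monitor_output` and `run_monitor_once` are hypothetical helpers, and a stand-in child replaces ovsdb-client. Idea 1 keeps only lines that parse as JSON and logs everything else at debug level. Idea 2 logs stderr at debug level and treats the run as failed only on a non-zero exit code. A real monitor is long-running, so a return-code check only helps once the process exits; the stand-in here exits immediately so the code is available.

```python
import json
import logging
import subprocess
import sys

LOG = logging.getLogger(__name__)

# Hypothetical stand-in for ovsdb-client: warning on stderr, JSON on stdout,
# exits with status 0.
STAND_IN = r'''
import sys
sys.stderr.write("PMD: net_mlx5: cannot load glue library\n")
sys.stdout.write('{"data": []}\n')
'''


def split_monitor_output(lines):
    """Idea 1: keep lines that parse as JSON, log any other line at debug."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            events.append(json.loads(line))
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError.
            LOG.debug("Ignoring non-JSON monitor output: %s", line)
    return events


def run_monitor_once(cmd):
    """Idea 2: judge success by exit code, not by the presence of stderr."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    for line in result.stderr.splitlines():
        LOG.debug("stderr from %s: %s", cmd[0], line)
    if result.returncode != 0:
        raise RuntimeError("%s exited with %d" % (cmd[0], result.returncode))
    return split_monitor_output(result.stdout.splitlines())


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(run_monitor_once([sys.executable, "-c", STAND_IN]))
```

With this approach the libibverbs warning ends up in the debug log, while a genuine failure still surfaces through the exit code.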

Original downstream bug with some additional info and workarounds:
https://bugzilla.redhat.com/show_bug.cgi?id=1619387

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1788865


