yahoo-eng-team team mailing list archive

[Bug 1531772] [NEW] Liberty server and Kilo security group aware agent fail to refresh firewall for DHCP and router IPv6 ports

 

Public bug reported:

When we run a Liberty server together with a Kilo L2 agent, we get the
following traceback in the agent log:

ERROR oslo_messaging.rpc.dispatcher [-] Exception during message handling: Endpoint does not support RPC version 1.3. Attempted method: security_groups_provider_updated
TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
TRACE oslo_messaging.rpc.dispatcher     executor_callback))
TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 195, in _dispatch
TRACE oslo_messaging.rpc.dispatcher     raise UnsupportedVersion(version, method=method)
TRACE oslo_messaging.rpc.dispatcher UnsupportedVersion: Endpoint does not support RPC version 1.3. Attempted method: security_groups_provider_updated

In Kilo, the server sent a bare notification that something had changed,
and the agent reset the firewall for all devices. In Liberty, the server
passes the list of devices to refresh, so the firewall update on a
security group change covers only the affected devices.
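
To illustrate the difference between the two notification styles, here
is a minimal toy model. It is not Neutron code: ToyFirewall, both
handler functions and the devices_to_update argument name are
hypothetical stand-ins chosen for this sketch.

```python
# Toy model only: none of these names come from the Neutron source tree.
class ToyFirewall:
    def __init__(self, devices):
        self.devices = set(devices)
        self.refreshed = []  # records which devices were reprogrammed

    def refresh(self, devices=None):
        # No device list means "reset everything" (the Kilo-style behaviour).
        if devices is None:
            targets = self.devices
        else:
            targets = self.devices & set(devices)
        self.refreshed.extend(sorted(targets))


def kilo_style_handler(firewall):
    """Bare notification: the agent cannot tell what changed."""
    firewall.refresh()


def liberty_style_handler(firewall, devices_to_update):
    """The notification names the affected devices, so only those refresh."""
    firewall.refresh(devices_to_update)


fw_old = ToyFirewall(["tap1", "tap2", "tap3"])
kilo_style_handler(fw_old)

fw_new = ToyFirewall(["tap1", "tap2", "tap3"])
liberty_style_handler(fw_new, ["tap2"])
```

In this sketch fw_old ends up refreshing every device, while fw_new
refreshes only the one named in the notification.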

Missing the notification can show up as a variety of symptoms that all
come down to ‘my firewall is not updated after security group change’.
From what I see in the code, it affects DHCP and router IPv6 ports only.

Since the signature of the RPC method changed (the list of devices was
added), the server requires version 1.3 of the agent endpoint, which
knows about the new argument. If this were an ordinary notification
directed at a specific agent, we could use call() instead of cast() and
handle the UnsupportedVersion exception by repeating the call without
the device list. But since it’s a fanout cast, the server gets no reply
from the agents, so we can’t do that.
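
The difference between the two delivery modes can be sketched with a
small simulation. This is a toy model and does not use oslo.messaging:
ToyAgent, call(), fanout_cast() and notify_with_fallback() are
hypothetical helpers, and the version numbers given to the older agent
are chosen for illustration only.

```python
METHOD = "security_groups_provider_updated"


class UnsupportedVersion(Exception):
    """Stand-in for the error an endpoint raises on a too-new version."""


def parse(version):
    return tuple(int(part) for part in version.split("."))


class ToyAgent:
    def __init__(self, version):
        self.version = version
        self.handled = []  # messages the endpoint accepted
        self.log = []      # errors that only the agent itself can see

    def dispatch(self, method, version, **kwargs):
        req_major, req_minor = parse(version)
        my_major, my_minor = parse(self.version)
        if req_major != my_major or req_minor > my_minor:
            raise UnsupportedVersion(
                "Endpoint does not support RPC version %s" % version)
        self.handled.append((method, kwargs))


def call(agent, method, version, **kwargs):
    # Synchronous and directed: the error travels back to the caller.
    agent.dispatch(method, version, **kwargs)


def fanout_cast(agents, method, version, **kwargs):
    # Fire-and-forget to everyone: errors stay in each agent's own log.
    for agent in agents:
        try:
            agent.dispatch(method, version, **kwargs)
        except UnsupportedVersion as exc:
            agent.log.append(str(exc))


def notify_with_fallback(agent, devices):
    # Only possible with call(): retry with the old signature on failure.
    try:
        call(agent, METHOD, "1.3", devices_to_update=devices)
    except UnsupportedVersion:
        call(agent, METHOD, "1.1")


old_direct = ToyAgent("1.2")
notify_with_fallback(old_direct, ["tap1"])

old_fanout, new_fanout = ToyAgent("1.2"), ToyAgent("1.3")
fanout_cast([old_fanout, new_fanout], METHOD, "1.3",
            devices_to_update=["tap1"])
```

With the directed call, the older agent still receives the bare
notification through the fallback. With the fanout cast, the older agent
only logs the error and handles nothing, which matches the traceback
above.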

The fix for the upgrade issue is probably to revert the optimization in
Liberty. Since we don’t yet support upgrades that span multiple release
cycles, that should be enough.

Other alternatives do not seem to work here:
- cast()ing with both the new and the old signature would effectively disable the optimization, because an upgraded agent would receive both versions of the method, and the old one would trigger a full firewall reset anyway;
- calling cast() with the new signature but without a version specified would probably make the older Kilo agent fail in a worse way (note: I need to check that locally).
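
The first alternative can be checked with a small simulation. Again,
this is only a sketch under assumptions: ToyAgent and dual_cast() are
hypothetical, and minor versions 2 and 3 stand in for "old" and "new"
endpoint versions.

```python
class ToyAgent:
    def __init__(self, max_minor, devices):
        self.max_minor = max_minor  # highest minor RPC version understood
        self.devices = set(devices)
        self.refreshed = set()

    def receive(self, minor, devices_to_update=None):
        if minor > self.max_minor:
            return  # a real agent would log UnsupportedVersion here
        if devices_to_update is None:
            # Old signature: no device list, so everything is reset.
            self.refreshed |= self.devices
        else:
            self.refreshed |= self.devices & set(devices_to_update)


def dual_cast(agents, devices_to_update):
    # Send both signatures to every agent, as the first alternative suggests.
    for agent in agents:
        agent.receive(3, devices_to_update)  # new signature
        agent.receive(2)                     # old signature


kilo = ToyAgent(2, ["tap1", "tap2"])
liberty = ToyAgent(3, ["tap3", "tap4"])
dual_cast([kilo, liberty], ["tap3"])
```

The upgraded agent accepts both messages, so it refreshes all of its
devices even though only one was listed; the optimization is lost.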

Side note: it’s interesting that we have backwards compatible code on
the agent side to accommodate older servers. I will probably remove it,
since it’s not in line with the rolling upgrade scenarios that we
support, where you never run a server older than an agent in the cluster.

** Affects: neutron
     Importance: High
     Assignee: Ihar Hrachyshka (ihar-hrachyshka)
         Status: New


** Tags: liberty-backport-potential upgrade

** Changed in: neutron
   Importance: Undecided => High

** Changed in: neutron
     Assignee: (unassigned) => Ihar Hrachyshka (ihar-hrachyshka)

** Tags added: upgrade

** Tags added: liberty-backport-potential

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1531772

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1531772/+subscriptions

