yahoo-eng-team team mailing list archive

[Bug 1996594] [NEW] OVN metadata randomly stops working

 

Public bug reported:

We found that OVN metadata randomly stops working when OVN is writing a
snapshot.

1. At 12:30:35, OVN started to transfer leadership to write a snapshot

$ find sosreport-juju-2752e1-*/var/log/ovn/* |xargs zgrep -i -E 'Transferring leadership'
sosreport-juju-2752e1-6-lxd-24-xxx-2022-08-18-entowko/var/log/ovn/ovsdb-server-sb.log:2022-08-18T12:30:35.322Z|80962|raft|INFO|Transferring leadership to write a snapshot.
sosreport-juju-2752e1-6-lxd-24-xxx-2022-08-18-entowko/var/log/ovn/ovsdb-server-sb.log:2022-08-18T17:52:53.024Z|82382|raft|INFO|Transferring leadership to write a snapshot.
sosreport-juju-2752e1-7-lxd-27-xxx-2022-08-18-hhxxqci/var/log/ovn/ovsdb-server-sb.log:2022-08-18T12:30:35.330Z|92698|raft|INFO|Transferring leadership to write a snapshot.

2. At 12:30:36, neutron-ovn-metadata-agent reported an OVSDB Error

$ find sosreport-srv1*/var/log/neutron/* |xargs zgrep -i -E 'OVSDB Error'
sosreport-srv1xxx2d-xxx-2022-08-18-cuvkufw/var/log/neutron/neutron-ovn-metadata-agent.log:2022-08-18 12:30:36.103 75556 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: no error details available
sosreport-srv1xxx6d-xxx-2022-08-18-bgnovqu/var/log/neutron/neutron-ovn-metadata-agent.log:2022-08-18 12:30:36.104 2171 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: no error details available

3. At 12:57:53, we saw the error 'No port found in network'. From that
point on, OVN metadata requests fail intermittently.

2022-08-18 12:57:53.800 3730 ERROR neutron.agent.ovn.metadata.server [-] No port found in network 63e2c276-60dd-40e3-baa1-c16342eacce2 with IP address 100.94.98.135
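
To make the timeline in steps 1 to 3 easier to check, here is a minimal Python sketch that parses the three timestamps quoted above and computes the gaps between them. It assumes the agent hosts log in UTC, like the 'Z' suffixed ovsdb-server timestamps; the helper names parse_ovsdb and parse_agent are made up for this illustration.

```python
from datetime import datetime, timezone

# Timestamps copied from the log excerpts above.
OVSDB_TRANSFER = "2022-08-18T12:30:35.322Z"    # ovsdb-server-sb.log
AGENT_OVSDB_ERROR = "2022-08-18 12:30:36.103"  # neutron-ovn-metadata-agent.log
AGENT_NO_PORT = "2022-08-18 12:57:53.800"      # neutron-ovn-metadata-agent.log


def parse_ovsdb(ts: str) -> datetime:
    """Parse an ovsdb-server style timestamp (ISO-like, 'Z' suffix)."""
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def parse_agent(ts: str) -> datetime:
    """Parse an oslo.log style timestamp.

    Assumption: the agent host logs in UTC. Adjust if it does not.
    """
    parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


transfer = parse_ovsdb(OVSDB_TRANSFER)
ovsdb_error = parse_agent(AGENT_OVSDB_ERROR)
no_port = parse_agent(AGENT_NO_PORT)

# Seconds between the leadership transfer and the agent's OVSDB error.
gap_transfer_to_error = (ovsdb_error - transfer).total_seconds()
# Seconds between the OVSDB error and the first 'No port found' error.
gap_error_to_no_port = (no_port - ovsdb_error).total_seconds()

print(f"transfer -> OVSDB Error: {gap_transfer_to_error:.3f} s")
print(f"OVSDB Error -> No port found: {gap_error_to_no_port:.3f} s")
```

Under that UTC assumption, the OVSDB error follows the leadership transfer by less than a second, while the first 'No port found' error shows up about 27 minutes later.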

After the problem occurs, restarting neutron-ovn-metadata-agent, or
restarting the haproxy instance as follows, can be used as a workaround.

/usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ip netns exec ovnmeta-63e2c276-60dd-40e3-baa1-c16342eacce2 haproxy -f /var/lib/neutron/ovn-metadata-proxy/63e2c276-60dd-40e3-baa1-c16342eacce2.conf

LP bug #1990978 [1] is trying to reduce the frequency of leadership transfers, which should help with this problem.
However, it only makes the problem occur less often and does not eliminate it. I wonder if we need to add some retry logic on the neutron side.
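
To illustrate what "retry logic" could mean here, below is a rough Python sketch of a generic retry helper with exponential backoff. It is not neutron or ovsdbapp code: PortNotFound, retry and make_flaky_lookup are hypothetical names invented for this example, and a real fix would have to hook into the actual lookup path in the metadata agent.

```python
import time


class PortNotFound(Exception):
    """Hypothetical error standing in for 'No port found in network'."""


def retry(func, attempts=3, delay=0.5, backoff=2.0,
          retry_on=(PortNotFound,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts are used up.

    The wait between attempts starts at `delay` seconds and is
    multiplied by `backoff` after each failure. The last failure is
    re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= backoff


def make_flaky_lookup(failures):
    """Build a fake lookup that fails `failures` times, then succeeds.

    This only simulates a lookup that fails while the agent's view of
    the southbound DB is stale; it does not talk to OVN.
    """
    state = {"left": failures}

    def lookup():
        if state["left"] > 0:
            state["left"] -= 1
            raise PortNotFound("No port found in network")
        return "port-found"

    return lookup


if __name__ == "__main__":
    # Fails twice, succeeds on the third attempt.
    print(retry(make_flaky_lookup(2), delay=0.1))
```

Whether a retry alone is enough depends on why the lookup fails. If the agent's connection to the southbound DB does not recover by itself after the leadership transfer, a reconnect would be needed as well.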

NOTE: The OpenStack version we are using is focal-xena, and the
openvswitch version is 2.16.0-0ubuntu2.1~cloud0

[1] https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1990978

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1996594

Title:
  OVN metadata randomly stops working

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1996594/+subscriptions
