sts-sponsors team mailing list archive

[Bug 1990978] Re: Raft bug: OVSDB leadership transfers every 10-20 min after initial compaction

To: sts-sponsors@xxxxxxxxxxxxxxxxxxx
From: Stefan Lupsa <1990978@xxxxxxxxxxxxxxxxxx>
Date: Fri, 27 Jan 2023 11:57:39 -0000
Reply-to: Bug 1990978 <1990978@xxxxxxxxxxxxxxxxxx>
Sender: noreply@xxxxxxxxxxxxx
Hello, as mentioned above, this also affects focal-xena.

The commit addressing the issue (https://github.com/openvswitch/ovs/commit/a32a4e1fa2d3fad284834d4b7bccc2e71d33f9da) has been backported to the 2.16 branch of the OVS repository and is included in the v2.16.5 release tag. However, the latest Cloud Archive package used by the latest Kolla Xena image is still at 2.16.4 (apt-cache policy output):
openvswitch-switch:
  Installed: 2.16.4-0ubuntu1~cloud0
  Candidate: 2.16.4-0ubuntu1~cloud0
  Version table:
 *** 2.16.4-0ubuntu1~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/xena/main amd64 Packages
        100 /var/lib/dpkg/status
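Whether an installed package predates the v2.16.5 release can be checked by comparing upstream version numbers. This is a minimal sketch, assuming GNU sort with -V; upstream_older_than is a hypothetical helper name. It only compares the upstream part of the version string, so it cannot tell whether a distribution package carries the fix as a backported patch without a version bump.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) if the upstream part of a Debian
# package version is strictly older than the given upstream release.
#   $1 package version, e.g. 2.16.4-0ubuntu1~cloud0
#   $2 upstream release to compare against, e.g. 2.16.5
upstream_older_than() {
    local pkg_version=$1 release=$2
    local upstream=${pkg_version#*:}   # drop an epoch such as "1:"
    upstream=${upstream%%-*}           # drop the Debian revision "-0ubuntu1..."
    # Equal versions are not "older".
    [[ $upstream == "$release" ]] && return 1
    # sort -V orders version strings numerically; the first line is the lowest.
    local lowest
    lowest=$(printf '%s\n' "$upstream" "$release" | sort -V | head -n 1)
    [[ $lowest == "$upstream" ]]
}
```

For example: upstream_older_than "$(dpkg-query -W -f='${Version}' openvswitch-switch)" 2.16.5 && echo "openvswitch-switch predates 2.16.5". With the 2.16.4-0ubuntu1~cloud0 package shown above, the check succeeds.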

https://bugs.launchpad.net/bugs/1990978

Title:
  Raft bug: OVSDB leadership transfers every 10-20 min after initial
  compaction

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive yoga series:
  Fix Released
Status in openvswitch package in Ubuntu:
  Confirmed
Status in openvswitch source package in Focal:
  Fix Released
Status in openvswitch source package in Jammy:
  Fix Released
Status in openvswitch source package in Kinetic:
  Fix Released

Bug description:
  The first compaction starts after 24 hours, or earlier if the DB
  doubles in size.

  Subsequent compactions then trigger every 10-20 min.
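  The trigger rule described above can be modeled with a small shell function. This is a simplified, hypothetical sketch, not the ovsdb-server implementation: should_compact and its arguments are invented for illustration, and the thresholds (a minimum wait, a 24-hour maximum, a doubling test) are taken from this report. It also illustrates one way the 10-20 min symptom can arise: if the size recorded at the last compaction were stuck at zero, the doubling test would pass on every check, so compaction would fire as soon as the minimum wait elapsed. See the commits linked below for the actual fix.

```shell
#!/bin/bash
# Hypothetical, simplified model of the compaction trigger described in this
# report; not the ovsdb-server code. Prints "yes" or "no".
#   $1 minutes since the last compaction
#   $2 current DB file size in bytes
#   $3 DB size right after the last compaction, in bytes
#   $4 minimum wait in minutes (optional; this report sees 10-20 min)
should_compact() {
    local elapsed_min=$1 db_bytes=$2 snapshot_bytes=$3
    local min_wait=${4:-10}
    local max_wait=$((24 * 60))  # 24 hours, in minutes

    # Never compact before the minimum wait has passed.
    if (( elapsed_min < min_wait )); then
        echo no
        return
    fi
    # Compact once the maximum wait expires or the DB has doubled in size.
    if (( elapsed_min >= max_wait || db_bytes >= 2 * snapshot_bytes )); then
        echo yes
    else
        echo no
    fi
}
```

  For example, should_compact 15 1000 800 prints no, while should_compact 15 1000 0 prints yes: with a zero baseline, any DB size counts as "doubled".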

  The OVS version hitting this issue:
  ovs-vsctl (Open vSwitch) 2.17.2

  Commit ID that fixes the issue is:
  https://github.com/openvswitch/ovs/commit/a32a4e1fa2d3fad284834d4b7bccc2e71d33f9da

  https://github.com/openvswitch/ovs/commit/dfc3e65c8191f5dc375337c23aed128b5c0d7781
  (backport to the 2.17 branch)

  Reproducer:
  Trigger compactions with the command-line tool:
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl ovsdb-server/compact
  or by creating DB pressure, e.g.:
  #!/bin/bash
  # Grow the NB DB with logical switches and ports, then delete them.
  for i in {1..5000}; do
      if ! ovn-nbctl ls-add "sw$i"; then
          echo "Failed on ls-add i: $i"
          exit 1
      fi
      for j in {1..2000}; do
          echo "Iteration i: $i and j: $j"
          # The "-" separator keeps port names unique: without it,
          # i=1, j=12 and i=11, j=2 would both produce "sw112".
          if ! ovn-nbctl lsp-add "sw$i" "sw$i-$j"; then
              echo "Failed on lsp-add i: $i and j: $j"
              exit 1
          fi
      done
  done
  for i in {1..5000}; do
      echo "Delete iteration i: $i"
      if ! ovn-nbctl ls-del "sw$i"; then
          echo "Failed on ls-del i: $i"
          exit 1
      fi
  done

  Check for leadership transfers using:
  sudo grep "Transferring leadership" /var/log/ovn/ov* | grep ovsdb-server-sb.log
  A new entry should appear every 10-20 min.
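  To make the 10-20 min cadence easier to see than in raw grep output, the timestamps of consecutive "Transferring leadership" entries can be diffed. This is a minimal sketch, assuming GNU date and the default OVS log-file line format (a UTC timestamp such as 2022-09-26T10:12:34.567Z followed by |-separated fields); leadership_gaps is a hypothetical helper name.

```shell
#!/bin/bash
# Print the gap, in whole minutes, between consecutive "Transferring
# leadership" entries in one ovsdb-server log file.
# Assumes each log line starts with a UTC timestamp such as
# 2022-09-26T10:12:34.567Z followed by "|", and that GNU date is available.
leadership_gaps() {
    local log=$1 prev="" ts secs
    grep -h "Transferring leadership" "$log" | cut -d'|' -f1 |
    while read -r ts; do
        ts=${ts:0:19}   # keep YYYY-MM-DDTHH:MM:SS, drop fraction and "Z"
        ts=${ts/T/ }    # "YYYY-MM-DD HH:MM:SS" for date -d
        secs=$(date -u -d "$ts" +%s)
        if [[ -n $prev ]]; then
            echo "$ts UTC: $(( (secs - prev) / 60 )) min since previous transfer"
        fi
        prev=$secs
    done
}
```

  After sourcing the function, run leadership_gaps /var/log/ovn/ovsdb-server-sb.log (with sudo if the log is not readable by your user). On an affected server, most reported gaps should fall in the 10-20 min range.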

  === Ubuntu SRU Details ===

  [Impact]
  Please see above

  [Test Case]
  * Deploy OpenStack Yoga.
  * Connect to the NB DB leader and run the script above to generate DB pressure. Compaction will occur once the DB has doubled in size.
  * After one hour, check for subsequent leadership transfers using the following command:
  sudo grep "Transferring leadership" /var/log/ovn/ov* | grep ovsdb-server-sb.log

  [Where things could go wrong]
  A regression is not expected, since the fix reduces the frequency of leadership transfers.
  The fix has also been applied upstream (https://github.com/openvswitch/ovs/commit/dfc3e65c8191f5dc375337c23aed128b5c0d778); however, a new version has not yet been released.
