sts-sponsors team mailing list archive
Message #04879
[Bug 1990978] Re: Raft bug: OVSDB leadership transfers every 10-20 min after initial compaction
Hello, as mentioned above, this is also affecting focal-xena.
The commit addressing the issue (https://github.com/openvswitch/ovs/commit/a32a4e1fa2d3fad284834d4b7bccc2e71d33f9da) has been backported to the 2.16 branch in the ovs repo and is available in the v2.16.5 release tag; however, the latest cloud archive package in the latest kolla xena image is still at 2.16.4:
openvswitch-switch:
  Installed: 2.16.4-0ubuntu1~cloud0
  Candidate: 2.16.4-0ubuntu1~cloud0
  Version table:
 *** 2.16.4-0ubuntu1~cloud0 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/xena/main amd64 Packages
        100 /var/lib/dpkg/status
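A rough way to script the same check, offered only as a sketch: the helper below (a hypothetical name, not part of any package) strips the Debian revision from a version string and compares the upstream part against 2.16.5, the release tag mentioned above. It deliberately refuses to answer for anything outside the 2.16 series, since other branches carry the fix in different releases.

```shell
#!/bin/bash
# Hypothetical helper: does a 2.16.x package version include the fix
# that upstream shipped in v2.16.5?
#   exit 0 -> upstream version is 2.16.5 or newer within the 2.16 series
#   exit 1 -> older 2.16.x, fix not included
#   exit 2 -> not a 2.16.x version, this check does not apply
has_2_16_fix() {
    local upstream="${1%%-*}"    # drop the Debian revision, e.g. "-0ubuntu1~cloud0"
    case "$upstream" in
        2.16.*) ;;
        *) return 2 ;;
    esac
    # sort -V orders version strings; if 2.16.5 sorts first (or ties),
    # the installed upstream version is at least 2.16.5.
    [[ "$(printf '%s\n' "2.16.5" "$upstream" | sort -V | head -n1)" == "2.16.5" ]]
}

# Example (not run here):
#   has_2_16_fix "$(dpkg-query -W -f='${Version}' openvswitch-switch)"
```

With the package shown above (2.16.4-0ubuntu1~cloud0) the helper returns 1, meaning the fix is not yet included.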
--
You received this bug notification because you are a member of SE SRU
("STS") Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1990978
Title:
Raft bug: OVSDB leadership transfers every 10-20 min after initial
compaction
Status in Ubuntu Cloud Archive:
New
Status in Ubuntu Cloud Archive yoga series:
Fix Released
Status in openvswitch package in Ubuntu:
Confirmed
Status in openvswitch source package in Focal:
Fix Released
Status in openvswitch source package in Jammy:
Fix Released
Status in openvswitch source package in Kinetic:
Fix Released
Bug description:
The first compaction starts after 24 hours, or earlier if the DB
doubles in size.
Subsequent compactions then trigger every 10-20 min, each accompanied by a leadership transfer.
The OVS version hitting this issue:
ovs-vsctl (Open vSwitch) 2.17.2
Commits that fix the issue:
https://github.com/openvswitch/ovs/commit/a32a4e1fa2d3fad284834d4b7bccc2e71d33f9da
https://github.com/openvswitch/ovs/commit/dfc3e65c8191f5dc375337c23aed128b5c0d7781
(2.17 branch patch)
Reproducer:
Trigger compactions by using command line tool:
ovs-appctl -t /var/run/ovn/ovnsb_db.ctl ovsdb-server/compact
or by creating DB pressure, i.e.:
#!/bin/bash
# Create 5000 logical switches with 2000 ports each to grow the DB.
for i in {1..5000}
do
    if ! ovn-nbctl ls-add "sw$i" ; then
        echo "Failed on ls-add i: $i"
        exit 1
    fi
    for j in {1..2000}
    do
        echo "Iteration i: $i and j:$j"
        # The "-" keeps port names unique (sw1 + 11 would otherwise equal sw11 + 1).
        if ! ovn-nbctl lsp-add "sw$i" "sw${i}-${j}" ; then
            echo "Failed on lsp-add i: $i and j: $j"
            exit 1
        fi
    done
done
# Delete the switches again; their ports are removed with them.
for i in {1..5000}
do
    echo "Delete iteration i: $i"
    if ! ovn-nbctl ls-del "sw$i" ; then
        echo "Failed on ls-del i: $i"
        exit 1
    fi
done
Check for leadership transfers using:
sudo grep "Transferring leadership" /var/log/ovn/ov* | grep ovsdb-server-sb.log
On an affected system, a new entry should appear every 10-20 min.
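To see the actual spacing between transfers rather than eyeballing timestamps, something like the following sketch may help. It assumes each ovsdb-server log line begins with a UTC timestamp of the form 2022-09-27T10:15:32.123Z followed by "|", and that GNU date is available; the function name is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read ovsdb-server log lines on stdin and print the
# gap, in whole minutes, between consecutive "Transferring leadership" entries.
# Assumes every line starts with a timestamp like 2022-09-27T10:15:32.123Z.
transfer_gaps() {
    local line ts epoch prev=""
    while IFS= read -r line; do
        case "$line" in
            *"Transferring leadership"*) ;;
            *) continue ;;               # ignore unrelated log lines
        esac
        ts="${line:0:19}"                # YYYY-MM-DDTHH:MM:SS
        ts="${ts/T/ }"                   # GNU date accepts "YYYY-MM-DD HH:MM:SS"
        epoch=$(date -u -d "$ts" +%s) || continue
        if [[ -n "$prev" ]]; then
            echo $(( (epoch - prev) / 60 ))
        fi
        prev=$epoch
    done
}

# Example (not run here); -h keeps grep from prefixing file names:
#   sudo grep -h "Transferring leadership" /var/log/ovn/ovsdb-server-sb.log | transfer_gaps
```

Values consistently in the 10-20 range would match the behaviour described in this bug.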
=== Ubuntu SRU Details ===
[Impact]
Please see above
[Test Case]
* Deploy OpenStack Yoga.
* Connect to the NB DB leader and run the script above to generate DB pressure. Compaction occurs once the DB doubles in size.
* After one hour, check for subsequent transfers using the following command:
sudo grep "Transferring leadership" /var/log/ovn/ov* | grep ovsdb-server-sb.log
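For the "connect to the NB DB leader" step, one possible way to tell whether the local node is the leader is to inspect the Raft role reported by cluster/status. This is a sketch only: the control socket path /var/run/ovn/ovnnb_db.ctl is assumed by analogy with the SB path used in the reproducer, and the helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `cluster/status` output on stdin and succeed
# if it reports this server's Raft role as leader.
is_raft_leader() {
    grep -Eq '^[[:space:]]*Role:[[:space:]]*leader'
}

# Example (not run here; socket path is an assumption):
#   sudo ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound \
#       | is_raft_leader && echo "this node is the NB leader"
```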
[Where things could go wrong]
No regression is expected, since the fix only reduces the frequency of leadership transfers.
The fix has also been applied upstream (https://github.com/openvswitch/ovs/commit/dfc3e65c8191f5dc375337c23aed128b5c0d778); however, a new version containing it has not yet been released.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1990978/+subscriptions