yahoo-eng-team team mailing list archive
Message #93488
[Bug 2017748] Re: OVN: ovnmeta namespaces missing during scalability test causing DHCP issues
A customer has hit a similar issue. I could not reproduce it in my local environment, but I have prepared a debdiff for yoga.
Our support engineer pointed out patch 2, and it makes sense to backport it.
As the description says, the issue happens intermittently under high load. The customer has hit it a few times and cannot reproduce it on demand.
There are two commits in the debdiff file:
[PATCH 1/2] ovn-metadata: Refactor events
[PATCH 2/2] Handle creation of Port_Binding with chassis set
Patch 1 is needed because patch 2 otherwise runs into massive conflicts.
2023.1 and later releases already include both patches.
** Tags added: sts
** Also affects: neutron (Ubuntu)
Importance: Undecided
Status: New
** Also affects: neutron (Ubuntu Focal)
Importance: Undecided
Status: New
** Also affects: neutron (Ubuntu Jammy)
Importance: Undecided
Status: New
** Changed in: neutron (Ubuntu)
Status: New => Fix Released
** Patch added: "lp2017748_focal_yoga.debdiff"
https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/2017748/+attachment/5746530/+files/lp2017748_focal_yoga.debdiff
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2017748
Title:
OVN: ovnmeta namespaces missing during scalability test causing DHCP
issues
Status in neutron:
Fix Released
Status in neutron package in Ubuntu:
Fix Released
Status in neutron source package in Focal:
New
Status in neutron source package in Jammy:
New
Bug description:
Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=2187650
During a scalability test it was noted that a few VMs were having
issues being pinged (2 out of ~5000 VMs in the test conducted). After
some investigation it was found that the VMs in question did not
receive a DHCP lease:
udhcpc: no lease, failing
FAIL
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 181.90. request failed
And the ovnmeta- namespaces for the networks that the VMs were booting
from were missing. Looking into the ovn-metadata-agent.log:
2023-04-18 06:56:09.864 353474 DEBUG neutron.agent.ovn.metadata.agent
[-] There is no metadata port for network
9029c393-5c40-4bf2-beec-27413417eafa or it has no MAC or IP addresses
configured, tearing the namespace down if needed _get_provision_params
/usr/lib/python3.9/site-packages/neutron/agent/ovn/metadata/agent.py:495
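To check how many networks are affected on a node, the agent log can be searched for the message quoted above. The following is only a minimal sketch: the function name and the regular expression are illustrative assumptions based on the single log line in this report, not part of neutron, and the second sample line is made-up filler.

```python
import re

# Pattern assumed from the one log line quoted in this report;
# other neutron versions may word the message differently.
NO_METADATA_PORT = re.compile(
    r"There is no metadata port for network\s+"
    r"(?P<network>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def networks_without_metadata_port(lines):
    """Return the sorted, de-duplicated network UUIDs named in matching lines."""
    found = set()
    for line in lines:
        match = NO_METADATA_PORT.search(line)
        if match:
            found.add(match.group("network"))
    return sorted(found)


# The first entry is the log line from this report, joined back into one
# line. The second entry is invented filler that must not match.
sample = [
    "2023-04-18 06:56:09.864 353474 DEBUG neutron.agent.ovn.metadata.agent "
    "[-] There is no metadata port for network "
    "9029c393-5c40-4bf2-beec-27413417eafa or it has no MAC or IP addresses "
    "configured, tearing the namespace down if needed",
    "hypothetical unrelated log line",
]

affected = networks_without_metadata_port(sample)
```

In practice the lines would come from iterating over the opened ovn-metadata-agent.log file instead of the sample list.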
Apparently, when the system is under stress (scalability tests) there
are some edge cases where the metadata port information has not yet
been propagated by OVN to the Southbound database. When the
PortBindingChassisEvent event is handled and the agent tries to find
either the metadata port or the IP information on it (which is updated
by ML2/OVN during subnet creation), it cannot be found, and the agent
fails silently with the message shown above.
Note that running the same tests with less concurrency did not trigger
this issue, so it only happens when the system is overloaded.
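The ordering problem described above can be illustrated with a toy model. This is a sketch under stated assumptions, not neutron code and not the content of the patches: ToySouthbound, ToyMetadataAgent and their methods are hypothetical names, and the second handler only shows one way an agent could recover once the metadata port data arrives.

```python
class ToySouthbound:
    """Hypothetical in-memory stand-in for the OVN Southbound database."""

    def __init__(self):
        # network id -> {"mac": str, "ips": [str, ...]}
        self.metadata_ports = {}


class ToyMetadataAgent:
    """Hypothetical agent that keeps one 'ovnmeta-' namespace per network."""

    def __init__(self, southbound):
        self.sb = southbound
        self.namespaces = set()
        self.local_networks = set()

    def _provision(self, network):
        port = self.sb.metadata_ports.get(network)
        if not port or not port.get("mac") or not port.get("ips"):
            # Mirrors the logged behaviour: no usable metadata port,
            # so the namespace is torn down if present and nothing is raised.
            self.namespaces.discard("ovnmeta-" + network)
            return False
        self.namespaces.add("ovnmeta-" + network)
        return True

    def on_vm_port_bound(self, network):
        """A VM port on this network was bound to the local chassis."""
        self.local_networks.add(network)
        return self._provision(network)

    def on_metadata_port_updated(self, network):
        """Assumed recovery path: retry once metadata port data shows up."""
        if network in self.local_networks:
            return self._provision(network)
        return False


sb = ToySouthbound()
agent = ToyMetadataAgent(sb)

# The VM port event is handled before the metadata port data is visible.
first_attempt = agent.on_vm_port_bound("net-a")
missing_after_first = "ovnmeta-net-a" not in agent.namespaces

# The metadata port data arrives later (values are placeholders).
sb.metadata_ports["net-a"] = {"mac": "fa:16:3e:00:00:01", "ips": ["10.0.0.2"]}
second_attempt = agent.on_metadata_port_updated("net-a")
```

Without the second handler, nothing would ever retry the provisioning in this model, and the namespace would stay missing for VMs on that network, which matches the symptom in the report.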
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2017748/+subscriptions