yahoo-eng-team mailing list archive - Message #95211
[Bug 2095152] [NEW] ovs-agent: Leftover tpi/spi interfaces after VM boot/delete with trunk port(s)
Public bug reported:
We have seen tpi- and spi- interfaces in OVS that ovs-agent did not
delete when it should have.
At the moment I only have a chance-based reproduction, with wildly
varying frequency of the error symptoms:
ovs-dump() {
    for bridge in $( sudo ovs-vsctl list-br )
    do
        for port in $( sudo ovs-vsctl list-ports $bridge )
        do
            echo $bridge $port
        done
    done | sort
}

ovs-dump > ovs-state.0

for j in $( seq 1 10 )
do
    openstack network create tnet0
    openstack subnet create --network tnet0 --subnet-range 10.0.100.0/24 tsubnet0
    openstack port create --network tnet0 tport0
    openstack network trunk create --parent-port tport0 trunk0
    tport0_mac="$( openstack port show tport0 -f value -c mac_address )"
    for i in $( seq 1 30 )
    do
        openstack network create tnet$i
        openstack subnet create --network tnet$i --subnet-range 10.0.$(( 100 + $i )).0/24 tsubnet$i
        openstack port create --network tnet$i --mac-address "$tport0_mac" tport$i
        openstack network trunk set --subport port=tport$i,segmentation-type=vlan,segmentation-id=$(( 100 + $i )) trunk0
    done
    openstack server create --flavor cirros256 --image cirros-0.6.3-x86_64-disk --nic port-id=tport0 tvm0 --wait
    # Theoretically not needed, but still make sure we don't interrupt any work in progress, to make the repro more uniform.
    while [ "$( openstack network trunk show trunk0 -f value -c status )" != "ACTIVE" ]
    do
        sleep 1
    done
    openstack server delete tvm0 --wait
    openstack network trunk delete trunk0
    openstack port list -f value -c ID -c Name | awk '/tport/ { print $1 }' | xargs -r openstack port delete
    openstack net list -f value -c ID -c Name | awk '/tnet/ { print $1 }' | xargs -r openstack net delete
done

sleep 10
ovs-dump > ovs-state.1
diff -u ovs-state.{0,1}
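As a quicker check between runs, the snapshot diff can be reduced to a
count of suspicious ports. The helper below is only a sketch: it does
pure text processing on an ovs-dump snapshot (the sample port names are
invented for illustration), so it needs no running OVS:

```shell
# Count leftover trunk patch ports (tpi-/spi- prefixes) in an
# ovs-dump snapshot file. Pure text processing, no OVS required.
count_leftovers() {
    grep -Ec '^br-int (tpi|spi)-' "$1"
}

# Sample snapshot with invented names, to show the expected format:
cat > /tmp/ovs-state.sample <<'EOF'
br-int qr-88029aef-01
br-int spi-1eeb4ae6-1b
br-int tap03961474-06
br-int tpi-2477b06f-5d
br-tun patch-int
EOF
count_leftovers /tmp/ovs-state.sample   # prints 2
```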
One example output with j=1..20 and i=1..30:
--- ovs-state.0 2025-01-16 13:31:07.881407421 +0000
+++ ovs-state.1 2025-01-16 14:52:45.323392243 +0000
@@ -8,9 +8,27 @@
br-int qr-88029aef-01
br-int sg-73e24638-69
br-int sg-e45cf925-de
+br-int spi-1eeb4ae6-1b
+br-int spi-2093a8c2-df
+br-int spi-2d9ae883-d9
+br-int spi-3f17d563-cd
+br-int spi-9c0d9c98-d8
+br-int spi-a2dc4baf-ef
+br-int spi-af2efafa-39
+br-int spi-c14e8bc3-62
+br-int spi-c16959f8-da
+br-int spi-e90d4d84-31
br-int tap03961474-06
br-int tap3e6a6311-95
br-int tpi-1f8b5666-bf
+br-int tpi-2477b06f-5d
+br-int tpi-4421d69a-be
+br-int tpi-572a3af8-42
br-int tpi-9cf24ba1-ba
+br-int tpi-9e60cb66-5e
+br-int tpi-a533a27b-78
+br-int tpi-cddcaa7b-15
+br-int tpi-d7cd2e3e-e6
+br-int tpi-e68ca29d-4d
br-physnet0 phy-br-physnet0
br-tun patch-int
These ports are not cleaned up even by an ovs-agent restart. During the
runs I have not found any ERROR messages in the ovs-agent logs.
The number of ports left behind varies wildly. I have seen cases where
more than 50% of the VM boot/delete cycles left behind a tpi port, but
also cases where it took ten runs (j=1..10) to see the first leftover
interface. This makes me believe there is a causal factor here (probably
timing-related) that I do not yet understand and cannot control.
I want to get back to analysing the root cause, but first I hope to find
a quicker and more reliable reproduction method so this becomes easier
to work with.
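Until a deterministic trigger is found, one way to shorten the time to
the first reproduction is to check for leftovers after every boot/delete
cycle and stop as soon as one appears. The harness below only sketches
that control flow: run_one_cycle and leftover_count are hypothetical
stand-ins for the create/boot/delete steps and the ovs-dump comparison
above, stubbed here so the script runs standalone:

```shell
# Early-exit repro harness (sketch). Both helpers are placeholders:
# in a real run, run_one_cycle would perform one trunk/VM create+delete
# cycle and leftover_count would count tpi-/spi- ports left on br-int.
run_one_cycle() { :; }             # placeholder: one boot/delete cycle
leftover_count() {
    # Stub that "leaks" a port on the 3rd cycle, so the early exit
    # below can be exercised without OpenStack or OVS.
    [ "$1" -eq 3 ] && echo 1 || echo 0
}

for j in $(seq 1 10); do
    run_one_cycle
    n=$(leftover_count "$j")
    if [ "$n" -gt 0 ]; then
        echo "leftover detected after cycle $j"
        break
    fi
done
```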
devstack 2f3440dc
neutron 8cca47f2e7
** Affects: neutron
Importance: Undecided
Status: New
** Tags: ovs trunk
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2095152
Title:
ovs-agent: Leftover tpi/spi interfaces after VM boot/delete with trunk
port(s)
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2095152/+subscriptions