yahoo-eng-team team mailing list archive
Message #70002
[Bug 1738768] [NEW] Dataplane downtime when containers are stopped/restarted
Public bug reported:
I have deployed a 3 controllers - 3 computes HA environment with ML2/OVS
and observed dataplane downtime when restarting/stopping neutron-l3
container on controllers. This is what I did:
1. Created a network, subnet, router, a VM and attached a FIP to the VM
2. Left a ping running on the undercloud to the FIP
3. Stopped l3 container in controller-0.
Result: Observed some packet loss while the router was failed over to controller-1
4. Stopped l3 container in controller-1
Result: Observed some packet loss while the router was failed over to controller-2
5. Stopped l3 container in controller-2
Result: No traffic to/from the FIP at all.
(overcloud) [stack@undercloud ~]$ ping 10.0.0.131
PING 10.0.0.131 (10.0.0.131) 56(84) bytes of data.
64 bytes from 10.0.0.131: icmp_seq=1 ttl=63 time=1.83 ms
64 bytes from 10.0.0.131: icmp_seq=2 ttl=63 time=1.56 ms
<---- Last l3 container was stopped here (step 5 above)---->
From 10.0.0.1 icmp_seq=10 Destination Host Unreachable
From 10.0.0.1 icmp_seq=11 Destination Host Unreachable
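For anyone reproducing this, the outage window can be read straight out of the ping log. Below is a minimal sketch, not part of the original report, that sorts sequence numbers into replied, unreachable and silent (no line at all). The function name summarize_ping and the SAMPLE string are illustrative assumptions; the sample lines are copied from the output above.

```python
import re

# Reply lines look like: "64 bytes from 10.0.0.131: icmp_seq=2 ttl=63 time=1.56 ms"
REPLY_RE = re.compile(r"bytes from \S+ icmp_seq=(\d+)")
# Error lines look like: "From 10.0.0.1 icmp_seq=10 Destination Host Unreachable"
UNREACHABLE_RE = re.compile(r"icmp_seq=(\d+) Destination Host Unreachable")

# Sample taken from the ping output in this report.
SAMPLE = """\
PING 10.0.0.131 (10.0.0.131) 56(84) bytes of data.
64 bytes from 10.0.0.131: icmp_seq=1 ttl=63 time=1.83 ms
64 bytes from 10.0.0.131: icmp_seq=2 ttl=63 time=1.56 ms
From 10.0.0.1 icmp_seq=10 Destination Host Unreachable
From 10.0.0.1 icmp_seq=11 Destination Host Unreachable
"""


def summarize_ping(output):
    """Classify ICMP sequence numbers found in ping output (hypothetical helper)."""
    replied = set()
    unreachable = set()
    for line in output.splitlines():
        match = REPLY_RE.search(line)
        if match:
            replied.add(int(match.group(1)))
            continue
        match = UNREACHABLE_RE.search(line)
        if match:
            unreachable.add(int(match.group(1)))
    seen = replied | unreachable
    if not seen:
        return {"replied": [], "unreachable": [], "silent": []}
    # Sequence numbers between the first and last seen that produced no line.
    silent = [s for s in range(min(seen), max(seen) + 1) if s not in seen]
    return {
        "replied": sorted(replied),
        "unreachable": sorted(unreachable),
        "silent": silent,
    }


if __name__ == "__main__":
    print(summarize_ping(SAMPLE))
```

The silent range approximates how long ping waited before the gateway started answering with "Destination Host Unreachable".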
When containers are stopped, I guess that the qrouter namespace is not
accessible by the kernel:
[heat-admin@overcloud-controller-2 ~]$ sudo ip netns e qrouter-5244e91c-f533-4128-9289-f37c9656792c ip a
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
setting the network namespace "qrouter-5244e91c-f533-4128-9289-f37c9656792c" failed: Invalid argument
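The same check can be scripted across controllers. The sketch below is an illustration rather than anything from the original report: it runs a no-op command inside the namespace with `ip netns exec` and treats a non-zero exit or an "Invalid argument" message as the symptom shown above. The helper names namespace_usable and check_namespace are assumptions, and the check needs root on the controller. It only detects the symptom and does not explain why the namespace became unusable.

```python
import subprocess


def namespace_usable(stderr, returncode):
    """Interpret the result of `ip netns exec <ns> true` (hypothetical helper).

    The namespace counts as usable only if the command succeeded and the
    "Invalid argument" error seen in this report is absent.
    """
    return returncode == 0 and "Invalid argument" not in stderr


def check_namespace(name):
    """Try to enter the named network namespace; requires root privileges."""
    proc = subprocess.run(
        ["ip", "netns", "exec", name, "true"],
        capture_output=True,
        text=True,
    )
    return namespace_usable(proc.stderr, proc.returncode)


if __name__ == "__main__":
    # Router namespace name taken from the report above.
    ns = "qrouter-5244e91c-f533-4128-9289-f37c9656792c"
    print(ns, "usable" if check_namespace(ns) else "NOT usable")
```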
This means that we get not only control-plane downtime but also data-plane downtime, which could be seen as a regression compared to non-containerized environments.
The same would happen with DHCP: I expect instances to be unable to fetch IP addresses from dnsmasq when the DHCP containers are stopped.
** Affects: neutron
Importance: Undecided
Status: New
** Description changed:
I have deployed a 3 controllers - 3 computes HA environment with ML2/OVS
and observed dataplane downtime when restarting/stopping neutron-l3
container on controllers. This is what I did:
- 1. Created a network, subnet, router, a VM and attached a FIP to the VIM
+ 1. Created a network, subnet, router, a VM and attached a FIP to the VM
2. Left a ping running on the undercloud to the FIP
3. Stopped l3 container in controller-0.
- Result: Observed some packet loss while the router was failed over to controller-1
+ Result: Observed some packet loss while the router was failed over to controller-1
4. Stopped l3 container in controller-1
- Result: Observed some packet loss while the router was failed over to controller-2
+ Result: Observed some packet loss while the router was failed over to controller-2
5. Stopped l3 container in controller-2
- Result: No traffic to/from the FIP at all.
+ Result: No traffic to/from the FIP at all.
-
- (overcloud) [stack@undercloud ~]$ ping 10.0.0.131
+ (overcloud) [stack@undercloud ~]$ ping 10.0.0.131
PING 10.0.0.131 (10.0.0.131) 56(84) bytes of data.
64 bytes from 10.0.0.131: icmp_seq=1 ttl=63 time=1.83 ms
64 bytes from 10.0.0.131: icmp_seq=2 ttl=63 time=1.56 ms
- <---- Last l3 container was stopped here (step 5) in the above description ---->
-
+ <---- Last l3 container was stopped here (step 5 above)---->
+
From 10.0.0.1 icmp_seq=10 Destination Host Unreachable
From 10.0.0.1 icmp_seq=11 Destination Host Unreachable
-
- When containers are stopped, I guess that the qrouter namespace is not accessible by the kernel:
+ When containers are stopped, I guess that the qrouter namespace is not
+ accessible by the kernel:
[heat-admin@overcloud-controller-2 ~]$ sudo ip netns e qrouter-5244e91c-f533-4128-9289-f37c9656792c ip a
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
setting the network namespace "qrouter-5244e91c-f533-4128-9289-f37c9656792c" failed: Invalid argument
This means that we get not only control-plane downtime but also data-plane downtime, which could be seen as a regression compared to non-containerized environments.
The same would happen with DHCP: I expect instances to be unable to fetch IP addresses from dnsmasq when the DHCP containers are stopped.
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1738768
Title:
Dataplane downtime when containers are stopped/restarted
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1738768/+subscriptions