yahoo-eng-team team mailing list archive
Message #85195
[Bug 1916024] [NEW] HA router master instance in error state because qg-xx interface is down
Public bug reported:
BZ reference: https://bugzilla.redhat.com/show_bug.cgi?id=1929829
Sometimes a router is created with all of its instances in standby mode
because the qg-xx interface is down, leaving the router without
connectivity:
(overcloud) [stack@undercloud-0 ~]$ neutron l3-agent-list-hosting-router router1
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+---------------------------+----------------+-------+----------+
| id                                   | host                      | admin_state_up | alive | ha_state |
+--------------------------------------+---------------------------+----------------+-------+----------+
| 3b93ec23-48fa-4847-bbb2-f8903e9865f9 | networker-1.redhat.local  | True           | :-)   | standby  |
| 41b8d1a8-4695-445a-916a-d12db523eb91 | controller-0.redhat.local | True           | :-)   | standby  |
| 4533bd88-d2d1-4320-9e39-6fcb2a5cc236 | networker-0.redhat.local  | True           | :-)   | standby  |
+--------------------------------------+---------------------------+----------------+-------+----------+
(overcloud) [stack@undercloud-0 ~]$
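When the reproduction loop is scripted, the "all instances in standby"
condition shown above can be detected automatically. The following is a
minimal Python sketch, not part of neutron: the helper names
parse_ha_states and has_no_active_instance are hypothetical, and it
assumes that a healthy router reports exactly one agent with ha_state
"active".

```python
def parse_ha_states(table_text):
    """Return {host: ha_state} parsed from the CLI table output."""
    states = {}
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts and deprecation notices
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first table row holds the column names
            continue
        row = dict(zip(header, cells))
        states[row["host"]] = row["ha_state"]
    return states


def has_no_active_instance(states):
    """True when agents are listed but none is "active" (assumed healthy state)."""
    return bool(states) and "active" not in states.values()
```

Feeding the captured output of the command above to parse_ha_states and
then to has_no_active_instance would flag router1 as broken.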
Steps to reproduce:
1. for i in $(seq 10); do ./create.sh $i; done
3. Check FIP connectivity to detect the error
4. for i in $(seq 10); do ./delete.sh $i; done
Scripts: http://paste.openstack.org/show/802777/
This seems to be a race condition between the L3 agent and keepalived when configuring the qg-xxx interface:
- /var/log/messages: http://paste.openstack.org/show/802778/
- L3 agent logs: http://paste.openstack.org/show/802779/
When keepalived sets the IP addresses on the qg-xxx interface, the
interface disappears from udev and then reappears (I don't know why
yet). The log in journalctl looks the same as when a new interface is
created.
Since [1], the L3 agent controls the GW interface status (up or down).
If the L3 agent does not bring the interface up, the router namespace
won't be able to send or receive any traffic.
[1] https://review.opendev.org/q/I8dca2c1a2f8cb467cfb44420f0eea54ca0932b05
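To confirm on an affected node that the gateway device was left
administratively down, the JSON output of "ip -j link show", run inside
the router namespace, can be inspected. The sketch below is only a
diagnostic aid under stated assumptions, not the L3 agent's own logic:
the function name down_gateway_devices is hypothetical, and it relies on
the "ifname" and "flags" fields that iproute2 emits in JSON mode.

```python
import json


def down_gateway_devices(ip_link_json, prefix="qg-"):
    """Return the names of devices starting with `prefix` that lack the UP flag."""
    links = json.loads(ip_link_json)
    return [
        link["ifname"]
        for link in links
        if link.get("ifname", "").startswith(prefix)
        and "UP" not in link.get("flags", [])
    ]
```

An empty list means every qg- device in the namespace has the UP flag
set; any name returned is a candidate for the failure described here.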
** Affects: neutron
Importance: Undecided
Assignee: Rodolfo Alonso (rodolfo-alonso-hernandez)
Status: New
** Tags: l3-ha
** Changed in: neutron
Assignee: (unassigned) => Rodolfo Alonso (rodolfo-alonso-hernandez)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1916024
Title:
HA router master instance in error state because qg-xx interface is
down
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1916024/+subscriptions