yahoo-eng-team team mailing list archive
Message #79191
[Bug 1823314] Re: ha router sometime goes in standby mode in all controllers
** Changed in: neutron
Status: Confirmed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1823314
Title:
ha router sometime goes in standby mode in all controllers
Status in neutron:
Fix Released
Bug description:
Sometimes, when two HA routers are created for the same tenant within a
very short time, both routers can be assigned the same vr_id. keepalived
then treats them as the same virtual router instance, so only one of
those routers becomes active on some hosts.
When I spotted it, it looked like this:
[stack@undercloud-0 ~]$ neutron l3-agent-list-hosting-router router-2
+--------------------------------------+--------------------------+----------------+-------+----------+
| id                                   | host                     | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------------+----------------+-------+----------+
| 0d654b7c-da42-4847-a24f-6d1df804ca3b | controller-1.localdomain | True           | :-)   | standby  |
| 242e1e81-7e4e-466e-8354-a9c46982ff88 | controller-0.localdomain | True           | :-)   | active   |
| 3d241b02-031a-4623-a179-88e1953b3889 | controller-2.localdomain | True           | :-)   | standby  |
+--------------------------------------+--------------------------+----------------+-------+----------+
[stack@undercloud-0 ~]$ neutron l3-agent-list-hosting-router router-1
+--------------------------------------+--------------------------+----------------+-------+----------+
| id                                   | host                     | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------------+----------------+-------+----------+
| 3d241b02-031a-4623-a179-88e1953b3889 | controller-2.localdomain | True           | :-)   | standby  |
| 0d654b7c-da42-4847-a24f-6d1df804ca3b | controller-1.localdomain | True           | :-)   | standby  |
| 242e1e81-7e4e-466e-8354-a9c46982ff88 | controller-0.localdomain | True           | :-)   | standby  |
+--------------------------------------+--------------------------+----------------+-------+----------+
And in the DB it looks like this:
MariaDB [ovs_neutron]> select * from router_extra_attributes;
+--------------------------------------+-------------+----------------+----+----------+-------------------------+
| router_id                            | distributed | service_router | ha | ha_vr_id | availability_zone_hints |
+--------------------------------------+-------------+----------------+----+----------+-------------------------+
| 6ba430d7-2f9d-4e8e-a59f-4d4fb5644a8e |           0 |              0 |  1 |        1 | []                      |
| ace64e85-5f3b-4815-aeae-3b54c75ef5eb |           0 |              0 |  1 |        1 | []                      |
| cd6b61e1-60c9-47da-8866-169ca29ece20 |           1 |              0 |  0 |        0 | []                      |
+--------------------------------------+-------------+----------------+----+----------+-------------------------+
3 rows in set (0.01 sec)
MariaDB [ovs_neutron]> select * from ha_router_vrid_allocations;
+--------------------------------------+-------+
| network_id                           | vr_id |
+--------------------------------------+-------+
| 45aaae94-ce16-412d-bd74-b3812b16ff6f |     1 |
+--------------------------------------+-------+
1 row in set (0.01 sec)
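A quick way to check for this state is to look for HA routers that share a vr_id. The following is a minimal sketch, not Neutron code: it copies a subset of the columns from the rows shown above into an in-memory SQLite database (an assumption made only so the example is self-contained) and assumes that all HA routers listed use the single HA network shown, as they would for one tenant.

```python
import sqlite3

# In-memory stand-in for the two Neutron tables; only the columns
# needed for the check are reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE router_extra_attributes "
    "(router_id TEXT PRIMARY KEY, ha INTEGER, ha_vr_id INTEGER)"
)
conn.execute(
    "CREATE TABLE ha_router_vrid_allocations (network_id TEXT, vr_id INTEGER)"
)

# Rows copied from the query output in this report.
conn.executemany(
    "INSERT INTO router_extra_attributes VALUES (?, ?, ?)",
    [
        ("6ba430d7-2f9d-4e8e-a59f-4d4fb5644a8e", 1, 1),
        ("ace64e85-5f3b-4815-aeae-3b54c75ef5eb", 1, 1),
        ("cd6b61e1-60c9-47da-8866-169ca29ece20", 0, 0),
    ],
)
conn.execute(
    "INSERT INTO ha_router_vrid_allocations VALUES (?, ?)",
    ("45aaae94-ce16-412d-bd74-b3812b16ff6f", 1),
)

# HA routers that share a vr_id (only meaningful within one HA network).
duplicates = conn.execute(
    "SELECT ha_vr_id, COUNT(*) FROM router_extra_attributes "
    "WHERE ha = 1 GROUP BY ha_vr_id HAVING COUNT(*) > 1"
).fetchall()

# Compare the number of HA routers with the number of recorded allocations.
ha_routers = conn.execute(
    "SELECT COUNT(*) FROM router_extra_attributes WHERE ha = 1"
).fetchone()[0]
allocations = conn.execute(
    "SELECT COUNT(*) FROM ha_router_vrid_allocations"
).fetchone()[0]

print(duplicates)               # [(1, 2)]
print(ha_routers, allocations)  # 2 1
```

Two HA routers against one allocation row, with both routers holding vr_id 1, is the inconsistency visible in the output above.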
So there is indeed a possible race when two different routers are
created within a very short time.
But when I then created another router, it was created properly with a
new vr_id and everything worked fine for it:
[stack@undercloud-0 ~]$ neutron l3-agent-list-hosting-router router-3
+--------------------------------------+--------------------------+----------------+-------+----------+
| id                                   | host                     | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------------+----------------+-------+----------+
| 0d654b7c-da42-4847-a24f-6d1df804ca3b | controller-1.localdomain | True           | :-)   | standby  |
| 242e1e81-7e4e-466e-8354-a9c46982ff88 | controller-0.localdomain | True           | :-)   | active   |
| 3d241b02-031a-4623-a179-88e1953b3889 | controller-2.localdomain | True           | :-)   | standby  |
+--------------------------------------+--------------------------+----------------+-------+----------+
MariaDB [ovs_neutron]> select * from ha_router_vrid_allocations;
+--------------------------------------+-------+
| network_id                           | vr_id |
+--------------------------------------+-------+
| 45aaae94-ce16-412d-bd74-b3812b16ff6f |     1 |
| 45aaae94-ce16-412d-bd74-b3812b16ff6f |     2 |
+--------------------------------------+-------+
I found this bug on an old version based on the Newton release, but from what I saw in https://github.com/openstack/neutron/blob/master/neutron/db/l3_hamode_db.py#L109 this code hasn't changed much, so I think the same issue may also occur on newer releases.
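The kind of race suspected in this report is a check-then-act sequence: each request reads the current allocations, picks a free vr_id, and only then records it. The sketch below is illustrative only and is not the code in l3_hamode_db.py; the class and function names are hypothetical, and the interleaving is forced deterministically rather than produced by real concurrency. It assumes the lowest free ID in the VRRP range 1..255 is chosen.

```python
# Illustrative only: not Neutron code. All names here are hypothetical.
VRID_RANGE = range(1, 256)  # VRRP virtual router IDs are 1..255


class VridTable:
    """Stands in for the vr_id allocations of one HA network."""

    def __init__(self):
        self.allocated = set()

    def pick_free(self):
        # Check: read current allocations and choose the lowest free ID.
        return min(v for v in VRID_RANGE if v not in self.allocated)

    def commit(self, vr_id):
        # Act: record the choice. A set silently absorbs a duplicate,
        # mimicking a write path without an effective uniqueness check.
        self.allocated.add(vr_id)


def racy_create(table, names):
    # Every request reads before any request writes.
    picks = {name: table.pick_free() for name in names}
    for vr_id in picks.values():
        table.commit(vr_id)
    return picks


def serialized_create(table, names):
    # Each request reads and writes before the next one starts.
    picks = {}
    for name in names:
        vr_id = table.pick_free()
        table.commit(vr_id)
        picks[name] = vr_id
    return picks


racy_table = VridTable()
racy = racy_create(racy_table, ["router-1", "router-2"])
print(racy, sorted(racy_table.allocated))
# {'router-1': 1, 'router-2': 1} [1]

safe_table = VridTable()
safe = serialized_create(safe_table, ["router-1", "router-2"])
print(safe, sorted(safe_table.allocated))
# {'router-1': 1, 'router-2': 2} [1, 2]
```

The racy interleaving ends with two routers on vr_id 1 and a single recorded allocation, which matches the database state shown in this report. Making the check and the write one atomic step, or rejecting the duplicate at write time and retrying, avoids that outcome in this toy model.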
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1823314/+subscriptions