yahoo-eng-team team mailing list archive
Message #57787
[Bug 1633306] Re: Partial HA network causing HA router creation failed (race conditon)
Adding a new configuration option is almost never temporary as deleting
config options is rarely backward-compatible.
The race condition, as I understand it, is as follows:
1. Create HA router, have worker1 send 'router_updated' to agent1.
2. Delete HA router (done by worker2). worker2 will now detect that there are no more HA routers and will delete the HA network for the tenant.
3. agent1 issues a 'sync_router', which triggers auto_schedule_routers. create_ha_port_and_bind will try to create the HA port but there are no more IP addresses available, causing add_ha_port to fail as specified in the first paste.
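To make the ordering concrete, here is a minimal sketch that replays the three steps above sequentially against a toy in-memory model. Nothing here is Neutron code: ToyPlugin, its methods, the stand-in exception class, and the example addresses are all hypothetical, and the real failure goes through IPAM rather than a simple list check.

```python
# Toy, in-memory replay of the interleaving described above.
# All names are hypothetical; none of this is a Neutron API.


class IpAddressGenerationFailure(Exception):
    """Stand-in for the error reported in the first paste."""


class ToyPlugin:
    def __init__(self):
        self.routers = set()
        self.ha_network = None  # list of free IPs, or None once deleted

    def create_ha_router(self, router_id):
        # Step 1: worker1 creates the router (HA network created on demand).
        if self.ha_network is None:
            # Arbitrary documentation-range addresses, for illustration only.
            self.ha_network = ["192.0.2.1", "192.0.2.2"]
        self.routers.add(router_id)

    def delete_ha_router(self, router_id):
        # Step 2: worker2 deletes the router; with no HA routers left,
        # the tenant's HA network is removed as well.
        self.routers.discard(router_id)
        if not self.routers:
            self.ha_network = None

    def add_ha_port(self, router_id):
        # Step 3: the agent's sync reaches the scheduler, which tries to
        # allocate an HA port on a network that no longer exists.
        if not self.ha_network:
            raise IpAddressGenerationFailure(router_id)
        return self.ha_network.pop(0)


def replay_race():
    plugin = ToyPlugin()
    plugin.create_ha_router("r1")   # worker1
    plugin.delete_ha_router("r1")   # worker2
    try:
        plugin.add_ha_port("r1")    # agent1's sync_router path
    except IpAddressGenerationFailure:
        return "failed"
    return "ok"
```

Calling replay_race() exercises the create, delete, then allocate ordering; swapping the last two calls would let the port allocation succeed in this toy model.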
Point #3 is a bit weird to me, as it looks like IPAM is detecting a
"network deleted during function run" as "no more IP addresses". In
addition, this should be caught by [2], forcing a silent retrigger of
this issue.
Aside from the issue that isn't clear to me, I'd like to point out that
the latest stable/mitaka [1] doesn't even trigger auto_schedule_routers
on sync_router (not since [3] - perhaps you're missing this backport?) -
hence the trace received in the first paste can't be reproduced. For
this reason, I'm closing this as Invalid. Liu, feel free to reopen if
you disagree with my assessment :)
[1]: https://github.com/openstack/neutron/blob/5860fb21e966ab8f1e011654dd477d7af35f7a27/neutron/api/rpc/handlers/l3_rpc.py#L79
[2]: https://github.com/openstack/neutron/blob/5860fb21e966ab8f1e011654dd477d7af35f7a27/neutron/common/utils.py#L726
[3]: https://github.com/openstack/neutron/commit/33650bf1d1994a96eff993af0bfdaa62588f08a4
(5860fb21e966ab8f1e011654dd477d7af35f7a27 is the latest stable/mitaka
hash that github.com provided.)
** Changed in: neutron
Importance: High => Undecided
** Changed in: neutron
Status: Confirmed => Invalid
** Changed in: neutron
Milestone: ocata-1 => None
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1633306
Title:
Partial HA network causing HA router creation failed (race conditon)
Status in neutron:
Invalid
Bug description:
ENV: stable/mitaka,VXLAN
Neutron API: two neutron-servers behind a HA proxy VIP.
Exception log:
[1] http://paste.openstack.org/show/585669/
[2] http://paste.openstack.org/show/585670/
Log [1] shows that the subnet of the HA network is deleted concurrently
while a new HA router create API request comes in. It seems the race
condition described in this bug still exists:
https://bugs.launchpad.net/neutron/+bug/1533440, whose description
says:
"""
Some known exceptions:
...
2. IpAddressGenerationFailure: (HA port created failed due to the
concurrently HA subnet deletion)
...
"""
Log [2] shows very strange behavior: those 3 API requests share the same
request-id [req-780b1f6e-2b3c-4303-a1de-a5fb4c7ea31e].
Test scenario:
Just create one HA router for a tenant, and then quickly delete it.
For now, our mitaka ENV uses VXLAN as the tenant network type, so there is a very large range of VNIs.
So there is no need to conserve them, and as a temporary solution we added a new config option that decides whether to delete the HA network every time.
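A rough sketch of what such a switch could look like is shown here. The option name delete_unused_ha_network and the helper function are invented for illustration; the report does not give the actual name or implementation, and a real Neutron option would be registered through its configuration framework rather than a dataclass.

```python
# Hypothetical sketch; the option name is invented for illustration
# and is not an actual Neutron setting.
from dataclasses import dataclass


@dataclass
class HaCleanupConfig:
    # Assumed flag: True mimics the behavior described in the report
    # (delete the HA network once the last HA router is gone).
    delete_unused_ha_network: bool = True


def should_delete_ha_network(remaining_ha_routers: int,
                             conf: HaCleanupConfig) -> bool:
    """Decide whether the tenant's HA network should be removed."""
    return conf.delete_unused_ha_network and remaining_ha_routers == 0
```

With the flag set to False the HA network is kept after the last router is deleted, which in this sketch sidesteps the window in which a concurrent create finds the network gone, at the cost of holding on to one VNI per tenant.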
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1633306/+subscriptions