yahoo-eng-team team mailing list archive
Message #81595
[Bug 1863110] [NEW] 2/3 snat namespace transitions to master
Public bug reported:
neutron version: 14.0.2
general deployment version: stein
deployment method: kolla-ansible
neutron configuration:
- l3 = ha
- agent_mode = dvr_snat
- ovs
general info: multi-node deployment, roughly 100 compute nodes
when spawning larger heat stacks with multiple instances (think k8s
infrastructure), we sometimes (roughly 50% of the time) get a "split
brain" on the snat namespaces.
the logs look like this on one of the three controller/network nodes:
11:53:43.402 Handling notification for router 2a218a31-2ef6-406a-a719-17965600e182, state master enqueue /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/l3/ha.py:50
11:53:43.403 Router 2a218a31-2ef6-406a-a719-17965600e182 transitioned to master
and then, about 14 seconds later, this happens on another of the three controller/network nodes:
11:53:57.582 Handling notification for router 2a218a31-2ef6-406a-a719-17965600e182, state master enqueue /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/l3/ha.py:50
11:53:57.583 Router 2a218a31-2ef6-406a-a719-17965600e182 transitioned to master
so neutron sets up all routes on both controller nodes, which wreaks havoc on the sessions that instances are opening to the outside. obviously, deleting the routes from the faulty namespace solves the issue.
i can't find the reason for the second namespace being promoted to master, even when looking through the debug logs. i would greatly appreciate any helpful pointers.
the only thing i can think of is some kind of race condition, and that is why everything in neutron looks fine.
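
The two excerpts above can be correlated mechanically when many routers are involved. The following is a minimal sketch, not part of neutron: it assumes the l3 agent log lines from each controller/network node have already been collected, and it relies only on the "Router <id> transitioned to <state>" message quoted in this report. The function names and the node names (controller-1 and so on) are hypothetical.

```python
import re
from collections import defaultdict

# Matches the agent message quoted in the report, for example:
# "Router 2a218a31-2ef6-406a-a719-17965600e182 transitioned to master"
TRANSITION_RE = re.compile(
    r"Router (?P<router>[0-9a-f-]{36}) transitioned to (?P<state>\w+)"
)


def last_states(lines):
    """Return {router_id: last reported state} for one node's log lines."""
    states = {}
    for line in lines:
        match = TRANSITION_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the final value
            # is the most recent transition seen on this node.
            states[match.group("router")] = match.group("state")
    return states


def find_split_brain(logs_by_node):
    """Return {router_id: [nodes]} where more than one node last reported master."""
    masters = defaultdict(list)
    for node, lines in logs_by_node.items():
        for router, state in last_states(lines).items():
            if state == "master":
                masters[router].append(node)
    return {
        router: sorted(nodes)
        for router, nodes in masters.items()
        if len(nodes) > 1
    }


if __name__ == "__main__":
    router = "2a218a31-2ef6-406a-a719-17965600e182"
    # Hypothetical node names; the messages are the ones from the report.
    logs = {
        "controller-1": ["Router %s transitioned to master" % router],
        "controller-2": ["Router %s transitioned to master" % router],
        "controller-3": [],
    }
    for router_id, nodes in find_split_brain(logs).items():
        print("%s is master on: %s" % (router_id, ", ".join(nodes)))
```

This only reports routers whose most recent transition on two or more nodes was to master. It does not explain why the second promotion happened, and it assumes the logs from all nodes cover the same time window.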
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1863110
Title:
2/3 snat namespace transitions to master
Status in neutron:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1863110/+subscriptions