yahoo-eng-team team mailing list archive
Message #19407
[Bug 1360351] [NEW] FWaaS stuck in PENDING_CREATE when deploying with DVR
Public bug reported:
When a firewall is created in conjunction with a distributed router, the
firewall may fail to reach the ACTIVE state, as observed in [1]. This
happens because, when the firewall create request comes in, the firewall
object's status is set to PENDING_CREATE (see [2]); the request is then
sent to the L3 agent, which is responsible for transitioning the state
to ACTIVE (see [3]). This state transition is predicated on the
following statements being true:
1) The tenant has firewalls
2) The tenant has routers
3) The routers' namespaces have been created on the L3 agent
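A minimal sketch of the agent-side predicate listed above, assuming a simplified data model. `AgentView` and `can_activate` are hypothetical names for illustration only and do not mirror the actual code in firewall_l3_agent.py:

```python
from dataclasses import dataclass, field


@dataclass
class AgentView:
    """Hypothetical, simplified view of what one L3 agent knows about a tenant."""
    firewalls: list = field(default_factory=list)  # tenant's firewall IDs
    routers: list = field(default_factory=list)    # tenant's router IDs
    namespaces: set = field(default_factory=set)   # router IDs with a namespace on this agent


def can_activate(view: AgentView) -> bool:
    """Return True only if conditions 1), 2) and 3) all hold on this agent."""
    has_firewalls = bool(view.firewalls)                                  # condition 1)
    has_routers = bool(view.routers)                                      # condition 2)
    namespaces_ready = all(r in view.namespaces for r in view.routers)    # condition 3)
    return has_firewalls and has_routers and namespaces_ready


# Router namespace exists on this agent: the firewall can move to ACTIVE.
print(can_activate(AgentView(["fw1"], ["r1"], {"r1"})))  # True
# DVR router whose namespace has not been created on this agent: stuck.
print(can_activate(AgentView(["fw1"], ["r1"], set())))   # False
```

In the DVR case the third check is the one whose outcome depends on where, and whether, the router namespace happens to exist.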
Now, in the DVR case, 3) may or may not be true depending on the state
of the router or the cloud (see [4] for details). For instance, if the
router had an external gateway set, condition 3) would be true, and the
firewall state would transition to ACTIVE. This may lead the user to
believe that everything is correct when it is not. To make matters
worse, in the DVR case the firewall itself needs to be distributed,
which means that if we kept the logic outlined in [2] and [3], the last
L3 agent to update the state of the firewall would overwrite the updates
of all the others (last write wins), potentially leaving the reported
state inconsistent.
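To illustrate the last-write-wins problem, here is a minimal sketch, assuming the server simply stores whatever status each agent reports, in arrival order. The function and agent names are hypothetical:

```python
from typing import List, Tuple


def last_write_wins(reports: List[Tuple[str, str]]) -> str:
    """Hypothetical server behaviour: every agent report overwrites the stored status.

    `reports` holds (agent, status) pairs in the order they reach the server.
    """
    status = "PENDING_CREATE"
    for _agent, reported in reports:
        status = reported  # any earlier report, including a failure, is discarded
    return status


# compute-1 failed to install the rules, but compute-2 happened to report last.
print(last_write_wins([("compute-1", "ERROR"), ("compute-2", "ACTIVE")]))  # ACTIVE
```

The firewall ends up ACTIVE even though one of the hosts it is distributed to never installed the rules.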
To start addressing this issue, it would be appropriate to tweak the
logic as follows:
a) When DVR is present, the firewall should be created directly in the CREATED state; the logic for the centralized case stays as is, with the firewall created in the PENDING_CREATE state.
b) Once the L3 agents are able to install the right firewall rules, the server needs to collect the acknowledgments from all of them.
c) Only after all acknowledgments have been collected, and only if they are all positive, does the firewall state transition from CREATED to ACTIVE; otherwise it transitions to ERROR.
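Steps a) to c) can be sketched as a small server-side tracker. This is a minimal sketch under stated assumptions: `initial_status`, `AckCollector`, and the per-agent boolean acknowledgment are hypothetical and only illustrate the proposed state transitions, not an actual Neutron interface:

```python
from typing import Dict, Iterable


def initial_status(distributed: bool) -> str:
    # Step a): DVR firewalls start in CREATED; the centralized case keeps PENDING_CREATE.
    return "CREATED" if distributed else "PENDING_CREATE"


class AckCollector:
    """Hypothetical tracker for steps b) and c) in the DVR case."""

    def __init__(self, agents: Iterable[str]):
        self.status = initial_status(distributed=True)
        self.expected = set(agents)      # L3 agents hosting the distributed router
        self.acks: Dict[str, bool] = {}  # agent -> did it install the rules?

    def ack(self, agent: str, ok: bool) -> str:
        # Step b): record one acknowledgment per expected agent.
        if agent not in self.expected:
            raise ValueError(f"unexpected agent: {agent}")
        self.acks[agent] = ok
        # Step c): leave CREATED only once every expected agent has reported.
        if set(self.acks) == self.expected:
            self.status = "ACTIVE" if all(self.acks.values()) else "ERROR"
        return self.status


fw = AckCollector(["compute-1", "compute-2"])
print(fw.ack("compute-1", True))   # CREATED (still waiting for compute-2)
print(fw.ack("compute-2", True))   # ACTIVE

bad = AckCollector(["compute-1", "compute-2"])
bad.ack("compute-1", False)
print(bad.ack("compute-2", True))  # ERROR: one negative ack is enough
```

Unlike last-write-wins, the order in which agents report no longer matters: a single negative acknowledgment yields ERROR regardless of when it arrives.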
In theory, step a) could be simplified by renaming PENDING_CREATE to
CREATED and leaving it at that; however, this would be a
backward-incompatible API change that would affect the legacy
(centralized) case, and should be avoided.
[1] - http://logs.openstack.org/91/114691/2/experimental/check-tempest-dsvm-neutron-dvr/93b2ff0/logs/testr_results.html.gz
[2] - https://github.com/openstack/neutron/blob/master/neutron/services/firewall/fwaas_plugin.py#L227
[3] - https://github.com/openstack/neutron/blob/master/neutron/services/firewall/agents/l3reference/firewall_l3_agent.py#L194
[4] - https://review.openstack.org/#/c/116100/
** Affects: neutron
Importance: Undecided
Assignee: Armando Migliaccio (armando-migliaccio)
Status: New
** Tags: l3-dvr-backlog
** Changed in: neutron
Assignee: (unassigned) => Armando Migliaccio (armando-migliaccio)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1360351
Title:
FWaaS stuck in PENDING_CREATE when deploying with DVR
Status in OpenStack Neutron (virtual network service):
New