yahoo-eng-team team mailing list archive
Message #94673
[Bug 2037876] Re: DHCP Worker skips queued tasks if earlier tasks fail
Given this bug report is a year old now (public for most of that time)
and still hasn't been confirmed by the developers, it seems unlikely to
rise to the level of urgency where we'd issue an OSSA even if it did
eventually get fixed. As such, I'm closing the Security Advisory task as
Won't Fix, but if there are any dissenting opinions I'm happy to reopen
and revisit that decision.
** Changed in: ossa
Status: Incomplete => Won't Fix
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2037876
Title:
DHCP Worker skips queued tasks if earlier tasks fail
Status in neutron:
New
Status in OpenStack Security Advisory:
Won't Fix
Bug description:
Steps to reproduce:
1. Create a network with several subnets and a router.
2. Delete the router and quickly afterwards delete the subnets and finally the network.
Expected behavior:
- The subnets and the network should be deleted as expected after the router is deleted.
Actual behavior:
1. The router is not deleted properly (its port is not deleted).
2. Because of the above, the subnet and network deletion tasks are dropped, owing to the design of the task management in the DHCP agent.
RCA:
1. Router deletion failure:
a. Eventually the task port_delete_end is called from the router deletion for the port: https://github.com/openstack/neutron/blob/stable/yoga/neutron/agent/dhcp/agent.py
b. As part of the event queue processing, the resource's __lt__ function is called to check the port's IP addresses:
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/common/resource_processing_queue.py#L177C1-L178C1
c. The __lt__ function fails because, when a router triggers port_delete_end, the fixed_ip 'ip_address' key is not accessible:
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/dhcp/agent.py#L86
d. Since there is no error handling in the primary loop, all other tasks that were still in the queue are dropped:
https://github.com/openstack/neutron/blob/cf096344b07b80524c3888e44e0b895465598a74/neutron/agent/common/resource_processing_queue.py#L156
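The failure mode in steps b and c can be reproduced outside neutron with a minimal sketch. Everything here is an assumption made for illustration: PortUpdate, enqueue_all and the shape of the router port payload (a fixed_ips entry without 'ip_address') are hypothetical and are not the agent's actual code. The sketch only shows that a comparison method which indexes 'ip_address' unconditionally raises as soon as the priority queue has to order two items.

```python
import queue


class PortUpdate:
    """Hypothetical stand-in for a queued port event (not neutron code)."""

    def __init__(self, priority, timestamp, resource):
        self.priority = priority
        self.timestamp = timestamp
        self.resource = resource

    def _ips(self):
        # Raises KeyError if any fixed_ips entry lacks 'ip_address'.
        return {str(ip["ip_address"]) for ip in self.resource["fixed_ips"]}

    def __lt__(self, other):
        # Events touching the same IP are ordered by arrival time,
        # everything else falls back to priority.
        if self._ips() & other._ips():
            return self.timestamp < other.timestamp
        return self.priority < other.priority


def enqueue_all(updates):
    """Put every update on a PriorityQueue and return the queue."""
    q = queue.PriorityQueue()
    for update in updates:
        # The heap calls __lt__ once there is a second item to compare.
        q.put(update)
    return q


ok_port = PortUpdate(1, 1.0, {"fixed_ips": [{"ip_address": "10.0.0.5"}]})
# Assumed payload shape for the router port: no 'ip_address' key.
router_port = PortUpdate(1, 2.0, {"fixed_ips": [{"subnet_id": "subnet-1"}]})

try:
    enqueue_all([ok_port, router_port])
except KeyError as exc:
    print("comparison failed, missing key:", exc)
```

The exception is raised from inside the queue operation itself, so whatever code is driving the queue at that moment is the code that has to handle it.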
As far as I understand, there are two problems:
1. In this commit https://github.com/openstack/neutron/commit/53000704f211bbbd5e439890015891039ef6752e the __lt__ functionality was changed, but the change did not account for router port deletion.
2. The primary worker loop does not handle unexpected failures
such as crashes. Is it by design that all other tasks are dropped
in this case?
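To make the second problem concrete, here is a rough sketch contrasting a loop that lets the first exception escape with one that isolates each task. It is not the DHCP agent's loop: drain_fragile, drain_resilient and handler are hypothetical names, and the task strings are only labels. Whether skipping a failed task is the right policy for neutron is a question for the maintainers.

```python
import logging

LOG = logging.getLogger(__name__)


def drain_fragile(tasks, handler):
    """Process tasks in order; the first exception aborts the whole loop."""
    done = []
    for task in tasks:
        handler(task)
        done.append(task)
    return done


def drain_resilient(tasks, handler):
    """Process tasks in order; a failing task is logged and skipped."""
    done, failed = [], []
    for task in tasks:
        try:
            handler(task)
            done.append(task)
        except Exception:
            LOG.exception("Task %r failed, continuing with the queue", task)
            failed.append(task)
    return done, failed


def handler(task):
    # Hypothetical handler: only the router port deletion fails.
    if task == "port_delete_end":
        raise KeyError("ip_address")


tasks = ["port_delete_end", "subnet_delete_end", "network_delete_end"]
done, failed = drain_resilient(tasks, handler)
print("processed:", done)
print("failed:", failed)
```

With the fragile loop the subnet and network deletions are never reached, which matches the behavior described in this report. With the resilient loop they still run and the failed task is recorded for later inspection or retry.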
Here's a small visualization: TBD
Version:
Yoga
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2037876/+subscriptions