yahoo-eng-team team mailing list archive
Message #71239
[Bug 1750450] [NEW] ironic: n-cpu fails to recover after losing connection to ironic-api and placement-api
Public bug reported:
The ironic virt driver does some crazy things when the ironic API goes
down: it returns [] from get_available_nodes(). When the resource
tracker sees this, it immediately attempts to delete all of the compute
node records and resource providers for said nodes.
If placement is also down at this time, the resource providers will not
be properly deleted.
When ironic-api and placement-api return, nova will see nodes, create
compute_node records for them, and try to create new resource providers
(as they are new compute_node records). This will fail with a name
conflict, and the nodes will be unusable.
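
The sequence above can be reproduced with a toy model. This is only a sketch of the failure mode as described in this report, not nova's actual code: ToyPlacement, ToyResourceTracker, and simulate_outage are hypothetical names, and the resource provider is keyed by node name purely to make the name conflict visible.

```python
class PlacementDown(Exception):
    """The toy placement service is unreachable."""


class NameConflict(Exception):
    """A resource provider with this name already exists."""


class ToyPlacement:
    """Hypothetical stand-in for placement-api."""

    def __init__(self):
        self.up = True
        self.providers = {}  # provider name -> compute_node record id

    def create(self, name, record_id):
        if not self.up:
            raise PlacementDown()
        if name in self.providers:
            raise NameConflict(name)
        self.providers[name] = record_id

    def delete(self, name):
        if not self.up:
            raise PlacementDown()
        self.providers.pop(name, None)


class ToyResourceTracker:
    """Hypothetical stand-in for the resource tracker."""

    def __init__(self, placement):
        self.placement = placement
        self.compute_nodes = {}  # node name -> compute_node record id
        self._next_id = 1

    def update(self, available_nodes):
        # Nodes the driver no longer reports are treated as gone.
        for name in list(self.compute_nodes):
            if name not in available_nodes:
                del self.compute_nodes[name]
                try:
                    self.placement.delete(name)
                except PlacementDown:
                    pass  # the provider is left behind in placement

        # Nodes without a record get a new record and a new provider.
        unusable = []
        for name in available_nodes:
            if name not in self.compute_nodes:
                record_id = self._next_id
                self._next_id += 1
                self.compute_nodes[name] = record_id
                try:
                    self.placement.create(name, record_id)
                except NameConflict:
                    unusable.append(name)
        return unusable


def simulate_outage():
    placement = ToyPlacement()
    tracker = ToyResourceTracker(placement)
    tracker.update(["node-1", "node-2"])  # normal operation

    placement.up = False
    tracker.update([])  # ironic-api down: the driver reports no nodes

    placement.up = True
    # Both APIs are back: the stale providers block the new ones.
    return tracker.update(["node-1", "node-2"])
```

In this model, simulate_outage() returns both node names as unusable, because the providers left over from before the outage still hold those names.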
This is easy to fix, by raising an exception in get_available_nodes,
instead of lying to the resource tracker and returning []. However, this
causes nova-compute to fail to start if ironic-api is not available.
This may be fine but should have a larger discussion. We've added these
hacks over the years for some reason; we should look at the bigger
picture and decide how we want to handle these cases.
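
A minimal sketch of the proposed fix, assuming a hypothetical list_nodes callable that raises ConnectionError when ironic-api is unreachable. IronicAPIUnavailable and refresh are made-up names, not nova's API. The point is only that an exception lets the caller tell "ironic is down" apart from "ironic has no nodes". What the caller should then do, particularly at startup, is the open question raised above; keeping the existing records is just one possibility.

```python
class IronicAPIUnavailable(Exception):
    """Hypothetical error: the node list could not be fetched."""


def get_available_nodes(list_nodes):
    """Return the node list, or raise instead of pretending it is empty."""
    try:
        return list(list_nodes())
    except ConnectionError as exc:
        raise IronicAPIUnavailable("could not list ironic nodes") from exc


def refresh(known_nodes, list_nodes):
    """One possible caller policy: keep existing records on an outage."""
    try:
        available = get_available_nodes(list_nodes)
    except IronicAPIUnavailable:
        # Skip this pass; nothing is deleted while ironic-api is down.
        return set(known_nodes)
    # A genuinely empty list still removes every node.
    return set(available)
```

With this shape, an outage leaves known_nodes untouched, while an empty but successful response still clears them.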
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1750450
Title:
ironic: n-cpu fails to recover after losing connection to ironic-api
and placement-api
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1750450/+subscriptions