yahoo-eng-team team mailing list archive
Message #63531
[Bug 1685590] [NEW] No retry for removing instance in case of ironic service down
Public bug reported:
When the ironic service is briefly down (e.g. the ironic conductor is
down), deleting an instance immediately puts that instance into the
error state, without any retry.
Investigation points to the following code segment:
https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L977-L984
When the conductor is down, the exception we receive is not
InstanceDeployFailure, so it is re-raised immediately. As a result, the
retry settings CONF.ironic.api_max_retries and
CONF.ironic.api_retry_interval are never applied.
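The pattern described above can be illustrated with a small,
self-contained sketch. This is not the nova code: the function name
unprovision, the ConductorUnavailable exception, and the return values
are hypothetical stand-ins, and the sketch only assumes that the handler
tolerates InstanceDeployFailure and lets everything else propagate.

```python
class InstanceDeployFailure(Exception):
    """Stand-in for the one exception the handler tolerates."""


class ConductorUnavailable(Exception):
    """Hypothetical error seen while the conductor is down."""


def unprovision(set_provision_state):
    """Call the API once and tolerate only InstanceDeployFailure."""
    try:
        set_provision_state()
    except InstanceDeployFailure:
        # Tolerated failure: swallow it and carry on.
        return "ignored"
    # Any other exception has already propagated to the caller,
    # on the first attempt and without any retry.
    return "ok"


def conductor_down():
    """Simulate the API call failing because the conductor is down."""
    raise ConductorUnavailable("no conductor available")
```

Calling unprovision(conductor_down) raises ConductorUnavailable on the
first attempt, which corresponds to the instance going straight into the
error state.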
Steps to reproduce:
1. Boot a baremetal instance with nova boot.
2. Reboot the ironic conductor node (or stop the conductor service).
3. Delete the instance while it is spawning.
4. The instance goes into the error state immediately, instead of after
the 2-minute retry window (default value).
For comparison, comment out L983-984 and repeat the steps.
Proposed fix:
Improve the exception handling to be more robust.
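One possible shape for more robust handling is sketched below, purely
as an illustration and not as the actual fix. The helper
call_with_retries, the TransientAPIError exception, and make_flaky are
hypothetical names. The parameters max_retries and retry_interval play
the role of CONF.ironic.api_max_retries and
CONF.ironic.api_retry_interval, and the defaults of 60 and 2 seconds are
assumptions chosen to match the 2-minute window mentioned above.

```python
import time


class TransientAPIError(Exception):
    """Hypothetical error for a temporarily unreachable service."""


def call_with_retries(func, max_retries=60, retry_interval=2,
                      sleep=time.sleep):
    """Call func, retrying when it raises TransientAPIError.

    Makes one initial attempt plus up to max_retries retries, sleeping
    retry_interval seconds between attempts. Defaults are assumptions.
    """
    attempts = max_retries + 1
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TransientAPIError:
            if attempt == attempts:
                raise  # retries exhausted: surface the error
            sleep(retry_interval)


def make_flaky(failures):
    """Return a callable that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def flaky():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientAPIError("conductor down")
        return "deleted"

    return flaky
```

For example, call_with_retries(make_flaky(2), max_retries=3) succeeds on
the third attempt after two sleeps, whereas a service that stays down
for longer than the whole retry window still ends with the error being
raised.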
** Affects: nova
Importance: Undecided
Status: New
** Tags: ironic
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1685590