yahoo-eng-team team mailing list archive
Message #62252
[Bug 1671648] Re: Instances are not rescheduled after deploy fails
** Also affects: nova/ocata
Importance: Undecided
Status: New
** Changed in: nova/ocata
Status: New => Confirmed
** Changed in: nova/ocata
Importance: Undecided => High
** Tags added: ocata-backport-potential
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1671648
Title:
Instances are not rescheduled after deploy fails
Status in OpenStack Compute (nova):
In Progress
Status in OpenStack Compute (nova) ocata series:
Confirmed
Bug description:
Steps to reproduce:
Pre-step: force the deploy to fail in a way that allows it to be rescheduled. For testing, I forced the failure by adding raise nova.exception.ComputeResourcesUnavailable('forced failure') to the instance spawn on the host.
1. Make sure the environment is set to retry failed deploys.
2. Attempt to deploy a VM and wait for it to fail.
Expected result:
Failed instance is rescheduled and attempted on another host.
Actual result:
Deploy fails but is not rescheduled.
I am just beginning to experiment with an Ocata build from early March.
I found that when an instance fails to deploy and throws a
RescheduledException, it is not rescheduled as expected. The problem
appears to be that filter_properties['retry'] is not set during the
initial deploy.
On initial deploy,
nova.conductor.manager.schedule_and_build_instances() schedules the
build_request and creates the instance object. That method also
creates the filter properties (filter_props) that are passed on to
compute_rpcapi.build_and_run_instance(). The problem is that
scheduler_utils.populate_retry() is not called before filter_props
is passed on to the build call. When the deploy later fails on the
host, nova.compute.manager._do_build_and_run_instance() catches the
RescheduledException but does not try to reschedule the instance,
because filter_properties.get('retry') returns None.
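
The behaviour described above can be illustrated with a small,
self-contained Python sketch. It does not import nova:
RescheduledException is a local stand-in class, and spawn_that_fails()
and do_build() are hypothetical names that only model the check the
report attributes to _do_build_and_run_instance(). It is a simplified
illustration under those assumptions, not the actual nova code.

```python
class RescheduledException(Exception):
    """Local stand-in for the exception named in the report (not nova's class)."""


def spawn_that_fails():
    # Hypothetical helper: simulates a deploy that fails on the host.
    raise RescheduledException("forced failure")


def do_build(filter_properties):
    """Model the reschedule decision for a build that always fails.

    Assumption taken from the report: a reschedule is attempted only
    when filter_properties carries a 'retry' entry.
    """
    try:
        spawn_that_fails()
    except RescheduledException:
        retry = filter_properties.get('retry')
        if not retry:
            # No retry info was populated, so the failure is final.
            return 'failed'
        return 'rescheduled'
    return 'active'


# Without 'retry' the failed build is not rescheduled.
print(do_build({}))
# With 'retry' populated the same failure leads to a reschedule.
print(do_build({'retry': {'num_attempts': 1, 'hosts': []}}))
```

In this model the only difference between the two calls is whether the
'retry' key is present, which matches the symptom reported here.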
In the past it looks like populate_retry() was called by
nova.conductor.manager.build_instances() during the initial deploy.
I'm not seeing build_instances() get called during the initial deploy
after switching to Ocata. As an experiment, I added
scheduler_utils.populate_retry(filter_props,
build_request.instance_uuid) immediately after filter_props is set in
schedule_and_build_instances(). Afterward, I do see the instances get
rescheduled. I also noticed that nova.conductor.manager.build_instances()
gets called for each attempt after the first.
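
The experiment above can be sketched in a self-contained way. The
populate_retry() below is a simplified stand-in written for this
illustration, not nova's scheduler_utils.populate_retry(); the shape of
the 'retry' dictionary, the MAX_ATTEMPTS value, and the
make_filter_props() helper are all assumptions.

```python
MAX_ATTEMPTS = 3  # assumed retry limit, chosen only for this sketch


def populate_retry(filter_properties, instance_uuid):
    """Stand-in: record a scheduling attempt in filter_properties['retry']."""
    retry = filter_properties.setdefault(
        'retry', {'num_attempts': 0, 'hosts': []})
    retry['num_attempts'] += 1
    if retry['num_attempts'] > MAX_ATTEMPTS:
        raise RuntimeError(
            'Exceeded max scheduling attempts for instance %s' % instance_uuid)


def make_filter_props(instance_uuid, with_fix):
    """Hypothetical model of building filter_props on the initial deploy."""
    filter_props = {}
    if with_fix:
        # The experimental change: populate retry info right after
        # filter_props is created, before it is passed to the build call.
        populate_retry(filter_props, instance_uuid)
    return filter_props


print(make_filter_props('fake-uuid', with_fix=False).get('retry'))
print(make_filter_props('fake-uuid', with_fix=True).get('retry'))
```

Without the extra call the 'retry' key is absent, which is the
condition under which the compute manager skips the reschedule.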
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1671648/+subscriptions