yahoo-eng-team team mailing list archive
Message #83708
[Bug 1855927] Re: _poll_unconfirmed_resizes may not retry later if confirm_resize fails in API
Reviewed: https://review.opendev.org/699291
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e4601c77fb3b90638a6a56ec1a8e0e9eb7c91777
Submitter: Zuul
Branch: master
commit e4601c77fb3b90638a6a56ec1a8e0e9eb7c91777
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Mon Dec 16 16:15:24 2019 -0500
Ensure source compute is up when confirming a resize
When _poll_unconfirmed_resizes runs or a user tries to confirm
a resize in the API, if the source compute service is down the
migration will be stuck in "confirming" status if the request never
reached the source compute. Subsequent runs of
_poll_unconfirmed_resizes will not be able to auto-confirm the
resize nor will the user be able to manually confirm the resize.
An admin could reset the status on the server to ACTIVE or ERROR
but that means the source compute never gets cleaned up since you
can only confirm or revert a resize on a server with VERIFY_RESIZE
status.
This adds a check in the API before updating the migration record
such that if the source compute service is down the API returns a
409 response as an indication to try again later.
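The ordering described above (check the source compute first, update the migration record second) can be sketched as follows. This is a minimal, self-contained illustration and not the code from the commit: the Migration class, the SourceComputeDown exception and the service_is_up callable are hypothetical names, and the mapping of the exception to an HTTP 409 is assumed to happen in an API layer that is not shown.

```python
from dataclasses import dataclass


class SourceComputeDown(Exception):
    """Hypothetical error; an API layer would translate it to HTTP 409."""


@dataclass
class Migration:
    # Hypothetical stand-in for a migration record.
    source_compute: str
    status: str = "finished"


def confirm_resize(migration, service_is_up):
    """Refuse to start confirming unless the source compute is up.

    service_is_up is a hypothetical callable taking a host name and
    returning True when the compute service on that host is available.
    """
    # Check first, so a failure leaves the record untouched and the
    # caller (periodic task or user) can simply try again later.
    if not service_is_up(migration.source_compute):
        raise SourceComputeDown(migration.source_compute)
    migration.status = "confirming"
    return migration


record = Migration(source_compute="host1")
try:
    confirm_resize(record, service_is_up=lambda host: False)
except SourceComputeDown:
    status_after_failure = record.status  # still "finished"
confirm_resize(record, service_is_up=lambda host: True)
status_after_success = record.status  # now "confirming"
```

Because the check runs before the status change, a rejected request leaves the migration in 'finished' status, so it remains eligible for a later attempt.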
SingleCellSimple._fake_target_cell is updated so that tests using
it can assert when a context was targeted without having to stub
nova.context.target_cell. As a result some HostManager unit tests
needed to be updated.
Change-Id: I33aa5e32cb321e5a16da51e227af2f67ed9e6713
Closes-Bug: #1855927
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1855927
Title:
_poll_unconfirmed_resizes may not retry later if confirm_resize fails
in API
Status in OpenStack Compute (nova):
Fix Released
Bug description:
This is based on code inspection but let's say I have configured my
computes to set resize_confirm_window=3600 to automatically confirm a
resized server after 1 hour. Within that hour, let's say the source
compute service is down.
The periodic task gets the unconfirmed migrations with
status='finished' whose last update is older than the configured
window:
https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/manager.py#L8793
https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/db/sqlalchemy/api.py#L4342
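As a rough illustration of that selection, the filter can be written over plain dictionaries. This is a sketch under stated assumptions, not the nova DB API: the function name, the "status" and "updated_at" keys and the sample records are all hypothetical.

```python
from datetime import datetime, timedelta


def unconfirmed_migrations(migrations, window_seconds, now):
    """Return migrations still 'finished' and untouched for the window."""
    cutoff = now - timedelta(seconds=window_seconds)
    return [
        m for m in migrations
        if m["status"] == "finished" and m["updated_at"] <= cutoff
    ]


now = datetime(2019, 12, 16, 12, 0, 0)
sample = [
    # Old enough and finished: selected.
    {"id": 1, "status": "finished", "updated_at": now - timedelta(hours=2)},
    # Finished, but updated inside the window: skipped.
    {"id": 2, "status": "finished", "updated_at": now - timedelta(minutes=10)},
    # Old enough, but no longer 'finished': skipped.
    {"id": 3, "status": "confirming", "updated_at": now - timedelta(hours=2)},
]
selected_ids = [m["id"] for m in unconfirmed_migrations(sample, 3600, now)]
```

The third record matters for this bug: once a migration leaves 'finished' status, a filter of this shape never returns it again.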
The periodic task then calls the compute API code to confirm the
resize:
https://github.com/openstack/nova/blob/c295e395d/nova/compute/manager.py#L7160
which changes the migration status to 'confirming':
https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/api.py#L3684
And casts off to the source compute:
https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/rpcapi.py#L600
Now if the source compute is down and the cast fails, the compute
manager periodic task will handle the error and say it will retry later:
https://github.com/openstack/nova/blob/c295e395d/nova/compute/manager.py#L7163
However, because the migration status was changed from 'finished' to
'confirming', the task will not retry: the DB query no longer finds
the migration. And trying to confirm the resize via the
API will fail as well because we'll get MigrationNotFoundByStatus
since the migration status is no longer 'finished':
https://github.com/openstack/nova/blob/5a3ef39539ca112ae0552aef5cbd536338db61b7/nova/compute/api.py#L3681
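The sequence above can be reproduced with a small simulation. It is only a sketch: poll_once, failing_cast and the dictionary records are hypothetical, and it assumes, as the description does, that the cast to a down source compute raises an error.

```python
def poll_once(migrations, cast):
    """One run of a simplified periodic task; returns the ids it attempted."""
    attempted = []
    for migration in migrations:
        # Only migrations still in 'finished' status are picked up.
        if migration["status"] != "finished":
            continue
        attempted.append(migration["id"])
        # The status is changed before the cast is made.
        migration["status"] = "confirming"
        try:
            cast(migration)
        except ConnectionError:
            # "Will retry later", but the status is not restored.
            pass
    return attempted


def failing_cast(migration):
    # Stand-in for a cast to a source compute that is down.
    raise ConnectionError("source compute is down")


migrations = [{"id": 1, "status": "finished"}]
first_run = poll_once(migrations, failing_cast)
second_run = poll_once(migrations, failing_cast)
```

The first run attempts migration 1 and leaves it in 'confirming' status; the second run attempts nothing, which is the missing retry that the bug title refers to.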
The compute manager code should probably mark the migration status as
'finished' again if it's really going to try later, or mark the
migration status as 'error'. Note that the confirm_resize method in
the compute manager doesn't mark the migration status as 'error' if
something fails there either:
https://github.com/openstack/nova/blob/c295e395d/nova/compute/manager.py#L3807
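One way to read the suggestion in the paragraph above is to restore the previous status when the cast fails. The sketch below illustrates that idea only; it is not the fix that was committed (which adds a check in the API), and confirm_with_rollback, the cast callable and the dictionary record are hypothetical.

```python
def confirm_with_rollback(migration, cast, on_failure="finished"):
    """Try to confirm; on failure set the status to on_failure.

    on_failure would be 'finished' to allow a later retry, or 'error'
    to flag the migration for operator attention.
    """
    migration["status"] = "confirming"
    try:
        cast(migration)
    except ConnectionError:
        migration["status"] = on_failure
        return False
    return True


def failing_cast(migration):
    # Stand-in for a cast to a source compute that is down.
    raise ConnectionError("source compute is down")


retryable = {"id": 1, "status": "finished"}
retry_ok = confirm_with_rollback(retryable, failing_cast)

flagged = {"id": 2, "status": "finished"}
flag_ok = confirm_with_rollback(flagged, failing_cast, on_failure="error")

confirmed = {"id": 3, "status": "finished"}
confirm_ok = confirm_with_rollback(confirmed, lambda m: None)
```

With the status restored to 'finished', a later run of the periodic task or a manual confirm would find the migration again.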