yahoo-eng-team team mailing list archive
Message #86710
[Bug 1934770] Re: Mismatch between forced host and AZ prevents move operations
Reviewed: https://review.opendev.org/c/openstack/nova/+/798145
Committed: https://opendev.org/openstack/nova/commit/8f21ee42bd66b62e75e14acf4e91b786d78b9168
Submitter: "Zuul (22348)"
Branch: master
commit 8f21ee42bd66b62e75e14acf4e91b786d78b9168
Author: Stephen Finucane <stephenfin@xxxxxxxxxx>
Date: Fri Jun 25 18:51:06 2021 +0100
api: Align availability zone info with forced host
Users can create a server like so:
$ openstack server create --availability-zone az:host ...
This is a historical way to request that an instance be scheduled to a
specific host and it causes the scheduler to be bypassed. However, no
validation of this availability zone-host combo takes place. The host
could in fact belong to a different availability zone. If it does, we'll
end up in a very odd situation whereby the RequestSpec record for the
instance will record the availability zone requested by the user at
create time, but the Instance record itself will record the availability
zone of the host on which the instance was scheduled. This leads to even
more confusing behavior when we attempt to do something like live
migrate the instance since the RequestSpec record, with its original and
possibly invalid availability zone information, is used. The
'AvailabilityZoneFilter' will fail with an error message like the following:
Availability Zone 'foo' requested. ... has AZs: bar
but the 'openstack server list --long' command will show a non-foo value
for the availability zone column.
The solution is simple: when given an availability zone-host combo, make
sure the availability zone requested matches that of the host (or, more
specifically, the host is a member of the host aggregates that form the
availability zone [1]). If not, simply ignore the requested availability
zone information in favour of using the availability zone of the host,
logging a warning just for record keeping purposes. This is deemed
preferable to failing with HTTP 400 (Bad Request) since what users are
really requesting by using this syntax is to schedule to a specific host: the
availability zone portion of the request is really irrelevant and just
an artifact of this legacy mechanism to request hosts. If users wish to
truly validate a host-availability zone combo, they can use the 'host'
field introduced in microversion 2.74 along with the 'availability_zone'
field:
$ openstack server create --availability-zone az --host host ...
[1] https://docs.openstack.org/nova/latest/admin/aggregates.html
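The alignment rule described above can be sketched roughly as follows. This
is only an illustration and not the code from the commit: the function name
'align_availability_zone' and the 'host_azs' mapping (host name to
availability zone) are hypothetical stand-ins for however the availability
zone of a host is actually looked up.

```python
import logging
from typing import Dict, Optional

LOG = logging.getLogger(__name__)


def align_availability_zone(
    requested_az: Optional[str],
    forced_host: Optional[str],
    host_azs: Dict[str, str],
) -> Optional[str]:
    """Return the availability zone that should be recorded for a request.

    host_azs is a hypothetical host -> availability zone mapping used only
    for this sketch.
    """
    if forced_host is None:
        # No forced host: nothing to align, keep what the user asked for.
        return requested_az

    if forced_host not in host_azs:
        raise LookupError("unknown host: %s" % forced_host)

    host_az = host_azs[forced_host]
    if requested_az != host_az:
        # Do not fail the request; log a warning and prefer the host's AZ.
        LOG.warning(
            "Requested availability zone %r does not match availability "
            "zone %r of forced host %r; using the host's availability zone.",
            requested_az, host_az, forced_host,
        )
    return host_az
```

With a mapping of {'host1': 'bar'}, a request for 'foo:host1' would be
recorded with availability zone 'bar', so later move operations see the same
value as the instance record.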
Change-Id: Iac0e634e66cd4e150a50935cf635f626fc11b70e
Signed-off-by: Stephen Finucane <stephenfin@xxxxxxxxxx>
Closes-Bug: #1934770
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1934770
Title:
Mismatch between forced host and AZ prevents move operations
Status in OpenStack Compute (nova):
Fix Released
Bug description:
When spawning a new instance, it's possible to force the instance to a
specific host by using a special 'availability_zone[:host[:node]]'
syntax for the 'availability_zone' field in the request. For example,
when using OSC:
openstack server create --availability-zone my-az:my-host ... my-server
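As a rough illustration of the 'availability_zone[:host[:node]]' syntax, the
value can be split into its three optional parts as shown below. This is a
minimal sketch, not nova's actual parsing code; the 'ForcedDestination' type
and 'parse_availability_zone' function are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class ForcedDestination(NamedTuple):
    availability_zone: Optional[str]
    host: Optional[str]
    node: Optional[str]


def parse_availability_zone(value: str) -> ForcedDestination:
    """Split 'az[:host[:node]]' into its parts; missing parts become None."""
    parts: List[Optional[str]] = list(value.split(":"))
    if len(parts) > 3:
        raise ValueError("expected at most 'az:host:node', got %r" % value)
    # Pad to exactly three elements so unpacking always works.
    parts += [None] * (3 - len(parts))
    # Treat empty segments (e.g. 'az::node') as not provided.
    az, host, node = (part or None for part in parts)
    return ForcedDestination(az, host, node)
```

For the example above, 'my-az:my-host' yields an availability zone of
'my-az', a host of 'my-host' and no node.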
Doing so bypasses the scheduler, which means the
'AvailabilityZoneFilter' never runs to validate the availability zone-
host combo. As a result, the availability zone portion of this value
is effectively ignored and the host will be used regardless of the
availability zone requested. This has some nasty side-effects. For
one, the availability zone information stored on the instance is
generated from the availability zone of the host the instance boots
on, *not* the availability zone the user requested. This means that
when a user runs 'openstack server show' or 'openstack server list
--long', they'll see different availability zone information to what
they requested. However, the value requested *is* recorded in the
'RequestSpec' object created for the instance. This is reused if we
attempt future move operations and because the availability zone
information was never verified, it's possible to end up with an
instance that can't be moved since no host with the matching
availability zone information exists. The two issues collide with each
other since the failure logs in the latter case will reference one
availability zone value, while inspecting the instance record will
show another value. This is seriously confusing.
The solution seems to be to either (a) error out when an invalid
availability zone-host combo is requested or (b) simply ignore the
availability zone aspect of the request, opting to use the value of
the host instead (with a warning, ideally). Note that microversion
2.74 introduced a better way of requesting a specific host without
bypassing the scheduler, using 'host' and 'hypervisor_hostname' fields
in the body of the instance create request. However, the old way of
doing things is not yet deprecated and, even if it were, we'd still have
to support this for older microversions. We should fix this DB
discrepancy one way or the other.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1934770/+subscriptions