yahoo-eng-team team mailing list archive

[Bug 1934770] [NEW] Mismatch between forced host and AZ prevents move operations

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Stephen Finucane <1934770@xxxxxxxxxxxxxxxxxx>
Date: Tue, 06 Jul 2021 10:44:56 -0000
Reply-to: Bug 1934770 <1934770@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

When spawning a new instance, it's possible to force the instance to a
specific host by using a special 'availability_zone[:host[:node]]'
syntax for the 'availability_zone' field in the request. For example,
when using OSC:

  openstack server create --availability-zone my-az:my-host ... my-server
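
As a rough illustration of the syntax above, the value can be read as up
to three colon-separated parts. The sketch below is not nova's own
parser: 'split_az_value' and 'ForcedDestination' are hypothetical names,
and treating empty segments as unset is an assumption.

```python
from typing import NamedTuple, Optional


class ForcedDestination(NamedTuple):
    availability_zone: Optional[str]
    host: Optional[str]
    node: Optional[str]


def split_az_value(value: str) -> ForcedDestination:
    """Split 'availability_zone[:host[:node]]' into its parts.

    Hypothetical helper for illustration; empty segments become None.
    """
    parts = value.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':' separators in {value!r}")
    # Pad to exactly three fields so a missing host/node comes back as None.
    parts += [""] * (3 - len(parts))
    az, host, node = (p or None for p in parts)
    return ForcedDestination(az, host, node)
```

With this helper, 'my-az:my-host' yields an availability zone and a
host but no node, which is the case this bug is concerned with.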

Doing so bypasses the scheduler, which means the
'AvailabilityZoneFilter' never runs to validate the availability
zone/host combination. As a result, the availability zone portion of
this value is effectively ignored and the host is used regardless of
the availability zone requested.

This has some nasty side effects. For one, the availability zone
information stored on the instance is generated from the availability
zone of the host the instance boots on, *not* the availability zone
given in the request. This means that when a user runs 'openstack
server show' or 'openstack server list --long', they'll see different
availability zone information from what they requested.

However, the value requested *is* recorded in the 'RequestSpec' object
created for the instance. This object is reused for any future move
operations and, because the availability zone information was never
verified, it's possible to end up with an instance that can't be moved
because no host with the matching availability zone exists.

The two issues compound each other: the failure logs in the latter
case reference one availability zone value, while inspecting the
instance record shows another. This is seriously confusing.
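
The mismatch can be reproduced in a toy model. This is only a sketch:
the dictionaries stand in for nova's host aggregates, instance records
and 'RequestSpec' objects, and every name in it ('HOST_AZ',
'boot_forced', 'candidate_hosts_for_move') is hypothetical.

```python
# Toy model only; none of these names come from nova itself.
HOST_AZ = {"my-host": "az-1", "other-host": "az-1"}


def boot_forced(requested_az: str, host: str) -> dict:
    """Boot on a forced host without validating the requested AZ."""
    return {
        # The instance record takes its AZ from the host it landed on.
        "instance_az": HOST_AZ[host],
        # The request spec keeps whatever the user asked for.
        "request_spec_az": requested_az,
        "host": host,
    }


def candidate_hosts_for_move(server: dict) -> list:
    """Pick move targets by the AZ stored in the request spec."""
    return sorted(
        h for h, az in HOST_AZ.items()
        if az == server["request_spec_az"] and h != server["host"]
    )


server = boot_forced("my-az", "my-host")
print(server["instance_az"])             # prints az-1, not my-az
print(candidate_hosts_for_move(server))  # prints [], so no move is possible
```

The user sees 'az-1' on the instance, while the move fails looking for
hosts in 'my-az', an availability zone the host was never in.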

The solution seems to be to either (a) error out when an invalid
availability zone/host combination is requested or (b) simply ignore
the availability zone aspect of the request, opting to use the
availability zone of the host instead (with a warning, ideally). Note
that microversion 2.74 introduced a better way of requesting a specific
host without bypassing the scheduler, using the 'host' and
'hypervisor_hostname' fields in the body of the instance create
request. However, the old way of doing things is not yet deprecated
and, even if it were, we'd still have to support it for older
microversions. We should fix this DB discrepancy one way or the other.
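
The two options could look roughly like the sketch below. It is not a
proposed patch: 'resolve_az', 'AZHostMismatch' and the 'HOST_AZ' mapping
are hypothetical, and a real fix would look up the host's availability
zone through nova's own aggregate data.

```python
import logging

LOG = logging.getLogger(__name__)

# Hypothetical host-to-AZ mapping used only for this illustration.
HOST_AZ = {"my-host": "az-1"}


class AZHostMismatch(ValueError):
    """Raised when the forced host is not in the requested AZ."""


def resolve_az(requested_az, host, strict=True):
    """Return the AZ to record for a forced-host request.

    strict=True  -> option (a): reject a mismatched AZ/host pair.
    strict=False -> option (b): log a warning and use the host's AZ.
    """
    host_az = HOST_AZ[host]
    if requested_az == host_az:
        return host_az
    if strict:
        raise AZHostMismatch(
            f"host {host!r} is in AZ {host_az!r}, not {requested_az!r}")
    LOG.warning("Ignoring requested AZ %r; host %r is in AZ %r",
                requested_az, host, host_az)
    return host_az
```

Either way, the value returned here is what would be stored on both the
instance and the request spec, so the two records can no longer
disagree.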

** Affects: nova
     Importance: Medium
     Assignee: Stephen Finucane (stephenfinucane)
         Status: Confirmed


** Tags: availability-zones scheduler

** Tags added: availability-zones

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1934770

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1934770/+subscriptions

Follow ups

[Bug 1934770] Re: Mismatch between forced host and AZ prevents move operations
From: Elod Illes, 2021-10-07
[Bug 1934770] Re: Mismatch between forced host and AZ prevents move operations
From: OpenStack Infra, 2021-07-25