yahoo-eng-team team mailing list archive

[Bug 1784020] Re: Shared storage providers are not supported and will break things if used

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: melanie witt <1784020@xxxxxxxxxxxxxxxxxx>
Date: Thu, 09 Aug 2018 14:25:13 -0000
Reply-to: Bug 1784020 <1784020@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

I think we can consider the "things will break" part of this bug resolved
now that the patch has landed to disable the bit that makes shared storage
providers affect allocations. The work to finish proper support for shared
storage providers will be tracked on its blueprint.

** Changed in: nova
       Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1784020

Title:
  Shared storage providers are not supported and will break things if
  used

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  https://review.openstack.org/#/c/560459/ in Rocky changed the libvirt
  driver such that if the compute node provider is in a shared storage
  provider aggregate relationship (in the same aggregate with a resource
  provider that has DISK_GB inventory and the MISC_SHARES_VIA_AGGREGATE
  trait), the compute node provider won't report DISK_GB inventory.
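
  As a rough illustration of the relationship described above, here is a
  minimal Python sketch. It is a toy model only, not nova or placement
  code: the Provider class, the shares_disk_with helper and the sample
  provider names are all hypothetical. Only the DISK_GB resource class and
  the MISC_SHARES_VIA_AGGREGATE trait come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical stand-in for a resource provider (not nova's class)."""
    name: str
    inventory: set = field(default_factory=set)   # resource class names
    traits: set = field(default_factory=set)
    aggregates: set = field(default_factory=set)  # aggregate identifiers


def shares_disk_with(compute, providers):
    """Return the providers that would act as shared storage for `compute`.

    A provider qualifies if it has DISK_GB inventory, carries the
    MISC_SHARES_VIA_AGGREGATE trait, and is in at least one aggregate
    with the compute node provider.
    """
    return [
        p for p in providers
        if p is not compute
        and "DISK_GB" in p.inventory
        and "MISC_SHARES_VIA_AGGREGATE" in p.traits
        and p.aggregates & compute.aggregates
    ]


# Sample data (assumed for illustration).
compute = Provider("cn1", {"VCPU", "MEMORY_MB", "DISK_GB"}, set(), {"agg1"})
nfs = Provider("nfs", {"DISK_GB"}, {"MISC_SHARES_VIA_AGGREGATE"}, {"agg1"})
other = Provider("other", {"DISK_GB"}, {"MISC_SHARES_VIA_AGGREGATE"}, {"agg2"})

sharing = shares_disk_with(compute, [compute, nfs, other])
```

  In this sketch only "nfs" is returned for "cn1", since "other" is in a
  different aggregate. Under the Rocky change, a compute node in that
  situation would stop reporting its own DISK_GB inventory.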

  There are at least two major issues with this:

  1. On upgrade from Queens, any existing allocations against the
  compute node provider's DISK_GB inventory will not allow removal of
  the DISK_GB inventory from the compute node provider during the
  update_available_resource periodic task. In other words, we have no
  data migration routine in place to move DISK_GB allocations from the
  compute node provider to the shared storage provider in Rocky.

  2. During a move operation, we move the instance's allocations from
  the source compute node provider to the migration record, then go
  through the scheduler to pick a dest host for the instance and
  allocate resources against the dest host (and optionally shared
  storage provider). So:

  a) The DISK_GB allocation from the instance to the shared storage
  provider is deleted for a short window of time during scheduling until
  we pick a dest host.

  https://github.com/openstack/nova/blob/6be7f7248fb1c2bbb890a0a48a424e205e173c9c/nova/conductor/tasks/migrate.py#L57

  b) If cold migrate fails or is reverted, we delete the allocations
  (created by the scheduler) and move the allocations from the migration
  record (against the source node provider) back to the instance, but
  because we failed to move the DISK_GB allocation against the sharing
  provider for the instance to the migration record, we've lost that
  DISK_GB allocation when copying it back to the instance on
  revert/failure:

  https://github.com/openstack/nova/blob/6be7f7248fb1c2bbb890a0a48a424e205e173c9c/nova/compute/manager.py#L4155
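
  The sequence in issue 2 can be sketched with a toy allocation table.
  This is a simplified model under stated assumptions, not the code at the
  links above: allocations are a plain dict of consumer -> provider ->
  resources, and start_move, schedule, revert and run_scenario are
  hypothetical helper names.

```python
import copy


def start_move(allocs, instance, migration, source_node):
    """Move only the source node's allocations to the migration record."""
    held = allocs.pop(instance)
    # Assumption mirrored from the description: the part held against the
    # sharing provider is not carried over to the migration record.
    allocs[migration] = {source_node: held[source_node]}


def schedule(allocs, instance, dest_node, sharing, request):
    """Allocate against the dest host and the shared storage provider."""
    allocs[instance] = {
        dest_node: {k: v for k, v in request.items() if k != "DISK_GB"},
        sharing: {"DISK_GB": request["DISK_GB"]},
    }


def revert(allocs, instance, migration):
    """Drop the scheduler-made allocations and restore the migration's."""
    allocs.pop(instance, None)
    allocs[instance] = allocs.pop(migration)


def run_scenario():
    """Run move, schedule and revert; return the two interesting states."""
    allocs = {
        "inst": {
            "cn1": {"VCPU": 2, "MEMORY_MB": 2048},
            "nfs": {"DISK_GB": 20},
        }
    }
    request = {"VCPU": 2, "MEMORY_MB": 2048, "DISK_GB": 20}

    start_move(allocs, "inst", "mig", "cn1")
    during_window = copy.deepcopy(allocs)   # (a) no DISK_GB held anywhere

    schedule(allocs, "inst", "cn2", "nfs", request)
    revert(allocs, "inst", "mig")
    return during_window, allocs            # (b) DISK_GB never comes back


during_window, after_revert = run_scenario()
```

  In this model, during_window holds only the migration record against
  "cn1", and after_revert leaves "inst" with VCPU and MEMORY_MB on "cn1"
  but nothing against "nfs", which is the lost DISK_GB allocation from
  (b).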

  --

  We could also have issues with how forced live migrate:

  https://github.com/openstack/nova/blob/6be7f7248fb1c2bbb890a0a48a424e205e173c9c/nova/conductor/tasks/live_migrate.py#L109

  And evacuate:

  https://github.com/openstack/nova/blob/6be7f7248fb1c2bbb890a0a48a424e205e173c9c/nova/conductor/manager.py#L868

  bypass the scheduler altogether so we're potentially not handling
  shared provider allocations there either.

  Also, we don't have *any* shared storage provider CI jobs set up. A
  start on that is here:

  https://review.openstack.org/#/c/586363/

  But that's just a single-node job at the moment and we'd need a
  multi-node shared storage CI job to really say we support shared
  storage providers as a feature in nova.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1784020/+subscriptions

References

[Bug 1784020] [NEW] Shared storage providers are not supported and will break things if used
From: Matt Riedemann, 2018-07-27