[Bug 1980657] [NEW] nova instance snapshot does not wait for volume snaps

 

Public bug reported:

Currently, when snapshotting a boot-from-volume instance, nova returns
success immediately and glance reports the image as active. Snapshotting
the disks can take some time, and if we try to launch a new instance
from that image as soon as the image goes active, we get an error. Nova
should wait until all volume snapshots are ready before returning
success.

Relevant code:
https://github.com/openstack/nova/blob/512fbdfa9933f2e9b48bcded537ffb394979b24b/nova/compute/api.py#L3445
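
A minimal sketch of the kind of wait that seems to be missing, assuming
python-cinderclient and a hypothetical helper name (in nova itself this
would go through nova's own cinder wrapper rather than the client
directly):

    import time

    def wait_for_volume_snapshots(cinder, snapshot_ids, timeout=300,
                                  interval=5):
        # Hypothetical helper: poll cinder until every snapshot created
        # for the boot-from-volume instance leaves 'creating'.  `cinder`
        # is an authenticated cinderclient Client instance.
        deadline = time.monotonic() + timeout
        pending = set(snapshot_ids)
        while pending:
            for snap_id in list(pending):
                snap = cinder.volume_snapshots.get(snap_id)
                if snap.status == 'available':
                    pending.discard(snap_id)
                elif snap.status == 'error':
                    raise RuntimeError('snapshot %s went to error' % snap_id)
            if pending:
                if time.monotonic() > deadline:
                    raise TimeoutError('snapshots not ready: %s' % pending)
                time.sleep(interval)

Only once something like this returns should the image be reported as
active.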

Because snapshots take a relatively long time on my backend, I always
fail the multiattach tempest test:

{0} tearDownClass
(tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest)
[0.000000s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/opt/stack/tempest/tempest/test.py", line 220, in tearDownClass
        raise value.with_traceback(trace)
      File "/opt/stack/tempest/tempest/test.py", line 192, in tearDownClass
        teardown()
      File "/opt/stack/tempest/tempest/test.py", line 602, in resource_cleanup
        raise testtools.MultipleExceptions(*cleanup_errors)
    testtools.runtest.MultipleExceptions: (<class 'tempest.lib.exceptions.BadRequest'>, Bad request
    Details: {'code': 400, 'message': 'Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots, awaiting a transfer, or be disassociated from snapshots after volume transfer.'}, <traceback object at 0x7f622cbbdc80>)

This happens because we try to delete the boot volume and its snapshot
while the snapshot is not yet ready. The following trace is from trying
to create a new instance from the 'active' image:

Jul 04 11:13:10 openstack-dev-00 nova-compute[1034924]: ERROR
nova.compute.manager [None req-8122e11f-b490-4c68-b447-e35c3ad21cdf demo
admin] [instance: 10cefe2f-ac44-4cb8-9006-28443481ba9d] Build of
instance 10cefe2f-ac44-4cb8-9006-28443481ba9d aborted: Invalid input
received: Invalid snapshot: Originating snapshot status must be one of
'available' values (HTTP 400) (Request-ID:
req-32350814-f125-448f-a620-dc8526e6256b):
nova.exception.BuildAbortException: Build of instance
10cefe2f-ac44-4cb8-9006-28443481ba9d aborted: Invalid input received:
Invalid snapshot: Originating snapshot status must be one of 'available'
values (HTTP 400) (Request-ID: req-32350814-f125-448f-a620-dc8526e6256b)
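
Until nova does this itself, a client-side workaround for both failures
is to poll cinder before launching from the image or cleaning up the
boot volume. A rough sketch, again with python-cinderclient (looking
snapshots up by volume ID is an assumption about how the boot volume's
snapshots can be found):

    import time

    def wait_until_snaps_available(cinder, volume_id, timeout=300,
                                   interval=5):
        # Workaround: block until every snapshot of the given boot
        # volume reports 'available', so the new image is actually
        # usable and volume/snapshot cleanup will not hit the 400s above.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            snaps = cinder.volume_snapshots.list(
                search_opts={'volume_id': volume_id})
            if snaps and all(s.status == 'available' for s in snaps):
                return
            time.sleep(interval)
        raise TimeoutError('snapshots of volume %s never became '
                           'available' % volume_id)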

** Affects: nova
     Importance: Undecided
         Status: New

-- 
https://bugs.launchpad.net/bugs/1980657