yahoo-eng-team team mailing list archive

[Bug 1896621] Re: instance corrupted after volume retype

 

Reviewed:  https://review.opendev.org/754695
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6cf449bdd0d4beb95cf12311e7d2f8669e625fac
Submitter: Zuul
Branch:    master

commit 6cf449bdd0d4beb95cf12311e7d2f8669e625fac
Author: Lee Yarwood <lyarwood@xxxxxxxxxx>
Date:   Mon Sep 28 12:18:29 2020 +0100

    compute: Lock by instance.uuid lock during swap_volume
    
    The libvirt driver is currently the only virt driver implementing swap
    volume within Nova. While libvirt itself supports moving between
    multiple volumes attached to the same instance at the same time, the
    current logic within the libvirt driver makes a call to
    virDomainGetXMLDesc that fails if there are active block jobs against
    any disk attached to the domain.
    
    This change simply uses an instance.uuid based lock in the compute layer
    to serialise requests to swap_volume, preventing this from happening.
    
    Closes-Bug: #1896621
    Change-Id: Ic5ce2580e7638a47f1ffddb4edbb503bf490504c
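
To illustrate the serialisation the commit above describes, here is a
minimal sketch of a per-instance lock built on oslo.concurrency. The
swap_volume wrapper and the driver argument are hypothetical, used only
for illustration; this is not the literal Nova code:

    from oslo_concurrency import lockutils

    def swap_volume(context, instance, old_volume_id, new_volume_id, driver):
        # Hypothetical wrapper: serialise swap_volume requests per instance
        # so that only one libvirt block job can be active against the
        # domain at any given time.
        @lockutils.synchronized(instance.uuid)
        def _do_locked_swap_volume():
            driver.swap_volume(context, instance, old_volume_id,
                               new_volume_id)

        _do_locked_swap_volume()

With a lock keyed on instance.uuid, a second retype against the same
instance simply waits for the first block copy to finish instead of
calling virDomainGetXMLDesc while a block job is still active.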


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1896621

Title:
  instance corrupted after volume retype

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========

  Following a cinder volume retype on a volume attached to a running
  instance, the instance became corrupted and could no longer boot into
  the guest operating system.

  Upon further investigation it seems the retype operation failed. The
  nova-compute logs registered the following error:

  Exception during message handling: libvirtError: block copy still
  active: domain has active block job

  see log extract: http://paste.openstack.org/show/798201/
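
  For context, the condition the error refers to can be inspected
  directly with the libvirt Python bindings. This is a minimal sketch,
  assuming a local qemu:///system connection and a 'vda' disk target;
  it is not something Nova itself does in this code path:

    import libvirt

    def has_active_block_job(instance_uuid, disk_target='vda'):
        # Returns True if the domain still has an active block job (for
        # example an unfinished block copy from a previous swap_volume)
        # on the given disk target.
        conn = libvirt.open('qemu:///system')
        try:
            dom = conn.lookupByUUIDString(instance_uuid)
            # blockJobInfo() returns an empty dict when no job is active.
            return bool(dom.blockJobInfo(disk_target, 0))
        finally:
            conn.close()

  An active job here is exactly the state in which the subsequent
  virDomainGetXMLDesc call raises the "domain has active block job"
  error quoted above.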

  Steps to reproduce
  ==================

  I'm not sure how easy it would be to replicate the exact problem.

  As an admin user within the project, in Horizon go to Project | Volume
  | Volume, then from the context menu of the required volume select
  "change volume type".

  Select the new type and the 'on-demand' migration policy (an
  API-level equivalent of this retype is sketched after these steps).

  Following this it was reported that the instance was non-responsive;
  when checking in the console, the instance was unable to boot from
  the volume.
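
  For reference, the same retype can be requested outside Horizon
  through the block storage API. A minimal sketch using
  python-cinderclient follows; the keystoneauth session setup is
  assumed and the helper name is made up for illustration:

    from cinderclient import client as cinder_client

    def retype_volume(session, volume_id, new_type):
        # Hypothetical helper: request the same retype Horizon performs,
        # with the 'on-demand' migration policy that migrates the volume
        # and hence triggers swap_volume on the attached instance.
        cinder = cinder_client.Client('3', session=session)
        cinder.volumes.retype(volume_id, new_type, 'on-demand')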

  
  Environment
  ===========
  DISTRIB_ID="OSA"
  DISTRIB_RELEASE="18.1.5"
  DISTRIB_CODENAME="Rocky"
  DISTRIB_DESCRIPTION="OpenStack-Ansible"

  # nova-manage --version
  18.1.1

  # virsh version
  Compiled against library: libvirt 4.0.0
  Using library: libvirt 4.0.0
  Using API: QEMU 4.0.0
  Running hypervisor: QEMU 2.11.1

  
  Cinder v13.0.3-backed volumes using the Zadara VPSA driver

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1896621/+subscriptions

