yahoo-eng-team team mailing list archive

[Bug 1838392] Re: BDMNotFound raised and stale block devices left over when simultaneously reboot and deleting an instance

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: OpenStack Infra <1838392@xxxxxxxxxxxxxxxxxx>
Date: Tue, 26 Nov 2019 17:34:18 -0000
Reply-to: Bug 1838392 <1838392@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Reviewed:  https://review.opendev.org/673463
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9ad54f3dacbd372271f441baea5380f913072dde
Submitter: Zuul
Branch:    master

commit 9ad54f3dacbd372271f441baea5380f913072dde
Author: Lee Yarwood <lyarwood@xxxxxxxxxx>
Date:   Mon Jul 29 16:25:45 2019 +0100

    compute: Take an instance.uuid lock when rebooting
    
    Previously simultaneous requests to reboot and delete an instance could
    race as only the latter took a lock against the uuid of the instance.
    
    With the Libvirt driver this race could potentially result in attempts
    being made to reconnect previously disconnected volumes on the host.
    Depending on the volume backend being used this could then result in
    stale block devices pointing to unmapped volumes being left on the host
    that in turn could cause failures later on when connecting newly mapped
    volumes.
    
    This change avoids this race by ensuring any request to reboot an
    instance takes an instance.uuid lock within the compute manager,
    serialising requests to reboot and then delete the instance.
    
    Closes-Bug: #1838392
    Change-Id: Ieb59de10c63bb067f92ec054535766cdd722dae2
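
The serialisation described in the commit message can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from the change itself: `instance_lock`, `reboot_instance`, `delete_instance` and `run` are hypothetical names, and a plain `threading.Lock` keyed by uuid stands in for whatever locking helper the compute manager actually uses.

```python
import threading
import time
from collections import defaultdict

# Hypothetical stand-in for a per-instance named lock: one lock per uuid.
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()
events = []


def instance_lock(uuid):
    """Return the lock shared by every operation on this instance uuid."""
    with _locks_guard:
        return _locks[uuid]


def reboot_instance(uuid, started):
    # With the fix, reboot takes the same lock that delete already took.
    with instance_lock(uuid):
        started.set()  # signal that the reboot now holds the lock
        events.append("reboot:start")
        time.sleep(0.05)  # pretend to reconnect volumes and restart the guest
        events.append("reboot:end")


def delete_instance(uuid):
    # Delete blocks here until any in-flight reboot releases the lock.
    with instance_lock(uuid):
        events.append("delete:start")
        events.append("delete:end")


def run():
    """Issue a reboot, then a delete while the reboot is still running."""
    events.clear()
    started = threading.Event()
    reboot = threading.Thread(target=reboot_instance, args=("uuid-1", started))
    reboot.start()
    started.wait()
    delete = threading.Thread(target=delete_instance, args=("uuid-1",))
    delete.start()
    reboot.join()
    delete.join()
    return list(events)
```

Because both operations contend for the same per-uuid lock, the delete cannot start until the reboot has finished, so the two sets of events never interleave.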


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838392

Title:
  BDMNotFound raised and stale block devices left over when
  simultaneously reboot and deleting an instance

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========
  Simultaneous requests to reboot and delete an instance _will_ race as
  only the call to delete takes a lock against the instance.uuid.

  One possible outcome of this seen in the wild with the Libvirt driver
  is that the request to soft reboot will eventually turn into a hard
  reboot, reconnecting volumes that the delete request has already
  disconnected. These volumes will eventually be unmapped on the Cinder
  side by the delete request, leaving stale devices on the host.
  Additionally, BDMNotFound is raised by the reboot operation as the
  delete operation has already deleted the BDMs.
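
The interleaving described above can be reproduced in a simplified model. This is a sketch under stated assumptions rather than Nova code: the sets standing in for BDMs, Cinder mappings and host devices, the local `BDMNotFound` class and the function `simulate_unlocked_race` are all illustrative, and `threading.Event` objects force the unlucky ordering that the missing lock merely permits.

```python
import threading


class BDMNotFound(Exception):
    """Illustrative stand-in for the exception named in the bug title."""


def simulate_unlocked_race():
    bdms = {"vol-1"}           # block device mappings recorded for the instance
    cinder_mapped = {"vol-1"}  # volumes still mapped on the Cinder side
    host_devices = {"vol-1"}   # block devices present on the compute host
    errors = []

    disconnected = threading.Event()
    reconnected = threading.Event()
    deleted = threading.Event()

    def delete():
        # Step 1: delete disconnects the instance's volumes from the host.
        host_devices.difference_update(bdms)
        disconnected.set()
        # The reboot runs here because nothing serialises the two requests.
        reconnected.wait()
        # Step 2: delete unmaps the volumes and removes the BDMs.
        cinder_mapped.clear()
        bdms.clear()
        deleted.set()

    def reboot():
        disconnected.wait()
        # The hard reboot reconnects volumes that delete just disconnected.
        volumes = set(bdms)
        host_devices.update(volumes)
        reconnected.set()
        deleted.wait()
        # A later BDM lookup fails because delete has removed the records.
        try:
            for vol in volumes:
                if vol not in bdms:
                    raise BDMNotFound(vol)
        except BDMNotFound as exc:
            errors.append(type(exc).__name__)

    threads = [threading.Thread(target=delete), threading.Thread(target=reboot)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Devices still on the host whose volumes are no longer mapped are stale.
    stale = host_devices - cinder_mapped
    return stale, errors
```

In this model the host is left with a device for a volume that is no longer mapped, and the reboot path records a BDMNotFound, matching the two symptoms reported.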

  Steps to reproduce
  ==================
  $ nova reboot $instance && nova delete $instance

  Expected result
  ===============
  The instance reboots and is then deleted without any errors raised.

  Actual result
  =============
  BDMNotFound raised and stale block devices left over.

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
    list for all releases: http://docs.openstack.org/releases/

  1599e3cf68779eafaaa2b13a273d3bebd1379c19 / 19.0.0.0rc1-992-g1599e3cf68

  2. Which hypervisor did you use?
     (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
     What's the version of that?

     Libvirt + QEMU/kvm

  3. Which storage type did you use?
     (For example: Ceph, LVM, GPFS, ...)
     What's the version of that?

     N/A

  4. Which networking type did you use?
     (For example: nova-network, Neutron with OpenVSwitch, ...)

     N/A

  Logs & Configs
  ==============

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1838392/+subscriptions

References

[Bug 1838392] [NEW] BDMNotFound raised and stale block devices left over when simultaneously reboot and deleting an instance
From: Lee Yarwood, 2019-07-30