
yahoo-eng-team team mailing list archive

[Bug 1797333] [NEW] Instances are locked and unable to start after server crash (queens)

 

Public bug reported:

After restarting a crashed host, the disks of its instances, which are
stored on NFS, are locked and the instances cannot be started:

libvirtError: internal error: process exited while connecting to monitor: 2018-10-10T10:16:09.816477Z qemu-system-x86_64: -drive file=/var/lib/nova/instances/ed7760a8-3008-4feb-83f3-3b753b0e7d6e/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none: Failed to get "write" lock
ERROR nova.compute.manager [instance: ed7760a8-3008-4feb-83f3-3b753b0e7d6e] Is another process using the image?

The same error occurs on other compute nodes connected to the same
shared file system after evacuating the instances. So it seems that the
disks are locked by libvirt in an unknown, undocumented way. As a
workaround I had to make a copy of the disk file of every failed
instance, delete the original disk files and restore them from the
copies. After that the instances started successfully.
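
The copy, delete and restore steps above can be sketched roughly as
follows. This is only an illustration of the workaround, not a nova or
libvirt feature: the helper name recreate_disk and the ".unlock-copy"
suffix are hypothetical, and it assumes the instance is stopped and no
process has the disk open. The report does not explain why this clears
the lock; presumably the restored file is a new file that the stale
lock does not cover.

```python
import os
import shutil


def recreate_disk(disk_path: str) -> None:
    """Replace disk_path with a byte-identical copy stored in a new file.

    Hypothetical helper that mirrors the manual workaround: copy the
    disk, delete the original, then put the copy back under the
    original name.
    """
    backup_path = disk_path + ".unlock-copy"  # hypothetical temporary name

    # copy2 keeps the contents, mode bits and timestamps, but NOT the
    # owner or group; those may need to be restored separately.
    shutil.copy2(disk_path, backup_path)

    # Remove the original file, then move the copy into its place.
    os.remove(disk_path)
    os.rename(backup_path, disk_path)
```

For the instance in the log above this would be called as
recreate_disk("/var/lib/nova/instances/ed7760a8-3008-4feb-83f3-3b753b0e7d6e/disk").
The copy needs as much free space on the share as the disk file itself.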

If there is another way to unlock those instance disks, please share.

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: libvirt

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1797333

Title:
  Instances are locked and unable to start after server crash (queens)

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1797333/+subscriptions

