yahoo-eng-team team mailing list archive

[Bug 1177247] Re: libvirt migrate/resize on shared storage can cause data loss

 

** Changed in: nova/grizzly
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1177247

Title:
  libvirt migrate/resize on shared storage can cause data loss

Status in OpenStack Compute (Nova):
  Fix Released
Status in OpenStack Compute (nova) grizzly series:
  Fix Released

Bug description:
  When using shared storage across hypervisors, libvirt driver
  resize/migrate operations can result in a loss of instance data.  This
  happens because many of the operations that create a copy of the
  instance run inside a try/except block.  If any of them fails,
  control passes to the exception handler, which does the following:

  === code ===

          except Exception:
              with excutils.save_and_reraise_exception():
                  self._cleanup_remote_migration(dest, inst_base,
                                                 inst_base_resize)

      def _cleanup_remote_migration(self, dest, inst_base, inst_base_resize):
          """Used only for cleanup in case migrate_disk_and_power_off fails."""
          try:
              if os.path.exists(inst_base_resize):
                  utils.execute('rm', '-rf', inst_base)
                  utils.execute('mv', inst_base_resize, inst_base)
                  utils.execute('ssh', dest, 'rm', '-rf', inst_base)
          except Exception:
              pass

  === end ===

  It doesn't take long looking at this code to see why this is a
  problem with shared storage.  On shared storage, inst_base on dest
  is the same directory as inst_base on the source host, so the last
  ssh operation in the block above blows away the original copy of the
  instance directory that the preceding mv has just restored.
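
  The sequence can be reproduced in miniature without libvirt or ssh.
  The sketch below is only an illustration, not nova code and not the
  committed fix: cleanup_failed_migration, simulate, remote_rm and the
  skip_remote_rm flag are hypothetical names, and shared storage is
  modelled by letting the "remote" delete act on the same local
  directory tree.

```python
import os
import shutil
import tempfile


def cleanup_failed_migration(inst_base, inst_base_resize, remote_rm,
                             skip_remote_rm=False):
    """Mirror the three steps of the cleanup handler quoted above.

    remote_rm is a hypothetical stand-in for
    utils.execute('ssh', dest, 'rm', '-rf', inst_base).
    skip_remote_rm is an assumed guard, not part of the original code.
    """
    if not os.path.exists(inst_base_resize):
        return
    # rm -rf inst_base: drop the partially created copy
    shutil.rmtree(inst_base, ignore_errors=True)
    # mv inst_base_resize inst_base: put the original data back
    os.rename(inst_base_resize, inst_base)
    # ssh dest rm -rf inst_base: remove the copy on the destination
    if not skip_remote_rm:
        remote_rm(inst_base)


def simulate(skip_remote_rm):
    """Run the cleanup where source and destination share one filesystem.

    Returns True if the original disk file survives the cleanup.
    """
    root = tempfile.mkdtemp()
    inst_base = os.path.join(root, "instance-0001")
    inst_base_resize = inst_base + "_resize"

    # State in the middle of a resize: the original data has been moved
    # aside to *_resize and a new, incomplete inst_base exists.
    os.makedirs(inst_base_resize)
    with open(os.path.join(inst_base_resize, "disk"), "w") as f:
        f.write("original data")
    os.makedirs(inst_base)

    def remote_rm(path):
        # On shared storage the destination sees the very same directory.
        shutil.rmtree(path, ignore_errors=True)

    cleanup_failed_migration(inst_base, inst_base_resize, remote_rm,
                             skip_remote_rm=skip_remote_rm)
    survived = os.path.exists(os.path.join(inst_base, "disk"))
    shutil.rmtree(root, ignore_errors=True)
    return survived
```

  Calling simulate(False) follows the quoted handler step for step and
  reports that the disk file is gone, while simulate(True) skips the
  remote delete and the restored data survives.  How a driver would
  decide that storage is shared is left out here on purpose.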

  The issue can be easily reproduced by issuing a resize of an instance
  with a large root disk.  In the middle of the resize, kill the ssh
  process created from the following call
  (https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3508)
  and observe the exception handler destroying everything.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1177247/+subscriptions