yahoo-eng-team team mailing list archive

[Bug 1552303] [NEW] Block live migrations are broken when nova calculates live migration type by itself

 

Public bug reported:

All block live migrations fail when nova is asked to calculate the live
migration type itself, by specifying {'block_migration': 'auto'} in the
request body. This happens because the block_migration and
migrate_data.block_migration flags end up with different values.

In the conductor live migrate task we call checks on the destination and
source. These build up migrate_data in the driver and send it back to
the conductor:

https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L156

Here we calculate block migration; this part is fine:

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L5554

Control then returns to the conductor, which calls the compute manager
and sends both flags - block_migration and migrate_data.block_migration -
but we never change the value of block_migration to match
migrate_data.block_migration:

https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L68
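
The flow described so far can be sketched in a few lines of Python. This
is a simplified illustration, not nova's code: MigrateData,
parse_block_migration, driver_check and conductor_live_migrate are
hypothetical names, and mapping 'auto' to None is an assumption based on
the block_migration=None value reported in this bug.

```python
class MigrateData(object):
    """Hypothetical stand-in for the migrate_data object built by the driver."""

    def __init__(self):
        self.block_migration = None


def parse_block_migration(request_value):
    # Assumption: 'auto' is represented as None, meaning "let nova decide".
    return None if request_value == 'auto' else bool(request_value)


def driver_check(block_migration, shared_storage):
    # The driver fills in a calculated value when none was requested.
    migrate_data = MigrateData()
    if block_migration is None:
        migrate_data.block_migration = not shared_storage
    else:
        migrate_data.block_migration = block_migration
    return migrate_data


def conductor_live_migrate(request_value, shared_storage):
    block_migration = parse_block_migration(request_value)
    migrate_data = driver_check(block_migration, shared_storage)
    # Both values are passed on, but block_migration is never updated.
    return block_migration, migrate_data


flag, data = conductor_live_migrate('auto', shared_storage=False)
print('block_migration=%s migrate_data.block_migration=%s'
      % (flag, data.block_migration))
```

With 'auto' and no shared storage, the sketch ends up with
block_migration=None and migrate_data.block_migration=True, which is the
mismatch described in this report.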

Down in the compute manager (and in the drivers) we use both flags,
which now have different values (here block_migration=None,
migrate_data.block_migration=True). For example, at this point
block_migration=None:

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L5196

As a result, all block live migrations break with:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 457, in fire_timers
    timer()
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 58, in __call__
    cb(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 168, in _do_send
    waiter.switch(result)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/opt/stack/nova/nova/utils.py", line 1160, in context_wrapper
    return func(*args, **kwargs)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6095, in _live_migration_operation
    instance=instance)
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6063, in _live_migration_operation
    CONF.libvirt.live_migration_bandwidth)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
    six.reraise(c, e, tb)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1825, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: Cannot access storage file '/opt/stack/data/nova/instances/572ad149-b7c5-4b77-85b5-34c1d2d37fcf/disk' (as uid:110, gid:116): No such file or directory
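
One plausible way the mismatch leads to an error like this is a
truthiness check on the stale flag. The report does not say which check
is involved, so the sketch below is an assumption, and
destination_action is a hypothetical function, not part of nova.

```python
def destination_action(block_migration):
    # Hypothetical branch on the flag. None is falsy, so a request left
    # as None takes the same path as an explicit False.
    if block_migration:
        return 'copy disks to destination'
    return 'expect disks on shared storage'


# Stale flag versus the value the driver calculated.
print(destination_action(None))
print(destination_action(True))
```

If the disks are not actually on shared storage, the second path would
leave the destination without a disk file, which is consistent with the
"No such file or directory" message in the traceback.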

A quick workaround is to make sure at the compute manager level that
block_migration == migrate_data.block_migration, but really we should
clean up all this mess and send only one flag, because passing two is
error-prone and hard to maintain.
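
A minimal sketch of the workaround, assuming the value calculated by the
driver should win whenever it is set. MigrateData and
reconcile_block_migration are hypothetical names used only for
illustration; this is not the actual fix.

```python
class MigrateData(object):
    """Hypothetical stand-in for the migrate_data object."""

    def __init__(self, block_migration=None):
        self.block_migration = block_migration


def reconcile_block_migration(block_migration, migrate_data):
    """Return the single value both flags should carry.

    Assumption: a value calculated by the driver wins over the request.
    """
    calculated = getattr(migrate_data, 'block_migration', None)
    if calculated is not None:
        return calculated
    return block_migration


block_migration = reconcile_block_migration(None, MigrateData(True))
print('block_migration=%s' % block_migration)
```

Calling something like this once, before either flag is read, would keep
the two values equal. Sending only one flag would remove the need for it.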

** Affects: nova
     Importance: Critical
     Assignee: Pawel Koniszewski (pawel-koniszewski)
         Status: In Progress


** Tags: live-migration

** Description changed:

  Then it goes back to conductor and we call compute manager sending both
  flags - block_migration and migrate_data.block_migration - but we never
- changed value of block_migration to match migrate_data.block_migration:
+ change value of block_migration to match migrate_data.block_migration:
  
  https://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L68

** Changed in: nova
   Importance: Undecided => Critical

** Changed in: nova
       Status: New => In Progress

** Changed in: nova
     Assignee: (unassigned) => Pawel Koniszewski (pawel-koniszewski)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1552303

Title:
  Block live migrations are broken when nova calculates live migration
  type by itself

Status in OpenStack Compute (nova):
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1552303/+subscriptions

