
yahoo-eng-team team mailing list archive

[Bug 1583145] Re: live-migration monitoring is not working properly

 

The decision to abort is made in the should_abort() method, which checks
both progress_timeout and completion_timeout. The issue reported here is
that when the ephemeral disk is very full, the migration goes too long
without reporting progress, so the progress_timeout check aborts it
before the completion_timeout check in should_abort() ever comes into
play.
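
To illustrate the interaction, here is a minimal sketch of the two
checks, not Nova's actual code. The names abort_reason() and
first_abort() are hypothetical, and treating a timeout of 0 as
"disabled" is an assumption. The values 150 and 800 are the ones given
in the bug description.

```python
from typing import Optional, Tuple


def abort_reason(now: float, last_progress: float, progress_timeout: float,
                 elapsed: float, completion_timeout: float) -> Optional[str]:
    """Hypothetical stand-in for the abort decision; returns which check fired."""
    # Checked first: no forward progress for longer than progress_timeout.
    # Assumption: a timeout of 0 disables the check.
    if progress_timeout and (now - last_progress) > progress_timeout:
        return "progress_timeout"
    # Checked second: the whole migration has run past completion_timeout.
    if completion_timeout and elapsed > completion_timeout:
        return "completion_timeout"
    return None


def first_abort(progress_timeout: float, completion_timeout: float,
                limit: int = 2000) -> Optional[Tuple[int, str]]:
    """Poll once per simulated second a migration that never reports progress."""
    for t in range(limit + 1):
        reason = abort_reason(now=t, last_progress=0,
                              progress_timeout=progress_timeout,
                              elapsed=t,
                              completion_timeout=completion_timeout)
        if reason:
            return t, reason
    return None


print(first_abort(150, 800))  # progress check fires first
print(first_abort(0, 800))    # progress check disabled in this sketch
```

With the reported values, a migration that shows no progress is
cancelled by the progress check long before the completion check could
fire, which matches the "stuck for 150 sec" warning in the log.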

As far as I can tell, this is how Nova is supposed to work and this is
not a bug. Is the issue that you think progress and completion should be
mutually exclusive? Would you prefer to have an option to not consider
progress and only consider completion time? What about increasing the
progress_timeout value or considering the root cause of the transfer
latency?

The should_abort() method is called here, in context:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L6148

And the actual should_abort() implementation using progress_timeout and completion_timeout is here:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/migration.py#L206

And here is the original commit:
https://review.openstack.org/gitweb?p=openstack/nova.git;a=commitdiff;h=b6b92db1874cc6485a234f176355ef3b2cbee15d

I'm going to close this as invalid for now, but if you have more
information that addresses some of the things I've raised above, please
feel free to reopen this bug with that information.


** Changed in: nova
       Status: New => Invalid

** Summary changed:

- live-migration monitoring is not working properly
+ Live migration of large ephemeral disks result in progress_timeout

** Summary changed:

- Live migration of large ephemeral disks result in progress_timeout
+ Live migration of large ephemeral disks result in progress_timeout not completion_timeout

** Summary changed:

- Live migration of large ephemeral disks result in progress_timeout not completion_timeout
+ Live migration of large ephemeral disks result in progress_timeout

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1583145

Title:
  Live migration of large ephemeral disks result in progress_timeout

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Live-migration monitoring is based on the memory migration only. This
  means that if a live-migration takes a long time to migrate a disk,
  only the live_migration_progress_timeout parameter is taken into
  account, and it overrides the live_migration_completion_timeout
  parameter. In other words, because live_migration_progress_timeout is
  logically smaller than live_migration_completion_timeout, the latter
  is never used except when the disk migration is fast and the memory
  migration is slow.

  Steps to reproduce:
  - live-migrate an instance with lots of data on its ephemeral disk
  - observe that the live-migration is aborted because of live_migration_progress_timeout only.

  A bit of log with the original values (live_migration_progress_timeout
  = 150 seconds and live_migration_completion_timeout = 800 seconds):

  2016-05-25 08:59:30.344 3384 DEBUG nova.virt.libvirt.driver [req-3b2aab7c-5f43-494a-b9d2-a02a35d22fa1 c3eeb0123cf644889e157543da99ae48 b2657f1b7b86474baccf55faac526e5a] [instance: 46495844-fd23-448b-9e61-c5ffdb636155] Migration running for 145 secs, memory 100% remaining; (bytes processed=0, remaining=0, total=0) _live_migration_monitor /opt/stack/venv/nova-20160511T210741Z/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:6422
  2016-05-25 08:59:30.845 3384 WARNING nova.virt.libvirt.driver [req-3b2aab7c-5f43-494a-b9d2-a02a35d22fa1 c3eeb0123cf644889e157543da99ae48 b2657f1b7b86474baccf55faac526e5a] [instance: 46495844-fd23-448b-9e61-c5ffdb636155] Live migration stuck for 150 sec
  2016-05-25 08:59:31.590 3384 DEBUG nova.virt.libvirt.driver [req-3b2aab7c-5f43-494a-b9d2-a02a35d22fa1 c3eeb0123cf644889e157543da99ae48 b2657f1b7b86474baccf55faac526e5a] [instance: 46495844-fd23-448b-9e61-c5ffdb636155] Live migration monitoring is all done _live_migration /opt/stack/venv/nova-20160511T210741Z/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:6526
  2016-05-25 08:59:31.818 3384 ERROR nova.virt.libvirt.driver [req-3b2aab7c-5f43-494a-b9d2-a02a35d22fa1 c3eeb0123cf644889e157543da99ae48 b2657f1b7b86474baccf55faac526e5a] [instance: 46495844-fd23-448b-9e61-c5ffdb636155] Live Migration failure: operation aborted: migration out: canceled by client

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1583145/+subscriptions

