yahoo-eng-team team mailing list archive

[Bug 1644248] [NEW] Nova incorrectly tracks live migration progress

 

Public bug reported:

While monitoring live migration progress, Nova relies on the value that
libvirt reports in the data_remaining property:

https://github.com/openstack/nova/blob/54482fde22742bc852414c58552fe64ea59d61d5/nova/virt/libvirt/driver.py#L6189-L6193

However, data_remaining does not give Nova any useful information for
tracking live migration progress. It only indicates how much data still
has to be transferred in the current iteration before that iteration
ends and the VM can be checked for switchover to the destination,
nothing more.

As an example, assume we have a VM with 4 GB of memory. In the very
first iteration libvirt will report that there is still 4 GB of data to
be transferred. During the first iteration this number will go down to
0 bytes (or almost 0), and this will end the first iteration. Let's say
that during the first iteration the VM has dirtied 3 GB of memory. At
the beginning of the next iteration QEMU will calculate the number of
dirty pages * page size, and libvirt will report 3 GB of data to be
transferred in the second iteration. During the second iteration,
however, data_remaining will again go down to zero by the end of that
iteration.
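
The pattern in this example can be reproduced with a small toy model.
This is only a sketch: the function name data_remaining_trace, the
linear drain within an iteration, and the number of samples are
assumptions made for illustration and are not taken from QEMU, libvirt,
or Nova.

```python
GIB = 1024 ** 3


def data_remaining_trace(iteration_sizes, samples_per_iteration=4):
    """Return (iteration, data_remaining) samples for a toy pre-copy run.

    iteration_sizes[i] is the number of bytes queued at the start of
    iteration i + 1: total guest memory first, then whatever the guest
    dirtied during the previous iteration. (Hypothetical model.)
    """
    trace = []
    for number, size in enumerate(iteration_sizes, start=1):
        # Within one iteration the counter drains from `size` to 0.
        for step in range(samples_per_iteration + 1):
            remaining = size - size * step // samples_per_iteration
            trace.append((number, remaining))
    return trace


# 4 GiB guest, 3 GiB dirtied during the first iteration.
trace = data_remaining_trace([4 * GIB, 3 * GIB])
for number, remaining in trace:
    print(f"iteration {number}: {remaining / GIB:.2f} GiB remaining")
```

The printed values fall to zero at the end of the first iteration and
then jump back up at the start of the second, so a single sample says
nothing about how far the migration as a whole has come.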

Given that Nova takes a snapshot of this information once every 0.5
seconds, and that the data remaining reported by libvirt reflects only
the data remaining in a particular iteration, we cannot tell whether the
live migration (LM) is progressing or not. The live migration progress
timeout therefore does not make sense: Nova can take a snapshot during
the first iteration that says only 150 MB remain to be transferred to
the destination, and very likely it will never take a snapshot with a
smaller amount in any subsequent iteration, so it will conclude that
the LM is not progressing.
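
The failure mode can be sketched as follows, assuming the progress
check keeps the smallest data_remaining value seen so far and restarts
its timer only when a smaller value appears. The helper
seconds_since_progress, the sample values, and the 150 second timeout
are hypothetical and chosen only to illustrate the argument; this is
not Nova's actual code.

```python
MIB = 1024 ** 2


def seconds_since_progress(samples):
    """Return the time elapsed since the low watermark last dropped.

    samples is a list of (timestamp_seconds, data_remaining_bytes)
    tuples in chronological order. (Hypothetical helper.)
    """
    watermark = None
    progress_time = None
    for now, remaining in samples:
        # "Progress" is recorded only when a new minimum is observed.
        if watermark is None or remaining < watermark:
            watermark = remaining
            progress_time = now
    return samples[-1][0] - progress_time


samples = [
    (0.0, 4096 * MIB),    # iteration 1 starts
    (5.0, 2048 * MIB),
    (10.0, 150 * MIB),    # lucky snapshot near the end of iteration 1
    (10.5, 3072 * MIB),   # iteration 2 starts with the dirtied memory
    (20.0, 200 * MIB),
    (20.5, 2500 * MIB),   # iteration 3 starts
    (170.0, 180 * MIB),   # still above the 150 MiB watermark
]

PROGRESS_TIMEOUT = 150  # seconds, assumed value for this sketch
stalled = seconds_since_progress(samples) > PROGRESS_TIMEOUT
print(seconds_since_progress(samples), stalled)
```

Although each later iteration starts with less data than the one
before, none of the later samples undercuts the 150 MiB snapshot, so
under this assumption the migration is flagged as stalled.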

This affects all releases starting from Liberty.

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: live-migration

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1644248

Title:
  Nova incorrectly tracks live migration progress

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1644248/+subscriptions

