← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1317792] [NEW] XenAPI: Race - import followed by upload / waiting for snapshot coalesce

 

Public bug reported:

In a nutshell, so everyone doesn't have to read the rest of the bug description:
If a tree is imported with more than two VDIs, then the _snapshot_attached_here method will never succeed as the condition it is waiting for (the parent being equal to the original parent again) will not occur.

The following tree is imported from glance:
May  8 13:41:10 localhost SMGC: [14336]         *814ea7ed(40.000M/40.079M)
May  8 13:41:10 localhost SMGC: [14336]             *863cc348(40.000M/2.008M)
May  8 13:41:10 localhost SMGC: [14336]                 eccebf0b(40.000M/2.008M)

If we immediately try to perform an operation that uses a snapshot (e.g. resize), we get the 'original' parent as 863cc348 - which means that later we will expect 863cc348 to be the new parent:
2014-05-08 13:41:23.769 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] VHD eccebf0b-00b9-4df5-b558-8feaf835e850 has parent 863cc348-7a28-4dd0-a976-142eb887256c from (pid=19364) _get_vhd_parent_uuid /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2026

We then take a snapshot and wait for coalesce so we have as few VHDs in the chain as possible:
May  8 13:41:24 localhost SM: [24744] ['/usr/sbin/td-util', 'snapshot', 'vhd', '/var/run/sr-mount/16e0a27c-007c-d083-7459-ee038c92f8a4/eccebf0b-00b9-4df5-b558-8feaf835e850.vhd', '151c921e-dc28-4a66-b286-65cd43bf9a6f.vhd']
2014-05-08 13:41:26.688 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] VHD eccebf0b-00b9-4df5-b558-8feaf835e850 has parent 151c921e-dc28-4a66-b286-65cd43bf9a6f from (pid=19364) _get_vhd_parent_uuid /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2026
2014-05-08 13:41:26.688 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] [instance: 7f8db936-9634-47db-852e-7a43bbc992eb] Parent 151c921e-dc28-4a66-b286-65cd43bf9a6f doesn't match original parent 863cc348-7a28-4dd0-a976-142eb887256c, waiting for coalesce... from (pid=19364) _wait_for_vhd_coalesce /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2119

The snapshot has the effect of creating two children (ecce is the one we're referencing and b554 keeps the snapshot from being deleted):
May  8 13:41:34 localhost SMGC: [14336]         *814ea7ed(40.000M/40.079M)
May  8 13:41:34 localhost SMGC: [14336]             *863cc348(40.000M/2.008M)
May  8 13:41:34 localhost SMGC: [14336]                 *151c921e(40.000M/14.032M)
May  8 13:41:34 localhost SMGC: [14336]                     b55436a7(40.000M/5.000K)
May  8 13:41:34 localhost SMGC: [14336]                     eccebf0b(40.000M/4.012M)

Coalescing then happens, destroying first 863cc348 then 151c921, merging the differences up the chain into 814ea7ed:
May  8 13:41:59 localhost SMGC: [14336]         *814ea7ed(40.000M/40.079M)
May  8 13:41:59 localhost SMGC: [14336]             *863cc348(40.000M/2.008M)
May  8 13:41:59 localhost SMGC: [14336]             *151c921e(40.000M/14.032M)
May  8 13:41:59 localhost SMGC: [14336]                 b55436a7(40.000M/5.000K)
May  8 13:41:59 localhost SMGC: [14336]                 eccebf0b(40.000M/16.036M)

May  8 13:42:11 localhost SMGC: [14336]         *814ea7ed(40.000M/40.079M)
May  8 13:42:11 localhost SMGC: [14336]             *151c921e(40.000M/14.032M)
May  8 13:42:11 localhost SMGC: [14336]                 b55436a7(40.000M/5.000K)
May  8 13:42:11 localhost SMGC: [14336]                 eccebf0b(40.000M/16.036M)

May  8 13:42:36 localhost SMGC: [5186]         *814ea7ed(40.000M/40.079M)
May  8 13:42:36 localhost SMGC: [5186]             b55436a7(40.000M/5.000K)
May  8 13:42:36 localhost SMGC: [5186]             eccebf0b(40.000M/16.036M)

During this time, we're waiting for coalesce and checking the parent.  Eventually we get:
2014-05-08 13:42:41.981 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] VHD eccebf0b-00b9-4df5-b558-8feaf835e850 has parent 814ea7ed-4b99-44cc-8e89-1a8d2e5509bd from (pid=19364) _get_vhd_parent_uuid /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2026
2014-05-08 13:42:41.982 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] [instance: 7f8db936-9634-47db-852e-7a43bbc992eb] Parent 814ea7ed-4b99-44cc-8e89-1a8d2e5509bd doesn't match original parent 863cc348-7a28-4dd0-a976-142eb887256c, waiting for coalesce... from (pid=19364) _wait_for_vhd_coalesce /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2119

2014-05-08 13:44:00.678 ERROR nova.virt.xenapi.vmops [req-62887aaa-cd8a-
43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595
ServerDiskConfigTestJSON-1956337666] [instance: 7f8db936-9634-47db-852e-
7a43bbc992eb] _migrate_disk_resizing_up failed. Restoring orig vm
due_to: VHD coalesce attempts exceeded (20), giving up....

We're lucky in a way that 863cc348 was destroyed before 151c921, because
otherwise this race might not have been noticed if Nova picked the right
time to claim all coalescing had finished - which could have given more
problems when uploading the VHDs.

** Affects: nova
     Importance: Medium
         Status: New


** Tags: xenserver

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1317792

Title:
  XenAPI: Race - import followed by upload / waiting for snapshot
  coalesce

Status in OpenStack Compute (Nova):
  New

Bug description:
  In a nutshell, so everyone doesn't have to read the rest of the bug description:
  If a tree is imported with more than two VDIs, then the _snapshot_attached_here method will never succeed as the condition it is waiting for (the parent being equal to the original parent again) will not occur.

  The following tree is imported from glance:
  May  8 13:41:10 localhost SMGC: [14336]         *814ea7ed(40.000M/40.079M)
  May  8 13:41:10 localhost SMGC: [14336]             *863cc348(40.000M/2.008M)
  May  8 13:41:10 localhost SMGC: [14336]                 eccebf0b(40.000M/2.008M)

  If we immediately try to perform an operation that uses a snapshot (e.g. resize), we get the 'original' parent as 863cc348 - which means that later we will expect 863cc348 to be the new parent:
  2014-05-08 13:41:23.769 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] VHD eccebf0b-00b9-4df5-b558-8feaf835e850 has parent 863cc348-7a28-4dd0-a976-142eb887256c from (pid=19364) _get_vhd_parent_uuid /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2026

  We then take a snapshot and wait for coalesce so we have as few VHDs in the chain as possible:
  May  8 13:41:24 localhost SM: [24744] ['/usr/sbin/td-util', 'snapshot', 'vhd', '/var/run/sr-mount/16e0a27c-007c-d083-7459-ee038c92f8a4/eccebf0b-00b9-4df5-b558-8feaf835e850.vhd', '151c921e-dc28-4a66-b286-65cd43bf9a6f.vhd']
  2014-05-08 13:41:26.688 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] VHD eccebf0b-00b9-4df5-b558-8feaf835e850 has parent 151c921e-dc28-4a66-b286-65cd43bf9a6f from (pid=19364) _get_vhd_parent_uuid /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2026
  2014-05-08 13:41:26.688 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] [instance: 7f8db936-9634-47db-852e-7a43bbc992eb] Parent 151c921e-dc28-4a66-b286-65cd43bf9a6f doesn't match original parent 863cc348-7a28-4dd0-a976-142eb887256c, waiting for coalesce... from (pid=19364) _wait_for_vhd_coalesce /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2119

  The snapshot has the effect of creating two children (ecce is the one we're referencing and b554 keeps the snapshot from being deleted):
  May  8 13:41:34 localhost SMGC: [14336]         *814ea7ed(40.000M/40.079M)
  May  8 13:41:34 localhost SMGC: [14336]             *863cc348(40.000M/2.008M)
  May  8 13:41:34 localhost SMGC: [14336]                 *151c921e(40.000M/14.032M)
  May  8 13:41:34 localhost SMGC: [14336]                     b55436a7(40.000M/5.000K)
  May  8 13:41:34 localhost SMGC: [14336]                     eccebf0b(40.000M/4.012M)

  Coalescing then happens, destroying first 863cc348 then 151c921, merging the differences up the chain into 814ea7ed:
  May  8 13:41:59 localhost SMGC: [14336]         *814ea7ed(40.000M/40.079M)
  May  8 13:41:59 localhost SMGC: [14336]             *863cc348(40.000M/2.008M)
  May  8 13:41:59 localhost SMGC: [14336]             *151c921e(40.000M/14.032M)
  May  8 13:41:59 localhost SMGC: [14336]                 b55436a7(40.000M/5.000K)
  May  8 13:41:59 localhost SMGC: [14336]                 eccebf0b(40.000M/16.036M)

  May  8 13:42:11 localhost SMGC: [14336]         *814ea7ed(40.000M/40.079M)
  May  8 13:42:11 localhost SMGC: [14336]             *151c921e(40.000M/14.032M)
  May  8 13:42:11 localhost SMGC: [14336]                 b55436a7(40.000M/5.000K)
  May  8 13:42:11 localhost SMGC: [14336]                 eccebf0b(40.000M/16.036M)

  May  8 13:42:36 localhost SMGC: [5186]         *814ea7ed(40.000M/40.079M)
  May  8 13:42:36 localhost SMGC: [5186]             b55436a7(40.000M/5.000K)
  May  8 13:42:36 localhost SMGC: [5186]             eccebf0b(40.000M/16.036M)

  During this time, we're waiting for coalesce and checking the parent.  Eventually we get:
  2014-05-08 13:42:41.981 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] VHD eccebf0b-00b9-4df5-b558-8feaf835e850 has parent 814ea7ed-4b99-44cc-8e89-1a8d2e5509bd from (pid=19364) _get_vhd_parent_uuid /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2026
  2014-05-08 13:42:41.982 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] [instance: 7f8db936-9634-47db-852e-7a43bbc992eb] Parent 814ea7ed-4b99-44cc-8e89-1a8d2e5509bd doesn't match original parent 863cc348-7a28-4dd0-a976-142eb887256c, waiting for coalesce... from (pid=19364) _wait_for_vhd_coalesce /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2119

  2014-05-08 13:44:00.678 ERROR nova.virt.xenapi.vmops [req-62887aaa-
  cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595
  ServerDiskConfigTestJSON-1956337666] [instance: 7f8db936-9634-47db-
  852e-7a43bbc992eb] _migrate_disk_resizing_up failed. Restoring orig vm
  due_to: VHD coalesce attempts exceeded (20), giving up....

  We're lucky in a way that 863cc348 was destroyed before 151c921,
  because otherwise this race might not have been noticed if Nova picked
  the right time to claim all coalescing had finished - which could have
  given more problems when uploading the VHDs.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1317792/+subscriptions


Follow ups

References