[Bug 1317792] Re: XenAPI: Race - import followed by upload / waiting for snapshot coalesce
** Changed in: nova
Status: Fix Committed => Fix Released
** Changed in: nova
Milestone: None => juno-1
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1317792
Title:
XenAPI: Race - import followed by upload / waiting for snapshot
coalesce
Status in OpenStack Compute (Nova):
Fix Released
Bug description:
In a nutshell, so everyone doesn't have to read the rest of the bug description:
If a VHD tree with more than two VDIs is imported, the _snapshot_attached_here method will never succeed, because the condition it waits for (the VDI's direct parent becoming equal to the original parent again) can never occur.
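For reference, the waiting logic that gets stuck behaves roughly like the following minimal sketch; the retry limit of 20 and the behaviour match the log messages below, but the names and signature here are assumptions for illustration, not the exact nova implementation:

import time

def wait_for_vhd_coalesce(get_parent_uuid, original_parent_uuid,
                          max_attempts=20, interval=5):
    # get_parent_uuid: callable returning the current 'vhd-parent' UUID of
    # the VDI being watched (the role _get_vhd_parent_uuid plays below).
    for _attempt in range(max_attempts):
        if get_parent_uuid() == original_parent_uuid:
            # Success requires the *direct* parent to become the original
            # parent again.  Once the imported chain has more than two VDIs,
            # the original parent itself is coalesced away, so this
            # condition can never be met.
            return
        time.sleep(interval)
    raise RuntimeError(
        "VHD coalesce attempts exceeded (%d), giving up..." % max_attempts)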
The following tree is imported from glance:
May 8 13:41:10 localhost SMGC: [14336]........ *814ea7ed(40.000M/40.079M)
May 8 13:41:10 localhost SMGC: [14336]............ *863cc348(40.000M/2.008M)
May 8 13:41:10 localhost SMGC: [14336]................ eccebf0b(40.000M/2.008M)
If we immediately try to perform an operation that uses a snapshot (e.g. a resize), we record the 'original' parent as 863cc348, which means that later we will expect 863cc348 to become the VDI's parent again:
2014-05-08 13:41:23.769 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] VHD eccebf0b-00b9-4df5-b558-8feaf835e850 has parent 863cc348-7a28-4dd0-a976-142eb887256c from (pid=19364) _get_vhd_parent_uuid /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2026
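The parent UUID reported in that log line comes from the VDI's sm_config. A minimal sketch of such a lookup, assuming a plain XenAPI session and a VHD-based SR that publishes a 'vhd-parent' key (not the exact nova helper):

def get_vhd_parent_uuid(session, vdi_ref):
    # Assumption: the storage manager exposes the VHD parent UUID under the
    # 'vhd-parent' key of the VDI's sm_config, as reported by
    # _get_vhd_parent_uuid above; None is returned at the root of the chain.
    vdi_rec = session.xenapi.VDI.get_record(vdi_ref)
    return vdi_rec['sm_config'].get('vhd-parent')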
We then take a snapshot and wait for the chain to coalesce so that it contains as few VHDs as possible:
May 8 13:41:24 localhost SM: [24744] ['/usr/sbin/td-util', 'snapshot', 'vhd', '/var/run/sr-mount/16e0a27c-007c-d083-7459-ee038c92f8a4/eccebf0b-00b9-4df5-b558-8feaf835e850.vhd', '151c921e-dc28-4a66-b286-65cd43bf9a6f.vhd']
2014-05-08 13:41:26.688 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] VHD eccebf0b-00b9-4df5-b558-8feaf835e850 has parent 151c921e-dc28-4a66-b286-65cd43bf9a6f from (pid=19364) _get_vhd_parent_uuid /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2026
2014-05-08 13:41:26.688 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] [instance: 7f8db936-9634-47db-852e-7a43bbc992eb] Parent 151c921e-dc28-4a66-b286-65cd43bf9a6f doesn't match original parent 863cc348-7a28-4dd0-a976-142eb887256c, waiting for coalesce... from (pid=19364) _wait_for_vhd_coalesce /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2119
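Tying these log excerpts together, the operation follows roughly this sequence, reusing the assumed helpers sketched above; session and vdi_ref stand for the XenAPI session and the instance's VDI, and the VDI.snapshot call shape is likewise an assumption:

# 1. Record the 'original' parent before snapshotting (863cc348 in the logs).
original_parent = get_vhd_parent_uuid(session, vdi_ref)

# 2. Take the snapshot; the storage manager inserts a new read-only node
#    (151c921e) between the VDI and its old parent.
snapshot_ref = session.xenapi.VDI.snapshot(vdi_ref, {})

# 3. Wait for the new node to coalesce away, i.e. for the VDI's direct
#    parent to become the original parent again.
wait_for_vhd_coalesce(
    lambda: get_vhd_parent_uuid(session, vdi_ref), original_parent)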
The snapshot has the effect of creating two children of the new node 151c921e (eccebf0b is the one we're referencing, and b55436a7 keeps the snapshot from being deleted):
May 8 13:41:34 localhost SMGC: [14336]........ *814ea7ed(40.000M/40.079M)
May 8 13:41:34 localhost SMGC: [14336]............ *863cc348(40.000M/2.008M)
May 8 13:41:34 localhost SMGC: [14336]................ *151c921e(40.000M/14.032M)
May 8 13:41:34 localhost SMGC: [14336].................... b55436a7(40.000M/5.000K)
May 8 13:41:34 localhost SMGC: [14336].................... eccebf0b(40.000M/4.012M)
Coalescing then happens, destroying first 863cc348 and then 151c921e, merging the differences up the chain into 814ea7ed:
May 8 13:41:59 localhost SMGC: [14336]........ *814ea7ed(40.000M/40.079M)
May 8 13:41:59 localhost SMGC: [14336]............ *863cc348(40.000M/2.008M)
May 8 13:41:59 localhost SMGC: [14336]............ *151c921e(40.000M/14.032M)
May 8 13:41:59 localhost SMGC: [14336]................ b55436a7(40.000M/5.000K)
May 8 13:41:59 localhost SMGC: [14336]................ eccebf0b(40.000M/16.036M)
May 8 13:42:11 localhost SMGC: [14336]........ *814ea7ed(40.000M/40.079M)
May 8 13:42:11 localhost SMGC: [14336]............ *151c921e(40.000M/14.032M)
May 8 13:42:11 localhost SMGC: [14336]................ b55436a7(40.000M/5.000K)
May 8 13:42:11 localhost SMGC: [14336]................ eccebf0b(40.000M/16.036M)
May 8 13:42:36 localhost SMGC: [5186]........ *814ea7ed(40.000M/40.079M)
May 8 13:42:36 localhost SMGC: [5186]............ b55436a7(40.000M/5.000K)
May 8 13:42:36 localhost SMGC: [5186]............ eccebf0b(40.000M/16.036M)
During this time, we're waiting for coalesce and checking the parent. Eventually we reach a steady state in which Nova is waiting for the parent of eccebf0b to become the now-deleted VDI 863cc348, which can never happen.
2014-05-08 13:42:41.981 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] VHD eccebf0b-00b9-4df5-b558-8feaf835e850 has parent 814ea7ed-4b99-44cc-8e89-1a8d2e5509bd from (pid=19364) _get_vhd_parent_uuid /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2026
2014-05-08 13:42:41.982 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] [instance: 7f8db936-9634-47db-852e-7a43bbc992eb] Parent 814ea7ed-4b99-44cc-8e89-1a8d2e5509bd doesn't match original parent 863cc348-7a28-4dd0-a976-142eb887256c, waiting for coalesce... from (pid=19364) _wait_for_vhd_coalesce /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2119
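To make the dead end concrete: in the 13:42:36 tree above, walking eccebf0b's ancestry only ever yields 814ea7ed, so neither the direct parent nor any ancestor can be 863cc348 again. A purely illustrative walk, using an assumed UUID-to-parent lookup:

def parent_chain(get_parent_uuid_for, leaf_uuid):
    # get_parent_uuid_for: callable mapping a VDI UUID to its parent UUID,
    # or None at the root of the chain (an assumed helper).
    parent = get_parent_uuid_for(leaf_uuid)
    while parent is not None:
        yield parent
        parent = get_parent_uuid_for(parent)

# For eccebf0b this now yields just 814ea7ed, so the original parent
# 863cc348 that Nova keeps comparing against is gone for good.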
2014-05-08 13:44:00.678 ERROR nova.virt.xenapi.vmops [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] [instance: 7f8db936-9634-47db-852e-7a43bbc992eb] _migrate_disk_resizing_up failed. Restoring orig vm due_to: VHD coalesce attempts exceeded (20), giving up....
In a way we're lucky that 863cc348 was destroyed before 151c921e: had the order been reversed, eccebf0b's parent would briefly have matched the original parent again, Nova could have picked that moment to conclude that all coalescing had finished, and this race might not have been noticed, which could have caused further problems when uploading the VHDs.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1317792/+subscriptions