yahoo-eng-team team mailing list archive
Message #14413
[Bug 1319797] [NEW] Restarting destination compute manager during live-migration can cause instance data loss
Public bug reported:
During compute manager startup, init_host is called. One of its tasks is to delete instance data that does not belong to this host, i.e. _destroy_evacuated_instances. However, this function only checks whether the local instance belongs to the host; it does not check the task_state.
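For illustration, here is a minimal Python sketch of the behaviour described above. This is not the actual nova.compute.manager code; the function signature and field names are simplified assumptions:

# Illustrative sketch only -- simplified from the behaviour described in this
# bug report, not the real Nova implementation. Names are assumptions.
def _destroy_evacuated_instances_sketch(local_instances, my_host, driver):
    """Remove local data for instances that appear not to belong to this host.

    Only the instance's host field is consulted; task_state is ignored,
    which is the root of the problem described above.
    """
    for instance in local_instances:
        if instance['host'] != my_host:
            # A live-migrating instance still has host == source host, so on
            # a restarted destination it looks "evacuated" and is destroyed,
            # taking its local disks with it.
            driver.destroy(instance)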
If a live-migration is in progress and the destination compute manager is restarted, it will treat the migrating instance as not belonging to the host and destroy it. This can result in two outcomes:
1. If the live-migration is still in progress, the source hypervisor hangs, so a rollback can be triggered by killing the job.
2. However, if the live-migration has completed and post-live-migration-destination has already been messaged, then by the time the compute manager gets to processing the message the instance data has been deleted. Subsequent periodic tasks only get as far as defining the VM, but there are no disks left.
2014-05-08 20:42:33.058 16724 WARNING nova.virt.libvirt.driver [-] Periodic task is updating the host stat, it is trying to get disk instance-00000002, but disk file was removed by concurrent operations such as resize.
2014-05-08 20:43:33.370 16724 WARNING nova.virt.libvirt.driver [-] Periodic task is updating the host stat, it is trying to get disk instance-00000002, but disk file was removed by concurrent operations such as resize.
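One possible direction for a fix, sketched here purely as an assumption and not as an actual or proposed Nova patch, would be to skip instances whose task_state indicates an in-flight migration before destroying their data:

# Sketch of a possible guard; the task_state value and the structure are
# assumptions for illustration, not actual Nova code.
MIGRATION_TASK_STATES = {'migrating'}

def _destroy_evacuated_instances_guarded(local_instances, my_host, driver):
    for instance in local_instances:
        if instance['host'] == my_host:
            continue
        if instance['task_state'] in MIGRATION_TASK_STATES:
            # Instance is mid live-migration towards this host; leave its
            # data alone and let the migration finish or roll back.
            continue
        driver.destroy(instance)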
Steps to reproduce (a scripted sketch follows the list):
1. Start live-migration
2. Wait for pre-live-migration to define the destination VM
3. Restart destination compute manager
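For convenience, these steps can be scripted roughly as follows with python-novaclient. This is a sketch under assumptions: the credentials, endpoint, instance name and destination host are placeholders, and the exact live_migrate arguments may need adjusting for your deployment:

# Reproduction sketch using python-novaclient; all values below are
# placeholders and the live_migrate arguments are assumptions.
from novaclient.v1_1 import client

nova = client.Client('admin', 'password', 'admin',
                     'http://keystone.example.com:5000/v2.0')

server = nova.servers.find(name='test-instance')  # hypothetical instance

# Step 1: start the live-migration towards the destination host.
nova.servers.live_migrate(server, 'destination-host',
                          block_migration=True, disk_over_commit=False)

# Step 2: wait for pre-live-migration to define the VM on the destination,
# e.g. by watching `virsh list --all` or the nova-compute log there.

# Step 3: restart the destination compute manager on that host, e.g.
#   service nova-compute restart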
To see what happens in case 2 (live-migration having completed), put a breakpoint in init_host, delay until the instance is running on the destination, and then let nova-compute continue. In this case you end up with an instance directory like this:
ls -l 06ddbe13-577b-4f9f-ac52-0c038aec04d8
total 8
-rw-r--r-- 1 root root 89 May 8 19:59 disk.info
-rw-r--r-- 1 root root 1548 May 8 19:59 libvirt.xml
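A small helper along the following lines can be used to spot instances in this broken state (libvirt.xml still defined but the root disk gone). The /var/lib/nova/instances path is the assumed default instances_path, and 'disk' is the usual root disk file name for the libvirt driver; adjust both if your deployment differs:

# Quick check for instance directories that kept libvirt.xml but lost their
# root disk. Path and file names are assumptions based on default settings.
import os

INSTANCES_PATH = '/var/lib/nova/instances'

for entry in sorted(os.listdir(INSTANCES_PATH)):
    inst_dir = os.path.join(INSTANCES_PATH, entry)
    if not os.path.isdir(inst_dir):
        continue
    files = set(os.listdir(inst_dir))
    if 'libvirt.xml' in files and 'disk' not in files:
        print('%s: libvirt.xml present but root disk missing' % entry)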
I verified this in a tripleo devtest environment.
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1319797
Title:
Restarting destination compute manager during live-migration can cause
instance data loss
Status in OpenStack Compute (Nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1319797/+subscriptions