yahoo-eng-team team mailing list archive
Message #07412
[Bug 1195947] Re: VM re-scheduler mechanism will cause BDM-volumes conflict
** Changed in: nova/havana
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1195947
Title:
VM re-scheduler mechanism will cause BDM-volumes conflict
Status in OpenStack Compute (Nova):
In Progress
Status in OpenStack Compute (nova) havana series:
Fix Released
Bug description:
Due to the re-scheduler mechanism, when a user mistakenly tries to
create an instance using a volume that is already in use by another
instance, the error is correctly detected, but the recovery code
incorrectly affects the original instance.
Nova needs to raise an exception directly when this situation occurs.
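A minimal sketch of the idea behind the fix (hypothetical code, not the actual patch; it assumes Nova's exception module and a volume dict carrying "id" and "status" fields):

    from nova import exception

    def _validate_bdm_volume(volume):
        # Fail the boot request immediately instead of letting the
        # re-scheduler roll back state that belongs to another instance.
        if volume['status'] != 'available':
            raise exception.InvalidVolume(
                reason="volume %s is already attached (status: %s)"
                       % (volume['id'], volume['status']))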
------------------------
We can create VM1 with BDM volumes (for example, one volume, which we will call “Vol-1”).
But when the attached volume (Vol-1) is passed in the BDM parameters to
create a new VM2, the re-scheduler mechanism causes the volume to be
re-attached to the new VM2 in Nova & Cinder, instead of raising an
“InvalidVolume” exception (“Vol-1 is already attached to VM1”).
In fact, Vol-1 is attached to both VM1 and VM2 at the hypervisor level.
But when you operate on Vol-1 from VM1, you cannot see any
corresponding changes on VM2.
I reproduced it and wrote up the steps in a doc; please check the
attachment for details. A rough outline of the reproduction follows.
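(This sketch uses python-novaclient; the client version, credential placeholders, and the BDM string format "<volume-id>:<type>:<size>:<delete_on_termination>" are assumptions from memory and may differ by release.)

    from novaclient.v1_1 import client

    nova = client.Client(USER, PASSWORD, TENANT, AUTH_URL)

    # Boot VM1 from Vol-1 via a block device mapping.
    nova.servers.create("VM1", image=None, flavor=FLAVOR_ID,
                        block_device_mapping={"vda": VOL1_ID + ":::0"})

    # Booting VM2 with the same, now "in-use", volume should raise
    # InvalidVolume, but instead it exercises the faulty re-schedule
    # path and steals Vol-1 from VM1.
    nova.servers.create("VM2", image=None, flavor=FLAVOR_ID,
                        block_device_mapping={"vda": VOL1_ID + ":::0"})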
-------------------------
I checked the Nova code; the problem is caused by the VM re-scheduler mechanism:
Nova checks the state of BDM volumes in Cinder [def
_setup_block_device_mapping() in manager.py]. If any volume is
“in-use”, the request fails and triggers a VM re-schedule.
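Roughly, that check looks like this (a simplified paraphrase, not the exact Nova code; volume_api.check_attach() is the Cinder-facing helper that rejects volumes that are not “available”):

    def _setup_block_device_mapping(self, context, instance, bdms):
        for bdm in bdms:
            if bdm['volume_id'] is None:
                continue
            volume = self.volume_api.get(context, bdm['volume_id'])
            # Raises if the volume is not "available"; the caller treats
            # this as a build failure and triggers a re-schedule.
            self.volume_api.check_attach(context, volume)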
According to the existing flow in Nova, before re-scheduling, it shuts
down the VM and detaches all BDM volumes in Cinder as a rollback
[def _shutdown_instance() in manager.py]. As a result, the state of
Vol-1 changes from “in-use” to “available” in Cinder. However, no
detach operation happens on the Nova (hypervisor) side.
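A simplified paraphrase of that rollback path (again, not the exact code) shows why the Cinder and hypervisor views diverge:

    def _shutdown_instance(self, context, instance, bdms):
        for bdm in bdms:
            if bdm['volume_id'] is None:
                continue
            # This only flips the Cinder record from "in-use" back to
            # "available"; no detach is performed on the compute host,
            # so Vol-1 stays attached to VM1 at the hypervisor level.
            self.volume_api.detach(context, bdm['volume_id'])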
Therefore, after re-scheduling, the second attempt to create VM2
passes the BDM-volume check, and all of VM1’s BDM volumes (Vol-1) are
taken over by VM2 and recorded as such in the Nova & Cinder DBs. But
Vol-1 is still attached to VM1 on the hypervisor, and it will also be
attached to VM2 once the VM creation succeeds.
---------------
Moreover, the problem described above occurs when “delete_on_termination” of the BDMs is “False”. If the flag is “True”, all BDM volumes are instead deleted in Cinder, because their states were already changed from “in-use” to “available” beforehand [def _cleanup_volumes() in manager.py]. A simplified sketch of that cleanup path follows.
(P.S. Whether the deletion succeeds depends on the specific Cinder driver implementation.)
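(This is a paraphrase of the cleanup logic, not the exact Nova code:)

    def _cleanup_volumes(self, context, instance_uuid, bdms):
        for bdm in bdms:
            if bdm['volume_id'] and bdm['delete_on_termination']:
                # Because the earlier rollback already marked the volume
                # "available" in Cinder, the delete request is accepted,
                # destroying a volume that VM1 is still using.
                self.volume_api.delete(context, bdm['volume_id'])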
Thanks.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1195947/+subscriptions