openstack team mailing list archive

instance evacuation from a failed node (rebuild for HA)

 

Dear all,

We have submitted a patch https://review.openstack.org/#/c/11086/ to 
address https://blueprints.launchpad.net/nova/+spec/rebuild-for-ha that 
simplifies recovery from a node failure by introducing an API that 
recreates an instance on *another* host (similar to the existing instance 
'rebuild' operation). The exact semantics of this operation vary 
depending on the configuration of the instance and the underlying storage 
topology. For example, for a regular 'ephemeral' instance, invoking the API 
will respawn it from the same image on another node while retaining the same 
identity and configuration (e.g. same ID, flavor, IP, attached volumes, 
etc). For instances running off shared storage (i.e. the same instance file 
is accessible on the target host), the VM will be re-created and pointed at 
the same instance file while retaining its identity and configuration. More 
details are available at http://wiki.openstack.org/Evacuate. 
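
To make the two cases above concrete, here is a minimal sketch of the 
intended semantics. It is not code from the patch: the Instance record, its 
field names, and the evacuate() helper are hypothetical and only model which 
properties are retained and how the disk is handled.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Instance:
    # Hypothetical record; field names are illustrative, not nova's schema.
    uuid: str
    flavor: str
    ip: str
    image: str
    volumes: Tuple[str, ...]
    host: str


def evacuate(instance: Instance, target_host: str, on_shared_storage: bool):
    """Return (moved_instance, disk_action) for a rebuild on another host."""
    if target_host == instance.host:
        raise ValueError("target must differ from the failed host")
    # Identity and configuration are retained; only the host changes.
    moved = replace(instance, host=target_host)
    if on_shared_storage:
        # The instance file is reachable from the target host, so reuse it.
        disk_action = "reuse existing instance file"
    else:
        # Ephemeral instance: the disk is recreated from the original image.
        disk_action = "respawn from image " + instance.image
    return moved, disk_action
```

For example, evacuate(vm, "node2", on_shared_storage=False) would return a 
copy of vm whose only changed field is the host, together with the 
"respawn from image" action.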

Note that the API must be manually invoked today. 

In addition, this patch modifies nova-compute such that on startup (e.g., 
after it failed and recovered) it verifies with the DB that it is still 
the owner of an instance before starting the VM.
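
The startup check can be pictured roughly as follows. This is only a sketch 
under assumed names: instances_to_start() and the db_host_by_uuid mapping are 
hypothetical stand-ins for the DB lookup, not the actual nova-compute code.

```python
from typing import Dict, Iterable, List


def instances_to_start(local_uuids: Iterable[str],
                       db_host_by_uuid: Dict[str, str],
                       my_host: str) -> List[str]:
    """Return the locally known instances this host still owns per the DB."""
    owned = []
    for uuid in local_uuids:
        # Skip anything the DB assigns to another host (e.g. evacuated while
        # this node was down) or no longer knows about.
        if db_host_by_uuid.get(uuid) == my_host:
            owned.append(uuid)
    return owned
```

With this check, an instance that was evacuated to another node while the 
original host was down is not started a second time when that host recovers.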

It would be great to hear whether people think that such a capability is 
important enough to push into Folsom, despite the short runway until F3. Any 
other thoughts or recommendations regarding this capability would also be 
highly appreciated.

Thanks,
Alex

====================================================================================================
Alex Glikson
Manager, Cloud Operating System Technologies, IBM Haifa Research Lab
http://w3.haifa.ibm.com/dept/stt/cloud_sys.html | 
https://www.research.ibm.com/haifa/dept/stt/cloud_sys.shtml 
Email: glikson@xxxxxxxxxx | Phone: +972-4-8281085 | Mobile: 
+972-54-6466667 | Fax: +972-4-8296112
