openstack team mailing list archive

Re: snapshots, backups of running VMs and compute node recovery

 

I am trying to implement a simple way to automate backups (e.g.
every day): https://blueprints.launchpad.net/nova/+spec/backup-schedule

I also thought of a solution that addresses your needs: when a node fails
(for any reason), I disable it, delete all the servers that were running on
it, and restart them from the last available backup.
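
A rough sketch of that recovery logic, for illustration only. The record
layout (server_id, image_id, created, host) and the step names are
hypothetical, not part of nova or of the blueprint; the functions only build
a plan and do not call any OpenStack API.

```python
from datetime import datetime


def latest_backups(backups):
    """Return {server_id: backup}, keeping only the newest backup per server."""
    latest = {}
    for backup in backups:
        current = latest.get(backup["server_id"])
        if current is None or backup["created"] > current["created"]:
            latest[backup["server_id"]] = backup
    return latest


def recovery_plan(failed_host, servers, backups):
    """List the (step, target) pairs needed to recover servers from failed_host.

    Step names are hypothetical labels, not nova commands.
    """
    newest = latest_backups(backups)
    plan = [("disable_host", failed_host)]
    for server in servers:
        if server["host"] != failed_host:
            continue  # servers on healthy nodes are left alone
        backup = newest.get(server["id"])
        if backup is None:
            # Nothing to restore from, so do not delete the server record.
            plan.append(("no_backup", server["id"]))
            continue
        plan.append(("delete_server", server["id"]))
        plan.append(("boot_from_image", backup["image_id"]))
    return plan
```

Servers without any backup are only flagged, never deleted, so the plan
cannot make the data loss worse.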

Édouard.
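
For comparison, the directory-based approach Vish describes in the quoted
message below could be scripted roughly like this. It is a minimal sketch that
only builds the command lines: the archive path is a made-up example, the tar
invocation is my assumption about how one might copy the directory, and only
the instances path, the _base exclusion and `nova reboot <uuid>` come from his
mail. Archiving live disk files this way is not guaranteed to be consistent,
which is why he suggests taking an LVM snapshot first.

```python
import shlex

# Default location mentioned in the thread; adjust for your deployment.
INSTANCES_DIR = "/var/lib/nova/instances"


def backup_command(archive_path, instances_dir=INSTANCES_DIR):
    """Build a tar command that archives the instances directory without _base."""
    return ["tar", "-czf", archive_path, "--exclude=_base", "-C", instances_dir, "."]


def reboot_commands(uuids):
    """Build one `nova reboot <uuid>` command per instance to run after a restore."""
    return [["nova", "reboot", uuid] for uuid in uuids]


def as_shell(command):
    """Render a command list as a safely quoted shell line for review or logging."""
    return " ".join(shlex.quote(part) for part in command)
```

Returning argument lists rather than running anything keeps the sketch easy
to review before it is wired to subprocess or cron.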

On Fri, Nov 9, 2012 at 8:45 PM, Vishvananda Ishaya
<vishvananda@xxxxxxxxx> wrote:
>
> The libvirt driver has actually gotten quite good at rebuilding all of the data for instances. The only thing it can't do right now is re-download base images from glance. With the current state, if you simply back up the instances directory (usually /var/lib/nova/instances), then you can recover by bringing back the whole directory and doing a nova reboot <uuid> for each instance.
>
> You could just stick the whole thing on an LVM volume and snapshot it regularly for DR. The _base directory can be regenerated with images from glance, so you could also write a script to regenerate it and not have to worry about backing it up. The code to add to nova to make it automatically re-download the image from glance if it isn't there shouldn't be too bad either, which would mean you could safely ignore the _base directory for backups. Additionally, using qcow images in glance and the config option `force_raw_images=False` will keep this directory much smaller.
>
> Vish
>
>
> On Nov 9, 2012, at 2:51 AM, Jānis Ģeņģeris <janis.gengeris@xxxxxxxxx> wrote:
>
> Hello all,
>
> I would like to know what solutions are available for backing up and/or snapshotting running
> instances on compute nodes. The documentation does not mention anything related to this. By snapshots I don't mean
> the current snapshot mechanism, which imports an image of the running VM into glance. I'm using KVM, but this is
> relevant for any hypervisor.
>
> Why is this important?
> Consider a simple scenario where hardware on a compute node fails and the node goes down immediately and is not recoverable
> in a reasonable time. The images of the running instances are also lost. A shared file system is not considered here, as it
> may cause IO bottlenecks and adds another layer of complexity.
>
> There have been a few discussions on the list about this problem, but none have really answered the question.
>
> The documentation speaks of disaster recovery after a power loss and of recovering a failed compute node from
> a shared file system, but it doesn't cover the case without a shared file system.
>
> I can think of a few solutions currently (for KVM):
> a) using LVM images for VMs and making LVM logical volume snapshots, but then the current nova snapshot mechanism
> will not work (from the docs - 'current snapshot mechanism in OpenStack Compute works only with instances backed
> with Qcow2 images');
> b) snapshotting machines with the OpenStack snapshot mechanism, but this doesn't quite fit, because it has
> a goal other than creating backups, will be slow, and will pollute the glance image space.
>
> Regards
> --janis
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
>

