
nova-upgrades team mailing list archive

Upgrades

Hi,

I just wanted to introduce the (rough) blueprint I drafted before the summit:
https://blueprints.launchpad.net/nova/+spec/upgrade-with-minimal-downtime

There is also a related blueprint from Matt Dietz:
https://blueprints.launchpad.net/nova/+spec/deployability-improvements

There are a few questions I am wondering about:

·         What are people working on right now? Let's talk so we avoid any duplicate effort!

·         Are there any meetings scheduled yet?

·         Where are we aiming in the Essex timeframe?

·         What requirements/issues do we need to raise with other working groups? (Database clean-up, etc.)

To start the discussion, here is the end goal I was imagining in the blueprint:

·         API endpoints (and dashboard) always available during upgrade

o   Using load balancer graceful shutdown

o   No API messages or tasks lost

·         Minimal loss of instance connectivity

o   Use an IP alias for transparent gateway changes (consider keepalived and conntrackd)

o   New-style network HA to reduce the number of affected VMs

·         Minimal loss of volume connectivity

·         Rolling Upgrades of OpenStack components

o   Different versions can co-exist within a single zone

§   Glance API versions, Message Queue formats, Database Schema changes, etc.

o   Upgrade each component and/or host in turn to avoid long periods of downtime

o   Ability to migrate the database schema with minimal disruption

§  Ideally without having to stop connections to the database

o   Support side-by-side upgrades to minimize downtime

·         Transparent Hypervisor Upgrades

o   Where possible, live migrate instances to another hypervisor before the upgrade

o   In the worst case, consider suspending instances across the upgrade

·         Other upgrades

o   MySQL, RabbitMQ and other supporting systems

·         Support rolling back to the previous version

·         Support upgrades between each milestone release, and between each major release

·         Gating trunk on the ability to upgrade from the previous milestone and previous release
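
One way to let different versions co-exist within a zone (the Message Queue formats item above) is to tag every message with a format version and have each consumer accept both the current and the previous format. The sketch below assumes a hypothetical JSON envelope with `version`, `method` and `args` keys, and a hypothetical change in which version 2 renamed `instance_id` to `instance_uuid`; none of this is nova's actual RPC format.

```python
import json

# Hypothetical envelope: every message carries an integer format version.
CURRENT_VERSION = 2
SUPPORTED_VERSIONS = {1, 2}  # accept the current and the previous format


class UnsupportedVersion(Exception):
    pass


def encode(method, args, version=CURRENT_VERSION):
    """Serialize a message, tagging it with its format version."""
    return json.dumps({"version": version, "method": method, "args": args})


def upgrade_v1_to_v2(args):
    """Hypothetical format change: v2 renamed 'instance_id' to 'instance_uuid'."""
    args = dict(args)
    if "instance_id" in args:
        args["instance_uuid"] = args.pop("instance_id")
    return args


def decode(raw):
    """Parse a message from an old or current sender into the current format."""
    msg = json.loads(raw)
    version = msg.get("version", 1)  # treat untagged messages as the oldest format
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedVersion(version)
    args = msg["args"]
    if version == 1:
        args = upgrade_v1_to_v2(args)
    return msg["method"], args
```

With this, an upgraded consumer can drain messages still being sent by not-yet-upgraded producers, so hosts can be upgraded one at a time rather than all at once.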

Right now, there are quite a few things we need in order to support all this:

·         Graceful service shutdown

o   Service stops listening to Rabbit queues

o   Service then completes all current work

o   Only then does it stop

o   Prevents getting into an inconsistent state, and minimizes the risk of appearing to have lost a message from the queue

o   Alternatively, ensure all the services will recover correctly when they are started again on a new machine

·         Allow different versions of nova-compute, nova-scheduler, Glance and Swift to co-exist

o   Need to define how the Database Schema / Database layer can evolve between versions

§  Should we upgrade the database before adding any new components?

§  Should we add all the new components before we upgrade the database?

o   Need to define the Message Queue Message formats, and maybe version them
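
The graceful shutdown sequence described above (stop listening to the queue, complete all current work, only then stop) can be sketched as a toy worker. This is a minimal illustration, not nova's service code: a `queue.Queue` stands in for a Rabbit queue, and the class and method names are hypothetical.

```python
import queue
import threading
import time


class GracefulWorker:
    """Toy service consuming tasks from a queue that stands in for a Rabbit queue."""

    def __init__(self, tasks):
        self.tasks = tasks
        self.listening = threading.Event()
        self.listening.set()
        self.completed = []
        self.thread = threading.Thread(target=self._run)

    def start(self):
        self.thread.start()

    def _run(self):
        while self.listening.is_set():
            try:
                task = self.tasks.get(timeout=0.05)
            except queue.Empty:
                continue
            # Once a task has been taken it is always finished, even if
            # shutdown is requested while it is running.
            time.sleep(0.01)  # simulate work
            self.completed.append(task)
            self.tasks.task_done()

    def shutdown(self):
        # 1. Stop listening: the loop takes no new tasks after its next check.
        self.listening.clear()
        # 2. Wait for the task in progress (if any) to complete.
        self.thread.join()
        # 3. Only now is it safe to stop; untaken tasks remain on the queue
        #    for another (possibly upgraded) worker to pick up.
```

For example, putting five tasks on the queue, starting the worker and calling `shutdown()` shortly afterwards leaves every task either completed or still on the queue, never half-done. With a real broker, unacknowledged messages would also need acking only after the work finishes; that detail is outside this sketch.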

An interim step could be to support upgrades in which different zones run different versions. During the upgrade you would then lose only one zone at a time, not the whole cloud.
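
A zone-at-a-time rollout can be sketched as a loop that takes one zone out of service, upgrades it and brings it back before touching the next, so the remaining zones keep serving (on the old or the new version). This is a simulation under stated assumptions: the `Zone` type, the caller-supplied `upgrade` function and the version strings are hypothetical, and a failed upgrade is assumed to leave the old version running.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    version: str
    available: bool = True


def upgrade_zone_by_zone(zones, new_version, upgrade):
    """Upgrade zones one at a time so that only one zone is ever out of service.

    `upgrade` is a hypothetical caller-supplied function that upgrades a
    single zone and returns True on success. Returns a log of
    (upgraded zone, zones still serving during that step).
    """
    log = []
    for zone in zones:
        if zone.version == new_version:
            continue  # already upgraded, e.g. when resuming a rollout
        zone.available = False
        serving = [z.name for z in zones if z.available]
        if not upgrade(zone):
            # Assumption: a failed upgrade leaves the old version running.
            zone.available = True
            raise RuntimeError(f"upgrade of {zone.name} failed; stopping rollout")
        zone.version = new_version
        zone.available = True
        log.append((zone.name, serving))
    return log
```

Stopping at the first failure keeps the untouched zones on the previous version, which leaves room for rolling back rather than ending up with a half-upgraded cloud.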

Thanks,
John
