nova-upgrades team mailing list archive

Re: Upgrades

 

My comments on John's message are inline below. John, thanks for giving this a kick!

Here are some further thoughts on how to proceed.


1.       Agree on goals. (The goals below are a great start. Need to capture the overall objective of transparent upgrades.)

2.       Define a set of upgrade scenarios that are representative of the kinds of upgrades we will want to make.

a.       API changes

b.      Changes to the network configuration

c.       Upgrades of system components like RabbitMQ, Database

d.      Hypervisor upgrades

e.      Scheduler changes (possibly including changes to the data on which scheduling decisions are based)

3.       Define a generic upgrade approach. This should include things like the sequencing of components. Require that future milestones support this approach for upgrading from the previously released version.

4.       Message versioning

a.       Include upgrades to RabbitMQ and supporting libraries

5.       Database versioning

a.       Include upgrades to MySQL and sqlalchemy

b.      Include changes to things like caching

6.       Network upgrade paths

7.       Coordination with other development groups

8.       Testing - how do we ensure that upgrades are in fact transparent
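Item 5 (database versioning), and John's question further down about whether to upgrade the database before or after adding new components, could be approached with an additive "expand, then contract" migration pattern. The sketch below is hypothetical: the table, the columns and every helper name are illustrative, and Python's built-in sqlite3 stands in for MySQL/sqlalchemy. It shows only the expand phase, where old and new code share one schema.

```python
import sqlite3

# Hypothetical schema: version N stores the compute host in "host";
# version N+1 wants a new "node" column. All names are illustrative.
def create_v1_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, host TEXT)")

def expand_to_v2(conn: sqlite3.Connection) -> None:
    # Expand step: purely additive, so version-N code that names its
    # columns explicitly keeps working while version N+1 rolls out.
    conn.execute("ALTER TABLE instances ADD COLUMN node TEXT")

def old_code_insert(conn: sqlite3.Connection, instance_id: int, host: str) -> None:
    # Version-N writer: knows nothing about "node".
    conn.execute("INSERT INTO instances (id, host) VALUES (?, ?)",
                 (instance_id, host))

def new_code_insert(conn: sqlite3.Connection, instance_id: int,
                    host: str, node: str) -> None:
    # Version-N+1 writer: fills in both columns during the transition.
    conn.execute("INSERT INTO instances (id, host, node) VALUES (?, ?, ?)",
                 (instance_id, host, node))

def new_code_read_node(conn: sqlite3.Connection, instance_id: int) -> str:
    # Version-N+1 reader: tolerates rows written by version-N code by
    # falling back to "host" when "node" is NULL.
    row = conn.execute("SELECT COALESCE(node, host) FROM instances WHERE id = ?",
                       (instance_id,)).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
create_v1_schema(conn)
expand_to_v2(conn)                       # schema is upgraded first...
old_code_insert(conn, 1, "compute-a")    # ...and old services still work
new_code_insert(conn, 2, "compute-b", "node-b")
print(new_code_read_node(conn, 1), new_code_read_node(conn, 2))
```

The contract step (dropping "host") would run only once no version-N code remains. Under this pattern the ordering question gets a split answer: expand the schema before new components arrive, contract it after the old ones are gone. This is one common pattern, not an approach the group has agreed on.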

I welcome comments on the above. I think it's important to pin down items 1 and 2 in order to evaluate work on the other items. I will draft a "goals" document that we can then discuss. Goals have previously been discussed in the blueprints mentioned below and at the design summit; my objective is to capture them in a single place. I'm also going to start working on item 2. Do we have any volunteers for the other items? If people have been working on, for example, database versioning, what progress have you made? What issues have you encountered?

-Ray Hookway (rjh)

From: nova-upgrades-bounces+ray.hookway=hp.com@xxxxxxxxxxxxxxxxxxx [mailto:nova-upgrades-bounces+ray.hookway=hp.com@xxxxxxxxxxxxxxxxxxx] On Behalf Of John Garbutt
Sent: Tuesday, November 08, 2011 1:20 PM
To: 'nova-upgrades@xxxxxxxxxxxxxxxxxxx'
Subject: [Nova-upgrades] Upgrades

Hi,

I just wanted to introduce the (rough) blueprint I drafted before the summit:
https://blueprints.launchpad.net/nova/+spec/upgrade-with-minimal-downtime

There is also a related blueprint from Matt Dietz:
https://blueprints.launchpad.net/nova/+spec/deployability-improvements

There are a few questions I am wondering about:

·         What are people working on right now? Let's talk, to avoid any duplicated effort!
Defining upgrade sequence and basic approach. My objectives align with yours below.

·         Are there any meetings scheduled yet?
Not yet - we need to get going.

·         Where are we aiming in the Essex timeframe?
Would like to have a transparent upgrade path from Diablo to Essex. This will require commitment from other workgroups (e.g., database).

·         What requirements/issues do we need to raise with other working groups? (Database clean-up, etc.)
Need to review this on a component-by-component basis.

To start the discussion, here is the end goal I was imagining in the blueprint:

·         API endpoints (and dashboard) always available during upgrade

o   Using load balancer graceful shutdown

o   No API messages or tasks lost

·         Minimal loss of instance connectivity

o   Use an IP alias for transparent gateway changes (consider keepalived and conntrackd)

o   New-style network HA to reduce the number of affected VMs

·         Minimal loss of volume connectivity

·         Rolling Upgrades of OpenStack components

o   Different versions can co-exist within a single zone
This is the key to transparent upgrades

§   Glance API versions, Message Queue formats, Database Schema changes, etc.

o   Upgrade each component and/or host in turn to avoid large amounts of downtime

o   Ability to migrate the database schema with minimal disruption

§  Ideally without having to stop connections to the database

o   Support side-by-side upgrades to try and minimize the downtime - is this different from "Different versions can co-exist"?

·         Transparent Hypervisor Upgrades

o   Where possible, live-migrate instances to another hypervisor before the upgrade

o   In the worst case, consider suspending instances across the upgrade

·         Other upgrades

o   MySQL, RabbitMQ and other supporting systems
Need to determine what it takes to do this. Versioning of messages?

·         Support rolling back to the previous version

·         Support upgrades between each milestone release, and between each major release
Not clear to me that transparent upgrades are needed between milestone releases

·         Gating trunk on the ability to upgrade from the previous milestone and previous release
This is really important. Releases that can't be upgraded transparently are not deployable.
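The "different versions can co-exist within a single zone" goal, together with the later point about versioning message queue formats, could be met with versioned messages plus a pinned send version. The sketch below is one possible scheme under stated assumptions: each message carries a hypothetical (major, minor) version, minor bumps only add optional arguments, and senders emit the oldest version still running. can_handle, pinned_send_version, build_run_instance and the boot_hint argument are illustrative names, not nova's actual API.

```python
from typing import Dict, Iterable, Optional, Tuple

Version = Tuple[int, int]  # hypothetical (major, minor) message version

def can_handle(receiver: Version, message: Version) -> bool:
    # A receiver accepts the same major version and any minor version it
    # already knows about (minor bumps only add optional arguments).
    return receiver[0] == message[0] and message[1] <= receiver[1]

def pinned_send_version(running: Iterable[Version]) -> Version:
    # During a rolling upgrade, send at the oldest version still in service.
    versions = list(running)
    if not versions:
        raise ValueError("no running services reported a version")
    if len({major for major, _ in versions}) != 1:
        raise ValueError("mixed major versions cannot co-exist in this scheme")
    return min(versions)

def build_run_instance(send_version: Version, instance_id: str, image_ref: str,
                       boot_hint: Optional[str] = None) -> Dict[str, object]:
    args: Dict[str, object] = {"instance_id": instance_id, "image_ref": image_ref}
    # Hypothetical: version 1.1 added the optional "boot_hint" argument, so it
    # is only sent once every receiver is known to understand it.
    if boot_hint is not None and send_version >= (1, 1):
        args["boot_hint"] = boot_hint
    return {"version": send_version, "method": "run_instance", "args": args}

# One host has not been upgraded yet, so everyone keeps sending 1.0.
running = [(1, 1), (1, 0), (1, 1)]
msg = build_run_instance(pinned_send_version(running), "i-0001", "image-1",
                         boot_hint="ssd")
print(msg)
```

Once the last 1.0 host is upgraded, pinned_send_version returns (1, 1) and the optional argument starts flowing. A major-version change would still need a separate, non-rolling step or a compatibility shim.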

Right now there are quite a few things we need to support all this:

·         Graceful service shutdown

o   Service stops listening to Rabbit queues

o   Service then completes all current work

o   Only then does it stop

o   Prevents the service from getting into an inconsistent state, and minimizes the risk of appearing to lose a message from the queue

o   Alternatively, ensure all the services will recover correctly when they are started again on a new machine

·          Allow different versions of nova-compute, nova-scheduler, glance, swift to co-exist

o   Need to define how the Database Schema / Database layer can evolve between versions

§  Should we upgrade the database before adding any new components?

§  Should we add all the new components before we upgrade the database?

o   Need to define the Message Queue Message formats, and maybe version them
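The graceful-shutdown sequence listed above (stop listening, complete current work, only then stop) can be illustrated with a small sketch. Assumptions: Python's queue.Queue stands in for a RabbitMQ queue, and GracefulWorker and its handler are hypothetical names, not existing nova classes.

```python
import queue
import threading
import time

class GracefulWorker:
    """Hypothetical service loop; queue.Queue stands in for a RabbitMQ queue."""

    def __init__(self, q: "queue.Queue[str]", handler):
        self._q = q
        self._handler = handler
        self._listening = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.completed = []

    def start(self) -> None:
        self._listening.set()
        self._thread.start()

    def _run(self) -> None:
        while self._listening.is_set():
            try:
                msg = self._q.get(timeout=0.05)
            except queue.Empty:
                continue
            # A message that has been taken is always finished, even if
            # shutdown was requested while it was being fetched.
            self._handler(msg)
            self.completed.append(msg)
            self._q.task_done()

    def shutdown(self, timeout=None) -> bool:
        # Step 1: stop listening, so no new messages are taken.
        self._listening.clear()
        # Step 2: wait for the message in progress to complete.
        self._thread.join(timeout)
        # Step 3: only now is it safe to stop; untaken messages stay queued.
        return not self._thread.is_alive()

if __name__ == "__main__":
    work = queue.Queue()
    for i in range(5):
        work.put(f"task-{i}")
    worker = GracefulWorker(work, lambda m: time.sleep(0.01))
    worker.start()
    time.sleep(0.025)
    print(worker.shutdown(timeout=1), worker.completed, work.qsize())
```

With a real broker, a similar effect would come from cancelling the consumer (AMQP basic.cancel) and acknowledging each message only after it has been processed, so anything not yet taken stays on the queue for another consumer.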

An interim step could be to support upgrades in which different zones run different versions, so that during the upgrade you lose just one zone at a time rather than the whole cloud. Would like to be able to upgrade a single zone transparently.
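The zone-at-a-time interim step amounts to a simple rolling loop. This sketch assumes a hypothetical inventory mapping zone names to deployed versions, plus caller-supplied upgrade and health-check functions; none of these are existing nova APIs.

```python
from typing import Callable, Dict, List

def rolling_zone_upgrade(
    versions: Dict[str, str],
    target: str,
    upgrade_zone: Callable[[str], None],
    zone_healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade zones strictly one at a time (hypothetical orchestration).

    Only the zone being upgraded is out of service; the others keep serving,
    possibly on the older release. Halts at the first unhealthy zone so the
    remaining zones stay on the known-good version.
    """
    upgraded: List[str] = []
    for zone in sorted(versions):
        if versions[zone] == target:
            continue  # already done, e.g. when resuming after an earlier halt
        upgrade_zone(zone)
        if not zone_healthy(zone):
            raise RuntimeError(f"zone {zone!r} failed its health check; halting")
        versions[zone] = target
        upgraded.append(zone)
    return upgraded

if __name__ == "__main__":
    inventory = {"zone-a": "diablo", "zone-b": "diablo", "zone-c": "diablo"}
    print(rolling_zone_upgrade(inventory, "essex", lambda z: None, lambda z: True))
```

Because the loop halts at the first failure and skips zones already at the target, rerunning it after a fix resumes where it stopped.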

Thanks,
John
