
nova team mailing list archive

rolling upgrades


The Problem

Push Nova upgrades out to an existing cluster with minimal delay without
dropping requests.

Specific issues:
- ComputeManager's run_instance can take quite a while to complete; we don't
  want to pause incoming requests while waiting for all in-flight calls to
  finish before restarting the Manager.

Rollin Rollin Rollin

In the general case most of our infrastructure is already rather resilient to
downtime due to our use of AMQP, but a few things probably need to be added.

An ideal scenario

0. Execute an upgrade command
1. New code is fetched and installed (apt-get upgrade)
2. Send a SIGTERM to ComputeManager process
3. ComputeManager stops ACKing requests from the queue
4. ComputeManager SIGTERMs its Worker processes
5. Worker processes stop ACKing requests from the queue (filled only by
   the ComputeManager).
6. ComputeManager exits.
7. Supervisor process automatically restarts it || the command restarts it
8. When the worker has no more pending jobs it exits.
9. When ComputeManager restarts it fills the Worker pool with new Workers as
   old ones exit.
10. As soon as there is a fresh Worker, ComputeManager begins farming work
    out to it, starting with anything already queued.

How to get there

0. Managers need to listen for SIGTERM and manage it.
   This is straightforward with python's signal module.

1. Managers need access to their queue consumers so that they can stop them.
   This should be a relatively minor change in service.py and manager.py.

2. Managers need to internally keep track of outstanding async calls.
   A DeferredQueue is probably enough, so that it can delay exiting until the
   queue is exhausted.

3. ComputeManager, specifically, needs to have detached Worker instances.
   Forking may have some issues with Twisted so some testing will need to be
   done to verify.

4. ComputeManager, specifically, needs to communicate with Worker instances.
   This should be fairly straightforward using AMQP routing and topics.

5. ComputeManager, specifically, needs to know how many old workers exist.
   This could be as simple as writing PIDs to disk named with a UUID decided
   upon at manager start (so all the workers started by a given manager will
   have the same ID, which would not match the restarted manager). There is
   probably some other clever linux hack that will do the same thing.

6. It seems that all non-ComputeManager services besides the public API can
   get by with just #0 through #2, upgrading the public API is out of scope
   for this proposal.

Bonus: We can minimize the backlog for any given ComputeManager by being able
       to drop its priority in the Scheduler before initiating the upgrade.


