Push Nova upgrades out to an existing cluster with minimal delay, without dropping requests.
- ComputeManager's run_instance can take quite a while to perform; we don't
want to pause incoming requests until they have all finished before upgrading.
Rollin Rollin Rollin
In the general case most of our infrastructure is already rather resilient to
downtime due to our use of AMQP, but a few things probably need to be added.
An ideal scenario
0. Execute an upgrade command
1. New code is fetched and installed (apt-get upgrade)
2. Send a SIGTERM to ComputeManager process
3. ComputeManager stops ACKing requests from the queue
4. ComputeManager SIGTERMs its Worker processes
5. Worker processes stop ACKing requests from the queue (which is filled only
by the ComputeManager).
6. ComputeManager exits.
7. A supervisor process automatically restarts it, or the upgrade command
restarts it (see the supervisord sketch after this list).
8. When a Worker has no more pending jobs, it exits.
9. When ComputeManager restarts it fills the Worker pool with new Workers as
old ones exit.
10. As soon as there is a fresh Worker, ComputeManager begins farming work out
to it, starting with anything already queued.
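
For step 7, if the supervisor is something like supervisord, the automatic
restart needs no custom code; a minimal sketch of such a config (the program
name and command path are assumptions):

    [program:nova-compute]
    command=/usr/bin/nova-compute
    autorestart=true
    stopsignal=TERM

With stopsignal=TERM, the same graceful-shutdown path is taken whether the
restart comes from the supervisor or from the upgrade command.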
How to get there
0. Managers need to listen for SIGTERM and handle it.
This is straightforward with python's signal module.
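A minimal sketch of the handler registration, assuming a hypothetical flag
that the manager's main loop checks:

    import signal

    class Manager(object):
        def __init__(self):
            self._shutting_down = False

        def start(self):
            # Begin a graceful shutdown when the process receives SIGTERM.
            signal.signal(signal.SIGTERM, self._handle_sigterm)

        def _handle_sigterm(self, signum, frame):
            # Checked by the main loop: stop ACKing new messages, drain
            # outstanding work, then exit.
            self._shutting_down = True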
1. Managers need access to their queue consumers so that they can stop them.
This should be a relatively minor change in service.py and manager.py
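As a sketch of the shape this could take, here using kombu as a stand-in for
nova's actual consumer classes (all names are assumptions):

    from kombu import Connection, Consumer, Exchange, Queue

    connection = Connection('amqp://guest:guest@localhost//')
    exchange = Exchange('nova', type='topic')
    queue = Queue('compute', exchange, routing_key='compute.#')

    channel = connection.channel()
    consumer = Consumer(channel, queues=[queue],
                        callbacks=[lambda body, message: message.ack()])
    consumer.consume()

    # The key change: the service hands `consumer` to the manager, so that
    # on SIGTERM the manager can stop the flow of new messages itself:
    consumer.cancel()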
2. Managers need to internally keep track of outstanding async calls.
A DeferredQueue is probably enough, so that it can delay exiting until the
queue is exhausted.
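One sketch of the bookkeeping, using a plain list plus DeferredList rather
than a DeferredQueue (names are assumptions):

    from twisted.internet import defer

    class Manager(object):
        def __init__(self):
            self._outstanding = []

        def call_async(self, func, *args):
            # Track every in-flight call so shutdown can wait on it.
            d = defer.maybeDeferred(func, *args)
            self._outstanding.append(d)

            def _done(result):
                self._outstanding.remove(d)
                return result

            d.addBoth(_done)
            return d

        def drain(self):
            # Fires once every outstanding call has finished (or failed).
            return defer.DeferredList(self._outstanding)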
3. ComputeManager, specifically, needs to have detached Worker instances.
Forking may have some issues with Twisted, so some testing will need to be
done to verify.
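If os.fork() inside a running reactor proves troublesome, reactor.spawnProcess
is the Twisted-native alternative; a sketch (the worker executable path is an
assumption):

    from twisted.internet import protocol, reactor
    from twisted.python import log

    class WorkerProtocol(protocol.ProcessProtocol):
        def processEnded(self, reason):
            # This is where the manager learns an old worker has exited.
            log.msg('worker exited: %s' % reason.value)

    def spawn_worker():
        # env=None makes the child inherit the parent's environment.
        reactor.spawnProcess(WorkerProtocol(), '/usr/bin/nova-worker',
                             args=['nova-worker'], env=None)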
4. ComputeManager, specifically, needs to communicate with Worker instances.
This should be fairly straightforward using AMQP routing and topics.
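A sketch of what the manager-to-worker dispatch could look like over a topic
exchange, again with kombu standing in (exchange and routing-key names are
assumptions):

    import uuid
    from kombu import Connection, Exchange

    exchange = Exchange('nova', type='topic')
    worker_id = uuid.uuid4().hex  # in practice, the id of a fresh worker

    with Connection('amqp://guest:guest@localhost//') as conn:
        producer = conn.Producer(serializer='json')
        # Each worker consumes its own 'worker.<id>' topic, so the manager
        # can hand a job to exactly one worker by routing key.
        producer.publish({'method': 'run_instance', 'args': {}},
                         exchange=exchange,
                         routing_key='worker.%s' % worker_id)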
5. ComputeManager, specifically, needs to know how many old workers exist.
This could be as simple as writing PIDs to disk, named with a UUID decided
upon at manager start (so all the workers started by a given manager will
have the same ID, which would not match the restarted manager's). There is
probably some other clever Linux hack that will do the same thing.
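A sketch of the PID-file scheme (the run directory is an assumption):

    import os
    import uuid

    RUN_DIR = '/var/run/nova'
    # One UUID per manager start; every worker forked by this manager
    # records its PID under that id.
    manager_id = uuid.uuid4().hex

    def record_worker(pid):
        path = os.path.join(RUN_DIR, '%s-%d.pid' % (manager_id, pid))
        open(path, 'w').write(str(pid))

    def old_worker_pidfiles():
        # Any pid file not carrying our UUID belongs to a previous
        # incarnation of the manager; count these to know how many old
        # workers are still draining.
        return [name for name in os.listdir(RUN_DIR)
                if name.endswith('.pid') and not name.startswith(manager_id)]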
6. It seems that all services other than ComputeManager and the public API can
get by with just #0 through #2; upgrading the public API is out of scope
for this proposal.
Bonus: We can minimize the backlog for any given ComputeManager by being able
to drop its priority in the Scheduler before initiating the upgrade.