maas-devel team mailing list archive
Message #00197
Re: Re-architecting without cobbler
On 4 May 2012 06:50, Julian Edwards <...> wrote:
> Hi
>
> Gavin also has a proposal to come, but this is the basis of what I
> discussed with Robert earlier.
My general idea was to enhance the pserv a bit, making it an
autonomous functioning headless provider for its nodes. That would
mean both absorbing Cobbler's function, and becoming stateful (which
kind of comes under the absorb-Cobbler point).
Then, in a large deployment, one of these instances would be set up
for every X nodes:
- X would be high when each node is small and/or where service
degradation is less critical (say, a render farm, or a reseller's
cheap tier service).
- X would be low when the cost of service degradation - e.g. not
being able to reprovision nodes for a period of time - is high.
- When service degradation is very expensive then the pserv instance
could have the HA hammer applied to it.
In front of this a fairly traditionally HA'd web application would
provide a unified UI. This would also be the place to store whatever
global configuration the cluster needs, like authn. This would be
replicated out to each pserv.
There would also be another front end, a fairly stupid, stateless one,
providing a unified view over the cluster for API clients, though I
would want it to provide exactly the same API as individual pservs.
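To make the "same API" point concrete, here is a minimal sketch of how such a front end might look. Everything in it is hypothetical: the class names Pserv and ClusterFrontEnd and the methods list_nodes and reprovision are invented for illustration and are not part of any existing pserv API. The front end holds no node state of its own; it only fans calls out to the pservs it knows about.

```python
class Pserv:
    """Hypothetical stand-in for one stateful pserv and the nodes it owns."""

    def __init__(self, name, nodes):
        self.name = name
        self._nodes = set(nodes)
        self.reprovisioned = []  # record of requests this pserv handled

    def list_nodes(self):
        return sorted(self._nodes)

    def reprovision(self, node_id):
        # A pserv only acts on nodes it is responsible for.
        if node_id not in self._nodes:
            return False
        self.reprovisioned.append(node_id)
        return True


class ClusterFrontEnd:
    """Stateless facade exposing the same methods as a single Pserv."""

    def __init__(self, pservs):
        self._pservs = list(pservs)

    def list_nodes(self):
        # Unified view: the union of every pserv's nodes.
        nodes = []
        for pserv in self._pservs:
            nodes.extend(pserv.list_nodes())
        return sorted(nodes)

    def reprovision(self, node_id):
        # Ask each pserv in turn; stop at the first one that owns the node.
        return any(pserv.reprovision(node_id) for pserv in self._pservs)


a = Pserv("pserv-a", ["node-1", "node-2"])
b = Pserv("pserv-b", ["node-3"])
front = ClusterFrontEnd([a, b])
```

Because the facade and the pservs share the same methods, an API client written against one pserv should work unchanged against the whole cluster, which is the property I'm after.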
I'm following a principle that I think Google and many others follow:
expect things to break, and design accordingly. For example, in a
100,000 node deployment, we might have 50 pserv instances, each
capable of running without needing to be directed by a master service
(once boot-strapped). This would provide:
- resilience from many kinds of failure: a failure on any pserv would
prevent reprovisioning in 2% of the cluster; a minor degradation
that can, in many cases, be routed around.
- some resilience from bugs, especially during upgrades. A handful of
pservs can be upgraded at a time and the cluster monitored for
problems.
- zero downtime: rolling upgrades.
- a very high degree of scalability.
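The arithmetic behind the 2% figure, and the shape of a rolling upgrade, can be sketched as below. The function names and the batch size of 5 are assumptions made up for the example, and it assumes nodes are spread evenly across pservs.

```python
def blast_radius(total_nodes, pserv_count):
    """Return (nodes per pserv, percent of cluster hit by one pserv failure).

    Assumes nodes are divided evenly between pservs.
    """
    nodes_per_pserv = total_nodes // pserv_count
    percent_affected = 100 / pserv_count
    return nodes_per_pserv, percent_affected


def upgrade_batches(pservs, batch_size):
    """Split the pservs into consecutive batches for a rolling upgrade."""
    return [pservs[i:i + batch_size] for i in range(0, len(pservs), batch_size)]


nodes_each, percent = blast_radius(100_000, 50)
batches = upgrade_batches(list(range(50)), 5)
```

With 50 pservs and a hypothetical batch size of 5, there are 10 batches, and at most 10% of the cluster would be without reprovisioning while any one batch is being upgraded.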
My dodgy diagram, attached, which probably employs zero
pre-existing iconographies, tries to convey some of this.
Attachment:
MAAS Architecture, #2.png
Description: PNG image