
Re: Re-architecting without cobbler

 

On Wed, May 9, 2012 at 7:47 AM, Gavin Panella
<gavin.panella@xxxxxxxxxxxxx> wrote:
> On 4 May 2012 06:50, Julian Edwards <...> wrote:
>> Hi
>>
>> Gavin also has a proposal to come, but this is the basis of what I
>> discussed with Robert earlier.
>
> My general idea was to enhance the pserv a bit, making it an
> autonomous functioning headless provider for its nodes. That would
> mean both absorbing Cobbler's function, and becoming stateful (which
> kind of comes under the absorb-Cobbler point).

I suspect we need rather a lot of details to assess this accurately.
What state does cobbler manage?

AIUI it manages a dhcp server config and the TFTP bootstrap node
mapping symlinks.

Both of these things are strictly redundant with the metadata MAAS has
about what state a node is in: you can reconstruct the entire config
from first principles from MAAS.
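
To make that concrete, here is a rough sketch (hypothetical field and
function names, not actual pserv or cobbler code) of what
"reconstruct from first principles" means - the dhcpd host entries and
the pxelinux-style boot symlinks are a pure function of the node
records MAAS already holds:

def render_provisioning_state(nodes):
    # `nodes` is whatever MAAS reports for each machine, e.g.
    # {"hostname": "node-12", "mac": "00:16:3e:aa:bb:cc",
    #  "ip": "10.0.0.12", "profile": "precise-x86_64"}
    dhcp_entries = []
    symlinks = {}
    for node in nodes:
        dhcp_entries.append(
            "host %(hostname)s { hardware ethernet %(mac)s; "
            "fixed-address %(ip)s; }" % node)
        # pxelinux looks its config up under "01-" + MAC-with-dashes.
        link = "01-" + node["mac"].replace(":", "-")
        symlinks[link] = "profiles/%s.cfg" % node["profile"]
    return "\n".join(dhcp_entries), symlinks

Lose the pserv's disk and you have lost nothing that a single call
back to MAAS cannot regenerate.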

> Then, in a large deployment, one of these instances would be set up
> for every X nodes:
>
> - X would be high when each node is small and/or where service
>  degradation is less critical (say, a render farm, or a reseller's
>  cheap tier service).
>
> - X would be low when the cost of service degradation - e.g. not
>  being able to reprovision nodes for a period of time - is high.
>
> - When service degradation is very expensive then the pserv instance
>  could have the HA hammer applied to it.
>
> In front of this a fairly traditionally HA'd web application would
> provide a unified UI. This would also be the place to store whatever
> global configuration the cluster needs, like authn. This would be
> replicated out to each pserv.

Beyond state, an additional difference, AIUI, between what you're
thinking and what Julian and I discussed, is whether a change request
will hit pserv directly, or whether pserv is an implementation detail
of the MAAS web frontend (which does both API and HTML where needed).

Specifically, they are identical when you consider having N pservs
(each managing 1/X of the machines) and the ability to run pservs on
multiple machines.

They aren't identical from the perspective of upgrades: the model you
propose means that two pserv processes cannot run the same set of
machines concurrently (because pserv would be stateful). A stateless
pserv *can* run active-active controlling the same machines (as long
as config file and symlink updates are done approximately atomically,
which is fairly straightforward).
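
For what it's worth, "approximately atomically" is just the usual
write-then-rename dance. A rough Python sketch (not actual pserv
code); rename(2) is atomic on POSIX, so a concurrent reader - or a
second active pserv - only ever sees the old version or the new one:

import os
import tempfile

def replace_file_atomically(path, content):
    # Write the new config next to the old one, then rename over it.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())
    os.rename(tmp, path)

def replace_symlink_atomically(link, target):
    # Same trick for the TFTP bootstrap symlinks: build the new link
    # under a temporary name and rename it into place.
    tmp = link + ".new"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.rename(tmp, link)

Two pservs racing to write the same answer is therefore harmless.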

If that's so, I think there is a fair amount of swings and roundabouts
until you come to analyse failure modes.

Consider a failed pserv machine. If pserv is stateless, start a new
one up: it just asks MAAS for the full config, exports it, and away we
go. If there is state needed for the operation of the system on the
pserv machine, then when it's lost we are in trouble.

> There would also be another front end, a fairly stupid, stateless one,
> providing a unified view over the cluster for API clients, though I
> would want it to provide exactly the same API as individual pservs.
>
> I'm following a principle that I think Google and many others follow:
> expect things to break, and design accordingly. For example, in a
> 100,000 node deployment, we might have 50 pserv instances, each
> capable of running without needing to be directed by a master service
> (once boot-strapped). This would provide:
>
> - resilience from many kinds of failure: a failure on any pserv would
>  prevent reprovisioning in 2% of the cluster; a minor degradation
>  that can, in many cases, be routed around.

This is the same in both proposals.

> - some resilience from bugs, especially during upgrades. A handful of
>  pservs can be upgraded at a time and the cluster monitored for
>  problems.

ditto.

> - zero downtime: rolling upgrades.

This isn't the same - a stateful pserv will have short downtime per
pserv; stateless won't.

> - a very high degree of scalability.

Seems the same to me, except that we don't need to write a stateless
API proxy - so there are fewer things to create.

> My dodgy diagram, attached, and which probably employs zero
> pre-existing iconographies, tries to convey some of this.

Perhaps I'm missing something, but I don't see pserv on that diagram?

I don't see any particular a-priori reason to avoid having N
state-maintaining services cooperating to provide MAAS as a whole -
that's very much what I advocate: an SOA approach. But OTOH, when you
have a state-maintaining service, that service needs an HA story, it
needs failure-mode management in its clients, it needs a
dealing-with-absent-services story, and it needs a backup story. I
don't think the MAAS dataset is large or complex enough for these
things to be a good tradeoff vs maintaining all your state in an HA
core service, with horizontally scaling helper services interrogating
it as you scale.

I guess the key thing you allude to is that you could in principle
permit provisioning to happen when the main MAAS server is AWOL, but
that implies some significant complexity around authentication - and a
state synchronisation mechanism for when MAAS itself comes back.

If we come back to the core of MAAS - a single-tenant API provider for
provisioning hardware like a cloud - this doesn't seem justified to me:
even a very large environment (say 100K nodes) won't have a high
frequency of machine role turnover (hundreds of machines/minute):
machines will be brought up and put into openstack or hadoop, and
within that environment get lots of use; periodically maintenance will
happen, gracefully, but that's still going to be something where a
short outage at the MAAS controller has minimal impact.

(Sketch numbers for my model: each piece of hardware gets deployed for
a month or more at a time, except for staging/test environments which
are a) relatively small and b) torn down and replaced a lot)
100K machines
100K * (at most) 12 -- <= 1.2M allocations a year
                       <= 1.2M deallocations a year
525600 minutes/year
-> about 3 allocation-or-deallocation operations per minute, on average.

A 10 minute outage is about 30 queued operations.
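
For concreteness, the same sums in a few lines of Python (the
constants are the hypothetical worst-case figures from the sketch
above, not measurements):

machines = 100000
allocations_per_machine_per_year = 12   # "at most"
minutes_per_year = 525600

allocations_per_minute = (
    machines * allocations_per_machine_per_year
    / float(minutes_per_year))
ops_per_minute = allocations_per_minute * 2   # count deallocations too
print(allocations_per_minute)    # ~2.3 - the "about 3" above
print(ops_per_minute)            # ~4.6 - still single digits
print(ops_per_minute * 10)       # a 10 minute outage queues a few dozen

Either way you count it, the rate is single-digit operations per
minute, so a short controller outage queues at most a few dozen
operations.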

-Rob

