Re: Re-architecting without cobbler

 

On 9 May 2012 02:41, Robert Collins <...> wrote:
> On Wed, May 9, 2012 at 7:47 AM, Gavin Panella <...> wrote:
>> On 4 May 2012 06:50, Julian Edwards <...> wrote:
>>> Hi
>>>
>>> Gavin also has a proposal to come, but this is the basis of what I
>>> discussed with Robert earlier.
>>
>> My general idea was to enhance pserv a bit, making it an
>> autonomously functioning, headless provider for its nodes. That
>> would mean both absorbing Cobbler's function and becoming stateful
>> (which kind of comes under the absorb-Cobbler point).
>
> I suspect we need rather a lot of details to assess this accurately.
> What state does cobbler manage?
>
> AIUI it manages a dhcp server config and the TFTP bootstrap node
> mapping symlinks.
>
> Both of these things are strictly redundant with the metadata MAAS has
> about what state a node is in: you can reconstruct the entire config
> from first principles from MAAS.

Cobbler also records whether a machine has netbooted successfully once
allocated, so that it doesn't get reinstalled on its next boot. I'm not
sure exactly how it determines this; it might be a post-install wget
callback.
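
For illustration, the behaviour pserv would have to absorb probably
boils down to a single "has this node finished installing" bit per
node, flipped by a post-install callback and consulted when the next
PXE config is generated. A sketch only; all names and config strings
are invented:

# Hypothetical pserv-side sketch; NODES stands in for real, persistent state.
NODES = {"node-01": {"netbooted": False}}

def handle_postinstall_callback(node_id):
    """Hit by the freshly installed node (e.g. via wget) after install."""
    NODES[node_id]["netbooted"] = True

def pxe_config_for(node_id):
    """Decide what the node gets on its next PXE request."""
    if NODES[node_id]["netbooted"]:
        return "DEFAULT local\n"    # boot from disk; don't reinstall
    return "DEFAULT install\n"      # (re)install path

handle_postinstall_callback("node-01")
assert pxe_config_for("node-01").startswith("DEFAULT local")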

Anyway, this isn't really relevant: I was proposing storing all node
state - each partition of nodes would have a fully functional... gah,
I *wish* I had sorted out the nomenclature around this before!

Aside: what we've been referring to as maas is in fact the web UI
server, the web API server, and the metadata server. The current
implementation of these is somewhat intertwined. What I have been
referring to as the future pserv is more like (web API + metadata +
cobbler-assimilated).

Back to the point: I'm suggesting that each partition have a fully
functional (web API + metadata + cobbler-assimilated).
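
In concrete terms, each partition's service would bundle something like
this (a sketch, all names made up):

class Partition:
    """One self-contained (web API + metadata + cobbler-assimilated) unit."""

    def __init__(self, name, nodes):
        self.name = name      # e.g. "partition-07"
        self.nodes = nodes    # the subset of the cluster's nodes it owns
        self.state = {}       # its own store (PostgreSQL in practice)

    def api_acquire(self, constraints):
        """Web API: hand out a node matching the caller's constraints."""
        raise NotImplementedError

    def metadata_for(self, node_id):
        """Metadata service: instance data for a node that is installing."""
        raise NotImplementedError

    def write_boot_config(self, node_id):
        """Cobbler-assimilated: emit the DHCP/TFTP config for the node."""
        raise NotImplementedError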

>
>> Then, in a large deployment, one of these instances would be set up
>> for every X nodes:
>>
>> - X would be high when each node is small and/or where service
>>  degradation is less critical (say, a render farm, or a reseller's
>>  cheap tier service).
>>
>> - X would be low when the cost of service degradation - e.g. not
>>  being able to reprovision nodes for a period of time - is high.
>>
>> - When service degradation is very expensive then the pserv instance
>>  could have the HA hammer applied to it.
>>
>> In front of this a fairly traditionally HA'd web application would
>> provide a unified UI. This would also be the place to store whatever
>> global configuration the cluster needs, like authn. This would be
>> replicated out to each pserv.
>
> Beyond state, an additional difference, AIUI, between what you're
> thinking and what Julian and I discussed, is whether a change request
> will hit pserv directly, or whether pserv is an implementation detail
> of the MAAS web frontend (which does both API and HTML where needed).

I'm suggesting splitting the API responsibilities from the UI
responsibilities.

>
> Specifically, they are identical when you consider having N pservs
> (each managing 1/X of the machines), and the ability to have pservs
> on multiple machines.
>
> They aren't identical from the perspective of upgrades: the model you
> propose means that two pserv processes cannot run the same set of
> machines concurrently (because pserv would be stateful). A stateless
> pserv *can* run active-active controlling the same machines (as long
> as config file and symlink updates are done approximately atomically,
> which is fairly straightforward).

Two pservs could run against the same set of machines, assuming they
both use PostgreSQL for state. Just like the stateless ones, they would
still need to be careful about changing the filesystem.
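
That care amounts to the "approximately atomically" point you make
above: never expose a half-written config or a dangling symlink to a
concurrent pserv or to a crash. A sketch of what I mean, with made-up
paths:

# Write to a temporary file in the same directory, then rename over the
# target; readers see either the old config or the new one, never a mix.
import os
import tempfile

def write_config_atomically(path, contents):
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(contents)
    os.rename(tmp, path)        # atomic on POSIX within one filesystem

def repoint_symlink_atomically(link, target):
    tmp = link + ".new"
    os.symlink(target, tmp)
    os.rename(tmp, link)        # replaces the old symlink in one step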

>
> If that's so, I think there is a fair amount of swings and roundabouts,
> until you come to analyse failure modes.
>
> Consider a failed pserv machine. If pserv is stateless, start a new
> one up, it just asks MAAS for the full config, exports it, and away we
> go. If there is state needed for the operation of the system on the
> pserv machine, when it's lost, we are in trouble.

Yes, indeed, but only a small portion of the maas cluster is degraded;
the problem is localised to the extent that it can be routed around
for new provisioning requests, and it'll be small enough that manual
intervention by ops is not unreasonable.

In the model where all data is centralised, what happens when the
central database hits a logical fault and needs to be restored from
backup? No one can reprovision anything in the whole cluster, and it's
beyond manual intervention.

>
>> There would also be another front end, a fairly stupid, stateless one,
>> providing a unified view over the cluster for API clients, though I
>> would want it to provide exactly the same API as individual pservs.
>>
>> I'm following a principle that I think Google and many others follow:
>> expect things to break, and design accordingly. For example, in a
>> 100,000 node deployment, we might have 50 pserv instances, each
>> capable of running without needing to be directed by a master service
>> (once boot-strapped). This would provide:
>>
>> - resilience from many kinds of failure: a failure on any pserv would
>>  prevent reprovisioning in 2% of the cluster; a minor degradation
>>  that can, in many cases, be routed around.
>
> This is the same in both proposals.

I don't see how, though I can see why it might seem so, because I
wasn't clear: by this point I was using pserv to mean (web API +
metadata + cobbler-assimilated) without stating it.

>
>> - some resilience from bugs, especially during upgrades. A handful of
>>  pservs can be upgraded at a time and the cluster monitored for
>>  problems.
>
> ditto.

Ditto :)

>
>> - zero downtime: rolling upgrades.
>
> This isn't the same - a stateful pserv will have short downtime per
> pserv; stateless won't.

I meant zero downtime across the cluster as a whole. Individual parts
may blip but the cluster as a whole stays available. Even large schema
changes cause only a degradation of service in one partition at a
time.

>
>> - a very high degree of scalability.
>
> Seems the same to me, except that we don't need to write a stateless
> API proxy - so that's fewer things to create.
>
>> My dodgy diagram, attached, and which probably employs zero
>> pre-existing iconographies, tries to convey some of this.
>
> Perhaps I'm missing something, but I don't see pserv on that diagram?

Yeah, sigh, I f**ked up. The big box named MAAS with the cloud haircut
was meant to be (web API + metadata + cobbler-assimilated).

>
> I don't see any particular a-priori reason to avoid having N
> state-maintaining services cooperating to provide MAAS as a whole -
> that's very much what I advocate - an SOA approach; but OTOH when you
> have a state-maintaining service, that service needs an HA story, it
> needs failure-mode management in its clients, it needs a
> dealing-with-absent-services story, and it needs a backup story. I
> don't think the MAAS dataset is large or complex enough for these
> things to be a good tradeoff vs maintaining all your state in an HA
> core service, with horizontally scaling helper services interrogating
> it as you scale.

Okay, that's fair. I think it will become a problem eventually, though:
servers are inexorably getting smaller, so node counts will only grow.

>
> I guess the key thing you allude to, is that you could in principle
> permit provisioning to happen when the main MAAS server is AWOL, but
> that implies some significant complexity around authentication - and a
> state synchronisation mechanism for when MAAS itself comes back.

I don't think any state synchronisation would be necessary. Well... in
one direction only: whatever global state is needed should be pushed
out to, and/or pulled by, the (API+...) services. It should never move
the other way.
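
As a sketch of that direction (URL, path and fields invented): each
(API+...) service periodically pulls the small blob of global state -
users, keys, and so on - and simply replaces its local copy; nothing
ever flows back up.

import json
import urllib.request

GLOBAL_CONFIG_URL = "http://maas-frontend.example.com/global-config"

def pull_global_config(local_path="/var/lib/pserv/global-config.json"):
    with urllib.request.urlopen(GLOBAL_CONFIG_URL) as response:
        config = json.load(response)
    with open(local_path, "w") as f:   # the local copy is only ever overwritten
        json.dump(config, f)
    return config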

The message coming out of Oakland seems to be that MAAS should have an
even simpler user management story than it has now, which reduces this
problem further.

Overall, I'm suggesting that we not put all the important parts in one
place, and instead put a unified API front end (which would be the
stupid, stateless bit) over a bunch of (API+...) services.

For example, a request for a node might go something like this: the
Stupid Front End asks each partition - each (API+...) service - for a
node matching the user's criteria. This might be done via a message
queue, broadcast, RPC, whatever - it doesn't matter - but the first or
best answer wins.
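
Something like this, taking "first answer wins" and treating a
partition that errors as simply not answering (api_acquire is the
invented call from the earlier sketch):

from concurrent.futures import ThreadPoolExecutor, as_completed

def acquire_node(partitions, constraints):
    """Ask every partition for a node; return the first one offered."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [pool.submit(p.api_acquire, constraints)
                   for p in partitions]
        for future in as_completed(futures):
            try:
                node = future.result()
            except Exception:
                continue          # a degraded partition just doesn't answer
            if node is not None:
                return node       # first acceptable answer wins
    return None                   # no partition could satisfy the request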

>
> If we come back to the core of MAAS - a single tenant API provider for
> provisioning hardware like a cloud, this doesn't seem justified to me:
> even a very large environment (say 100K nodes) won't have a high
> frequency of machine role turnover (100's of machines/minute) :
> machines will be brought up and put into openstack or hadoop, and
> within that environment get lots of use; periodically maintenance will
> happen, gracefully, but that's still going to be something where the
> impact of a short outage at the MAAS controller has minimal impact.
>
> (Sketch numbers for my model: each piece of hardware gets deployed for
> a month or more at a time, except for staging/test environments which
> are a) relatively small and b) torn down and replaced a lot)
> 100K machines
> 100K * (at most) 12 -- <= 1.2M allocations a year
>                                <= 1.2M deallocations a year
> 525600 minutes/year
> -> about 3 allocation-or-deallocation operations per minute, on average.
>
> A 10 minute outage, is about 30 queued operations.

An imagined MAAS reseller, mindful of its reputation, would probably
want better. Also, it's a cloud-like environment: if a machine can be
deployed in a few minutes then machines will be used the way people use
instances in AWS, i.e. with a lot more provisioning operations than
you've guessed at.

