maas-devel team mailing list archive

Re: Preparations for scaling out - defaulting

 

Excerpts from Dave Walker's message of Fri May 25 02:26:34 -0700 2012:
> On Thu, May 24, 2012 at 11:44:32AM -0400, Francis J. Lacoste wrote:
> > Hi Julian,
> > 
> > I agree with Michael here, I think we should look into defaulting based
> > on auto-discovered information rather than creating the concept of
> > profile or group which would still mean manual data entry.
> > 
> > But we should probably leave this for after we introduce the hardware
> > database (which we want for better Juju constraints).
> > 
> > Cheers
> > 
> 
> Hi,
> 
> The issue is actually slightly larger than this.  In the case of a
> freshly racked node we should be doing this:
>  - Node is racked and powered on
>  - Node is enlisted
>  - Node is accepted
>  - Node boots into commissioning
>    - Environment tries to discover whether it knows its own power type
>      - If IPMI:
>        - Request from MAAS sane defaults for the ipmi tables
>        - Set the local IPMI parameters in the server BMC (akin to
>          'burning into firmware')
>        - Post back results.
> 
> However, for things such as PDUs/CDUs and some other power management
> types, this isn't going to be easily viable.  This means that we
> probably need 'power type' templates, which can be set in both the
> webui and crucially, the API.
> 
> This means that MAAS needs to be a 'master of information' and a
> 'consumer of information' for power control.
> 

I have been hanging out with the Ceph guys, and they have some interesting
ideas on how to make datacenter life more consistent on large-scale
deployments.

One interesting idea they have is to pre-load Ceph's data structures onto
all spare disks for a Ceph cluster. When a disk goes bad, the system marks
it with a red light of some kind, and the admin does not have to think
about it: she goes to the box, pops the drive with the red or blinking
light, and inserts one of these pre-formatted drives. Ceph sees it, adds
it to the pool, and starts putting data on it.

MAAS can learn from this model by asking "what do we have available?"
Clearly, if the system is using PDUs rather than IPMI for power control,
working out which socket belongs to which machine is going to be pretty
hard without some kind of map. Moving the KVM cables from machine to
machine and recording which power cable each one is on is error-prone
and tedious.

What if we tell MAAS about the PDU ahead of time, and it turns off all
the sockets that it doesn't already have allocated to machines? Then it
turns on one socket, and you plug one machine in and enlist it. Then the
next, then the next.
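
As a rough sketch of that loop (nothing MAAS has today): `FakePdu`,
`enlist_one_at_a_time` and the `wait_for_enlistment` callback are all
made-up names for illustration, and a real PDU would be driven over its own
management protocol rather than a dict in memory.

```python
class FakePdu:
    """In-memory stand-in for a PDU (hypothetical, for illustration only)."""

    def __init__(self, sockets, on=()):
        # socket id -> True if the socket is currently powered
        self.powered = {s: (s in on) for s in sockets}

    def set_power(self, socket, on):
        self.powered[socket] = on


def enlist_one_at_a_time(pdu, allocated, wait_for_enlistment):
    """Return a socket -> machine map built by powering one free socket at a time.

    allocated: existing socket -> machine map; those sockets are left alone.
    wait_for_enlistment: callback taking the live socket and returning the
        machine that enlisted, or None if nothing showed up (assumed hook).
    """
    mapping = dict(allocated)
    free = [s for s in pdu.powered if s not in allocated]

    # Start from a known state: every unallocated socket is off.
    for socket in free:
        pdu.set_power(socket, False)

    for socket in free:
        pdu.set_power(socket, True)
        machine = wait_for_enlistment(socket)
        if machine is None:
            # Nothing was plugged in here; switch it back off and move on.
            pdu.set_power(socket, False)
            continue
        # Only this free socket was newly live, so the pairing is unambiguous.
        mapping[socket] = machine
    return mapping
```

The point of the design is that the mapping falls out of the power state
itself, so nobody has to write down which cable went where.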

If you have parallel operations going on in this same provisioning
network, perhaps MAAS could generate a USB key image, and the admin
plugs the resulting key into the machine being connected to the PDU.
That way, during enlistment, you can be sure that the machine with the
USB key is the one plugged into the PDU you're doing discovery on.

There is no room for error. Only one unused PDU port is powered on at a
time. Once a machine is plugged in, the PDU<->machine relationship is
established, because MAAS knows which unallocated socket it has just
turned on. The USB key can also add security to this process, since we
can make it a unique key that is required for enlistment, so that
intruders cannot enlist machines onto the network.
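
To illustrate the unique-key idea, here is a minimal sketch, assuming the
key on the USB stick is just a random token that MAAS issues per socket and
accepts exactly once. `EnlistmentKeys`, `issue` and `redeem` are
hypothetical names, not an existing MAAS API.

```python
import hmac
import secrets


class EnlistmentKeys:
    """Tracks one-time enlistment tokens per PDU socket (hypothetical sketch)."""

    def __init__(self):
        self._pending = {}  # socket id -> token not yet redeemed

    def issue(self, socket):
        """Create the token that would be written into the USB key image."""
        token = secrets.token_urlsafe(32)
        self._pending[socket] = token
        return token

    def redeem(self, socket, presented):
        """Accept an enlistment only if the token matches, and only once."""
        expected = self._pending.get(socket)
        if expected is None:
            return False
        # Constant-time comparison, done on bytes so any input string is safe.
        if not hmac.compare_digest(expected.encode(), presented.encode()):
            return False
        del self._pending[socket]
        return True
```

A machine presenting no token, a wrong token, or a token issued for a
different socket would simply be refused enlistment.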

Another way, though perhaps more error-prone, is to just plug them all in
and have MAAS discover which machines are on which ports by turning the
ports on one by one. The USB key works here too, as you can make the admin
move the USB key to the next machine when it turns on, and the process
won't continue until she does so.

