maas-devel team mailing list archive

Re: Re-architecting without cobbler

This sounds broadly fine, I want to clarify a couple of small points (below).

On Fri, May 4, 2012 at 5:50 PM, Julian Edwards
<julian.edwards@xxxxxxxxxxxxx> wrote:
> Hi
>
> Gavin also has a proposal to come, but this is the basis of what I discussed
> with Robert earlier.
>
> The main aim is hyperscale. That is, we want to support hundreds of thousands
> of nodes.  Cobbler simply doesn't allow us to do that.  In addition we are
> currently tied into driving it synchronously so that we can be sure it synced
> its own database, which causes bugs like:
> https://bugs.launchpad.net/maas/+bug/989355
>
> The other main issue surrounds DHCP/TFTP and PXE booting.  When we write out
> config files for these services we need to make sure there are no conflicts
> with multiple processes/threads doing the same thing.
>
> Finally, we need to separate appserver requests from provisioning tasks
> because the latter can be long running (as per the above bug).
>
> To this end we discussed:
>
>  * Use Celery as a task queue (can use Rabbit or Django's DB as a broker);
>    Django has a Celery plugin which makes this very easy
>  * Have one queue per pserv process
>  * Each pserv is responsible for servicing the tasks on that queue
>  * It scales by having multiple queues each with another pserv, or just
> multiple pservs pulling from one queue.
>  * The pserv can read the database by using Django's ORM, and write to it by
> sending API requests back to Django's appserver threads (again, scalable).

I would discourage, *strongly* discourage, any direct DB access from
pserv: our experience in Launchpad with that kind of access has been
universally bad. Let the appserver drive the DB exclusively, and offer
appropriate APIs for getting stuff from/to it. I think we glossed over
this on IRC; Celery talking to PostgreSQL might mean this needs some
extra glue, or something.
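The "appserver owns the DB" rule amounts to something like the sketch below: the worker never touches the ORM and instead writes results back through the appserver's HTTP API, which remains the only writer to PostgreSQL. The endpoint path and JSON payload here are hypothetical, not the real MAAS API:

```python
# Hedged sketch: a pserv task reports its result over HTTP rather than
# writing to the database directly.
import json
from urllib import request

def report_node_status(appserver, system_id, status):
    """PUT a task result back to the appserver instead of using the ORM."""
    body = json.dumps({"status": status}).encode()
    req = request.Request(
        "%s/nodes/%s/" % (appserver, system_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with request.urlopen(req) as resp:
        return resp.status
```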

> We also talked about batching requests on the Rabbit task queues because we
> might want to accumulate several requests that change the configs for DHCP
> etc.  For now though we can process serially and optimise later.  We'd need to
> think about how to implement locking though if there are multiple writers to
> the same files.

If we were to use Rabbit to do this batching, you wouldn't need
special locking: pserv would have a single AMQP client internally
which handled the config change items serially, so only one thing
would be writing to those files at a time. (I'm not pushing for Rabbit
here; I think we can batch just fine with Celery, am merely
clarifying.)
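The single-consumer pattern can be sketched with stdlib primitives: one thread drains every pending config change before writing once, so no locking is needed and several queued changes collapse into a single rewrite. The function name and last-write-wins policy are illustrative assumptions, not MAAS code:

```python
# Sketch: a single serial consumer batches queued config changes.
import queue
import threading

def config_writer(changes, write_file, stop):
    # Single consumer: no file locking is needed because nothing else writes.
    while not stop.is_set() or not changes.empty():
        try:
            item = changes.get(timeout=0.1)
        except queue.Empty:
            continue
        batch = [item]
        # Drain everything that has accumulated; one write covers the batch.
        while True:
            try:
                batch.append(changes.get_nowait())
            except queue.Empty:
                break
        # Each item here is a full config, so the last one supersedes the rest.
        write_file(batch[-1])
```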

-Rob

