openstack team mailing list archive

Re: Architecture for Shared Components


Hi Michael,

On Tue, Aug 03, 2010 at 10:06:02AM -0400, Michael Gundlach wrote:
>    But I think we're getting close :)  All three of us have slightly
>    different approaches in mind, but we're narrowing down the areas where we
>    disagree.  I've tried listing our agreements and disagreements below --
>    we'll see if this style is helpful or confusing.

Yes, we're getting close. :)

>    OK, I think we all agree that it is good that
>      * code in the request chain can call out sideways to services
>      * we provide language bindings to those services
>      * the language bindings talk to services using their normal wire
>        protocol, or REST if we write the service ourselves
>      * the language bindings allow dropping in different implementations of
>        the service, including local mocks for testing
>      * it's undesirable to have a lower layer in the request chain have to
>        call up to a higher layer to service a request.

Agree with all the above.

>    Here's where I think we disagree (pardon if I put words in your mouths
>    incorrectly):
>     1. Jorge suggests putting some specific things in the request chain (e.g.
>        caching) which you and I would suggest making into services.

Yes, keep everything as a service. This allows us to create new layers
into the chain if needed as well, but it is not part of the required

>     2. Jorge and I would suggest making the request chain longer when
>        possible to simplify each part, while you would suggest collapsing it
>        to reduce the number of hops.

I suggest making it longer only when necessary, and not requiring
components to span multiple layers. Having all services as modules
will make it trivial to split layers as we need. I see splitting
things into a longer chain as a premature optimization, and in some cases
not an optimization at all. For example, the overhead of splitting
out a proxy layer for something like rate limiting may not make
sense. The overhead of the proxy (two extra sockets, buffer copying,
...) is fairly expensive compared to just having the check in the API
endpoint directly (consulting a shared data store for the current limit
and incrementing the count for the current request type).
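
As a rough illustration of the in-endpoint check described above, here
is a minimal Python sketch. Everything in it is an assumption made for
the example: InMemoryStore stands in for whatever shared data store is
actually used, and check_rate_limit, the fixed time window, and the key
layout are hypothetical names and choices, not an existing API.

```python
import time


class InMemoryStore:
    """Hypothetical stand-in for a shared data store (e.g. a cache service)."""

    def __init__(self):
        self._counts = {}

    def increment(self, key):
        """Increment the counter for key and return the new value."""
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]


def check_rate_limit(store, account, request_type, limit, window=60, now=None):
    """Return True if this request is within the limit for the current window.

    The counter is keyed by account, request type, and a fixed time bucket,
    so one store round trip both records the request and checks the limit.
    """
    now = time.time() if now is None else now
    bucket = int(now // window)
    return store.increment((account, request_type, bucket)) <= limit


if __name__ == "__main__":
    store = InMemoryStore()
    for attempt in range(3):
        print(attempt, check_rate_limit(store, "acct", "POST", limit=2))
```

The point is only that the check is a function call plus one data store
operation, with no extra sockets or buffer copies in the request path.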

>    What makes me uncomfortable
>    about http://oddments.org/tmp/os_module_arch.png is that the "API
>    Endpoint" server is a single rectangle that does multiple things, when we
>    could instead have it be a chain of 2 or 3 rectangles that each do a
>    simpler thing.  For example, if there does exist some basic auth* that can
>    be done on the incoming HTTP request, I would want that teased out of the
>    API Endpoint and put in front of it as a layer that calls sideways to the
>    Auth Service.  Now API Endpoint is a little simpler, and we know that any
>    requests to it have already passed basic auth* checks.  API Endpoint can
>    of course call out to Auth Service again to do more fine grained auth if
>    necessary.

In the .png, imagine separate boxes inside the API Endpoint box
that represent the different steps, rather than a single blob. :)
I understand your concern about keeping the layers separate, and I
completely agree we don't want any spaghetti code/arch coming out of
this. I think the place where we don't align yet is how these layers
talk to each other.

Thinking of the request as a state machine and a series of
transformations, I think we all agree that the states need to
be well defined and each state can depend on external services
(like auth). Where I see our difference is how we go about the
transformations from one state to the next. I'm advocating we keep
this generic, and allow for in-process transformations, where the
multi-layer proxy example is forcing an HTTP proxy layer for each

As far as scalability is concerned, there is no reason why SSL
termination, Auth, and rate limiting can't happen within the API
endpoint process. We'll keep each state separate as in the proxy
model, but we'll pass objects in process rather than having to hit
HTTP/buffers/sockets for each step.
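
To make the "separate states, in-process hand-off" idea concrete, here
is a minimal Python sketch. All names in it (Request, Rejected,
authenticate, rate_limit, run_chain, the X-Auth-Token header) are
hypothetical and chosen only for illustration. The auth and limiter
callables stand in for the sideways calls to external services.

```python
class Request:
    """Request object handed from one transformation to the next."""

    def __init__(self, headers, body=b""):
        self.headers = headers
        self.body = body
        self.state = "received"
        self.user = None


class Rejected(Exception):
    """Raised by a step to stop the chain and reject the request."""


def authenticate(request, auth_service):
    # auth_service is a callable standing in for the external Auth Service.
    user = auth_service(request.headers.get("X-Auth-Token"))
    if user is None:
        raise Rejected("authentication failed")
    request.user = user
    request.state = "authenticated"
    return request


def rate_limit(request, limiter):
    # limiter is a callable standing in for a shared rate-limit check.
    if not limiter(request.user):
        raise Rejected("rate limit exceeded")
    request.state = "rate_limited"
    return request


def run_chain(request, steps):
    """Apply each transformation in order, passing the object in process."""
    for step in steps:
        request = step(request)
    return request


if __name__ == "__main__":
    tokens = {"abc": "alice"}
    steps = [
        lambda r: authenticate(r, tokens.get),
        lambda r: rate_limit(r, lambda user: True),
    ]
    done = run_chain(Request({"X-Auth-Token": "abc"}), steps)
    print(done.state, done.user)
```

Each step is still a well-defined state transition that can be tested
on its own; only the hand-off between steps is a function call instead
of an HTTP hop.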

If we find we do need to break this layer up (say the caching layer
needs more in-process memory than the API endpoint can provide) we
can replace one of these state transformations with an HTTP proxy. I
just want to default to the simplest, most efficient transformation.
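
One way to picture swapping a single transformation for a remote one is
sketched below in Python. This is an assumption-laden illustration: the
step factories, the dict-shaped request, the JSON wire format, and the
URL are all hypothetical. The only idea it is meant to show is that a
local step and an HTTP-backed step can share the same call signature,
so the rest of the chain does not change.

```python
import json
import urllib.request


def http_transport(url, payload):
    """POST payload as JSON to url and return the decoded JSON reply."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def local_cache_step(cache):
    """Build a step that looks the request path up in an in-process dict."""
    def step(request):
        request["cached"] = cache.get(request["path"])
        return request
    return step


def remote_cache_step(url, transport=http_transport):
    """Build a step with the same signature that asks a remote cache layer."""
    def step(request):
        reply = transport(url, {"path": request["path"]})
        request["cached"] = reply.get("cached")
        return request
    return step


if __name__ == "__main__":
    # Default: in-process lookup. Swapping in remote_cache_step(...) later
    # would not require touching the code that runs the chain.
    step = local_cache_step({"/servers": "cached listing"})
    print(step({"path": "/servers"}))
```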

>    You want to avoid extra hops and extra layers for simpler installations to
>    have to think about... maybe WSGI is the solution to this that lets us all
>    meet in the middle.  Build API Endpoint as a stack of as many WSGI servers
>    as possible, so that each one is as simple as possible.  If it turns out
>    we never need to scale an individual piece, we have at least broken a
>    larger problem into smaller ones.  If we do need to scale an individual
>    piece, we can turn the WSGI app into its own layer and scale
>    independently.  What do you (Jorge and Eric) think?

I agree with the concept (what I was stating above), but I'm not sure
yet if WSGI is the best approach. I'd have to look at stacking WSGI
layers before agreeing. :)
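
For reference while evaluating this, stacking in WSGI generally means
wrapping one application callable in another. The Python sketch below
shows that pattern under stated assumptions: api_app, AuthMiddleware,
the "example.user" environ key, the X-Auth-Token header, and the call
helper are hypothetical names for illustration, not part of any
existing code.

```python
def api_app(environ, start_response):
    """Innermost WSGI application: the API endpoint proper."""
    user = environ.get("example.user", "anonymous")
    body = ("hello %s\n" % user).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


class AuthMiddleware:
    """WSGI layer that does a basic auth check, then calls the next app."""

    def __init__(self, app, lookup_user):
        self.app = app
        # lookup_user stands in for the sideways call to an auth service.
        self.lookup_user = lookup_user

    def __call__(self, environ, start_response):
        user = self.lookup_user(environ.get("HTTP_X_AUTH_TOKEN"))
        if user is None:
            body = b"unauthorized\n"
            start_response("401 Unauthorized",
                           [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
            return [body]
        environ["example.user"] = user
        return self.app(environ, start_response)


def call(app, environ):
    """Invoke a WSGI app directly and return (status, body), no server."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    stack = AuthMiddleware(api_app, {"abc": "alice"}.get)
    print(call(stack, {"HTTP_X_AUTH_TOKEN": "abc"}))
    print(call(stack, {}))
```

Here the layers run in one process as plain function calls. Whether
that holds up for our needs is exactly what still has to be checked.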

>    Summarized as bullet points, can we agree that
>      * large problems should be broken into small problems
>      * if a layer of the request chain can have a piece broken into a WSGI
>        app, let's do that unless there's a good reason not to


>      * by the time we release 1.0, we should figure out which WSGI apps, if
>        any, need to become independent proxies for scaling reasons

Once we get to load testing we should be able to get a good picture
of what we need to break out, if anything. We'll want to capture some
stats on our current production API traffic, see the limits of the
new system, and go from there.

>    Hope this helped and didn't muddy the waters,

It helped, thanks for breaking things down! I think we're about on
the same page now. I'm going to go look at how WSGI layers can stack
to see if that is a good fit. :)

