
openstack team mailing list archive

Re: Architecture for Shared Components


Guys,

I like this idea a lot.  I hadn't thought about the concept of using a language binding to communicate with upstream proxies, but it makes sense.  Being able to purge something from an HTTP cache by simply making a "purge" call in whatever language I'm using to write my API is a win.  That said, I'm not envisioning a lot of communication going upstream in this manner.  An authentication proxy service, for example, may need to communicate with an IDM system, but should require no input from the API service itself.  In fact, I would try to discourage such communication just to avoid chatter.  In cases where this can't be avoided, I would require that the proxy services expose a REST endpoint so we can take advantage of them even if a binding isn't available.
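
Just to make the "purge" example concrete, the binding could be as small as this (a sketch only -- the cache host is invented, and I'm assuming an upstream proxy, Varnish-style, that honors a PURGE method):

import http.client

def purge(path, cache_host='cache.example.com', cache_port=80):
    # Ask the upstream HTTP cache to drop its copy of `path`.
    conn = http.client.HTTPConnection(cache_host, cache_port)
    conn.request('PURGE', path)
    status = conn.getresponse().status
    conn.close()
    return status in (200, 204)  # purged, or nothing there to purge

# e.g. purge('/servers/1234') after updating server 1234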

Thoughts?

-jOrGe W.

On Jul 30, 2010, at 3:23 PM, Michael Gundlach wrote:

Hi Eric,

Well said.  Assuming I understood your argument below, I think we are actually on the same page.  Thanks for taking the time to explain your position in detail.

I think that you and I are in agreement if the modules that you describe are always simply bindings to another service's API.  For example, I completely agree that we should have a caching module that any layer of the stack can use as needed -- but that it should just be a language-specific binding to memcached or a similar service, *running somewhere outside the binary*.  It may be on localhost, or another machine, or a load balancer with 10 memcached servers behind it.  Similarly, any layer of the stack can use the authorization module, the authentication module, etc etc, and they just facilitate calling out to the authorization or authentication services.
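
As a sketch of what I mean (using the python-memcached client; the host name is invented), the whole module could be nothing but a thin binding:

import memcache

# The cache daemon itself runs outside this process -- on localhost,
# another machine, or a pool behind a load balancer.
_client = memcache.Client(['cache-1.internal:11211'])

def get(key):
    return _client.get(key)

def set(key, value, ttl=300):
    _client.set(key, value, time=ttl)

def purge(key):
    _client.delete(key)

Swap the server list and nothing else about the consuming service changes.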

I think it would be a *bad* idea if the caching module were just a standalone caching library, which any layer of the stack could "strap on" as needed.  Now that program does two jobs: its original purpose, and caching.  It's more complex, and you can't scale it as precisely (you end up scaling for the slower of the two jobs).  If we then strapped on authentication, authorization, etc etc, we'd end up with one complex binary that does everything.

Are we in agreement?  Modules provide language bindings to a URI endpoint but don't provide the logic itself?

I think my argument for "proxies" came across as saying that one layer of binaries should wrap the API, and that everything had to happen at that point:

request
  |
  v
hardware LB
  |
  v
proxy layer(s)
  |
  v
API endpoint
  |
  v
deeper...

and I agree that that would inevitably cause us to bubble too much information up the request chain.  Your module argument is more what I had in mind, where individual services still stay simple while being able to make roundtrip calls to other services as needed:

# In this diagram, => means a roundtrip call

request
  |
  v
hardware LB
  |
  v
service A # does the absolute minimum it can,
  |       # then forwards request downstream
  v
service B
      => service C # e.g. authentication
              => cache service
              => storage service
                      => some database
  |
  v
service D # e.g. API endpoint
      => cache service
  |
  v
service E # e.g. more specific endpoint, maybe cloud servers endpoint
      => cache service
      => service F # e.g. authorization
  |
  v
deeper... # the more links in the chain, the better:
          # each link is a simpler piece to code and to understand.


and what I had *thought* you meant looked like this in my head:

request
  |
  v
hardware LB
  |
  v
monolithic binary that did more than one thing
  |
  v
deeper...


In my experience, drawing #2 is the best, and a huge reason is that I can actually explain to you what each service does.  I can tell you what inputs he expects, what outputs he produces, and the one thing that he calculates -- and you can just throw more instances of him in place to scale him.  Whenever a service has accreted too much functionality, you split his code into two new services, and put them serially in the request chain or have one call the other to do some work.

Let me know if I've misunderstood you again,
Michael


On Fri, Jul 30, 2010 at 3:09 PM, Eric Day <eday@xxxxxxxxxxxx> wrote:
Hi Everyone,

A number of us have been discussing different ways of sharing
components between different projects in OpenStack, such as auth*,
caching, rate limiting, and so on. There are a few ways to approach
this, and we thought it would be best to put this out on the mailing
list for folks to discuss.

The two approaches proposed so far are a proxy layer that would
sit in front of the service APIs and a library/module that would
encapsulate these shared components and allow the services to consume
it at any layer. The problem we are trying to solve is re-usability
across different services, as well as providing a layer that can
scale along with each service. I'm leaning towards the library
approach, so I'll explain that side.

The basic idea is to provide a set of libraries or modules that could
be reused and expose a common API each service can consume (auth,
caching, ...). They will be modular themselves in that they could have
different backends providing each service. These interfaces will need
to be written for each language we plan to support (or written once
in something like C, with extensions built on top for each language).
Tools like SWIG can help in this area.
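
As a rough sketch of the shape I have in mind (all names invented), each module would expose one small interface and pick its backend at configuration time:

class AuthBackend(object):
    """Interface each authentication backend implements."""
    def authenticate(self, token):
        raise NotImplementedError

class FakeBackend(AuthBackend):
    """In-memory backend, useful for tests."""
    def __init__(self, valid_tokens):
        self.valid_tokens = set(valid_tokens)

    def authenticate(self, token):
        return token in self.valid_tokens

def load_backend(name, **kwargs):
    # A real deployment would register LDAP, SQL, etc. backends here.
    backends = {'fake': FakeBackend}
    return backends[name](**kwargs)

auth = load_backend('fake', valid_tokens=['abc123'])
assert auth.authenticate('abc123')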

The reasoning behind this approach over the proxy is that you're not
forced to answer questions out of context. Having the appropriate
amount of context, and doing checks at the appropriate layer, are key
in building efficient systems that scale. If we use the proxy model,
we will inevitably need to push more service-specific context up
into that layer to handle requests efficiently (URL structure for the
service, peeking into the JSON/XML request to interpret the request
parameters, and so on). I think questions around authorization and
cached responses can sometimes best be handled deeper in the system.

If we have this functionality wrapped in a library, we can make
calls from the service software at any layer (when the context is
relevant). We still solve the re-usability problem, but in a way that
can both be more efficient and doesn't require context to bubble up
into a generic proxy layer.

As for scalability, the libraries provided can use any methods needed
to ensure they scale across projects. For example, if we're talking
about authentication systems, the module can manage caching, either
local or network based, and still perform any optimizations it needs
to. The library may expose a simple API to the applications, but it
can have its own scalable architecture underneath.
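
For example (again a sketch, with invented names), token validation could look like one simple call to the application while the module handles caching and the remote round trip internally:

import time

class AuthClient(object):
    """Simple API on top; caching and the network call hidden below."""
    def __init__(self, auth_service, cache_ttl=60):
        self.auth_service = auth_service  # binding to the remote auth system
        self.cache_ttl = cache_ttl
        self._cache = {}  # could just as easily be memcached

    def validate(self, token):
        hit = self._cache.get(token)
        if hit is not None and hit[1] > time.time():
            return hit[0]  # served locally, no round trip
        result = self.auth_service.validate(token)  # the real check
        self._cache[token] = (result, time.time() + self.cache_ttl)
        return result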

The service API software will already need the ability to scale out
horizontally, so I don't see this as a potential bottleneck. For
example, in Nova, the API servers essentially act as an HTTP<->message
queue proxy, so you can easily start up as many as needed, with some
form of load balancing in front, while workers on the other side of
the queues carry out the bulk of the work. Having the service API
also handle tasks like rate limiting and auth should not be an issue.
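
To illustrate (a sketch using kombu as a stand-in for whatever queue library we settle on; the broker URL and queue name are made up), the API server's job per request is roughly:

from kombu import Connection

def handle_request(body):
    # Validate, enqueue, and return; workers behind the queue do the rest.
    with Connection('amqp://rabbit.internal//') as conn:
        queue = conn.SimpleQueue('compute')
        queue.put({'method': 'run_instance', 'args': body})
        queue.close()
    return 202  # Accepted -- the heavy lifting happens asynchronously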

You could even write a generic proxy layer for services that need it
based on the set of libraries we would use elsewhere in the system.
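
For instance (sketch only -- limiter.allow() stands in for whatever interface the shared rate-limit module ends up exposing), a generic WSGI middleware built on the same library might look like:

class RateLimitMiddleware(object):
    """Proxy-style behavior assembled from the shared library."""
    def __init__(self, app, limiter):
        self.app = app
        self.limiter = limiter  # the same shared rate-limit module

    def __call__(self, environ, start_response):
        key = environ.get('REMOTE_ADDR', 'unknown')
        if not self.limiter.allow(key):
            start_response('429 Too Many Requests',
                           [('Content-Type', 'text/plain')])
            return [b'Slow down.\n']
        return self.app(environ, start_response)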

Having worked on systems that took both approaches in the past, I can
say the library approach was both more efficient and more maintainable.
I'm sure we can make either work, but I want to make sure we consider
alternatives and think through the specifics a bit more first.

Thanks,
-Eric

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

