openstack team mailing list archive

Re: A single cross-zone database?

 

Thanks for raising this Sandy: +1 on keeping separate DBs until a problem
arises.

I don't see a performance problem with recursively querying child zones.  I
guess this will partially depend on our zone topology: if the intent is to
have geographically distributed child zones, where latency could become an
issue (and it's not clear to me that even this will be a problem), then we
could use a 'caching' approach rather than the 'total consistency' approach
that a database implies.

It does seem like there's a separate issue, which is finding the appropriate
zone for a given instance.  Again, we could easily use caching here, and
simply retry on cache miss.
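
As a rough illustration of the cache-and-retry idea (a sketch only: the
ZoneLocator class, its method names, and the dict-of-sets stand-in for the
per-zone databases are all hypothetical, not Nova code), something like this
would do:

```python
class ZoneLocator:
    """Hypothetical helper: remembers which zone last held an instance."""

    def __init__(self, zones):
        # zones: dict mapping zone name -> set of instance ids.
        # This is an assumed stand-in for asking each zone's own database.
        self.zones = zones
        self.cache = {}       # instance id -> zone name
        self.full_scans = 0   # counts how often we fall back to polling zones

    def _scan(self, instance_id):
        """Poll every zone until one reports the instance (the slow path)."""
        self.full_scans += 1
        for name, ids in self.zones.items():
            if instance_id in ids:
                return name
        return None

    def find_zone(self, instance_id):
        """Try the cached zone first; rescan on a miss or a stale entry."""
        name = self.cache.get(instance_id)
        if name is not None and instance_id in self.zones.get(name, ()):
            return name
        name = self._scan(instance_id)
        if name is None:
            self.cache.pop(instance_id, None)
        else:
            self.cache[instance_id] = name
        return name


zones = {"east": {"i-1"}, "west": {"i-3"}}
locator = ZoneLocator(zones)
first = locator.find_zone("i-1")    # cold cache: scans the zones
second = locator.find_zone("i-1")   # served from the cache
zones["east"].remove("i-1")         # instance moves; cache entry is now stale
zones["west"].add("i-1")
third = locator.find_zone("i-1")    # stale entry detected, rescan
```

The cache is only ever a hint here: a wrong or missing entry costs one extra
scan, so there is nothing to keep consistent across zones.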

How complex a zone hierarchy are we contemplating?  Even at our full target
scale, I don't believe we're talking about more than a dozen zones (?), and
I believe that even the naive implementation, which simply queries child
zones recursively with no caching, will be more than good enough.
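
To make "naive" concrete, here is a minimal sketch of what I have in mind.
The Zone class and get_instances function below are hypothetical in-memory
stand-ins (in practice each child lookup would be a remote call to that
zone's API), so treat it as an illustration rather than the actual
implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Hypothetical in-memory stand-in for a zone; not Nova's actual API."""
    name: str
    instances: list = field(default_factory=list)  # (instance_id, customer) pairs
    children: list = field(default_factory=list)   # child Zone objects


def get_instances(zone, customer):
    """Collect a customer's instances from this zone and all its descendants."""
    found = [(zone.name, iid) for iid, owner in zone.instances if owner == customer]
    for child in zone.children:
        # Assumption: a real deployment would make a remote call here.
        found.extend(get_instances(child, customer))
    return found


east = Zone("east", instances=[("i-1", "X"), ("i-2", "Y")])
west = Zone("west", instances=[("i-3", "X")])
root = Zone("root", children=[east, west])
result = get_instances(root, "X")
```

With a dozen zones or so, that is a dozen queries per request, and the child
calls could be issued in parallel if latency ever became a concern.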

Justin



On Wed, Mar 16, 2011 at 7:53 AM, Sandy Walsh <sandy.walsh@xxxxxxxxxxxxx> wrote:

>  Hi y'all, getting any sleep before Feature Freeze?
>
>  As you know, one of the main design tenets of OpenStack is Share Nothing
> (where possible). http://wiki.openstack.org/BasicDesignTenets
>
>  That's the mantra we've been chanting with Zones. But it does cause a
> problem with a particular Use Case:
>
>  *"Show me all Customer X Instances, across all Zones."*
>
>  This is an expensive request. We have to poll all zones and ask them to
> return a list of matching instances.
>
>  There has been some water cooler chat about some things we could do to
> make this more efficient in the near term. One proposal has been to assume a
> single database, replicated across zones. I'll call it SDB for short. With
> SDB we can have a join table that links Zone to Instance ... keeping a
> record of all instances across zones. Maybe it's a completely separate set
> of tables? Maybe it's a separate replicated db? The intention is to let us
> talk to the appropriate zone directly.
>
>  Sure, there are a ton more optimizations we could make if we go further
> with SDB. We could store all the Zone capabilities in the db to make Zone
> selection faster. We could store all the customers in the db to make
> multi-tenant easier. But that's not what we're talking about here. We're
> talking about the *bare minimum* required to make the get_instances query
> fast.
>
>  Conversely, there are issues with a single DB. The largest is the
> implication it has for Bursting (Hybrid Private/Public clouds) ... a pretty
> funky feature imho.
>
>  Personally, I think the same query gains can be obtained by creating a
> separate db, using off-the-shelf ETL tools to create cache/read-only dbs.
> http://en.wikipedia.org/wiki/Extract,_transform,_load
>
>  I was considering SDB for Zones (phase 4), but for now, I'm going to
> stick with the original plan of separate databases (1 per zone) and see what
> the performance implications are.
>
>  What are your thoughts on this issue?
>
>  ... let the games begin!
>
>  -S
