openstack team mailing list archive
Re: A single cross-zone database?
We can handle pagination whether we have a single database, multiple
databases with cache, or query each zone on each request. In the last
case an instance would be identified with the zone it exists in (for
example, the marker would be a fully qualified zone:instance name)
and we can just pick up where we left off, using a deterministic order
of zones/instances for all API frontends. I don't think we need this,
though; we need an active cache with one DB per zone (the same thing
I've been saying since the Austin summit).
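To make the last case concrete, here is a minimal Python sketch of marker-based pagination across zones. It assumes each zone can return its instance names (modeled as an in-memory dict, a hypothetical stand-in for per-zone API calls), that zone names contain no ":", and that "zone names sorted, then instance names sorted" is the deterministic order every API frontend agrees on. The function name `paginate` is illustrative only.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for per-zone instance lists; in practice each zone
# would be queried through its own API.
ZoneData = Dict[str, List[str]]


def paginate(zones: ZoneData, limit: int, marker: Optional[str] = None) -> List[str]:
    """Return up to `limit` fully qualified "zone:instance" names after `marker`.

    Zones are visited in sorted order and instances within a zone are sorted,
    so every API frontend produces the same global ordering.
    """
    # Assumes zone names contain no ":" so the first colon splits the marker.
    start_zone, start_instance = marker.split(":", 1) if marker else (None, None)
    page: List[str] = []
    for zone in sorted(zones):
        if start_zone is not None and zone < start_zone:
            continue  # this zone was fully returned on earlier pages
        for instance in sorted(zones[zone]):
            if zone == start_zone and instance <= start_instance:
                continue  # already returned on a previous page
            page.append(f"{zone}:{instance}")
            if len(page) == limit:
                return page
    return page
```

Because the marker carries the zone name, a frontend can skip every zone that sorts before it without querying it at all; a real implementation would pass the remaining limit and the instance part of the marker down to each zone instead of fetching whole lists.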
I have a number of issues with using a central DB for this application,
but I'll save my usual rant and focus on a main issue you already
mentioned: hybrid clouds. If someone stands up a large public cloud,
let's say dozens of zones, and customers are allowed to connect their
private cloud to their account (possibly thousands of zones), do folks
expect to use a central DB? If so, please explain in detail how this
will work, focusing on scalability and security.
I propose we stick with the original proposal of each zone having
its own DB, with the ability to do active caching for zones that need
it (aggregate zones). We should be doing active caching so we don't have
the staleness issues that Ed mentions. All records should be timestamped
(and indexed) so parent zones can efficiently ask for "all updates
since X" if they need to resync. Child zones will push updates to any
subscribed parent zones, which can keep a list that should hardly ever
be out of sync (for listing, pagination, etc.). We should batch updates
between each zone level to ensure efficient data flow.
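A rough Python sketch of the push/resync scheme described above, under stated assumptions: each zone's DB is modeled as a dict, timestamps are plain floats from a clock the zones agree on, and `ChildZone`, `ParentZone`, `push` and `resync` are hypothetical names, not existing Nova code. Children queue changes and push them to subscribed parents in batches; a parent that missed a batch asks for everything updated since the newest timestamp it has seen.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    instance_id: str
    state: str
    updated_at: float  # assumed to be indexed in each zone's own DB


@dataclass
class ParentZone:
    """Aggregate zone keeping an active cache of its children's records."""
    cache: Dict[Tuple[str, str], Record] = field(default_factory=dict)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def apply_batch(self, zone: str, batch: List[Record]) -> None:
        for rec in batch:
            key = (zone, rec.instance_id)
            current = self.cache.get(key)
            # Keep only the newest version; re-applying old records is a no-op.
            if current is None or rec.updated_at >= current.updated_at:
                self.cache[key] = rec
            self.last_seen[zone] = max(self.last_seen.get(zone, 0.0), rec.updated_at)

    def resync(self, child: "ChildZone") -> None:
        # Ask the child for "all updates since X", where X is the newest
        # timestamp this parent has already seen from that child.
        since = self.last_seen.get(child.name, 0.0)
        self.apply_batch(child.name, child.updates_since(since))


@dataclass
class ChildZone:
    """Zone owning its own DB (a dict here) and pushing changes upward."""
    name: str
    subscribers: List[ParentZone] = field(default_factory=list)
    records: Dict[str, Record] = field(default_factory=dict)
    pending: List[Record] = field(default_factory=list)

    def upsert(self, instance_id: str, state: str, now: float) -> None:
        rec = Record(instance_id, state, now)
        self.records[instance_id] = rec
        self.pending.append(rec)  # queued until the next batched push

    def updates_since(self, since: float) -> List[Record]:
        # ">=" re-sends the boundary record, so updates sharing the same
        # timestamp are not lost; the parent's newest-wins rule dedupes them.
        fresh = [r for r in self.records.values() if r.updated_at >= since]
        return sorted(fresh, key=lambda r: r.updated_at)

    def push(self) -> None:
        # Send the queued changes as one batch to every subscribed parent.
        batch, self.pending = self.pending, []
        for parent in self.subscribers:
            parent.apply_batch(self.name, batch)
```

Because a parent keeps only the newest record per (zone, instance) key, re-applying a batch or overlapping a resync with a push is harmless, which is what lets the parent's list stay nearly in sync without relying on a TTL.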
On Wed, Mar 16, 2011 at 04:45:46PM +0000, Ed Leafe wrote:
> On Mar 16, 2011, at 12:23 PM, Paul Voccio wrote:
> > Not only is this expensive, but there is no way I can see at the moment to do pagination, which is what makes this really expensive. If someone asked for an entire list of all their instances and it was > 10,000 then I would think they're ok with waiting while that response is gathered and returned. However, since the API spec says we should be able to do pagination, this is where asking each zone for all its children every time gets untenable.
> This gets us into the caching issues that were discussed at the last summit. We could run the query and then cache the results at the endpoint, but this would require accepting some level of staleness of the results. The cache would handle the paging, and some sort of TTL would have to be established as a balance between performance and staleness.
> -- Ed Leafe
> Mailing list: https://launchpad.net/~openstack
> Post to : openstack@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~openstack
> More help : https://help.launchpad.net/ListHelp