
openstack team mailing list archive

Re: Multi-Cluster/Zone - Devil in the Details ...


On Wed, Feb 16, 2011 at 01:02:22PM -0500, Jay Pipes wrote:
> >> [Sorry, yes the instance name is passed in on the request, but the instance ID is what's needed (assuming of course instance ID is unique across zones.)]
> >
> >        The ID is determined early in the process; well before the request to create an instance is cast onto the queue, and is returned immediately.
> The instance ID is constructed in nova.compute.api.API.create(), which
> is called by nova.api.openstack.servers.Controller.create().
> In other words, Sandy needs to find an appropriate zone to place an
> instance in. Clearly, this logic must happen before the instance is
> created in the database.

On top of this, the instance is created in the DB before being passed
to the scheduler, which is a problem: the scheduler may proxy the
request to another zone, leaving a stale, abandoned instance row
behind in the DB. We need to avoid touching the DB until the request
reaches the compute node, which is what I was working on as a
prerequisite for this blueprint during Bexar. This change, along with
some other fundamental ones, is required before we can move much
further with multi-zone.

We never want to generate the ID anywhere other than the final
zone. We should be using a zone-unique ID plus the zone name for
instance (and other object) naming.
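To illustrate the naming scheme above, here is a minimal sketch (hypothetical code, not nova's actual implementation): each zone generates IDs that are unique only locally, and the zone name is prepended to make the composite name globally unique without any cross-zone coordination.

```python
# Hypothetical sketch: compose a globally unique instance name from a
# zone-local ID plus the zone name, so the ID can be generated in the
# final zone without global coordination.
import itertools


class Zone:
    def __init__(self, name):
        self.name = name
        self._counter = itertools.count(1)  # zone-local ID source

    def next_instance_name(self):
        # The local ID is unique only within this zone; prefixing the
        # zone name makes the full name unique across the deployment.
        local_id = next(self._counter)
        return "%s/%d" % (self.name, local_id)


dfw = Zone("dfw-1")
ord_zone = Zone("ord-1")
assert dfw.next_instance_name() == "dfw-1/1"
assert dfw.next_instance_name() == "dfw-1/2"
assert ord_zone.next_instance_name() == "ord-1/1"
```

The counter here stands in for whatever per-zone ID allocator a real deployment would use; the point is only that uniqueness is established at the zone level, then qualified by the zone name.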

> >        I know that. I'm just stating that this is a natural consequence of the decision not to use a centralized db.
> The data set queried in the database used for a zone only contains
> information necessary for the scheduler workers in that zone to make a
> decision, and nothing more.


> >>>> One alternative is to make Host-Best-Match/Zone-Best-Match stand-alone query operations.
> >>>
> >>>        I don't really like this approach. It requires the requester to know too much about the implementation of the service: e.g, that there are zones, and that an instance will be placed in a particular zone. I would prefer something more along the lines of:
> >>>
> >>> a. User issues a create-instance request, supplying the name of the instance to be created.
> >>> b. The top-level zone that receives the request does a zone-best-match and/or host-best-match call to determine where the instance will be created.
> >>> c. The top-level zone then passes the create-instance request to the selected zone/host.


> Why are we assuming a requester doesn't know much about the
> implementation of the service? I mean, the requester is going to be an
> application like the Cloud Servers console, not just some random user.

But it can be some random user; many folks script against the public API.

>  Of course the requester knows something about the implementation of
> the service, and if they don't, the work Sandy did in the first phase
> of this blueprint allows the requester to query the admin API for
> information about the zones...

Pushing the zone list out to the client just punts on the whole routing
issue: the client has to do the work instead, and must either scan for
or track the zone of each instance it creates. Some folks have said
they don't want to expose any of their topology and would most likely
want everything routed through a top-level API endpoint.

For ease of use for the API user, and to accommodate deployments that
don't expose topology, we need to support routing of all requests
inside the parent zones.
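A rough sketch of what that parent-zone routing could look like (hypothetical code; the zone-selection heuristic here is a stand-in for a real zone-best-match call): the client only ever talks to the top-level zone, which picks a child and proxies the request.

```python
# Hypothetical sketch: a parent zone that routes create requests to a
# selected child zone, so clients never see the topology.
class ChildZone:
    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots

    def create_instance(self, spec):
        self.free_slots -= 1
        return "%s: created %s" % (self.name, spec)


class ParentZone:
    def __init__(self, children):
        self.children = children

    def create_instance(self, spec):
        # Pick the child with the most free capacity (a placeholder
        # for a real zone-best-match query) and proxy the request.
        best = max(self.children, key=lambda c: c.free_slots)
        return best.create_instance(spec)


top = ParentZone([ChildZone("zone-a", 2), ChildZone("zone-b", 5)])
print(top.create_instance("vm-1"))  # routed to zone-b
```

In a real deployment the proxying would be an API call to the child zone's endpoint rather than a method call, but the shape of the routing is the same.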

> >> [But what about subsequent actions ... the same zone-search would have be performed for each of them, no?]
> >
> >        This was one of the issues we discussed during the sprint planning. I believe (check with cyn) that the consensus was to use a caching strategy akin to DNS: e.g., if zone A got a request for instance ID=12345, it would check to see if it had id 12345 in its cache. If not, it would ask all of its child nodes if they knew about that instance. That would repeat until the instance was found, at which point every upstream server would now know about where to reach 12345.
> Agreed. Each "level" or zone in the overall architecture would cache
> (in our case, cache means a record in the zone's database) information
> about its subordinate nodes (nodes being instances or other zones,
> depending on the "level" of the zone in the overall architecture).

This doesn't help the 'list all instances' search, which would be very
expensive with a large number of accounts and instances. We need a
more active caching policy, which ends up being more of a replication
subset than a cache. Initially we can fan out the query just to make
it work, but to be usable at any large scale we need a much smarter
data model underneath. These are all things we discussed at the last
design summit, if folks remember those discussions. :)
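The DNS-style lookup quoted above, with fanout on a cache miss, might be sketched like this (hypothetical code; the cache is an in-memory dict standing in for a record in the zone's database):

```python
# Hypothetical sketch: DNS-style instance lookup. A zone checks its
# own instances, then its cache; on a miss it fans the query out to
# every child zone and caches the answer for subsequent requests.
class ZoneNode:
    def __init__(self, name, instances=(), children=()):
        self.name = name
        self.instances = set(instances)
        self.children = list(children)
        self.cache = {}  # instance_id -> child zone that owns it

    def find(self, instance_id):
        if instance_id in self.instances:
            return self.name
        if instance_id in self.cache:
            return self.cache[instance_id].find(instance_id)
        for child in self.children:  # fanout on cache miss
            found = child.find(instance_id)
            if found is not None:
                self.cache[instance_id] = child
                return found
        return None


leaf_a = ZoneNode("leaf-a", instances=["12345"])
leaf_b = ZoneNode("leaf-b", instances=["67890"])
top = ZoneNode("top", children=[leaf_a, leaf_b])
assert top.find("12345") == "leaf-a"
assert "12345" in top.cache  # later lookups for 12345 skip the fanout
```

This also shows why 'list all instances' is the hard case: the cache only helps point queries for a known ID, while a full listing still has to fan out to every child zone on every call.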

