openstack team mailing list archive

Re: Instance IDs and Multiple Zones

To: Erik Carlin <erik.carlin@xxxxxxxxxxxxx>
From: Justin Santa Barbara <justin@xxxxxxxxxxxx>
Date: Tue, 22 Mar 2011 19:42:58 -0700
Cc: "openstack@xxxxxxxxxxxxxxxxxxx" <openstack@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <27513_1300842182_p2N12ucM028382_C9AE9CA8.C0E6%erik.carlin@rackspace.com>
> Pure numeric ids will not work in a federated model at scale.
>

Agreed


> Maybe I'm missing something, but I don't see how you could inject a
> collision ID downstream - you can just shoot yourself in your own foot.


I think that you can get away with it only in simple hierarchical
structures.  Suppose cloud users are combining multiple public clouds into
their own 'megaclouds'.  If I'm an evil public cloud operator, I can start
handing out UUIDs that match any UUIDs I can discover on the Rackspace
cloud, and anyone that has constructed a cloud that combines my cloud and
Rackspace would have collisions.  Users wouldn't easily know who to blame
either.

> The other option apart from UUID is a globally unique string prefix.  If
> Rackspace had 3 global API endpoints (ord, dfw, lon) each with 5 zones,
> the ID would need to be something like rax:dfw:1:12345 (I would actually
> want to hash the zone id "1" portion with something unique per customer so
> people couldn't coordinate info about zones and target attacks, etc.).
> This is obviously redundant with the Rackspace URI since we are
> representing Rackspace and the region twice, e.g.
> http://dfw.servers.rackspace.com/v1.1/12345/servers/rax:dfw:1:6789.
>

I am in favor of this option, but with a few tweaks:

1) We use DNS, rather than inventing and administering our own scheme
2) I think the server ID looks like
dfw.rackspace.com/servers/a282-a6-cj7aks89.  It's not necessarily a valid
HTTP endpoint, because there's a mapping to a protocol request
3) The client maps it by "filling in" the http/https protocol (or whatever
protocol it is e.g. direct queuing), and it fills in v1.1 because that's the
dialect it speaks.
4) Part of the mapping could be to map from a DNS name to an endpoint,
perhaps using DNS SRV records (I'm sure I'm mangling all the terminology here)
5) This also allows a form of discovery ... if I tell my cloud controller I
want to use rackspace.com, it can then look up the SRV record, find the
endpoint (e.g. openstack.rackspace.com), then do a zone listing request and
find child zones etc.  If I ask my monitoring system to monitor
"rackspace.com/servers/a6cj7aks89", it knows how to map that to an openstack
endpoint.  Auth is another story of course.
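
A rough sketch of steps 2 to 4, to make the "filling in" concrete.  Everything
named here is an assumption for illustration: server_id_to_url, the SRV_TABLE
dict (a stand-in for a real DNS SRV lookup), and the exact URL layout are not
something this thread has settled.

```python
# Hypothetical stand-in for a DNS SRV lookup: maps the DNS name in a
# server ID to the API endpoint host that actually serves it.
SRV_TABLE = {
    "rackspace.com": "openstack.rackspace.com",
}


def server_id_to_url(server_id, scheme="https", version="v1.1"):
    """Turn '<dns-name>/<path>' into a request URL for one API dialect."""
    host, _, path = server_id.partition("/")
    if not host or not path:
        raise ValueError("expected '<dns-name>/<path>', got %r" % server_id)
    # Fall back to the DNS name itself when no SRV-style mapping is known.
    endpoint = SRV_TABLE.get(host, host)
    # The client supplies the protocol and the version it speaks.
    return "%s://%s/%s/%s" % (scheme, endpoint, version, path)


print(server_id_to_url("dfw.rackspace.com/servers/a282-a6-cj7aks89"))
# https://dfw.rackspace.com/v1.1/servers/a282-a6-cj7aks89
print(server_id_to_url("rackspace.com/servers/a6cj7aks89"))
# https://openstack.rackspace.com/v1.1/servers/a6cj7aks89
```

The ID itself never carries the scheme or version, so a client speaking a
different dialect could map the same ID to a different request.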

> Using strings also means people could make ids whatever they want as long
> as they obeyed the prefix/suffix.  So one provider could be
> rax:dfw:1:12345 and another could be osprovider:8F792#@*jsn.  That is
> technically not a big deal, but there is something for consistency and
> simplicity.


True.  We could restrict the character set to A-Z,0-9 and a few other 'safe
characters' if this is a real problem.  We probably should eliminate
difficult-to-encode characters anyway, whether encoding means umlauts or
url-encoding.
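
A character-set restriction could be as small as the check below.  The exact
set is an assumption: the only thing proposed above is "A-Z,0-9 and a few
other safe characters", so the lowercase letters and the '.', '-', '_', '/'
chosen here are illustrative only.

```python
import re

# Assumed 'safe' set: ASCII letters, digits, and '.', '-', '_', '/'.
# None of these need URL-encoding or any non-ASCII handling.
SAFE_ID = re.compile(r"[A-Za-z0-9._/-]+")


def is_safe_id(server_id):
    """Return True if the whole ID uses only the assumed safe characters."""
    return SAFE_ID.fullmatch(server_id) is not None


print(is_safe_id("dfw.rackspace.com/servers/floor1/2CD"))  # True
print(is_safe_id("osprovider:8F792#@*jsn"))                # False
```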


> The fundamental problem I see here is URI is intended to be the universal
> resource identifier but since zone federation will create multiple URIs
> for the same resource, the server id now has to be ANOTHER universal
> resource identifier.
>

I think the server ID should be the unique identifier, and is more important
than the REST representation.  I think we should avoid remapping the URI
unless we have to... (more later)

> It will be obvious in which deployment the servers live.  This will
> effectively prevent whitelabel federating.  UUID would be more opaque.
>

Whitelabel federation for reselling an underlying provider can easily be
done by rewriting strings:
id.replace("rackspace.com", "a.justinsbcloud.com").replace("citrix.com", "b.justinsbcloud.com").
I suspect the same
approach would work for internal implementation zones also.  The truly
dedicated will discover the underlying structure whatever scheme you put in
place.
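
Spelled out in both directions, the string rewriting might look like this.
The WHITELABEL table and the two function names are hypothetical; the domain
pairs are the ones from the one-liner above.

```python
# Hypothetical reseller table: upstream provider domain -> whitelabel domain.
WHITELABEL = {
    "rackspace.com": "a.justinsbcloud.com",
    "citrix.com": "b.justinsbcloud.com",
}


def to_whitelabel(server_id):
    """Rewrite an upstream ID before handing it to the customer."""
    for upstream, alias in WHITELABEL.items():
        server_id = server_id.replace(upstream, alias)
    return server_id


def from_whitelabel(server_id):
    """Undo the rewrite when the customer sends the ID back."""
    for upstream, alias in WHITELABEL.items():
        server_id = server_id.replace(alias, upstream)
    return server_id


public = to_whitelabel("dfw.rackspace.com/servers/floor1/2CD")
print(public)                   # dfw.a.justinsbcloud.com/servers/floor1/2CD
print(from_whitelabel(public))  # dfw.rackspace.com/servers/floor1/2CD
```

A plain substring replace will also rewrite a match in the path part of the
ID, so a real implementation would probably want to touch only the DNS-name
portion.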

> Would users ever get returned a "downstream" URI they could hit directly?
>

So now finally I think I can answer this (with my opinion)... Users should
usually get the downstream URI.  Just like DNS, they can either use that URI
directly, or - preferably - use a local openstack endpoint, which acts a bit
like a local DNS resolver.  Your local openstack "proxy" could also do
things like authentication, so - for example - I authenticate to my local
proxy, and it then signs my request before forwarding it.  This could also
deal with the billing issue - the proxy can do charge-back and enforce
internal spending limits and policies, and the public clouds can then bill
the organization in aggregate.

If you need the proxy to sign your requests, then you _can't_ use the
downstream URI directly, which is a great control technique.
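
To illustrate the control point: if the downstream cloud only accepts requests
carrying a signature made with a key that the proxy holds, a client without
the key cannot go direct.  No signing scheme has been proposed in this thread;
the HMAC-SHA256 construction, the message layout and the function names below
are all assumptions.

```python
import hashlib
import hmac


def sign_request(secret, method, server_id):
    """Proxy side: sign the method and server ID with the shared secret."""
    message = ("%s\n%s" % (method, server_id)).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret, method, server_id, signature):
    """Downstream side: accept only requests signed with the same secret."""
    expected = sign_request(secret, method, server_id)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)


key = b"proxy-and-provider-shared-secret"  # hypothetical key
sig = sign_request(key, "GET", "dfw.rackspace.com/servers/floor1/2CD")
print(verify_request(key, "GET", "dfw.rackspace.com/servers/floor1/2CD", sig))
```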

Some clouds will want to use zones for internal operational reasons, and
will want to keep the inner zones secret.  So there, we need something like
NAT: the front-end zone translates between "public" and "private" IDs as
they travel in and out.  How that translation works is
deployment-dependent... they could have a mapping database, or could try to
figure out a function which is aware of their internal structure to do this
algorithmically.
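
The mapping-database variant of this NAT-like translation could be sketched as
below.  The IdTranslator class, the in-memory dicts (standing in for a real
database) and the "srv-N" public ID format are all hypothetical.

```python
import itertools


class IdTranslator:
    """Keeps a two-way table between private and public IDs (NAT-style)."""

    def __init__(self):
        self._public_by_private = {}
        self._private_by_public = {}
        self._counter = itertools.count(1)

    def outbound(self, private_id):
        """Translate a private ID leaving the zone; allocate on first use."""
        if private_id not in self._public_by_private:
            public_id = "srv-%d" % next(self._counter)  # hypothetical format
            self._public_by_private[private_id] = public_id
            self._private_by_public[public_id] = private_id
        return self._public_by_private[private_id]

    def inbound(self, public_id):
        """Translate a public ID entering the zone; KeyError if unknown."""
        return self._private_by_public[public_id]


nat = IdTranslator()
public = nat.outbound("room1/AB")
print(public, "->", nat.inbound(public))  # srv-1 -> room1/AB
```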

Let me try an example:

Consider someone like Rackspace.  Rackspace has geographically distributed
datacenters: DFW, ORD, LON, etc.  In addition, within each datacenter there
will be public sub-zones (floor1, floor2, floor3), so I can be sure that my
failover machine isn't on the same switch or host as my primary.  Then, a
public sub-zone may have further private sub-zones for operational reasons -
room1, room2, room3.

rackspace.com
    ORD
        ...
    LON
        ...
    DFW
        floor1
            ...
        floor2
            ...
        floor3
            room1
            room2
            room3

Now, I don't want room1, room2 and room3 to be visible, for whatever reason.
So the parent zone remaps child resources as they pass through, so room1/AB
<=> 1AB and room2/CD <=> 2CD (keeping it simple!)
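
The room remapping in this example is simple enough to do algorithmically, with
no lookup table.  A minimal sketch, assuming single-digit room numbers (the
function names are hypothetical):

```python
def hide_room(private_id):
    """'room2/CD' -> '2CD': fold the room number into the public ID."""
    room, _, local_id = private_id.partition("/")
    well_formed = (
        len(room) == 5 and room.startswith("room") and room[4].isdigit()
    )
    if not well_formed or not local_id:
        raise ValueError("unexpected private ID: %r" % private_id)
    return room[4] + local_id


def reveal_room(public_id):
    """'2CD' -> 'room2/CD': the first character is the room number."""
    if len(public_id) < 2 or not public_id[0].isdigit():
        raise ValueError("unexpected public ID: %r" % public_id)
    return "room%s/%s" % (public_id[0], public_id[1:])


print(hide_room("room1/AB"))  # 1AB
print(reveal_room("2CD"))     # room2/CD
```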

The floors are customer visible, so I can either do floor1.???/servers/2CD
or ???/servers/floor1/2CD.  I think the determining factor is that here I
don't want customers hitting the floor1 endpoint directly (I probably
haven't exposed it), so I choose the latter: ???/servers/floor1/2CD

I have the same choice again at the datacenter level:
dfw.???/servers/floor1/2CD or ???/servers/dfw/floor1/2CD.  Here, I think I
probably want to allow customers to hit the DFW endpoint directly, so I
choose the former: dfw.???/servers/floor1/2CD

I want dfw.??? to resolve to the endpoint (possibly via an SRV record), so
it is dfw.rackspace.com, so our server ID is:
dfw.rackspace.com/servers/floor1/2CD

Rackspace can also choose to operate a cloud-aggregator endpoint
rackspace.com, which combines all the datacenters into a unified view.  If I
pass it a server ID like dfw.rackspace.com/servers/floor1/2CD, it knows
immediately to forward it to the dfw.rackspace.com endpoint.  It doesn't
need to rewrite this ID... there's no reason to do so.  If something
directly hit the dfw.rackspace.com endpoint that actually needed to take
place "globally", DFW could forward the request "up" (I'm not sure whether
this would actually happen in practice - maybe a cross-datacenter migration).
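
The forwarding decision at the aggregator is just "read the DNS name off the
front of the ID".  A sketch, where the route function and the KNOWN_ENDPOINTS
set are hypothetical (only dfw.rackspace.com appears in the example above; the
ord and lon names are extrapolated):

```python
# Endpoints this aggregator is assumed to front.
KNOWN_ENDPOINTS = {
    "dfw.rackspace.com",
    "ord.rackspace.com",
    "lon.rackspace.com",
}


def route(server_id):
    """Return (endpoint, server_id); the ID is forwarded unchanged."""
    endpoint, _, _ = server_id.partition("/")
    if endpoint not in KNOWN_ENDPOINTS:
        raise LookupError("no child endpoint for %r" % server_id)
    return endpoint, server_id


print(route("dfw.rackspace.com/servers/floor1/2CD"))
# ('dfw.rackspace.com', 'dfw.rackspace.com/servers/floor1/2CD')
```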

The beauty is that the rackspace.com proxy is largely the same code as a
private cloud proxy, which is much the same code as the zone controllers at
every level.  There are some differences in terms of whether we're remapping
IDs, but those differences are relatively small.  (Imaginary coding is
always easy though!)

A potential tricky issue is in e.g. the cloudbursting scenario, when a user
gets a cloud server; should they get a remapped ID or not, if we want them
always to go through a proxy (e.g. for auth/billing)?  I think there are two
viable answers here.  Either they should always get the 'true' ID, but the
client needs to have a config option that says 'always hit the local proxy
endpoint'.  Or the local proxy should remap it if it isn't happy with
relying on clients to get it right... so
dfw.rackspace.com/servers/floor1/2CD becomes
cloud.megacorp.com/servers/dfw.rackspace.com/servers/floor1/2CD.  I think I
prefer the latter - less scope for error.
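
The second option (the proxy remaps the ID) amounts to adding and stripping a
prefix.  A minimal sketch using the names from the example above; the wrap and
unwrap functions are hypothetical:

```python
PROXY_PREFIX = "cloud.megacorp.com/servers/"


def wrap(server_id):
    """ID handed to the user: always resolves to the local proxy first."""
    return PROXY_PREFIX + server_id


def unwrap(proxied_id):
    """Proxy side: recover the downstream ID before forwarding."""
    if not proxied_id.startswith(PROXY_PREFIX):
        raise ValueError("not an ID issued by this proxy: %r" % proxied_id)
    return proxied_id[len(PROXY_PREFIX):]


wrapped = wrap("dfw.rackspace.com/servers/floor1/2CD")
print(wrapped)
# cloud.megacorp.com/servers/dfw.rackspace.com/servers/floor1/2CD
print(unwrap(wrapped))
# dfw.rackspace.com/servers/floor1/2CD
```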


Justin