
openstack team mailing list archive

Re: Agreeing a common set of Image Properties

 

On Sat, 7 Apr 2012, Justin Santa Barbara wrote:

> Is there a (de-facto) standard for image metadata/properties?  I'd like to
> be able to launch e.g. the Debian Squeeze image provided by the
> cloud.  This is particularly important for clouds that don't allow image
> upload, but likely this will remain useful because different clouds will
> have different tweaks needed (e.g. installing the right drivers based on the
> hypervisor).
>
> I could try "smart"-parsing the names, but it seems like metadata is the
> right way to do this, and I see no reason why any cloud would gain any
> advantage from not adopting a common convention.  I know some clouds have
> started implementing their own approaches, but I don't believe anyone is
> locked into anything.
>
> In the interest of efficiency, I'm going to make a proposal for people to
> attack:
>
> 3 main pieces of metadata: os:distro, os:version_major, os:version_minor

I generally disagree that these are the 3 main pieces of metadata.
It might seem strange that someone with a @distro.com address would say
this, but I believe that "distro" is largely irrelevant.

If I'm looking for an image to run, the thing that is most important to me
is the provider of the image.  In some clouds, with no public images,
the owner of the image is quite likely the cloud provider, and I have
little option other than to trust them on the contents of an image (I
already trust them almost completely by launching an instance there).

However, in a large public cloud (Amazon, for example) there are literally
thousands of providers and thousands of images that claim "Ubuntu" and
"10.04", and their owners would (justifiably) also upload package lists
like the one you suggested later in this thread.

The problem is that I don't trust those image providers.  I know nothing
about them, and any metadata they provide, other than GPG-signed data or
well-known IDs, is not going to change my lack of trust (for example,
Canonical images on EC2 are owned by ID 099720109477, and Red Hat
publishes under 432018295444 and 309956199498).
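
As a rough illustration of filtering on owner rather than on claimed
distro, here is a minimal Python sketch.  The owner IDs are the ones quoted
above; the `images` structure and its `name` / `owner_id` keys are
hypothetical and do not correspond to any particular cloud API.

```python
# Owner IDs quoted in this message. Which owners to trust is a local
# policy decision, not something the cloud decides for you.
TRUSTED_OWNERS = {
    "099720109477": "Canonical",
    "432018295444": "Red Hat",
    "309956199498": "Red Hat",
}


def trusted_images(images):
    """Keep only images whose owner ID is in TRUSTED_OWNERS.

    `images` is a list of dicts with hypothetical keys 'name' and
    'owner_id'; adapt the key names to whatever your image listing returns.
    """
    return [img for img in images if img.get("owner_id") in TRUSTED_OWNERS]


# Made-up listing: one image from a known owner, one that only claims a name.
listing = [
    {"name": "ubuntu-oneiric-11.10-i386-server-20120401",
     "owner_id": "099720109477"},
    {"name": "Ubuntu 10.04 (totally official)",
     "owner_id": "000000000000"},
]

for img in trusted_images(listing):
    print(img["name"], "from", TRUSTED_OWNERS[img["owner_id"]])
```

The second image is dropped no matter what its name or tags claim, which
is the point: the filter keys on who published it.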

The data you're after might be useful to you, and might scratch an itch.
I will not discount that, but I would much prefer a bit of metadata
associated with an image that was signed by an entity I trusted that
identified the image as good.

The ubuntu name strings look like this:
  ubuntu-oneiric-11.10-i386-server-20120401

If I see that string, and content signed by a key that I trust, then I can
be assured that it is what I think it is.
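
To make the structure of that name string concrete, here is a small Python
sketch that splits it into fields.  The field names (`codename`, `flavor`,
`serial`) are my own labels for illustration, not an official schema, and
parsing the name proves nothing by itself; it is only worth doing on a
string whose signature has already been checked.

```python
import re
from typing import NamedTuple, Optional


class ImageName(NamedTuple):
    distro: str    # e.g. "ubuntu"
    codename: str  # e.g. "oneiric"
    version: str   # e.g. "11.10"
    arch: str      # e.g. "i386"
    flavor: str    # e.g. "server"
    serial: str    # build date as YYYYMMDD, e.g. "20120401"


# Pattern inferred from the single example above; other name layouts
# simply will not match.
_PATTERN = re.compile(
    r"^(?P<distro>[a-z]+)-(?P<codename>[a-z]+)-(?P<version>\d+\.\d+)"
    r"-(?P<arch>[a-z0-9_]+)-(?P<flavor>[a-z]+)-(?P<serial>\d{8})$"
)


def parse_image_name(name: str) -> Optional[ImageName]:
    """Return the parsed fields, or None if the name has another layout."""
    match = _PATTERN.match(name)
    return ImageName(**match.groupdict()) if match else None


print(parse_image_name("ubuntu-oneiric-11.10-i386-server-20120401"))
```

A name without a codename field does not match this pattern and yields
None, so a real convention would need to be agreed on across vendors.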

I may well trust 'cloud-provider-foowhiz's signature on that string, and
not require it to be signed by an Ubuntu key. I may not.
I may accept
  centos-6.04-i386-server-20120401
signed by a centos.org key to be a replacement for
  rhel-6.04-i386-server-20120401

What I'm getting at is that I think vendor is more important than OS in
the long run (and even the short run).

OS distro, version_major, version_minor are even less important where you
don't care (or know) that your OS came from Canonical or Red Hat, and what
you are really interested in is running "WhizBang! Fooberator" version 2.0.

One feature I think glance could provide me with (and probably does)
is md5/sha1 sums of the images that were uploaded.  If I know that an
image I'm about to run has md5sum that matches that of a published image
on http://cloud-images.ubuntu.com then I'm more willing to trust it (given
that I already trust the cloud provider).
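
The comparison itself is simple.  Here is a minimal Python sketch using
only the standard library; it assumes you have the image as a local file
and have already fetched the published sum over a channel you trust.  The
function names are hypothetical, and nothing here calls glance.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex, algorithm="md5"):
    """True if the file's digest equals the published hex digest.

    The published value is stripped and lower-cased first, since sum
    files often carry trailing whitespace or upper-case hex.
    """
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

Passing algorithm="sha1" or "sha256" works the same way.  A matching sum
only tells you the bytes are identical to the published image; whether the
published sum itself is genuine is a separate question.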

I guess if Canonical provided a signed sum of:
  ubuntu-oneiric-11.10-i386-server-20120401:md5sum

Then the cloud uploader could just upload the existing Canonical sum as a
tag on that image, and then my signature checking would work also.  I
trust the cloud provider out of necessity, and I trust Canonical, so the 2
things work out.

So, I guess in summary,
 * I don't think os:version and the like are encompassing or particularly
   useful by themselves.
 * I can't trust tagged data by itself
 * I wouldn't trust any scraping of package contents by libguestfs, as it
   can most certainly be fooled or broken.

I can see that some tagged info on the contents of the image would be
useful for certain things, but OS-specific information in particular is
just not that important.

The vast, *vast* majority of us don't care that e2fsprogs is in the Ubuntu
image, or that it is version 1.4.1-0ubuntu1.  I personally hope that I've
done a good job of making them more interested in the fact that it is an
Ubuntu cloud image.


