openstack team mailing list archive

Public image repository via a synchronization script

To: openstack@xxxxxxxxxxxxxxxxxxx
From: Justin Santa Barbara <justin@xxxxxxxxxxxx>
Date: Wed, 25 Apr 2012 13:29:49 -0700

I think it would be very convenient to have public image repositories and
an easy way to use them, so someone installing OpenStack can easily get
images into Glance with minimal work.

At the design summit, there were a number of talks which all seemed to
hinge on how to get an image into Glance from a public store. I'd like to
make a proposal to get the ball rolling, which we can collectively pick
apart to figure out the best way.

1. A public image repository lives on an HTTP / FTP / ... site and can
be dumb. (Glance could one day be the server here as well, but this should
also work on a CDN or simple webserver etc).
2. Each image is stored as a (possibly compressed) file.
3. Each image has a .properties file alongside it.
4. There is a 'directory' file which lists the property files. The
output of md5sum is an easy way to produce it, and the hashes would be
useful generally.
5. If there are different sets of images, we have multiple directory
files. They can point to the same images.
6. We don't try to solve security initially. But this could be as
simple as signing the directory files, if they had hashes of all the other
files (e.g. md5sum).

Then we have a client-side utility which takes a property file URL,
downloads it, and copies the associated image into Glance if it doesn't yet
exist. It can verify checksums, and it uploads any image properties which
are present.
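
To make the verification step concrete, here is a minimal Python sketch. It is not the linked Java implementation. The function name verify_and_expand is hypothetical, and it assumes the checksum is an MD5 of the file as downloaded (still compressed), which the proposal does not spell out. The size is checked after decompression, matching the "uncompressed size" described for the properties file. It works on in-memory bytes for brevity; a real tool would stream large images.

```python
import gzip
import hashlib


def verify_and_expand(blob, expected_md5, expected_size, expand="gzip"):
    """Check the MD5 of the downloaded bytes, decompress, then check the size.

    Assumption: the checksum covers the file as published (compressed),
    and the size is the uncompressed size.
    """
    actual_md5 = hashlib.md5(blob).hexdigest()
    if actual_md5 != expected_md5.lower():
        raise ValueError("checksum mismatch: got %s" % actual_md5)

    if expand == "gzip":
        data = gzip.decompress(blob)
    elif not expand:
        data = blob  # image was published uncompressed
    else:
        raise ValueError("unsupported compression: %s" % expand)

    if len(data) != expected_size:
        raise ValueError("size mismatch: got %d bytes" % len(data))
    return data
```

If either check fails, the utility would skip the upload rather than push a corrupt image into Glance.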

Finally, a simple bash script can download the directory file and call the
utility for each image.
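
As a rough illustration of that loop, the sketch below parses a directory file and yields the property file URLs to hand to the utility. It assumes the directory file is in md5sum output format (a hex digest, whitespace, then a file name), which is only one possible layout. The function name parse_directory and the example URL are hypothetical.

```python
from urllib.parse import urljoin


def parse_directory(text, base_url):
    """Return (property_file_url, md5_hex) pairs from an md5sum-style listing.

    Assumption: each line is "<md5>  <name>", as printed by md5sum.
    Lines for files other than .properties files are ignored.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, _, name = line.partition(" ")
        # md5sum marks binary-mode entries with a leading '*'
        name = name.strip().lstrip("*")
        if name.endswith(".properties"):
            entries.append((urljoin(base_url, name), digest))
    return entries


listing = (
    "224237ee1b9a341ac7d79bcebda0580d  debian_squeeze_20120425.qcow2.gz\n"
    "0123456789abcdef0123456789abcdef  debian_squeeze_20120425.properties\n"
)
for url, digest in parse_directory(listing, "http://example.org/images/"):
    print(url, digest)
```

The digests in the listing are placeholders for illustration, not hashes of real property files.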

This script could be run as a cron job if desired, to download new images
e.g. as Canonical publishes them.

---

An example 'properties' file might look like this:

org.openstack.sync__1__image=debian_squeeze_20120425.qcow2.gz
org.openstack.sync__1__expand=gzip
org.openstack.sync__1__size=627965952
org.openstack.sync__1__checksum=224237ee1b9a341ac7d79bcebda0580d
disk_format=qcow2
container_format=bare
is_public=True
org.openstack__1__os_distro=debian.org
org.openstack__1__os_version=6.0.4
org.openstack__1__architecture=x64
org.openstack__1__updated_at=20120425

A few things to note:

- The org.openstack.sync__ prefix is used for controlling the mirror
script. It specifies the path to the image, the uncompressed size, the
compression technique, the checksum etc.
- The image can be compressed using e.g. gzip. My 8GB raw image became
a 600MB qcow2 image, which became a 220MB gzip image. A bzip2 image was
200MB (10% smaller), but bzip2 decompression is not as readily available
in many languages.
- Properties which aren't in the org.openstack.sync__ prefix become
image properties in Glance.
- This example is using the common image properties, which I've just got
around to writing up: http://wiki.openstack.org/CommonImageProperties
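
The prefix rule in these notes can be sketched in a few lines of Python. This is an illustration, not the linked Java code. The function name split_properties is hypothetical, and stripping the "__1__" segment to get a short key is an assumption made for readability here.

```python
SYNC_PREFIX = "org.openstack.sync__"


def split_properties(text):
    """Split key=value lines into (sync_controls, glance_properties)."""
    sync, image = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if key.startswith(SYNC_PREFIX):
            # Assumption: keep only the part after the last "__",
            # e.g. org.openstack.sync__1__image -> image
            sync[key.rsplit("__", 1)[-1]] = value
        else:
            # Everything else is passed through as an image property
            image[key] = value
    return sync, image


example = """\
org.openstack.sync__1__image=debian_squeeze_20120425.qcow2.gz
org.openstack.sync__1__expand=gzip
disk_format=qcow2
org.openstack__1__os_distro=debian.org
"""
sync, image = split_properties(example)
print(sync["image"], sync["expand"])
print(sorted(image))
```

Note that org.openstack__1__os_distro does not start with org.openstack.sync__, so it is kept as an ordinary image property.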

---

I've implemented this in the Java bindings, and it works great:

https://github.com/platformlayer/openstack-java/blob/master/bin/image-sync
https://github.com/platformlayer/openstack-java/blob/master/openstack-cli/src/main/java/org/openstack/client/cli/commands/MirrorImage.java

Called with...

bin/image-sync http://images.platformlayer.org/ --prefix platformlayerImages/

This then syncs the images, prefixing the names to avoid collisions. And
yes, there is a real image up there - check out
http://images.platformlayer.org/md5sum ...

Justin

---

Justin Santa Barbara
Founder, FathomDB

Follow ups

Re: Public image repository via a synchronization script
From: Scott Moser, 2012-04-27