launchpad-dev team mailing list archive

Making Packages, .deb update downloads faster

 

I have been looking into ways to speed up update downloads. There are two
parts to this:

* Updating the list of packages in an apt repository (.../Packages.gz).

* Downloading .debs with updates to the system.

In both cases, the user has most of the data already on the local
system, but the current download mechanism downloads everything anyway.

I did some benchmarking. See [1] and [2] for further details, but in
summary: zsync for Packages.gz and debdelta for .debs would allow the
user to download much less data, making the update process go faster for
them. The savings should be on the order of 75-99% for Packages.gz, and
40% for .debs. For people with limited bandwidth, either due to speed or
download caps, these are quite significant savings. The rest of this
e-mail discusses the implementation.

    [1] https://lists.ubuntu.com/archives/ubuntu-devel/2009-July/028568.html
    [2] https://lists.ubuntu.com/archives/ubuntu-devel/2009-July/028600.html

I'm e-mailing this to ask the following question:

    How long would it take to get the Launchpad changes done?
    
I am not asking for a firm commitment; I appreciate that the LP team is busy
with a lot of things and has a number of conflicting external requests. Since
there are two sides to this, there is little point in starting the work on apt
until it's known approximately when Launchpad will add the support too.

(I'm making the bold assumption that the Launchpad changes are uncontroversial.)


Updating Packages.gz with zsync
-------------------------------

Zsync[3] is an implementation of the rsync algorithm in which the block
signatures are pre-computed and stored in a .zsync file. The client uses
HTTP range requests to retrieve parts of the file from the server, so no
special server-side software is required, just a standard HTTP server.

    [3] http://zsync.moria.org.uk/

Zsync has some magic that allows it to do updates on gzipped files as well.
This lets us avoid putting up uncompressed Packages files.
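
To make the mechanism concrete, here is a rough Python sketch of the
client-side idea: given the block checksums of the new file (which is what
a .zsync file carries), work out which byte ranges are not already present
in the old local file and would have to be fetched with an HTTP Range
request. This is an illustration only, not zsync's actual code: the function
names, the tiny block size and the use of MD5 at every offset (instead of a
cheap rolling checksum confirmed by a stronger hash) are all assumptions
made for brevity, and it ignores the gzip handling entirely.

```python
import hashlib

BLOCK = 4  # tiny block size for illustration; real block sizes are much larger


def block_sums(data, block=BLOCK):
    """Checksums of consecutive fixed-size blocks of the new file."""
    return [hashlib.md5(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def missing_ranges(old, new_sums, new_len, block=BLOCK):
    """Return inclusive (start, end) byte ranges of the new file that the
    old file cannot supply."""
    # Index every block-sized window of the old file, at any offset.
    have = {hashlib.md5(old[i:i + block]).digest()
            for i in range(0, max(len(old) - block, 0) + 1)}
    ranges = []
    for n, digest in enumerate(new_sums):
        if digest in have:
            continue  # this block can be copied from the local file
        start = n * block
        end = min(start + block, new_len) - 1
        if ranges and ranges[-1][1] + 1 == start:
            ranges[-1] = (ranges[-1][0], end)  # merge adjacent ranges
        else:
            ranges.append((start, end))
    return ranges


def range_header(ranges):
    """Format the ranges as the value of an HTTP Range request header."""
    return "bytes=" + ",".join(f"{s}-{e}" for s, e in ranges)
```

For example, if the old file is b"aaaabbbbccccdddd" and the new one is
b"aaaaXXXXbbbbccccdddd", only the inserted block has to be downloaded,
even though everything after it has moved.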

To implement this, the following needs to happen:

    * IS needs to install zsync on the relevant machine(s).

    * Launchpad should generate Packages.gz using the --rsyncable option
      to gzip. This helps zsync's magic for updating gzipped files and
      speeds up downloads. The space impact should be very small, on the
      order of a few percent at most.

    * Launchpad needs to generate a Packages.gz.zsync file:
        zsyncmake -e -Z -u Packages.gz Packages.gz
      (-e makes sure the resulting file is bit-by-bit identical to what
      the server has; this is required for checksums).
      
      Publication of the Packages.gz.zsync file should be synchronized
      with publication of the corresponding version of the Packages.gz
      file: if one is updated, both should be.

    * apt needs to be modified to use zsync to download the Packages.gz
      file if Packages.gz.zsync exists.
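
The server-side steps above could be sketched roughly as follows, in Python.
This is only an illustration under stated assumptions: the function names
and the staging/live directory layout are hypothetical, not how the
Launchpad publisher is actually structured. The only external commands used
are gzip with --rsyncable and the zsyncmake invocation given above.

```python
import os
import subprocess

FILES = ("Packages.gz", "Packages.gz.zsync")


def generate(staging, run=subprocess.run):
    """Build Packages.gz and Packages.gz.zsync from staging/Packages."""
    src = os.path.join(staging, "Packages")
    gz = os.path.join(staging, "Packages.gz")
    with open(gz, "wb") as out:
        # --rsyncable keeps small input changes from rippling through
        # the whole compressed stream.
        run(["gzip", "--rsyncable", "-c", src], stdout=out, check=True)
    # Same command as in the list above, run inside the staging directory
    # so that Packages.gz.zsync ends up next to Packages.gz.
    run(["zsyncmake", "-e", "-Z", "-u", "Packages.gz", "Packages.gz"],
        cwd=staging, check=True)


def publish(staging, live):
    """Move both files into the live directory, one right after the other.

    Assumes staging and live are on the same filesystem, so each
    os.replace() is an atomic rename. The pair as a whole is still not
    switched atomically; that would need something like a directory swap.
    """
    for name in FILES:
        os.replace(os.path.join(staging, name), os.path.join(live, name))
```

Generating both files in a staging area first means the live directory never
holds a .zsync file that was computed from a different Packages.gz for longer
than the gap between the two renames.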


Downloading update .debs with debdelta
--------------------------------------

Zsync/rsync are based on fairly large blocks, and break badly for .debs
for that reason: a tiny change in the source code may end up changing all
pointer values, resulting in the rsync algorithm thinking almost every
block has changed.
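
A toy model of the problem, in Python: a fake "executable" made of identical
code bytes interleaved with pointer values, where the new version differs
only in that every pointer has moved by four bytes. Everything here (the
record layout, the block size, the function names) is invented for
illustration and is not a measurement of real .debs.

```python
import hashlib
import struct

BLOCK = 64


def windows(data, block=BLOCK):
    """Checksums of every block-sized window of the data, at any offset."""
    return {hashlib.md5(data[i:i + block]).digest()
            for i in range(0, len(data) - block + 1)}


def changed_fraction(old, new, block=BLOCK):
    """Fraction of the new file's blocks that appear nowhere in the old file."""
    have = windows(old, block)
    blocks = [new[i:i + block] for i in range(0, len(new), block)]
    missing = sum(1 for b in blocks if hashlib.md5(b).digest() not in have)
    return missing / len(blocks)


def fake_binary(base, records=256):
    """Toy binary: 12 bytes of fixed code, then a 4-byte pointer, repeated."""
    return b"".join(b"\x90" * 12 + struct.pack("<I", base + 16 * n)
                    for n in range(records))
```

With old = fake_binary(0x1000) and new = fake_binary(0x1004), three quarters
of the bytes are identical, yet changed_fraction(old, new) comes out as 1.0:
every 64-byte block contains at least one changed pointer, so a block-based
algorithm would re-download the whole file.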

This is unfortunate, because zsync would make it really easy to update
packages from any version the user has installed to whatever is current
in the archive. However, since it works so badly, it is necessary to
use deltas.

There are several programs to compute binary deltas, and some of them are
specific to executables. One such program is bsdiff.

The debdelta[4] program wraps around bsdiff to generate deltas for .debs.
My proposal is to use debdelta to generate deltas for packages in the
-security, -proposed, and -updates pockets.

    [4] http://packages.qa.debian.org/d/debdelta.html

To implement this, the following needs to happen:

    * IS needs to install debdelta on the relevant machine(s).
    
    * When Launchpad puts a .deb in a relevant pocket, it should also
      generate .debdelta files leading to the new version from two
      older versions: the version in the release pocket (e.g., hardy)
      and the previous version in the same pocket (e.g.,
      hardy-security). This is a fairly small number of files.
      
      The .debdelta files can be published at a later time than the
      corresponding .debs. Generating them is fairly slow, and it is 
      not a good idea to delay security updates for them.

    * apt needs to be changed to be able to make use of the .debdelta
      files, if available.

    * debdelta needs to be fixed to work for packages that use lzma
      compression. Until this is done, it is OK to just not generate
      .debdelta files for such packages.
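
The selection rule in the list above (deltas from the release pocket version
and from the previous version in the same pocket, skipping lzma packages) can
be sketched like this in Python. The Deb class and the function name are
hypothetical, made up for this example; skipping the delta when either end
uses lzma is also an assumption about how the "just don't generate them" rule
would be applied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Deb:
    package: str
    version: str
    compression: str  # e.g. "gzip", "bzip2", "lzma"


def delta_sources(new: Deb,
                  release: Optional[Deb],
                  previous: Optional[Deb]) -> List[Deb]:
    """Return the older .debs from which a delta to `new` should be made.

    `release` is the version in the release pocket (e.g. hardy), `previous`
    the previous version in the same pocket (e.g. hardy-security). Either
    may be None if no such version exists.
    """
    if new.compression == "lzma":
        return []  # debdelta cannot handle these yet
    sources: List[Deb] = []
    for old in (release, previous):
        if old is None or old.compression == "lzma":
            continue
        if old.version != new.version and old not in sources:
            sources.append(old)
    return sources
```

Each new upload therefore yields at most two deltas, which is why the number
of extra files stays small.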




