
touch-packages team mailing list archive

[Bug 972077] Re: apt repository disk format has race conditions

 

** Description changed:

  Apt archives are accessed over HTTP; this has resulted in a cluster of
  bugs (reported here, and upstream) about problems behind intercepting
  caches, problems with squid etc.
  
  There are 3 interlocking issues:
  A - mirror networks may be out of sync with each other (e.g. a file named on one mirror may no longer exist, or may not yet exist, on another mirror)
  B - updating files on a single mirror is not atomic - and even small windows of inconsistency will, given enough clients, cause headaches.
  C - caches exacerbate race conditions - when one happens, until the cached data expires, all clients of the cache will suffer from the race
  
  Solving this requires one of several things:
   - file system transactions
   - an archive format that requires only weakly ordered updates to the files at particular urls with the assumption that only one file may be observed to change at a time (because a lookup of file A, then B, may get a cache miss on A and a cache hit on B, so even if all clients strictly go A, then B, updates may still see old files when paths are reused).
   - super robust clients that repeatedly retry with progressively less cache friendly headers until they have a consistent view. (This is very tricky to do).
  
  It may be possible to do a tweak to the apt repository format though,
  which would allow publishing a race-free format in parallel with the
  existing layout, while clients migrate. To be safe against issue (A) the
  mirror network would need some care around handling of dns round-robin
  mirrors [to minimise the situation where referenced data is not
  available], but this should be doable - or alternatively clients doing
  'apt-get update' may need to be willing to retry to accommodate round-
  robin skew.
  
  What would such an archive format look like?
- It would have only one well known file name (e.g. Releases-2), which would be internally signed. Rather than signing e.g. Packages.gz, it would sign a uniquely named packages and sources file - e.g. Packages-$HASH.gz or Packages-$serialno.gz.
+ It would have only one well known file name (InRelease), which would be internally signed. Rather than signing e.g. Packages.gz, it would sign a uniquely named packages and sources file - e.g. Packages-$HASH.gz or Packages-$serialno.gz.
  
  Backwards compatibility is achieved by using the same filenames for
  deb's and the like. We need to keep writing Packages.gz though, and
  Releases, until we no longer worry about old apt clients. We can
  optimise disk space a little by making Packages.gz a symlink to a
  Packages-$HASH.gz (and so on for Sources..), but it may be simpler and
  less prone to unexpected behaviour to keep using regular files.
  
  tl;dr
   * Unique file names for all unique file content with one exception
-  * Releases-2, a self-signed file that provides hashes and names the index files (Packages, Sources, Translations etc)
+  * InRelease, a self-signed file that provides hashes and names the index files (Packages, Sources, Translations etc)
   * Coexists with existing archive layout
  
- 
  Related bugs:
-  * bug 804252: Please support InRelease files
-  * bug 1430011: support apt by-hash mirrors
+  * bug 804252: Please support InRelease files
+  * bug 1430011: support apt by-hash mirrors

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to apt in Ubuntu.
https://bugs.launchpad.net/bugs/972077

Title:
  apt repository disk format has race conditions

Status in apt package in Ubuntu:
  Confirmed

Bug description:
  Apt archives are accessed over HTTP; this has resulted in a cluster of
  bugs (reported here, and upstream) about failures behind intercepting
  caches, squid proxies and the like.

  There are 3 interlocking issues:
  A - mirror networks may be out of sync with each other (e.g. a file named on one mirror may no longer exist, or may not yet exist, on another mirror)
  B - updating files on a single mirror is not atomic - and even small windows of inconsistency will, given enough clients, cause headaches.
  C - caches exacerbate race conditions - when one happens, until the cached data expires, all clients of the cache will suffer from the race

  Solving this requires one of several things:
   - file system transactions
   - an archive format that requires only weakly ordered updates to the files at particular URLs, on the assumption that a client may observe only one file change at a time. (A lookup of file A, then B, may get a cache miss on A and a cache hit on B, so even if all clients strictly fetch A, then B, they may still see old files when paths are reused.)
   - super robust clients that repeatedly retry with progressively less cache friendly headers until they have a consistent view. (This is very tricky to do).
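
  As a rough illustration of the third option, a client could walk a
  ladder of request headers from most to least cache friendly, and stop
  as soon as the fetched body matches the hash it expects. This is only
  a sketch: the ladder itself, the fetch_consistent name and the injected
  fetch callable are assumptions made for the example, not apt behaviour.

```python
import hashlib
from typing import Callable, Dict, Optional

# Assumed escalation policy: each step is less cache friendly than the last.
HEADER_LADDER = [
    {},                                                # let caches answer freely
    {"Cache-Control": "max-age=0"},                    # ask caches to revalidate
    {"Cache-Control": "no-cache", "Pragma": "no-cache"},
]

Fetch = Callable[[str, Dict[str, str]], Optional[bytes]]


def fetch_consistent(fetch: Fetch, path: str, expected_sha256: str) -> Optional[bytes]:
    """Retry `path` with stricter headers until the body matches the hash.

    `fetch` is a hypothetical transport hook: it takes a path and request
    headers and returns the body, or None on failure.
    """
    for headers in HEADER_LADDER:
        body = fetch(path, headers)
        if body is not None and hashlib.sha256(body).hexdigest() == expected_sha256:
            return body
    # Every step returned missing or mismatching data: report no consistent view.
    return None
```

  Even this simple version shows why the approach is tricky: it only
  helps if every cache on the path honours the headers, and it costs an
  extra round trip per step.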

  It may be possible to tweak the apt repository format, though, so that
  a race-free format can be published in parallel with the existing
  layout while clients migrate. To be safe against issue (A) the mirror
  network would need some care around handling of DNS round-robin
  mirrors [to minimise the situation where referenced data is not
  available], but this should be doable. Alternatively, clients doing
  'apt-get update' may need to be willing to retry to accommodate
  round-robin skew.

  What would such an archive format look like?
  It would have only one well-known file name (InRelease), which would be internally signed. Rather than signing e.g. Packages.gz, it would sign uniquely named Packages and Sources files - e.g. Packages-$HASH.gz or Packages-$serialno.gz.
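
  A minimal sketch of the naming scheme, assuming SHA-256 as the hash
  (the description only says $HASH). The helper names are made up for
  the example, and the entry layout is only loosely modelled on the
  checksum lines in existing Release files.

```python
import hashlib


def hashed_index_name(base: str, ext: str, content: bytes) -> str:
    """Return a content-addressed name such as Packages-<sha256>.gz."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{base}-{digest}{ext}"


def index_entry(name: str, content: bytes) -> str:
    """Return one 'hash size name' line for the signed well-known file."""
    digest = hashlib.sha256(content).hexdigest()
    return f" {digest} {len(content)} {name}"
```

  Because the name is derived from the content, a given URL can only
  ever serve one body, so a stale cache entry for it cannot be wrong; it
  can only be missing.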

  Backwards compatibility is achieved by using the same filenames for
  .deb files and the like. We need to keep writing Packages.gz and
  Release, though, until we no longer worry about old apt clients. We can
  optimise disk space a little by making Packages.gz a symlink to a
  Packages-$HASH.gz (and so on for Sources), but it may be simpler and
  less prone to unexpected behaviour to keep using regular files.
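
  On the publishing side, the ordering that matters could be sketched as
  follows: write the uniquely named index first, keep the legacy
  Packages.gz as a regular file, and switch the well-known file last.
  This is an assumption-laden illustration, not the archive publisher's
  code: it uses SHA-256, a flat directory, a made-up one-line manifest,
  and it leaves out signing altogether.

```python
import hashlib
import os
import tempfile


def _atomic_write(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, then rename over `path`."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.replace(tmp, path)  # atomic rename on POSIX file systems


def publish(root: str, packages_gz: bytes) -> str:
    """Publish one index under `root`; returns the unique file name used."""
    digest = hashlib.sha256(packages_gz).hexdigest()
    hashed_name = f"Packages-{digest}.gz"

    # 1. Unique name first. It is never rewritten, so no client can see it change.
    hashed_path = os.path.join(root, hashed_name)
    if not os.path.exists(hashed_path):
        _atomic_write(hashed_path, packages_gz)

    # 2. Legacy name for old clients, kept as a regular file rather than a symlink.
    _atomic_write(os.path.join(root, "Packages.gz"), packages_gz)

    # 3. The single well-known file changes last (unsigned placeholder here).
    manifest = f"{digest} {len(packages_gz)} {hashed_name}\n".encode()
    _atomic_write(os.path.join(root, "InRelease"), manifest)
    return hashed_name
```

  Old Packages-$HASH.gz files are deliberately left in place, so a
  client holding a slightly stale InRelease can still fetch what it
  names; when to garbage-collect them is a separate policy question.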

  tl;dr
   * Unique file names for all unique file content with one exception
   * InRelease, an internally signed file that provides hashes and names the index files (Packages, Sources, Translations, etc.)
   * Coexists with existing archive layout

  Related bugs:
   * bug 804252: Please support InRelease files
   * bug 1430011: support apt by-hash mirrors
