
ubuntu-phone team mailing list archive

system-image updates, and LP: #1277589

 

Over the last week or so, we've had a lot of reports of folks not being able
to update their devices using System Settings.  This has been quite tricky to
track down.  I think I have a fix, and I'd like to ask the adventurous among
you to test it out.  If you do, please follow up here, or in LP: #1277589, or
to me directly (via email or irc).

Below I'll go into some gory detail of what I think was going on, but you
don't need to know all that if you just want to test updates.  I don't think
you should worry about bricking your device, since nothing's changing in the
way updates are applied during recovery, but OTOH take prudent precautions if
you're going to attempt this on your only phone. ;)

Here are the steps you need to take in order to test the new package.

0a) You'll need adb shell access to your device, and you'll need to turn on
    writable mode:

    - adb shell
    - touch /userdata/.writable_image
    - reboot

0b) I've been flashing my device to r174 and upgrading it to the latest
    available release.  r174 was a revision on which I could semi-reliably
    reproduce the update crashes.  Now, after many reflashes and updates in a
    row, I have not seen any crashes.  To flash to r174 (on desktop):

    - ubuntu-device-flash --channel trusty --revision 174 --wipe=true

    Note that --wipe=true *will* delete your data.  If this is a problem for
    you, skip that option, although you'll be testing something different.

    Please double-check that the following directories are empty on your
    device:

    - /var/lib/system-image/keyrings
    - /var/lib/system-image [1]
    - /var/log/system-image
    - /android/cache/recovery [2]

    [1] Except of course for the keyrings directory.
    [2] You may see a log and last_log file there, they are harmless.

    Remove any .xz or .xz.asc files you see in any of the above.  Remove any
    ubuntu_command file in /android/cache/recovery.
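If you'd rather not eyeball each directory by hand, the cleanup check in step
0b can be sketched in Python 3 (already on the device, since system-image
depends on it).  This is a hypothetical helper, not part of system-image:
find_leftovers() only reports the *.xz, *.xz.asc and ubuntu_command files it
finds, and leaves the harmless log and last_log files alone.  Deleting
anything is still up to you.

```python
from pathlib import Path

# The directories listed in step 0b.
SYSTEM_IMAGE_DIRS = [
    "/var/lib/system-image/keyrings",
    "/var/lib/system-image",
    "/var/log/system-image",
    "/android/cache/recovery",
]


def find_leftovers(dirs):
    """Return files in `dirs` that step 0b says to remove.

    Hypothetical helper: flags *.xz and *.xz.asc files plus any
    ubuntu_command file.  It only reports; it never deletes.
    """
    leftovers = []
    for d in dirs:
        root = Path(d)
        if not root.is_dir():
            continue  # missing directories are fine
        for entry in sorted(root.iterdir()):
            if not entry.is_file():
                continue  # e.g. the keyrings subdirectory
            name = entry.name
            if (name.endswith(".xz") or name.endswith(".xz.asc")
                    or name == "ubuntu_command"):
                leftovers.append(entry)
    return leftovers
```

For example, print each path returned by find_leftovers(SYSTEM_IMAGE_DIRS) and
remove them once you've checked the list.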

1) You now have a fairly pristine image.  Be sure you have an older version of
   system-image installed (r174 has 2.0.3):

   - adb shell
   - system-image-cli --version

2) You'll need to install the .debs from my PPA.  On your desktop:

   - wget all the .debs on this page:
     https://launchpad.net/~barry/+archive/systemimage/+build/5616752
   - for i in *.deb; do adb push $i /tmp; done

3) adb shell into your device and install the .debs, then be sure you're
   running system-image 2.1.  Note that due to the Python 3.4 transition going
   on, you may have to apt-get update and install the 3.4 packages, like so:

   - cd /tmp
   - dpkg -i *.deb
   - apt-get update
   - apt-get install -fy
   - system-image-cli --version
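If you're scripting steps 1 and 3, the version check can be sketched like
this.  It assumes only that system-image-cli --version prints a dotted version
number somewhere in its output (the exact format isn't shown in this post),
and it treats 2.1 as the first fixed version.  parse_version() and
is_fixed_version() are hypothetical helpers.

```python
import re


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints.

    Assumption: the --version output contains something like 2.0.3 or 2.1.
    """
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_fixed_version(text, minimum=(2, 1)):
    """True if the reported version is at least `minimum`."""
    # Tuple comparison: (2, 0, 3) < (2, 1) <= (2, 1, 1).
    return parse_version(text) >= minimum
```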

4) Now, you are going to start the update via the u/i.  Timing is really
   important here, and the recipe below is the best way I've found to
   trigger the bug (before this fix).  You may want to tail
   /var/log/system-image/client.log while doing this, or at least review it
   afterward to make sure you do not get a traceback.

   From the u/i:

   - click on System Settings
   - wait a second or two, and then click on Updates

If you see the prompt to reboot, you are good to go.  You don't need to reboot
the device, but you can if you want.  If you get errors, there are probably
tracebacks in your client.log file.  Please pull it off the device and attach
it to the bug, or pastebin it for me.
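After pulling the log off the device (e.g. adb pull
/var/log/system-image/client.log), a short Python sketch like the following
can point out where any tracebacks start.  It assumes the log contains
standard Python tracebacks, which begin with the line "Traceback (most recent
call last):".  find_tracebacks() and check_log_file() are hypothetical
helpers, not part of system-image.

```python
# Python writes this header line at the start of every traceback, including
# ones recorded through logging.exception().
TRACEBACK_HEADER = "Traceback (most recent call last):"


def find_tracebacks(log_text):
    """Return the 1-based line numbers at which a traceback starts."""
    return [
        lineno
        for lineno, line in enumerate(log_text.splitlines(), start=1)
        if TRACEBACK_HEADER in line
    ]


def check_log_file(path):
    """Read a pulled-off client.log and return where tracebacks start."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_tracebacks(f.read())
```

An empty list from check_log_file("client.log") means no tracebacks were
logged.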

If you've got a bunch of free time on your hands, reflash your device, reset
it to pristine state as described above, and try the whole thing again!
Before this fix, I was able to provoke a crash roughly once in every 2 or 3
attempts.  So far in my own live testing with the fix, I've seen no crashes.

Thanks for your testing!  If I don't hear about any problems in the next day
or so, I'll propose the new system-image package for the CI-train.

Here are the gory details, as best I understand them.

A change was made to System Settings so that, instead of checking for updates
when you clicked on the Update icon, it checked for updates as soon as the
System Settings panel was opened.  This also caused the download to start,
since auto-download is enabled by default.

A second check for updates is *also* started when you click on the Update
icon.  There was a small race window in which these two checks could sneak
past a barrier in the code, so that two checks would be running at the same
time.
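To illustrate the kind of race described above, here's a minimal sketch; it
is not the actual system-image code.  A boolean guard such as "if not
checking: checking = True" lets a second thread slip in between the test and
the assignment.  A threading.Lock acquired with blocking=False tests and sets
in one atomic step, so only one caller gets through.  UpdateChecker and its
methods are hypothetical names.

```python
import threading


class UpdateChecker:
    """Hypothetical stand-in for the code path that starts an update check."""

    def __init__(self):
        # Racy version (don't do this):
        #     if not self._checking:      # thread A tests...
        #         self._checking = True   # ...thread B can test before A sets
        self._lock = threading.Lock()
        self.checks_started = 0

    def check_for_update(self):
        """Start a check unless one is already running; True if started."""
        # acquire(blocking=False) tests and sets in a single atomic step.
        if not self._lock.acquire(blocking=False):
            return False
        self.checks_started += 1
        return True

    def check_finished(self):
        """Release the guard wherever the check really ends.

        That point may differ, e.g. depending on whether downloads are
        automatic or manual; this sketch leaves it to the caller.
        """
        self._lock.release()
```

With two callers (the panel opening and the Updates click) arriving at nearly
the same moment, exactly one check_for_update() call returns True.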

This would cause multiple, overlapping calls to Ubuntu Download Manager (UDM)
to download certain files, such as the image master, image signing, and
blacklist keyrings.  At some point along the way, UDM would begin to write
zero-sized files to the download destinations; e.g. image-master.tar.xz and
image-master.tar.xz.asc would both be empty *even though UDM sent a
successful download signal*.  This was verified by checking that the files'
md5 checksums matched the checksum of an empty byte sequence.
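The empty-file check can be reproduced with Python's hashlib: the md5 of zero
bytes is d41d8cd98f00b204e9800998ecf8427e, so any downloaded file with that
digest has no content.  md5_of_file() and is_empty_download() are
hypothetical helpers for illustration.

```python
import hashlib

# md5 of zero bytes: d41d8cd98f00b204e9800998ecf8427e
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def md5_of_file(path, chunk_size=65536):
    """Return the hex md5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_empty_download(path):
    """True if the file at `path` hashes to the empty-input md5."""
    return md5_of_file(path) == EMPTY_MD5
```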

The zero-sized files caused a cascading failure to validate the keyrings,
including the blacklist keyring, which, for security purposes, is *always*
downloaded.  This was technically correct behavior: if a data file fails GPG
validation, we can't use it, and in this case the signature failure was
correct because the files were corrupt.

This cascading keyring failure eventually works its way up to the
archive-master keyring.  For security purposes, this keyring cannot be
downloaded - it is pre-installed on the devices.  So once the validation chain
reaches the archive-master, system-image has to give up.  Nothing passes its
signature checks, and we've moved as high up the chain as we're able.  The
traceback reported in the bug is a bit misleading (or at best unhelpful) but
after further analysis it makes sense, since it was the downloading of the
blacklist file that started the cascading failures.
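The cascade can be modeled roughly as a walk up the signing chain.  This is a
simplified sketch under assumptions: the SIGNED_BY table (blacklist and
image-signing signed by image-master, image-master signed by archive-master)
is an assumed, illustrative version of the hierarchy, and verifies() stands in
for downloading a keyring and checking its GPG signature.  When every download
is empty, each check fails and the walk stops below archive-master, which
can't be re-downloaded.

```python
# Assumed, illustrative signing hierarchy: each keyring maps to the keyring
# whose signature it must carry.  archive-master is pre-installed on the
# device and is never downloaded, so the walk can go no higher.
ARCHIVE_MASTER = "archive-master"
SIGNED_BY = {
    "blacklist": "image-master",
    "image-signing": "image-master",
    "image-master": ARCHIVE_MASTER,
}


class ChainExhausted(Exception):
    """Raised when nothing below archive-master validates."""

    def __init__(self, tried):
        super().__init__(f"no keyring validated; tried {tried}")
        self.tried = tried


def walk_up(start, verifies):
    """Try `start`, then each signer above it, until one validates.

    `verifies(name)` is a hypothetical stand-in for "download keyring `name`
    and check its GPG signature".  Returns the keyrings tried, in order.
    """
    tried = []
    current = start
    while True:
        tried.append(current)
        if verifies(current):
            return tried
        signer = SIGNED_BY[current]
        if signer == ARCHIVE_MASTER:
            # The root of trust can't be re-downloaded: give up.
            raise ChainExhausted(tried)
        current = signer
```

With zero-sized downloads, walk_up("blacklist", lambda name: False) raises
ChainExhausted after trying blacklist and image-master, which is why a
blacklist download ends up looking like the original failure.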

Several fixes were implemented in the proposed system-image package, and some
fixes are already in the works for ubuntu-download-manager (and we'll still
need good integration tests once those changes land).  In the meantime, I've
implemented workarounds in system-image that seem to do the trick.  These
include:

 - Requesting udm to download to temporary files, and then atomically renaming
   the files into place once all downloads succeed.  Because each destination
   file name is different, this avoids the zero-sized files.  Manuel is
   already working on adding atomic renames to udm.

 - Closed the race condition in system-image.  What I think was happening was
   that the critical piece of code was guarded by a boolean, but in
   hindsight there's an obvious race window between the check and the set
   operations, since that pair is not guaranteed to be atomic.  The fix, of
   course, is to use a lock object, which can be checked and set atomically.
   The tricky bit here is that the lock must be released at different places
   in the code, depending on whether automatic or manual downloads are
   enabled.  Yay for an extensive test suite!
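The download-to-temporary-files workaround can be sketched as follows.  This
is not system-image's actual code: fetch(url, path) is a hypothetical
stand-in for asking UDM to download a file, and the sketch assumes each
temporary file lives in the same directory (and so the same filesystem) as
its destination, which is what makes os.replace() an atomic rename on POSIX.
Nothing is renamed into place until every fetch has succeeded; if any fetch
fails, the temporary files are deleted and the existing files are untouched.

```python
import os
import tempfile
from pathlib import Path


def download_all(downloads, fetch):
    """Download every (url, destination) pair, then rename them into place.

    `fetch(url, path)` is a hypothetical stand-in for a download request;
    it must write the contents of `url` to `path` or raise on failure.
    """
    staged = []  # (temporary path, final destination) pairs
    try:
        for url, dest in downloads:
            dest = Path(dest)
            # A unique temporary name next to the destination keeps the
            # later rename on the same filesystem.
            fd, tmp = tempfile.mkstemp(
                dir=dest.parent, prefix=dest.name + ".", suffix=".part"
            )
            os.close(fd)
            staged.append((Path(tmp), dest))
            fetch(url, tmp)
    except BaseException:
        # Any failure: discard all temporary files; destinations untouched.
        for tmp, _ in staged:
            tmp.unlink(missing_ok=True)
        raise
    # Every fetch succeeded.  Each os.replace() is atomic on its own; the
    # set of renames as a whole is not.
    for tmp, dest in staged:
        os.replace(tmp, dest)
```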

That's everything I can think of.  I'm feeling pretty confident about the
proposed package, so don't burst my bubble. :)  If you have any questions,
feel free to contact me on IRC.

Cheers,
-Barry


