ubuntu-phone team mailing list archive

Re: system-image updates, and LP: #1277589

 

On Wed, Feb 19, 2014 at 5:29 PM, Barry Warsaw <barry@xxxxxxxxxx> wrote:

> Over the last week or so, we've had a lot of reports of folks not being able
> to update their devices using System Settings.  This has been quite tricky to
> track down.  I think I have a fix, and I'd like to ask the adventurous among
> you to test it out.  If you do, please follow up here, or in LP: #1277589, or
> to me directly (via email or IRC).
>
> Below I'll go into some gory detail of what I think was going on, but you
> don't need to know all that if you just want to test updates.  I don't think
> you should worry about bricking your device, since nothing's changing in the
> way updates are applied during recovery, but OTOH take prudent precautions if
> you're going to attempt this on your only phone. ;)
>
> Here are the steps you need to take in order to test the new package.
>
> 0a) You'll need adb shell access to your device, and you'll need to turn on
>     writable mode:
>
>     - adb shell
>     - touch /userdata/.writable_image
>     - reboot
>
> 0b) I've been flashing my device to r174 and upgrading it to the latest
>     available release.  r174 was a revision on which I could semi-reliably
>     see update crashes.  Now, after many reflashes and updates in a row, I
>     have not seen any crashes.  To flash to r174 (on desktop):
>
>     - ubuntu-device-flash --channel trusty --revision 174 --wipe=true
>
>     Note that --wipe=true *will* delete your data.  If this is a problem for
>     you, skip that option, although you'll be testing something different.
>
>     Please double check that the following directories are empty on your
>     device:
>
>     - /var/lib/system-image/keyrings
>     - /var/lib/system-image [1]
>     - /var/log/system-image
>     - /android/cache/recovery [2]
>
>     [1] Except of course for the keyrings directory.
>     [2] You may see a log and last_log file there, they are harmless.
>
>     Remove any .xz or .xz.asc files you see in any of the above.  Remove any
>     ubuntu_command file in /android/cache/recovery.
>
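A minimal Python sketch of the step 0b check, assuming you run it with Python 3 on the device itself. The directory list and the allowed leftovers (footnotes [1] and [2]) come from the email; the helper names `leftovers`, `removable` and `report` are hypothetical. It only reports what it finds and deletes nothing.

```python
from pathlib import Path

# Directories from step 0b, mapped to entries that may legitimately remain
# (footnote [1]: the keyrings subdirectory; footnote [2]: log and last_log).
CHECK_DIRS = {
    "/var/lib/system-image/keyrings": frozenset(),
    "/var/lib/system-image": frozenset({"keyrings"}),
    "/var/log/system-image": frozenset(),
    "/android/cache/recovery": frozenset({"log", "last_log"}),
}


def leftovers(directory, allowed=frozenset()):
    """Return entries of `directory` whose names are not in `allowed`.

    A missing directory is treated as empty.
    """
    path = Path(directory)
    if not path.is_dir():
        return []
    return [entry for entry in sorted(path.iterdir()) if entry.name not in allowed]


def removable(entries):
    """Entries the email says to delete: *.xz, *.xz.asc and ubuntu_command."""
    return [
        entry
        for entry in entries
        if entry.name.endswith((".xz", ".xz.asc")) or entry.name == "ubuntu_command"
    ]


def report():
    """Print anything that keeps the image from being 'fairly pristine'."""
    for directory, allowed in CHECK_DIRS.items():
        extra = leftovers(directory, allowed)
        to_remove = removable(extra)
        for entry in extra:
            tag = "remove" if entry in to_remove else "check"
            print("{}: {}".format(tag, entry))
```

Calling `report()` on a freshly flashed device should print nothing; any "check" line is something the email does not account for.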
> 1) You now have a fairly pristine image.  Be sure you have an older version
>    of system-image installed (r174 has 2.0.3):
>
>    - adb shell
>    - system-image-cli --version
>
> 2) You'll need to install the .debs from my PPA.  On your desktop:
>
>    - wget all the .debs on this page:
>      https://launchpad.net/~barry/+archive/systemimage/+build/5616752
>    - for i in *.deb; do adb push $i /tmp; done
>
> 3) adb shell into your device and install the .debs, then be sure you're
>    running system-image 2.1.  Note that due to the Python 3.4 transition
>    going on, you may have to apt-get update and install the 3.4 packages,
>    like so:
>
>     - cd /tmp
>     - dpkg -i *.deb
>     - apt-get update
>     - apt-get install -fy
>     - system-image-cli --version
>
> 4) Now, you are going to start the update via the u/i.  Timing is really
>    important here, and the recipe below is the best way I've managed to
>    trigger the bug (before this fix).  You may want to tail
>    /var/log/system-image/client.log while doing this, or at least review it
>    to make sure you do not get a traceback.
>
>    From the u/i:
>
>    - click on System Settings
>    - wait a second or two, and then click on Updates
>
> If you see the prompt to reboot, you are good to go.  You don't need to
> reboot the device, but you can if you want.  If you get errors, there are
> probably tracebacks in your client.log file.  Please pull them off the device
> and attach them to the bug, or pastebin them for me.
>
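To pull the tracebacks mentioned above out of a long client.log, a rough Python scan like the following may help. It assumes the log contains Python's standard `Traceback (most recent call last):` header, that the frame lines after it are indented, and that the traceback ends at the first unindented line (the exception message). The names `find_tracebacks` and `print_tracebacks` are hypothetical, and chained exceptions will show up as separate entries.

```python
CLIENT_LOG = "/var/log/system-image/client.log"  # path from the email
HEADER = "Traceback (most recent call last):"


def find_tracebacks(lines):
    """Yield each traceback found in `lines` as a list of strings.

    Heuristic: start at the standard header, keep indented frame lines, and
    stop after the first unindented line, which is normally the exception.
    """
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if current is None:
            if HEADER in line:
                current = [line]
        elif line.startswith(" "):
            current.append(line)  # "  File ..." and source lines
        else:
            current.append(line)  # e.g. "ValueError: ..."
            yield current
            current = None
    if current:  # log ended in the middle of a traceback
        yield current


def print_tracebacks(path=CLIENT_LOG):
    """Print every traceback in the log, separated by blank lines."""
    with open(path, errors="replace") as log:
        for traceback_lines in find_tracebacks(log):
            print("\n".join(traceback_lines))
            print()
```

Running `print_tracebacks()` on the device gives output that can be pasted straight into the bug or a pastebin.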
> If you've got a bunch of free time on your hands, reflash your device, reset
> it to a pristine state as described above, and try the whole thing again!
> Before this fix, I was able to provoke a crash roughly once every 2 or 3
> attempts.  So far in my own live testing, no crashes.
>
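How many clean runs are enough to trust the fix? A small worked calculation, assuming each attempt on the old package crashed independently with a fixed probability p (an idealization, since the real bug was timing-dependent) and an arbitrary 5% threshold: n clean runs in a row would happen by luck alone with probability (1 - p)^n. The function name `runs_needed` is hypothetical.

```python
def runs_needed(p_crash, alpha=0.05):
    """Smallest n such that (1 - p_crash) ** n < alpha.

    (1 - p_crash) ** n is the chance that the *old* code would have survived
    n independent attempts without crashing.
    """
    if not 0 < p_crash <= 1:
        raise ValueError("p_crash must be in (0, 1]")
    n, chance_all_clean = 0, 1.0
    while chance_all_clean >= alpha:
        n += 1
        chance_all_clean *= 1 - p_crash
    return n
```

With the "once every 2 or 3 times" rate quoted above, `runs_needed(1/2)` is 5 and `runs_needed(1/3)` is 8 consecutive clean runs.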
> Thanks for your testing!  If I don't hear about any problems in the next day
> or so, I'll propose the new system-image package for the CI-train.
>
> Here are the gory details, as best I understand them.
>
> A change was made to System Settings so that, instead of checking for
> updates when you clicked on the Update icon, it checks for updates as soon as
> the System Settings panel is opened.  This also caused the download to start,
> since auto-download is enabled by default.
>
> A second check for updates is *also* started when you click on the Update
> icon.  There was a small race window where these two checks could sneak past
> a barrier in the code, and two checks would be running at the same time.
>
> This would cause multiple calls to Ubuntu Download Manager to download
> certain files, such as the image master, image signing, and blacklist
> keyrings.  Along the way, UDM would at some point begin to write zero-sized
> files in the download destinations, so that e.g. image-master.tar.xz and
> image-master.tar.xz.asc would both be empty *even though UDM would send a
> successful download signal*.  This was verified by the files' md5 checksums
> matching the checksum of an empty byte sequence.
>
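The md5 check described above can be reproduced with Python's standard `hashlib`: the MD5 of an empty byte sequence is the well-known value d41d8cd98f00b204e9800998ecf8427e. The helper names below are hypothetical; for this particular question, `os.path.getsize(path) == 0` would be an even simpler test.

```python
import hashlib

# MD5 of zero bytes: every empty file hashes to this value.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_empty_download(path):
    """True if the file's checksum matches that of an empty byte sequence."""
    return md5_of(path) == EMPTY_MD5
```

Run against image-master.tar.xz and image-master.tar.xz.asc, `is_empty_download` returning True for both reproduces the symptom described above.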
> The zero-sized files caused a cascading failure to validate the keyrings,
> including the blacklist keyring, which, for security purposes, is *always*
> downloaded.  This was technically correct behavior: if a data file fails GPG
> validation, we can't use it, and in this case the signature failure was
> correct because the files were corrupt.
>
> This cascading keyring failure eventually works its way up to the
> archive-master keyring.  For security purposes, this keyring cannot be
> downloaded; it is pre-installed on the devices.  So once the validation chain
> reaches the archive-master keyring, system-image has to give up.  Nothing
> passes its signature checks, and we've moved as high up the chain as we're
> able to.  The traceback reported in the bug is a bit misleading (or at best
> unhelpful), but after further analysis it makes sense, since it was the
> download of the blacklist file that started the cascading failures.
>
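A toy Python model of the cascade described above. The signing relationships in `SIGNED_BY` are an assumption made for illustration, as are the names `escalate` and `ChainExhausted`; this is not system-image's actual code. The `validates(name)` callable stands in for "(re)download this keyring and check its GPG signature".

```python
# Assumed signing relationships (child keyring -> keyring that signs it).
SIGNED_BY = {
    "blacklist": "image-master",
    "image-signing": "image-master",
    "image-master": "archive-master",
}
# Pre-installed on the device for security reasons; never downloaded.
PREINSTALLED = {"archive-master"}


class ChainExhausted(Exception):
    """Validation failures reached a keyring that cannot be refreshed."""


def escalate(start, validates):
    """Walk up the signing chain from `start` until some keyring validates.

    Returns the keyrings tried, in order.  Raises ChainExhausted when the
    next keyring up the chain is pre-installed (or there is none).
    """
    tried = []
    current = start
    while True:
        tried.append(current)
        if validates(current):
            return tried
        parent = SIGNED_BY.get(current)
        if parent is None or parent in PREINSTALLED:
            # Nothing higher we are allowed to fetch: give up.
            raise ChainExhausted(
                "tried {}; cannot refresh {}".format(tried, parent or current)
            )
        current = parent
```

With every download zero-sized (`validates` always returning False), starting from the blacklist the walk tries blacklist, then image-master, and then raises because the next step up is the pre-installed archive-master keyring.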
> Several fixes were implemented in the proposed system-image package, and
> some fixes are already in the works for ubuntu-download-manager (and we'll
> still need good integration tests once those changes land).  In the meantime,
> I've implemented workarounds in system-image that seem to do the trick.
> These include:
>
>  - Requesting udm to download to temporary files, and then atomically
>    renaming the files into place once all downloads succeed.  Because each
>    destination file name is different, this avoids the zero-sized files.
>    Manuel is already working on adding atomic renames to udm.
>

For my part, I can let you know that I have proposed a new branch that does
what is requested, and it also ensures that no two files will ever be written
to the same temp path, via a mutex. This does not have a huge performance
impact. I hope to land this ASAP; I have a silo assigned in the CI Train.
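A minimal Python sketch of the download-then-rename workaround from the first bullet above. `fetch(url, path)` is a hypothetical stand-in for a UDM download, not a real UDM API, and `download_all` is likewise a made-up name. Each file gets its own temporary name from `tempfile.mkstemp` in the destination's directory, which is a different technique from the mutex Manuel describes; only after every download has succeeded is each file moved into place with `os.replace`.

```python
import os
import tempfile


def download_all(requests, fetch):
    """Download every (url, destination) pair, renaming into place at the end.

    `fetch(url, path)` is a hypothetical callable that writes the body of
    `url` to `path`.  If any fetch fails, all temporary files are removed
    and no destination is touched.
    """
    staged = []
    try:
        for url, destination in requests:
            directory = os.path.dirname(os.path.abspath(destination))
            # mkstemp creates a new file with a unique name, so two
            # downloads never share a temporary path.
            fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
            os.close(fd)
            staged.append((tmp_path, destination))
            fetch(url, tmp_path)
    except BaseException:
        for tmp_path, _ in staged:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
        raise
    # All downloads succeeded.  Each os.replace is atomic on POSIX because
    # the temporary file lives on the same filesystem as its destination.
    for tmp_path, destination in staged:
        os.replace(tmp_path, destination)
```

Note that each individual rename is atomic, but the group of renames as a whole is not; a crash between two renames could still leave a new .tar.xz next to an old .asc.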


>  - Closed the race condition in system-image.  What I think was happening was
>    that the critical piece of code was being guarded by a boolean, but in
>    hindsight there's an obvious race window during the check-and-set
>    operations, since they are not guaranteed to be atomic.  The fix, of
>    course, is to use a lock object, which can be atomically checked and set.
>    The tricky bit here is that the lock must be released at different places
>    in the code, depending on whether automatic or manual downloads are
>    enabled.  Yay for an extensive test suite!
>
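A Python sketch contrasting the boolean guard with a lock, as described in the second bullet above. The `Checker` class, its methods, and the exact release points are hypothetical, chosen only to show why `threading.Lock.acquire(blocking=False)` closes the check-and-set window and how the release point can move depending on auto-download.

```python
import threading


class Checker:
    """Toy model: allow only one update check at a time.

    The racy version looked roughly like this:
        if not self._checking:      # two threads can both see False here...
            self._checking = True   # ...before either one sets it to True
    """

    def __init__(self, auto_download):
        self.auto_download = auto_download
        self._lock = threading.Lock()

    def check_for_update(self):
        """Return False if another check is already in progress."""
        # acquire(blocking=False) tests and takes the lock in one atomic step.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            update_available = self._do_check()
            if self.auto_download and update_available:
                self._start_download()
                # Keep holding the lock: download_finished() releases it.
                return True
        except BaseException:
            self._lock.release()
            raise
        # Manual downloads (or nothing to fetch): this check is complete.
        self._lock.release()
        return True

    def download_finished(self):
        """Called when an automatic download completes."""
        self._lock.release()

    def _do_check(self):
        return True  # placeholder for the real network check

    def _start_download(self):
        pass  # placeholder for handing the files to a downloader
```

A plain `threading.Lock` (unlike an `RLock`) may be released from a different thread than the one that acquired it, which suits a download-completion callback.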
> That's everything I can think of.  I'm feeling pretty confident about the
> proposed package, so don't burst my bubble. :)  If you have any questions,
> feel free to contact me on IRC.
>
> Cheers,
> -Barry
>
> --
> Mailing list: https://launchpad.net/~ubuntu-phone
> Post to     : ubuntu-phone@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~ubuntu-phone
> More help   : https://help.launchpad.net/ListHelp
>
>
