
touch-packages team mailing list archive

[Bug 1421009] Re: unity8 sometimes hangs on boot

 

Over the weekend I counted how many reboots were needed to reproduce
the problem on mako: 30, 12, 15, 59, 15, 9, 26, 10, 6. There is no
clear upper limit.
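
To put rough numbers on these counts, the mean, median and maximum can be computed with a small shell sketch. The helper name reboot_stats is hypothetical and not part of any tool mentioned in this bug; it only assumes POSIX sort and awk.

```shell
# Hypothetical helper: summarise reboot counts passed as arguments.
reboot_stats() {
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            # Median: middle value, or the average of the two middle values.
            median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "n=%d mean=%.1f median=%g max=%d\n", NR, sum / NR, median, v[NR]
        }'
}

# Counts from the mako runs above.
reboot_stats 30 12 15 59 15 9 26 10 6
# prints: n=9 mean=20.2 median=15 max=59
```

With a mean of about 20 reboots per hang and one run that needed 59, a short series of clean reboots says little about whether a candidate fix works.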

In other testing, reverting the libusermetrics landing from February
does not seem to cure the problem: I was also able to reproduce it
with the revert I pushed to
https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/landing-007/+packages

I've improved the test case in the description several times to get
fuller backtraces. With the latest version I could see that, with both
the current libusermetrics and the reverted one, the backtrace leads
back to usermetrics's DBus usage. That is not to say there is anything
wrong with libusermetrics, as the bug is in the multi-threaded handling
of DBus inside Qt. The DBus call being made is:
#17 0xffffffff in dbus_bus_add_match (connection=0xe9d140, rule=0x11ba818 "type='signal',sender='com.canonical.UserMetrics',path='/com/canonical/UserMetrics/DataSource/2',interface='com.canonical.usermetrics.DataSource',member='emptyDataStringChanged'", error=0x0) at ../../dbus/dbus-bus.c:1553
        msg = 0xf76058
        __FUNCTION__ = "dbus_bus_add_match"

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to autopilot in Ubuntu.
https://bugs.launchpad.net/bugs/1421009

Title:
  unity8 sometimes hangs on boot

Status in the base for Ubuntu mobile products:
  In Progress
Status in autopilot package in Ubuntu:
  Fix Released
Status in libusermetrics package in Ubuntu:
  Invalid
Status in lxc-android-config package in Ubuntu:
  Incomplete
Status in qtbase-opensource-src package in Ubuntu:
  In Progress
Status in ubuntu-system-settings-online-accounts package in Ubuntu:
  New
Status in unity8 package in Ubuntu:
  Invalid

Bug description:
  The following gdbus call is failing with an "Error: Timeout was
  reached" message:

  gdbus call --session --dest com.canonical.UnityGreeter --object-path /
  --method org.freedesktop.DBus.Properties.Get
  com.canonical.UnityGreeter IsActive

  This is being seen on krillin devices starting with image 106 from
  ubuntu-touch/devel-proposed. It doesn't happen every time; so far
  today I've seen it 3 times in about 12 tests. On the most recent
  failure, I grabbed a console and repeatedly ran the command from the
  shell. Even after 2 hours the timeout was still being returned
  (after about 28 seconds).

  A copy of ~/.cache/upstart/unity8.log is here:
  http://paste.ubuntu.com/10179482/

  I have 3 test cases where the problem was observed:
  http://d-jenkins.ubuntu-ci:8080/job/vivid-boottest-qtchooser/1/console
  http://d-jenkins.ubuntu-ci:8080/job/vivid-boottest-gsettings-ubuntu-touch-schemas/1/console
  http://d-jenkins.ubuntu-ci:8080/job/fjg-boottest/3/console

  In all cases, the test is using adt-run (from autopkgtest) to drive a
  test on the phone device. adt-run uses the above gdbus call to
  determine if the desktop is active. In all the examples, the device
  was freshly flashed.
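
A check along the lines of what adt-run does can be sketched as follows. This is not adt-run's actual code: the function names and the 30-second timeout are assumptions made for illustration. The sketch relies on gdbus printing a true boolean property as (<true>,).

```shell
# Hypothetical helper: succeed only if a gdbus reply carries a true boolean.
reply_is_true() {
    case "$1" in
        *"<true>"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper around the gdbus call from this description.
# Returns 0 if the greeter is active, 1 if it is not, and 2 if the call
# itself failed (for example with "Error: Timeout was reached").
greeter_is_active() {
    reply=$(gdbus call --session --timeout 30 \
        --dest com.canonical.UnityGreeter --object-path / \
        --method org.freedesktop.DBus.Properties.Get \
        com.canonical.UnityGreeter IsActive 2>&1) || return 2
    reply_is_true "$reply"
}
```

Keeping a separate return code for a failed call lets a test driver tell the hang described here apart from a greeter that is simply not active.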

  == Test Case ==

  # Prepare debugging
  adb shell
  sudo apt install qtbase5-dbg libc6-dbg libdbus-glib-1-2-dbg dbus-1-dbg libglib2.0-0-dbg
  echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/ddebs.list
  sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 428D7C01
  sudo apt-get update
  sudo apt install libusermetricsoutput1-dbgsym=1.1.1+15.04.20150219-0ubuntu1

  # Start the reboot loop
  # This reboots the device in a loop; if the bug is not fixed by the proposed solution, the device will eventually hang. The highest number of reboots without errors so far is 54, so about 100 reboots are probably needed for testing.

  bzr branch lp:unity8
  cd unity8
  while true; do adb shell rm -R "~phablet/.cache/QML"; ./tools/unlock-device || break; done
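
The loop above runs until the device hangs. When verifying a proposed fix, a bounded variant that also reports how many iterations succeeded may be more convenient. The following is a sketch only: run_until_failure is a hypothetical helper, and 100 is the iteration count suggested above.

```shell
# Hypothetical helper: run a command up to MAX times, stopping at the
# first failure. Prints the number of successful runs and returns
# non-zero if a run failed.
run_until_failure() {
    max=$1
    shift
    n=0
    while [ "$n" -lt "$max" ]; do
        "$@" || { echo "$n"; return 1; }
        n=$((n + 1))
    done
    echo "$n"
}

# Intended use from the unity8 checkout, with one loop iteration per run:
# run_until_failure 100 sh -c 'adb shell rm -R "~phablet/.cache/QML"; ./tools/unlock-device'
```

A printed count of 100 with a zero exit status means every iteration passed; a smaller count is the number of clean iterations before the failing one.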

  # When it fails
  adb shell
  sudo gdb -p $(pidof unity8)
  bt

  --
  At this point, the backtrace should show:
  #0  syscall () at ../sysdeps/unix/sysv/linux/arm/syscall.S:37
  #1  0xb6301e12 in _q_futex (op=0, val=3, timeout=0x0, addr=<optimized out>)
      at thread/qmutex_linux.cpp:146
  #2  lockInternal_helper<false> (timeout=-1, elapsedTimer=0x0, d_ptr=...)
      at thread/qmutex_linux.cpp:187
  #3  QBasicMutex::lockInternal (this=this@entry=0x1523b44)
      at thread/qmutex_linux.cpp:203
  #4  0xb6301eb6 in lock (this=0x1523b44) at thread/qmutex.h:59
  #5  lock (timeout=-1, this=0x1523b38) at thread/qmutex.cpp:620
  #6  QMutex::lock (this=this@entry=0x1523d6c) at thread/qmutex.cpp:215
  #7  0xb5f39586 in QDBusMutexLocker (m=0x1523d6c, s=0x1523d48,
      a=ToggleWatchAction, this=<synthetic pointer>) at qdbusthreaddebug_p.h:183
  #8  QDBusDispatchLocker (s=0x1523d48, a=ToggleWatchAction,
      this=<synthetic pointer>) at qdbusthreaddebug_p.h:198
  #9  qDBusRealToggleWatch (d=0x1523d48, watch=0x1524dd0, fd=46)
      at qdbusintegrator.cpp:346
  #10 0xb5ae18f6 in ?? () from /lib/arm-linux-gnueabihf/libdbus-1.so.3

  With this, it's known that this is a QDBus locking related problem.
  --

  ---

  Timeline/Updates:
  2015-02-20: libusermetrics lands, apparently causing this boot problem to start happening occasionally. http://people.canonical.com/~ogra/touch-image-stats/106.changes / http://launchpadlibrarian.net/198152771/libusermetrics_1.1.1%2B14.10.20141020-0ubuntu1_1.1.1%2B15.04.20150219-0ubuntu1.diff.gz "I got a symbolic trace out of all the threads. It seems to be a dbus lock between usermetrics and networkmanager bits. We suspect a relation to QTBUG https://bugreports.qt.io/browse/QTBUG-44836."
  2015-03-25: qtbase dbus update to support threads (instead of one main thread) in PPA 018 fixes the boot issue, but autopilot test suites start failing randomly.
  2015-03-27: an autopilot fix fixes a simple test case, and seems to fix the UITK suite as a whole, but on krillin only.
  2015-04-10: Further patches from upstream fix all AP tests.
  2015-04-23: Upstream continues to work on the patches but they have not yet been merged. AP tests pass, but the U1 account usually gets removed after a reboot, even though apps can be installed flawlessly after adding the U1 account for the duration of that boot.

To manage notifications about this bug go to:
https://bugs.launchpad.net/canonical-devices-system-image/+bug/1421009/+subscriptions

