touch-packages team mailing list archive
Message #73574
[Bug 1421009] Re: unity8 sometimes hangs on boot
Some numbers from the weekend on how many reboots were needed to reproduce
the problem on mako: 30, 12, 15, 59, 15, 9, 26, 10, 6. There's no clear upper limit.
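As a rough reading of these counts, here is a minimal sketch that assumes (this is an assumption, not something established in this bug) that every boot hangs independently with the same probability p. Under that geometric model, p is estimated as hangs divided by total boots, and (1 - p)^n is the chance of n clean reboots in a row even though the bug is still present. The function name estimate_hang_rate is hypothetical.

```shell
#!/bin/sh
# Sketch: estimate a per-boot hang probability from reboot counts,
# ASSUMING each boot hangs independently with the same probability
# (a geometric model; not established in the bug report).
# Each argument is the number of reboots one run needed before it hung.
estimate_hang_rate() {
  echo "$@" | awk '{
    total = 0
    for (i = 1; i <= NF; i++) total += $i     # total boots across all runs
    if (total == 0) { print "no data"; exit 1 }
    p = NF / total                            # one hang per run, so hangs per boot
    printf "runs=%d total=%d p=%.3f\n", NF, total, p
    # Chance of n clean reboots in a row if the bug were still present
    printf "P(50 clean | not fixed)=%.3f\n", (1 - p) ^ 50
    printf "P(100 clean | not fixed)=%.3f\n", (1 - p) ^ 100
  }'
}

# The mako counts quoted above
estimate_hang_rate 30 12 15 59 15 9 26 10 6
```

With these nine runs the sketch gives 9 hangs in 182 boots (p of about 0.049), and 100 clean reboots would then happen by chance only about 0.6% of the time. If the independence assumption holds, that is in line with the test case's suggestion of roughly 100 reboots per verification run.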
In other testing, reverting the libusermetrics landing from February
does not seem to cure the problem: I was able to reproduce it with the
revert as well, which I've pushed to
https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/landing-007/+packages
I've improved the test case in the description several times to get fuller backtraces. With the latest version, I could see that with both the current libusermetrics and the reverted one, the backtrace leads back to libusermetrics' DBus usage. That does not mean anything is wrong with libusermetrics itself, since the bug is in Qt's multi-threaded handling of DBus. The DBus call involved is:
#17 0xffffffff in dbus_bus_add_match (connection=0xe9d140, rule=0x11ba818 "type='signal',sender='com.canonical.UserMetrics',path='/com/canonical/UserMetrics/DataSource/2',interface='com.canonical.usermetrics.DataSource',member='emptyDataStringChanged'", error=0x0) at ../../dbus/dbus-bus.c:1553
msg = 0xf76058
__FUNCTION__ = "dbus_bus_add_match"
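The rule string in frame #17 is a standard D-Bus match rule: comma-separated key='value' pairs that the bus daemon uses to decide which signals to route to the connection. When reproducing the hang, it can help to watch the same signal from a shell. A minimal sketch; the helper name match_rule is hypothetical, and the dbus-monitor line assumes a shell on the device with access to the right bus (the trace does not say whether com.canonical.UserMetrics is on the session or system bus).

```shell
#!/bin/sh
# Hypothetical helper: build a D-Bus match rule for a signal from its
# sender, object path, interface and member (the same fields that appear
# in the dbus_bus_add_match() frame above).
match_rule() {
  printf "type='signal',sender='%s',path='%s',interface='%s',member='%s'" \
    "$1" "$2" "$3" "$4"
}

rule=$(match_rule com.canonical.UserMetrics \
  /com/canonical/UserMetrics/DataSource/2 \
  com.canonical.usermetrics.DataSource \
  emptyDataStringChanged)
echo "$rule"

# On the device, dbus-monitor accepts the same rule and prints matching
# signals as they are emitted (use --system or --session, whichever bus
# the service is on):
#   dbus-monitor --system "$rule"
```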
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to autopilot in Ubuntu.
https://bugs.launchpad.net/bugs/1421009
Title:
unity8 sometimes hangs on boot
Status in the base for Ubuntu mobile products:
In Progress
Status in autopilot package in Ubuntu:
Fix Released
Status in libusermetrics package in Ubuntu:
Invalid
Status in lxc-android-config package in Ubuntu:
Incomplete
Status in qtbase-opensource-src package in Ubuntu:
In Progress
Status in ubuntu-system-settings-online-accounts package in Ubuntu:
New
Status in unity8 package in Ubuntu:
Invalid
Bug description:
The following gdbus call is failing with an "Error: Timeout was
reached" message:
gdbus call --session --dest com.canonical.UnityGreeter --object-path / \
  --method org.freedesktop.DBus.Properties.Get \
  com.canonical.UnityGreeter IsActive
This is being seen on krillin devices starting with image 106 from
ubuntu-touch/devel-proposed. It doesn't happen every time; so far
today, I've seen it 3 times in about 12 tests. On the most recent
failure, I grabbed a console and repeatedly tried to run the command
from the shell; even after 2 hours, the call still returned the
timeout (after about 28 seconds each time).
A copy of ~/.cache/upstart/unity8.log is here:
http://paste.ubuntu.com/10179482/
I have 3 test cases where the problem was observed:
http://d-jenkins.ubuntu-ci:8080/job/vivid-boottest-qtchooser/1/console
http://d-jenkins.ubuntu-ci:8080/job/vivid-boottest-gsettings-ubuntu-touch-schemas/1/console
http://d-jenkins.ubuntu-ci:8080/job/fjg-boottest/3/console
In all cases, the test is using adt-run (from autopkgtest) to drive a
test on the phone device. adt-run uses the above gdbus call to
determine if the desktop is active. In all the examples, the device
was freshly flashed.
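For illustration, a readiness check built on the same gdbus call can be sketched in shell. This is not adt-run's actual code: the function names, the retry count and the 10-second per-call limit are assumptions. It relies on gdbus printing a successful Properties.Get reply as a GVariant tuple such as (<true>,), bounds each call with gdbus's --timeout option (in seconds), and treats any reply, true or false, as "unity8 answered", since the hang shows up as every call timing out.

```shell
#!/bin/sh
# Sketch of a readiness check (NOT adt-run's real implementation).
# Names, retry count and per-call timeout are assumptions.

# Classify gdbus output: prints "true", "false" or "error".
classify_reply() {
  case "$1" in
    *"<true>"*) echo true ;;
    *"<false>"*) echo false ;;
    *) echo error ;;          # e.g. "Error: Timeout was reached"
  esac
}

# One IsActive query, limited to 10 seconds; stderr is kept so errors
# are classified too.
query_greeter_active() {
  gdbus call --session --timeout 10 \
    --dest com.canonical.UnityGreeter --object-path / \
    --method org.freedesktop.DBus.Properties.Get \
    com.canonical.UnityGreeter IsActive 2>&1
}

# Poll up to $1 times, POLL_INTERVAL seconds apart (default 5).
# Prints the IsActive value once unity8 answers; fails if it never does.
wait_for_greeter() {
  attempts=$1
  while [ "$attempts" -gt 0 ]; do
    state=$(classify_reply "$(query_greeter_active)")
    if [ "$state" != error ]; then
      echo "$state"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "${POLL_INTERVAL:-5}"
  done
  return 1
}

# Usage on the device: wait_for_greeter 30 || echo "unity8 never answered"
```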
== Test Case ==
# Prepare debugging
adb shell
sudo apt install qtbase5-dbg libc6-dbg libdbus-glib-1-2-dbg dbus-1-dbg libglib2.0-0-dbg
echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/ddebs.list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 428D7C01
sudo apt-get update
sudo apt install libusermetricsoutput1-dbgsym=1.1.1+15.04.20150219-0ubuntu1
# Start the reboot loop
# This reboots the device in a loop; unless the proposed solution fixes this bug, the device will eventually hang. The highest number of reboots so far without an error is 54, so around 100 reboots are probably needed for testing.
bzr branch lp:unity8
cd unity8
while true; do adb shell rm -R "~phablet/.cache/QML"; ./tools/unlock-device || break; done
# When it fails
adb shell
sudo gdb -p $(pidof unity8)
bt
--
At this point, the backtrace should show:
#0 syscall () at ../sysdeps/unix/sysv/linux/arm/syscall.S:37
#1 0xb6301e12 in _q_futex (op=0, val=3, timeout=0x0, addr=<optimized out>)
at thread/qmutex_linux.cpp:146
#2 lockInternal_helper<false> (timeout=-1, elapsedTimer=0x0, d_ptr=...)
at thread/qmutex_linux.cpp:187
#3 QBasicMutex::lockInternal (this=this@entry=0x1523b44)
at thread/qmutex_linux.cpp:203
#4 0xb6301eb6 in lock (this=0x1523b44) at thread/qmutex.h:59
#5 lock (timeout=-1, this=0x1523b38) at thread/qmutex.cpp:620
#6 QMutex::lock (this=this@entry=0x1523d6c) at thread/qmutex.cpp:215
#7 0xb5f39586 in QDBusMutexLocker (m=0x1523d6c, s=0x1523d48,
a=ToggleWatchAction, this=<synthetic pointer>) at qdbusthreaddebug_p.h:183
#8 QDBusDispatchLocker (s=0x1523d48, a=ToggleWatchAction,
this=<synthetic pointer>) at qdbusthreaddebug_p.h:198
#9 qDBusRealToggleWatch (d=0x1523d48, watch=0x1524dd0, fd=46)
at qdbusintegrator.cpp:346
#10 0xb5ae18f6 in ?? () from /lib/arm-linux-gnueabihf/libdbus-1.so.3
This confirms that it is a QDBus locking-related problem.
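The backtrace above shows only the thread that is waiting; for a lock problem, the other threads usually show who holds the mutex, which is why the timeline below mentions a symbolic trace of all threads. A sketch of capturing that non-interactively with gdb's -batch and -ex options, plus a hypothetical awk filter (waiting_dbus_threads) that lists the threads whose stacks pass through a QDBus*Locker frame like frames #7 and #8 above. The file name unity8-bt.txt is an assumption.

```shell
#!/bin/sh
# Capture backtraces of all unity8 threads without an interactive gdb
# session (run on the device; the debug packages installed in the test
# case make the frames symbolic):
#   sudo gdb -p "$(pidof unity8)" -batch -ex 'thread apply all bt' > unity8-bt.txt

# Hypothetical filter: print the "Thread N ..." header of every thread
# whose stack contains a QDBus*Locker frame (QDBusMutexLocker,
# QDBusDispatchLocker, ...), each header at most once.
waiting_dbus_threads() {
  awk '
    /^Thread [0-9]+/ { header = $0; shown = 0 }   # start of a thread section
    /QDBus[A-Za-z]*Locker/ {
      if (!shown && header != "") { print header; shown = 1 }
    }
  ' "$1"
}

# Usage: waiting_dbus_threads unity8-bt.txt
```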
--
---
Timeline/Updates:
2015-02-20: libusermetrics lands, apparently causing this boot problem to start occurring, though rarely. http://people.canonical.com/~ogra/touch-image-stats/106.changes / http://launchpadlibrarian.net/198152771/libusermetrics_1.1.1%2B14.10.20141020-0ubuntu1_1.1.1%2B15.04.20150219-0ubuntu1.diff.gz "I got a symbolic trace out of all the threads. It seems to be a dbus lock between usermetrics and networkmanager bits. We suspect a relation to QTBUG https://bugreports.qt.io/browse/QTBUG-44836."
2015-03-25: A qtbase DBus update in PPA 018, supporting threads (instead of one main thread), fixes the boot issue, but autopilot test suites start failing randomly.
2015-03-27: An autopilot fix resolves a simple test case and seems to fix the UITK suite as a whole, but only on krillin.
2015-04-10: Further patches from upstream fix all AP tests.
2015-04-23: Upstream continues to work on the patches, but they have not yet been merged. AP tests pass, but the U1 account usually gets removed after a reboot, even though, once a U1 account is added, apps can be installed flawlessly for the rest of that boot.
To manage notifications about this bug go to:
https://bugs.launchpad.net/canonical-devices-system-image/+bug/1421009/+subscriptions