ubuntu-phone team mailing list archive

Re: Touch image 120 results

 

On 10 January 2014 09:15, Alexander Sack <asac@xxxxxxxxxxxxx> wrote:
> On Fri, Jan 10, 2014 at 6:23 AM, Paul Larson <paul.larson@xxxxxxxxxxxxx> wrote:
>> = Mako =
>> 100% pass (no reruns for anything)
>> But we saw several crashes - dialer-app (which has been going on for a
>> while) as well as unity8 crash in default and in click-image-tests.
>> Default tests also saw a crash in whoopsie:
>> http://ci.ubuntu.com/smokeng/trusty/touch/mako/120:20140109.1:20140107.1/5973/click_image_tests/
>
>
> Why do we see whoopsie crashes? Thought we disabled it to not auto
> process crashes on the phone sometimes last cycle.

I'm being pedantic, but they're technically apport crashes. Whoopsie
is just the daemon that shovels .crash files to
https://daisy.ubuntu.com.

The phone does not presently do second-phase processing of crash
files (adding package information, hooks, etc.), nor does it feed the
crash files to whoopsie (using whoopsie-upload-all), as Steve
established that the upstart job that ran whoopsie-upload-all was busted:

https://bugs.launchpad.net/ubuntu/+source/apport/+bug/1235436

However, Brian Murray has fixed the bug in question, so we should be
largely ready to go on accepting crash reports from phones. The last
remaining piece is getting armhf retracers online as part of the move
of the retracing infrastructure to Prodstack. I've asked Brian to take
this task from me and finish it up. All that's needed is working with
webops to verify that the stagingstack deployment is functional:

https://rt.admin.canonical.com//Ticket/Display.html?id=58019

Now, to your question of why we're seeing whoopsie-upload-all crashes
collected in the CI infrastructure. As Michał points out, that script
is being run over a corrupted crash file. I've filed this bug to
better deal with that particular case:

https://bugs.launchpad.net/ubuntu/+source/apport/+bug/1267774
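
To illustrate the kind of handling that bug asks for, here is a rough
sketch of skipping an unreadable report instead of crashing on it. It
is not apport's or whoopsie-upload-all's actual code. It assumes a
simplified view of the crash file layout ("Key: value" lines, with
continuation lines indented by a space), and the names
report_looks_parseable and select_uploadable are made up for this
example.

```python
import re

# Assumption: a report is a series of "Key: value" lines, and multi-line
# values continue on lines indented by a single space.
_KEY_LINE = re.compile(r"^[A-Za-z0-9_.-]+:( .*)?$")


def report_looks_parseable(text):
    """Return True if every line is a key line or an indented continuation."""
    lines = text.splitlines()
    if not lines:
        return False
    seen_key = False
    for line in lines:
        if line.startswith(" "):
            if not seen_key:
                return False  # continuation before any key line
        elif _KEY_LINE.match(line):
            seen_key = True
        else:
            return False  # anything else is treated as corruption
    return seen_key


def select_uploadable(reports):
    """Split {name: text} into (ok, skipped) name lists without raising."""
    ok, skipped = [], []
    for name, text in sorted(reports.items()):
        if report_looks_parseable(text):
            ok.append(name)
        else:
            skipped.append(name)
    return ok, skipped
```

The point is only that one bad file ends up in the skipped list while
the rest are still considered for upload.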

There's a deeper problem here. Didier informs me that they were seeing
a lot of crashes in unity8 with a smashed stacktrace. They realised
the dying unity process was getting reaped and restarted by upstart
while still being processed by apport because it was taking a long
time to collect and process the core file. They set a 30s timeout
(data/unity8.override in unity8-autopilot), which seemed to work
around the problem, but perhaps that value is being exceeded.

We need a better solution than increasing a timeout. James, does
upstart provide us with a better mechanism for telling it to not kill
a process in this state? Can we add one if not? :)

Thanks everyone.
