touch-packages team mailing list archive
Message #77456
[Bug 1447756] Re: segfault in log.c code causes phone reboot loops
I think I have nailed it down now. Here is a brief description of what is happening (if I read the code right).
There seems to be a race: we get new log data for one of the jobs after the job has been terminated, and while processing it we call log_io_reader and eventually log_file_write, which tries to flush the unflushed buffer to disk.
This succeeds, even though it happens before we get the disk-writable signal.
Since it succeeds, log->unflushed->len becomes 0, but we don't remove that log instance from the list of logs that still need to be flushed (log_unflushed_files).
So the next time we get the signal that the disk is writable, we try to flush that log again and BOOM: init aborts on the assert checking that the log has something to flush, because it was already flushed.
The unflushed length actually changes on line 562, where we shrink the unflushed buffer by the amount we managed to write to disk. In our case that is the full length, so the buffer is zero length after shrinking.
So I can see at least three fixes:
1) After calling nih_io_buffer_shrink (log->unflushed, (size_t)wlen); check whether log->unflushed->len is 0, and if so remove the log from the log_unflushed_files list.
2) Make log_clear_unflushed more tolerant of logs that have already been flushed successfully before reaching this point.
3) Don't try to flush unflushed buffers until we get the disk-writable signal.
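To make the race and fix 1 concrete, here is a rough, standalone model. It is not upstart's code: struct sketch_log, sketch_file_write and sketch_clear_would_pass are hypothetical names, the buffer is reduced to a length counter, and list membership is reduced to a flag. It only shows the bookkeeping that I believe goes wrong.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a log entry; not upstart's Log type. */
struct sketch_log {
    size_t unflushed_len;      /* models log->unflushed->len */
    bool   on_unflushed_list;  /* models membership in log_unflushed_files */
};

/* Models the write path: shrink the buffer by the bytes written.
 * With apply_fix set (option 1), a log whose buffer is now empty is
 * also dropped from the unflushed list. */
void sketch_file_write(struct sketch_log *log, size_t wlen, bool apply_fix)
{
    assert(log != NULL);

    if (wlen > log->unflushed_len)
        wlen = log->unflushed_len;
    log->unflushed_len -= wlen;

    if (apply_fix && log->unflushed_len == 0)
        log->on_unflushed_list = false;
}

/* Models the precondition asserted when the disk becomes writable:
 * every log still on the list must have something left to flush.
 * Returns false where the real assertion would fire. */
bool sketch_clear_would_pass(const struct sketch_log *log)
{
    assert(log != NULL);

    if (!log->on_unflushed_list)
        return true;           /* not on the list, so never visited */
    return log->unflushed_len > 0;
}
```

Without the fix, a full early write leaves the log on the list with length 0, so the later check fails. With the fix, the log is gone from the list and the check is never reached for it.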
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to upstart in Ubuntu.
https://bugs.launchpad.net/bugs/1447756
Title:
segfault in log.c code causes phone reboot loops
Status in the base for Ubuntu mobile products:
Fix Committed
Status in Upstart:
New
Status in upstart package in Ubuntu:
Confirmed
Bug description:
We recently started getting reports from phone users that their
devices go into a reboot loop after changing the language or getting
an OTA upgrade (both of which end with a reboot of the phone).
after a bit of research we collected the log at
http://pastebin.ubuntu.com/10872934/
this shows a segfault of upstart's init binary in the log.c code:
[ 6.999083]init: log.c:819: Assertion failed in log_clear_unflushed: log->unflushed->len
[ 7.000279]init: Caught abort, core dumped
[ 7.467176]Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600
To manage notifications about this bug go to:
https://bugs.launchpad.net/canonical-devices-system-image/+bug/1447756/+subscriptions