touch-packages team mailing list archive
-
touch-packages team
-
Mailing list archive
-
Message #23135
[Bug 1354318] Re: whoopsie-upload-all has a wrong check for whether the upload is done
whoopsie-upload-all now returns an upload_stamp for already processed
crash files (.crash files with a .upload file) in addition to returning
an upload_stamp for ones which it just did data collection.
Subsequently, one can call whoopsie-upload-all --timeout 300 and
wait_uploaded will now wait for very .upload file without a .uploaded
file. This is what the CI machinery should run to "wait until
everything is uploaded".
--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to apport in Ubuntu.
https://bugs.launchpad.net/bugs/1354318
Title:
whoopsie-upload-all has a wrong check for whether the upload is done
Status in “apport” package in Ubuntu:
New
Bug description:
Currently, the way whoopsie-upload-all works is that, whenever a new
.crash file is seen, /etc/init/apport-noui.conf triggers to run
whoopsie-upload-all from a single-threaded job that iterates through
all as-yet-unprocessed .crash files, process them via apport, marks
them for upload, and then waits for apport to upload them.
There are several problems with this:
- The "wait for apport to upload them" is racy. Once a report is submitted, it is legitimate for something to remove all of the related files (.crash, .upload, .uploaded). On our test infrastructure, that has resulted in buggy behavior because the test harness is removing these files /while whoopsie-upload-all is polling/, causing it to poll for a very long time waiting for a file that will never be found and then time out.
- waiting for apport to upload is pointless anyway for its designed use, since whoopsie-upload-all is called from an upstart job that's inotify-triggered so nothing waits for the return from this command anyway. The utah scripts appear to currently be waiting for it; however this should be non-default, exceptional behavior, because...
- since apport-noui is not an 'instance' job, as long as the job is running (presumably blocking on files that are never going to show up), *newly* created .crash files will not be processed.
- the job furthermore is written to assume that it will be called separately for crashes done as different users. However, since whoopsie-upload-all will process all the crashes that it can each time that it runs, there is a race condition whereby a crash written as non-root will be processed by whoopsie-upload-all running as root, resulting in wrong permissions on the related files anyway.
Proposed solution to this:
- Change apport-noui to be 'instance $MATCH'. This will result in a separate whoopsie-upload-all process spawned each time a new .crash is created.
- Keep the current semantics of whoopsie-upload-all iterating over *all* .crash files that it finds and submitting them. This ensures that if a previous crash is missed out for any reason, we manage to catch it later.
- Ensure that process_report() can be safely called from multiple processes at the same time on the same .crash file (by locking with correct semantics and having subsequent calls skip)
- Handle any necessary file uid changes /within/ whoopsie-upload-all, instead of relying on a racy sudo invocation from the upstart job.
- Disable the current wait_uploaded() behavior unless a timeout argument is given explicitly; and also fix the behavior to detect when the .upload file has been removed after we started watching.
I believe with those changes, whoopsie-upload-all should give us
reliable report processing in all cases.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/apport/+bug/1354318/+subscriptions
References