← Back to team overview

touch-packages team mailing list archive

[Bug 1354318] Re: whoopsie-upload-all has a wrong check for whether the upload is done

 

Quick notice: whoopsie-upload-all was never designed to be called in
parallel from an upstart job. It was a script that we want to run at the
end of merge proposals etc. in the CI machinery to upload all crashes
that occurred during the test run. For that reason it must wait until
everything has been uploaded, otherwise the testbed will be torn down
and the .crash is lost.

For upstart we could call this with a --timeout=0, so that it behaves
asynchronously. If we rely on the upstart job in the CI machinery now,
we instead need a new script which does the "wait until everything is
uploaded".

Adding flock()ing to the .crash files in process_report() sounds fine to
me.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to apport in Ubuntu.
https://bugs.launchpad.net/bugs/1354318

Title:
  whoopsie-upload-all has a wrong check for whether the upload is done

Status in “apport” package in Ubuntu:
  New

Bug description:
  Currently, the way whoopsie-upload-all works is that, whenever a new
  .crash file is seen, /etc/init/apport-noui.conf triggers to run
  whoopsie-upload-all from a single-threaded job that iterates through
  all as-yet-unprocessed .crash files, process them via apport, marks
  them for upload, and then waits for apport to upload them.

  There are several problems with this:

   - The "wait for apport to upload them" is racy.  Once a report is submitted, it is legitimate for something to remove all of the related files (.crash, .upload, .uploaded).  On our test infrastructure, that has resulted in buggy behavior because the test harness is removing these files /while whoopsie-upload-all is polling/, causing it to poll for a very long time waiting for a file that will never be found and then time out.
   - waiting for apport to upload is pointless anyway for its designed use, since whoopsie-upload-all is called from an upstart job that's inotify-triggered so nothing waits for the return from this command anyway.  The utah scripts appear to currently be waiting for it; however this should be non-default, exceptional behavior, because...
   - since apport-noui is not an 'instance' job, as long as the job is running (presumably blocking on files that are never going to show up), *newly* created .crash files will not be processed.
   - the job furthermore is written to assume that it will be called separately for crashes done as different users.  However, since whoopsie-upload-all will process all the crashes that it can each time that it runs, there is a race condition whereby a crash written as non-root will be processed by whoopsie-upload-all running as root, resulting in wrong permissions on the related files anyway.

  Proposed solution to this:

   - Change apport-noui to be 'instance $MATCH'.  This will result in a separate whoopsie-upload-all process spawned each time a new .crash is created.
   - Keep the current semantics of whoopsie-upload-all iterating over *all* .crash files that it finds and submitting them.  This ensures that if a previous crash is missed out for any reason, we manage to catch it later.
   - Ensure that process_report() can be safely called from multiple processes at the same time on the same .crash file (by locking with correct semantics and having subsequent calls skip)
   - Handle any necessary file uid changes /within/ whoopsie-upload-all, instead of relying on a racy sudo invocation from the upstart job.
   - Disable the current wait_uploaded() behavior unless a timeout argument is given explicitly; and also fix the behavior to detect when the .upload file has been removed after we started watching.

  I believe with those changes, whoopsie-upload-all should give us
  reliable report processing in all cases.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/apport/+bug/1354318/+subscriptions


References