Re: RFC: Stopping apport-driven +filebug requests from timing out

On Thu, 22 Oct 2009 10:55:00 +0100
Graham Binns <graham@xxxxxxxxxxxxx> wrote:

> Hi folks,
>
> We've been seeing repeated timeouts in the +filebug page where users are
> coming from apport (so +filebug/$token, to be precise). These timeouts
> often look like they're happening in the LoginStatus code, but some
> investigation proves this to be a red herring.
>
> My current theory is that:
>
>  1. A user comes to +filebug/$token
>  2. FileBugViewBase.publishTraverse() handles the token, fetches the
>     LibraryFileAlias to which it points and then passes that to a
>     FileBugDataParser.
>  3. FileBugViewBase.publishTraverse() calls FileBugDataParser.parse()
>  4. Time passes.
>  5. More time passes.
>  6. FileBugDataParser.parse() completes and the request continues, but
>     parse() has taken so long to run (~30 seconds) that by the time the
>     LoginStatus code is being run the timeout limit kicks in and the
>     request is given the nuclear boot of doom.

IIRC, this version of the parser has already been Bjornified and is
about a tillenion times faster than it was previously. I guess we're
hitting a new limit now.

>
> FileBugDataParser.parse() is in fact pretty much one big while loop
> (lib/lp/bugs/browser/bugtarget.py:187), looping over the contents of the
> file that apport has attached and dealing with them appropriately. I'm
> pretty certain that the problem we're having is just one of too much
> data; the files that were being uploaded by apport in the cases I looked
> at were circa 90MB in size, and they're going to take a while to parse,
> whichever way you look at it.
>
> Now, as far as I can tell - without studying the loop in detail and
> trying to find ways to slim it down - the only real way to fix this is
> to move the processing of the apport data elsewhere, so that it doesn't
> impact on the user's session. As I see it, the options are:
>
>  1. Create a script that processes apport data and make it possible for
>     the +filebug process to tell it "Hey, this LibraryFileAlias is mine,
>     please process it and update this bug appropriately" after the bug
>     has been filed.
>  2. Make it so that the apport data get processed before the user is
>     pointed at +filebug, so that the requisite data are available to
>     +filebug via a series of queries instead of locked away in a
>     BLOB.
>  3. A variation on option 1, whereby +filebug will only use the
>     asynchronous method for files over a certain size (e.g. 25MB or so).
>
> The problem with options 1 and 3 is that we need the apport data before
> filing the bug, as far as I can tell. The docstring of
> FileBugDataParser.parse states that the following items are gleaned from
> apport:
>
>   * The initial bug summary.
>   * The initial bug tags.
>   * The visibility of the bug.
>   * Additional initial subscribers.
>
> In addition:
>
>   * The first inline part will be added to the description.
>   * All other inline parts will be added as separate comments.
>   * All attachment parts will be added as attachments.
>
> So at this point, as far as I can tell, only option 2 is actually
> viable, though it may require changes to apport, too (probably not, but
> I'm just tossing it in there for the sake of being paranoid). Unless
> there's some other way of fixing this that I've not thought about at
> this point (as I said, I haven't had time yet to properly profile the
> offending while loop to find out if there are savings to be made).

It should be possible to stop parsing once we have:

  * The initial bug summary.
  * The initial bug tags.
  * The visibility of the bug.
  * Additional initial subscribers.
  * The first inline part.

These are all early on in the apport blob.

Then, later, parse the remainder of the blob:

  * All other inline parts.
  * All attachment parts.

(We could also add a notification to the response saying that this
kind of stuff is happening.)
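
In code, the split might look something like the sketch below. This is
only to illustrate the idea: the boundary handling, the names and the
return values are invented, and the real FileBugDataParser is rather
more careful about encodings and malformed input.

    BOUNDARY = '--boundary'  # in reality taken from the blob itself

    def parse_initial(blob_file):
        # Phase 1, done during the request: read the header block
        # (summary, tags, visibility, extra subscribers) and the first
        # inline part, then stop, so the bulk of a 90MB blob is never
        # read at all.
        headers = {}
        for line in iter(blob_file.readline, ''):
            stripped = line.rstrip('\n')
            if not stripped:
                break  # a blank line ends the header block
            name, _, value = stripped.partition(':')
            headers[name.strip()] = value.strip()

        first_part = []
        for line in iter(blob_file.readline, ''):
            if line.startswith(BOUNDARY):
                if first_part:
                    break  # next part reached; leave it for phase 2
                continue  # opening boundary of the first part
            first_part.append(line)
        return headers, ''.join(first_part)

    def parse_remainder(blob_file):
        # Phase 2, run later (from a script, or once the bug has been
        # filed): collect the remaining inline parts and attachments
        # from wherever phase 1 stopped.
        parts, current = [], []
        for line in iter(blob_file.readline, ''):
            if line.startswith(BOUNDARY):
                if current:
                    parts.append(''.join(current))
                current = []
            else:
                current.append(line)
        if current:
            parts.append(''.join(current))
        return parts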

A problem with parsing ahead of time is that we then have to figure
out how and where to store the results, which may involve some
additional serialisation and parsing.
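
As a strawman, the serialised form might be nothing more exotic than
this (field names invented; json versus simplejson depending on the
Python in use):

    import json  # or simplejson, whichever is available

    # Illustrative only: stash the pre-parsed fields somewhere +filebug
    # can read them back cheaply, instead of re-parsing the blob on
    # every request.
    early_fields = {
        'summary': 'example crash summary from apport',
        'tags': ['apport-crash'],
        'private': True,
        'subscribers': ['example-subscriber'],
        'description': 'first inline part goes here',
    }
    stored = json.dumps(early_fields)

    # ...and later, in +filebug:
    early_fields = json.loads(stored)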


