
Re: capturing backtraces on every 'expensive' operation


On Wed, Nov 16, 2011 at 3:31 PM, Robert Collins
<robertc@xxxxxxxxxxxxxxxxx> wrote:
> On Wed, Nov 16, 2011 at 2:11 PM, Gary Poster <gary.poster@xxxxxxxxxxxxx> wrote:
>> +1 on the goal.
>>
>> I'm guessing that this is using plain python stack traces, rather than the extended ones that include tal information?  I'd be very pleasantly surprised if the code that includes stack traces were as fast as you quote, particularly given actual tal (-like) debugging content. I don't have the code in front of me, but I think the pertinent code is in lp.services.stacktrace.
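
For context, the plain-Python capture Gary describes is roughly this
shape - a sketch only (names made up), not the actual
lp.services.stacktrace code:

    import time
    import traceback

    def capture():
        # Walk the current frames into (filename, lineno, name, line)
        # tuples - no exception object needed.
        return traceback.extract_stack()

    # Rough timing of the capture cost by itself:
    n = 1000
    start = time.time()
    for _ in range(n):
        capture()
    elapsed = time.time() - start
    print("%.3f ms per capture" % (elapsed / n * 1000))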

Round 1 is live on qastaging. You can see the results in
https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-4a5b9cd9bd803fdc1960fbd685d6871d#statementlog
- scroll far right and you'll see the backtraces. There are 1ms
queries in there - our finest granularity - so the overhead is
certainly <= 1ms. The timeout in question was an 8948ms query, so not
a good indicator of death-by-1000-cuts :).
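
For the curious, the shape of the hook is roughly as follows - a
sketch assuming Storm's tracer API (install_tracer /
connection_raw_execute), not the actual Launchpad change, which ties
into our existing statement logging:

    import traceback

    from storm.tracer import install_tracer

    class BacktraceTracer(object):
        """Record a Python backtrace with every statement executed."""

        def __init__(self):
            self.statements = []  # list of (statement, backtrace) pairs

        def connection_raw_execute(self, connection, raw_cursor,
                                   statement, params):
            # The last few frames of the extracted stack usually point
            # straight at the code that triggered the query.
            self.statements.append(
                (statement, traceback.extract_stack()))

    install_tracer(BacktraceTracer())

Since the capture happens per statement, the cost scales with query
count rather than query duration - which is why the <= 1ms bound
above matters.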

The file size on disk is 540K, which isn't tiny but isn't dire either
- at that size, roughly 5G a day before garbage collection (implying
on the order of 10,000 reports a day).

I'd love feedback on whether this genuinely helps determine the cause
of late evaluation or not. I seem to recall StevenK jumping through
some hoops to figure out a late-evaluation case a couple of weeks
back - has anyone else had trouble determining the source of repeated
queries? If so, please have a look at the OOPS above and let me know
what you think (*).

*: I know it's awkward to read. Patches appreciated.

-Rob

