launchpad-dev team mailing list archive

Re: capturing backtraces on every 'expensive' operation

 

On 18/11/11 21:07, Robert Collins wrote:
> On Wed, Nov 16, 2011 at 3:31 PM, Robert Collins
> <robertc@xxxxxxxxxxxxxxxxx> wrote:
>> On Wed, Nov 16, 2011 at 2:11 PM, Gary Poster <gary.poster@xxxxxxxxxxxxx> wrote:
>>> +1 on the goal.
>>>
>>> I'm guessing that this is using plain python stack traces, rather than the extended ones that include tal information?  I'd be very pleasantly surprised if the code that includes stack traces were as fast as you quote, particularly given actual tal (-like) debugging content. I don't have the code in front of me, but I think the pertinent code is in lp.services.stacktrace.
> 
> Round 1 is live on qastaging. You can see the results in
> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-4a5b9cd9bd803fdc1960fbd685d6871d#statementlog
> - scroll far right and you have backtraces. There are 1ms queries in
> there, our finest granularity so the overhead is certainly <= 1ms. The
> timeout in question was an 8948 ms query, so not a good indicator of
> death-by-1000 cuts :).
> 
> The file size on disk is 540K, which isn't tiny, but isn't dire either
> - at that size it works out to ~5G a day before garbage collection.

I'm confused -- how is that not dire? We historically kept at least the
last 30 days, but now that will be hundreds of gigabytes. OOPSes with
sensible query counts used to be 10-15KB.

Particularly with oops-prune no longer running, we are likely to be in a
pretty bad situation by the end of next week, even with all the disk
space we freed on carob yesterday.
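
As a rough check of the figures quoted in this thread, here is a minimal
sketch of the arithmetic. It assumes "540K" and "5G" mean KiB and GiB, that
every OOPS is written at the new size, that the daily OOPS count stays
constant, and that the retention window is exactly 30 days. The function and
variable names are made up for illustration and are not Launchpad code.

```python
# Back-of-the-envelope disk usage estimate for OOPS reports.
# Assumption: K and G are binary units (KiB, GiB); the thread does not say.
KB = 1024
GB = 1024 ** 3


def oopses_per_day(daily_bytes, oops_bytes):
    """OOPS count per day implied by a daily volume and a per-OOPS size."""
    return daily_bytes / oops_bytes


def retained_gb(daily_bytes, days):
    """Disk space in GB needed to keep `days` worth of reports."""
    return daily_bytes * days / GB


window_days = 30                      # retention window mentioned in the reply
new_daily = 5 * GB                    # ~5G a day at the new size
rate = oopses_per_day(new_daily, 540 * KB)

# Space for 30 days at the new size (with backtraces).
new_total = retained_gb(new_daily, window_days)

# Space for 30 days at the old size of 10-15KB per OOPS, same OOPS rate.
old_low = retained_gb(rate * 10 * KB, window_days)
old_high = retained_gb(rate * 15 * KB, window_days)

print("implied OOPSes per day: %.0f" % rate)
print("30 days at new size: %.1f GB" % new_total)
print("30 days at old size: %.1f to %.1f GB" % (old_low, old_high))
```

Under those assumptions the gap between the old and new retention cost comes
entirely from the per-OOPS size, since the OOPS rate is held fixed.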
