
launchpad-dev team mailing list archive

Re: capturing backtraces on every 'expensive' operation


+1 on the goal.

I'm guessing that this is using plain Python stack traces, rather than the extended ones that include TAL information?  I'd be very pleasantly surprised if the code that includes TAL information were as fast as you quote, particularly given actual TAL(-like) debugging content. I don't have the code in front of me, but I think the pertinent code is in lp.services.stacktrace.

If, as I suspect, it's not, we can try optimizing it; if that fails, I'd like the ability to switch that codepath on. It can be very helpful to see what a template is actually trying to render, and I'd hate to lose that ability.

Gary

On Nov 15, 2011, at 7:23 PM, Robert Collins <robert.collins@xxxxxxxxxxxxx> wrote:

> So, I'd like to capture a backtrace whenever we make a SQL query or
> HTTP service call.
> 
> I think this should be ok, because we don't *want* thousands of
> queries or thousands of backend service calls. We want efficient batch
> based operations, because doing a query or making an HTTP call is at
> least 1ms itself.
> 
> The extra data this would generate would help us more quickly identify
> the call paths leading to horrendous query counts.
> 
> I've done some rough profiling - see the attached file for methodology.
> 
> It looks like grabbing and stringifying a 20 frame deep backtrace on
> amd64 is about 1ms: doing 1000 takes 1 second on my laptop.
> 
> So, doing this naively would add 1 second to a page that does 1000
> queries; and 100ms to a page that does 100 queries.
> 
> Given that a page doing 1000 queries is almost certainly failing
> *now*, I think doing this naively is probably ok.
> 
> However, we could do some things to limit the impact. I'm trying to
> decide if they are worth doing.
> 
> Idea 1: Capture backtraces for the first 50 queries only - there should
> be some repetition in there - and stop capturing after that.
> 
> Idea 2: Only start capturing backtraces after the first 2 seconds of
> request time - small fast pages won't be penalised at all, and slower
> ones will be. Optionally combine with idea 1.
> 
> Idea 3: Capture 5% of queries randomly, allowing a broad overview of
> behaviour across the whole request, without imposing 100ms on a 100
> query page.
> 
> At this point I'm planning on just naively capturing everything: good
> pages with 30 or so queries will hardly notice the overhead (in today's
> world), and poor pages with > 30 queries will notice more, but already
> need fixing - and the act of fixing their query count will
> intrinsically fix their measurement overhead.
> 
> -Rob
> <backtracetiming.py>
> _______________________________________________
> Mailing list: https://launchpad.net/~launchpad-dev
> Post to     : launchpad-dev@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~launchpad-dev
> More help   : https://help.launchpad.net/ListHelp
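Rob's backtracetiming.py attachment isn't reproduced in the archive, but a minimal sketch of the methodology he describes (time how long it takes to capture and stringify a ~20-frame-deep stack, 1000 times) might look like this; the function name and iteration count here are assumptions, not the actual attachment:

```python
import time
import traceback


def deep(n):
    """Recurse to build a call stack about n frames deep, then time
    capturing and stringifying the current stack 1000 times."""
    if n:
        return deep(n - 1)
    iterations = 1000
    start = time.perf_counter()
    for _ in range(iterations):
        # Capture the current stack and render it to text, as a query
        # tracer would do for each SQL query or HTTP call.
        "".join(traceback.format_stack())
    elapsed = time.perf_counter() - start
    return elapsed / iterations  # seconds per capture


per_capture = deep(20)
print(f"{per_capture * 1000:.2f} ms per ~20-frame capture")
```

On hardware comparable to the laptop mentioned above, this is the shape of measurement that yields "about 1ms per capture, 1 second per 1000".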
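Idea 3 (random sampling) could be sketched as a small helper called from the query/HTTP hook; the function name, sample rate constant, and hook integration are hypothetical, not part of the actual Launchpad tracer:

```python
import random
import traceback

# Hypothetical sampling rate: capture roughly 5% of queries (Idea 3).
SAMPLE_RATE = 0.05


def maybe_capture_backtrace():
    """Return a stringified backtrace for a random ~5% sample of
    calls, and None for the rest, so a 100-query page pays roughly
    5 captures' worth of overhead instead of 100."""
    if random.random() < SAMPLE_RATE:
        return "".join(traceback.format_stack())
    return None
```

Combining this with Idea 1 would just mean also counting captures and returning None once the first-N budget is exhausted.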

