
launchpad-dev team mailing list archive

Re: performance dashboard?

 

On Sat, May 8, 2010 at 1:49 AM, Sidnei da Silva
<sidnei.da.silva@xxxxxxxxx> wrote:
> Hi Robert,
>
>> Now, I'm not suggesting we go out and invent such a dashboard itself -
>> there's going to be a tonne of investment needed to do that, but
>> perhaps there is an open source version of this out there already for
>> zope apps? Or perhaps we could look at providing a zope plugin to talk
>> to newrelic?
>
> I'd go as far as saying that there would be nothing Zope-specific to
> collecting the kind of metrics that would be interesting.

I think you're wrong there, but I do agree that there are many
non-Zope metrics. Some Zope metrics:
 - template engine time.
 - ORM time [while you can claim it's Storm, I think in terms of the
'appserver' here; you'd want it glued together well and outputting in
sync with the Zope transaction ending, so you definitely need *some*
Zope glue to do it]
 - accept() backlog delay
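
To make the "glue" point concrete, here is a rough sketch of per-request timing buckets that get flushed when the request (or transaction) ends. Everything in it is hypothetical: `begin_request`, `timed` and `end_request` are names made up for illustration, not Zope or Storm APIs, and the real hook points in the appserver would still have to be found.

```python
import threading
import time
from contextlib import contextmanager

# Hypothetical per-thread storage: one timing dict per in-flight request.
_local = threading.local()


def begin_request():
    """Start a fresh set of timing buckets for the current thread."""
    _local.timings = {}


@contextmanager
def timed(category):
    """Add the wall-clock time of the enclosed block to a named bucket."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings = getattr(_local, "timings", None)
        if timings is not None:  # ignore work done outside a request
            timings[category] = timings.get(category, 0.0) + elapsed


def end_request():
    """Return the buckets for this request and reset the thread state.

    In a real appserver this would be called from whatever hook fires
    when the transaction ends (an assumption, not an existing hook name).
    """
    timings = getattr(_local, "timings", None) or {}
    _local.timings = None
    return dict(timings)
```

The ORM and template layers would each wrap their work in `with timed("orm"):` or `with timed("template"):`, and whatever runs at the end of the request would ship the result of `end_request()` off to an aggregator.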


> The great
> majority of stats would be collected at the HAProxy/Squid/Apache
> level, and the remaining ones would likely be Storm or other
> subsystems like RabbitMQ for Landscape. Maybe finding out if the
> threads of a certain Zope app server are exhausted would be useful, but
> that's the only thing that comes to my mind.

Grabbing stats for a single request across haproxy + squid + apache
would be *awesome*: SSL handshake time, cache lookup time, etc. Oh, and
we need to add memcached too these days.
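
As a sketch of what per-request stats across tiers could look like: this assumes every tier can be made to log a shared request id next to its own timing, which is an assumption on my part and not something we have today. The record shape `(tier, request_id, seconds)` and the function name are made up for illustration.

```python
from collections import defaultdict


def merge_by_request(records):
    """Group per-tier timings under the request they belong to.

    `records` is an iterable of (tier, request_id, seconds) tuples,
    a hypothetical shape that each tier's log parser would produce.
    """
    merged = defaultdict(dict)
    for tier, request_id, seconds in records:
        tiers = merged[request_id]
        # Sum repeats, e.g. several memcached lookups in one request.
        tiers[tier] = tiers.get(tier, 0.0) + seconds
    return {request_id: dict(tiers) for request_id, tiers in merged.items()}
```

With that in place, one request's row would show how its time was split between haproxy, squid, apache and the appserver, instead of four unrelated averages.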

> The situation is the same. We have a way to collect some metrics but
> if they are not aggregated there's not much point in having them
> except during development.

That's exactly it!

> I have a lot of interest in the subject of collecting metrics and
> analyzing bottlenecks, I even have a book or two around that I
> recommend for people that want to dig into the subject (eg: The Art of
> Capacity Planning). However, when it comes down to actually doing it I
> feel like our developers are way too distant from the LOSAs. It might
> be that I just never tried to get a new metric graphed, and I've never
> seen any graph from Apache or HAProxy internally, though I trust that
> they exist and someone is watching over them.

The thing about the tuolumne graphs and nagios meters is that they are
very manual: you can't 'drill down' into a bad metric to find where
it's coming from, unless the lower-level data is already configured in
Just The Right Way. Key metrics are great for crisis handling and
detection; they aren't great for exploring things, and that's what I
feel we're missing as developers.
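
By 'drill down' I mean something like the sketch below: keep the raw samples tagged with their dimensions, so any total can be re-split after the fact. The sample fields (`page`, `component`, `seconds`) and the `breakdown` helper are hypothetical, purely to illustrate the idea.

```python
from collections import defaultdict


def breakdown(samples, by, **filters):
    """Total the 'seconds' of matching samples, grouped by one dimension.

    `samples` is a list of dicts with a 'seconds' key plus tag keys such
    as 'page' and 'component' (assumed names). `filters` narrows the set
    to samples whose tags match, which is the drill-down step.
    """
    totals = defaultdict(float)
    for sample in samples:
        if all(sample.get(key) == value for key, value in filters.items()):
            totals[sample[by]] += sample["seconds"]
    return dict(totals)
```

So `breakdown(samples, "page")` finds the slow page, and `breakdown(samples, "component", page=...)` then shows which part of that page is eating the time, without anyone having configured that particular graph in advance.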

-Rob


