launchpad-dev team mailing list archive

Re: OOPS-related questions

 

On Wed, Dec 21, 2011 at 7:14 AM, Gary Poster <gary.poster@xxxxxxxxxxxxx> wrote:
> - What's the current policy for deleting old OOPSes?  If they are not
> kept forever, we need to change our bug reporting policies somehow.

They are kept unconditionally for 2 weeks, then pruned if they are not
mentioned in LP.
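
As a rough illustration of that policy (not the actual pruning code), the
rule can be written as a small Python predicate. The names should_prune,
RETENTION and referenced_ids are hypothetical, and treating a report that is
exactly two weeks old as prunable is an assumption; nothing above says which
side of the boundary it falls on.

```python
from datetime import datetime, timedelta

# Unconditional retention window, taken from the policy described above.
RETENTION = timedelta(weeks=2)


def should_prune(oops_id, reported_at, referenced_ids, now):
    """Return True if an OOPS report is eligible for pruning.

    referenced_ids is assumed to be the set of OOPS ids mentioned in LP.
    """
    if now - reported_at < RETENTION:
        # Still inside the two-week window: always kept.
        return False
    # Past the window: kept only if something in LP refers to it.
    return oops_id not in referenced_ids
```

So an OOPS quoted in a bug report survives pruning, while an unreferenced one
disappears once it is more than two weeks old.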

> - On a related note, any idea why the OOPS mentioned in this bug does
> not exist?  https://bugs.launchpad.net/launchpad/+bug/899123 ->
> https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-93c9602da4a55064c5b25f5db810f54b

Either the pruning code is buggy, or it was deleted when we were
having to manually delete days of OOPSes due to carob running out of
disk space. I hope it's the latter. The pruning code lives in three
places:
 - LP has an API to return OOPS references made during a date range on a project
 - oops-datedir-repo has code to prune the contents of a date directory
 - oops-tools has code to prune the contents of the postgresql DB

> - On another related note, AIUI the OOPS files are no longer likely to be
> directly on the devpad filesystem, but in the OOPS DB; and if they are
> on devpad, then they will have a filename that has no correlation to the
> OOPS id.  Therefore, to do a full manual search for an id that you
> suspect might be missing a character or two, one would need to do
> something like the following at the moment.  Is this correct?

They are still directly on the devpad filesystem, and in the oops DB,
just as before. The location has changed for new OOPSes - it's now
/srv/oops-amqp/<instance>/<day>/<local-hash>.

Previously received oopses will be in the launchpad.net-logs
directory, and there isn't really a good way to combine them all, so
unless someone comes up with one, this split will remain.

> 1) something like this:
>
> gary@carob:/srv/launchpad.net-logs/lpnet$ find -L . -path '*/OOPS-*'
> -exec grep -Hn '93c9602da4a55064c5b25f5db810f54b' {} \;
>
> (as you might expect, this went on so long I did a control-C.  I didn't
> think harder about this to try and constrain further, though I might
> have constrained the find with )

I would expect 'grep OOPS-93c9602da4a55064c5b25f5db810f54b -r
/srv/oops-amqp/production' to be reasonably quick at finding an oops,
but...
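
For anyone who would rather do the disk search from Python than with grep,
here is a minimal sketch. It assumes only what is said above: reports are
plain files somewhere under a root directory and their filenames do not
encode the OOPS id, so the contents have to be read. find_oops_files is a
hypothetical helper, not part of any of the OOPS tools.

```python
import os


def find_oops_files(root, oops_id):
    """Yield paths of files under root whose contents mention oops_id."""
    needle = oops_id.encode("utf-8")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Reads each file whole; assumes reports are small.
                with open(path, "rb") as report:
                    if needle in report.read():
                        yield path
            except OSError:
                # Unreadable or vanished file (e.g. pruned mid-walk): skip it.
                continue
```

Calling find_oops_files('/srv/oops-amqp/production', 'OOPS-93c9602da4a55064c5b25f5db810f54b')
would walk the same tree as the grep above. Because this is a plain substring
match, a contiguous fragment of the id also works when the full id is in
doubt, and pointing root at a single day directory narrows the search.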

> 2) a LIKE search in the OOPS DB for an id containing
> 93c9602da4a55064c5b25f5db810f54b

Just hitting the web UI is the best way:
 - all OOPSes are transmitted over AMQP now(*)
 - the AMQP consumer takes care of writing to disk, and inserting into
the DB, before it acks the message. So, short of wiping rabbit's own
mnesia DB, there won't be any OOPSes that are on disk but neither in
the DB nor still unacked in rabbit.
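
The ordering is what gives that guarantee. A minimal sketch of it, with the
disk write, DB insert and ack passed in as plain callables so that no
particular AMQP library is assumed (handle_message and its parameters are
hypothetical names, not the real consumer's API):

```python
def handle_message(body, delivery_tag, write_to_disk, insert_into_db, ack):
    """Persist one OOPS report, acking only after both writes succeed."""
    # 1. Write the report to disk; the callable returns the filename used.
    filename = write_to_disk(body)
    # 2. Record it in the DB, including where it lives on disk.
    insert_into_db(body, filename)
    # 3. Only now tell the broker the message is done. If either step
    #    above raised, this line is never reached and the message stays
    #    unacked in the broker.
    ack(delivery_tag)
    return filename
```

If the DB insert fails after the file has been written, the report is on disk
and not in the DB, but the message is still unacked, which is exactly the
case described above.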

The web UI shows the oops filename, if you want to dig into the raw
data for some reason.

HTH,
Rob

