
launchpad-dev team mailing list archive

Re: Reliable bug syncing - UI changes

 

On Thu, 14 Jan 2010 00:04:24 +0000
Tom Berger <tom.berger@xxxxxxxxxxxxx> wrote:
...
> >> For the bugtracker as a whole, we need to display information about sync
> >> runs,
> >> and a way to trigger a new sync.
> >
> > I'd re-use the same pattern for forcing a sync as the other page.
> > Can we show a log?  Maybe next to the last sync, we could show a log so
> > people can figure out what the problem is for themselves.
>
> The problem with logs is that we don't run updates for individual
> bugtrackers necessarily, so we'll have to try and figure out how to
> pull out of the log the bits pertaining to the bugtracker. Gavin, any
> ideas on how we might be able to do that?

Right now, the bugwatch table has a lastchecked field (a timestamp)
and a last_error_type field (an enum), both of which will be
straightforward to summarize on the bug tracker page.
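As a rough illustration of that summary, here is a minimal sketch. It assumes a hypothetical BugWatch record carrying only the two fields named above, with last_error_type as a plain string and None meaning the last check was okay. It is not Launchpad's actual model code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BugWatch:
    # Hypothetical stand-in for a bugwatch row: only the two fields
    # discussed above are modelled.
    lastchecked: Optional[datetime]
    last_error_type: Optional[str]  # None means the last check was okay


def summarize_watches(watches):
    """Return the most recent check time and a count per error type."""
    checked = [w.lastchecked for w in watches if w.lastchecked is not None]
    # Report a NULL last_error_type as "OK" so it shows up in the counts.
    errors = Counter(w.last_error_type or "OK" for w in watches)
    return {
        "last_checked": max(checked) if checked else None,
        "errors": dict(errors),
    }
```

Watches that have never been checked still count towards "OK" here, since their last_error_type is NULL; a real page might want to list them separately.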

Tom also suggested borrowing Deryck's graph, and I think that would be
perfect to show on the bug tracker page. This shows how many bug
watches are out of date (over 24 hours old iirc), so it gives us an
idea of how well the checkwatches machinery is working (lastchecked is
updated even if there's an error).
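The number behind that graph can be sketched roughly as follows. The 24-hour threshold is taken from the "iirc" above and is an assumption. Treating never-checked watches as out of date is also an assumption.

```python
from datetime import datetime, timedelta

# Assumed threshold ("over 24 hours old iirc"); adjust to match the real graph.
STALE_AFTER = timedelta(hours=24)


def count_stale(lastchecked_times, now, stale_after=STALE_AFTER):
    """Count watches never checked, or last checked more than stale_after ago."""
    return sum(
        1
        for checked in lastchecked_times
        if checked is None or now - checked > stale_after
    )
```

Because lastchecked is updated even on error, this measures whether checkwatches is getting round to the watches at all, not whether the updates succeed.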

A rather different chart would be a line chart of error counts against
time on the x-axis, with one line for each last_error_type:

  NULL (i.e. okay), UNKNOWN, BUG_NOT_FOUND, CONNECTION_ERROR,
  INVALID_BUG_ID, TIMEOUT, UNPARSABLE_BUG, UNPARSABLE_BUG_TRACKER,
  UNSUPPORTED_BUG_TRACKER, PRIVATE_REMOTE_BUG.

We don't currently record the historical data needed for this, but we
could start doing so. A chart like this summarizing all bug trackers could be
useful too. In both cases I think the NULL, UNKNOWN, TIMEOUT and
CONNECTION_ERROR errors would be the most relevant. Maybe
UNSUPPORTED_BUG_TRACKER too.
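If that history were recorded, turning it into one series per error type could look roughly like the following sketch. It assumes hypothetical (checked_at, error_type) samples, with None standing for NULL (okay). The optional filter defaults to nothing, and RELEVANT reflects the types suggested above.

```python
from collections import defaultdict
from datetime import date, datetime

# Error types suggested above as most relevant; None (NULL) is reported as "OK".
RELEVANT = {"OK", "UNKNOWN", "TIMEOUT", "CONNECTION_ERROR"}


def error_series(samples, kinds=None):
    """Group (checked_at, error_type) samples into daily counts per type.

    Returns {error_type: [(day, count), ...]} with each series sorted by day.
    If kinds is given, only those error types are kept.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for checked_at, error_type in samples:
        kind = error_type or "OK"
        if kinds is not None and kind not in kinds:
            continue
        counts[kind][checked_at.date()] += 1
    return {kind: sorted(per_day.items()) for kind, per_day in counts.items()}
```

Days with no samples of a given type are simply absent from that series, so the charting side would need to fill them in with zero.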

<takes a breath>

Alongside the UI changes we're going to move checkwatches over
to a job system (the same one used to generate preview diffs,
for example). The job table has a few more fields that reveal
information about the health of the job: status (enum), progress
(int), attempt_count (int), log (text).
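A rough sketch of rolling those per-watch job rows up into per-tracker numbers follows. The Job class is a hypothetical mirror of the fields listed above, not the real job table mapping. The status strings ("COMPLETED", "FAILED") are placeholders, not the actual enum values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical mirror of the job table fields named above, one row per
    # bug watch update. Status strings used with it are placeholders.
    status: str
    progress: int
    attempt_count: int
    log: str = ""


def summarize_jobs(jobs):
    """Roll per-watch job rows up into per-tracker health numbers."""
    by_status = Counter(job.status for job in jobs)
    # Jobs that needed more than one attempt hint at a flaky remote or a bug.
    retried = sum(1 for job in jobs if job.attempt_count > 1)
    max_attempts = max((job.attempt_count for job in jobs), default=0)
    return {
        "by_status": dict(by_status),
        "retried": retried,
        "max_attempts": max_attempts,
    }
```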

However, as with the bugwatch fields, a record in the job table will
represent a single bug watch update, so we'll need to summarize them for
the bug tracker page:

- In addition to the lastchecked (Deryck's) graph, have a graph
  showing how long since the last *successful* update.

- Normalize the free-form log information and summarize
  that. (Probably more bother than it's worth.)

- Show, say, the 5 most recent watch update failures in the past 24
  hours (click through for more detail).

- Some errors are caused by bad data (BUG_NOT_FOUND, INVALID_BUG_ID,
  UNPARSABLE_BUG_TRACKER, UNPARSABLE_BUG), some by bugs in our code
  (UNKNOWN, ...), some by misunderstandings (PRIVATE_REMOTE_BUG), some
  by outages (UNKNOWN, CONNECTION_ERROR, TIMEOUT). Try to coarsely
  categorize and count errors to determine health indicators (traffic
  lights), one for checkwatches, one for the remote system, etc.
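The coarse categorization in the last point might be sketched like this. The category mapping follows the list above. UNKNOWN appears under both "bugs in our code" and "outages", so it counts towards both here; that split is an assumption. UNSUPPORTED_BUG_TRACKER isn't categorized above and falls into "other". The amber and red thresholds (5% and 20% of watches) are made-up placeholders.

```python
from collections import Counter

# Coarse categories taken from the list above; UNKNOWN is ambiguous there,
# so it is counted as both a code bug and an outage.
CATEGORIES = {
    "BUG_NOT_FOUND": ("bad_data",),
    "INVALID_BUG_ID": ("bad_data",),
    "UNPARSABLE_BUG_TRACKER": ("bad_data",),
    "UNPARSABLE_BUG": ("bad_data",),
    "UNKNOWN": ("our_code", "outage"),
    "PRIVATE_REMOTE_BUG": ("misunderstanding",),
    "CONNECTION_ERROR": ("outage",),
    "TIMEOUT": ("outage",),
}


def light(failures, total, amber=0.05, red=0.20):
    """Map a failure rate to a traffic light; thresholds are placeholders."""
    if total == 0:
        return "grey"  # nothing checked, so no signal either way
    rate = failures / total
    if rate >= red:
        return "red"
    if rate >= amber:
        return "amber"
    return "green"


def health(error_types):
    """error_types: last_error_type per watch, with None meaning okay."""
    error_types = list(error_types)
    counts = Counter()
    for error_type in error_types:
        if error_type is None:
            continue
        for category in CATEGORIES.get(error_type, ("other",)):
            counts[category] += 1
    total = len(error_types)
    return {
        "checkwatches": light(counts["our_code"], total),
        "remote_system": light(counts["outage"], total),
        "counts": dict(counts),
    }
```

Bad-data and misunderstanding errors are counted but don't drive either light, since neither checkwatches nor the remote system is really at fault for them.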

I don't think we can do all of those though!

Gavin.
