
launchpad-dev team mailing list archive

Re: Is it ok to report slightly inflated bug counts in portlets; counting bugs; accuracy vs speed

 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> I have put together a prototype at the SQL level; it performs
> adequately: we can satisfy queries for ubuntu scale tags (queries are
> at the end of this mail for those interested - or read the gory
> details in the bug):
> top 10 tags (logged in) completes in < 0.5 seconds and considers 33K rows
> open bugs (logged in) completes in < 0.3 seconds and considers 14K rows
> top 10 tags (anonymous) completes in < 0.2 seconds and considers 10K rows
> open bugs (with privacy) completes in < 0.2 seconds and considers 110 rows

I'm a little curious about 'blowout', because you have one row per
tag*status*milestone. 'status' is fortunately bounded, but milestone can
grow considerably. (In the case of bzr, we currently have 150 milestones
on LP, though only a few open ones.)

'tag' also has the tendency to accrue many items. I think you mentioned
that Launchpad has 151 *official* tags, and I don't know how many more
unofficial tags you have.

So for a bzr/launchpad target, you could have 150*151*10 = 226,500 rows.
I'm not 100% sure how many bugs we actually have, but it is fewer than
5,000 (including fixed, open, and invalid bugs).
...
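The worst-case arithmetic above, as a quick Python sketch. The function name is hypothetical, and the 10 statuses and the ~5,000 bugs are the rough figures from this mail, not measured values:

```python
def max_summary_rows(milestones: int, tags: int, statuses: int) -> int:
    """Upper bound on rows in a summary keyed by tag*status*milestone."""
    return milestones * tags * statuses


# Rough bzr/launchpad figures from the estimate above (not measured).
bound = max_summary_rows(milestones=150, tags=151, statuses=10)
print(bound)         # 226500
print(bound / 5000)  # 45.3 summary rows per bug, in the worst case
```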

I'm sure you could optimize and not include the full cross product: any
rows that sum to 0 would not be included, etc. If I read your comments
correctly, the full expansion on staging was only 2.6M rows (and 1.3M of
those were for ubuntu itself)?
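A minimal sketch of the "don't store zero rows" idea, assuming a simplified in-memory model: BugTask and build_sparse_summary are hypothetical names, not Launchpad's schema, and a Counter stands in for the summary table. Only combinations that actually occur get a row, so the size is bounded by the number of (bug task, tag) pairs rather than by the full cross product.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple


class BugTask(NamedTuple):
    # Hypothetical, simplified stand-in for a bug task row.
    tags: Tuple[str, ...]
    status: str
    milestone: Optional[str]


def build_sparse_summary(tasks: Iterable[BugTask]) -> Counter:
    """Count tasks per (tag, status, milestone).

    Combinations that never occur are simply absent, so no zero rows
    are stored. Untagged tasks are counted under the tag None.
    """
    summary: Counter = Counter()
    for task in tasks:
        for tag in task.tags or (None,):
            summary[(tag, task.status, task.milestone)] += 1
    return summary


tasks = [
    BugTask(("ui", "db"), "New", "2.4"),
    BugTask(("ui",), "New", "2.4"),
    BugTask((), "Triaged", None),
]
summary = build_sparse_summary(tasks)
print(len(summary))  # 3 rows, not tags*statuses*milestones
```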

I do realize your queries are nicely efficient at getting a subset. But
turning a 900k-row BugTask table plus a 700k-row Bug table into a
2.6M-row table doesn't seem to be scaling in the right direction.

> 
>> Secondly if it does happen that people get an accurate count through
>> the api or through paging through all the results, it may be
>> disconcerting if it doesn't match; but perhaps that can be handled by
>> just indicating that the numbers are approximate, or again being
>> accurate at low numbers.
> 
> I would like the API to use these summaries as well where we can. We
> have a bunch of work to do to break the assumption that the results
> set is at all like a list object first; my current batchnav 1.2.4 work
> goes someway towards that, but there is more to do.
> I agree that it *could* be disconcerting, but as I say: these summary
> numbers were broken for *7 years* and we had 1 - yes '1' - bug report
> about their accuracy.
> 
> -Rob

I know that bzr has driven New and Critical to 0 quite often. If privacy
is the only inaccurate measure, you could always split the numbering up,
so you would see "New 10 (~2 private)". You could even include the 2 in
the count, and then put "~2 private" in a mouse-over tooltip.
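One way to render the second option, as a hypothetical helper (portlet_entry is not an existing Launchpad function): the private bugs are included in the headline count, and the private share goes in the tooltip text.

```python
from typing import Optional, Tuple


def portlet_entry(status: str, public: int, private: int) -> Tuple[str, Optional[str]]:
    """Return (label, tooltip) for one status line in a bug portlet.

    The label counts every bug; the tooltip, if any, says how many of
    them the current viewer may not be able to see.
    """
    total = public + private
    label = f"{status} {total}"
    tooltip = f"~{private} private" if private > 0 else None
    return label, tooltip


print(portlet_entry("New", 8, 2))  # ('New 10', '~2 private')
```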

I certainly found it annoying to close the last New bug, return to the
page, and have it say "1 New", only for the New link to give me a search
result with 0 entries in it. (I don't know whether I bothered to file a
bug; maybe.)

I know privacy in general is a bit annoying. Martin just posted to
canonical-bazaar about all the bugs in 'bzr(Ubuntu)' that never got
triaged. For him, there were ~50 bugs marked New; the rest of us could
only see 4. I don't think it was clear to him from the search page that
some of the bugs he saw were private. (A UI thing only, but still
somewhat relevant: you expect a search to give you results similar to
your summary statistics.)

Below a certain threshold, I imagine the actual count queries are pretty
fast. I don't know the queries, but if you have only 10 New bugs, it
shouldn't be very slow to compute. For counts above some threshold
(where the DB wants to do a table scan), you could use the aggregate
numbers and just truncate them, so instead of seeing 81,962 New, you
would see 82k New. (That makes it hard to truncate at, e.g., 100 bugs,
though.)
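A sketch of the threshold idea, under assumptions: `estimate` is a cheap number (for example, read from a summary table), `exact_count` is a callable that runs the real COUNT query, and the 1,000 cut-off is a hypothetical value. Rounding to the nearest thousand is also why a cut-off as low as 100 doesn't sit well with this display: estimates between 100 and 499 would show as "0k".

```python
from typing import Callable

EXACT_THRESHOLD = 1000  # hypothetical cut-off, not a measured value


def display_count(estimate: int, exact_count: Callable[[], int],
                  threshold: int = EXACT_THRESHOLD) -> str:
    """Exact count for small numbers, rounded thousands above the threshold.

    exact_count is only called below the threshold, so large targets
    never pay for the expensive query.
    """
    if estimate < threshold:
        return str(exact_count())
    # Integer half-up rounding to the nearest thousand.
    return f"{(estimate + 500) // 1000}k"


print(display_count(81962, lambda: 81962))  # 82k
print(display_count(10, lambda: 10))        # 10
```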


Anyway, my big concern is that it turns into another BranchRevision
table, which AFAIK also has good selectivity, but its sheer size makes it
very cumbersome to deal with. Also, as far as accuracy goes, if you only
update it in batch mode, it is going to have the same accuracy problem
that memcache did, and you'll close the last bug without the
satisfaction of seeing "0 New". (There is a real 'I accomplished
something' feedback loop that I think you want to preserve.)
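To keep the "0 New" feedback loop, the summary has to change in the same step as the bug. A minimal sketch, assuming the summary is keyed by status alone and a Counter stands in for the table; in the database this would more likely be a trigger or an UPDATE in the same transaction. move_bug is a hypothetical name.

```python
from collections import Counter


def move_bug(summary: Counter, old_status: str, new_status: str) -> None:
    """Apply one status change to the summary immediately.

    This is the incremental alternative to periodic batch rebuilds:
    the counts are correct as soon as the change is made, and rows
    that drop to zero are removed rather than kept.
    """
    summary[old_status] -= 1
    if summary[old_status] <= 0:
        del summary[old_status]
    summary[new_status] += 1


summary = Counter({"New": 1, "Triaged": 3})
move_bug(summary, "New", "Fix Committed")
print(summary["New"])  # 0, right away
```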

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2lhcMACgkQJdeBCYSNAANgrwCfd0ESH3aqX6G0S6+jHOaJLplf
HrwAoMXymlODQ4U9i5PqVi/BlHDOZ36U
=hFdD
-----END PGP SIGNATURE-----


