launchpad-dev team mailing list archive
-
launchpad-dev team
-
Mailing list archive
-
Message #06926
Re: Is it ok to report slightly inflated bug counts in portlets; counting bugs; accuracy vs speed
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
...
>> I'm a little curious about 'blowout'. Because you have one row per
>> tag*status*milestone. 'status' is fortunately bounded, but milestone can
>> grow considerably. (In case of bzr, we currently have 150 milestones on
>> LP, though only a few open ones.)
>>
>> 'tag' also has the tendency to accrue many items. I think you mentioned
>> that Launchpad has 151 *official* tags, and I don't know how many more
>> unofficial tags you have.
>>
>> So for a bzr/launchpad target, you could have 150*151*10 = 226,500 rows.
>> I'm not 100% sure how many actual bugs we have, but it is on the order
>> of <5,000 bugs (including fixed, open, and invalid bugs).
>> ...
>>
>> I'm sure you could optimize and not include the full cross product. Any
>> rows that sum to 0 would not be included, etc. If I read your comments
>> correctly, the full expansion on staging was only 2.6M rows? (and 1.3M
>> were for ubuntu itself?)
>>
>> I do realize your queries are nicely efficient at getting a subset. But
>> turning a 900k BugTask table + 700k Bug table into a 2.6M row table
>> doesn't seem to be scaling in the right direction.
>
> Its not that bad :) - and the selectivity is great.
>
> select id from product where name='bzr';
> id
> ------
> 1186
> (1 row)
>
> Time: 257.583 ms
> lpmain_staging=> select count(*) from bugsummary where product=1186;
> count
> -------
> 1170
> (1 row)
>
> select count(*) from bugtask where product=1186;
> count
> -------
> 5833
>
>
> Time: 13.447 ms
> lpmain_staging=> select count(*) from bugsummary;
> count
> --------
> 371308
>
> So its 1/5th the size for bzr, and 1/3 the size for everything - but
> its one table not three, so its actually closer to 1/10th for bzr and
> 1/6th for everything.
That isn't the numbers you had on the bug.
https://bugs.launchpad.net/launchpad/+bug/758587/comments/3
lpmain_staging=> select count(*) from foo;
count
- ---------
2636126
That is 2.6M rows, which is very than your 370k rows you're mentioning now.
...
> Sure, but we have that problem now with memcache. What we have now is:
> - stale figures (cached at all)
> - cached wrongly (you can see the figures Martin did a few minutes ago)
>
> I think we can easily move to
> - cached (for a few minutes) and inflated-on-private-bugs only
>
> with a second pass to get to
> - live but inflated-on-private-bugs only
>
> And thus the question - should we? :)
>
> -Rob
For *me*, it would be fine. Partially because I pretty much have <10
private bugs. I don't know how people who more actively participate in
private bugs feel.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk2mp2EACgkQJdeBCYSNAAP4hgCdFt37gyjpFjXduXnxCVSNViyo
anEAn3vCp3vuZV4KlggiTX2wsOnWoTrf
=T+pu
-----END PGP SIGNATURE-----
Follow ups
References