← Back to team overview

launchpad-dev team mailing list archive

Re: Is it ok to report slightly inflated bug counts in portlets; counting bugs; accuracy vs speed

 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


...
>> I'm a little curious about 'blowout'. Because you have one row per
>> tag*status*milestone. 'status' is fortunately bounded, but milestone can
>> grow considerably. (In case of bzr, we currently have 150 milestones on
>> LP, though only a few open ones.)
>>
>> 'tag' also has the tendency to accrue many items. I think you mentioned
>> that Launchpad has 151 *official* tags, and I don't know how many more
>> unofficial tags you have.
>>
>> So for a bzr/launchpad target, you could have 150*151*10 = 226,500 rows.
>> I'm not 100% sure how many actual bugs we have, but it is on the order
>> of <5,000 bugs (including fixed, open, and invalid bugs).
>> ...
>>
>> I'm sure you could optimize and not include the full cross product. Any
>> rows that sum to 0 would not be included, etc. If I read your comments
>> correctly, the full expansion on staging was only 2.6M rows? (and 1.3M
>> were for ubuntu itself?)
>>
>> I do realize your queries are nicely efficient at getting a subset. But
>> turning a 900k BugTask table + 700k Bug table into a 2.6M row table
>> doesn't seem to be scaling in the right direction.
> 
> Its not that bad :) - and the selectivity is great.
> 
>  select id from product where name='bzr';
>   id
> ------
>  1186
> (1 row)
> 
> Time: 257.583 ms
> lpmain_staging=> select count(*) from bugsummary where product=1186;
>  count
> -------
>   1170
> (1 row)
> 
> select count(*) from bugtask where product=1186;
>  count
> -------
>   5833
> 
> 
> Time: 13.447 ms
> lpmain_staging=> select count(*) from bugsummary;
>  count
> --------
>  371308
> 
> So its 1/5th the size for bzr, and 1/3 the size for everything - but
> its one table not three, so its actually closer to 1/10th for bzr and
> 1/6th for everything.

That isn't the numbers you had on the bug.
https://bugs.launchpad.net/launchpad/+bug/758587/comments/3

lpmain_staging=> select count(*) from foo;
  count
- ---------
 2636126

That is 2.6M rows, which is very than your 370k rows you're mentioning now.

...

> Sure, but we have that problem now with memcache. What we have now is:
>  - stale figures (cached at all)
>  - cached wrongly (you can see the figures Martin did a few minutes ago)
> 
> I think we can easily move to
>  - cached (for a few minutes) and inflated-on-private-bugs only
> 
> with a second pass to get to
>  - live but inflated-on-private-bugs only
> 
> And thus the question - should we? :)
> 
> -Rob

For *me*, it would be fine. Partially because I pretty much have <10
private bugs. I don't know how people who more actively participate in
private bugs feel.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2mp2EACgkQJdeBCYSNAAP4hgCdFt37gyjpFjXduXnxCVSNViyo
anEAn3vCp3vuZV4KlggiTX2wsOnWoTrf
=T+pu
-----END PGP SIGNATURE-----



Follow ups

References