launchpad-dev team mailing list archive

Re: APIs and len() of collections

To: launchpad-dev@xxxxxxxxxxxxxxxxxxx
From: Aaron Bentley <aaron@xxxxxxxxxxxxx>
Date: Mon, 26 Jul 2010 11:02:31 -0400
In-reply-to: <AANLkTimXJnKYRuGxc9+ogDjUaA4FYm6sms9YR=aNwSd6@mail.gmail.com>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100528 Thunderbird/3.0.5

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/24/2010 05:08 AM, Robert Collins wrote:
> I know that there is some work underway at the moment to defer the
> point where we call len() on collections. I'd like feedback on an even
> more ambitious proposal:
> 
>  - not calling len() ever

What supports len() that is not a collection?

> I need input here - where do people use len(), why do they use len(),
> what would the impact of nuking it be? We need this input to build
> better interfaces - ones that scale and perform well.

I use len when testing for the empty list.  I don't think that
evaluating the boolean value of a list is precise enough, especially
since lists are sometimes None, and that can have radically different
semantics from [].
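
A minimal sketch of the distinction drawn above, assuming a value that may be
None, an empty list, or a non-empty list. The function name `describe` and its
labels are illustrative only and do not come from the Launchpad code base.

```python
def describe(builds):
    """Classify a value that may be None, [], or a non-empty list."""
    if builds is None:
        # None often means "not loaded" or "not applicable", not "no items".
        return "unknown"
    if len(builds) == 0:
        # An explicit length test only succeeds for a real, empty collection.
        return "empty"
    return "non-empty"


# Truthiness alone cannot separate the first two cases: both are falsy.
for value in (None, [], [3]):
    print(repr(value), "truthy" if value else "falsy", describe(value))
```

Note that `len(None)` raises TypeError, so an explicit length test also surfaces
an unexpected None instead of silently treating it as empty.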

I recently used len when calculating the median duration of a build.
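
As a rough illustration of why a median needs an exact length, here is a sketch
assuming build durations are available as `datetime.timedelta` values. The
function name `median_duration` is hypothetical, not the actual Launchpad code.

```python
from datetime import timedelta


def median_duration(durations):
    """Return the median of a list of timedeltas, or None if it is empty."""
    ordered = sorted(durations)
    n = len(ordered)  # the exact size decides which element(s) sit in the middle
    if n == 0:
        return None
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


durations = [timedelta(minutes=5), timedelta(minutes=1), timedelta(minutes=3)]
print(median_duration(durations))
```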

I also used it to decide how many recent recipe builds to show, based on
the number of pending builds.

> Some inputs that lead me to proposing this goal:
>  - len() is a precise interface
> 
>  - highly precise counting is extremely expensive.

Are you sure?  I believe that highly-precise counting *may* be
expensive, but I believe that there are counterexamples where it's quite
cheap, e.g. python lists.
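
To make the contrast concrete, a small sketch: a CPython list stores its own
size, so len() does not visit the elements, whereas a lazy source can only be
counted by consuming it. The helper `count_by_iteration` is illustrative only.

```python
def count_by_iteration(iterable):
    """Count items by walking every one of them; cost grows with the size."""
    return sum(1 for _ in iterable)


items = list(range(100_000))

# The list already knows its length; no traversal is needed.
exact = len(items)

# A generator has no len(); the only way to count it is to exhaust it.
lazy = (x for x in items)
walked = count_by_iteration(lazy)

print(exact, walked)
```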

>  - the results of such counting are also stale almost immediately:
> APIs query in separate transactions each time
> 
>  - it's not useful for users [200000 open bugs vs 200001 is a
> near-valueless distinction]

How often is it sensible to deal with that many bugs at once anyhow?

I hold that there are huge distinctions between zero and one, and pretty
significant distinctions between values in the range of 1-10.  After
that, the significance of a single increment gets pretty small.

> So far, I've thought of two replacement interfaces:
>  - estimate_size(collection) => {0..99, hundreds, thousands, millions...}
>    This would be used for providing UI feedback on collections
> 
>  - closed_since|changed_since parameters on various searches, so that
> the use of len() to generate trend lines is able to be done - we can
> precisely identify recent work without precisely identifying total
> unfiltered collection size.
> 
> What do you think?
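
For reference, the estimate_size idea quoted above could be sketched roughly as
follows. This assumes an approximate count is already available from somewhere
cheap; the name `bucket_estimate` and the exact bucket boundaries above 99 are
assumptions, since the proposal only lists the labels.

```python
def bucket_estimate(approx_count):
    """Map an approximate count to an exact value (0..99) or a coarse label."""
    if approx_count < 0:
        raise ValueError("count cannot be negative")
    if approx_count < 100:
        # Small collections keep their exact size, where each increment matters.
        return approx_count
    if approx_count < 1_000:
        return "hundreds"
    if approx_count < 1_000_000:
        return "thousands"
    return "millions"


for n in (0, 7, 99, 100, 200_000, 200_001, 5_000_000):
    print(n, bucket_estimate(n))
```

Under these boundaries, 200000 and 200001 open bugs land in the same bucket,
while 0 and 1 remain distinct.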

I don't think your rationale justifies avoiding len entirely.  I also
could have sworn that ResultSet didn't support len anyhow; if I want
the number of results, I call ResultSet.count.
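
A sketch of that pattern, using a hypothetical `LazyResults` class as a stand-in.
It is not Storm's ResultSet and makes no claim about that class's internals; it
only shows an object that exposes an explicit count() while leaving len()
unsupported.

```python
class LazyResults:
    """Hypothetical lazy result collection with count() but no __len__."""

    def __init__(self, rows):
        self._rows = rows

    def __iter__(self):
        return iter(self._rows)

    def count(self):
        # A database-backed result set would typically run a COUNT query here;
        # this stand-in just walks its in-memory rows.
        return sum(1 for _ in self._rows)


results = LazyResults(["a", "b", "c"])

try:
    len(results)
except TypeError:
    # No __len__ is defined, so the cost of counting is never hidden in len().
    has_len = False
else:
    has_len = True

print(results.count(), has_len)
```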

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkxNo4cACgkQ0F+nu1YWqI0OAwCfbKAbywTsPzihXxU6Kr979xYH
u84AoIAojzoPqvPpXE/Sro/tyDKg/rSr
=De6U
-----END PGP SIGNATURE-----

References

APIs and len() of collections
From: Robert Collins, 2010-07-24