
launchpad-dev team mailing list archive

Re: APIs and len() of collections

 


On 07/24/2010 05:08 AM, Robert Collins wrote:
> I know that there is some work underway at the moment to defer the
> point where we call len() on collections. I'd like feedback on an even
> more ambitious proposal:
> 
>  - not calling len() ever

What supports len() that is not a collection?

> I need input here - where do people use len(), why do they use len(),
> what would the impact of nuking it be? We need this input to build
> better interfaces - ones that scale and perform well.

I use len() when testing for an empty list.  I don't think that
evaluating the boolean value of a list is precise enough, especially
since a variable that should hold a list is sometimes None, and None
can have radically different semantics from [].
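
To make the distinction concrete, here is a minimal sketch. The
describe() helper and its return values are made up for illustration;
they are not Launchpad code.

```python
def describe(builds):
    """Classify a value that should be a list but may be None."""
    if builds is None:
        return "unknown"    # no list was supplied at all
    if len(builds) == 0:
        return "empty"      # a real list with nothing in it
    return "non-empty"


# A plain truth test cannot separate the first two cases:
# both None and [] are falsy.
assert bool(None) == bool([])
```

With "if not builds:" alone, the None case and the [] case take the
same branch.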

I recently used len when calculating the median duration of a build.
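
Roughly, the median calculation needs the exact count to find the
middle element. This is a sketch only, assuming durations are plain
numbers of seconds held in a list; median_duration() is a hypothetical
name, not the actual code.

```python
def median_duration(durations):
    """Return the median of a non-empty list of durations in seconds."""
    ordered = sorted(durations)
    count = len(ordered)          # exact size is needed to locate the middle
    if count == 0:
        raise ValueError("no durations to take a median of")
    middle = count // 2
    if count % 2 == 1:
        return ordered[middle]
    # Even number of items: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2
```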

I also used it to decide how many recent recipe builds to show, based on
the number of pending builds.

> Some inputs that lead me to proposing this goal:
>  - len() is a precise interface
> 
>  - highly precise counting is extremely expensive.

Are you sure?  Highly precise counting *may* be expensive, but there
are counterexamples where it's quite cheap, e.g. Python lists.
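
A small sketch of the contrast, using only built-in types:
count_by_iterating() is a made-up helper standing in for any count
that has to visit every item.

```python
def count_by_iterating(iterable):
    """Count items by walking every one of them: cost grows with size."""
    return sum(1 for _ in iterable)


builds = list(range(1000))

# Cheap: a list stores its own length, so len() does not walk the items.
assert len(builds) == 1000

# Expensive by comparison: a lazy source has no stored length, so the
# only way to count it is to consume it.
lazy = (b for b in builds)
assert count_by_iterating(lazy) == 1000
```

So the cost depends on what sits behind the collection, not on len()
as an interface.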


>  - the results of such counting are also stale almost immediately:
> APIs query in a separate transaction each time
> 
>  - it's not useful for users [200000 open bugs vs 200001 is a
> near-valueless distinction]

How often is it sensible to deal with that many bugs at once anyhow?

I hold that there are huge distinctions between zero and one, and pretty
significant distinctions between values in the range of 1-10.  After
that, the significance of a single increment gets pretty small.


> So far, I've thought of two replacement interfaces:
>  - estimate_size(collection) => {0..99, hundreds, thousands, millions...}
>    This would be used for providing UI feedback on collections
> 
>  - closed_since|changed_since parameters on various searches, so that
> the use of len() to generate trend lines is able to be done - we can
> precisely identify recent work without precisely identifying total
> unfiltered collection size.
> 
> What do you think?
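
For concreteness, one reading of the estimate_size proposal is sketched
below. The bucket boundaries, the bucket_label() name, and the idea of
starting from an approximate integer count are all assumptions; the
proposal does not say how the estimate would be obtained.

```python
def bucket_label(approx_count):
    """Map an approximate count to a coarse label for UI feedback."""
    if approx_count < 0:
        raise ValueError("count cannot be negative")
    if approx_count < 100:
        return str(approx_count)   # small counts are shown exactly
    if approx_count < 1000:
        return "hundreds"
    if approx_count < 1000000:
        return "thousands"
    return "millions"
```

Under this reading, 200000 and 200001 open bugs both come out as
"thousands", while 0 and 1 stay distinct.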

I don't think your rationale justifies avoiding len() entirely.  I also
could have sworn that ResultSet didn't support len() anyhow; if I want
the number of results, I call ResultSet.count().

Aaron


