launchpad-dev team mailing list archive

Re: APIs and len() of collections

To: Robert Collins <robert.collins@xxxxxxxxxxxxx>
From: Martin Pool <mbp@xxxxxxxxxxxxx>
Date: Sat, 24 Jul 2010 12:17:49 +0200
Cc: Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <AANLkTimXJnKYRuGxc9+ogDjUaA4FYm6sms9YR=aNwSd6@mail.gmail.com>
Sender: martinpool@xxxxxxxxx

On 24 July 2010 11:08, Robert Collins <robert.collins@xxxxxxxxxxxxx> wrote:
> I know that there is some work underway at the moment to defer the
> point where we call len() on collections. I'd like feedback on an even
> more ambitious proposal:
>
>  - not calling len() ever
>
> I need input here - where do people use len(), why do they use len(),
> what would the impact of nuking it be? We need this input to build
> better interfaces - ones that scale and perform well.

It may be useful to make a general distinction between "doesn't need
the length", "needs approximate length" and "needs the precise length",
and to push each use as far toward the cheap end as we can.  I think
there are cases at both ends of the spectrum: I don't care about seeing
the number of bugs in a batched list, but it is useful to see the
number of open or high bugs.

At the moment some API clients want the precise length (to plot it,
etc.) and don't get it (because, I think, of lplib bugs that are now
being fixed?).  But, tragically, they do pay the cost of calculating
the full length.  Perhaps we should reconsider what conceptual API we
present.  One other notable batching problem (for which there is a
bug) is that the collection being batched is sometimes hilariously
unstable, or at best somewhat unstable.

ISTM that API clients want to say:
 - just tell me how many high bugs so I can plot it, I don't actually want them
 - stream me all the inprogress bugs so I can do a batch operation on
them; I don't care about knowing the length up front but I do want to
get them all exactly once, ideally without round trips
 - give me just some new bugs so I can ask a human to triage them; I
don't expect the user to actually triage all the new bugs so I don't
want all of them, and I don't really care about the whole length.
(Interesting cases here if we later want "now show me some more".)
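
As a rough sketch, those three requests could be separate operations
rather than len() plus batches.  All names here (BugCollection, count,
stream, some) are made up for illustration; none of this is Launchpad
or launchpadlib API, and the in-memory list just stands in for a query.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Bug = Dict[str, object]
Predicate = Callable[[Bug], bool]


class BugCollection:
    """Hypothetical collection: three operations, no len()."""

    def __init__(self, bugs: Iterable[Bug]):
        self._bugs = list(bugs)

    def count(self, predicate: Predicate) -> int:
        # "Just tell me how many": returns a number, never the bugs.
        return sum(1 for bug in self._bugs if predicate(bug))

    def stream(self, predicate: Predicate) -> Iterator[Bug]:
        # "Stream me all of them": each match is yielded exactly once
        # and no length is computed up front.
        for bug in self._bugs:
            if predicate(bug):
                yield bug

    def some(self, predicate: Predicate, limit: int) -> List[Bug]:
        # "Give me just some": stops after `limit` matches and never
        # looks at the rest of the collection.
        return list(islice(self.stream(predicate), limit))


# Illustrative data only.
bugs = BugCollection([
    {"id": 1, "status": "New", "importance": "High"},
    {"id": 2, "status": "In Progress", "importance": "Low"},
    {"id": 3, "status": "New", "importance": "High"},
    {"id": 4, "status": "In Progress", "importance": "High"},
    {"id": 5, "status": "New", "importance": "Low"},
])

high_count = bugs.count(lambda b: b["importance"] == "High")
in_progress_ids = [b["id"] for b in bugs.stream(lambda b: b["status"] == "In Progress")]
to_triage_ids = [b["id"] for b in bugs.some(lambda b: b["status"] == "New", limit=2)]
```

Only count() ever needs a total, and only for the subset the client
actually asked about.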

I don't think batching is a great fit for any of these.  In the second
case it causes round trips and inconsistency, and if the client really
does want to read 100000 bugs, making them do 1000 round trips is not
really helping Launchpad.
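
To make the inconsistency concrete, here is a minimal sketch of
offset-based batching over a collection that changes between round
trips.  The helpers (fetch_batch, read_all_batched,
make_one_time_removal) are hypothetical and a plain list stands in for
the server-side collection; the real batching code may behave
differently.

```python
def fetch_batch(collection, start, size):
    # Stand-in for one round trip: a slice of the collection's
    # current state.
    return collection[start:start + size]


def read_all_batched(collection, size, mutate_between=None):
    """Read everything in offset-based batches, counting round trips."""
    seen = []
    start = 0
    round_trips = 0
    while True:
        batch = fetch_batch(collection, start, size)
        round_trips += 1
        if not batch:
            break
        seen.extend(batch)
        start += size
        if mutate_between is not None:
            # Simulates other users changing the collection while the
            # client is between requests.
            mutate_between(collection)
    return seen, round_trips


def make_one_time_removal(item):
    """Return a mutation that removes `item` the first time it runs."""
    done = False

    def mutate(collection):
        nonlocal done
        if not done:
            collection.remove(item)
            done = True

    return mutate


# Bug 1 leaves the collection (say, it gets closed) after the first
# batch has been read, so everything behind it shifts down by one.
seen, round_trips = read_all_batched(
    [1, 2, 3, 4, 5, 6], size=2, mutate_between=make_one_time_removal(1))
```

In this run bug 3 slides into an offset the client has already read
past, so it is never returned even though it was in the collection the
whole time.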

If at the moment we repeat queries of "tell me how many bugs" and "give
me the first 20", then we wouldn't want to just push that antipattern
into the API clients.

Adding an approximate length depends on us actually being able to
approximate it cheaply and easily.  Perhaps we should start by pushing
things into the "no length at all" category when we can.

It seems like for many sorted queries pg could actually know the total
length of the set, but does it expose that if we impose a limit?

-- 
Martin

References

APIs and len() of collections
From: Robert Collins, 2010-07-24