
launchpad-dev team mailing list archive

Re: "subscribe to search": implementation questions

 

On Tue, Aug 17, 2010 at 10:34 PM, Abel Deuring
<abel.deuring@xxxxxxxxxxxxx> wrote:
> On 17.08.2010 12:16, Robert Collins wrote:
>> On Tue, Aug 17, 2010 at 10:12 PM, Abel Deuring
>> <abel.deuring@xxxxxxxxxxxxx> wrote:
>>> On 17.08.2010 11:45, Robert Collins wrote:
>>>> So there is some conflation/confusion here I think.
>>>>
>>>> *subscribing* to a ft search - +1
>>>>
>>>> putting a tsearch vector in the *subscription* - I'm lost why that is useful.
>>>
>>> It's not a tsearch vector but a tsquery I want to store :)
>>>
>>> If you have a number of subscriptions to a full text search -- how else
>>> would you remove the not matching searches in something like
>>>
>>>  SELECT whatever FROM bugsubscription
>>>   WHERE bugsubscription.bug=our_current_bug_id
>>>     AND there_is_a_match(
>>>        (SELECT full_text FROM bug WHERE id=our_current_bug_id),
>>>        bugsubscription.fulltext_search_words)
>>>
>>> With a canned tsquery you can use a WHERE expression like
>>>
>>>  bugsubscription.tsquery @@ bug.fti
>>
>> I'd _really_ like to see a performance test of that; if it behaves
>> like some of the ts2 stuff we may be very disappointed.
>
> Admittedly, I don't expect such a query to be very fast. But remember:
> We are not talking about web requests but about a script (or a job) that
> should generate emails not-too-long after a bug has been filed, somebody
> has commented on a bug, after a bug's status changes, etc. I am all for
> using/writing efficient code, and if we get millions of subscriptions,
> performance is indeed an issue -- but if a script runs for 10 or 30 seconds
> for a few hundred or thousand subscriptions, that does not really matter, I
> think. (The same applies, BTW, to filtering at the Python level.)
>
> Abel
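To make "filtering at the Python level" concrete, here is a minimal sketch of matching stored subscription queries against one bug's text. It is a toy model under stated assumptions: `Subscription`, `tokenize`, and `matching_subscribers` are hypothetical names, and each stored query is modelled as a plain set of terms that must all appear. That is far simpler than PostgreSQL's `tsquery`, with no stemming, no OR/NOT operators, and no ranking. The point it illustrates is the cost shape: every bug event is checked against every subscription, so the work grows linearly with the number of subscriptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    # Hypothetical stand-in for a bugsubscription row that carries a saved search.
    subscriber: str
    # Toy "canned query": every term must occur in the bug text (an AND query).
    required_terms: frozenset[str]


def tokenize(text: str) -> set[str]:
    # Lowercase and split on non-alphanumeric characters. Unlike PostgreSQL's
    # to_tsvector(), this does no stemming and drops no stop words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matching_subscribers(bug_text: str, subscriptions: list[Subscription]) -> list[str]:
    # Tokenize the bug once, then test every stored query against it; this is
    # a rough Python-level analogue of "bugsubscription.tsquery @@ bug.fti".
    bug_terms = tokenize(bug_text)
    return [s.subscriber for s in subscriptions if s.required_terms <= bug_terms]


subs = [
    Subscription("alice", frozenset({"crash", "firefox"})),
    Subscription("bob", frozenset({"printer"})),
]
print(matching_subscribers("Firefox crash on startup", subs))  # ['alice']
```

Storing the query pre-parsed (here a `frozenset`, in the thread a canned `tsquery`) avoids re-parsing each subscriber's search words on every bug event. It does not avoid the per-subscription scan.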

Respectfully, I have to disagree.

Slow processing means high resource consumption. If it takes 30
seconds to process the subscription notifications for a single bug,
and more than one bug is filed every 30 seconds, we'll need 2
concurrent tasks doing nothing but that.
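The arithmetic behind the "2 concurrent tasks" figure can be written down directly. This is a sketch under an assumption: `workers_needed` is a hypothetical helper, and it computes only the steady-state average load (Little's law: work in progress = arrival rate × time per item), rounded up to whole workers. Bursts of bug activity would need more headroom than this lower bound.

```python
import math


def workers_needed(seconds_per_bug: float, seconds_between_bugs: float) -> int:
    # Each bug keeps one worker busy for seconds_per_bug, and on average a new
    # bug arrives every seconds_between_bugs, so the average number of bugs in
    # progress is seconds_per_bug / seconds_between_bugs (Little's law).
    # Round up: a fraction of a worker still has to be a whole process.
    # This is a steady-state lower bound; bursts of activity need more.
    return math.ceil(seconds_per_bug / seconds_between_bugs)


print(workers_needed(30, 20))   # 2: one bug every 20 s, 30 s of work each
print(workers_needed(30, 10))   # 3
print(workers_needed(0.5, 20))  # 1
```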

I'm of the opinion that there are extremely few places in our system
where performance does not matter.

It's OK to say 'we will start with something that will be tolerable,
and iterate to faster' - but we have to have done *something* to
convince ourselves that tolerable will be the starting point.

-Rob
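Convincing ourselves that the starting point is tolerable can begin with a measurement as small as the following. This is a hedged sketch, not a proposed test suite: `seconds_per_item` and `is_tolerable` are hypothetical names, and the budget is whatever figure the team agrees on (for example, the interval between bug events divided by the number of workers available). A single-process, wall-clock average over sample data is only a rough proxy. The real check would run the actual query against production-sized data in PostgreSQL.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def seconds_per_item(fn: Callable[[T], object], items: Sequence[T]) -> float:
    # Average wall-clock time of one fn(item) call across all items.
    if not items:
        raise ValueError("need at least one item to time")
    start = time.perf_counter()
    for item in items:
        fn(item)
    return (time.perf_counter() - start) / len(items)


def is_tolerable(fn: Callable[[T], object], items: Sequence[T], budget_seconds: float) -> bool:
    # "Tolerable" here means only that the measured per-item cost fits the budget.
    return seconds_per_item(fn, items) <= budget_seconds
```

Typical use would be `is_tolerable(process_bug, sample_bugs, budget_seconds=5.0)`, where `process_bug` and `sample_bugs` stand for the real notification step and a realistic sample of bug events (both hypothetical here).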
