
launchpad-dev team mailing list archive

Re: "subscribe to search": implementation questions

 

On 17.08.2010 12:40, Robert Collins wrote:
> On Tue, Aug 17, 2010 at 10:34 PM, Abel Deuring
> <abel.deuring@xxxxxxxxxxxxx> wrote:
>> On 17.08.2010 12:16, Robert Collins wrote:
>>> On Tue, Aug 17, 2010 at 10:12 PM, Abel Deuring
>>> <abel.deuring@xxxxxxxxxxxxx> wrote:
>>>> On 17.08.2010 11:45, Robert Collins wrote:
>>>>> So there is some conflation/confusion here, I think.
>>>>>
>>>>> *subscribing* to a ft search - +1
>>>>>
>>>>> putting a tsearch vector in the *subscription* - I'm lost why that is useful.
>>>>
>>>> It's not a tsearch vector but a tsquery I want to store :)
>>>>
>>>> If you have a number of subscriptions to a full text search, how else
>>>> would you remove the non-matching searches in something like
>>>>
>>>>  SELECT whatever FROM bugsubscription
>>>>   WHERE bugsubscription.bug=our_current_bug_id
>>>>     AND there_is_a_match(
>>>>        (SELECT full_text FROM bug WHERE id=our_current_bug_id),
>>>>        bugsubscription.fulltext_search_words)
>>>>
>>>> With a canned tsquery you can use a WHERE expression like
>>>>
>>>>  bugsubscription.tsquery @@ bug.fti
>>>
>>> I'd _really_ like to see a performance test of that; if it behaves
>>> like some of the ts2 stuff we may be very disappointed.
>>
>> Admittedly, I don't expect such a query to be very fast. But remember:
>> we are not talking about web requests but about a script (or a job) that
>> should generate emails not too long after a bug has been filed, after
>> somebody has commented on a bug, after a bug's status changes, etc. I am
>> all for using/writing efficient code, and if we get millions of
>> subscriptions, performance is indeed an issue -- but if a script runs for
>> 10 or 30 seconds over a few hundred or thousand subscriptions, that does
>> not really matter, I think. (The same applies, BTW, to filtering at the
>> Python level.)
>>
>> Abel
> 
> Respectfully, I have to disagree.
> 
> Slow processing means high consumption of resources. If it takes 30
> seconds to process a single bug's subscription notifications, and we
> have more than 1 bug filed every 30 seconds, we'll need 2 concurrent
> tasks doing nothing but that.
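The capacity argument above is a utilization calculation. If each bug event takes T seconds to process and events arrive at rate λ, one sequential worker keeps up only while λ·T ≤ 1. In general at least ⌈λ·T⌉ concurrent workers are needed. Below is a small Python sketch of that arithmetic. The function name `min_workers` and the bugs-per-hour figures are illustrative only, not measured Launchpad numbers. It ignores queueing delay and bursty arrivals, so treat the result as a lower bound.

```python
import math
from fractions import Fraction


def min_workers(seconds_per_bug: int, bugs_per_hour: int) -> int:
    """Lower bound on concurrent workers needed to keep up on average.

    Offered load = arrival rate * service time. Fraction keeps the
    arithmetic exact, so the boundary case (load == 1) is not rounded up
    by floating-point error.
    """
    load = Fraction(seconds_per_bug * bugs_per_hour, 3600)
    return max(1, math.ceil(load))


# 30 s per bug at exactly one bug every 30 s (120/hour): one worker, zero slack.
print(min_workers(30, 120))  # 1
# Slightly more than one bug every 30 s (150/hour): two workers.
print(min_workers(30, 150))  # 2
```

Even the one-worker case leaves no headroom for bursts, which supports the point that per-event processing time matters.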

Agreed, a processing time on the order of dozens of seconds would be an
issue. But I still think it is worth trying to allow full text search
for bug subscriptions. Even a sequential search should not take very
long, because we check just one bug's FTI vector. I suspect that
delivering bug mail, or "spamming" the mail server with bug mail, is
much more likely to cause congestion problems.
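One way to check the claim that a sequential scan against a single bug is cheap is to time the per-event work directly. The harness below is a minimal sketch under heavy simplification. Each subscription is modeled as a plain AND of words, and the bug's FTI vector is modeled as a set of words. The vocabulary is synthetic, and `make_subscriptions`, `count_matches` and `time_scan` are hypothetical helpers. It measures Python set operations, not PostgreSQL, so its numbers say nothing about `@@` on the real `fti` column. It only shows how per-event cost grows with the number of subscriptions. A real test would run the SQL query under `EXPLAIN ANALYZE` against production-sized data.

```python
import random
import time
from typing import FrozenSet, List


def make_subscriptions(n: int, vocabulary: List[str], terms_per_query: int,
                       seed: int = 0) -> List[FrozenSet[str]]:
    """Synthetic subscriptions: each is an AND of a few random words."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(vocabulary, terms_per_query)) for _ in range(n)]


def count_matches(bug_words: FrozenSet[str],
                  subscriptions: List[FrozenSet[str]]) -> int:
    """Sequential scan: one subset test per subscription against one bug."""
    return sum(1 for terms in subscriptions if terms <= bug_words)


def time_scan(n: int, repeats: int = 5) -> float:
    """Best-of-`repeats` wall time, in seconds, to scan n subscriptions once."""
    vocabulary = [f"word{i}" for i in range(5000)]
    bug_words = frozenset(vocabulary[:200])  # stands in for one bug's vector
    subs = make_subscriptions(n, vocabulary, terms_per_query=2)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        count_matches(bug_words, subs)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} subscriptions: {time_scan(n) * 1000:.2f} ms")
```

Best-of-N timing reduces noise from other processes. The linear growth it exposes is the quantity that would need to stay well below the per-bug budget discussed above.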

> 
> I'm of the opinion that there are extremely few places in our system
> where performance does not matter.
> 
> It's OK to say 'we will start with something that will be tolerable,
> and iterate towards faster' - but we have to have done *something* to
> convince ourselves that tolerable will be the starting point.

OK, so let's try to use queries with WHERE expressions containing
"bug.fti @@ subscription.tsquery".
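For readers without a database at hand, here is a minimal Python model of the filter being proposed. The following are assumptions, and all helper names (`to_lexemes`, `parse_query`, `matches`, `matching_subscriptions`) are hypothetical:

- A bug's text is reduced to a set of lowercase words, standing in for `bug.fti`.
- Each subscription keeps a pre-parsed query, standing in for the proposed `tsquery` column.
- Queries support only `&`, `|` and a leading `!`, with `&` binding tighter than `|` as in PostgreSQL's tsquery syntax.

Stemming, stop words, weights, phrase operators and parentheses are ignored. It therefore shows the shape of `tsquery @@ tsvector` filtering, not PostgreSQL's exact semantics.

```python
import re
from typing import Dict, FrozenSet, List, Tuple

# A term is (negated, word); a clause is an AND of terms; a query is an OR of clauses.
Term = Tuple[bool, str]
Query = List[List[Term]]


def to_lexemes(text: str) -> FrozenSet[str]:
    """Crude stand-in for to_tsvector(): lowercase words, no stemming."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def parse_query(expr: str) -> Query:
    """Parse e.g. 'crash & !windows | segfault' into OR-of-AND clauses.

    '&' binds tighter than '|'; parentheses are not supported here.
    """
    query: Query = []
    for clause in expr.split("|"):
        terms: List[Term] = []
        for raw in clause.split("&"):
            raw = raw.strip().lower()
            negated = raw.startswith("!")
            word = raw.lstrip("!").strip()
            if word:
                terms.append((negated, word))
        if terms:
            query.append(terms)
    return query


def matches(query: Query, lexemes: FrozenSet[str]) -> bool:
    """Rough analogue of 'query @@ vector' in this simplified model."""
    return any(
        all((word in lexemes) != negated for negated, word in clause)
        for clause in query
    )


def matching_subscriptions(bug_text: str,
                           subscriptions: Dict[str, Query]) -> List[str]:
    """Names of subscribers whose stored (pre-parsed) query matches the bug.

    The bug text is converted once; every stored query is then checked
    against that single lexeme set.
    """
    lexemes = to_lexemes(bug_text)
    return sorted(name for name, q in subscriptions.items() if matches(q, lexemes))


subs = {
    "alice": parse_query("crash & firefox"),
    "bob": parse_query("printer | scanner"),
    "carol": parse_query("crash & !windows"),
}
print(matching_subscriptions("Firefox crash on startup under Ubuntu", subs))
# ['alice', 'carol']
```

Storing the parsed query is the point of a "canned" tsquery. Each subscriber's search words are parsed once when the subscription is saved, not on every bug event.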

Abel


