launchpad-dev team mailing list archive

Re: RFC: Should we keep the logical operators "&|!-" as full text search operators?

On 17.07.2012 18:01, Curtis Hovey wrote:
> On 07/17/2012 11:38 AM, Abel Deuring wrote:
>> I am working currently on
>> https://bugs.launchpad.net/launchpad/+bug/1020443 (Full text search
>> broken for certain search terms having a "bad combination" of
>> punctuation characters, like "?!.". This is a fallout bug from my previous
>> work on another full text search related bug:
>> https://bugs.launchpad.net/launchpad/+bug/29713 )
>>
>> As explained by stub in a comment, the stored procedure ftq() does no
>> longer
>>
>> I see two options to fix this bug:
>>
>> (A) We can either fix the immediate problem (the fix would be quite
>> simple) and keep the feature "treat the characters '&|!' as logical
>> operators in full text searches".
>>
>> (B) Let ftq() simply remove "&|!" from queries.
> 
> I like option B because you also get to close
> #69628 Need to advertise "OR"/"|" operator for searches

Right, that's a nice side effect ;)
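
To make option (B) concrete, here is a minimal Python sketch of the idea.
It is only an illustration: the real ftq() is a stored procedure in the
database, and the function name strip_operators() is made up for this
example. It replaces each of "&", "|" and "!" with a space, so that words
glued together by an operator stay separate search terms.

```python
# Hypothetical illustration of option (B); not the actual ftq() code.
# Each operator character is mapped to a space rather than deleted, so
# "foo&bar" becomes two terms instead of the single term "foobar".
_OPERATORS_TO_SPACES = str.maketrans({char: " " for char in "&|!"})


def strip_operators(query: str) -> str:
    """Return the query without '&', '|' and '!', whitespace-normalized."""
    without_operators = query.translate(_OPERATORS_TO_SPACES)
    # split()/join() collapses the runs of spaces left by the replacement.
    return " ".join(without_operators.split())


if __name__ == "__main__":
    print(strip_operators("foo & bar | !baz"))
    print(strip_operators("foo&bar"))
```

With this approach a query consisting only of operator characters turns
into an empty string, so the caller would have to decide how to treat an
empty search.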

> 
> It also makes it easier to fix this bug
> #660283 Bug search pages should document valid search expressions

Agreed. Documenting the core features should not be too difficult, but
the full text search still has some quirks^Wfeatures, like those
described in bug 29227.

> 
> PS. Maybe these bugs are fixable now
> #29227 Full text search only understands whitespace as a word seperator

No, this bug has a different cause. Postgres' text parser distinguishes a
number of token types: most words are detected as "asciiword" or "word"
tokens, but the string mentioned in this bug, "/dev/pmu", is detected
as a file/path name and stored as a whole in the FTI. But it would be
useful to review how LP's search and indexing machinery is configured.
This is not the only problem with tokens other than "word"/"asciiword":
bug 1015511 and bug 1015519 show other issues.

> #56244 Can't search for phrases in bug reports

That would boil down to checking the positions of all words of the phrase
in the index. I believe that this is not supported out of the box by
Postgres. But the FTI already stores the position of words.
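
As a rough sketch of what "checking the position of all words" means,
here is a small Python example. It does not use Postgres at all: the
index is a plain dict mapping each word to its positions in the text, and
both function names are invented for this illustration. A phrase matches
when its words occur at consecutive positions.

```python
from typing import Dict, List


def build_position_index(text: str) -> Dict[str, List[int]]:
    """Map each lowercased word to the 1-based positions where it occurs."""
    index: Dict[str, List[int]] = {}
    for position, word in enumerate(text.lower().split(), start=1):
        index.setdefault(word, []).append(position)
    return index


def contains_phrase(index: Dict[str, List[int]], phrase: str) -> bool:
    """Return True if the words of phrase occur at consecutive positions."""
    words = phrase.lower().split()
    if not words:
        return False
    # Try every occurrence of the first word as the start of the phrase.
    for start in index.get(words[0], []):
        if all(
            start + offset in index.get(word, [])
            for offset, word in enumerate(words)
        ):
            return True
    return False


if __name__ == "__main__":
    index = build_position_index("the quick brown fox jumps")
    print(contains_phrase(index, "quick brown"))
    print(contains_phrase(index, "brown quick"))
```

A plain "all words present" search would accept both queries in the
example; only the position check tells them apart.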

> #111956 Cannot search for identifier containing underscores

No, my work on bug 1020443 will not fix this, but I think that a fix for
bug 56244 could be easily extended to fix this one too.

