
Re: [Ayatana] something wrong ??



On 10/04/2011 10:57 AM, Matthew Paul Thomas wrote:
Mikkel Kamstrup Erlandsen wrote on 03/10/11 14:16:
>
On 10/03/2011 03:06 PM, Mikkel Kamstrup Erlandsen wrote:
>>
On 10/03/2011 01:26 PM, Anup Verma wrote:
...
Let us search for MultiGet. When I write "Mult" in the search,
MultiGet appears at the 6th position. As soon as I add 'i', I see
that, surprisingly, MultiGet now appears at the 10th position, with some
rather irrelevant options before it.
...
Looking more into this, I realise what the real problem is. The problem
lies in applications which contain the term "multi" as a single token,
e.g. applications which incorrectly spell "multimedia" as
"multi-media".

None of them do.

Thanks


This latter form becomes indexed as two words "multi" and "media".
This gives an exact match on the word "multi" when searching, and this
ranks higher than a prefix-match on words such as "MultiGet".
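A toy model of the effect described above. It assumes a tokenizer that splits on punctuation and spaces, and hypothetical weights of 2 for an exact term match and 1 for a prefix match; these are not Xapian's real weights, and the two descriptions are made-up examples.

```python
import re

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit,
    # so "multi-media" becomes two terms: "multi" and "media".
    return [t for t in re.split(r"[\W_]+", text.lower()) if t]

def match_score(query, text):
    # Hypothetical weights: 2 = exact term, 1 = prefix of a term, 0 = no match.
    query = query.lower()
    tokens = tokenize(text)
    if query in tokens:
        return 2
    if any(t.startswith(query) for t in tokens):
        return 1
    return 0

hyphenated = "Multi-media player"          # made-up description
camel_case = "MultiGet download manager"   # made-up description

for query in ("mult", "multi"):
    print(query, match_score(query, hyphenated), match_score(query, camel_case))
```

With "mult" both texts are only prefix matches and tie at 1; adding the "i" turns "multi-media" into an exact match (2) while "MultiGet" stays a prefix match (1), which is consistent with MultiGet dropping down the list.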

Academics aside, the fix is still to make sure that any form of match
in the app title scores higher than matches in the description.
...

I expect that would make things even worse, as MultiGet (which at least has "multi-" in its description) would then face stiffer competition from Auto Multiple Choice, Multiple Screens, Multilingual Terminal, Multiplication Puzzle, Multimedia Systems Selector, etc.
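To illustrate the objection: a toy title scorer using the same hypothetical weights (2 for an exact term, 1 for a prefix) and a punctuation/space tokenizer, not Xapian's actual weighting. For the query "multi", every title in the list above scores the same as MultiGet.

```python
import re

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return [t for t in re.split(r"[\W_]+", text.lower()) if t]

def match_score(query, text):
    # Hypothetical weights: 2 = exact term, 1 = prefix of a term, 0 = no match.
    query = query.lower()
    tokens = tokenize(text)
    if query in tokens:
        return 2
    if any(t.startswith(query) for t in tokens):
        return 1
    return 0

titles = [
    "MultiGet",
    "Auto Multiple Choice",
    "Multiple Screens",
    "Multilingual Terminal",
    "Multiplication Puzzle",
    "Multimedia Systems Selector",
]

title_scores = {title: match_score("multi", title) for title in titles}
print(title_scores)
```

Every title is a prefix match scoring 1, so boosting title matches does not lift MultiGet above the others; some other signal still has to break the tie.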

But if you'd like to try, feel free to make your own branch, and compare its results with 5.0 on <https://wiki.ubuntu.com/SoftwareCenter/SearchTesting>.



As I suggested elsewhere in this thread, we may have more luck if we simply tokenize CamelCase words. This is unfortunately not very easy to do in Xapian, and hacking around it is likely to have side effects such as breaking CJK search.

For anyone interested in doing this, I'd suggest patching Xapian::TermGenerator to add a flag that turns on CamelCase tokenization. That would provide an upstreamable solution that is easy and clean to test in Ubuntu applications.
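A pure-Python sketch of the splitting rule such a flag might apply. The function name camel_terms is hypothetical and this is not an existing Xapian option; it only models which terms a word would produce, not how TermGenerator would handle positions or stemming. Splitting on zero-width matches needs Python 3.7 or later.

```python
import re

# Zero-width split points: between a lowercase letter/digit and an
# uppercase letter ("Multi|Get"), and inside an acronym run just before
# the capital that starts a lowercase word ("XML|Parser").
# Only ASCII case changes count, so caseless scripts such as CJK are
# never split by this rule.
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")

def camel_terms(word):
    """Return the whole word plus its CamelCase parts, all lowercased."""
    parts = [p for p in CAMEL_BOUNDARY.split(word) if p]
    terms = [word.lower()]
    if len(parts) > 1:
        # Keep the full word too, so a search for "multiget" still matches exactly.
        terms.extend(p.lower() for p in parts)
    return terms

for word in ("MultiGet", "XMLParser", "multimedia"):
    print(word, camel_terms(word))
```

Here "MultiGet" yields "multiget", "multi" and "get", so a search for "multi" would become an exact term match for MultiGet as well, while "multimedia" is left as a single term.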

Cheers,
Mikkel