On 10/04/2011 10:57 AM, Matthew Paul Thomas wrote:
> Mikkel Kamstrup Erlandsen wrote on 03/10/11 14:16:
>> On 10/03/2011 03:06 PM, Mikkel Kamstrup Erlandsen wrote:
>>> On 10/03/2011 01:26 PM, Anup Verma wrote:
>>>> ...
>>>> Let us search for MultiGet. When I write "Mult" in the search MultiGet appears at the 6th position. As soon as I add 'i', I see that surprisingly, MultiGet now appears at 10th position with some rather irrelevant options before it.
>>>> ...
>>
>> Looking more into this I realise what the real problem is. The problem lies in applications which contain the term "multi" as a single token. Fx. applications which incorrectly spell "multimedia" as "multi-media".
>
> None of them do.

Thanks

>> This latter form becomes indexed as two words "multi" and "media". This gives an exact match on the word "multi" when searching and this ranks higher than a prefix-match on words such as "MultiGet". Academics aside - the fix is still to make sure that any form of match in the app title scores higher than matches in the description.
>> ...
>
> I expect that would make things even worse, as MultiGet (which at least has "multi-" in its description) would then face stiffer competition from Auto Multiple Choice, Multiple Screens, Multilingual Terminal, Multiplication Puzzle, Multimedia Systems Selector, etc.
>
> But if you'd like to try, feel free to make your own branch, and compare its results with 5.0 on <https://wiki.ubuntu.com/SoftwareCenter/SearchTesting>.

As I suggested elsewhere in this thread, we may have more luck if we simply tokenize CamelCase words. Unfortunately this is not very easy to do in Xapian, and hacking around it is likely to have side effects such as breaking CJK search.
For someone interested in doing this I'd suggest patching Xapian::TermGenerator to add a flag that turns on CamelCase tokenization. That will provide an upstreamable solution that will be easy and clean to test in Ubuntu applications.
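
To illustrate the idea (and only the idea), here is a minimal sketch in plain Python. It is not Xapian code and does not use Xapian::TermGenerator; the names plain_tokens and camel_tokens are made up for this example, and the word pattern is ASCII-only, so it says nothing about how CJK text should be handled. It shows why "multi-media" yields an exact "multi" term while "MultiGet" does not, and how CamelCase splitting would change that.

```python
import re

# Hypothetical helpers for illustration only; none of this is Xapian's API.

# A "word" here is a run of ASCII letters and digits (assumption: ASCII only).
_WORD = re.compile(r"[A-Za-z0-9]+")

# CamelCase parts: an acronym followed by a capitalised word ("HTTP" in
# "HTTPServer"), a capitalised or lowercase word, an all-caps run, or digits.
_CAMEL_PART = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def plain_tokens(text):
    """Split on non-alphanumerics and lowercase each word."""
    return [w.lower() for w in _WORD.findall(text)]


def camel_tokens(text):
    """Like plain_tokens, but also emit the CamelCase parts of each word."""
    tokens = []
    for word in _WORD.findall(text):
        tokens.append(word.lower())          # keep the whole word as a term
        parts = _CAMEL_PART.findall(word)
        if len(parts) > 1:                   # only add parts if it really split
            tokens.extend(p.lower() for p in parts)
    return tokens


print(plain_tokens("multi-media"))  # "multi" is a term of its own
print(plain_tokens("MultiGet"))     # only the whole word is a term
print(camel_tokens("MultiGet"))     # whole word plus its CamelCase parts
```

With the extra terms, a query for "multi" would hit "MultiGet" as an exact term match instead of only as a prefix match. Whether that actually improves the ranking would still have to be checked against the search test cases.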
Cheers, Mikkel