unity-design team mailing list archive
Message #06693
Search algorithm in Ubuntu Software Center
To: ayatana@xxxxxxxxxxxxxxxxxxx
From: Matthew Paul Thomas <mpt@xxxxxxxxxxxxx>
Date: Mon, 03 Oct 2011 13:29:58 +0100
In-reply-to: <CAL4drP1cmskL7iBzeF6U0S-g75e9RCnadYqU0YuhQYpOdY=JgQ@mail.gmail.com>
Organization: Canonical Ltd
User-agent: Mozilla/5.0 (X11; Linux i686; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1

Anup Verma wrote on 03/10/11 12:26:
> Friends, don't you find any problem with the algorithm used to search
> in the software center. Giving you an example:
> Let us search for MultiGet. When I write "Mult" in the search MultiGet
> appears at the 6th position. As soon as I add 'i', I see that
> surprisingly, MultiGet now appears at 10th position with some rather
> irrelevant options before it.
> Not understandable...
...
These are the items that appear for "multi" before MultiGet:
* Qtractor, "a multi-track sequencer"
* qutIM, "a multi-protocol IM client"
* Teeworlds, "an online multi-player platform 2D shooter"
* BasKet, "a multi-purpose note-taking application for KDE"
* ROXTerm, "Multi-tabbed GTK/VTE terminal emulator"
* Babiloo, "dictionary viewer with multi-languages support"
* Jokosher, "Simply and easily create multi-track audio"
* Empathy, "GNOME multi-protocol chat and call client"
* ACM, "a multi-player aerial combat simulation".
Though USC does take into account the possibility that the last word you
typed is only part of a word, it gives more weight to results where that
last word is a complete word. And it can't tell the difference between a
complete word and a hyphenated prefix like "multi-".
All of those higher results use "multi-" in their summary or synopsis.
MultiGet uses "multi-" only in its description.
So that's why more items appear above MultiGet for "multi" than for
"mult": because as soon as you type the "i", the packages that use
"multi-anything" start being more important than those which use
"multianything". And the packages that use "multi-anything" in their
title or summary are treated as more important than those that use
"multi" in their name *or* "multi-" in their description.
If there is a bug here, it is that BasKet, ROXTerm, and Empathy use
"multi-" words only in their package synopsis, which USC doesn't show at
all -- so it's not obvious why they're being weighted as they are. I've
reported that now. <http://launchpad.net/bugs/865294>
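
The ranking behaviour explained above can be illustrated with a small
sketch. This is not USC's actual code: the field weights, the
complete-word and partial-word scores, and MultiGet's summary and
description text are all invented for illustration. The only things
taken from this thread are the Qtractor summary and the two rules that
a complete-word match outweighs a partial one and that "multi-" is
indistinguishable from the complete word "multi".

```python
import re

# Illustrative weights only; the real values used by USC are not stated here.
FIELD_WEIGHT = {"name": 3, "summary": 2, "description": 1}
COMPLETE_WORD = 4  # the last search term equals a whole token
PARTIAL_WORD = 1   # the last search term is only the start of a token


def tokens(text):
    # Split on anything that is not a letter or digit, so "multi-track"
    # becomes ["multi", "track"]: the hyphenated prefix looks like a word.
    return re.findall(r"[a-z0-9]+", text.lower())


def score(package, term):
    # Sum, over the fields, of field weight times the best match in that field.
    term = term.lower()
    total = 0
    for field, weight in FIELD_WEIGHT.items():
        best = 0
        for tok in tokens(package.get(field, "")):
            if tok == term:
                best = max(best, COMPLETE_WORD)
            elif tok.startswith(term):
                best = max(best, PARTIAL_WORD)
        total += weight * best
    return total


def rank(packages, term):
    # Highest score first; returns package names only.
    ordered = sorted(packages, key=lambda p: score(p, term), reverse=True)
    return [p["name"] for p in ordered]


PACKAGES = [
    # Summary quoted from the result list in this thread.
    {"name": "Qtractor", "summary": "a multi-track sequencer", "description": ""},
    # Summary and description are hypothetical placeholder text.
    {"name": "MultiGet", "summary": "graphical download manager",
     "description": "multi-thread downloads"},
]

print(rank(PACKAGES, "mult"))
print(rank(PACKAGES, "multi"))
```

With these made-up numbers, "mult" is a partial match everywhere, so
MultiGet's name match puts it first. Once the "i" is typed, "multi" is a
complete-word match in Qtractor's summary but still only a partial match
in the name "MultiGet", so Qtractor moves ahead.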
Thanks
--
mpt