mahara-contributors team mailing list archive
Message #12963
[Bug 1072972] Re: Internal search ignores 'KATAKANA-HIRAGANA PROLONGED SOUND MARK'
I tested two variants: the latest version of PREG_CLASS_SEARCH_EXCLUDE from
Drupal 6, and the value of PREG_CLASS_UNICODE_WORD_BOUNDARY in Drupal 7.
Using the latest from Drupal 6 resolves the bug. With the new constant
from Drupal 7 the bug still occurs because, as Robert noted, it
doesn't include the Prolonged Sound Mark.
So, I've pushed a patch to gerrit with the latest Drupal 6 version:
https://reviews.mahara.org/2394
Here's how to test it:
1. Go to Administration/Configure Site
2. Under "Search Settings", make sure you've got the "internal" search plugin activated
3. Under "General Settings", tick the "Enable Profile Search" feature
4. Create a journal entry whose text contains the string サーバー, which means "server".
5. Create another journal entry whose text contains the string サバ, which means "mackerel".
6. Navigate to Portfolio->Pages.
7. You should have a sideblock called "Search my portfolio". Search for サバ
Expected Result: You will only find the journal entry containing サバ
Erroneous Result: You will find both journal entries, the one with サーバー and the one with サバ
--
https://bugs.launchpad.net/bugs/1072972
Title:
Internal search ignores 'KATAKANA-HIRAGANA PROLONGED SOUND MARK'
Status in Mahara ePortfolio:
In Progress
Status in Mahara 1.7 series:
New
Bug description:
Mahara's (1.5.6) internal search cannot handle the Japanese character
'KATAKANA-HIRAGANA PROLONGED SOUND MARK'. This character, 'ー', is
frequently used, for example in 'データ (data)', 'サーバー (server)' and
'ポートフォリオ (portfolio)'.
The cause of the problem is line 1102 in search/internal/lib.php.
1102: $text = preg_replace('/['. PREG_CLASS_SEARCH_EXCLUDE . ']+/u', ' ', $text);
On this line, Mahara replaces the special characters specified by
PREG_CLASS_SEARCH_EXCLUDE with ' '. 'KATAKANA-HIRAGANA PROLONGED
SOUND MARK' is included in PREG_CLASS_SEARCH_EXCLUDE, so it is replaced too.
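To make the effect concrete, here is a minimal sketch of that replacement applied to the three example words. It is only an illustration: $exclude_fragment holds just the line-1221 fragment quoted in this report rather than the full constant, and strip_excluded() is a hypothetical helper, not a Mahara function. The file is assumed to be saved as UTF-8.

```php
<?php
// Stand-in for PREG_CLASS_SEARCH_EXCLUDE: only the fragment from line 1221
// quoted in this report, NOT the full constant (assumption for illustration).
$exclude_fragment = '\x{3099}-\x{309e}\x{30a0}\x{30fb}-\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}';

// Hypothetical helper that mirrors the preg_replace() call on line 1102:
// every run of characters in the class becomes a single space.
function strip_excluded(string $class, string $text): string
{
    return preg_replace('/[' . $class . ']+/u', ' ', $text);
}

$words = ['データ', 'サーバー', 'ポートフォリオ'];
$stripped = [];
foreach ($words as $word) {
    $stripped[$word] = strip_excluded($exclude_fragment, $word);
}

// Print each word next to its stripped form; brackets make spaces visible.
foreach ($stripped as $word => $result) {
    echo $word, ' => [', $result, "]\n";
}
```

Each 'ー' (U+30FC) falls inside the \x{30fb}-\x{30fe} range, so it is turned into a space and the word is split at that point.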
The solution to this problem is very simple: remove 'KATAKANA-HIRAGANA
PROLONGED SOUND MARK' (code 0x30fc) from PREG_CLASS_SEARCH_EXCLUDE.
The definition is on lines 1198-1225.
1221:
'\x{3099}-\x{309e}\x{30a0}\x{30fb}-\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}'.
should be replaced with
1221:
'\x{3099}-\x{309e}\x{30a0}\x{30fb}\x{30fd}\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}'.
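As a quick sanity check, the following sketch compares the two versions of line 1221 code point by code point over the range that was split up. The helper is_excluded() and the variable names are hypothetical, and only the quoted fragment is tested, not the whole constant. It assumes PHP 7.0 or later for the "\u{...}" string escapes.

```php
<?php
// The two versions of the line-1221 fragment, copied from this report.
$before = '\x{3099}-\x{309e}\x{30a0}\x{30fb}-\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}';
$after  = '\x{3099}-\x{309e}\x{30a0}\x{30fb}\x{30fd}\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}';

// Hypothetical helper: true if $char matches the given character class.
function is_excluded(string $class, string $char): bool
{
    return preg_match('/[' . $class . ']/u', $char) === 1;
}

// The four code points covered by the original \x{30fb}-\x{30fe} range.
$candidates = [
    0x30fb => "\u{30fb}",
    0x30fc => "\u{30fc}", // KATAKANA-HIRAGANA PROLONGED SOUND MARK
    0x30fd => "\u{30fd}",
    0x30fe => "\u{30fe}",
];

// Collect every code point whose treatment differs between the two versions.
$changed = [];
foreach ($candidates as $codepoint => $char) {
    if (is_excluded($before, $char) !== is_excluded($after, $char)) {
        $changed[] = sprintf('U+%04X', $codepoint);
    }
}

echo implode(', ', $changed), "\n";
```

Within that range, only U+30FC changes status: the other three code points are still excluded after the edit.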
P.S. The definition of PREG_CLASS_SEARCH_EXCLUDE originally comes from
Drupal, where this fix has already been applied.
http://api.drupal.org/api/drupal/modules!search!search.module/constant/PREG_CLASS_SEARCH_EXCLUDE/6