
mahara-contributors team mailing list archive

[Bug 1072972] Re: Internal search ignores 'KATAKANA-HIRAGANA PROLONGED SOUND MARK'

 

I tried it with the latest version of PREG_CLASS_SEARCH_EXCLUDE from
Drupal 6 and with the value of PREG_CLASS_UNICODE_WORD_BOUNDARY from
Drupal 7. The latest Drupal 6 constant resolves the bug. With the new
Drupal 7 constant the bug still occurs because, as Robert noted, it
doesn't include the Prolonged Sound Mark.

So, I've pushed a patch to gerrit with the latest Drupal 6 version:
https://reviews.mahara.org/2394

Here's how to test it:

1. Go to Administration/Configure Site
2. Under "Search Settings", make sure you've got the "internal" search plugin activated
3. Under "General Settings", tick the "Enable Profile Search" feature
4. Create a journal entry whose text contains the string サーバー which means "server".
5. Create another journal entry, whose text contains the string サバ which means "mackerel".
6. Navigate to Portfolio->Pages.
7. You should have a sideblock called "Search my portfolio". Search for サバ

Expected Result: You will only find the journal entry containing サバ
Erroneous Result: You will find both journal entries, the one with サーバー and the one with サバ

-- 
You received this bug notification because you are a member of Mahara
Contributors, which is subscribed to Mahara.
Matching subscriptions: Subscription for all Mahara Contrib members
https://bugs.launchpad.net/bugs/1072972

Title:
  Internal search ignores 'KATAKANA-HIRAGANA PROLONGED SOUND MARK'

Status in Mahara ePortfolio:
  In Progress
Status in Mahara 1.7 series:
  New

Bug description:
  Mahara's (1.5.6) internal search cannot handle the Japanese character
  'KATAKANA-HIRAGANA PROLONGED SOUND MARK'.  This character, 'ー', is
  used frequently, for example in 'データ (data)', 'サーバー (server)' and
  'ポートフォリオ (portfolio)'.

  The cause of the problem is line 1102 in search/internal/lib.php.

  1102:        $text = preg_replace('/['. PREG_CLASS_SEARCH_EXCLUDE . ']+/u', ' ', $text);

  This line replaces every run of characters listed in
  PREG_CLASS_SEARCH_EXCLUDE with a single space (' '), and
  'KATAKANA-HIRAGANA PROLONGED SOUND MARK' is one of the characters in
  PREG_CLASS_SEARCH_EXCLUDE.
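  The effect of that replacement can be reproduced in isolation. The
  following is a minimal sketch, not Mahara's code: it assumes that the
  fragment quoted from line 1221 is enough to stand in for the full
  PREG_CLASS_SEARCH_EXCLUDE constant, and the names EXCLUDE_FRAGMENT_OLD
  and simplify_fragment() are hypothetical.

```php
<?php
// Assumption: only the line-1221 fragment is used here, as a stand-in for
// the full PREG_CLASS_SEARCH_EXCLUDE constant. The range \x{30fb}-\x{30fe}
// covers U+30FC, the prolonged sound mark.
const EXCLUDE_FRAGMENT_OLD =
    '\x{3099}-\x{309e}\x{30a0}\x{30fb}-\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}';

// Hypothetical helper that mirrors the preg_replace() call on line 1102:
// every run of excluded characters becomes a single space.
function simplify_fragment(string $text, string $class): string
{
    return preg_replace('/[' . $class . ']+/u', ' ', $text);
}

$result = simplify_fragment('サーバー', EXCLUDE_FRAGMENT_OLD);

// Both prolonged sound marks are replaced, leaving "サ", " ", "バ", " ".
var_dump($result);
```

  With the mark replaced by spaces, the indexed text no longer contains
  the word サーバー as written.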

  The solution to this problem is very simple: remove
  'KATAKANA-HIRAGANA PROLONGED SOUND MARK' (U+30FC) from
  PREG_CLASS_SEARCH_EXCLUDE.  The definition is on lines 1198-1225.

  1221:
  '\x{3099}-\x{309e}\x{30a0}\x{30fb}-\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}'.

  should be replaced with

  1221:
  '\x{3099}-\x{309e}\x{30a0}\x{30fb}\x{30fd}\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}'.
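  To check that the one-line change above drops only U+30FC, the two
  fragments can be compared character by character. This is a sketch
  under the assumption that testing the line-1221 fragment alone is
  representative; LINE_1221_OLD, LINE_1221_NEW and excluded_labels() are
  hypothetical names introduced for illustration.

```php
<?php
// The two versions of the line-1221 fragment quoted in the bug description.
const LINE_1221_OLD =
    '\x{3099}-\x{309e}\x{30a0}\x{30fb}-\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}';
const LINE_1221_NEW =
    '\x{3099}-\x{309e}\x{30a0}\x{30fb}\x{30fd}\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}';

// The four code points covered by the old range \x{30fb}-\x{30fe}.
$candidates = [
    'U+30FB' => "\u{30FB}",
    'U+30FC' => "\u{30FC}", // KATAKANA-HIRAGANA PROLONGED SOUND MARK
    'U+30FD' => "\u{30FD}",
    'U+30FE' => "\u{30FE}",
];

// Hypothetical helper: returns the labels of the candidates that the
// given character class matches.
function excluded_labels(string $class, array $candidates): array
{
    $hits = [];
    foreach ($candidates as $label => $char) {
        if (preg_match('/[' . $class . ']/u', $char) === 1) {
            $hits[] = $label;
        }
    }
    return $hits;
}

$old = excluded_labels(LINE_1221_OLD, $candidates);
$new = excluded_labels(LINE_1221_NEW, $candidates);

// Code points excluded by the old fragment but kept by the new one.
$onlyInOld = array_values(array_diff($old, $new));
print_r($onlyInOld);

// With the new fragment, the word for "server" passes through unchanged.
$kept = preg_replace('/[' . LINE_1221_NEW . ']+/u', ' ', 'サーバー');
var_dump($kept === 'サーバー');
```

  The other three code points in the old range stay excluded, so the
  change is limited to the prolonged sound mark.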


  P.S. The definition of PREG_CLASS_SEARCH_EXCLUDE originally comes from
  Drupal, where this fix has already been applied.

  http://api.drupal.org/api/drupal/modules!search!search.module/constant/PREG_CLASS_SEARCH_EXCLUDE/6

To manage notifications about this bug go to:
https://bugs.launchpad.net/mahara/+bug/1072972/+subscriptions

