mahara-contributors team mailing list archive

[Bug 1072972] Re: Internal search ignores 'KATAKANA-HIRAGANA PROLONGED SOUND MARK'

 

Hi Takahiro,

Sorry it's been so long since there's been an update on this.

I was unable to replicate the problem. I created a user with the first
name "ポートフォリオ", searched for "ポートフォリオ" in the user search, and found
them. Likewise, I created a page called "ポートフォリオ", searched for it on
the "Groups -> Shared pages" page, and was able to find it there too.

Searching for "ー" by itself didn't work for the user search, but that
seems to be because it's set for exact matching only. Searching for "ー"
in the page search, however, did successfully pull up my page titled
"ポートフォリオ".

Which part of the site is the search currently not working on?

Cheers,
Aaron

-- 
You received this bug notification because you are a member of Mahara
Contributors, which is subscribed to Mahara.
https://bugs.launchpad.net/bugs/1072972

Title:
  Internal search ignores 'KATAKANA-HIRAGANA PROLONGED SOUND MARK'

Status in Mahara ePortfolio:
  In Progress

Bug description:
  Mahara's (1.5.6) internal search cannot handle the Japanese character
  'KATAKANA-HIRAGANA PROLONGED SOUND MARK'.  This character, 'ー', is
  frequently used, for example in 'データ (data)', 'サーバー (server)' and
  'ポートフォリオ (portfolio)'.

  The cause of the problem is line 1102 in search/internal/lib.php.

  1102:        $text = preg_replace('/['. PREG_CLASS_SEARCH_EXCLUDE . ']+/u', ' ', $text);

  In this line, Mahara replaces the special characters specified by
  PREG_CLASS_SEARCH_EXCLUDE with ' '.  'KATAKANA-HIRAGANA PROLONGED
  SOUND MARK' is included in PREG_CLASS_SEARCH_EXCLUDE, so every 'ー' in
  the text becomes a space.
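
  The effect described above can be reproduced in isolation with a
  minimal sketch.  It assumes only the fragment of the character class
  quoted from line 1221, not the full PREG_CLASS_SEARCH_EXCLUDE
  constant, and simplify_text() is a hypothetical helper that mirrors
  the preg_replace() call on line 1102 rather than Mahara's actual
  function.

```php
<?php
// Reduced stand-in for PREG_CLASS_SEARCH_EXCLUDE: only the fragment quoted
// from line 1221 (an assumption; the real constant covers many more ranges).
$excludeFragment = '\x{3099}-\x{309e}\x{30a0}\x{30fb}-\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}';

// Hypothetical helper mirroring the preg_replace() call on line 1102:
// every run of excluded characters is replaced by a single space.
function simplify_text($text, $excludeClass) {
    return preg_replace('/[' . $excludeClass . ']+/u', ' ', $text);
}

$simplified = simplify_text('ポートフォリオ', $excludeFragment);
echo $simplified, "\n"; // prints "ポ トフォリオ"
```

  Because U+30FC falls inside the \x{30fb}-\x{30fe} range, the word is
  broken into two pieces at 'ー'.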

  The solution to this problem is very simple: just remove
  'KATAKANA-HIRAGANA PROLONGED SOUND MARK' (code point 0x30fc) from
  PREG_CLASS_SEARCH_EXCLUDE.  The definition is on lines 1198-1225.

  1221:
  '\x{3099}-\x{309e}\x{30a0}\x{30fb}-\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}'.

  should be replaced with

  1221:
  '\x{3099}-\x{309e}\x{30a0}\x{30fb}\x{30fd}\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}'.
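
  As a quick sanity check on the replacement, the sketch below
  compares the two quoted fragments over the four code points covered
  by the original \x{30fb}-\x{30fe} range.  is_excluded() is a
  hypothetical helper, only the quoted fragments are used rather than
  the full constant, and the "\u{...}" string escapes assume PHP 7 or
  later.

```php
<?php
// Fragments of line 1221 as quoted above, before and after the proposed change.
$before = '\x{3099}-\x{309e}\x{30a0}\x{30fb}-\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}';
$after  = '\x{3099}-\x{309e}\x{30a0}\x{30fb}\x{30fd}\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}';

// The four code points covered by the original \x{30fb}-\x{30fe} range.
$candidates = array(
    'U+30FB' => "\u{30fb}",
    'U+30FC' => "\u{30fc}",
    'U+30FD' => "\u{30fd}",
    'U+30FE' => "\u{30fe}",
);

// Hypothetical helper: true if the character class matches this one character.
function is_excluded($char, $class) {
    return preg_match('/^[' . $class . ']$/u', $char) === 1;
}

// Collect the code points whose treatment differs between the two fragments.
$changed = array();
foreach ($candidates as $label => $char) {
    if (is_excluded($char, $before) !== is_excluded($char, $after)) {
        $changed[] = $label;
    }
}
echo implode(', ', $changed), "\n"; // prints "U+30FC"
```

  Only U+30FC changes; the neighbouring marks U+30FB, U+30FD and
  U+30FE are still excluded after the replacement.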


  P.S. The definition of PREG_CLASS_SEARCH_EXCLUDE originally comes from
  Drupal, where this fix has already been applied.

  http://api.drupal.org/api/drupal/modules!search!search.module/constant/PREG_CLASS_SEARCH_EXCLUDE/6

To manage notifications about this bug go to:
https://bugs.launchpad.net/mahara/+bug/1072972/+subscriptions

