enterprise-support team mailing list archive
Message #06152
[Bug 1679135] [NEW] Ignore INNODB_FT_DEFAULT_STOPWORD for ngram indexes
Public bug reported:
Originally reported at https://bugs.mysql.com/bug.php?id=84420
[5 Jan 11:19] Miguel Angel Nieto
Description:
Ngram indexes also check the stopword list to see whether any indexed element *contains* one of the words on that list. That is reasonable as normal behaviour, but I don't think the default table is suitable for use with ngram.
For example, any item that contains 'a' or 'i' will be ignored. So if
you have the word "east", you cannot search for "ea" because it has
been ignored.
Ngram should have a different default list of stopwords, or an empty
list.
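To illustrate the behaviour described above, here is a minimal C++ sketch. It is not the InnoDB code. It assumes ASCII input, an ngram size of 2 (the default ngram_token_size), and only a small subset of the default stopword list. The function names ngram_tokens, contains_stopword and indexed_ngrams are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Split a word into overlapping n-grams.
// Assumption: byte-based, so only correct for ASCII input.
std::vector<std::string> ngram_tokens(const std::string& word, std::size_t n) {
    assert(n > 0);
    std::vector<std::string> out;
    for (std::size_t i = 0; i + n <= word.size(); ++i) {
        out.push_back(word.substr(i, n));
    }
    return out;
}

// Return true if any substring of the token (length 1 up to the full
// token) is in the stopword set. This models the "contains a stopword"
// rule from the report, not the exact server logic.
bool contains_stopword(const std::string& token,
                       const std::set<std::string>& stopwords) {
    for (std::size_t len = 1; len <= token.size(); ++len) {
        for (std::size_t i = 0; i + len <= token.size(); ++i) {
            if (stopwords.count(token.substr(i, len)) > 0) {
                return true;
            }
        }
    }
    return false;
}

// Keep only the n-grams that would survive the stopword check.
std::vector<std::string> indexed_ngrams(const std::string& word,
                                        std::size_t n,
                                        const std::set<std::string>& stopwords) {
    std::vector<std::string> kept;
    for (const std::string& tok : ngram_tokens(word, n)) {
        if (!contains_stopword(tok, stopwords)) {
            kept.push_back(tok);
        }
    }
    return kept;
}
```

With the stopword subset {"a", "i", "as", "is", "it"}, indexed_ngrams("east", 2, stopwords) keeps only "st", so a search for "ea" has nothing to match. With an empty stopword set, all three bigrams ("ea", "as", "st") are kept.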
How to repeat:
mysql> CREATE TABLE `articles` (
         `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
         `body` text,
         PRIMARY KEY (`id`),
         FULLTEXT KEY `ftx` (`body`) /*!50100 WITH PARSER `ngram` */
       ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
mysql> insert into articles (body) values ('east');
mysql> insert into articles (body) values ('east area');
mysql> insert into articles (body) values ('east job');
mysql> insert into articles (body) values ('eastnation');
mysql> insert into articles (body) values ('eastway, try try');
mysql> SELECT * FROM articles WHERE MATCH(body) AGAINST('ea' IN BOOLEAN MODE);
Empty set (0.00 sec)
====
There is a workaround for this bug: create a custom
INNODB_FT_DEFAULT_STOPWORD table for ngram indexes. The issue with this
workaround is that such a table is also used by other fulltext indexes,
such as mecab.
Suggested fix: either have a special INNODB_FT_DEFAULT_STOPWORD table
for ngram indexes or ignore it altogether.
There is also code in fts_check_token:
bool
fts_check_token(
    const fts_string_t* token,
    const ib_rbt_t* stopwords,
    bool is_ngram,
    const CHARSET_INFO* cs)
{
    ut_ad(cs != NULL || stopwords == NULL);

    if (!is_ngram) {
        ib_rbt_bound_t parent;

        if (token->f_n_char < fts_min_token_size
            || token->f_n_char > fts_max_token_size
            || (stopwords != NULL
                && rbt_search(stopwords, &parent, token) == 0)) {
            return(false);
        } else {
            return(true);
        }
    }

    /* Check token for ngram. */
    DBUG_EXECUTE_IF(
        "fts_instrument_ignore_ngram_check",
        return(true);
    );
So the only job is to replace DBUG_EXECUTE_IF with some new option.
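As a rough sketch of what such an option might look like, the self-contained C++ below models the check with standard library types in place of the InnoDB ones. Everything here is an assumption for illustration: the FtsOptions struct, the ngram_ignore_stopwords flag and the check_token_sketch function are hypothetical names, and the ngram containment loop is a simplified model, not the server's code. The size limits use the documented defaults of innodb_ft_min_token_size (3) and innodb_ft_max_token_size (84).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical settings; ngram_ignore_stopwords stands in for the
// "new option" that would replace the debug-only injection point.
struct FtsOptions {
    std::size_t min_token_size = 3;
    std::size_t max_token_size = 84;
    bool ngram_ignore_stopwords = false;  // hypothetical option name
};

// Simplified model of the token check. Assumption: ASCII tokens, and
// std::set replaces the red-black tree of stopwords.
bool check_token_sketch(const std::string& token,
                        const std::set<std::string>* stopwords,
                        bool is_ngram,
                        const FtsOptions& opts) {
    assert(!token.empty());

    if (!is_ngram) {
        // Same shape as the non-ngram branch in the excerpt:
        // reject by size, or on an exact stopword match.
        if (token.size() < opts.min_token_size
            || token.size() > opts.max_token_size
            || (stopwords != nullptr && stopwords->count(token) > 0)) {
            return false;
        }
        return true;
    }

    // Ngram branch: a runtime option takes the place of DBUG_EXECUTE_IF.
    if (opts.ngram_ignore_stopwords || stopwords == nullptr) {
        return true;
    }

    // Otherwise reject the token if any substring of it is a stopword.
    for (std::size_t len = 1; len <= token.size(); ++len) {
        for (std::size_t i = 0; i + len <= token.size(); ++i) {
            if (stopwords->count(token.substr(i, len)) > 0) {
                return false;
            }
        }
    }
    return true;
}
```

With the flag left at false, the ngram token "ea" is rejected when the stopword set contains "a". With the flag set to true, the same token is accepted, and the non-ngram branch is unaffected.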
** Affects: mysql-server
Importance: Unknown
Status: Unknown
** Affects: percona-server
Importance: Undecided
Status: Confirmed
** Affects: percona-server/5.5
Importance: Undecided
Status: Invalid
** Affects: percona-server/5.6
Importance: Undecided
Status: Invalid
** Affects: percona-server/5.7
Importance: Undecided
Status: Confirmed
** Tags: i180635
** Also affects: percona-server/5.5
Importance: Undecided
Status: New
** Also affects: percona-server/5.7
Importance: Undecided
Status: Confirmed
** Also affects: percona-server/5.6
Importance: Undecided
Status: New
** Changed in: percona-server/5.6
Status: New => Invalid
** Changed in: percona-server/5.5
Status: New => Invalid
** Bug watch added: MySQL Bug System #84420
http://bugs.mysql.com/bug.php?id=84420
** Also affects: mysql-server via
http://bugs.mysql.com/bug.php?id=84420
Importance: Unknown
Status: Unknown
--
You received this bug notification because you are a member of Ubuntu
Server/Client Support Team, which is subscribed to MySQL.
Matching subscriptions: Ubuntu Server/Client Support Team
https://bugs.launchpad.net/bugs/1679135
Title:
Ignore INNODB_FT_DEFAULT_STOPWORD for ngram indexes
To manage notifications about this bug go to:
https://bugs.launchpad.net/mysql-server/+bug/1679135/+subscriptions