← Back to team overview

enterprise-support team mailing list archive

[Bug 1689268] [NEW] Fulltext search can not find word which contains punctuation marks

 

Public bug reported:

Originally reported at https://bugs.mysql.com/bug.php?id=86164

Description:
Fulltext search can not find word which contains punctuation marks like "!?,.{}/" in boolean mode BUT it does in natural language mode.

There is one case reported for "." and "," but according our tests it
includes much more punctuation marks.

The reported case is https://bugs.mysql.com/bug.php?id=85876

How to repeat:
The problem can be reproduced as follows:

================================= 
CREATE TABLE `test` ( 
`seq` int(11) NOT NULL AUTO_INCREMENT, 
`cont` text COLLATE utf8mb4_bin, 
PRIMARY KEY (`seq`), 
FULLTEXT KEY `fx_txts` (`cont`) /*!50100 WITH PARSER `ngram` */ 
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

insert into test (cont) values ('aBc!efg'); 
insert into test (cont) values ('aBc@efg'); 
insert into test (cont) values ('aBc#efg'); 
insert into test (cont) values ('aBc$efg'); 
insert into test (cont) values ('aBc%efg'); 
insert into test (cont) values ('abc^efg'); 
insert into test (cont) values ('abc&efg'); 
insert into test (cont) values ('abc*efg'); 
insert into test (cont) values ('abc(efg'); 
insert into test (cont) values ('abc)efg');

Then try to select whatever value using boolean mode:
mysql [localhost] {msandbox} (test) > select * from test where match(cont) against ('"c!e"' in boolean mode); 
Empty set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from test where match(cont) against ('"c#e"' in boolean mode); 
Empty set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from test where match(cont) against ('"c$e"' in boolean mode); 
Empty set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from test where match(cont) against ('"c!e"' in boolean mode);
Empty set (0.00 sec)

If change to natural language mode it works:
mysql [localhost] {msandbox} (test) > select * from test where match(cont) against ('"c!e"' in natural language mode);
+-----+---------+
| seq | cont    |
+-----+---------+
|   1 | aBc!efg |
|  11 | abc!efg |
+-----+---------+
2 rows in set (0.00 sec)

The index entries are created as can be seen in
INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE:

mysql [localhost] {msandbox} (test) > SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE;                                                                                                                                                 +------+--------------+-------------+-----------+--------+----------+
| WORD | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+------+--------------+-------------+-----------+--------+----------+
| !e   |            2 |          12 |         2 |      2 |        3 |
| !e   |            2 |          12 |         2 |     12 |        3 |
| #e   |            4 |           4 |         1 |      4 |        3 |
| $e   |            5 |           5 |         1 |      5 |        3 |
| %e   |            6 |           6 |         1 |      6 |        3 |
| &e   |            8 |           8 |         1 |      8 |        3 |
| (e   |           10 |          10 |         1 |     10 |        3 |
| )e   |           11 |          11 |         1 |     11 |        3 |
| *e   |            9 |           9 |         1 |      9 |        3 |
| @e   |            3 |           3 |         1 |      3 |        3 |
| ^e   |            7 |           7 |         1 |      7 |        3 |
<...>
| c!   |            2 |          12 |         2 |      2 |        2 |
| c!   |            2 |          12 |         2 |     12 |        2 |
| c#   |            4 |           4 |         1 |      4 |        2 |
| c$   |            5 |           5 |         1 |      5 |        2 |
| c%   |            6 |           6 |         1 |      6 |        2 |
| c&   |            8 |           8 |         1 |      8 |        2 |
| c(   |           10 |          10 |         1 |     10 |        2 |
| c)   |           11 |          11 |         1 |     11 |        2 |
| c*   |            9 |           9 |         1 |      9 |        2 |
| c@   |            3 |           3 |         1 |      3 |        2 |
| c^   |            7 |           7 |         1 |      7 |        2 |
<...>
+------+--------------+-------------+-----------+--------+----------+
55 rows in set (0.00 sec)

** Affects: mysql-server
     Importance: Unknown
         Status: Unknown

** Affects: percona-server
     Importance: Undecided
         Status: Confirmed

** Affects: percona-server/5.6
     Importance: Undecided
         Status: Confirmed

** Affects: percona-server/5.7
     Importance: Undecided
         Status: Confirmed


** Tags: i185729

** Also affects: percona-server/5.6
   Importance: Undecided
       Status: New

** Also affects: percona-server/5.7
   Importance: Undecided
       Status: Confirmed

** Changed in: percona-server/5.6
       Status: New => Confirmed

** Bug watch added: MySQL Bug System #86164
   http://bugs.mysql.com/bug.php?id=86164

** Also affects: mysql-server via
   http://bugs.mysql.com/bug.php?id=86164
   Importance: Unknown
       Status: Unknown

-- 
You received this bug notification because you are a member of Ubuntu
Server/Client Support Team, which is subscribed to MySQL.
Matching subscriptions: Ubuntu Server/Client Support Team
https://bugs.launchpad.net/bugs/1689268

Title:
  Fulltext search can not find word which contains punctuation marks

To manage notifications about this bug go to:
https://bugs.launchpad.net/mysql-server/+bug/1689268/+subscriptions


Follow ups