maria-developers team mailing list archive
Re: regexp review
Thanks for the review.
On 09/30/2013 11:00 AM, Oleksandr Byelkin wrote:
Everything is OK.
But there are some small issues:
--- mysql-test/include/ctype_utf8mb4.inc 2010-03-05 08:17:19 +0000
+++ mysql-test/include/ctype_utf8mb4.inc 2013-09-26 14:02:17 +0000
@@ -234,15 +234,15 @@ set names utf8mb4;
set names utf8mb4;
# This should return TRUE
-select 'вася' rlike '[[:<:]]вася[[:>:]]';
-select 'вася ' rlike '[[:<:]]вася[[:>:]]';
-select ' вася' rlike '[[:<:]]вася[[:>:]]';
-select ' вася ' rlike '[[:<:]]вася[[:>:]]';
+select 'вася' rlike '\\bвася\\b';
+select 'вася ' rlike '\\bвася\\b';
+select ' вася' rlike '\\bвася\\b';
+select ' вася ' rlike '\\bвася\\b';
Is the above pattern unsupported?
The above is a non-standard pattern that was supported by
the Henry Spencer regex library, which we bundle in the /regex directory.
This is what regex/regex.7 says:
There are two special cases of bracket expressions: the bracket
expressions `[[:<:]]' and `[[:>:]]' match the null string at the
beginning and end of a word respectively. A word is defined as a
sequence of word characters which is neither preceded nor followed by
word characters. A word character is an alnum character (as defined by
ctype(3)) or an underscore. This is an extension, compatible with but
not specified by POSIX 1003.2, and should be used with caution in
software intended to be portable to other systems.
I added this test to MySQL a few years ago after reading regex.7
(it was not part of any bug report).
This incompatibility should not be a big problem.
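To make the regex.7 definition quoted above concrete, here is a minimal sketch in C of where `[[:<:]]' and `[[:>:]]' would match. The helper names (is_word_char, at_word_start, at_word_end) are hypothetical and do not come from the regex library. The sketch works on single bytes through ctype(3), so it only illustrates the rule for ASCII input and is not a model of how multi-byte strings such as 'вася' are handled.

```c
#include <ctype.h>
#include <stddef.h>

/* A word character per regex.7: alnum (as defined by ctype(3)) or '_'. */
int is_word_char(unsigned char c)
{
  return isalnum(c) || c == '_';
}

/*
  Hypothetical helper: would [[:<:]] match the null string at offset pos?
  True when a word character follows pos and none precedes it.
*/
int at_word_start(const char *s, size_t len, size_t pos)
{
  int prev= pos > 0 && is_word_char((unsigned char) s[pos - 1]);
  int next= pos < len && is_word_char((unsigned char) s[pos]);
  return !prev && next;
}

/*
  Hypothetical helper: would [[:>:]] match the null string at offset pos?
  True when a word character precedes pos and none follows it.
*/
int at_word_end(const char *s, size_t len, size_t pos)
{
  int prev= pos > 0 && is_word_char((unsigned char) s[pos - 1]);
  int next= pos < len && is_word_char((unsigned char) s[pos]);
  return prev && !next;
}
```

For the string " foo " (length 5), at_word_start() holds only at offset 1 and at_word_end() only at offset 4, which is the same pair of positions the test expects a word boundary at.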
=== modified file 'sql/mysqld.cc'
--- sql/mysqld.cc 2013-09-18 11:07:31 +0000
+++ sql/mysqld.cc 2013-09-26 14:02:17 +0000
@@ -1898,7 +1898,7 @@ void clean_up(bool print_message)
/* End the debug sync facility. See debug_sync.cc. */
@@ -3904,10 +3904,10 @@ static int init_common_variables()
- my_regex_init(&my_charset_latin1, check_enough_stack_size);
+ //my_regex_init(&my_charset_latin1, check_enough_stack_size);
- my_regex_init(&my_charset_latin1, NULL);
+ //my_regex_init(&my_charset_latin1, NULL);
Process a comma-separated character set list and choose
Remove it please (I think it was just forgotten).
Done. Thanks for noticing this.
As discussed on IRC, there is still one issue left,
related to a crash in pcre_compile() when using a recursive
pattern with a lot of nested parentheses, like:
SELECT 'x' RLIKE CONCAT(
I.e. when the pattern is like '((((((x))))))' but with
more nesting levels. The crash happens because pcre_compile()
recurses for each nested group and uses up all the available stack.
I'm currently discussing this problem by email with Philip Hazel
(the author of PCRE). If he does not have a quick idea for how
to fix this, we'll just use the same trick that we used
with the old regex library.
See my_regex_enough_mem_in_stack in regex/regcomp.c.
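For illustration only, here is a rough sketch in C of the general idea behind such a trick: the recursive compiler calls a caller-supplied hook before descending one more level and gives up with an error instead of overflowing the stack. All names below (stack_check_fn, demo_stack_check, parse_group, compile_demo, build_nested, DEMO_MAX_DEPTH) are hypothetical, this is not the code from regex/regcomp.c, and the hook uses a plain depth budget as a stand-in for a real check of the remaining stack.

```c
#include <stddef.h>

/* Hypothetical hook: returns non-zero when it is NOT safe to recurse. */
typedef int (*stack_check_fn)(int recurse_level);

/* Stand-in budget; a real hook would look at the remaining stack instead. */
#define DEMO_MAX_DEPTH 1000

int demo_stack_check(int recurse_level)
{
  return recurse_level > DEMO_MAX_DEPTH;
}

/*
  Parse one group: either '(' group ')' or the literal 'x'.
  Returns a pointer just past the parsed group, or NULL on a syntax
  error or when the hook refuses further recursion.
*/
const char *parse_group(const char *p, int level, stack_check_fn check)
{
  if (check && check(level))
    return NULL;                        /* Too deep: fail, do not crash */
  if (*p == '(')
  {
    p= parse_group(p + 1, level + 1, check);
    if (!p || *p != ')')
      return NULL;
    return p + 1;
  }
  if (*p == 'x')
    return p + 1;
  return NULL;
}

/* Returns 0 on success, 1 on a syntax error or a too deeply nested pattern. */
int compile_demo(const char *pattern, stack_check_fn check)
{
  const char *end= parse_group(pattern, 0, check);
  return !(end && *end == '\0');
}

/*
  Build a pattern like '(((x)))' with the given number of nesting levels.
  Returns 0 on success, 1 if buf is too small.
*/
int build_nested(char *buf, size_t size, size_t depth)
{
  size_t i;
  if (size < 2 * depth + 2)
    return 1;
  for (i= 0; i < depth; i++)
    buf[i]= '(';
  buf[depth]= 'x';
  for (i= 0; i < depth; i++)
    buf[depth + 1 + i]= ')';
  buf[2 * depth + 1]= '\0';
  return 0;
}
```

With this sketch, a pattern built with 6 nesting levels compiles, while one built with 2000 levels is rejected by the hook once the level passes DEMO_MAX_DEPTH, so the caller gets an error rather than a crash.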
I'll tell you what we ended up with.