
maria-developers team mailing list archive

Re: regexp review

 

Hi Sanja,


Thanks for the review.


On 09/30/2013 11:00 AM, Oleksandr Byelkin wrote:
Hi!


Everything is OK.

But there are some small issues:

    --- mysql-test/include/ctype_utf8mb4.inc 2010-03-05 08:17:19 +0000
    +++ mysql-test/include/ctype_utf8mb4.inc    2013-09-26 14:02:17 +0000
    @@ -234,15 +234,15 @@ set names utf8mb4;
      set names utf8mb4;

      # This should return TRUE
    -select  'вася'  rlike '[[:<:]]вася[[:>:]]';
    -select  'вася ' rlike '[[:<:]]вася[[:>:]]';
    -select ' вася'  rlike '[[:<:]]вася[[:>:]]';
    -select ' вася ' rlike '[[:<:]]вася[[:>:]]';
    +select  'вася'  rlike '\\bвася\\b';
    +select  'вася ' rlike '\\bвася\\b';
    +select ' вася'  rlike '\\bвася\\b';
    +select ' вася ' rlike '\\bвася\\b';

Is the above an unsupported pattern?

The above is a non-standard pattern that was supported by
the Henry Spencer regex library, which we bundle in the /regex
directory.

This is what regex/regex.7 says:

There are two special cases of bracket expressions: the bracket
expressions `[[:<:]]' and `[[:>:]]' match the null string at the
beginning and end of a word respectively. A word is defined as a
sequence of word characters which is neither preceded nor followed by
word characters. A word character is an alnum character (as defined
by ctype(3)) or an underscore. This is an extension, compatible with
but not specified by POSIX 1003.2, and should be used with caution in
software intended to be portable to other systems.

I added this test to MySQL a few years ago after reading regex.7
(it was not part of any bug report).

This incompatibility should not be a big problem: the same word
boundary check can be written with \b, as the updated test does.



    === modified file 'sql/mysqld.cc'
    --- sql/mysqld.cc    2013-09-18 11:07:31 +0000
    +++ sql/mysqld.cc    2013-09-26 14:02:17 +0000
    @@ -1898,7 +1898,7 @@ void clean_up(bool print_message)
        delete global_rpl_filter;
        end_ssl();
        vio_end();
    -  my_regex_end();
    +  //my_regex_end();
      #if defined(ENABLED_DEBUG_SYNC)
        /* End the debug sync facility. See debug_sync.cc. */
        debug_sync_end();
    @@ -3904,10 +3904,10 @@ static int init_common_variables()
          return 1;
        item_init();
      #ifndef EMBEDDED_LIBRARY
    -  my_regex_init(&my_charset_latin1, check_enough_stack_size);
    +  //my_regex_init(&my_charset_latin1, check_enough_stack_size);
        my_string_stack_guard= check_enough_stack_size;
      #else
    -  my_regex_init(&my_charset_latin1, NULL);
    +  //my_regex_init(&my_charset_latin1, NULL);
      #endif
        /*
          Process a comma-separated character set list and choose

Please remove these commented-out calls (I think they were just forgotten).


Done. Thanks for noticing this.


As discussed on IRC, there is still one issue left: pcre_compile()
crashes on a deeply nested pattern with a lot of nested parentheses,
for example:

SELECT 'x' RLIKE CONCAT(
  REPEAT('(',300),
  'x',
  REPEAT(')',300));


That is, a pattern like '((((((x))))))', but with many more
nesting levels. The crash happens because pcre_compile() handles
nested groups recursively and uses up all available stack.


I'm currently discussing this problem by email with Philip Hazel
(the author of PCRE). If he has no quick idea how to fix this,
we'll use the same trick that we used with the old regex library:
see my_regex_enough_mem_in_stack in regex/regcomp.c.


I'll let you know what we end up with.

Greetings.

