maria-developers team mailing list archive

Re: mdev6027 RLIKE: "." no longer matching new line (default_regex_flags)

 

Hi Jan, Sergei,


On 04/23/2014 11:25 AM, Sergei Golubchik wrote:
Hi, Alexander!

On Apr 22, Sergei Golubchik wrote:
On Apr 17, Alexander Barkov wrote:
Hello Serg,

Please review a patch implementing a new system variable
default_regex_flags, to address the remaining incompatibilities
between PCRE and the old regex library.

Ah, something else.
Please, make sure this new variable is documented.

Yeah, I just finished writing a description for Jan :)

Regards,
Sergei


Jan, can you please update the manual?




A new system variable default_regex_flags was added
to set the default behaviour of the PCRE regex engine.

Scope: global, session.

Affected functions and operators: RLIKE, REGEXP_SUBSTR, REGEXP_REPLACE.

Possible values: any combination of zero or more of the following options, comma separated:

DOTALL
DUPNAMES
EXTENDED
EXTRA
MULTILINE
UNGREEDY

Default value: empty (all options are off).

Example:

SET default_regex_flags='';
SET default_regex_flags='DOTALL';
SET default_regex_flags='DOTALL,DUPNAMES,EXTENDED,EXTRA,MULTILINE,UNGREEDY';
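Since the variable has both global and session scope, the usual system variable syntax should apply. A minimal sketch, assuming default_regex_flags behaves like any other dynamic variable (the flag combinations are chosen only for illustration):

```sql
# Session scope: affects only the current connection.
SET SESSION default_regex_flags='DOTALL';

# Global scope: picked up by connections opened afterwards
# (normally requires the SUPER privilege).
SET GLOBAL default_regex_flags='DOTALL,MULTILINE';

# Compare the two values side by side.
SELECT @@global.default_regex_flags, @@session.default_regex_flags;
```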


The meaning of the values:

Value       Pattern equivalent  Meaning
---------   ------------------  -------
DOTALL      (?s)                . matches anything including NL
DUPNAMES    (?J)                Allow duplicate names for subpatterns
EXTENDED    (?x)                Ignore white space and # comments
EXTRA       (?X)                Extra features (e.g. error on unknown escape character)
MULTILINE   (?m)                ^ and $ match newlines within data
UNGREEDY    (?U)                Invert greediness of quantifiers

See here for the list of the equivalent PCRE options:
https://mariadb.com/kb/en/pcre-regular-expressions/#option-setting
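To illustrate the "pattern equivalent" column with one of the other affected functions, here is a sketch for UNGREEDY. The sample string is made up, and the results in the comments follow from PCRE semantics rather than from a pasted session:

```sql
# Greedy by default: .+ takes as much as it can.
SELECT REGEXP_SUBSTR('<a><b>', '<.+>');       # expected: <a><b>

# Inline option inside the pattern.
SELECT REGEXP_SUBSTR('<a><b>', '(?U)<.+>');   # expected: <a>

# The same effect through the variable, with the pattern unchanged.
SET default_regex_flags='UNGREEDY';
SELECT REGEXP_SUBSTR('<a><b>', '<.+>');       # expected: <a>

# Restore the default.
SET default_regex_flags='';
```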


Examples:

# The default behaviour (multiline match is off)

mysql> SELECT 'a\nb\nc' RLIKE '^b$';
+-----------------------+
| 'a\nb\nc' RLIKE '^b$' |
+-----------------------+
|                     0 |
+-----------------------+

# Enabling the multiline option using the PCRE option syntax:

mysql> SELECT 'a\nb\nc' RLIKE '(?m)^b$';
+---------------------------+
| 'a\nb\nc' RLIKE '(?m)^b$' |
+---------------------------+
|                         1 |
+---------------------------+


# Enabling the multiline option using default_regex_flags:

mysql> SET default_regex_flags='MULTILINE';
mysql> SELECT 'a\nb\nc' RLIKE '^b$';
+-----------------------+
| 'a\nb\nc' RLIKE '^b$' |
+-----------------------+
|                     1 |
+-----------------------+
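The same comparison for DOTALL, which is the case from the subject of this thread, could look like the sketch below. It assumes the default sql_mode, so that \n inside a string literal is a newline; the results in the comments follow from PCRE semantics rather than from a pasted session:

```sql
# Default behaviour: dot does not match a newline.
SET default_regex_flags='';
SELECT 'a\nb' RLIKE 'a.b';        # expected: 0

# Enabling DOTALL using the PCRE option syntax.
SELECT 'a\nb' RLIKE '(?s)a.b';    # expected: 1

# Enabling DOTALL using default_regex_flags.
SET default_regex_flags='DOTALL';
SELECT 'a\nb' RLIKE 'a.b';        # expected: 1
```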


The goal of the new variable is to simplify writing PCRE patterns,
and to provide a way to make the default behaviour of the PCRE engine
more compatible with the old regex engine used in
MariaDB-5.5 and MySQL.

Note that, unlike the old regex engine, dot (.) does not match a
new line character in PCRE by default. Those who need better
compatibility with the old regex engine might consider adding this
setting to /etc/my.cnf:

[mysqld]
default-regex-flags=DOTALL



Thanks.

