zorba-coders team mailing list archive

[Merge] lp:~zorba-coders/zorba/bug-1123161 into lp:zorba

 

Paul J. Lucas has proposed merging lp:~zorba-coders/zorba/bug-1123161 into lp:zorba.

Commit message:
Fixed bugs.

Requested reviews:
  Paul J. Lucas (paul-lucas)
Related bugs:
  Bug #1123161 in Zorba: "FOTS: fn:matches failures (at least 109 failures)"
  https://bugs.launchpad.net/zorba/+bug/1123161

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/bug-1123161/+merge/148773

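Fixed regular-expression bugs behind the fn:matches failures: a digit that follows a complete back-reference is now wrapped in a single-character class so that ICU does not read it as part of the back-reference; a trailing backslash or an unclosed '[' now raises an invalid-regex error; and ICU compile errors are now thrown via INVALID_RE_EXCEPTION.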
=== modified file 'src/util/regex.cpp'
--- src/util/regex.cpp	2012-10-08 12:09:36 +0000
+++ src/util/regex.cpp	2013-02-15 17:42:25 +0000
@@ -214,10 +214,26 @@
         // back-reference is preceded by NN or more unescaped opening
         // parentheses.
         //
-        if ( cap_sub.size() > 9 && ascii::is_digit( *xq_c ) )
-          backref_no = backref_no * 10 + (*xq_c - '0');
-        else
+        bool prevent_multidigit_backref = false;
+        if ( ascii::is_digit( *xq_c ) ) {
+          if ( cap_sub.size() > 9 )
+            backref_no = backref_no * 10 + (*xq_c - '0');
+          else {
+            in_backref = false;
+            //
+            // Unlike XQuery, ICU always takes further digits to be part of the
+            // backreference so we have to prevent ICU from doing that.  One
+            // way to do that is by enclosing said digits in a single-character
+            // character class, i.e., [N].
+            //
+            *icu_re += '[';
+            *icu_re += *xq_c;
+            *icu_re += ']';
+            prevent_multidigit_backref = true;
+          }
+        } else
           in_backref = false;
+
         //
         // XQuery 3.0 F&O 5.6.1: The regular expression is invalid if a back-
         // reference refers to a subexpression that does not exist or whose
@@ -231,7 +247,11 @@
           throw INVALID_RE_EXCEPTION(
             xq_re, ZED( NonClosedBackRef_3 ), backref_no
           );
+
+        if ( prevent_multidigit_backref )
+          continue;
       }
+
       switch ( *xq_c ) {
         case '\\':
           got_backslash = true;
@@ -324,6 +344,11 @@
     *icu_re += *xq_c;
   } // FOR_EACH
 
+  if ( got_backslash )
+    throw INVALID_RE_EXCEPTION( xq_re, ZED( TrailingChar_3 ), '\\' );
+  if ( in_char_class )
+    throw INVALID_RE_EXCEPTION( xq_re, ZED( UnbalancedChar_3 ), '[' );
+
   if ( !q_flag ) {
     if ( i_flag ) {
       //
@@ -375,9 +400,7 @@
       icu_error_key = ZED_PREFIX;
       icu_error_key += u_errorName( status );
     }
-    throw XQUERY_EXCEPTION(
-      err::FORX0002, ERROR_PARAMS( pattern, icu_error_key )
-    );
+    throw INVALID_RE_EXCEPTION( pattern, icu_error_key );
   }
 }
 

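The back-reference change in the first hunk can be sketched in isolation. The following is a minimal, standalone illustration and not the code in src/util/regex.cpp: the function name guard_backref_digits and its open_groups parameter are hypothetical, it assumes the caller already knows how many capturing groups precede the back-reference, and it ignores character classes and flags. A back-reference absorbs a further digit only while the resulting number does not exceed open_groups; the first digit that cannot be absorbed is emitted as [N].

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only; not part of Zorba.
// open_groups: number of capturing groups assumed to be opened before any
// back-reference in xq_re (a simplification of the per-position rule).
std::string guard_backref_digits( std::string const &xq_re,
                                  std::size_t open_groups ) {
  auto is_digit = []( char c ) {
    return std::isdigit( static_cast<unsigned char>( c ) ) != 0;
  };

  std::string out;
  std::size_t const len = xq_re.size();
  for ( std::size_t i = 0; i < len; ++i ) {
    out += xq_re[i];
    if ( xq_re[i] != '\\' || i + 1 == len )
      continue;                         // not an escape (or trailing '\')

    char const next = xq_re[++i];       // character being escaped
    out += next;
    if ( !is_digit( next ) || next == '0' )
      continue;                         // ordinary escape such as \\ or \d

    // Start of a back-reference: absorb more digits only while the number
    // still refers to an existing group.
    std::size_t n = static_cast<std::size_t>( next - '0' );
    while ( i + 1 < len && is_digit( xq_re[i + 1] ) ) {
      std::size_t const candidate =
        n * 10 + static_cast<std::size_t>( xq_re[i + 1] - '0' );
      if ( candidate > open_groups )
        break;
      n = candidate;
      out += xq_re[++i];
    }

    // A digit that is not part of the back-reference is wrapped in a
    // single-character class so the engine treats it as a literal.
    if ( i + 1 < len && is_digit( xq_re[i + 1] ) ) {
      out += '[';
      out += xq_re[++i];
      out += ']';
    }
  }
  return out;
}
```

With one capturing group, the pattern (a)\10 would come out as (a)\1[0], while with ten groups \10 is left unchanged.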