zorba-coders team mailing list archive
Message #07440
[Merge] lp:~zorba-coders/zorba/no_unicode into lp:zorba
Rodolfo Ochoa has proposed merging lp:~zorba-coders/zorba/no_unicode into lp:zorba.
Requested reviews:
Markos Zaharioudakis (markos-za)
Matthias Brantner (matthias-brantner)
For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/no_unicode/+merge/101588
"No Unicode" is now "No ICU."
Added a q-flag fix for an undiscovered bug.
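For reviewers, a minimal sketch of what the rename means for code that tests the macro. It is an illustration only, not code from this branch: the helper name expected_serialization is hypothetical, and the two strings are the ones compared in doc/cxx/examples/context.cpp in the diff below. It also assumes ZORBA_NO_ICU reaches the translation unit through the generated config header or a compiler flag.

```cpp
#include <string>

// Hypothetical helper (not part of Zorba): returns the serialized output that
// the context.cpp example expects for the current build configuration.
// Code that used to test ZORBA_NO_UNICODE now tests ZORBA_NO_ICU.
std::string expected_serialization() {
#ifndef ZORBA_NO_ICU
  // ICU build: the XML declaration carries an encoding attribute.
  return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\nBook 1.1\n";
#else
  // No-ICU build: the example expects a declaration without an encoding.
  return "<?xml version=\"1.0\"?>\nBook 1.1\n";
#endif /* ZORBA_NO_ICU */
}
```

Compiling the sketch with -DZORBA_NO_ICU selects the second branch; without it the first branch is used.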
--
https://code.launchpad.net/~zorba-coders/zorba/no_unicode/+merge/101588
Your team Zorba Coders is subscribed to branch lp:zorba.
=== modified file 'CMakeConfiguration.txt'
--- CMakeConfiguration.txt 2012-03-28 05:19:57 +0000
+++ CMakeConfiguration.txt 2012-04-11 15:45:21 +0000
@@ -135,14 +135,14 @@
SET (ZORBA_DEBUG_STRING ${ZORBA_DEBUG_STRING} CACHE BOOL "debug strings")
MESSAGE (STATUS "ZORBA_DEBUG_STRING: " ${ZORBA_DEBUG_STRING})
-SET(ZORBA_NO_UNICODE OFF CACHE BOOL "disable ICU")
-MESSAGE(STATUS "ZORBA_NO_UNICODE: " ${ZORBA_NO_UNICODE})
+SET(ZORBA_NO_ICU OFF CACHE BOOL "disable ICU")
+MESSAGE(STATUS "ZORBA_NO_ICU: " ${ZORBA_NO_ICU})
-IF (ZORBA_NO_UNICODE)
+IF (ZORBA_NO_ICU)
SET (no_full_text ON)
-ELSE (ZORBA_NO_UNICODE)
+ELSE (ZORBA_NO_ICU)
SET (no_full_text OFF)
-ENDIF (ZORBA_NO_UNICODE)
+ENDIF (ZORBA_NO_ICU)
SET (ZORBA_NO_FULL_TEXT ${no_full_text} CACHE BOOL "disable XQuery Full-Text support")
MESSAGE(STATUS "ZORBA_NO_FULL_TEXT: " ${ZORBA_NO_FULL_TEXT})
=== modified file 'CMakeLists.txt'
--- CMakeLists.txt 2012-03-28 05:19:57 +0000
+++ CMakeLists.txt 2012-04-11 15:45:21 +0000
@@ -123,10 +123,14 @@
CHECK_TYPE_SIZE("int64_t" ZORBA_HAVE_INT64_T)
CHECK_CXX_SOURCE_COMPILES ("#include <type_traits>\nint main() { std::enable_if<true,int> x; }" ZORBA_CXX_ENABLE_IF)
-CHECK_CXX_SOURCE_COMPILES ("int main() { int *p = nullptr; }" ZORBA_CXX_NULLPTR)
-CHECK_CXX_SOURCE_COMPILES ("int main() { static_assert(1,\"\"); }" ZORBA_CXX_STATIC_ASSERT)
+SET(CMAKE_EXTRA_INCLUDE_FILES wchar.h)
+CHECK_TYPE_SIZE("wchar_t" ZORBA_SIZEOF_WCHAR_T)
+SET(CMAKE_EXTRA_INCLUDE_FILES)
CHECK_CXX_SOURCE_COMPILES ("#include <memory>\nint main() { std::unique_ptr<int> p; }" ZORBA_CXX_UNIQUE_PTR)
+CHECK_CXX_SOURCE_COMPILES("int main() { int *p = nullptr; }" ZORBA_CXX_NULLPTR)
+CHECK_CXX_SOURCE_COMPILES("int main() { static_assert(1,\"\"); }" ZORBA_CXX_STATIC_ASSERT)
+
################################################################################
# Various cmake macros
=== modified file 'ChangeLog'
--- ChangeLog 2012-04-10 15:41:57 +0000
+++ ChangeLog 2012-04-11 15:45:21 +0000
@@ -4,6 +4,7 @@
New Features:
* Extended API for Python, Java, PHP and Ruby.
+ * Added support for NO_ICU (to not use ICU for unicode processing)
Optimization:
@@ -152,7 +153,9 @@
* Fixed bug when parsing a document with a base-uri attribute.
* Fixed bug #863320 (Sentence is incorrectly incremented when token characters end without sentence terminator)
* Fixed bug #863730 (static delete-node* functions don't raise ZDDY0012)
+ * Implemented the probe-index-range-value for general indexes
* Removed ZSTR0005 and ZSTR0006 error codes
+ * Fixed bug #867662 ("nullptr" warning)
* Fixed bug #868258 (Assertion failure with two delete collection)
* Fixed bug #871623 and #871629 (assertion failures with insertions in dynamic collections)
* Fixed bug #867262 (allow reuse of iterator over ExtFuncArgItemSequence)
@@ -161,6 +164,8 @@
* New node-reference module. References can be obtained for any node, and
different nodes cannot have the same identifier.
* Fixed bug #872697 (segmentation fault with validation of NMTOKENS)
+ * General index cannot be declared as unique if the type of its key is
+ xs:anyAtomicType or xs:untypedAtomic.
* Added undo for node revalidation
* Optimization for count(collection()) expressions
* Fixed bug #872796 (validate-in-place can interfere with other update primitives)
@@ -179,6 +184,8 @@
* Fixed bug #855715 (Invalid escaped characters in regex not caught)
* Fixed bug #862089 (Split binary/xq install directories for modules) by
splitting "module path" into separate URI and Library paths
+ * New node-position module. This module allows to obtain a representation of a node position, which
+ can be used to assess structural relationships with other nodes.
* Fixed bug #872502 (validation of the JSON module xqdoc fails)
* Fixed bug #897619 (testdriver_mt can not run the XQueryX tests)
* Fixed bug #867107 (xqdoc dependency to zorba is wrong)
=== modified file 'KNOWN_ISSUES.txt'
--- KNOWN_ISSUES.txt 2012-03-28 05:19:57 +0000
+++ KNOWN_ISSUES.txt 2012-04-11 15:45:21 +0000
@@ -37,7 +37,7 @@
* The serializer currently doesn't implement character maps as specified
(http://www.w3.org/TR/xslt-xquery-serialization/#character-maps)
-* In the 2.0 release, setting the CMake variables ZORBA_NO_UNICODE to
+* In the 2.0 release, setting the CMake variables ZORBA_NO_ICU to
ON is not supported.
* The PHP language binding is not supported on Mac OS X. For details,
=== modified file 'doc/cxx/examples/context.cpp'
--- doc/cxx/examples/context.cpp 2012-03-28 05:19:57 +0000
+++ doc/cxx/examples/context.cpp 2012-04-11 15:45:21 +0000
@@ -149,7 +149,11 @@
outStream2 << lQuery << std::endl;
std::cout << outStream2.str() << std::endl;
+#ifndef ZORBA_NO_ICU
if (outStream2.str() != "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\nBook 1.1\n")
+#else
+ if (outStream2.str() != "<?xml version=\"1.0\"?>\nBook 1.1\n")
+#endif /* ZORBA_NO_ICU */
{
std::cerr << "Test 4 failed with a wrong result : " << std::endl
<< outStream2.str() << std::endl;
=== modified file 'include/zorba/config.h.cmake'
--- include/zorba/config.h.cmake 2012-03-28 05:19:57 +0000
+++ include/zorba/config.h.cmake 2012-04-11 15:45:21 +0000
@@ -96,6 +96,8 @@
typedef __int64 int64_t;
#endif /* ZORBA_HAVE_INT64_T */
+#cmakedefine ZORBA_SIZEOF_WCHAR_T @ZORBA_SIZEOF_WCHAR_T@
+
// Compiler
#cmakedefine CLANG
#cmakedefine MSVC
@@ -148,7 +150,7 @@
// Zorba features
#cmakedefine ZORBA_NO_FULL_TEXT
-#cmakedefine ZORBA_NO_UNICODE
+#cmakedefine ZORBA_NO_ICU
#cmakedefine ZORBA_NO_XMLSCHEMA
#cmakedefine ZORBA_NUMERIC_OPTIMIZATION
#cmakedefine ZORBA_VERIFY_PEER_SSL_CERTIFICATE
=== modified file 'include/zorba/static_context.h'
--- include/zorba/static_context.h 2012-03-28 05:19:57 +0000
+++ include/zorba/static_context.h 2012-04-11 15:45:21 +0000
@@ -26,9 +26,13 @@
#include <zorba/function.h>
#include <zorba/annotation.h>
#include <zorba/smart_ptr.h>
+#include <zorba/smart_ptr.h>
#ifndef ZORBA_NO_FULL_TEXT
#include <zorba/thesaurus.h>
#endif /* ZORBA_NO_FULL_TEXT */
+#include <zorba/zorba.h>
+#include <zorba/store_manager.h>
+#include <zorba/zorba_exception.h>
namespace zorba {
=== modified file 'include/zorba/util/time.h'
--- include/zorba/util/time.h 2012-03-28 05:19:57 +0000
+++ include/zorba/util/time.h 2012-04-11 15:45:21 +0000
@@ -178,7 +178,7 @@
inline long get_walltime_in_millis(const walltime& t)
{
- return t.time * 1000 + t.millitm;
+ return (long)(t.time * 1000 + t.millitm);
}
#else /* not Windows, and no clock_gettime() */
=== modified file 'src/CMakeLists.txt'
--- src/CMakeLists.txt 2012-03-28 05:19:57 +0000
+++ src/CMakeLists.txt 2012-04-11 15:45:21 +0000
@@ -59,7 +59,10 @@
#
# Next, add the files to be compiled into the library
#
+
+MESSAGE(STATUS "PRECOMPILED HEADERS: " ${ZORBA_PRECOMPILED_HEADERS})
SET(ZORBA_PRECOMPILED_HEADERS OFF CACHE BOOL "Activate Zorba precompiled headers.")
+MESSAGE(STATUS "PRECOMPILED HEADERS: " ${ZORBA_PRECOMPILED_HEADERS})
SET(ZORBA_SRCS)
ADD_SRC_SUBFOLDER(ZORBA_SRCS api API_SRCS)
@@ -97,6 +100,7 @@
ENDIF(ZORBA_WITH_DEBUGGER)
ADD_SRC_SUBFOLDER(ZORBA_SRCS unit_tests UNIT_TEST_SRCS)
+MESSAGE(STATUS "PRECOMPILED HEADERS: " ${ZORBA_PRECOMPILED_HEADERS})
IF(ZORBA_PRECOMPILED_HEADERS)
ADD_SRC_SUBFOLDER(ZORBA_SRCS precompiled ZORBAMISC_SRCS)
INCLUDE_DIRECTORIES("${CMAKE_SOURCE_DIR}/src/precompiled")
=== modified file 'src/api/serialization/serializer.cpp'
--- src/api/serialization/serializer.cpp 2012-03-28 05:19:57 +0000
+++ src/api/serialization/serializer.cpp 2012-04-11 15:45:21 +0000
@@ -180,7 +180,6 @@
for (; chars < chars_end; chars++ )
{
-#ifndef ZORBA_NO_UNICODE
// the input string is UTF-8
int char_length = utf8::char_length(*chars);
if (char_length == 0)
@@ -217,7 +216,6 @@
continue;
}
-#endif//ZORBA_NO_UNICODE
// raise an error iff (1) the serialization format is XML 1.0 and (2) the given character is an invalid XML 1.0 character
if (ser && ser->method == PARAMETER_VALUE_XML &&
@@ -332,14 +330,12 @@
{
tr << (char)0xEF << (char)0xBB << (char)0xBF;
}
-#ifndef ZORBA_NO_UNICODE
else if (ser->encoding == PARAMETER_VALUE_UTF_16)
{
// Little-endian
tr.verbatim((char)0xFF);
tr.verbatim((char)0xFE);
}
-#endif
}
}
@@ -862,13 +858,17 @@
emitter::emit_declaration();
if (ser->omit_xml_declaration == PARAMETER_VALUE_NO) {
- tr << "<?xml version=\"" << ser->version << "\" encoding=\"";
- if (ser->encoding == PARAMETER_VALUE_UTF_8) {
- tr << "UTF-8";
-#ifndef ZORBA_NO_UNICODE
- } else if (ser->encoding == PARAMETER_VALUE_UTF_16) {
- tr << "UTF-16";
-#endif
+ tr << "<?xml version=\"" << ser->version;
+ switch (ser->encoding) {
+ case PARAMETER_VALUE_UTF_8:
+ case PARAMETER_VALUE_UTF_16:
+ tr << "\" encoding=\"";
+ switch (ser->encoding) {
+ case PARAMETER_VALUE_UTF_8 : tr << "UTF-8" ; break;
+ case PARAMETER_VALUE_UTF_16: tr << "UTF-16"; break;
+ default : ZORBA_ASSERT(false);
+ }
+ break;
}
tr << "\"";
@@ -1174,14 +1174,18 @@
}
tr << "<meta http-equiv=\"content-type\" content=\""
- << ser->media_type << "; charset=";
-
- if (ser->encoding == PARAMETER_VALUE_UTF_8)
- tr << "UTF-8";
-#ifndef ZORBA_NO_UNICODE
- else if (ser->encoding == PARAMETER_VALUE_UTF_16)
- tr << "UTF-16";
-#endif
+ << ser->media_type;
+ switch (ser->encoding) {
+ case PARAMETER_VALUE_UTF_8:
+ case PARAMETER_VALUE_UTF_16:
+ tr << "\" charset=\"";
+ switch (ser->encoding) {
+ case PARAMETER_VALUE_UTF_8 : tr << "UTF-8" ; break;
+ case PARAMETER_VALUE_UTF_16: tr << "UTF-16"; break;
+ default : ZORBA_ASSERT(false);
+ }
+ break;
+ }
tr << "\"";
// closed_parent_tag = 1;
}
@@ -1371,14 +1375,18 @@
}
tr << "<meta http-equiv=\"content-type\" content=\""
- << ser->media_type << "; charset=";
-
- if (ser->encoding == PARAMETER_VALUE_UTF_8)
- tr << "UTF-8";
-#ifndef ZORBA_NO_UNICODE
- else if (ser->encoding == PARAMETER_VALUE_UTF_16)
- tr << "UTF-16";
-#endif
+ << ser->media_type;
+ switch (ser->encoding) {
+ case PARAMETER_VALUE_UTF_8:
+ case PARAMETER_VALUE_UTF_16:
+ tr << "\" charset=\"";
+ switch (ser->encoding) {
+ case PARAMETER_VALUE_UTF_8 : tr << "UTF-8" ; break;
+ case PARAMETER_VALUE_UTF_16: tr << "UTF-16"; break;
+ default : ZORBA_ASSERT(false);
+ }
+ break;
+ }
tr << "\"/";
//closed_parent_tag = 1;
}
@@ -2098,10 +2106,8 @@
{
if (!strcmp(aValue, "UTF-8"))
encoding = PARAMETER_VALUE_UTF_8;
-#ifndef ZORBA_NO_UNICODE
else if (!strcmp(aValue, "UTF-16"))
encoding = PARAMETER_VALUE_UTF_16;
-#endif
else
throw XQUERY_EXCEPTION(
err::SEPM0016, ERROR_PARAMS( aValue, aName, ZED( GoodValuesAreUTF8 ) )
@@ -2210,16 +2216,13 @@
{
tr = new transcoder(os, false);
}
-#ifndef ZORBA_NO_UNICODE
else if (encoding == PARAMETER_VALUE_UTF_16)
{
tr = new transcoder(os, true);
}
-#endif
else
{
- ZORBA_ASSERT(0);
- return false;
+ ZORBA_ASSERT(false);
}
if (method == PARAMETER_VALUE_XML)
=== modified file 'src/api/serialization/serializer.h'
--- src/api/serialization/serializer.h 2012-03-28 05:19:57 +0000
+++ src/api/serialization/serializer.h 2012-04-11 15:45:21 +0000
@@ -70,10 +70,8 @@
PARAMETER_VALUE_TEXT,
PARAMETER_VALUE_BINARY,
- PARAMETER_VALUE_UTF_8
-#ifndef ZORBA_NO_UNICODE
- ,PARAMETER_VALUE_UTF_16
-#endif
+ PARAMETER_VALUE_UTF_8,
+ PARAMETER_VALUE_UTF_16
} PARAMETER_VALUE_TYPE;
protected:
=== modified file 'src/diagnostics/diagnostic_en.xml'
--- src/diagnostics/diagnostic_en.xml 2012-03-28 05:19:57 +0000
+++ src/diagnostics/diagnostic_en.xml 2012-04-11 15:45:21 +0000
@@ -2517,11 +2517,11 @@
<value>attribute node</value>
</entry>
- <entry key="BackRef0Illegal">
+ <entry key="BackRef0Illegal" if="!defined(ZORBA_NO_ICU)">
<value>"0": illegal backreference</value>
</entry>
- <entry key="BackRefIllegalInCharClass">
+ <entry key="BackRefIllegalInCharClass" if="!defined(ZORBA_NO_ICU)">
<value>backreference illegal in character class</value>
</entry>
@@ -2569,7 +2569,7 @@
<value>invalid library module</value>
</entry>
- <entry key="BadRegexEscape_3">
+ <entry key="BadRegexEscape_3" if="!defined(ZORBA_NO_ICU)">
<value>"$3": illegal escape character</value>
</entry>
@@ -3029,7 +3029,7 @@
<value>nodeid component too big for encoding</value>
</entry>
- <entry key="NonClosedBackRef_3">
+ <entry key="NonClosedBackRef_3" if="!defined(ZORBA_NO_ICU)">
<value>'$$3': non-closed backreference</value>
</entry>
@@ -3041,7 +3041,7 @@
<value>non-localhost authority</value>
</entry>
- <entry key="NonexistentBackRef_3">
+ <entry key="NonexistentBackRef_3" if="!defined(ZORBA_NO_ICU)">
<value>'$$3': non-existent backreference</value>
</entry>
@@ -3193,94 +3193,183 @@
<value>item type is not a subtype of "$3"</value>
</entry>
- <entry key="U_REGEX_BAD_ESCAPE_SEQUENCE" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_BAD_ESCAPE_SEQUENCE" if="!defined(ZORBA_NO_ICU)">
<value>unrecognized backslash escape sequence</value>
</entry>
- <entry key="U_REGEX_BAD_INTERVAL" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_BAD_INTERVAL" if="!defined(ZORBA_NO_ICU)">
<value>error in {min,max} interval</value>
</entry>
- <entry key="U_REGEX_INTERNAL_ERROR" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_INTERNAL_ERROR" if="!defined(ZORBA_NO_ICU)">
<value>an internal ICU error (bug) was detected</value>
</entry>
- <entry key="U_REGEX_INVALID_BACK_REF" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_INVALID_BACK_REF" if="!defined(ZORBA_NO_ICU)">
<value>backreference to a non-existent capture group</value>
</entry>
- <entry key="U_REGEX_INVALID_FLAG" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_INVALID_FLAG" if="!defined(ZORBA_NO_ICU)">
<value>invalid value for match mode flags</value>
</entry>
- <entry key="U_REGEX_INVALID_RANGE" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_INVALID_RANGE" if="!defined(ZORBA_NO_ICU)">
<value>in character range [x-y], x is greater than y</value>
</entry>
- <entry key="U_REGEX_INVALID_STATE" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_INVALID_STATE" if="!defined(ZORBA_NO_ICU)">
<value>RegexMatcher in invalid state for requested operation</value>
</entry>
- <entry key="U_REGEX_LOOK_BEHIND_LIMIT" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_LOOK_BEHIND_LIMIT" if="!defined(ZORBA_NO_ICU)">
<value>look-behind pattern matches must have a bounded maximum length</value>
</entry>
- <entry key="U_REGEX_MAX_LT_MIN" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_MAX_LT_MIN" if="!defined(ZORBA_NO_ICU)">
<value>in {min,max}, max is less than min</value>
</entry>
- <entry key="U_REGEX_MISMATCHED_PAREN" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_MISMATCHED_PAREN" if="!defined(ZORBA_NO_ICU)">
<value>incorrectly nested parentheses</value>
</entry>
- <entry key="U_REGEX_MISSING_CLOSE_BRACKET" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_MISSING_CLOSE_BRACKET" if="!defined(ZORBA_NO_ICU)">
<value>missing ']'</value>
</entry>
- <entry key="U_REGEX_NUMBER_TOO_BIG" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_NUMBER_TOO_BIG" if="!defined(ZORBA_NO_ICU)">
<value>decimal number is too large</value>
</entry>
- <entry key="U_REGEX_OCTAL_TOO_BIG" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_OCTAL_TOO_BIG" if="!defined(ZORBA_NO_ICU)">
<value>octal character constants must be <= 0377</value>
</entry>
- <entry key="U_REGEX_PROPERTY_SYNTAX" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_PROPERTY_SYNTAX" if="!defined(ZORBA_NO_ICU)">
<value>incorrect Unicode property</value>
</entry>
- <entry key="U_REGEX_RULE_SYNTAX" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_RULE_SYNTAX" if="!defined(ZORBA_NO_ICU)">
<value>syntax error</value>
</entry>
- <entry key="U_REGEX_SET_CONTAINS_STRING" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_SET_CONTAINS_STRING" if="!defined(ZORBA_NO_ICU)">
<value>can not have UnicodeSets containing strings</value>
</entry>
- <entry key="U_REGEX_STACK_OVERFLOW" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_STACK_OVERFLOW" if="!defined(ZORBA_NO_ICU)">
<value>backtrack stack overflow</value>
</entry>
- <entry key="U_REGEX_STOPPED_BY_CALLER" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_STOPPED_BY_CALLER" if="!defined(ZORBA_NO_ICU)">
<value>matching operation aborted by user callback fn</value>
</entry>
- <entry key="U_REGEX_TIME_OUT" if="!defined(ZORBA_NO_UNICODE)">
+ <entry key="U_REGEX_TIME_OUT" if="!defined(ZORBA_NO_ICU)">
<value>maximum allowed match time exceeded</value>
</entry>
- <entry key="U_REGEX_UNIMPLEMENTED" if="!defined(ZORBA_NO_UNICODE)">
- <value>use of regular expression feature that is not yet implemented</value>
+ <entry key="U_REGEX_UNIMPLEMENTED" if="!defined(ZORBA_NO_ICU)">
+ <value>use of regular expression feature that is not yet implemented</value>
+ </entry>
+
+ <!-- Regex Ascii error messages-->
+ <entry key="REGEX_UNIMPLEMENTED" if="defined(ZORBA_NO_ICU)">
+ <value>use of regular expression feature that is not yet implemented</value>
+ </entry>
+
+ <entry key="REGEX_MISMATCHED_PAREN" if="defined(ZORBA_NO_ICU)">
+ <value>incorrectly nested parentheses</value>
+ </entry>
+
+ <entry key="REGEX_BROKEN_P_CONSTRUCT" if="defined(ZORBA_NO_ICU)">
+ <value>broken \\p construct</value>
+ </entry>
+
+ <entry key="REGEX_UNKNOWN_PL_CONSTRUCT" if="defined(ZORBA_NO_ICU)">
+ <value>unknown \\p{L?} category; supported categories: L, Lu, Ll, Lt, Lm, Lo</value>
+ </entry>
+
+ <entry key="REGEX_UNKNOWN_PM_CONSTRUCT" if="defined(ZORBA_NO_ICU)">
+ <value>unknown \\p{M?} category; supported categories: M, Mn, Mc, Me</value>
+ </entry>
+
+ <entry key="REGEX_UNKNOWN_PN_CONSTRUCT" if="defined(ZORBA_NO_ICU)">
+ <value>unknown \\p{N?} category; supported categories: N, Nd, Nl, No</value>
+ </entry>
+
+ <entry key="REGEX_UNKNOWN_PP_CONSTRUCT" if="defined(ZORBA_NO_ICU)">
+ <value>unknown \\p{P?} category; supported categories: P, Pc, Pd, Ps, Pe, Pi, Pf, Po</value>
+ </entry>
+
+ <entry key="REGEX_UNKNOWN_PZ_CONSTRUCT" if="defined(ZORBA_NO_ICU)">
+ <value>unknown \\p{Z?} category; supported categories: Z, Zs, Zl, Zp</value>
+ </entry>
+
+ <entry key="REGEX_UNKNOWN_PS_CONSTRUCT" if="defined(ZORBA_NO_ICU)">
+ <value>unknown \\p{S?} category; supported categories: S, Sm, Sc, Sk, So</value>
+ </entry>
+
+ <entry key="REGEX_UNKNOWN_PC_CONSTRUCT" if="defined(ZORBA_NO_ICU)">
+ <value>unknown \\p{C?} category; supported categories: C, Cc, Cf, Co, Cn(for not assigned)</value>
+ </entry>
+
+ <entry key="REGEX_BROKEN_PIs_CONSTRUCT" if="defined(ZORBA_NO_ICU)">
+ <value>broken \\p{Is} construct; valid characters are [a-zA-Z0-9-]</value>
+ </entry>
+
+ <entry key="REGEX_UNKNOWN_PIs_CONSTRUCT" if="defined(ZORBA_NO_ICU)">
+ <value>unknown \\p{Is} category block; see supported block escapes here: http://www.w3.org/TR/xmlschema-2/#charcter-classes</value>
+ </entry>
+
+ <entry key="REGEX_INVALID_UNICODE_CODEPOINT_u" if="defined(ZORBA_NO_ICU)">
+ <value>invalid unicode hex, should be in form \\uXXXX or \\UXXXXXXXX</value>
+ </entry>
+
+ <entry key="REGEX_UNKNOWN_ESC_CHAR" if="defined(ZORBA_NO_ICU)">
+ <value>unknown \\? escape char; supported escapes are: \\[nrt\\|.?*+(){}[]-^$] for char escapes, \\[pP] for categories and \\[sSiIcCdDwW] for multichar groups</value>
+ </entry>
+
+ <entry key="REGEX_INVALID_BACK_REF" if="defined(ZORBA_NO_ICU)">
+ <value>\\$3 backreference to a non-existent capture group ($4 groups so far)</value>
+ </entry>
+
+ <entry key="REGEX_INVALID_ATOM_CHAR" if="defined(ZORBA_NO_ICU)">
+ <value>'$3': invalid character for an atom; forbidden characters are: [{}?*+|^]</value>
+ </entry>
+
+ <entry key="REGEX_INVALID_SUBCLASS" if="defined(ZORBA_NO_ICU)">
+ <value>malformed class subtraction</value>
+ </entry>
+
+ <entry key="REGEX_INVALID_USE_OF_SUBCLASS" if="defined(ZORBA_NO_ICU)">
+ <value>improper use of class subtraction: it must be the last construct in a class group [xxx-[yyy]]</value>
+ </entry>
+
+ <entry key="REGEX_MULTICHAR_IN_CHAR_RANGE" if="defined(ZORBA_NO_ICU)">
+ <value>multichars or char categories cannot be part of a char range</value>
+ </entry>
+
+ <entry key="REGEX_MISSING_CLOSE_BRACKET" if="defined(ZORBA_NO_ICU)">
+ <value>missing ']' in character group</value>
+ </entry>
+
+ <entry key="REGEX_MAX_LT_MIN" if="defined(ZORBA_NO_ICU)">
+ <value>in {min,max}, max is less than min</value>
</entry>
<entry key="UnaryArithOp">
<value>unary arithmetic operator</value>
</entry>
- <entry key="UnbalancedChar_3">
+ <entry key="UnbalancedChar_3" if="!defined(ZORBA_NO_ICU)">
<value>missing '$3'</value>
</entry>
+ <entry key="UnescapedChar_3" if="!defined(ZORBA_NO_ICU)">
+ <value>character '$3' must be escaped here</value>
+ </entry>
+
<entry key="UnexpectedElement">
<value>unexpected element</value>
</entry>
=== modified file 'src/diagnostics/pregenerated/dict_en.cpp'
--- src/diagnostics/pregenerated/dict_en.cpp 2012-03-28 05:19:57 +0000
+++ src/diagnostics/pregenerated/dict_en.cpp 2012-04-11 15:45:21 +0000
@@ -437,8 +437,12 @@
{ "~AtomizationOfGroupByMakesMoreThanOneItem", "atomization of groupby variable produces more than one item" },
{ "~AttributeName", "attribute name" },
{ "~AttributeNode", "attribute node" },
+#if !defined(ZORBA_NO_ICU)
{ "~BackRef0Illegal", "\"0\": illegal backreference" },
+#endif
+#if !defined(ZORBA_NO_ICU)
{ "~BackRefIllegalInCharClass", "backreference illegal in character class" },
+#endif
{ "~BadAnyURI", "invalid xs:anyURI" },
{ "~BadArgTypeForFn_2o34o", "${\"2\": }invalid argument type for function $3()${: 4}" },
{ "~BadCharAfter_34", "'$3': illegal character after '$4'" },
@@ -451,7 +455,9 @@
{ "~BadIterator", "invalid iterator" },
{ "~BadLibraryModule", "invalid library module" },
{ "~BadPath", "invalid path" },
+#if !defined(ZORBA_NO_ICU)
{ "~BadRegexEscape_3", "\"$3\": illegal escape character" },
+#endif
{ "~BadStreamState", "bad I/O stream state" },
{ "~BadTokenInBraces_3", "\"$3\": illegal token within { }" },
{ "~BadTraceStream", "trace stream not retrievable using SerializationCallback" },
@@ -567,10 +573,14 @@
{ "~NoUntypedKeyNodeValue_2", "node with untyped key value found during probe on index \"$2\"" },
{ "~NodeIDNeedsBytes_2", "nodeid requires more than $2 bytes" },
{ "~NodeIDTooBig", "nodeid component too big for encoding" },
+#if !defined(ZORBA_NO_ICU)
{ "~NonClosedBackRef_3", "'$$3': non-closed backreference" },
+#endif
{ "~NonFileThesaurusURI", "non-file thesaurus URI" },
{ "~NonLocalhostAuthority", "non-localhost authority" },
+#if !defined(ZORBA_NO_ICU)
{ "~NonexistentBackRef_3", "'$$3': non-existent backreference" },
+#endif
{ "~NotAllowedForTypeName", "not allowed for typeName (use xsd:untyped instead)" },
{ "~NotAmongInScopeSchemaTypes", "not among in-scope schema types" },
{ "~NotDefInDynamicCtx", "not defined in dynamic context" },
@@ -589,6 +599,69 @@
{ "~ParserNoCreateTree", "XML tree creation failed" },
{ "~PromotionImpossible", "promotion not possible" },
{ "~QuotedColon_23", "\"$2\": $3" },
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_BROKEN_PIs_CONSTRUCT", "broken \\p{Is} construct; valid characters are [a-zA-Z0-9-]" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_BROKEN_P_CONSTRUCT", "broken \\p construct" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_INVALID_ATOM_CHAR", "'$3': invalid character for an atom; forbidden characters are: [{}?*+|^]" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_INVALID_BACK_REF", "\\$3 backreference to a non-existent capture group ($4 groups so far)" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_INVALID_SUBCLASS", "malformed class subtraction" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_INVALID_UNICODE_CODEPOINT_u", "invalid unicode hex, should be in form \\uXXXX or \\UXXXXXXXX" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_INVALID_USE_OF_SUBCLASS", "improper use of class subtraction: it must be the last construct in a class group [xxx-[yyy]]" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_MAX_LT_MIN", "in {min,max}, max is less than min" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_MISMATCHED_PAREN", "incorrectly nested parentheses" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_MISSING_CLOSE_BRACKET", "missing ']' in character group" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_MULTICHAR_IN_CHAR_RANGE", "multichars or char categories cannot be part of a char range" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_UNIMPLEMENTED", "use of regular expression feature that is not yet implemented" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_UNKNOWN_ESC_CHAR", "unknown \\? escape char; supported escapes are: \\[nrt\\|.?*+(){}[]-^$] for char escapes, \\[pP] for categories and \\[sSiIcCdDwW] for multichar groups" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_UNKNOWN_PC_CONSTRUCT", "unknown \\p{C?} category; supported categories: C, Cc, Cf, Co, Cn(for not assigned)" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_UNKNOWN_PIs_CONSTRUCT", "unknown \\p{Is} category block; see supported block escapes here: http://www.w3.org/TR/xmlschema-2/#charcter-classes" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_UNKNOWN_PL_CONSTRUCT", "unknown \\p{L?} category; supported categories: L, Lu, Ll, Lt, Lm, Lo" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_UNKNOWN_PM_CONSTRUCT", "unknown \\p{M?} category; supported categories: M, Mn, Mc, Me" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_UNKNOWN_PN_CONSTRUCT", "unknown \\p{N?} category; supported categories: N, Nd, Nl, No" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_UNKNOWN_PP_CONSTRUCT", "unknown \\p{P?} category; supported categories: P, Pc, Pd, Ps, Pe, Pi, Pf, Po" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_UNKNOWN_PS_CONSTRUCT", "unknown \\p{S?} category; supported categories: S, Sm, Sc, Sk, So" },
+#endif
+#if defined(ZORBA_NO_ICU)
+ { "~REGEX_UNKNOWN_PZ_CONSTRUCT", "unknown \\p{Z?} category; supported categories: Z, Zs, Zl, Zp" },
+#endif
{ "~SEPM0009_Not10", "the version parameter has a value other than \"1.0\" and the doctype-system parameter is specified" },
{ "~SEPM0009_NotOmit", "the standalone attribute has a value other than \"omit\"" },
{ "~SchemaAttributeName", "schema-attribute name" },
@@ -610,68 +683,73 @@
{ "~TwoDecimalFormatsSameName_2", "\"$2\": two decimal formats with this name" },
{ "~TwoDefaultDecimalFormats", "two default decimal formats" },
{ "~TypeIsNotSubtype", "item type is not a subtype of \"$3\"" },
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_BAD_ESCAPE_SEQUENCE", "unrecognized backslash escape sequence" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_BAD_INTERVAL", "error in {min,max} interval" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_INTERNAL_ERROR", "an internal ICU error (bug) was detected" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_INVALID_BACK_REF", "backreference to a non-existent capture group" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_INVALID_FLAG", "invalid value for match mode flags" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_INVALID_RANGE", "in character range [x-y], x is greater than y" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_INVALID_STATE", "RegexMatcher in invalid state for requested operation" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_LOOK_BEHIND_LIMIT", "look-behind pattern matches must have a bounded maximum length" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_MAX_LT_MIN", "in {min,max}, max is less than min" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_MISMATCHED_PAREN", "incorrectly nested parentheses" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_MISSING_CLOSE_BRACKET", "missing ']'" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_NUMBER_TOO_BIG", "decimal number is too large" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_OCTAL_TOO_BIG", "octal character constants must be <= 0377" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_PROPERTY_SYNTAX", "incorrect Unicode property" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_RULE_SYNTAX", "syntax error" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_SET_CONTAINS_STRING", "can not have UnicodeSets containing strings" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_STACK_OVERFLOW", "backtrack stack overflow" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_STOPPED_BY_CALLER", "matching operation aborted by user callback fn" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_TIME_OUT", "maximum allowed match time exceeded" },
#endif
-#if !defined(ZORBA_NO_UNICODE)
+#if !defined(ZORBA_NO_ICU)
{ "~U_REGEX_UNIMPLEMENTED", "use of regular expression feature that is not yet implemented" },
#endif
{ "~UnaryArithOp", "unary arithmetic operator" },
+#if !defined(ZORBA_NO_ICU)
{ "~UnbalancedChar_3", "missing '$3'" },
+#endif
+#if !defined(ZORBA_NO_ICU)
+ { "~UnescapedChar_3", "character '$3' must be escaped here" },
+#endif
{ "~UnexpectedElement", "unexpected element" },
{ "~VarValMustBeSingleItem_2", "\"$2\": variable value must be single item" },
{ "~Variable", "variable" },
=== modified file 'src/precompiled/stdafx.h'
--- src/precompiled/stdafx.h 2012-03-28 05:19:57 +0000
+++ src/precompiled/stdafx.h 2012-04-11 15:45:21 +0000
@@ -15,363 +15,81 @@
*/
-#if defined STDAFX
-#include <iostream>
-#include <stdexcept>
-#include <cassert>
-#include <cstring>
-#include <memory>
-
-#include <sstream>
-#include <xfwrap>
-#include <xfwrap1>
-#include <istream>
-#include <cstdio>
-#include <xxshared>
-#include <crtdefs.h>
-#include <map>
-#include <set>
-//#include <poppack.h>
-//#include <xxtype_traits>
-//#include <xxcallwrap>
-
-// #include <xxcallpmf>
-// //#include <xxbind0>
-// //#include <xxbind1>
-// //#include <xxresult>
-// #include <zorba/audit.h>
-// #include "api/auditimpl.h"
-// #include <zorba/audit.h>
-
- //#include "unicode/unistr.h"
- #include "runtime/sequences/sequences.h"
- #include "diagnostics/xquery_diagnostics.h"
- #include "xercesc/util/xercesdefs.hpp"
- #include "runtime/collections/collections.h"
- #include "unicode/utypes.h"
- #include "zorba/config.h"
- #include "store/api/store.h"
- #include "zorba/zorba.h"
- #include "zorba/api_shared_types.h"
- #include "compiler/parsetree/parsenodes.h"
- #include "compiler/parser/parse_constants.h"
- //#include "compiler/api/compilercb.h"
- #include "zorbautils/checked_vector.h"
- #include "compiler/parser/xquery_driver.h"
- #include "util/sorter.h"
- #include "compiler/xqueryx/xqueryx_to_xquery.h"
-// #include "compiler/xqueryx/xqueryx_xslt.h"
-//#include "compiler/parser/xquery_scanner.h"
-//#include "compiler/parsetree/parsenode_base.h"
-//#include "compiler/parsetree/parsenode_visitor.h"
-// #include "runtime/core/flwor_iterator.h"
-// #include "context/static_context.h"
-// #include "zorbautils/fatal.h"
-// #include "runtime/base/unarybase.h"
-// #include "compiler/expression/expr_consts.h"
-// #include "api/iterator_singleton.h"
-// #include "runtime/visitors/printer_visitor_api.h"
-// //#include "compiler/parsetree/parsenode_print_dot_visitor.h"
-// //#include "compiler/parsetree/parsenode_print_dot_visitor.h"
-// //#include "runtime/visitors/planiter_visitor_impl_code.h"
-// //#include "runtime/visitors/planiter_visitor_impl_include.h"
-// //#include "runtime/visitors/printer_visitor_impl.h"
-// //#include "runtime/core/path.h"
-// #include "compiler/expression/ft_expr.h"
-// #include "compiler/expression/ftnode.h"
-// #include "compiler/parser/query_loc.h"
+#ifdef STDAFX
+
+ #include <fstream>
+ #include <iostream>
+ #include <stdexcept>
+ #include <cassert>
+ #include <cstring>
+ #include <memory>
+
+ #include <sstream>
+ #include <xfwrap>
+ #include <xfwrap1>
+ #include <istream>
+ #include <cstdio>
+ #include <xxshared>
+ #include <crtdefs.h>
+ #include <map>
+ #include <set>
+
+ #include "runtime/sequences/sequences.h"
+ #include "diagnostics/xquery_diagnostics.h"
+ #include "xercesc/util/xercesdefs.hpp"
+ #include "runtime/collections/collections.h"
+ #include "unicode/utypes.h"
+ #include "zorba/config.h"
+ #include "store/api/store.h"
+ #include "zorba/zorba.h"
+ #include "zorba/api_shared_types.h"
+ #include "compiler/parsetree/parsenodes.h"
+ #include "compiler/parser/parse_constants.h"
+ #include "zorbautils/checked_vector.h"
+ #include "compiler/parser/xquery_driver.h"
+ #include "util/sorter.h"
+ #include "compiler/xqueryx/xqueryx_to_xquery.h"
+ #include <zorba/store_manager.h>
+ #include <zorba/xquery.h>
+ #include <zorba/xquery_exception.h>
#include "util/cxx_util.h"
-// #include "util/indent.h"
-// #include "util/stl_util.h"
-// #include "diagnostics/xquery_diagnostics.h"
-// #include "zorbatypes/numconversions.h"
+ #include "diagnostics/assert.h"
+ #include "zorbatypes/mapm/m_apm_lc.h"
+ #include "zorbatypes/datetime/parse.h"
+ #include "zorbatypes/chartype.h"
+ #include "zorbatypes/collation_manager.h"
+ #include "zorbatypes/ft_token.h"
+ #include "zorbatypes/m_apm.h"
+ #include "zorbatypes/rclock.h"
+ #include "zorbatypes/schema_types.h"
+ #include "zorbatypes/timezone.h"
+ #include "zorbatypes/transcoder.h"
+ #include "zorbatypes/URI.h"
+ #include "zorbatypes/xerces_xmlcharray.h"
+ #include "zorbatypes/zorbatypes_decl.h"
+ #include "zorbatypes/zstring.h"
+ #include "zorbautils/condition.h"
+ #include "zorbautils/hashfun.h"
+ #include "zorbautils/hashmap.h"
+ #include "zorbautils/hashmap_itemp.h"
+ #include "zorbautils/hashmap_str_obj.h"
+ #include "zorbautils/hashmap_zstring.h"
+ #include "zorbautils/hashset.h"
+ #include "zorbautils/hashset_itemh.h"
+ #include "zorbautils/latch.h"
+ #include "zorbautils/locale.h"
+ #include "zorbautils/lock.h"
+ #include "zorbautils/mutex.h"
+ #include "zorbautils/runnable.h"
+ #include "zorbautils/SAXParser.h"
+ #include "zorbautils/stack.h"
+ #include "zorbautils/string_util.h"
+ #include "unit_tests/unit_test_list.h"
+ #include "zorba/diagnostic_handler.h"
+ #include "zorba/xquery_warning.h"
+ #include "runtime/full_text/ftcontains_visitor.h"
+ #include "store/api/ft_token_iterator.h"
+ #include "store/naive/ft_token_store.h"
-// #include "api/serialization/serializable.h"
-// #include "api/serialization/serializer.h"
-// #include "api/collectionimpl.h"
-// #include "api/dynamiccontextimpl.h"
-// #include "api/fileimpl.h"
-// #include "api/functionimpl.h"
-// #include "api/invoke_item_sequence.h"
-// #include "api/itemfactoryimpl.h"
-// #include "api/resultiteratorchainer.h"
-// #include "api/resultiteratorimpl.h"
-// #include "api/sax2impl.h"
-// #include "api/serializerimpl.h"
-// #include "api/staticcontextimpl.h"
-// #include "api/storeiteratorimpl.h"
-// #include "api/unmarshaller.h"
-// #include "api/uri_resolver_wrappers.h"
-// #include "api/vectoriterator.h"
-// #include "api/xmldatamanagerimpl.h"
-// //#include "api/xqueryimpl.h"
-// #include "api/zorbaimpl.h"
-// #include "capi/cdynamic_context.h"
-// #include "capi/cexpression.h"
-// #include "capi/cexternal_function.h"
-// #include "capi/cimplementation.h"
-// #include "capi/csequence.h"
-// #include "capi/cstatic_context.h"
-// #include "capi/error.h"
-// #include "capi/external_module.h"
-// #include "capi/single_item_sequence.h"
-// #include "capi/user_item_sequence.h"
-// #include "compiler/parser/flexlexer.h"
-// #include "compiler/parser/ft_types.h"
-// #include "compiler/parser/symbol_table.h"
-// #include "compiler/parser/xqdoc_comment.h"
-// #include "compiler/parsetree/parsenode_print_xml_visitor.h"
-// #include "compiler/parsetree/parsenode_print_xqdoc_visitor.h"
-// #include "compiler/parsetree/parsenode_print_xquery_visitor.h"
-// #include "compiler/parsetree/parsenode_xqdoc_visitor.h"
-// #include "compiler/translator/prolog_graph.h"
-// #include "compiler/translator/translator.h"
-// #include "compiler/codegen/plan_visitor.h"
-// #include "compiler/expression/abstract_expr_visitor.h"
-// #include "compiler/expression/expr.h"
-// #include "compiler/expression/expr_annotations.h"
-// #include "compiler/expression/expr_base.h"
-// #include "compiler/expression/expr_classes.h"
-// #include "compiler/expression/expr_iter.h"
-// #include "compiler/expression/expr_utils.h"
-// #include "compiler/expression/expr_visitor.h"
-// #include "compiler/expression/flwor_expr.h"
-// //#include "compiler/expression/fo_expr.h"
-// #include "compiler/expression/ftnode_classes.h"
-// #include "compiler/expression/ftnode_visitor.h"
-// #include "compiler/expression/function_item_expr.h"
-// #include "compiler/expression/path_expr.h"
-// #include "compiler/expression/script_exprs.h"
-// #include "compiler/expression/update_exprs.h"
-// #include "compiler/expression/var_expr.h"
-// #include "compiler/rewriter/framework/rewriter.h"
-// #include "compiler/rewriter/framework/rewriter_context.h"
-// #include "compiler/rewriter/framework/rule_driver.h"
-// #include "compiler/rewriter/framework/sequential_rewriter.h"
-// #include "compiler/rewriter/rewriters/common_rewriter.h"
-// #include "compiler/rewriter/rewriters/default_optimizer.h"
-// #include "compiler/rewriter/rewriters/phase1_rewriter.h"
-// #include "compiler/rewriter/rules/ruleset.h"
-// #include "compiler/rewriter/rules/rule_base.h"
-// #include "compiler/rewriter/rules/type_rules.h"
-// #include "compiler/rewriter/tools/dataflow_annotations.h"
-// #include "compiler/rewriter/tools/expr_tools.h"
-// #include "compiler/rewriter/tools/udf_graph.h"
-// #include "compiler/xqddf/collection_decl.h"
-// #include "compiler/xqddf/value_ic.h"
-// #include "compiler/xqddf/value_index.h"
-// #include "compiler/semantic_annotations/annotations.h"
-// #include "compiler/semantic_annotations/annotation_holder.h"
-// #include "compiler/semantic_annotations/annotation_keys.h"
-// #include "compiler/api/compiler_api.h"
-// #include "compiler/api/compiler_api_impl.h"
-// #include "system/globalenv.h"
-// #include "system/properties.h"
-// #include "system/zorba_properties.h"
-// #include "context/decimal_format.h"
-// #include "context/default_uri_mappers.h"
-// #include "context/default_url_resolvers.h"
-// #include "context/dynamic_context.h"
-// #include "context/dynamic_loader.h"
-// #include "context/internal_uri_resolvers.h"
-// //#include "context/namespace_context.h"
-// #include "context/root_static_context.h"
-// #include "context/sctx_map_iterator.h"
-// #include "context/standard_uri_resolvers.h"
-// #include "context/static_context_consts.h"
-// #include "context/stemmer_wrappers.h"
-// #include "context/uri_resolver.h"
-// #include "context/uri_resolver_wrapper.h"
-#include "diagnostics/assert.h"
-// #include "diagnostics/diagnostic.h"
-// #include "diagnostics/dict.h"
-// #include "diagnostics/dict_impl.h"
-// #include "diagnostics/StackWalker.h"
-// #include "diagnostics/user_error.h"
-// #include "diagnostics/user_exception.h"
-// #include "diagnostics/xquery_exception.h"
-// #include "diagnostics/xquery_stack_trace.h"
-// #include "diagnostics/xquery_warning.h"
-// #include "diagnostics/zorba_exception.h"
-// //#include "functions/annotation.h"
-// #include "functions/external_function.h"
-// #include "functions/function.h"
-// #include "functions/function_consts.h"
-// #include "functions/function_impl.h"
-// #include "functions/func_accessors_impl.h"
-// #include "functions/func_apply.h"
-// #include "functions/func_arithmetic.h"
-// #include "functions/func_booleans_impl.h"
-// #include "functions/func_durations_dates_times_impl.h"
-// #include "functions/func_enclosed.h"
-// #include "functions/func_eval.h"
-// #include "functions/func_hoist.h"
-// #include "functions/func_index_ddl.h"
-// #include "functions/func_node_sort_distinct.h"
-// #include "functions/func_numerics_impl.h"
-// #include "functions/func_reflection.h"
-// #include "functions/func_sequences_impl.h"
-// #include "functions/func_var_decl.h"
-// #include "functions/library.h"
-// #include "functions/signature.h"
-// #include "functions/udf.h"
-// #include "runtime/full_text/thesauri/decode_base128.h"
-// #include "runtime/full_text/thesauri/encoded_list.h"
-// #include "runtime/full_text/thesauri/iso2788.h"
-// #include "runtime/full_text/thesauri/wn_db_segment.h"
-// #include "runtime/full_text/thesauri/wn_synset.h"
-// #include "runtime/full_text/thesauri/wn_thesaurus.h"
-// #include "runtime/full_text/thesauri/wn_types.h"
-// #include "runtime/full_text/thesauri/xqftts_relationship.h"
-// #include "runtime/full_text/thesauri/xqftts_thesaurus.h"
-// #include "runtime/full_text/ft_match.h"
-// #include "runtime/full_text/ft_query_item.h"
-// #include "runtime/full_text/ft_single_token_iterator.h"
-// #include "runtime/full_text/ft_stop_words_set.h"
-// #include "runtime/full_text/ft_thesaurus.h"
-// #include "runtime/full_text/ft_token_matcher.h"
-// #include "runtime/full_text/ft_token_seq_iterator.h"
-// #include "runtime/full_text/ft_token_span.h"
-// #include "runtime/full_text/ft_wildcard.h"
-// #include "runtime/full_text/full_text.h"
-// #include "runtime/full_text/apply.h"
-// #include "runtime/full_text/ft_util.h"
-// #include "runtime/collections/collections_base.h"
-// #include "runtime/core/apply_updates.h"
-// #include "runtime/core/arithmetic_impl.h"
-// #include "runtime/core/constructors.h"
-// #include "runtime/core/fncall_iterator.h"
-// #include "runtime/core/internal_operators.h"
-// #include "runtime/core/item_iterator.h"
-// #include "runtime/core/nodeid_iterators.h"
-// #include "runtime/core/path_iterators.h"
-// #include "runtime/core/sequencetypes.h"
-// #include "runtime/core/trycatch.h"
-// #include "runtime/core/var_iterators.h"
-// #include "runtime/numerics/NumericsImpl.h"
-// #include "runtime/booleans/BooleanImpl.h"
-// #include "runtime/base/binarybase.h"
-// #include "runtime/base/narybase.h"
-// #include "runtime/base/noarybase.h"
-// #include "runtime/base/plan_iterator.h"
-// #include "runtime/sequences/SequencesImpl.h"
-// #include "runtime/visitors/iterprinter.h"
-// #include "runtime/misc/materialize.h"
-// #include "runtime/scripting/scripting.h"
-// #include "types/schema/EventSchemaValidator.h"
-// #include "types/schema/LoadSchemaErrorHandler.h"
-// #include "types/schema/PrintSchema.h"
-// #include "types/schema/revalidateUtils.h"
-// #include "types/schema/schema.h"
-// #include "types/schema/SchemaValidatorFilter.h"
-// #include "types/schema/StrX.h"
-// #include "types/schema/validate.h"
-// #include "types/schema/ValidationEventHandler.h"
-// #include "types/schema/xercesIncludes.h"
-// #include "types/schema/XercesParseUtils.h"
-// #include "types/schema/XercSchemaValidator.h"
-// #include "types/casting.h"
-// #include "types/collation.h"
-// #include "types/node_test.h"
-// #include "types/root_typemanager.h"
-// #include "types/typeconstants.h"
-// #include "types/typeimpl.h"
-// #include "types/typemanager.h"
-// #include "types/typemanagerimpl.h"
-// #include "types/typeops.h"
-// #include "util/fx/fxarray.h"
-// #include "util/fx/fxcharheap.h"
-// #include "util/ascii_util.h"
-// #include "util/atomic_int.h"
-// #include "util/auto_vector.h"
-// #include "util/curl_util.h"
-// #include "util/dir.h"
-// #include "util/dynamic_bitset.h"
-// #include "util/empty.h"
-// #include "util/error_util.h"
-// #include "util/fs_util.h"
-// #include "util/hashmap.h"
-// //#include "util/hashmap32.h"
-// #include "util/less.h"
-// #include "util/mmap_file.h"
-// #include "util/nonatomic_int.h"
-// #include "util/omanip.h"
-// #include "util/oseparator.h"
-// #include "util/regex.h"
-// #include "util/singleton.h"
-// #include "util/string_util.h"
-// #include "util/threads.h"
-// #include "util/tokenbuf.h"
-// #include "util/tracer.h"
-// #include "util/triple.h"
-// #include "util/unicode_categories.h"
-// #include "util/unicode_util.h"
-// #include "util/uri_util.h"
-// #include "util/utf8_string.h"
-// #include "util/utf8_util.h"
-// #include "util/utf8_util_base.h"
-// #include "util/void_int.h"
-// #include "util/xml_util.h"
-// #include "zorbamisc/config/platform.h"
-// //#include "zorbaserialization/archiver.h"
-// #include "zorbaserialization/base64impl.h"
-// #include "zorbaserialization/bin_archiver.h"
-// //#include "zorbaserialization/class_serializer.h"
-// #include "zorbaserialization/mem_archiver.h"
-// #include "zorbaserialization/serialization_engine.h"
-// #include "zorbaserialization/template_serializer.h"
-// #include "zorbaserialization/xml_archiver.h"
-// #include "zorbaserialization/zorba_class_serializer.h"
- #include "zorbatypes/mapm/m_apm_lc.h"
- #include "zorbatypes/datetime/parse.h"
- //#include "zorbatypes/binary.h"
- #include "zorbatypes/chartype.h"
- #include "zorbatypes/collation_manager.h"
- //#include "zorbatypes/datetime.h"
- //#include "zorbatypes/decimal.h"
- //#include "zorbatypes/duration.h"
- //#include "zorbatypes/floatimpl.h"
- #include "zorbatypes/ft_token.h"
- //#include "zorbatypes/integer.h"
- #include "zorbatypes/libicu.h"
- #include "zorbatypes/m_apm.h"
- //#include "zorbatypes/rchandle.h"
- #include "zorbatypes/rclock.h"
- //#include "zorbatypes/regex_ascii.h"
- #include "zorbatypes/schema_types.h"
- #include "zorbatypes/timezone.h"
- #include "zorbatypes/transcoder.h"
- #include "zorbatypes/URI.h"
- #include "zorbatypes/xerces_xmlcharray.h"
- #include "zorbatypes/zorbatypes_decl.h"
- #include "zorbatypes/zstring.h"
- //#include "zorbautils/stemmer/sb_stemmer.h"
- #include "zorbautils/condition.h"
- #include "zorbautils/hashfun.h"
- #include "zorbautils/hashmap.h"
- #include "zorbautils/hashmap_itemp.h"
- #include "zorbautils/hashmap_str_obj.h"
- #include "zorbautils/hashmap_zstring.h"
- #include "zorbautils/hashset.h"
- #include "zorbautils/hashset_itemh.h"
- //#include "zorbautils/icu_tokenizer.h"
- #include "zorbautils/latch.h"
- #include "zorbautils/locale.h"
- #include "zorbautils/lock.h"
- #include "zorbautils/mutex.h"
- #include "zorbautils/runnable.h"
- #include "zorbautils/SAXParser.h"
- #include "zorbautils/stack.h"
-// #include "zorbautils/stemmer.h"
- #include "zorbautils/string_util.h"
- //#include "zorbautils/synchronous_logger.h"
- //#include "zorbautils/tokenizer.h"
- #include "unit_tests/unit_test_list.h"
- #include "zorba/diagnostic_handler.h"
- #include "zorba/xquery_warning.h"
- #include "runtime/full_text/ftcontains_visitor.h"
- #include "store/naive/naive_ft_token_iterator.h"
- #include "store/api/ft_token_iterator.h"
- #include "store/naive/ft_token_store.h"
#endif
/* vim:set et sw=2 ts=2: */
=== modified file 'src/runtime/full_text/CMakeLists.txt'
--- src/runtime/full_text/CMakeLists.txt 2012-03-28 05:19:57 +0000
+++ src/runtime/full_text/CMakeLists.txt 2012-04-11 15:45:21 +0000
@@ -42,11 +42,11 @@
default_tokenizer.cpp
)
-IF (ZORBA_NO_UNICODE)
+IF (ZORBA_NO_ICU)
LIST(APPEND FULLTEXT_SRCS latin_tokenizer.cpp)
-ELSE (ZORBA_NO_UNICODE)
+ELSE (ZORBA_NO_ICU)
LIST(APPEND FULLTEXT_SRCS icu_tokenizer.cpp)
-ENDIF (ZORBA_NO_UNICODE)
+ENDIF (ZORBA_NO_ICU)
ADD_SRC_SUBFOLDER(FULLTEXT_SRCS stemmer LIBSTEMMER_SRCS)
=== modified file 'src/runtime/full_text/default_tokenizer.cpp'
--- src/runtime/full_text/default_tokenizer.cpp 2012-03-28 05:19:57 +0000
+++ src/runtime/full_text/default_tokenizer.cpp 2012-04-11 15:45:21 +0000
@@ -19,22 +19,22 @@
#include <zorba/config.h>
#include "default_tokenizer.h"
-#ifdef ZORBA_NO_UNICODE
+#ifdef ZORBA_NO_ICU
# include "latin_tokenizer.h"
#else
# include "icu_tokenizer.h"
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
namespace zorba {
///////////////////////////////////////////////////////////////////////////////
TokenizerProvider const& default_tokenizer_provider() {
-#ifdef ZORBA_NO_UNICODE
+#ifdef ZORBA_NO_ICU
static LatinTokenizerProvider const instance;
#else
static ICU_TokenizerProvider const instance;
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
return instance;
};
=== modified file 'src/runtime/full_text/latin_tokenizer.cpp'
--- src/runtime/full_text/latin_tokenizer.cpp 2012-03-28 05:19:57 +0000
+++ src/runtime/full_text/latin_tokenizer.cpp 2012-04-11 15:45:21 +0000
@@ -18,8 +18,9 @@
#include <functional>
#include <zorba/diagnostic_list.h>
-#include <zorba/xquery_exception.h>
-#include <zorba/zorba.h>
+
+#include "diagnostics/dict.h"
+#include "diagnostics/xquery_exception.h"
#include "latin_tokenizer.h"
=== modified file 'src/runtime/full_text/latin_tokenizer.h'
--- src/runtime/full_text/latin_tokenizer.h 2012-03-28 05:19:57 +0000
+++ src/runtime/full_text/latin_tokenizer.h 2012-04-11 15:45:21 +0000
@@ -14,12 +14,12 @@
* limitations under the License.
*/
-#ifndef ZORBA_WESTERN_TOKENIZER_H
-#define ZORBA_WESTERN_TOKENIZER_H
+#ifndef ZORBA_LATIN_TOKENIZER_H
+#define ZORBA_LATIN_TOKENIZER_H
#include <zorba/config.h>
-#ifdef ZORBA_NO_FULL_TEXT
+#ifdef ZORBA_NO_ICU
#include <zorba/tokenizer.h>
#include "zorbatypes/zstring.h"
@@ -38,8 +38,8 @@
// inherited
void destroy() const;
- void tokenize( char const*, size_type, iso639_1::type, bool, Callback&,
- void* );
+ void tokenize( char const*, size_type, locale::iso639_1::type, bool,
+ Callback&, void* );
private:
typedef zstring string_type;
@@ -64,13 +64,14 @@
class LatinTokenizerProvider : public TokenizerProvider {
public:
// inherited
- Tokenizer::ptr getTokenizer( iso639_1::type, Tokenizer::Numbers& ) const;
+ Tokenizer::ptr getTokenizer( locale::iso639_1::type,
+ Tokenizer::Numbers& ) const;
};
///////////////////////////////////////////////////////////////////////////////
} // namespace zorba
-#endif /* ZORBA_NO_FULL_TEXT */
-#endif /* ZORBA_WESTERN_TOKENIZER_H */
+#endif /* ZORBA_NO_ICU */
+#endif /* ZORBA_LATIN_TOKENIZER_H */
/* vim:set et sw=2 ts=2: */
=== modified file 'src/runtime/numerics/format_integer_impl.cpp'
--- src/runtime/numerics/format_integer_impl.cpp 2012-03-28 05:19:57 +0000
+++ src/runtime/numerics/format_integer_impl.cpp 2012-04-11 15:45:21 +0000
@@ -881,7 +881,7 @@
utf8_result += (*valueit);
}
else
- utf8_result += (0x2080 + *valueit - '0');
+ utf8_result += (unicode::code_point)(0x2080 + *valueit - '0');
}
}
else if((c0 == 0x2460) || //CIRCLED DIGIT ONE (1-20)
=== modified file 'src/runtime/numerics/numerics_impl.cpp'
--- src/runtime/numerics/numerics_impl.cpp 2012-03-28 05:19:57 +0000
+++ src/runtime/numerics/numerics_impl.cpp 2012-04-11 15:45:21 +0000
@@ -462,7 +462,7 @@
minus( "-" )
{
utf8_string<zstring> u_per_mille( per_mille );
- u_per_mille = 0x2030;
+ u_per_mille = (unicode::code_point)0x2030;
}
void readFormat(const DecimalFormat_t& df_t)
=== modified file 'src/runtime/strings/strings_impl.cpp'
--- src/runtime/strings/strings_impl.cpp 2012-03-28 05:19:57 +0000
+++ src/runtime/strings/strings_impl.cpp 2012-04-11 15:45:21 +0000
@@ -810,7 +810,9 @@
zstring normForm;
zstring resStr;
unicode::normalization::type normType;
+#ifndef ZORBA_NO_ICU
bool success;
+#endif /* ZORBA_NO_ICU */
PlanIteratorState* state;
DEFAULT_STACK_INIT(PlanIteratorState, state, planState);
@@ -860,10 +862,10 @@
}
item0->getStringValue2(resStr);
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
success = utf8::normalize(resStr, normType, &resStr);
ZORBA_ASSERT(success);
-#endif//#ifndef ZORBA_NO_UNICODE
+#endif//#ifndef ZORBA_NO_ICU
STACK_PUSH(GENV_ITEMFACTORY->createString(result, resStr), state );
}
else
@@ -992,7 +994,7 @@
trans_map[ *map_i ] = *trans_i;
for ( ; map_i != map_end; ++map_i )
- trans_map[ *map_i ] = ~0;
+ trans_map[ *map_i ] = static_cast<unicode::code_point>( ~0 );
}
utf8_string<zstring> u_result_string( result_string );
@@ -1007,7 +1009,7 @@
cp_map_type::const_iterator const found_i = trans_map.find( cp );
if ( found_i != trans_map.end() ) {
cp = found_i->second;
- if ( cp == ~0 )
+ if ( cp == static_cast<unicode::code_point>( ~0 ) )
continue;
}
u_result_string += cp;
@@ -1795,16 +1797,33 @@
int &utf8start,
unsigned int &bytestart,
int utf8end,
+ unsigned int byteend,
zstring &out)
{
+#ifndef ZORBA_NO_ICU
utf8::size_type clen;
- while(utf8start < utf8end)
- {
- clen = utf8::char_length(*sin);
- out.append(sin, clen);
- utf8start++;
- bytestart += clen;
- sin += clen;
+ if(utf8end)
+ {
+ while(utf8start < utf8end)
+ {
+ clen = utf8::char_length(*sin);
+ if(clen == 0)
+ clen = 1;
+ out.append(sin, clen);
+ utf8start++;
+ bytestart += clen;
+ sin += clen;
+ }
+ }
+ else
+#endif
+ {
+ if(!utf8end)
+ utf8end = byteend;
+ out.append(sin, utf8end-bytestart);
+ sin += utf8end-bytestart;
+ utf8start = utf8end;
+ bytestart = utf8end;
}
}
@@ -1812,6 +1831,7 @@
int &match_end1,
unsigned int &match_end1_bytes,
int match_start2,
+ unsigned int match_start2_bytes,
const char *&strin)
{
store::Item_t non_match_elem;
@@ -1833,7 +1853,7 @@
// utf8_it++;
// match_end1++;
//}
- copyUtf8Chars(strin, match_end1, match_end1_bytes, match_start2, non_match_str);
+ copyUtf8Chars(strin, match_end1, match_end1_bytes, match_start2, match_start2_bytes, non_match_str);
store::Item_t non_match_text_item;
GENV_ITEMFACTORY->createTextNode(non_match_text_item, non_match_elem, non_match_str);
}
@@ -1864,19 +1884,31 @@
i--;
break;
}
+#ifndef ZORBA_NO_ICU
match_startg = rx.get_match_start(i+1);
if((match_startg < 0) && (gparent < 0))
continue;
+#else
+ int temp_endg;
+ match_startg = -1;
+ temp_endg = -1;
+ if(!rx.get_match_start_end_bytes(i+1, &match_startg, &temp_endg) && (gparent < 0))
+ continue;
+#endif
if(match_endgood < match_startg)
{
//add non-group match text
zstring non_group_str;
- copyUtf8Chars(sin, match_endgood, match_end1_bytes, match_startg, non_group_str);
+ copyUtf8Chars(sin, match_endgood, match_end1_bytes, match_startg, 0, non_group_str);
store::Item_t non_group_text_item;
GENV_ITEMFACTORY->createTextNode(non_group_text_item, parent.getp(), non_group_str);
}
+#ifndef ZORBA_NO_ICU
match_endg = rx.get_match_end(i+1);
+#else
+ match_endg = temp_endg;
+#endif
//add group match text
GENV_ITEMFACTORY->createQName(group_element_name,
static_context::W3C_FN_NS, "fn", "group");
@@ -1907,7 +1939,7 @@
}
zstring group_str;
- copyUtf8Chars(sin, match_startg, match_end1_bytes, match_endg, group_str);
+ copyUtf8Chars(sin, match_startg, match_end1_bytes, match_endg, 0, group_str);
store::Item_t group_text_item;
GENV_ITEMFACTORY->createTextNode(group_text_item, group_elem.getp(), group_str);
}
@@ -1916,7 +1948,7 @@
{
zstring non_group_str;
- copyUtf8Chars(sin, match_endgood, match_end1_bytes, match_end2, non_group_str);
+ copyUtf8Chars(sin, match_endgood, match_end1_bytes, match_end2, 0, non_group_str);
store::Item_t non_group_text_item;
GENV_ITEMFACTORY->createTextNode(non_group_text_item, parent, non_group_str);
}
@@ -2144,8 +2176,14 @@
reachedEnd = false;
while(rx.find_next_match(&reachedEnd))
{
- int match_start2 = rx.get_match_start();
- int match_end2 = rx.get_match_end();
+ int match_start2;
+ int match_end2;
+#ifndef ZORBA_NO_ICU
+ match_start2 = rx.get_match_start();
+ match_end2 = rx.get_match_end();
+#else
+ rx.get_match_start_end_bytes(0, &match_start2, &match_end2);
+#endif
ZORBA_ASSERT(match_start2 >= 0);
if(is_input_stream && reachedEnd && !instream->eof())
@@ -2157,7 +2195,7 @@
//construct the fn:non-match
if(match_start2 > match_end1)
{
- addNonMatchElement(result, match_end1, match_end1_bytes, match_start2, instr);
+ addNonMatchElement(result, match_end1, match_end1_bytes, match_start2, 0, instr);
}
//construct the fn:match
@@ -2165,7 +2203,7 @@
match_end1 = match_end2;
}
- if(is_input_stream && reachedEnd && !instream->eof())
+ if(is_input_stream && !instream->eof())
{
//load some more data, maybe the match will be different
if(match_end1_bytes)
@@ -2213,7 +2251,7 @@
else
{
if(match_end1_bytes < streambuf_read)
- addNonMatchElement(result, match_end1, match_end1_bytes, streambuf_read, instr);
+ addNonMatchElement(result, match_end1, match_end1_bytes, 0, streambuf_read, instr);
if(is_input_stream && instream->eof())
reachedEnd = true;
}
=== modified file 'src/store/api/store.h'
--- src/store/api/store.h 2012-04-10 20:59:34 +0000
+++ src/store/api/store.h 2012-04-11 15:45:21 +0000
@@ -16,7 +16,7 @@
#ifndef ZORBA_STORE_STORE_H
#define ZORBA_STORE_STORE_H
-#include <zorba/config.h>
+#include "zorba/config.h"
#include "zorbatypes/schema_types.h"
#include "store/api/shared_types.h"
=== modified file 'src/store/naive/simple_store.h'
--- src/store/naive/simple_store.h 2012-03-28 23:58:23 +0000
+++ src/store/naive/simple_store.h 2012-04-11 15:45:21 +0000
@@ -16,7 +16,11 @@
#ifndef ZORBA_SIMPLE_STORE
#define ZORBA_SIMPLE_STORE
-#include "store.h"
+#include "store/naive/store.h"
+
+#include "store/naive/node_factory.h"
+#include "store/naive/pul_primitive_factory.h"
+#include "store/naive/tree_id_generator.h"
namespace zorba {
namespace simplestore {
@@ -72,7 +76,7 @@
NodeFactory* createNodeFactory() const;
- void destroyNodeFactory(NodeFactory*) const;
+ void destroyNodeFactory(zorba::simplestore::NodeFactory*) const;
store::ItemFactory* createItemFactory() const;
@@ -84,7 +88,7 @@
PULPrimitiveFactory* createPULFactory() const;
- void destroyPULFactory(PULPrimitiveFactory*) const;
+ void destroyPULFactory(zorba::simplestore::PULPrimitiveFactory*) const;
CollectionSet* createCollectionSet() const;
=== modified file 'src/store/naive/store.h'
--- src/store/naive/store.h 2012-03-28 22:09:36 +0000
+++ src/store/naive/store.h 2012-04-11 15:45:21 +0000
@@ -16,10 +16,18 @@
#ifndef ZORBA_SIMPLESTORE_STORE_H
#define ZORBA_SIMPLESTORE_STORE_H
+#include "store/api/store.h"
+
#include "shared_types.h"
#include "store_defs.h"
#include "hashmap_nodep.h"
#include "tree_id.h"
+#include "store/util/hashmap_stringbuf.h"
+#include "zorbautils/mutex.h"
+#include "zorbautils/lock.h"
+#include "zorbautils/hashmap.h"
+#include "zorbautils/hashmap_itemp.h"
+#include "zorbautils/hashmap_zstring_nonserializable.h"
#if (defined (WIN32) || defined (WINCE))
#include "node_items.h"
@@ -28,14 +36,7 @@
#include "store/api/ic.h"
#endif
-#include "store/api/store.h"
-
-#include "store/util/hashmap_stringbuf.h"
-
-#include "zorbautils/mutex.h"
-#include "zorbautils/lock.h"
-#include "zorbautils/hashmap_itemp.h"
-#include "zorbautils/hashmap_zstring_nonserializable.h"
+using namespace zorba;
namespace zorba
{
@@ -63,9 +64,9 @@
class TreeIdGeneratorFactory;
class TreeIdGenerator;
-typedef zorba::HashMapZString<XmlNode_t> DocumentSet;
-typedef ItemPointerHashMap<store::Index_t> IndexSet;
-typedef ItemPointerHashMap<store::IC_t> ICSet;
+typedef HashMapZString<XmlNode_t> DocumentSet;
+typedef zorba::ItemPointerHashMap<store::Index_t> IndexSet;
+typedef zorba::ItemPointerHashMap<store::IC_t> ICSet;
=== modified file 'src/system/globalenv.cpp'
--- src/system/globalenv.cpp 2012-03-28 05:19:57 +0000
+++ src/system/globalenv.cpp 2012-04-11 15:45:21 +0000
@@ -17,11 +17,11 @@
#include "common/common.h"
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
# include <unicode/uclean.h>
# include <unicode/utypes.h>
# include <unicode/udata.h>
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
#ifdef ZORBA_WITH_BIG_INTEGER
# include "zorbatypes/m_apm.h"
@@ -208,7 +208,7 @@
// from one thread only
// see http://www.icu-project.org/userguide/design.html#Init_and_Termination
// and http://www.icu-project.org/apiref/icu4c/uclean_8h.html
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
# if defined U_STATIC_IMPLEMENTATION && (defined WIN32 || defined WINCE)
{
TCHAR self_path[1024];
@@ -238,13 +238,13 @@
udata_setCommonData(icu_appdata, &data_err);
ZORBA_ASSERT(data_err == U_ZERO_ERROR);
- // u_setDataDirectory(self_path);
+ // u_setDataDirectory(self_path);
}
# endif
UErrorCode lICUInitStatus = U_ZERO_ERROR;
u_init(&lICUInitStatus);
ZORBA_ASSERT(lICUInitStatus == U_ZERO_ERROR);
-#endif//ifndef ZORBA_NO_UNICODE
+#endif /* ZORBA_NO_ICU */
}
@@ -256,12 +256,12 @@
// releases statically initialized memory and prevents
// valgrind from reporting those problems at the end
// see http://www.icu-project.org/apiref/icu4c/uclean_8h.html#93f27d0ddc7c196a1da864763f2d8920
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
u_cleanup();
# if defined U_STATIC_IMPLEMENTATION && (defined WIN32 || defined WINCE)
delete[] icu_appdata;
# endif
-#endif//ifndef ZORBA_NO_UNICODE
+#endif /* ZORBA_NO_ICU */
}
=== modified file 'src/unit_tests/CMakeLists.txt'
--- src/unit_tests/CMakeLists.txt 2012-03-28 05:19:57 +0000
+++ src/unit_tests/CMakeLists.txt 2012-04-11 15:45:21 +0000
@@ -29,9 +29,9 @@
tokenizer.cpp)
ENDIF (NOT ZORBA_NO_FULL_TEXT)
-IF (NOT ZORBA_NO_UNICODE)
+IF (NOT ZORBA_NO_ICU)
LIST (APPEND UNIT_TEST_SRCS
test_icu_streambuf.cpp)
-ENDIF (NOT ZORBA_NO_UNICODE)
+ENDIF (NOT ZORBA_NO_ICU)
# vim:set et sw=2 tw=2:
=== modified file 'src/unit_tests/string.cpp'
--- src/unit_tests/string.cpp 2012-03-28 05:19:57 +0000
+++ src/unit_tests/string.cpp 2012-04-11 15:45:21 +0000
@@ -569,6 +569,7 @@
ASSERT_TRUE( t == s );
}
+#ifndef ZORBA_NO_ICU
template<class StringType>
static void test_to_string_from_wchar_t() {
wchar_t const w[] = L"hello";
@@ -578,6 +579,7 @@
for ( string::size_type i = 0; i < s.length(); ++i )
ASSERT_TRUE( s[i] == w[i] );
}
+#endif /* ZORBA_NO_ICU */
template<class StringType>
static void test_to_upper() {
@@ -605,6 +607,7 @@
}
}
+#ifndef ZORBA_NO_ICU
static void test_to_wchar_t() {
string const s = "hello";
wchar_t *w;
@@ -616,6 +619,7 @@
ASSERT_TRUE( w[i] == s[i] );
delete[] w;
}
+#endif /* ZORBA_NO_ICU */
static void test_trim_start() {
char const *s;
@@ -873,16 +877,20 @@
test_to_string_from_utf8<zstring>();
test_to_string_from_utf8<zstring_p>();
+#ifndef ZORBA_NO_ICU
test_to_string_from_wchar_t<string>();
test_to_string_from_wchar_t<zstring>();
test_to_string_from_wchar_t<zstring_p>();
+#endif /* ZORBA_NO_ICU */
test_to_upper<string>();
test_to_upper<zstring>();
test_to_upper<zstring_p>();
test_to_upper<String>();
+#ifndef ZORBA_NO_ICU
test_to_wchar_t();
+#endif /* ZORBA_NO_ICU */
test_trim_start();
test_trim_end();
=== modified file 'src/unit_tests/unit_test_list.h'
--- src/unit_tests/unit_test_list.h 2012-03-28 05:19:57 +0000
+++ src/unit_tests/unit_test_list.h 2012-04-11 15:45:21 +0000
@@ -36,9 +36,9 @@
/**
* ADD NEW UNIT TESTS HERE
*/
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
int test_icu_streambuf( int, char*[] );
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
int json_parser( int, char*[] );
void initializeTestList();
=== modified file 'src/unit_tests/unit_tests.cpp'
--- src/unit_tests/unit_tests.cpp 2012-03-28 05:19:57 +0000
+++ src/unit_tests/unit_tests.cpp 2012-04-11 15:45:21 +0000
@@ -39,9 +39,9 @@
void initializeTestList() {
libunittests["string"] = test_string;
libunittests["uri"] = runUriTest;
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
libunittests["icu_streambuf"] = test_icu_streambuf;
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
libunittests["json_parser"] = json_parser;
libunittests["unique_ptr"] = test_unique_ptr;
#ifndef ZORBA_NO_FULL_TEXT
=== modified file 'src/util/CMakeLists.txt'
--- src/util/CMakeLists.txt 2012-03-28 05:19:57 +0000
+++ src/util/CMakeLists.txt 2012-04-11 15:45:21 +0000
@@ -40,14 +40,14 @@
LIST(APPEND UTIL_SRCS mmap_file.cpp)
ENDIF(ZORBA_WITH_FILE_ACCESS)
-IF(ZORBA_NO_UNICODE)
+IF(ZORBA_NO_ICU)
LIST(APPEND UTIL_SRCS
- regex_ascii.cpp
+ regex_xquery.cpp
passthru_streambuf.cpp)
-ELSE(ZORBA_NO_UNICODE)
+ELSE(ZORBA_NO_ICU)
LIST(APPEND UTIL_SRCS
icu_streambuf.cpp)
-ENDIF(ZORBA_NO_UNICODE)
+ENDIF(ZORBA_NO_ICU)
HEADER_GROUP_SUBFOLDER(UTIL_SRCS fx)
HEADER_GROUP_SUBFOLDER(UTIL_SRCS win32)
=== modified file 'src/util/icu_streambuf.h'
--- src/util/icu_streambuf.h 2012-02-04 01:26:18 +0000
+++ src/util/icu_streambuf.h 2012-04-11 15:45:21 +0000
@@ -17,6 +17,7 @@
#ifndef ZORBA_ICU_STREAMBUF_H
#define ZORBA_ICU_STREAMBUF_H
+#include <unicode/ucnv.h>
#include <zorba/transcode_stream.h>
#include "util/utf8_util.h"
=== modified file 'src/util/passthru_streambuf.cpp'
--- src/util/passthru_streambuf.cpp 2012-02-04 01:26:18 +0000
+++ src/util/passthru_streambuf.cpp 2012-04-11 15:45:21 +0000
@@ -14,8 +14,8 @@
* limitations under the License.
*/
+#include "stdafx.h"
#include "passthru_streambuf.h"
-
using namespace std;
namespace zorba {
@@ -47,7 +47,7 @@
}
bool passthru_streambuf::is_supported( char const *cc_charset ) {
- return !is_necessary( charset );
+ return !is_necessary( cc_charset );
}
passthru_streambuf::pos_type
=== modified file 'src/util/passthru_streambuf.h'
--- src/util/passthru_streambuf.h 2012-02-02 18:37:24 +0000
+++ src/util/passthru_streambuf.h 2012-04-11 15:45:21 +0000
@@ -17,8 +17,9 @@
#ifndef ZORBA_PASSTHRU_STREAMBUF_H
#define ZORBA_PASSTHRU_STREAMBUF_H
-#include <zorba/transcode_streambuf.h>
-
+#include <zorba/transcode_stream.h>
+#include "zorbatypes/zstring.h"
+#include "util/ascii_util.h"
namespace zorba {
///////////////////////////////////////////////////////////////////////////////
@@ -48,6 +49,13 @@
* @return \c true only if the character encoding is supported.
*/
static bool is_supported( char const *charset );
+ static bool is_necessary( char const *cc_charset );
+
+ typedef std::streambuf::char_type char_type;
+ typedef std::streambuf::int_type int_type;
+ typedef std::streambuf::off_type off_type;
+ typedef std::streambuf::pos_type pos_type;
+ typedef std::streambuf::traits_type traits_type;
protected:
void imbue( std::locale const& );
=== modified file 'src/util/regex.cpp'
--- src/util/regex.cpp 2012-03-28 05:19:57 +0000
+++ src/util/regex.cpp 2012-04-11 15:45:21 +0000
@@ -15,8 +15,6 @@
*/
#include "stdafx.h"
-#include "regex.h"
-
#include <cstring>
#include <vector>
@@ -28,13 +26,13 @@
#include "ascii_util.h"
#include "cxx_util.h"
+#include "regex.h"
#include "stl_util.h"
#define INVALID_RE_EXCEPTION(...) \
XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS( __VA_ARGS__ ) )
-
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
# include <unicode/uversion.h>
U_NAMESPACE_USE
@@ -103,6 +101,7 @@
bool got_backslash = false;
bool in_char_class = false; // within [...]
+ bool is_first_char = true; // to check ^ placement
bool in_backref = false; // '\'[1-9][0-9]*
unsigned backref_no = 0; // 1-based
@@ -231,6 +230,8 @@
++open_cap_subs;
cap_sub.push_back( true );
cur_cap_sub = cap_sub.size();
+ is_first_char = true;
+ goto append;
}
break;
case ')':
@@ -245,8 +246,10 @@
case '[':
if ( q_flag )
*icu_re += '\\';
- else
+ else {
in_char_class = true;
+ goto append;
+ }
break;
case ']':
if ( q_flag )
@@ -254,6 +257,19 @@
else
in_char_class = false;
break;
+ case '^':
+ if ( q_flag )
+ *icu_re += '\\';
+ else if ( !is_first_char && !in_char_class )
+ throw INVALID_RE_EXCEPTION( xq_re, ZED( UnescapedChar_3 ), *xq_c );
+ break;
+ case '|':
+ if ( q_flag )
+ *icu_re += '\\';
+ else {
+ is_first_char = true;
+ goto append;
+ }
default:
if ( x_flag && ascii::is_space( *xq_c ) ) {
if ( !in_char_class )
@@ -265,37 +281,42 @@
//
*icu_re += '\\';
}
- }
- }
+ } // switch
+ } // else
+ is_first_char = false;
+append:
*icu_re += *xq_c;
} // FOR_EACH
- if ( i_flag ) {
- //
- // XQuery 3.0 F&O 5.6.1.1: All other constructs are unaffected by the "i"
- // flag. For example, "\p{Lu}" continues to match upper-case letters only.
- //
- // However, ICU lower-cases everything for the 'i' flag; hence we have to
- // turn off the 'i' flag for just the \p{Lu}.
- //
- // Note that the "6" and "12" below are correct since "\\" represents a
- // single '\'.
- //
- ascii::replace_all( *icu_re, "\\p{Lu}", 6, "(?-i:\\p{Lu})", 12 );
- }
+ if ( !q_flag ) {
+ if ( i_flag ) {
+ //
+ // XQuery 3.0 F&O 5.6.1.1: All other constructs are unaffected by the "i"
+ // flag. For example, "\p{Lu}" continues to match upper-case letters
+ // only.
+ //
+ // However, ICU lower-cases everything for the 'i' flag; hence we have to
+ // turn off the 'i' flag for just the \p{Lu}.
+ //
+ // Note that the "6" and "12" below are correct since "\\" represents a
+ // single '\'.
+ //
+ ascii::replace_all( *icu_re, "\\p{Lu}", 6, "(?-i:\\p{Lu})", 12 );
+ }
- //
- // XML Schema Part 2 F.1.1: [Unicode Database] groups code points into a
- // number of blocks such as Basic Latin (i.e., ASCII), Latin-1 Supplement,
- // Hangul Jamo, CJK Compatibility, etc. The set containing all characters
- // that have block name X (with all white space stripped out), can be
- // identified with a block escape \p{IsX}.
- //
- // However, ICU uses \p{InX} rather than \p{IsX}.
- //
- // Note that the "5" below is correct since "\\" represents a single '\'.
- //
- ascii::replace_all( *icu_re, "\\p{Is", 5, "\\p{In", 5 );
+ //
+ // XML Schema Part 2 F.1.1: [Unicode Database] groups code points into a
+ // number of blocks such as Basic Latin (i.e., ASCII), Latin-1 Supplement,
+ // Hangul Jamo, CJK Compatibility, etc. The set containing all characters
+ // that have block name X (with all white space stripped out), can be
+ // identified with a block escape \p{IsX}.
+ //
+ // However, ICU uses \p{InX} rather than \p{IsX}.
+ //
+ // Note that the "5" below is correct since "\\" represents a single '\'.
+ //
+ ascii::replace_all( *icu_re, "\\p{Is", 5, "\\p{In", 5 );
+ } // q_flag
}
///////////////////////////////////////////////////////////////////////////////
@@ -442,11 +463,11 @@
}
} // namespace unicode
-
-}//namespace zorba
-
-
-#else /* ZORBA_NO_UNICODE */
+} // namespace zorba
+
+///////////////////////////////////////////////////////////////////////////////
+
+#else /* ZORBA_NO_ICU */
#include "zorbatypes/zstring.h"
@@ -470,7 +491,7 @@
case 'i': flags |= REGEX_ASCII_CASE_INSENSITIVE; break;
case 's': flags |= REGEX_ASCII_DOTALL; break;
case 'm': flags |= REGEX_ASCII_MULTILINE; break;
- case 'x': flags |= REGEX_ASCII_COMMENTS; break;
+ case 'x': flags |= REGEX_ASCII_NO_WHITESPACE; break;
case 'q': flags |= REGEX_ASCII_LITERAL; break;
default:
throw XQUERY_EXCEPTION( err::FORX0001, ERROR_PARAMS( *p ) );
@@ -483,6 +504,7 @@
void regex::compile( char const *pattern, char const *flags)
{
parsed_flags = parse_regex_flags(flags);
+ regex_xquery::CRegexXQuery_parser regex_parser;
regex_matcher = regex_parser.parse(pattern, parsed_flags);
if(!regex_matcher)
throw INVALID_RE_EXCEPTION(pattern);
@@ -517,6 +539,8 @@
bool regex::next_token( char const *s, size_type *pos, zstring *token,
bool *matched)
{
+ if(!s[*pos])
+ return false;
bool retval;
int match_pos;
int matched_len;
@@ -528,14 +552,8 @@
token->assign(s+*pos, match_pos);
*pos += match_pos + matched_len;
if(matched)
- if(match_pos)
- *matched = true;
- else
- *matched = false;
- if(match_pos)
- return true;
- else
- return false;
+ *matched = true;
+ return true;
}
else
{
@@ -544,7 +562,7 @@
*pos += strlen(s+*pos);
if(matched)
*matched = false;
- return s[*pos] != 0;
+ return true;
}
}
@@ -554,13 +572,9 @@
int matched_pos;
int matched_len;
- bool prev_align = regex_matcher->set_align_begin(true);
- retval = regex_matcher->match_from(s, parsed_flags, &matched_pos, &matched_len);
- regex_matcher->set_align_begin(prev_align);
+ retval = regex_matcher->match_anywhere(s, parsed_flags|REGEX_ASCII_WHOLE_MATCH, &matched_pos, &matched_len);
if(!retval)
return false;
- if(matched_len != strlen(s))
- return false;
return true;
}
@@ -587,14 +601,19 @@
//look for dollars
if(*temprepl == '\\')
{
- temprepl++;
- if(!*temprepl || (*temprepl != '\\') || (*temprepl != '$'))//Invalid replacement string.
- throw XQUERY_EXCEPTION( err::FORX0004, ERROR_PARAMS( replacement ) );
+ if(!(parsed_flags & REGEX_ASCII_LITERAL))
+ {
+ temprepl++;
+ if(!*temprepl)
+ temprepl--;
+ else if((*temprepl != '\\') && (*temprepl != '$'))//Invalid replacement string.
+ throw XQUERY_EXCEPTION( err::FORX0004, ERROR_PARAMS( replacement ) );
+ }
result->append(1, *temprepl);
temprepl++;
continue;
}
- if(*temprepl == '$')
+ if((*temprepl == '$') && !(parsed_flags & REGEX_ASCII_LITERAL))
{
temprepl++;
index = 0;
@@ -648,7 +667,7 @@
if(retval)
{
m_match_pos += m_pos;
- m_pos = m_match_pos = m_matched_len;
+ m_pos = m_match_pos + m_matched_len;
}
else
{
@@ -666,35 +685,30 @@
return (int)regex_matcher->get_indexed_regex_count();
}
-int regex::get_match_start( int groupId )
-{
- if(groupId == 0)
- return m_match_pos;
- if(groupId > (int)regex_matcher->get_indexed_regex_count())
- return -1;
- const char *submatched_source;
- int submatched_len;
- if(!regex_matcher->get_indexed_match(groupId, &submatched_source, &submatched_len))
- return -1;
- return submatched_source - s_in_.c_str();
-}
-
-int regex::get_match_end( int groupId )
-{
- if(groupId == 0)
- return m_match_pos + m_matched_len;
- if(groupId > (int)regex_matcher->get_indexed_regex_count())
- return -1;
- const char *submatched_source;
- int submatched_len;
- if(!regex_matcher->get_indexed_match(groupId, &submatched_source, &submatched_len))
- return -1;
- return submatched_source - s_in_.c_str() + submatched_len;
+bool regex::get_match_start_end_bytes( int groupId, int *start, int *end )
+{
+ *start = -1;
+ *end = -1;
+ if(groupId == 0)
+ {
+ *start = m_match_pos;
+ *end = m_match_pos + m_matched_len;
+ return true;
+ }
+ if(groupId > (int)regex_matcher->get_indexed_regex_count())
+ return false;
+ const char *submatched_source;
+ int submatched_len;
+ if(!regex_matcher->get_indexed_match(groupId, &submatched_source, &submatched_len))
+ return false;
+ *start = submatched_source - s_in_.c_str();
+ *end = *start + submatched_len;
+ return true;
}
} // namespace unicode
} // namespace zorba
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
///////////////////////////////////////////////////////////////////////////////
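Review note on the replacement-string hunk in regex.cpp above: the old test
(*temprepl != '\\') || (*temprepl != '$') holds for every character, because no
character can equal both '\\' and '$'. As far as the diff shows, that means every
backslash escape in a replacement string raised FORX0004, including the valid
\\ and \$. The sketch below isolates the two conditions. The helper names
old_rejects and new_rejects are hypothetical and exist only for this
illustration; they are not Zorba functions, and the sketch does not model the
separate handling of a trailing backslash or of the q flag.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; neither name exists in Zorba.

// Condition as written before the patch. It is true for every c, because
// c cannot be equal to both '\\' and '$' at the same time.
static bool old_rejects(char c) {
  return !c || (c != '\\') || (c != '$');
}

// Condition as written in the patch (the end-of-string case is handled
// separately there). It rejects only characters that are neither '\\' nor '$'.
static bool new_rejects(char c) {
  return (c != '\\') && (c != '$');
}
```

With these definitions, old_rejects returns true for '\\', '$' and 'a' alike,
while new_rejects returns true only for 'a'.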
=== modified file 'src/util/regex.h'
--- src/util/regex.h 2012-03-28 05:19:57 +0000
+++ src/util/regex.h 2012-04-11 15:45:21 +0000
@@ -17,15 +17,13 @@
#ifndef ZORBA_REGEX_H
#define ZORBA_REGEX_H
-#ifndef ZORBA_NO_UNICODE
-#include <unicode/regex.h>
-#endif
-
#include "cxx_util.h"
#include "unicode_util.h"
#include "zorbatypes/zstring.h"
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
+
+#include <unicode/regex.h>
namespace zorba {
@@ -496,15 +494,17 @@
} // namespace unicode
} // namespace zorba
-#else ///ZORBA_NO_UNICODE (ascii part:)
-
-#include "util/regex_ascii.h"
+///////////////////////////////////////////////////////////////////////////////
+
+#else /* ZORBA_NO_ICU */
+
+#include "util/regex_xquery.h"
#include <string>
namespace zorba{
/**
* Converts an XQuery regular expression to the form used by the regular
- * expression library Zorba is using (here regex_ascii).
+ * expression library Zorba is using (here regex_xquery).
*
* @param xq_re The XQuery regular expression.
* @param lib_re A pointer to the resuling library regular expression.
@@ -525,7 +525,7 @@
/**
* Constructs a %regex.
*/
- regex() : regex_matcher( NULL ) { }
+ regex() : regex_matcher( nullptr ) { }
/**
* Destroys a %regex.
@@ -835,31 +835,21 @@
/**
* Get the start position of the matched group.
- * If groupId is zero, then the start position of the whole match is returned.
- * If groupId is non-zero, then the start position of that group is returned.
- * If that group has not been matched, -1 is returned.
+ * If groupId is zero, then the start and end position of the whole match is returned.
+ * If groupId is non-zero, then the start and end position of that group is returned.
+ * If that group has not been matched, false is returned.
*
* @param groupId the id of the group, either zero for the entire regex,
* or [1 .. group_count] for that specific group
- * @return the start position, zero based, or -1 if that group didn't match
+ * @param start to return start position in bytes
+ * @param end to return end position in bytes
+ * @return true if that group exists and has been matched
*/
- int get_match_start( int groupId = 0 );
+ bool get_match_start_end_bytes( int groupId, int *start, int *end );
- /**
- * Get the end position of the matched group.
- * If groupId is zero, then the end position of the whole match is returned.
- * If groupId is non-zero, then the end position of that group is returned.
- * If that group has not been matched, -1 is returned.
- *
- * @param groupId the id of the group, either zero for the entire regex,
- * or [1 .. group_count] for that specific group
- * @return the end position, zero based, or -1 if that group didn't match
- */
- int get_match_end( int groupId = 0 );
private:
- regex_ascii::CRegexAscii_parser regex_parser;
- regex_ascii::CRegexAscii_regex *regex_matcher;
+ regex_xquery::CRegexXQuery_regex *regex_matcher;
uint32_t parsed_flags;
zstring s_in_;
@@ -873,15 +863,13 @@
regex( regex const& );
regex& operator=( regex const& );
};
+
+///////////////////////////////////////////////////////////////////////////////
+
} // namespace unicode
} // namespace zorba
-#endif /* ZORBA_NO_UNICODE */
-
-
-///////////////////////////////////////////////////////////////////////////////
-
-
+#endif /* ZORBA_NO_ICU */
#endif /* ZORBA_REGEX_H */
/*
* Local variables:
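Review note on the regex.h hunk above: get_match_start() and get_match_end()
are merged into one call that reports success through its return value and
the byte offsets through out-parameters. A minimal sketch of how a caller
might consume that shape follows. The struct match_span_demo and the function
matched_bytes are hypothetical stand-ins written for this note; they copy only
the signature and the groupId == 0 behavior visible in the diff, and they do
not model sub-groups or the real regex class.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in; mirrors only the signature shown in the diff.
struct match_span_demo {
  int match_pos;    // byte offset of the whole match
  int matched_len;  // byte length of the whole match

  bool get_match_start_end_bytes(int groupId, int *start, int *end) const {
    *start = -1;
    *end = -1;
    if (groupId != 0)
      return false;  // sub-groups are not modeled in this sketch
    *start = match_pos;
    *end = match_pos + matched_len;
    return true;
  }
};

// Hypothetical caller: returns the matched bytes of the given group,
// or an empty string when the group does not exist or did not match.
static std::string matched_bytes(const match_span_demo &m,
                                 const std::string &input, int groupId) {
  int start, end;
  if (!m.get_match_start_end_bytes(groupId, &start, &end))
    return std::string();
  return input.substr(start, end - start);
}
```

The caller checks the boolean result before reading start and end, which
replaces the earlier convention of comparing each position against -1.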
=== renamed file 'src/util/regex_ascii.cpp' => 'src/util/regex_xquery.cpp'
--- src/util/regex_ascii.cpp 2012-03-28 05:19:57 +0000
+++ src/util/regex_xquery.cpp 2012-04-11 15:45:21 +0000
@@ -1,4 +1,4 @@
-a/*
+/*
* Copyright 2006-2008 The FLWOR Foundation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
@@ -18,12 +18,15 @@
#include "diagnostics/xquery_diagnostics.h"
-#include "regex_ascii.h"
+#include "regex_xquery.h"
#include <string.h>
#include "zorbatypes/chartype.h"
+#include "util/unicode_categories.h"
+#include "util/ascii_util.h"
+#include "util/utf8_string.h"
namespace zorba {
- namespace regex_ascii{
+ namespace regex_xquery{
//ascii regular expression matching
/*http://www.w3.org/TR/xmlschema-2/#regexs
@@ -62,96 +65,138 @@
+ http://www.w3.org/TR/xquery-operators/#regex-syntax (not implemented)
*/
+
+static bool compare_ascii_i(const char *str1, const char *str2)
+{
+ while(*str1 && *str2)
+ {
+ if(ascii::to_lower(*str1) != ascii::to_lower(*str2))
+ return false;
+ str1++;
+ str2++;
+ }
+ if(*str1 || *str2)
+ return false;
+ return true;
+}
+
+static bool compare_unicode_ni(const char *str1, const char *str2, int len)
+{
+ while(len > 0)
+ {
+ const char *temp_str1 = str1;
+ const char *temp_str2 = str2;
+ unicode::code_point cp1 = unicode::to_upper(utf8::next_char(temp_str1));
+ unicode::code_point cp2 = unicode::to_upper(utf8::next_char(temp_str2));
+ if(cp1 != cp2)
+ return false;
+ len -= temp_str1-str1;
+ str1 = temp_str1;
+ str2 = temp_str2;
+ }
+ return true;
+}
+static utf8::size_type myutf8len(const char *source)
+{
+ utf8::size_type len = utf8::char_length(*source);
+ if(!len)
+ return 1;
+ else
+ return len;
+}
////////////////////////////////////
////Regular expression parsing and building of the tree
////////////////////////////////////
-CRegexAscii_regex* CRegexAscii_parser::parse(const char *pattern, unsigned int flags)
+CRegexXQuery_regex* CRegexXQuery_parser::parse(const char *pattern, unsigned int flags)
{
this->flags = flags;
- bool align_begin = false;
- if(!(flags & REGEX_ASCII_LITERAL) && (pattern[0] == '^'))
- align_begin = true;
-
int regex_len;
- CRegexAscii_regex* regex = parse_regexp(pattern + (align_begin?1:0), &regex_len);
+ CRegexXQuery_regex* regex = parse_regexp(pattern, &regex_len);
- if(regex)
- regex->set_align_begin(align_begin);
-
return regex;
}
//until '\0' or ')'
-CRegexAscii_regex* CRegexAscii_parser::parse_regexp(const char *pattern,
+CRegexXQuery_regex* CRegexXQuery_parser::parse_regexp(const char *pattern,
int *regex_len)
{
*regex_len = 0;
int branch_len;
regex_depth++;
- CRegexAscii_regex *regex = new CRegexAscii_regex(current_regex);
+ std::auto_ptr<CRegexXQuery_regex> regex(new CRegexXQuery_regex(current_regex));
if(!current_regex)
- current_regex = regex;
+ current_regex = regex.get();
if(regex_depth >= 2)
{
//mark this as group if it does not start with ?:
if(pattern[0] != '?' || pattern[1] != ':')
- current_regex->subregex.push_back(regex);
+ current_regex->subregex.push_back(regex.get());
else
*regex_len = 2;
}
- CRegexAscii_branch *branch;
+ CRegexXQuery_branch *branch;
+ bool must_read_another_branch = true;
while(pattern[*regex_len] && (pattern[*regex_len] != ')'))
{
branch = parse_branch(pattern+*regex_len, &branch_len);
if(!branch)
{
regex_depth--;
- delete regex;
return NULL;
}
regex->add_branch(branch);
*regex_len += branch_len;
+ if(pattern[*regex_len] == '|')
+ (*regex_len)++;
+ else
+ must_read_another_branch = false;
}
- if((current_regex == regex) && (pattern[*regex_len] == ')'))
+ if((current_regex == regex.get()) && (pattern[*regex_len] == ')'))
{
- throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(U_REGEX_MISMATCHED_PAREN)) );
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_MISMATCHED_PAREN)) );
}
if(pattern[*regex_len])
(*regex_len)++;
+ if(must_read_another_branch)
+ regex->add_branch(new CRegexXQuery_branch(current_regex));//add empty branch
regex->flags = 0;//finished initialization
regex_depth--;
- return regex;
+ return regex.release();
}
-CRegexAscii_branch* CRegexAscii_parser::parse_branch(const char *pattern, int *branch_len)
+CRegexXQuery_branch* CRegexXQuery_parser::parse_branch(const char *pattern, int *branch_len)
{
int piece_len;
- CRegexAscii_branch *branch = new CRegexAscii_branch(current_regex);
- CRegexAscii_piece *piece;
+ std::auto_ptr<CRegexXQuery_branch> branch(new CRegexXQuery_branch(current_regex));
+ CRegexXQuery_piece *piece;
*branch_len = 0;
while(pattern[*branch_len] && (pattern[*branch_len] != '|') && (pattern[*branch_len] != ')'))
{
piece = parse_piece(pattern+*branch_len, &piece_len);
if(!piece)
{
- delete branch;
return NULL;
}
+ if(branch->piece_list.size() && dynamic_cast<CRegexXQuery_pinstart*>(piece->atom))
+ {
+ //found ^ that is not at the beginning of branch
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_INVALID_ATOM_CHAR), '^') );
+ }
branch->add_piece(piece);
*branch_len += piece_len;
}
- if(pattern[*branch_len] == '|')
- (*branch_len)++;
- return branch;
+ //if(pattern[*branch_len] == '|')
+ // (*branch_len)++;
+ return branch.release();
}
//piece = atom + quantifier
-CRegexAscii_piece* CRegexAscii_parser::parse_piece(const char *pattern, int *piece_len)
+CRegexXQuery_piece* CRegexXQuery_parser::parse_piece(const char *pattern, int *piece_len)
{
- CRegexAscii_piece *piece = new CRegexAscii_piece;
+ std::auto_ptr<CRegexXQuery_piece> piece(new CRegexXQuery_piece);
IRegexAtom *atom;
*piece_len = 0;
@@ -160,19 +205,18 @@
atom = read_atom(pattern, &atom_len);
if(!atom)
{
- delete piece;
return NULL;
}
piece->set_atom(atom);
if(!(flags & REGEX_ASCII_LITERAL))
- read_quantifier(piece, pattern+atom_len, &quantif_len);
+ read_quantifier(piece.get(), pattern+atom_len, &quantif_len);
*piece_len += atom_len + quantif_len;
- return piece;
+ return piece.release();
}
-char CRegexAscii_parser::myishex(char c)
+char CRegexXQuery_parser::myishex(char c)
{
if((c >= '0') && (c <= '9'))
return c-'0'+1;
@@ -183,26 +227,125 @@
return 0;//not a hex
}
-bool CRegexAscii_parser::myisdigit(char c)
-{
- return (c >= '0') || (c <= '9');
-}
-
-char CRegexAscii_parser::readChar(const char *pattern, int *char_len, bool *is_multichar)
+bool CRegexXQuery_parser::myisdigit(char c)
+{
+ return (c >= '0') && (c <= '9');
+}
+
+bool CRegexXQuery_parser::myisletterAZ(char c)
+{
+ return ((c >= 'a') && (c <= 'z')) || ((c >= 'A') && (c <= 'Z'));
+}
+
+static const unicode::code_point specials_extcp[] = {0xFFF0, 0xFFFD, 0};
+
+static CRegexXQuery_parser::block_escape_t block_escape[] =
+{
+{{0x0000, 0x007F}, NULL, "BasicLatin"},
+{{0x0080, 0x00FF}, NULL, "Latin-1Supplement"},
+{{0x0100, 0x017F}, NULL, "LatinExtended-A"},
+{{0x0180, 0x024F}, NULL, "LatinExtended-B"},
+{{0x0250, 0x02AF}, NULL, "IPAExtensions"},
+{{0x02B0, 0x02FF}, NULL, "SpacingModifierLetters"},
+{{0x0300, 0x036F}, NULL, "CombiningDiacriticalMarks"},
+{{0x0370, 0x03FF}, NULL, "Greek"},
+{{0x0400, 0x04FF}, NULL, "Cyrillic"},
+{{0x0530, 0x058F}, NULL, "Armenian"},
+{{0x0590, 0x05FF}, NULL, "Hebrew"},
+{{0x0600, 0x06FF}, NULL, "Arabic"},
+{{0x0700, 0x074F}, NULL, "Syriac"},
+{{0x0780, 0x07BF}, NULL, "Thaana"},
+{{0x0900, 0x097F}, NULL, "Devanagari"},
+{{0x0980, 0x09FF}, NULL, "Bengali"},
+{{0x0A00, 0x0A7F}, NULL, "Gurmukhi"},
+{{0x0A80, 0x0AFF}, NULL, "Gujarati"},
+{{0x0B00, 0x0B7F}, NULL, "Oriya"},
+{{0x0B80, 0x0BFF}, NULL, "Tamil"},
+{{0x0C00, 0x0C7F}, NULL, "Telugu"},
+{{0x0C80, 0x0CFF}, NULL, "Kannada"},
+{{0x0D00, 0x0D7F}, NULL, "Malayalam"},
+{{0x0D80, 0x0DFF}, NULL, "Sinhala"},
+{{0x0E00, 0x0E7F}, NULL, "Thai"},
+{{0x0E80, 0x0EFF}, NULL, "Lao"},
+{{0x0F00, 0x0FFF}, NULL, "Tibetan"},
+{{0x1000, 0x109F}, NULL, "Myanmar"},
+{{0x10A0, 0x10FF}, NULL, "Georgian"},
+{{0x1100, 0x11FF}, NULL, "HangulJamo"},
+{{0x1200, 0x137F}, NULL, "Ethiopic"},
+{{0x13A0, 0x13FF}, NULL, "Cherokee"},
+{{0x1400, 0x167F}, NULL, "UnifiedCanadianAboriginalSyllabics"},
+{{0x1680, 0x169F}, NULL, "Ogham"},
+{{0x16A0, 0x16FF}, NULL, "Runic"},
+{{0x1780, 0x17FF}, NULL, "Khmer"},
+{{0x1800, 0x18AF}, NULL, "Mongolian"},
+{{0x1E00, 0x1EFF}, NULL, "LatinExtendedAdditional"},
+{{0x1F00, 0x1FFF}, NULL, "GreekExtended"},
+{{0x2000, 0x206F}, NULL, "GeneralPunctuation"},
+{{0x2070, 0x209F}, NULL, "SuperscriptsandSubscripts"},
+{{0x20A0, 0x20CF}, NULL, "CurrencySymbols"},
+{{0x20D0, 0x20FF}, NULL, "CombiningMarksforSymbols"},
+{{0x2100, 0x214F}, NULL, "LetterlikeSymbols"},
+{{0x2150, 0x218F}, NULL, "NumberForms"},
+{{0x2190, 0x21FF}, NULL, "Arrows"},
+{{0x2200, 0x22FF}, NULL, "MathematicalOperators"},
+{{0x2300, 0x23FF}, NULL, "MiscellaneousTechnical"},
+{{0x2400, 0x243F}, NULL, "ControlPictures"},
+{{0x2440, 0x245F}, NULL, "OpticalCharacterRecognition"},
+{{0x2460, 0x24FF}, NULL, "EnclosedAlphanumerics"},
+{{0x2500, 0x257F}, NULL, "BoxDrawing"},
+{{0x2580, 0x259F}, NULL, "BlockElements"},
+{{0x25A0, 0x25FF}, NULL, "GeometricShapes"},
+{{0x2600, 0x26FF}, NULL, "MiscellaneousSymbols"},
+{{0x2700, 0x27BF}, NULL, "Dingbats"},
+{{0x2800, 0x28FF}, NULL, "BraillePatterns"},
+{{0x2E80, 0x2EFF}, NULL, "CJKRadicalsSupplement"},
+{{0x2F00, 0x2FDF}, NULL, "KangxiRadicals"},
+{{0x2FF0, 0x2FFF}, NULL, "IdeographicDescriptionCharacters"},
+{{0x3000, 0x303F}, NULL, "CJKSymbolsandPunctuation"},
+{{0x3040, 0x309F}, NULL, "Hiragana"},
+{{0x30A0, 0x30FF}, NULL, "Katakana"},
+{{0x3100, 0x312F}, NULL, "Bopomofo"},
+{{0x3130, 0x318F}, NULL, "HangulCompatibilityJamo"},
+{{0x3190, 0x319F}, NULL, "Kanbun"},
+{{0x31A0, 0x31BF}, NULL, "BopomofoExtended"},
+{{0x3200, 0x32FF}, NULL, "EnclosedCJKLettersandMonths"},
+{{0x3300, 0x33FF}, NULL, "CJKCompatibility"},
+{{0x3400, 0x4DB5}, NULL, "CJKUnifiedIdeographsExtensionA"},
+{{0x4E00, 0x9FFF}, NULL, "CJKUnifiedIdeographs"},
+{{0xA000, 0xA48F}, NULL, "YiSyllables"},
+{{0xA490, 0xA4CF}, NULL, "YiRadicals"},
+{{0xAC00, 0xD7A3}, NULL, "HangulSyllables"},
+{{0xE000, 0xF8FF}, NULL, "PrivateUse"},
+{{0xF900, 0xFAFF}, NULL, "CJKCompatibilityIdeographs"},
+{{0xFB00, 0xFB4F}, NULL, "AlphabeticPresentationForms"},
+{{0xFB50, 0xFDFF}, NULL, "ArabicPresentationForms-A"},
+{{0xFE20, 0xFE2F}, NULL, "CombiningHalfMarks"},
+{{0xFE30, 0xFE4F}, NULL, "CJKCompatibilityForms"},
+{{0xFE50, 0xFE6F}, NULL, "SmallFormVariants"},
+{{0xFE70, 0xFEFE}, NULL, "ArabicPresentationForms-B"},
+{{0xFEFF, 0xFEFF}, specials_extcp, "Specials"},
+{{0xFF00, 0xFFEF}, NULL, "HalfwidthandFullwidthForms"}
+};
+
+CRegexXQuery_charmatch* CRegexXQuery_parser::readChar(const char *pattern,
+ int *char_len,
+ enum CHARGROUP_t *multichar_type)
{
char c = 0;
*char_len = 0;
- *is_multichar = false;
+ *multichar_type = CHARGROUP_NO_MULTICHAR;
switch(pattern[*char_len])
{
case '\\':
- { (*char_len)++;
+ {
+ (*char_len)++;
switch(pattern[*char_len])
{
- case 'n': c = '\n';break;
- case 'r': c = '\r';break;
- case 't': c = '\t';break;
+ case 'n': c = '\n';(*char_len)++;return new CRegexXQuery_char_ascii(current_regex, c);
+ case 'r': c = '\r';(*char_len)++;return new CRegexXQuery_char_ascii(current_regex, c);
+ case 't': c = '\t';(*char_len)++;return new CRegexXQuery_char_ascii(current_regex, c);
case '\\':
+ case '/'://+
case '|':
case '.':
case '?':
@@ -216,19 +359,205 @@
case '['://#x5B
case ']'://#x5D
case '^'://#x5E
+ case '$'://+
c = pattern[*char_len];
- break;
+ (*char_len)++;
+ *multichar_type = CHARGROUP_FLAGS_ONECHAR_ASCII;
+ return new CRegexXQuery_char_ascii(current_regex, c);
case 'p'://catEsc
case 'P'://complEsc
+ {
//ignore the prop for now
- c = pattern[*char_len];
- *is_multichar = true;
- if(pattern[*char_len+1] == '{')
- {
- while(pattern[*char_len] != '}')
+ *multichar_type = CHARGROUP_FLAGS_MULTICHAR_p;//(CHARGROUP_t)((pattern[*char_len] == 'P') ? 128 : 0);
+ bool is_reverse = (pattern[*char_len] == 'P');
+ c = 0;
+ if(pattern[(*char_len)+1] != '{')
+ {
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_BROKEN_P_CONSTRUCT)) );
+ }
+ (*char_len) += 2;
+ switch(pattern[*char_len])
+ {//IsCategory
+ case 'L':
+ {
+ switch(pattern[(*char_len)+1])
+ {
+ case '}':
+ c = unicode::UNICODE_Ll + 50;break;
+ case 'u':
+ c = unicode::UNICODE_Lu; (*char_len)++;break;
+ case 'l':
+ c = unicode::UNICODE_Ll; (*char_len)++;break;
+ case 't':
+ c = unicode::UNICODE_Lt; (*char_len)++;break;
+ case 'm':
+ c = unicode::UNICODE_Lm; (*char_len)++;break;
+ case 'o':
+ c = unicode::UNICODE_Lo; (*char_len)++;break;
+ default:
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_UNKNOWN_PL_CONSTRUCT)) );
+ }
+ }break;
+ case 'M':
+ {
+ switch(pattern[(*char_len)+1])
+ {
+ case '}':
+ c = unicode::UNICODE_Mc + 50;break;
+ case 'n':
+ c = unicode::UNICODE_Mn; (*char_len)++;break;
+ case 'c':
+ c = unicode::UNICODE_Mc; (*char_len)++;break;
+ case 'e':
+ c = unicode::UNICODE_Me; (*char_len)++;break;
+ default:
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_UNKNOWN_PM_CONSTRUCT)) );
+ }
+ }break;
+ case 'N':
+ {
+ switch(pattern[(*char_len)+1])
+ {
+ case '}':
+ c = unicode::UNICODE_Nd + 50;break;
+ case 'd':
+ c = unicode::UNICODE_Nd; (*char_len)++;break;
+ case 'l':
+ c = unicode::UNICODE_Nl; (*char_len)++;break;
+ case 'o':
+ c = unicode::UNICODE_No; (*char_len)++;break;
+ default:
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_UNKNOWN_PN_CONSTRUCT)) );
+ }
+ }break;
+ case 'P':
+ {
+ switch(pattern[(*char_len)+1])
+ {
+ case '}':
+ c = unicode::UNICODE_Pc + 50;break;
+ case 'c':
+ c = unicode::UNICODE_Pc; (*char_len)++;break;
+ case 'd':
+ c = unicode::UNICODE_Pd; (*char_len)++;break;
+ case 's':
+ c = unicode::UNICODE_Ps; (*char_len)++;break;
+ case 'e':
+ c = unicode::UNICODE_Pe; (*char_len)++;break;
+ case 'i':
+ c = unicode::UNICODE_Pi; (*char_len)++;break;
+ case 'f':
+ c = unicode::UNICODE_Pf; (*char_len)++;break;
+ case 'o':
+ c = unicode::UNICODE_Po; (*char_len)++;break;
+ default:
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_UNKNOWN_PP_CONSTRUCT)) );
+ }
+ }break;
+ case 'Z':
+ {
+ switch(pattern[(*char_len)+1])
+ {
+ case '}':
+ c = unicode::UNICODE_Zl + 50;break;
+ case 's':
+ c = unicode::UNICODE_Zs; (*char_len)++;break;
+ case 'l':
+ c = unicode::UNICODE_Zl; (*char_len)++;break;
+ case 'p':
+ c = unicode::UNICODE_Zp; (*char_len)++;break;
+ default:
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_UNKNOWN_PZ_CONSTRUCT)) );
+ }
+ }break;
+ case 'S':
+ {
+ switch(pattern[(*char_len)+1])
+ {
+ case '}':
+ c = unicode::UNICODE_Sc + 50;break;
+ case 'm':
+ c = unicode::UNICODE_Sm; (*char_len)++;break;
+ case 'c':
+ c = unicode::UNICODE_Sc; (*char_len)++;break;
+ case 'k':
+ c = unicode::UNICODE_Sk; (*char_len)++;break;
+ case 'o':
+ c = unicode::UNICODE_So; (*char_len)++;break;
+ default:
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_UNKNOWN_PS_CONSTRUCT)) );
+ }
+ }break;
+ case 'C':
+ {
+ switch(pattern[(*char_len)+1])
+ {
+ case '}':
+ c = unicode::UNICODE_Cc + 50;break;
+ case 'c':
+ c = unicode::UNICODE_Cc; (*char_len)++;break;
+ case 'f':
+ c = unicode::UNICODE_Cf; (*char_len)++;break;
+ case 'o':
+ c = unicode::UNICODE_Co; (*char_len)++;break;
+ case 'n':
+ c = unicode::UNICODE_Cn; (*char_len)++;break;
+ default:
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_UNKNOWN_PC_CONSTRUCT)) );
+ }
+ }break;
+ }//end switch
+ if(c)
+ {
+ if(pattern[(*char_len) + 1] != '}')
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_BROKEN_P_CONSTRUCT)) );
+ (*char_len)++;
+ (*char_len)++;
+ return new CRegexXQuery_multicharP(current_regex, c, is_reverse);
+ }
+ if(pattern[*char_len] == 'I')
+ {
+ if(pattern[(*char_len)+1] == 's')//IsBlock
+ {
+ *multichar_type = CHARGROUP_FLAGS_MULTICHAR_Is;
+ (*char_len) += 2;
+ zstring block_name;
+ char tempc = pattern[(*char_len)];
+ while(tempc && (tempc != '}'))
+ {
+ if(!myisletterAZ(tempc) && !myisdigit(tempc) && (tempc != '-'))
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_BROKEN_PIs_CONSTRUCT)) );
+ block_name.append(1, tempc);
+ (*char_len)++;
+ tempc = pattern[(*char_len)];
+ }
+ if(!tempc)
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_BROKEN_PIs_CONSTRUCT)) );
+ //search for the block name
+ int i;
+ int nr_blocks = sizeof(block_escape)/sizeof(CRegexXQuery_parser::block_escape_t);
+ for(i=0;i<nr_blocks;i++)
+ {
+ if(compare_ascii_i(block_name.c_str(), block_escape[i].group_name))
+ {
+ c = i;
+ break;
+ }
+ }
+ if(i==nr_blocks)
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_UNKNOWN_PIs_CONSTRUCT)) );
(*char_len)++;
- }
- break;
+ return new CRegexXQuery_multicharIs(current_regex, i, is_reverse);
+ }
+ else
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_BROKEN_PIs_CONSTRUCT)) );
+ }
+ else
+ {
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_BROKEN_P_CONSTRUCT)) );
+ }
+ break;//unreachable
+ }//end case 'p'
//multiCharEsc
case 's':
case 'S':
@@ -240,40 +569,104 @@
case 'D':
case 'w':
case 'W':
- *is_multichar = true;
+ *multichar_type = CHARGROUP_FLAGS_MULTICHAR_OTHER;
c = pattern[*char_len];
- break;
- }
- break;
- }
- case '#':///might be #xXX
- {
- if((pattern[*char_len+1] == 'x') &&
- myishex(pattern[*char_len+2]) && myishex(pattern[*char_len+3]))
- {
- c = (myishex(pattern[*char_len+2])-1)<<4 | (myishex(pattern[*char_len+3])-1);
- *char_len += 3;
- break;
- }
- }
+ (*char_len)++;
+ return new CRegexXQuery_multicharOther(current_regex, c);
+ case 'u'://unicode codepoint \uXXXX
+ {
+ unicode::code_point utf8c = 0;
+ (*char_len)++;
+ for(int i=0;i<4;i++)
+ {
+ char hex = myishex(pattern[*char_len]);
+ if(!hex)
+ {
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_INVALID_UNICODE_CODEPOINT_u)) );
+ }
+ utf8c <<= 4;
+ utf8c |= (hex-1) & 0x0f;
+ (*char_len)++;
+ }
+ return create_charmatch(utf8c, NULL, 0, multichar_type);
+ }
+ case 'U'://unicode codepoint \UXXXXXXXX
+ {
+ unicode::code_point utf8c = 0;
+ (*char_len)++;
+ for(int i=0;i<8;i++)
+ {
+ char hex = myishex(pattern[*char_len]);
+ if(!hex)
+ {
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_INVALID_UNICODE_CODEPOINT_u)) );
+ }
+ utf8c <<= 4;
+ utf8c |= (hex-1) & 0x0f;
+ (*char_len)++;
+ }
+ return create_charmatch(utf8c, NULL, 0, multichar_type);
+ }
+ default:
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_UNKNOWN_ESC_CHAR)) );
+ }
+ assert(false);
+ break;//unreachable
+ }//end case '\'
default:
- c = pattern[*char_len];
- break;
- }
-
- (*char_len)++;
- return c;
-}
-
-
-
-IRegexAtom* CRegexAscii_parser::read_atom(const char *pattern, int *atom_len)
+ {
+ const char *temp_pattern = pattern;
+ unicode::code_point utf8c = utf8::next_char(temp_pattern);
+ (*char_len) = temp_pattern - pattern;
+ return create_charmatch(utf8c, pattern, *char_len, multichar_type);
+ }
+ }
+ return NULL;
+}
+
+CRegexXQuery_charmatch *CRegexXQuery_parser::create_charmatch(unicode::code_point utf8c,
+ const char *pattern, int utf8len,
+ enum CHARGROUP_t *multichar_type)
+{
+ if(utf8c <= 0x7F)
+ {
+ *multichar_type = CHARGROUP_FLAGS_ONECHAR_ASCII;
+ if(flags & REGEX_ASCII_CASE_INSENSITIVE)
+ return new CRegexXQuery_char_ascii_i(current_regex, (char)utf8c);
+ else
+ return new CRegexXQuery_char_ascii(current_regex, (char)utf8c);
+ }
+ else
+ {
+ *multichar_type = CHARGROUP_FLAGS_ONECHAR_UNICODE;
+ if(flags & REGEX_ASCII_CASE_INSENSITIVE)
+ return new CRegexXQuery_char_unicode_i(current_regex, utf8c);
+ else
+ {
+ if(pattern)
+ return new CRegexXQuery_char_unicode(current_regex, pattern, utf8len);
+ else
+ return new CRegexXQuery_char_unicode_cp(current_regex, utf8c);
+ }
+ }
+}
+
+IRegexAtom* CRegexXQuery_parser::read_atom(const char *pattern, int *atom_len)
{
*atom_len = 0;
- char c;
- bool is_end_line = false;
- c = pattern[*atom_len];
- if((!(flags & REGEX_ASCII_LITERAL)) && (c == '\\'))
+ if(flags & REGEX_ASCII_LITERAL)
+ {
+ unicode::code_point utf8c;
+ //bool is_end_line = false;
+ const char *temp_pattern = pattern;
+ utf8c = utf8::next_char(temp_pattern);
+ *atom_len = temp_pattern - pattern;
+ enum CHARGROUP_t multichar_type;
+ return create_charmatch(utf8c, pattern, *atom_len, &multichar_type);
+ }
+
+ char c = *pattern;
+ if(c == '\\')
{
//check for back reference
if(myisdigit(pattern[(*atom_len)+1]))
@@ -281,13 +674,13 @@
(*atom_len)++;
if(pattern[*atom_len] == '0')
{
- throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(U_REGEX_INVALID_BACK_REF)) );
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_INVALID_BACK_REF), 0, current_regex->subregex.size()) );
}
unsigned int backref = pattern[*atom_len] - '0';
if((backref > current_regex->subregex.size()) ||
(current_regex->subregex.at(backref-1)->flags != 0))
{
- throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(U_REGEX_INVALID_BACK_REF)) );
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_INVALID_BACK_REF), backref, current_regex->subregex.size()) );
}
while(current_regex->subregex.size() >= backref*10)
{
@@ -303,70 +696,86 @@
break;
}
}
- return new CRegexAscii_backref(current_regex, backref);
+ (*atom_len)++;
+ return new CRegexXQuery_backref(current_regex, backref);
}
}
+ if(c == '^')
+ {
+ (*atom_len)++;
+ return new CRegexXQuery_pinstart(current_regex);
+ }
+ if((c == '}') || (c == '{') || (c == '?') || (c == '*') || (c == '+') || (c == '|'))
+ {
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_INVALID_ATOM_CHAR), c) );
+ }
switch(c)
{
case '[':
{
- if(!(flags & REGEX_ASCII_LITERAL))
- {
- (*atom_len)++;
- CRegexAscii_chargroup *chargroup = NULL;
- int chargroup_len;
- chargroup = readchargroup(pattern+*atom_len, &chargroup_len);
- *atom_len += chargroup_len;
- return chargroup;
- }
+ (*atom_len)++;
+ CRegexXQuery_chargroup *chargroup = NULL;
+ int chargroup_len;
+ chargroup = readchargroup(pattern+*atom_len, &chargroup_len);
+ *atom_len += chargroup_len;
+ return chargroup;
}
case '.'://WildCharEsc
{
- if(!(flags & REGEX_ASCII_LITERAL))
- {
- CRegexAscii_wildchar *wildchar = new CRegexAscii_wildchar(current_regex);
- (*atom_len)++;
- return wildchar;
- }
+ (*atom_len)++;
+ return new CRegexXQuery_wildchar(current_regex);
}
case '('://begin an embedded reg exp
{
- if(!(flags & REGEX_ASCII_LITERAL))
- {
- (*atom_len)++;
- CRegexAscii_regex *emb_regex = NULL;
- int regex_len;
- emb_regex = parse_regexp(pattern + *atom_len, &regex_len);
- *atom_len += regex_len;
- return emb_regex;
- }
+ (*atom_len)++;
+ CRegexXQuery_regex *emb_regex = NULL;
+ int regex_len;
+ emb_regex = parse_regexp(pattern + *atom_len, &regex_len);
+ *atom_len += regex_len;
+ return emb_regex;
}
case '$'://end line
- if(!(flags & REGEX_ASCII_LITERAL))
- {
- is_end_line = true;
- }
+ //is_end_line = true;
+ (*atom_len)++;
+ return new CRegexXQuery_endline(current_regex);
default:
{
- char c;
+ //char c;
+ CRegexXQuery_charmatch *charmatch = NULL;
int c_len;
- bool is_multichar = false;
- if(!(flags & REGEX_ASCII_LITERAL))
- c = readChar(pattern+*atom_len, &c_len, &is_multichar);
- else
+ CHARGROUP_t multichar_type = CHARGROUP_NO_MULTICHAR;
+ *atom_len = 0;
+ while(pattern[*atom_len])
{
- c = pattern[*atom_len];
- c_len = 1;
+ charmatch = readChar(pattern+*atom_len, &c_len, &multichar_type);
+ *atom_len += c_len;
+ if((flags & REGEX_ASCII_NO_WHITESPACE) && (multichar_type == CHARGROUP_FLAGS_ONECHAR_ASCII))
+ {
+ char c = (char)charmatch->get_c();
+ if((c == ' ') || (c == '\t') || (c == '\r') || (c == '\n'))
+ {
+ //ignore this whitespace
+ delete charmatch;
+ continue;
+ }
+ else
+ break;
+ }
+ else
+ break;
}
- CRegexAscii_chargroup *chargroup = new CRegexAscii_chargroup(current_regex);
- if(is_multichar)
- chargroup->addMultiChar(c);
+ /*
+ std::auto_ptr<CRegexXQuery_chargroup> chargroup(new CRegexXQuery_chargroup(current_regex));
+ if(multichar_type)
+ chargroup->addMultiChar(c, multichar_type);
else if(is_end_line)
chargroup->addEndLine();
else
- chargroup->addCharRange(c, c);
+ chargroup->addOneChar(c);
*atom_len += c_len;
- return chargroup;
+ return chargroup.release();
+ */
+ return charmatch;
}
}
}
@@ -374,81 +783,119 @@
//read until ']'
//posCharGroup ::= ( charRange | charClassEsc )+
//charRange ::= seRange | XmlCharIncDash
-CRegexAscii_chargroup* CRegexAscii_parser::readchargroup(const char *pattern, int *chargroup_len)
+CRegexXQuery_chargroup* CRegexXQuery_parser::readchargroup(const char *pattern, int *chargroup_len)
{
- CRegexAscii_chargroup *chargroup = NULL;
+ std::auto_ptr<CRegexXQuery_chargroup> chargroup;
*chargroup_len = 0;
if(pattern[*chargroup_len] == '^')//negative group
{
(*chargroup_len)++;
- chargroup = new CRegexAscii_negchargroup(current_regex);
+ chargroup.reset(new CRegexXQuery_negchargroup(current_regex));
}
else
- chargroup = new CRegexAscii_chargroup(current_regex);
+ chargroup.reset(new CRegexXQuery_chargroup(current_regex));
while(pattern[*chargroup_len] && (pattern[*chargroup_len]!=']'))
{
- char c1, c2;
- bool is_multichar;
+ //char c1, c2;
+ CHARGROUP_t multichar_type = CHARGROUP_NO_MULTICHAR;
int c1_len;
- c1 = pattern[*chargroup_len];
- c2 = pattern[*chargroup_len+1];
- if((c1 == '-') && (c2 == '['))//charClassSub
+ if((pattern[*chargroup_len] == '-') && (pattern[(*chargroup_len)+1] == '['))//charClassSub
{
int classsub_len;
- CRegexAscii_chargroup *classsub = readchargroup(pattern + *chargroup_len+1 + 1, &classsub_len);
+ CRegexXQuery_chargroup *classsub = readchargroup(pattern + (*chargroup_len)+1 + 1, &classsub_len);
if(!classsub)
{
- delete chargroup;
- return NULL;
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_INVALID_SUBCLASS)) );
}
chargroup->addClassSub(classsub);
*chargroup_len += 2 + classsub_len + 1;
if(pattern[*chargroup_len-1] != ']')
{
- delete chargroup;
- return NULL;
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_INVALID_USE_OF_SUBCLASS)) );
}
- return chargroup;
+ return chargroup.release();
}
- c1 = readChar(pattern+*chargroup_len, &c1_len, &is_multichar);
- if(is_multichar)//first char is multichar
+ std::unique_ptr<CRegexXQuery_charmatch> charmatch(readChar(pattern+*chargroup_len, &c1_len, &multichar_type));
+ if((multichar_type == CHARGROUP_FLAGS_MULTICHAR_p) ||
+ (multichar_type == CHARGROUP_FLAGS_MULTICHAR_Is) ||
+ (multichar_type == CHARGROUP_FLAGS_MULTICHAR_OTHER))//first char is multichar
{
- chargroup->addMultiChar(c1);
+ if((pattern[*chargroup_len+c1_len] == '-') &&///should not be a range
+ (pattern[*chargroup_len+c1_len+1] != ']'))
+ {
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_MULTICHAR_IN_CHAR_RANGE)) );
+ }
+ //chargroup->addMultiChar(c1, multichar_type);
+ chargroup->addCharMatch(charmatch.release());
*chargroup_len += c1_len;
continue;
}
- if(pattern[*chargroup_len+c1_len] == '-')///might be a range
+ (*chargroup_len) += c1_len;
+ if(pattern[*chargroup_len] == '-')///might be a range
{
- if(pattern[*chargroup_len+c1_len+1] == ']')//no range, just the last char is '-'
+ if(pattern[(*chargroup_len)+1] == ']')//no range, just the last char is '-'
{
- chargroup->addCharRange(c1, c1);
- chargroup->addCharRange('-', '-');
- *chargroup_len += c1_len + 1;
+ //chargroup->addOneChar(c1);
+ //chargroup->addOneChar('-');
+ chargroup->addCharMatch(charmatch.release());
+ chargroup->addCharMatch(new CRegexXQuery_char_ascii(current_regex, '-'));
+ (*chargroup_len)++;
continue;
}
- else
+ else if(pattern[(*chargroup_len)+1] != '[')
{
//it is a range
- char c3;
- int c3_len;
- c3 = readChar(pattern+*chargroup_len+c1_len+1, &c3_len, &is_multichar);
- if(is_multichar)
- return NULL;//error
- chargroup->addCharRange(c1, c3);
- *chargroup_len += c1_len + 1 + c3_len;
+ (*chargroup_len)++;
+ std::unique_ptr<CRegexXQuery_charmatch> charmatch2;
+ CHARGROUP_t multichar_type2 = CHARGROUP_NO_MULTICHAR;
+ int c2_len;
+ charmatch2.reset(readChar(pattern+(*chargroup_len), &c2_len, &multichar_type2));
+ if((multichar_type2 != CHARGROUP_FLAGS_ONECHAR_ASCII) &&
+ (multichar_type2 != CHARGROUP_FLAGS_ONECHAR_UNICODE))//second char in range is multichar
+ {
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_MULTICHAR_IN_CHAR_RANGE)) );
+ }
+ //chargroup->addCharRange(c1, c3);
+ if((multichar_type == CHARGROUP_FLAGS_ONECHAR_ASCII) && (multichar_type2 == CHARGROUP_FLAGS_ONECHAR_ASCII))
+ {
+ if(flags & REGEX_ASCII_CASE_INSENSITIVE)
+ chargroup->addCharMatch(new CRegexXQuery_char_range_ascii_i(current_regex,
+ (char)charmatch->get_c(),
+ (char)charmatch2->get_c()));
+ else
+ chargroup->addCharMatch(new CRegexXQuery_char_range_ascii(current_regex,
+ (char)charmatch->get_c(),
+ (char)charmatch2->get_c()));
+ }
+ else
+ {
+ if(flags & REGEX_ASCII_CASE_INSENSITIVE)
+ chargroup->addCharMatch(new CRegexXQuery_char_range_unicode_i(current_regex,
+ charmatch->get_c(),
+ charmatch2->get_c()));
+ else
+ chargroup->addCharMatch(new CRegexXQuery_char_range_unicode(current_regex,
+ charmatch->get_c(),
+ charmatch2->get_c()));
+ }
+ *chargroup_len += c2_len;
continue;
}
}
- chargroup->addCharRange(c1, c1);
- *chargroup_len += c1_len;
+ //chargroup->addOneChar(c1);
+ chargroup->addCharMatch(charmatch.release());
}
if(pattern[*chargroup_len])
(*chargroup_len)++;
- return chargroup;
+ else
+ {
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_MISSING_CLOSE_BRACKET)) );
+ }
+ return chargroup.release();
}
-void CRegexAscii_parser::read_quantifier(CRegexAscii_piece *piece,
+void CRegexXQuery_parser::read_quantifier(CRegexXQuery_piece *piece,
const char *pattern, int *quantif_len)
{
*quantif_len = 0;
@@ -496,6 +943,10 @@
max = max*10 + pattern[*quantif_len] - '0';
(*quantif_len)++;
}
+ if(max < min)
+ {
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(pattern, ZED(REGEX_MAX_LT_MIN)) );
+ }
piece->set_quantifier_min_max(min, max, true);
}
while(pattern[*quantif_len] && (pattern[*quantif_len] != '}'))
@@ -524,23 +975,25 @@
///Constructors and destructors and internal functions
////////////////////////////
-CRegexAscii_regex::CRegexAscii_regex(CRegexAscii_regex *topregex) : IRegexAtom(topregex?topregex:this)
+CRegexXQuery_regex::CRegexXQuery_regex(CRegexXQuery_regex *topregex) : IRegexAtom(topregex?topregex:this)
{
matched_source = NULL;
matched_len = 0;
+// backup_matched_source = NULL;
+// backup_matched_len = 0;
flags = 128;//set to 0 after initialization
}
-CRegexAscii_regex::~CRegexAscii_regex()
+CRegexXQuery_regex::~CRegexXQuery_regex()
{
- std::list<CRegexAscii_branch*>::iterator branch_it;
+ std::list<CRegexXQuery_branch*>::iterator branch_it;
for(branch_it = branch_list.begin(); branch_it != branch_list.end(); branch_it++)
{
delete (*branch_it);
}
/*
- std::vector<CRegexAscii_regex*>::iterator subregex_it;
+ std::vector<CRegexXQuery_regex*>::iterator subregex_it;
for(subregex_it = subregex.begin(); subregex_it != subregex.end(); subregex_it++)
{
delete (*subregex_it);
@@ -548,25 +1001,18 @@
*/
}
-bool CRegexAscii_regex::set_align_begin(bool align_begin)
-{
- bool prev_align = this->align_begin;
- this->align_begin = align_begin;
- return prev_align;
-}
-
-void CRegexAscii_regex::add_branch(CRegexAscii_branch *branch)
+void CRegexXQuery_regex::add_branch(CRegexXQuery_branch *branch)
{
branch_list.push_back(branch);
}
-bool CRegexAscii_regex::get_indexed_match(int index,
+bool CRegexXQuery_regex::get_indexed_match(int index,
const char **matched_source,
int *matched_len)
{
if(!index || index > (int)subregex.size())
return false;
- CRegexAscii_regex *subr = subregex[index-1];
+ CRegexXQuery_regex *subr = subregex[index-1];
*matched_source = subr->matched_source;
if(!*matched_source)
return false;
@@ -574,145 +1020,209 @@
return true;
}
-unsigned int CRegexAscii_regex::get_indexed_regex_count()
+unsigned int CRegexXQuery_regex::get_indexed_regex_count()
{
return subregex.size();
}
-CRegexAscii_branch::CRegexAscii_branch(CRegexAscii_regex* regex) :
- IRegexMatcher(regex)
+CRegexXQuery_branch::CRegexXQuery_branch(CRegexXQuery_regex* regex)
+ //:
+ //IRegexMatcher(regex)
{
}
-CRegexAscii_branch::~CRegexAscii_branch()
+CRegexXQuery_branch::~CRegexXQuery_branch()
{
- std::list<CRegexAscii_piece*>::iterator piece_it;
+ std::list<RegexAscii_pieceinfo>::iterator piece_it;
for(piece_it = piece_list.begin(); piece_it != piece_list.end(); piece_it++)
{
- delete (*piece_it);
+ delete (*piece_it).piece;
}
}
-void CRegexAscii_branch::add_piece(CRegexAscii_piece *piece)
+void CRegexXQuery_branch::add_piece(CRegexXQuery_piece *piece)
{
piece_list.push_back(piece);
}
-CRegexAscii_piece::CRegexAscii_piece()
+CRegexXQuery_piece::CRegexXQuery_piece()
{
+ atom = NULL;
+ regex_atom = NULL;
}
-CRegexAscii_piece::~CRegexAscii_piece()
+CRegexXQuery_piece::~CRegexXQuery_piece()
{
delete atom;
}
-void CRegexAscii_piece::set_atom(IRegexAtom *atom)
+void CRegexXQuery_piece::set_atom(IRegexAtom *atom)
{
this->atom = atom;
+ this->regex_atom = dynamic_cast<CRegexXQuery_regex*>(atom);
}
-void CRegexAscii_piece::set_quantifier_min_max(int min, int max, bool strict_max)
+void CRegexXQuery_piece::set_quantifier_min_max(int min, int max, bool strict_max)
{
this->min = min;
this->max = max;
this->strict_max = strict_max;
}
-void CRegexAscii_piece::set_is_reluctant(bool is_reluctant)
+void CRegexXQuery_piece::set_is_reluctant(bool is_reluctant)
{
this->is_reluctant = is_reluctant;
}
-void CRegexAscii_piece::get_quantifier(int *min, int *max, bool *strict_max)
+void CRegexXQuery_piece::get_quantifier(int *min, int *max, bool *strict_max)
{
*min = this->min;
*max = this->max;
*strict_max = this->strict_max;
}
-bool CRegexAscii_piece::get_is_reluctant()
+bool CRegexXQuery_piece::get_is_reluctant()
{
+ if(atom->regex_intern->flags & REGEX_ASCII_MINIMAL_MATCH)
+ return true;
return is_reluctant;
}
-CRegexAscii_chargroup::CRegexAscii_chargroup(CRegexAscii_regex* regex) :
+CRegexXQuery_charmatch::CRegexXQuery_charmatch(CRegexXQuery_regex* regex) :
+ IRegexAtom(regex)
+{
+}
+CRegexXQuery_multicharP::CRegexXQuery_multicharP(CRegexXQuery_regex* regex, char type, bool is_reverse) :
+ CRegexXQuery_charmatch(regex)
+{
+ this->multichar_type = type; this->is_reverse = is_reverse;
+}
+CRegexXQuery_multicharIs::CRegexXQuery_multicharIs(CRegexXQuery_regex* regex, int block_index, bool is_reverse) :
+ CRegexXQuery_charmatch(regex)
+{
+ this->block_index = block_index; this->is_reverse = is_reverse;
+}
+CRegexXQuery_multicharOther::CRegexXQuery_multicharOther(CRegexXQuery_regex* regex, char type) :
+ CRegexXQuery_charmatch(regex)
+{
+ this->multichar_type = type;
+}
+CRegexXQuery_char_ascii::CRegexXQuery_char_ascii(CRegexXQuery_regex* regex, char c) :
+ CRegexXQuery_charmatch(regex)
+{
+ this->c = c;
+}
+CRegexXQuery_char_ascii_i::CRegexXQuery_char_ascii_i(CRegexXQuery_regex* regex, char c) :
+ CRegexXQuery_char_ascii(regex, toupper(c))
+{
+}
+CRegexXQuery_char_range_ascii::CRegexXQuery_char_range_ascii(CRegexXQuery_regex* regex, char c1, char c2) :
+ CRegexXQuery_charmatch(regex)
+{
+ this->c1 = c1; this->c2 = c2;
+}
+CRegexXQuery_char_range_ascii_i::CRegexXQuery_char_range_ascii_i(CRegexXQuery_regex* regex, char c1, char c2) :
+ CRegexXQuery_char_range_ascii(regex, toupper(c1), toupper(c2))
+{
+}
+CRegexXQuery_char_unicode::CRegexXQuery_char_unicode(CRegexXQuery_regex* regex, const char *source, int len) :
+ CRegexXQuery_charmatch(regex)
+{
+ this->len = len;
+ memcpy(c, source, len);
+}
+CRegexXQuery_char_unicode_cp::CRegexXQuery_char_unicode_cp(CRegexXQuery_regex* regex, unicode::code_point c) :
+ CRegexXQuery_charmatch(regex)
+{
+ this->c = c;
+}
+CRegexXQuery_char_unicode_i::CRegexXQuery_char_unicode_i(CRegexXQuery_regex* regex, unicode::code_point c) :
+ CRegexXQuery_char_unicode_cp(regex, unicode::to_upper(c))
+{
+}
+CRegexXQuery_char_range_unicode::CRegexXQuery_char_range_unicode(CRegexXQuery_regex* regex, unicode::code_point c1, unicode::code_point c2) :
+ CRegexXQuery_charmatch(regex)
+{
+ this->c1 = c1; this->c2 = c2;
+}
+CRegexXQuery_char_range_unicode_i::CRegexXQuery_char_range_unicode_i(CRegexXQuery_regex* regex, unicode::code_point c1, unicode::code_point c2) :
+ CRegexXQuery_char_range_unicode(regex, unicode::to_upper(c1), unicode::to_upper(c2))
+{
+}
+CRegexXQuery_endline::CRegexXQuery_endline(CRegexXQuery_regex* regex) :
+ CRegexXQuery_charmatch(regex)
+{
+}
+
+unicode::code_point CRegexXQuery_char_unicode::get_c()
+{
+ const char *temp_c = (const char*)c;
+ return utf8::next_char(temp_c);
+}
+
+
+CRegexXQuery_chargroup::CRegexXQuery_chargroup(CRegexXQuery_regex* regex) :
IRegexAtom(regex)
{
classsub = NULL;
}
-CRegexAscii_chargroup::~CRegexAscii_chargroup()
+CRegexXQuery_chargroup::~CRegexXQuery_chargroup()
{
delete classsub;
-}
-
-void CRegexAscii_chargroup::addMultiChar(char c)
-{
- chargroup_t cgt;
- cgt.flags = CHARGROUP_FLAGS_MULTICHAR;
- cgt.c1 = c;
- cgt.c2 = 0;
- chargroup_list.push_back(cgt);
-}
-
-void CRegexAscii_chargroup::addEndLine()
-{
- chargroup_t cgt;
- cgt.flags = CHARGROUP_FLAGS_ENDLINE;
- cgt.c1 = '$';
- cgt.c2 = 0;
- chargroup_list.push_back(cgt);
-}
-
-void CRegexAscii_chargroup::addCharRange(char c1, char c2)
-{
- chargroup_t cgt;
- cgt.flags = 0;
- cgt.c1 = c1;
- cgt.c2 = c2;
- chargroup_list.push_back(cgt);
-}
-
-void CRegexAscii_chargroup::addClassSub(CRegexAscii_chargroup* classsub)
+ std::list<CRegexXQuery_charmatch* >::iterator charmatch_it;
+ for(charmatch_it=chargroup_list.begin(); charmatch_it != chargroup_list.end(); charmatch_it++)
+ delete (*charmatch_it);
+}
+
+void CRegexXQuery_chargroup::addCharMatch(CRegexXQuery_charmatch *charmatch)
+{
+ chargroup_list.push_back(charmatch);
+}
+void CRegexXQuery_chargroup::addClassSub(CRegexXQuery_chargroup* classsub)
{
this->classsub = classsub;
}
-CRegexAscii_negchargroup::CRegexAscii_negchargroup(CRegexAscii_regex* regex) :
- CRegexAscii_chargroup(regex)
-{
-}
-
-CRegexAscii_negchargroup::~CRegexAscii_negchargroup()
-{
-}
-
-CRegexAscii_wildchar::CRegexAscii_wildchar(CRegexAscii_regex* regex) :
+CRegexXQuery_negchargroup::CRegexXQuery_negchargroup(CRegexXQuery_regex* regex) :
+ CRegexXQuery_chargroup(regex)
+{
+}
+
+CRegexXQuery_negchargroup::~CRegexXQuery_negchargroup()
+{
+}
+
+CRegexXQuery_wildchar::CRegexXQuery_wildchar(CRegexXQuery_regex* regex) :
IRegexAtom(regex)
{
}
-CRegexAscii_wildchar::~CRegexAscii_wildchar()
+CRegexXQuery_wildchar::~CRegexXQuery_wildchar()
{
}
-CRegexAscii_backref::CRegexAscii_backref(CRegexAscii_regex* regex, unsigned int backref_) :
+CRegexXQuery_backref::CRegexXQuery_backref(CRegexXQuery_regex* regex, unsigned int backref_) :
IRegexAtom(regex),
backref(backref_)
{
}
-CRegexAscii_backref::~CRegexAscii_backref()
-{
-}
-
-CRegexAscii_parser::CRegexAscii_parser()
+CRegexXQuery_backref::~CRegexXQuery_backref()
+{
+}
+
+CRegexXQuery_pinstart::CRegexXQuery_pinstart(CRegexXQuery_regex* regex):
+ IRegexAtom(regex)
+{
+}
+
+CRegexXQuery_parser::CRegexXQuery_parser()
{
current_regex = NULL;
regex_depth = 0;
}
-CRegexAscii_parser::~CRegexAscii_parser()
+CRegexXQuery_parser::~CRegexXQuery_parser()
{
}
@@ -720,9 +1230,68 @@
//////////////////////////////////////////
////Matching the pattern on a string
/////////////////////////////////////////
+static std::list<RegexAscii_pieceinfo> empty_pieces;//empty list of pieces
+/*
+std::list<RegexAscii_pieceinfo>::iterator
+IRegexAtom::choose_next_piece(const char *source, int *matched_len,
+ std::list<RegexAscii_pieceinfo>::iterator this_piece,
+ std::list<RegexAscii_pieceinfo>::iterator end_piece)
+{
+ //if this_piece is repetition, repeat until max, then go to next piece
+ int min, max;
+ bool strict_max;
+ while(this_piece != end_piece)
+ {
+ (*this_piece).piece->get_quantifier(&min, &max, &strict_max);
+ if(max <= ((*this_piece).nr_matches))//finished this piece
+ {
+ this_piece++;
+ }
+ else
+ break;
+ }
+ return this_piece;
+}
+*/
+
+bool IRegexAtom::match(const char *source, int *start_from_branch, int *matched_len,
+ std::list<RegexAscii_pieceinfo>::iterator this_piece,
+ std::list<RegexAscii_pieceinfo>::iterator end_piece)
+{
+ *start_from_branch = 0;
+ bool retmatch;
+ retmatch = match_internal(source, start_from_branch, matched_len);
+ if(!retmatch)
+ return false;
+
+ if(this_piece == end_piece)
+ return true;
+
+ (*this_piece).nr_matches++;
+ int min,max;
+ bool strict_max;
+ (*this_piece).piece->get_quantifier(&min, &max, &strict_max);
+ std::list<RegexAscii_pieceinfo>::iterator init_piece = this_piece;
+ if(((min == 1) && (max == 1)) || //the simple common case
+ ((*matched_len == 0) && ((*this_piece).nr_matches>=min)))//to avoid infinite loop
+ {
+ this_piece++;
+ if(this_piece == end_piece)
+ return true;
+ }
+ int matched_len2;
+ retmatch = (*this_piece).piece->match_piece(this_piece, end_piece, source + *matched_len, &matched_len2);
+ if(!retmatch)
+ {
+ (*init_piece).nr_matches--;
+ return false;
+ }
+ *matched_len += matched_len2;
+ return true;
+}
//try every position in source to match the pattern
-bool CRegexAscii_regex::match_anywhere(const char *source, unsigned int flags,
+bool CRegexXQuery_regex::match_anywhere(const char *source, unsigned int flags,
int *match_pos, int *matched_len)
{
*match_pos = 0;
@@ -730,43 +1299,66 @@
return match_from(source, flags, match_pos, matched_len);
}
-bool CRegexAscii_regex::match_from(const char *source, unsigned int flags,
+bool CRegexXQuery_regex::match_from(const char *source, unsigned int flags,
int *match_pos, int *matched_len)
{
this->flags = flags;
+ this->source_start = source;
reachedEnd = false;
- std::vector<CRegexAscii_regex*>::iterator regex_it;
+ std::vector<CRegexXQuery_regex*>::iterator regex_it;
for(regex_it = subregex.begin(); regex_it != subregex.end(); regex_it++)
{
(*regex_it)->matched_source = NULL;
}
-// if(!source[0])
-// {
-// if(branch_list.empty())
-// return true;
-// else
-// return false;
-// }
-
- bool skip_first_match = false;
- if(*match_pos && align_begin)
- skip_first_match = true;
+
+ std::vector<std::pair<const char*, int> > saved_subregex;
+
+ if(*match_pos && (flags & REGEX_ASCII_WHOLE_MATCH))
+ return false;
+
do
{
- if(!skip_first_match)
- {
- if(match(source + *match_pos, matched_len))
- return true;
- }
- skip_first_match = false;
- if(align_begin)
+ int start_from_branch = 0;
+ int longest_match = -1;
+ while(1)
+ {
+ if(!match(source + *match_pos, &start_from_branch, matched_len, empty_pieces.begin(), empty_pieces.end()))
+ break;
+ if(longest_match < *matched_len)
+ {
+ longest_match = *matched_len;
+ if(start_from_branch && (flags & REGEX_ASCII_GET_LONGEST_BRANCH))
+ save_subregex_list(saved_subregex);
+ }
+ if(!start_from_branch || !(flags & REGEX_ASCII_GET_LONGEST_BRANCH))
+ break;
+ //else try the other branches to see which is longer
+ }
+ if(longest_match != -1)
+ {
+ *matched_len = longest_match;
+ if(saved_subregex.size())
+ load_subregex_list(saved_subregex);
+ if(flags & REGEX_ASCII_WHOLE_MATCH)
+ {
+ if(!source[*match_pos+*matched_len])
+ return true;
+ if((flags & REGEX_ASCII_MULTILINE) &&
+ ((source[*match_pos+*matched_len] == '\n') || (source[*match_pos+*matched_len] == '\r')))
+ return true;
+ return false;
+ }
+ return true;
+ }
+
+ if(flags & REGEX_ASCII_WHOLE_MATCH)
{
if(flags & REGEX_ASCII_MULTILINE)
{
- //goto the next line
+ //go to next line
while(source[*match_pos] && (source[*match_pos] != '\n') && (source[*match_pos] != '\r'))
- (*match_pos)++;
+ (*match_pos) += myutf8len(source);
if(source[*match_pos] == '\n')
{
(*match_pos)++;
@@ -780,190 +1372,1039 @@
(*match_pos)++;
}
if(!source[*match_pos])
- return false;
+ break;
continue;
}
- return false;
+ break;
}
if(!source[*match_pos])
break;
- (*match_pos)++;
+ (*match_pos) += myutf8len(source);
}
while(source[*match_pos]);
+// if(!source[*match_pos])
+// {
+// reachedEnd = true;
+// }
return false;
}
+void CRegexXQuery_regex::reset_match()
+{
+// this->backup_matched_source = this->matched_source;
+// this->backup_matched_len = this->matched_len;
+ this->matched_source = NULL;
+ this->matched_len = 0;
+ std::list<CRegexXQuery_branch*>::iterator branch_it;
+ for(branch_it = branch_list.begin(); branch_it != branch_list.end(); branch_it++)
+ {
+ (*branch_it)->reset();
+ }
+}
+/*
+void CRegexXQuery_regex::restore_match()
+{
+ this->matched_source = this->backup_matched_source;
+ this->matched_len = this->backup_matched_len;
+ std::list<CRegexXQuery_branch*>::iterator branch_it;
+ for(branch_it = branch_list.begin(); branch_it != branch_list.end(); branch_it++)
+ {
+ (*branch_it)->restore();
+ }
+}
+*/
//match any of the branches
-bool CRegexAscii_regex::match(const char *source, int *matched_len)
+bool CRegexXQuery_regex::match(const char *source, int *start_from_branch, int *matched_len,
+ std::list<RegexAscii_pieceinfo>::iterator next_piece,
+ std::list<RegexAscii_pieceinfo>::iterator end_piece)
{
reachedEnd = false;
- std::list<CRegexAscii_branch*>::iterator branch_it;
-
- for(branch_it = branch_list.begin(); branch_it != branch_list.end(); branch_it++)
- {
- if((*branch_it)->match(source, matched_len))
- {
- matched_source = source;
- this->matched_len = *matched_len;
+ if(!(flags & REGEX_ASCII_GROUPING_LEN_WHOLE_PIECE) ||
+ (this->matched_source == NULL) || ((this->matched_source + this->matched_len) != source))
+ this->matched_source = source;
+ *matched_len = 0;
+ std::list<CRegexXQuery_branch*>::iterator branch_it;
+
+ if(*start_from_branch == 0)
+ {
+ for(branch_it = branch_list.begin(); branch_it != branch_list.end(); branch_it++)
+ {
+ (*branch_it)->reset();
+ }
+ }
+
+ branch_it = branch_list.begin();
+ if(*start_from_branch)
+ {
+ for(int i=0;i<*start_from_branch;i++)
+ branch_it++;
+ }
+ (*start_from_branch)++;
+ for(; branch_it != branch_list.end(); branch_it++,(*start_from_branch)++)
+ {
+ if((*branch_it)->match(source, matched_len, this, next_piece, end_piece))
+ {
+ //matched_source = source;
+ //this->matched_len = *matched_len;
return true;
}
}
- matched_source = NULL;
- matched_len = 0;
+ *start_from_branch = 0;
+ if(this->matched_source == source)
+ this->matched_source = NULL;
+ *matched_len = 0;
return false;
}
+void CRegexXQuery_regex::save_subregex_list(std::vector<std::pair<const char*, int> > &saved_subregex)
+{
+ saved_subregex.resize(0);
+ saved_subregex.reserve(subregex.size());
+ std::vector<CRegexXQuery_regex*>::iterator it;
+ for(it=subregex.begin(); it != subregex.end(); it++)
+ {
+ saved_subregex.push_back(std::pair<const char*, int>((*it)->matched_source, (*it)->matched_len));
+ }
+}
+
+void CRegexXQuery_regex::load_subregex_list(std::vector<std::pair<const char*, int> > &saved_subregex)
+{
+ std::vector<std::pair<const char*, int> >::iterator it;
+ std::vector<CRegexXQuery_regex*>::iterator subit;
+ for(it=saved_subregex.begin(), subit = subregex.begin(); it != saved_subregex.end(); it++, subit++)
+ {
+ (*subit)->matched_source = (*it).first;
+ (*subit)->matched_len = (*it).second;
+ }
+}
+
+void CRegexXQuery_branch::reset()
+{
+ std::list<RegexAscii_pieceinfo>::iterator piece_it;
+ for(piece_it = piece_list.begin(); piece_it != piece_list.end(); piece_it++)
+ {
+ (*piece_it).piece->atom->reset_match();
+ }
+}
+/*
+void CRegexXQuery_branch::restore()
+{
+ std::list<RegexAscii_pieceinfo>::iterator piece_it;
+ for(piece_it = piece_list.begin(); piece_it != piece_list.end(); piece_it++)
+ {
+ (*piece_it).piece->atom->restore_match();
+ }
+}
+*/
//match all the pieces
-bool CRegexAscii_branch::match(const char *source, int *matched_len)
+bool CRegexXQuery_branch::match(const char *source, int *matched_len,
+ CRegexXQuery_regex* group_regex,
+ std::list<RegexAscii_pieceinfo>::iterator next_piece,
+ std::list<RegexAscii_pieceinfo>::iterator end_piece)
{
- std::list<CRegexAscii_piece*>::iterator piece_it;
+ std::list<RegexAscii_pieceinfo>::iterator piece_it;
piece_it = piece_list.begin();
+ //if(piece_it == piece_list.end())
+ //if(!source[0])
+ // return true;
+ //else
+ // return false;
if(piece_it == piece_list.end())
- if(source[0])
- return false;
+ {
+ piece_it = next_piece;
+ if(next_piece == end_piece)
+ {
+ group_regex->matched_len = 0;
+ return true;
+ }
+ }
+
+ std::list<RegexAscii_pieceinfo> temp_pieces(piece_list);
+ temp_pieces.push_back(group_regex);//this will be used to store the group match
+ temp_pieces.insert(temp_pieces.end(), next_piece, end_piece);
+
+ return (*piece_it).piece->match_piece(temp_pieces.begin(), temp_pieces.end(), source, matched_len);
+}
+
+bool CRegexXQuery_piece::match_piece(std::list<RegexAscii_pieceinfo>::iterator piece_it,
+ std::list<RegexAscii_pieceinfo>::iterator end_it,
+ const char *source, int *matched_len)
+{
+ if((*piece_it).nr_matches < 0)
+ {
+ //special case, store the group match
+ (*piece_it).group_regex->matched_len = source - (*piece_it).group_regex->matched_source;
+ piece_it++;
+ if(piece_it == end_it)
+ return true;
else
- return true;
- if(!(*piece_it)->get_is_reluctant())
- return match_piece_iter_normal(piece_it, source, matched_len);
+ return (*piece_it).piece->match_piece(piece_it, end_it, source, matched_len);
+ }
+
+ if(!get_is_reluctant())
+ return match_piece_iter_normal(piece_it, end_it, source, matched_len);
else
- return match_piece_iter_reluctant(piece_it, source, matched_len);
-}
-
-//match as less as possible
-bool CRegexAscii_branch::match_piece_iter_reluctant(
- std::list<CRegexAscii_piece*>::iterator piece_it,
+ return match_piece_iter_reluctant(piece_it, end_it, source, matched_len);
+}
+
+int CRegexXQuery_piece::choose_another_branch(std::vector<std::pair<int,int> > &match_lens)
+{
+ int i = match_lens.size()-1;
+ i--;
+ while((i >= 0) && (match_lens.at(i).second == 0))
+ i--;
+ if(i < 0)
+ return -1;//no more branches
+ match_lens.resize(i+1);
+ i++;
+ return i;
+}
+
+bool CRegexXQuery_piece::is_regex_atom()
+{
+ return regex_atom != NULL;
+}
+
+//match as little as possible (shortest string)
+bool CRegexXQuery_piece::match_piece_iter_reluctant(
+ std::list<RegexAscii_pieceinfo>::iterator piece_it,
+ std::list<RegexAscii_pieceinfo>::iterator end_it,
const char *source, int *matched_len)
{
*matched_len = 0;
- if(piece_it == piece_list.end())
+ if(piece_it == end_it)
return true;
int min, max;
bool strict_max;
//std::vector<int> match_lens;
- (*piece_it)->get_quantifier(&min, &max, &strict_max);
- if(strict_max && (max >= 0))
+ (*piece_it).piece->get_quantifier(&min, &max, &strict_max);
+
+ std::vector<std::pair<const char*, int> > saved_subregex;
+
+ if(is_regex_atom())
{
- int timeslen;
- //check if the piece doesn't exceed the max match
- if((*piece_it)->match_piece_times(source, &timeslen, max+1, NULL))
- return false;///too many matches
+ //recursive
+ bool retmatch;
+ atom->regex_intern->save_subregex_list(saved_subregex);
+ if((*piece_it).nr_matches >= min)
+ {
+ //go to next piece
+ std::list<RegexAscii_pieceinfo>::iterator next_it = piece_it;
+ next_it++;
+ if(next_it == end_it)
+ return true;
+ retmatch = (*next_it).piece->match_piece(next_it, end_it, source, matched_len);
+ if(retmatch)
+ return true;
+ }
+ if(((max == -1) || ((*piece_it).nr_matches < max)) &&//try further with this piece
+ (((*piece_it).nr_matches < min) || ((*piece_it).nr_matches == 0) || ((*piece_it).piece->regex_atom->matched_len)))//if matched_len is zero, avoid infinite loop
+ {
+ int start_from_branch = 0;
+ int shortest_len = -1;
+ bool branch_saved = false;
+ //try all branches to get the shortest len
+ (*piece_it).nr_matches++;
+ while(atom->match(source, &start_from_branch, matched_len, piece_it, end_it))
+ {
+ if((shortest_len == -1) || (shortest_len > *matched_len))
+ {
+ shortest_len = *matched_len;
+ if(start_from_branch && (atom->regex_intern->flags & REGEX_ASCII_GET_LONGEST_BRANCH))
+ {
+ atom->regex_intern->save_subregex_list(saved_subregex);
+ branch_saved = true;
+ }
+ }
+ if(!start_from_branch || !(atom->regex_intern->flags & REGEX_ASCII_GET_LONGEST_BRANCH))
+ break;
+ }
+ if(shortest_len != -1)
+ {
+ *matched_len = shortest_len;
+ if(branch_saved)
+ atom->regex_intern->load_subregex_list(saved_subregex);
+ return true;
+ }
+ else
+ {
+ (*piece_it).nr_matches--;
+ atom->regex_intern->load_subregex_list(saved_subregex);
+ return false;
+ }
+ }
+ else
+ {
+ atom->regex_intern->load_subregex_list(saved_subregex);
+ return false;
+ }
}
- int i=min;
- std::list<CRegexAscii_piece*>::iterator next_it = piece_it;
+ int i=0;
+ int shortest_len = -1;
+ int otherpieces_shortest = -1;
+ int i_shortest = -1;
+ std::list<RegexAscii_pieceinfo>::iterator next_it = piece_it;
+ std::vector<std::pair<int,int> > match_lens;
next_it++;
int pieceslen = 0;
while(1)
{
- if((max > 0) && (i>max))
- break;
- int piecelen = 0;
- if((*piece_it)->match_piece_times(source+pieceslen, &piecelen, !pieceslen ? i : 1, NULL))
- {
- pieceslen += piecelen;
+ int piecelen = 0;
+ bool retmatch;
+ retmatch = match_piece_times(source, &piecelen, i < min ? min : i, &match_lens);
+ i = match_lens.size()-1;//number of matches
+ if(i<0)
+ i = 0;
+ if((i>=min))
+ {
+ pieceslen = piecelen;
+ if((shortest_len >= 0) && (shortest_len <= pieceslen))//this branch is longer
+ {//try another branch
+ i = choose_another_branch(match_lens);
+ if(i >= 0)
+ continue;//try another branch
+ else
+ break;
+ }
int otherpieces = 0;
- if((next_it == piece_list.end()) ||
- ((*next_it)->get_is_reluctant() && match_piece_iter_reluctant(next_it, source+pieceslen, &otherpieces)) ||
- (!(*next_it)->get_is_reluctant() && match_piece_iter_normal(next_it, source+pieceslen, &otherpieces)))
- {
- *matched_len = pieceslen + otherpieces;
- return true;
- }
+ if((next_it == end_it) ||
+ (*next_it).piece->match_piece(next_it, end_it, source+pieceslen, &otherpieces)
+ )
+ {
+ if((i == pieceslen) || (match_lens.at(0).second == 0) ||//minimum achieved already, cannot go lower than that
+ !(atom->regex_intern->flags & REGEX_ASCII_GET_LONGEST_BRANCH))
+ {
+ *matched_len = pieceslen + otherpieces;
+ return true;
+ }
+ if((shortest_len < 0) || (shortest_len > pieceslen))
+ {
+ shortest_len = pieceslen;
+ otherpieces_shortest = otherpieces;
+ i_shortest = i;
+ if(match_lens.at(0).second != 0)
+ atom->regex_intern->save_subregex_list(saved_subregex);
+ }
+ i = choose_another_branch(match_lens);
+ if(i >= 0)
+ continue;//try another branch
+ else
+ break;
+ }
+ else
+ {
+ //try further
+ if(retmatch)
+ {
+ i++;
+ if((max < 0) || (i<=max))
+ continue;
+ i--;
+ }
+ }
+ }
+
+ if(i==0)
+ {
+ break;
}
else
- break;
- i++;
+ {
+ i = choose_another_branch(match_lens);
+ if(i >= 0)
+ continue;//try another branch
+ else
+ break;
+ }
}
+ if(shortest_len >= 0)
+ {
+ if(strict_max && (max>=0) && (i_shortest > max))
+ return false;
+ *matched_len = shortest_len + otherpieces_shortest;
+ if(saved_subregex.size())
+ atom->regex_intern->load_subregex_list(saved_subregex);
+ return true;
+ }
return false;
}
//match as much as possible
-bool CRegexAscii_branch::match_piece_iter_normal(
- std::list<CRegexAscii_piece*>::iterator piece_it,
+bool CRegexXQuery_piece::match_piece_iter_normal(
+ std::list<RegexAscii_pieceinfo>::iterator piece_it,
+ std::list<RegexAscii_pieceinfo>::iterator end_it,
const char *source, int *matched_len)
{
*matched_len = 0;
int min, max;
bool strict_max;
- std::vector<int> match_lens;
- (*piece_it)->get_quantifier(&min, &max, &strict_max);
- int timeslen;
- if(strict_max && (max >= 0))
+ std::vector<std::pair<int,int> > match_lens;
+ (*piece_it).piece->get_quantifier(&min, &max, &strict_max);
+ int timeslen = 0;
+ std::vector<std::pair<const char*, int> > saved_subregex;
+
+ if(is_regex_atom())
{
- //check if the piece doesn't exceed the max match
- //if((*piece_it)->match_piece_times(source, &timeslen, max+1, &match_lens))
- // return false;///too many matches
- (*piece_it)->match_piece_times(source, &timeslen, max, &match_lens);
+ //recursive
+ bool retmatch;
+ atom->regex_intern->save_subregex_list(saved_subregex);
+ if(((max == -1) || ((*piece_it).nr_matches < max)) && //try further with this piece
+ (((*piece_it).nr_matches < min) || ((*piece_it).nr_matches == 0) || ((*piece_it).piece->regex_atom->matched_len)))//if matched_len is zero, avoid infinite loop
+ {
+ int start_from_branch = 0;
+ int longest_len = -1;
+ bool branch_saved = false;
+ //try all branches to get the longest len
+ (*piece_it).nr_matches++;
+ while(atom->match(source, &start_from_branch, matched_len, piece_it, end_it))
+ {
+ if((longest_len < *matched_len))
+ {
+ longest_len = *matched_len;
+ if(start_from_branch && (atom->regex_intern->flags & REGEX_ASCII_GET_LONGEST_BRANCH))
+ {
+ atom->regex_intern->save_subregex_list(saved_subregex);
+ branch_saved = true;
+ }
+ }
+ if(!start_from_branch || !(atom->regex_intern->flags & REGEX_ASCII_GET_LONGEST_BRANCH))
+ break;
+ }
+ if(longest_len != -1)
+ {
+ *matched_len = longest_len;
+ if(branch_saved)
+ atom->regex_intern->load_subregex_list(saved_subregex);
+ return true;
+ }
+ else
+ {
+ atom->regex_intern->load_subregex_list(saved_subregex);
+ (*piece_it).nr_matches--;
+ }
+ }
+ if((*piece_it).nr_matches >= min)
+ {
+ //go to next piece
+ std::list<RegexAscii_pieceinfo>::iterator next_it = piece_it;
+ next_it++;
+ if(next_it == end_it)
+ return true;
+ retmatch = (*next_it).piece->match_piece(next_it, end_it, source, matched_len);
+ if(!retmatch)
+ atom->regex_intern->load_subregex_list(saved_subregex);
+ return retmatch;
+ }
+ else
+ {
+ // regex_atom->restore_match();
+ atom->regex_intern->load_subregex_list(saved_subregex);
+ return false;
+ }
}
- else if(!strict_max && (max >= 0))
- (*piece_it)->match_piece_times(source, &timeslen, max, &match_lens);
- else
- (*piece_it)->match_piece_times(source, &timeslen, -1, &match_lens);
- int i;
- std::list<CRegexAscii_piece*>::iterator next_it = piece_it;
+ int longest_len = -1;
+ int otherpieces_longest = -1;
+ int i_longest = -1;
+ int i = max;
+ std::list<RegexAscii_pieceinfo>::iterator next_it = piece_it;
next_it++;
- if(next_it == piece_list.end())
+
+ bool retmatch;
+ while(1)
{
- if((int)match_lens.size() > min)
- {
- *matched_len = timeslen;
- return true;
+ retmatch = match_piece_times(source, &timeslen, i, &match_lens);
+ i=match_lens.size()-1;//number of matches
+ if((i>=min))
+ {
+ if(timeslen < longest_len)
+ {//this branch is no use
+ i = choose_another_branch(match_lens);
+ if(i >= 0)
+ {
+ i = max;
+ continue;//try another branch
+ }
+ else
+ break;
+ }
+ //int piecelen = 0;
+ int otherpieces = 0;
+ if((next_it == end_it) ||
+ (*next_it).piece->match_piece(next_it, end_it, source+timeslen, &otherpieces)
+ )
+ {
+ if(timeslen > longest_len)
+ {
+ longest_len = timeslen;
+ otherpieces_longest = otherpieces;
+ i_longest = i;
+ if(!(atom->regex_intern->flags & REGEX_ASCII_GET_LONGEST_BRANCH))
+ {
+ *matched_len = longest_len + otherpieces_longest;
+ return true;
+ }
+ else
+ {
+ if(match_lens.at(0).second)
+ atom->regex_intern->save_subregex_list(saved_subregex);
+ }
+ }
+ }
+ else
+ {
+ if(!match_lens.at(0).second)
+ {
+ match_lens.resize(match_lens.size()-1);
+ i--;
+ if(i >= 0)
+ continue;//try smaller
+ else
+ break;
+ }
+ else
+ {
+ i = choose_another_branch(match_lens);
+ if(i >= 0)
+ continue;//try another branch
+ else
+ break;
+ }
+ }
+ }
+ //now try another branch
+ i = choose_another_branch(match_lens);
+ if(i >= 0)
+ {
+ i = max;
+ continue;//try another branch
}
else
- return false;
- }
- for(i=match_lens.size()-1; i>=min; i--)
+ break;
+ }//end while
+
+ if(longest_len >= 0)
{
- int piecelen = 0;
- int otherpieces = 0;
- if(((*next_it)->get_is_reluctant() && match_piece_iter_reluctant(next_it, source+match_lens[i]+piecelen, &otherpieces)) ||
- (!(*next_it)->get_is_reluctant() && match_piece_iter_normal(next_it, source+match_lens[i]+piecelen, &otherpieces)))
- {
- *matched_len = match_lens[i] + piecelen + otherpieces;
- return true;
- }
+ *matched_len = longest_len + otherpieces_longest;
+ if(saved_subregex.size())
+ atom->regex_intern->load_subregex_list(saved_subregex);
+ return true;
}
return false;
}
-bool CRegexAscii_piece::match_piece_times(const char *source,
+bool CRegexXQuery_piece::match_piece_times(const char *source,
int *piecelen,
int times,
- std::vector<int> *match_lens)
+ std::vector<std::pair<int,int> > *match_lens)
{
- *piecelen = 0;
- for(int i=0;(times < 0) || (i<times);i++)
- {
+ int i=0;
+ if(match_lens && match_lens->size())
+ {
+ i = match_lens->size()-1;
+ }
+ if(match_lens && match_lens->size())
+ *piecelen = match_lens->at(match_lens->size()-1).first;
+ else
+ *piecelen = 0;
+ if((times >= 0) && (i>=times))
+ return true;
+ for(;(times < 0) || (i<times);i++)
+ {
+ int atomlen;
+ int start_from_branch = 0;
+ if(match_lens && (i<(int)match_lens->size()))
+ start_from_branch = match_lens->at(i).second;
+ bool first_branch = (start_from_branch == 0);
+ if(!atom->match(source+*piecelen, &start_from_branch, &atomlen, empty_pieces.begin(), empty_pieces.end()))
+ {
+ if(match_lens)
+ {
+ if(i >= (int)match_lens->size())
+ match_lens->push_back(std::pair<int,int>(*piecelen, 0));
+ else
+ (*match_lens)[i] = std::pair<int,int>(*piecelen, 0);
+ }
+ return false;
+ }
if(match_lens)
- match_lens->push_back(*piecelen);
- int atomlen;
- if(!atom->match(source+*piecelen, &atomlen))
- return false;
+ {
+ if(i >= (int)match_lens->size())
+ match_lens->push_back(std::pair<int,int>(*piecelen, start_from_branch));
+ else
+ (*match_lens)[i] = std::pair<int,int>(*piecelen, start_from_branch);
+ }
*piecelen += atomlen;
if(!atomlen && !source[*piecelen])
{
- atom->regex_intern->reachedEnd = true;
+ // atom->regex_intern->set_reachedEnd(source);
+ break;
+ }
+ if(first_branch && (atomlen == 0))//avoid infinite loop
+ {
break;
}
}
if(match_lens)
- match_lens->push_back(*piecelen);
+ {
+ // if(i >= match_lens->size())
+ match_lens->push_back(std::pair<int,int>(*piecelen, 0));
+ // else
+ // (*match_lens)[i] = std::pair<int,int>(*piecelen, 0);
+ }
return true;
}
+bool CRegexXQuery_multicharP::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ bool found = false;
+ const char *temp_source = source;
+ unicode::code_point utf8c = utf8::next_char(temp_source);
+ switch(multichar_type)
+ {
+ case unicode::UNICODE_Ll + 50:
+ if(unicode::check_codepoint_category(utf8c, unicode::UNICODE_Ll) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Lm) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Lo) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Lt) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Lu))
+ {
+ if(!is_reverse)
+ found = true;
+ }
+ else
+ {
+ if(is_reverse)
+ found = true;
+ }
+ break;
+ case unicode::UNICODE_Mc + 50:
+ if(unicode::check_codepoint_category(utf8c, unicode::UNICODE_Mn) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Mc) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Me))
+ {
+ if(!is_reverse)
+ found = true;
+ }
+ else
+ {
+ if(is_reverse)
+ found = true;
+ }
+ break;
+ case unicode::UNICODE_Nd + 50:
+ if(unicode::check_codepoint_category(utf8c, unicode::UNICODE_Nd) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Nl) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_No))
+ {
+ if(!is_reverse)
+ found = true;
+ }
+ else
+ {
+ if(is_reverse)
+ found = true;
+ }
+ break;
+ case unicode::UNICODE_Pc + 50:
+ if(unicode::check_codepoint_category(utf8c, unicode::UNICODE_Pc) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Pd) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Ps) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Pe) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Pi) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Pf) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Po))
+ {
+ if(!is_reverse)
+ found = true;
+ }
+ else
+ {
+ if(is_reverse)
+ found = true;
+ }
+ break;
+ case unicode::UNICODE_Zl + 50:
+ if(unicode::check_codepoint_category(utf8c, unicode::UNICODE_Zs) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Zl) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Zp))
+ {
+ if(!is_reverse)
+ found = true;
+ }
+ else
+ {
+ if(is_reverse)
+ found = true;
+ }
+ break;
+ case unicode::UNICODE_Sc + 50:
+ if(unicode::check_codepoint_category(utf8c, unicode::UNICODE_Sm) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Sc) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Sk) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_So))
+ {
+ if(!is_reverse)
+ found = true;
+ }
+ else
+ {
+ if(is_reverse)
+ found = true;
+ }
+ break;
+ case unicode::UNICODE_Cc + 50:
+ if(unicode::check_codepoint_category(utf8c, unicode::UNICODE_Cc) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Cf) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Co))//ignore unicode::UNICODE_Cn
+ {
+ if(!is_reverse)
+ found = true;
+ }
+ else
+ {
+ if(is_reverse)
+ found = true;
+ }
+ break;
+ default:
+ if(unicode::check_codepoint_category(utf8c, (unicode::category)multichar_type))
+ {
+ if(!is_reverse)
+ found = true;
+ }
+ else
+ {
+ if(is_reverse)
+ found = true;
+ }
+ break;
+ }
+
+ if(found)
+ {
+ *matched_len = temp_source - source;
+ }
+ return found;
+}
+
+bool CRegexXQuery_multicharIs::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ bool found = false;
+ const char *temp_source = source;
+ unicode::code_point utf8c = utf8::next_char(temp_source);
+ const unicode::code_point *cp = block_escape[block_index].cp;
+ if((utf8c >= cp[0]) && (utf8c <= cp[1]))
+ {
+ if(!is_reverse)
+ found = true;
+ }
+ else if(block_escape[block_index].ext_cp)
+ {
+ cp = block_escape[block_index].ext_cp;
+ while(*cp)
+ {
+ if((utf8c >= cp[0]) && (utf8c <= cp[1]))
+ break;
+ cp += 2;
+ }
+ if(*cp)
+ {
+ if(!is_reverse)
+ found = true;
+ }
+ else
+ {
+ if(is_reverse)
+ found = true;
+ }
+ }
+ else
+ {
+ if(is_reverse)
+ found = true;
+ }
+ if(found)
+ {
+ *matched_len = temp_source - source;
+ }
+ return found;
+}
+
+bool CRegexXQuery_multicharOther::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ bool found = false;
+ bool value_true = true;
+ const char *temp_source = source;
+ unicode::code_point utf8c = utf8::next_char(temp_source);
+ switch(multichar_type)
+ {
+ case 'S':value_true = false;//[^\s]
+ case 's'://[#x20\t\n\r]
+ switch(utf8c)
+ {
+ case '\t':
+ case '\r':
+ case '\n':
+ case ' ':
+ found = true;
+ default:
+ break;
+ }
+ break;
+ case 'I':value_true = false;//[^\i]
+ case 'i'://the set of initial name characters, those matched by Letter | '_' | ':'
+ if((utf8c == '_') ||
+ (utf8c == ':') ||
+ XQCharType::isLetter(utf8c))
+ {
+ found = true;
+ }
+ break;
+ case 'C':value_true = false;//[^\c]
+ case 'c'://the set of name characters, those matched by NameChar
+ if(XQCharType::isNameChar(utf8c))
+ {
+ found = true;
+ }
+ break;
+ case 'D':value_true = false;//[^\d]
+ case 'd':
+ if(unicode::check_codepoint_category(utf8c, unicode::UNICODE_Nd))
+ found = true;
+ break;
+ case 'W':value_true = false;//[^\w]
+ case 'w':
+ found = !(unicode::check_codepoint_category(utf8c, unicode::UNICODE_Pc) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Pd) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Ps) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Pe) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Pi) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Pf) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Po) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Zs) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Zl) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Zp) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Cc) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Cf) ||
+ unicode::check_codepoint_category(utf8c, unicode::UNICODE_Co));//ignore unicode::UNICODE_Cn
+ break;
+ default:
+ throw XQUERY_EXCEPTION( err::FORX0002, ERROR_PARAMS(source, ZED(REGEX_UNIMPLEMENTED)) );
+ }
+ if((found && value_true) || (!found && !value_true))
+ {
+ *matched_len = temp_source - source;
+ return true;
+ }
+ else
+ {
+ return false;
+ }
+}
+
+bool CRegexXQuery_char_ascii::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ if(source[0] == c)
+ {
+ *matched_len = 1;
+ return true;
+ }
+ else
+ return false;
+}
+
+bool CRegexXQuery_char_ascii_i::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ char sup = toupper(source[0]);
+ if(sup == c)
+ {
+ *matched_len = 1;
+ return true;
+ }
+ else
+ return false;
+}
+
+bool CRegexXQuery_char_range_ascii::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ if((source[0] >= c1) && (source[0] <= c2))
+ {
+ *matched_len = 1;
+ return true;
+ }
+ else
+ return false;
+}
+
+bool CRegexXQuery_char_range_ascii_i::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ char sup = toupper(source[0]);
+ if((sup >= c1) && (sup <= c2))
+ {
+ *matched_len = 1;
+ return true;
+ }
+ else
+ return false;
+}
+
+bool CRegexXQuery_char_unicode::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ if(!memcmp(source, c, len))
+ {
+ *matched_len = len;
+ return true;
+ }
+ else
+ return false;
+}
+
+bool CRegexXQuery_char_unicode_cp::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ const char *temp_source = source;
+ unicode::code_point utf8c = utf8::next_char(temp_source);
+ if(utf8c == c)
+ {
+ *matched_len = temp_source - source;
+ return true;
+ }
+ else
+ return false;
+}
+
+bool CRegexXQuery_char_unicode_i::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ const char *temp_source = source;
+ unicode::code_point sup = unicode::to_upper(utf8::next_char(temp_source));
+ if(sup == c)
+ {
+ *matched_len = temp_source - source;
+ return true;
+ }
+ else
+ return false;
+}
+
+bool CRegexXQuery_char_range_unicode::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ const char *temp_source = source;
+ unicode::code_point utf8c = utf8::next_char(temp_source);
+ if((utf8c >= c1) && (utf8c <= c2))
+ {
+ *matched_len = temp_source - source;
+ return true;
+ }
+ else
+ return false;
+}
+
+bool CRegexXQuery_char_range_unicode_i::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ const char *temp_source = source;
+ unicode::code_point sup = unicode::to_upper(utf8::next_char(temp_source));
+ if((sup >= c1) && (sup <= c2))
+ {
+ *matched_len = temp_source - source;
+ return true;
+ }
+ else
+ return false;
+}
+
+bool CRegexXQuery_endline::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ *matched_len = 0;
+ if(!source[0])
+ {
+ // regex_intern->reachedEnd = true;
+ return true;
+ }
+ if((source[0] == 0x0A) || ((source[0] == 0x0D) && (source[1] == 0x0A)))
+ {
+ if(regex_intern->get_flags() & REGEX_ASCII_MULTILINE)
+ {
+ // regex_intern->reachedEnd = true;
+ return true;
+ }
+ }
+ return false;
+}
+
+
//match any of chargroups
-bool CRegexAscii_chargroup::match(const char *source, int *matched_len)
+bool CRegexXQuery_chargroup::match_internal(const char *source, int *start_from_branch, int *matched_len)
{
*matched_len = 0;
- std::list<chargroup_t>::iterator cgt_it;
-
+ std::list<CRegexXQuery_charmatch* >::iterator cgt_it;
+/*
if(!source[0])
{
regex_intern->reachedEnd = true;
@@ -975,113 +2416,21 @@
return false;
}
- if(source[0] == 0x0A)
+ if((source[0] == 0x0A) || ((source[0] == 0x0D) && (source[1] == 0x0A)))
{
if((regex_intern->flags & REGEX_ASCII_MULTILINE) &&
(chargroup_list.size() == 1) && (chargroup_list.begin()->flags == CHARGROUP_FLAGS_ENDLINE))
{
- *matched_len = 1;
+ // *matched_len = 1;
return true;
}
}
-
+*/
+ //bool found = false;
for(cgt_it = chargroup_list.begin(); cgt_it != chargroup_list.end(); cgt_it++)
{
- if(cgt_it->flags == CHARGROUP_FLAGS_MULTICHAR)
- {
- switch(cgt_it->c1)
- {
- case 'p'://catEsc
- case 'P'://complEsc
- //ignore the prop for now
- throw XQUERY_EXCEPTION( err::FORX0002 );
- case 's'://[#x20\t\n\r]
- switch(source[0])
- {
- case '\t':
- case '\r':
- case '\n':
- case ' ':
- *matched_len = 1;
- return true;
- default:
- return false;
- }
- case 'S'://[^\s]
- switch(source[0])
- {
- case 0:
- regex_intern->reachedEnd = true;
- case '\t':
- case '\r':
- case '\n':
- case ' ':
- return false;
- default:
- *matched_len = 1;
- return true;
- }
- case 'i'://the set of initial name characters, those matched by Letter | '_' | ':'
- if((source[0] == '_') ||
- (source[0] == ':') ||
- XQCharType::isLetter(source[0]))
- {
- *matched_len = 1;
- return true;
- }
- return false;
- case 'I':
- if((source[0] == '_') ||
- (source[0] == ':') ||
- XQCharType::isLetter(source[0]))
- {
- return false;
- }
- *matched_len = 1;
- return true;
- case 'c'://the set of name characters, those matched by NameChar
- if(XQCharType::isNameChar(source[0]))
- {
- *matched_len = 1;
- return true;
- }
- return false;
- case 'C':
- if(XQCharType::isNameChar(source[0]))
- {
- return false;
- }
- *matched_len = 1;
- return true;
- case 'd':
- case 'D':
- case 'w':
- case 'W':
- default:
- throw XQUERY_EXCEPTION( err::FORX0002 );
- }
- return false;
- }
- else if(cgt_it->flags == CHARGROUP_FLAGS_ENDLINE)
- {
- return false;
- }
- else
- {
- if(regex_intern->flags & REGEX_ASCII_CASE_INSENSITIVE)
- {
- char sup = toupper(source[0]);
- if((sup >= toupper(cgt_it->c1)) &&
- (sup <= toupper(cgt_it->c2)))
- break;
- }
- else
- {
- if((source[0] >= cgt_it->c1) &&
- (source[0] <= cgt_it->c2))
- break;
- }
- }
+ if((*cgt_it)->match_internal(source, start_from_branch, matched_len))
+ break;
}
if(cgt_it == chargroup_list.end())
return false;
@@ -1089,53 +2438,48 @@
if(classsub)
{
int classsub_len;
- if(classsub->match(source, &classsub_len))
+ if(classsub->match_internal(source, NULL, &classsub_len))
return false;
}
- *matched_len = 1;
+ //*matched_len = 1;
return true;
}
-bool CRegexAscii_negchargroup::match(const char *source, int *matched_len)
+bool CRegexXQuery_negchargroup::match_internal(const char *source, int *start_from_branch, int *matched_len)
{
if(!source[0])
{
- regex_intern->reachedEnd = true;
+ regex_intern->set_reachedEnd(source);
return false;
}
- if(!CRegexAscii_chargroup::match(source, matched_len))
+ if(!CRegexXQuery_chargroup::match_internal(source, start_from_branch, matched_len))
{
- *matched_len = 1;
+ *matched_len = myutf8len(source);
return true;
}
return false;
}
-bool CRegexAscii_wildchar::match(const char *source, int *matched_len)
+bool CRegexXQuery_wildchar::match_internal(const char *source, int *start_from_branch, int *matched_len)
{
*matched_len = 0;
- if(source[0])
- {
- if((regex_intern->flags & REGEX_ASCII_DOTALL) ||
- (source[0] != '\n') && (source[0] != '\r'))
- {
- *matched_len = 1;
- return true;
- }
- else
- return false;
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
+ if((regex_intern->flags & REGEX_ASCII_DOTALL) ||
+ ((source[0] != '\n') && (source[0] != '\r')))
+ {
+ *matched_len = myutf8len(source);
+ return true;
}
else
- {
- if(!source[0])
- regex_intern->reachedEnd = true;
- *matched_len = 0;
return false;
- }
}
-bool CRegexAscii_backref::match(const char *source, int *matched_len)
+bool CRegexXQuery_backref::match_internal(const char *source, int *start_from_branch, int *matched_len)
{
const char *submatch = regex_intern->subregex.at(backref-1)->matched_source;
if(!submatch)
@@ -1143,15 +2487,42 @@
*matched_len = 0;
return true;
}
+ if(!source[0])
+ {
+ regex_intern->set_reachedEnd(source);
+ return false;
+ }
*matched_len = regex_intern->subregex.at(backref-1)->matched_len;
- if(!strncmp(source, submatch, *matched_len))
- {
- return true;
- }
- *matched_len = 0;
- return false;
-}
-
- }//end namespace regex_ascii
+ if(regex_intern->flags & REGEX_ASCII_CASE_INSENSITIVE)
+ {
+ if(compare_unicode_ni(source, submatch, *matched_len))
+ {
+ return true;
+ }
+ }
+ else
+ {
+ if(!memcmp(source, submatch, *matched_len))
+ {
+ return true;
+ }
+ }
+ *matched_len = 0;
+ return false;
+}
+
+bool CRegexXQuery_pinstart::match_internal(const char *source, int *start_from_branch, int *matched_len)
+{
+ *matched_len = 0;
+ if(source == regex_intern->source_start)
+ return true;
+ if((regex_intern->flags & REGEX_ASCII_MULTILINE) &&
+ ((source[-1] == '\n') || (source[-1] == '\r')))
+ return true;
+
+ return false;
+}
+
+ }//end namespace regex_xquery
}//end namespace zorba
/* vim:set et sw=2 ts=2: */
=== renamed file 'src/util/regex_ascii.h' => 'src/util/regex_xquery.h'
--- src/util/regex_ascii.h 2012-03-28 05:19:57 +0000
+++ src/util/regex_xquery.h 2012-04-11 15:45:21 +0000
@@ -19,103 +19,140 @@
#include <list>
#include <vector>
-
+#include <util/unicode_util.h>
namespace zorba {
- namespace regex_ascii{
+ namespace regex_xquery{
//matching flags
-#define REGEX_ASCII_CASE_INSENSITIVE 1
-#define REGEX_ASCII_DOTALL 2
-#define REGEX_ASCII_MULTILINE 4
-#define REGEX_ASCII_COMMENTS 8
-#define REGEX_ASCII_LITERAL 16
-
-class CRegexAscii_regex;
-
-class IRegexMatcher
+#define REGEX_ASCII_CASE_INSENSITIVE 1 //i
+#define REGEX_ASCII_DOTALL 2 //s
+#define REGEX_ASCII_MULTILINE 4 //m
+#define REGEX_ASCII_NO_WHITESPACE 8 //x
+#define REGEX_ASCII_LITERAL 16 //q
+
+#define REGEX_ASCII_GET_LONGEST_BRANCH 32 //try all branches and get the longest match (or shortest for reluctant pieces)
+#define REGEX_ASCII_MINIMAL_MATCH 64 //consider all pieces as reluctant
+#define REGEX_ASCII_WHOLE_MATCH 128 //match only all string, like having "^regex$"
+#define REGEX_ASCII_GROUPING_LEN_WHOLE_PIECE 256 //compute the len of a grouping as for the whole piece ( for example (a)+ when matching "aa" and referred as $1 will get string len 2 instead of last 1)
+
+class CRegexXQuery_regex;
+class CRegexXQuery_piece;
+
+struct RegexAscii_pieceinfo
{
-public:
- CRegexAscii_regex *regex_intern;
-public:
- IRegexMatcher(CRegexAscii_regex* regex) : regex_intern(regex) {}
- virtual ~IRegexMatcher() {}
+ union
+ {
+ CRegexXQuery_piece* piece;
+ CRegexXQuery_regex* group_regex;
+ };
+ int nr_matches;
- virtual bool match(const char *source, int *matched_len) = 0;
+ RegexAscii_pieceinfo(CRegexXQuery_piece* piece) {nr_matches=0;this->piece=piece;}
+ RegexAscii_pieceinfo(CRegexXQuery_regex* group_regex) {nr_matches=-1;this->group_regex=group_regex;}
};
-class IRegexAtom : public IRegexMatcher
+
+class IRegexAtom
{
+protected:
+ friend class CRegexXQuery_piece;
+ CRegexXQuery_regex *regex_intern;
public:
- IRegexAtom(CRegexAscii_regex* regex) : IRegexMatcher(regex) {}
+ IRegexAtom(CRegexXQuery_regex* regex) : regex_intern(regex) {}
virtual ~IRegexAtom() {}
+
+ virtual bool match(const char *source, int *start_from_branch, int *matched_len,
+ std::list<RegexAscii_pieceinfo>::iterator next_piece,
+ std::list<RegexAscii_pieceinfo>::iterator end_piece);
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len) = 0;
+ virtual void reset_match() {}
+// virtual void restore_match() {}
};
-class CRegexAscii_branch;
-class CRegexAscii_piece;
-class CRegexAscii_chargroup;
-class CRegexAscii_parser;
+class CRegexXQuery_branch;
+class CRegexXQuery_piece;
+class CRegexXQuery_chargroup;
+class CRegexXQuery_parser;
-class CRegexAscii_regex : public IRegexAtom
+class CRegexXQuery_regex : public IRegexAtom
{
- friend class CRegexAscii_parser;
- friend class CRegexAscii_branch;
- friend class CRegexAscii_piece;
- friend class CRegexAscii_chargroup;
- friend class CRegexAscii_negchargroup;
- friend class CRegexAscii_wildchar;
- friend class CRegexAscii_backref;
+ friend class CRegexXQuery_parser;
+ friend class CRegexXQuery_branch;
+ friend class CRegexXQuery_piece;
+ friend class CRegexXQuery_chargroup;
+ friend class CRegexXQuery_negchargroup;
+ friend class CRegexXQuery_wildchar;
+ friend class CRegexXQuery_backref;
+ friend class CRegexXQuery_pinstart;
public:
- CRegexAscii_regex(CRegexAscii_regex *);
- virtual ~CRegexAscii_regex();
+ CRegexXQuery_regex(CRegexXQuery_regex *);
+ virtual ~CRegexXQuery_regex();
bool match_anywhere(const char *source, unsigned int flags, int *match_pos, int *matched_len);
bool match_from(const char *source, unsigned int flags, int *match_pos, int *matched_len);
- virtual bool match(const char *source, int *matched_len);
//for replace $1, $2 ...
bool get_indexed_match(int index, const char **matched_source, int *matched_len);
unsigned int get_indexed_regex_count();
bool get_reachedEnd() {return reachedEnd;}
- bool set_align_begin(bool align_begin);
+ void set_reachedEnd(const char *source) {if(source > source_start) reachedEnd = true;}
+ unsigned int get_flags() {return flags;}
+public:
+ virtual bool match(const char *source, int *start_from_branch, int *matched_len,
+ std::list<RegexAscii_pieceinfo>::iterator next_piece,
+ std::list<RegexAscii_pieceinfo>::iterator end_piece);
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len) {return false;}//not impl
+ virtual void reset_match();
+// virtual void restore_match();
private:
- void add_branch(CRegexAscii_branch *branch);
+ void add_branch(CRegexXQuery_branch *branch);
+
+ void save_subregex_list(std::vector<std::pair<const char*, int> > &saved_subregex);
+ void load_subregex_list(std::vector<std::pair<const char*, int> > &saved_subregex);
private:
unsigned int flags;
- std::list<CRegexAscii_branch*> branch_list;
- bool align_begin;
+ std::list<CRegexXQuery_branch*> branch_list;
+
+ const char *source_start;
const char *matched_source;
int matched_len;
- std::vector<CRegexAscii_regex*> subregex;//for grouping
+// const unicode::code_point *backup_matched_source;
+// int backup_matched_len;
+ std::vector<CRegexXQuery_regex*> subregex;//for grouping
bool reachedEnd;
};
-class CRegexAscii_branch : public IRegexMatcher
+class CRegexXQuery_branch
{
- friend class CRegexAscii_parser;
+ friend class CRegexXQuery_parser;
public:
- CRegexAscii_branch(CRegexAscii_regex* regex);
- ~CRegexAscii_branch();
+ CRegexXQuery_branch(CRegexXQuery_regex* regex);
+ ~CRegexXQuery_branch();
- virtual bool match(const char *source, int *matched_len);
-private:
- std::list<CRegexAscii_piece*> piece_list;
-private:
- void add_piece(CRegexAscii_piece *piece);
+ bool match(const char *source, int *matched_len,
+ CRegexXQuery_regex* group_regex,
+ std::list<RegexAscii_pieceinfo>::iterator next_piece,
+ std::list<RegexAscii_pieceinfo>::iterator end_piece);
+ void reset();
+// void restore();
+private:
+ std::list<RegexAscii_pieceinfo> piece_list;
+private:
+ void add_piece(CRegexXQuery_piece *piece);
- bool match_piece_iter_reluctant(std::list<CRegexAscii_piece*>::iterator piece_it,
- const char *source, int *matched_len);
- bool match_piece_iter_normal(std::list<CRegexAscii_piece*>::iterator piece_it,
- const char *source, int *matched_len);
};
-class CRegexAscii_piece //: public IRegexMatcher
+class CRegexXQuery_piece //: public IRegexMatcher
{
- friend class CRegexAscii_parser;
-public:
+ friend class CRegexXQuery_parser;
+ friend class CRegexXQuery_branch;
+
IRegexAtom *atom;
+ CRegexXQuery_regex *regex_atom;
+
//quantifier
bool strict_max;
int min;
@@ -123,8 +160,8 @@
bool is_reluctant;
public:
- CRegexAscii_piece();
- ~CRegexAscii_piece();
+ CRegexXQuery_piece();
+ ~CRegexXQuery_piece();
public:
void set_atom(IRegexAtom *atom);
void set_quantifier_min_max(int min, int max, bool strict_max);
@@ -132,95 +169,294 @@
void get_quantifier(int *min, int *max, bool *strict_max);
bool get_is_reluctant();
// bool match(const char *source, int *matched_len);
+ bool match_piece(std::list<RegexAscii_pieceinfo>::iterator next_piece,
+ std::list<RegexAscii_pieceinfo>::iterator end_piece,
+ const char *source, int *matched_len);
+protected:
bool match_piece_times(const char *source,
int *piecelen,
int times,
- std::vector<int> *match_lens);
-};
-
-#define CHARGROUP_FLAGS_MULTICHAR 1
-#define CHARGROUP_FLAGS_ENDLINE 2
-
-class CRegexAscii_chargroup : public IRegexAtom
-{
- friend class CRegexAscii_parser;
-public:
- CRegexAscii_chargroup(CRegexAscii_regex* regex);
- virtual ~CRegexAscii_chargroup();
+ std::vector<std::pair<int,int> > *match_lens);
+ int choose_another_branch(std::vector<std::pair<int,int> > &match_lens);
+ bool match_piece_iter_reluctant(std::list<RegexAscii_pieceinfo>::iterator next_piece,
+ std::list<RegexAscii_pieceinfo>::iterator end_piece,
+ const char *source, int *matched_len);
+ bool match_piece_iter_normal(std::list<RegexAscii_pieceinfo>::iterator next_piece,
+ std::list<RegexAscii_pieceinfo>::iterator end_piece,
+ const char *source, int *matched_len);
+ bool is_regex_atom();
+};
+
+
+enum CHARGROUP_t
+{
+CHARGROUP_NO_MULTICHAR = 0,
+//CHARGROUP_FLAGS_CHAR_RANGE,
+CHARGROUP_FLAGS_MULTICHAR_p,
+CHARGROUP_FLAGS_MULTICHAR_Is,
+CHARGROUP_FLAGS_MULTICHAR_OTHER,
+CHARGROUP_FLAGS_ONECHAR_ASCII,
+CHARGROUP_FLAGS_ONECHAR_UNICODE
+//CHARGROUP_FLAGS_ENDLINE
+};
+
+
+class CRegexXQuery_charmatch : public IRegexAtom
+{
+ friend class CRegexXQuery_parser;
+protected:
+ //enum CHARGROUP_t type;
+public:
+ CRegexXQuery_charmatch(CRegexXQuery_regex* regex);//, enum CHARGROUP_t type);
+ virtual ~CRegexXQuery_charmatch() {}
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len) = 0;
+ virtual unicode::code_point get_c() {return 0;}
+};
+
+class CRegexXQuery_multicharP : public CRegexXQuery_charmatch
+{
+ char multichar_type;
+ bool is_reverse;
+public:
+ CRegexXQuery_multicharP(CRegexXQuery_regex* regex, char type, bool is_reverse);
+ virtual ~CRegexXQuery_multicharP() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+};
+
+class CRegexXQuery_multicharIs : public CRegexXQuery_charmatch
+{
+ int block_index;
+ bool is_reverse;
+public:
+ CRegexXQuery_multicharIs(CRegexXQuery_regex* regex, int block_index, bool is_reverse);
+ virtual ~CRegexXQuery_multicharIs() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+};
+
+class CRegexXQuery_multicharOther : public CRegexXQuery_charmatch
+{
+ char multichar_type;
+public:
+ CRegexXQuery_multicharOther(CRegexXQuery_regex* regex, char type);
+ virtual ~CRegexXQuery_multicharOther() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+};
+
+class CRegexXQuery_char_ascii : public CRegexXQuery_charmatch
+{
+ friend class CRegexXQuery_parser;
+protected:
+ char c;
+public:
+ CRegexXQuery_char_ascii(CRegexXQuery_regex* regex, char c);
+ virtual ~CRegexXQuery_char_ascii() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+ virtual unicode::code_point get_c() {return c;}
+};
+
+class CRegexXQuery_char_ascii_i : public CRegexXQuery_char_ascii
+{
+public:
+ CRegexXQuery_char_ascii_i(CRegexXQuery_regex* regex, char c);
+ virtual ~CRegexXQuery_char_ascii_i() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+ virtual unicode::code_point get_c() {return c;}
+};
+
+class CRegexXQuery_char_range_ascii : public CRegexXQuery_charmatch
+{
+protected:
+ char c1;
+ char c2;
+public:
+ CRegexXQuery_char_range_ascii(CRegexXQuery_regex* regex, char c1, char c2);
+ virtual ~CRegexXQuery_char_range_ascii() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+};
+
+class CRegexXQuery_char_range_ascii_i : public CRegexXQuery_char_range_ascii
+{
+public:
+ CRegexXQuery_char_range_ascii_i(CRegexXQuery_regex* regex, char c1, char c2);
+ virtual ~CRegexXQuery_char_range_ascii_i() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+};
+
+class CRegexXQuery_char_unicode : public CRegexXQuery_charmatch
+{
+ unsigned char c[6];
+ int len;
+public:
+ CRegexXQuery_char_unicode(CRegexXQuery_regex* regex, const char *c, int len);
+ virtual ~CRegexXQuery_char_unicode() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+ virtual unicode::code_point get_c();
+};
+
+class CRegexXQuery_char_unicode_cp : public CRegexXQuery_charmatch
+{
+protected:
+ unicode::code_point c;
+public:
+ CRegexXQuery_char_unicode_cp(CRegexXQuery_regex* regex, unicode::code_point c);
+ virtual ~CRegexXQuery_char_unicode_cp() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+ virtual unicode::code_point get_c() {return c;}
+};
+
+class CRegexXQuery_char_unicode_i : public CRegexXQuery_char_unicode_cp
+{
+public:
+ CRegexXQuery_char_unicode_i(CRegexXQuery_regex* regex, unicode::code_point c);
+ virtual ~CRegexXQuery_char_unicode_i() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+ virtual unicode::code_point get_c() {return c;}
+};
+
+class CRegexXQuery_char_range_unicode : public CRegexXQuery_charmatch
+{
+protected:
+ unicode::code_point c1;
+ unicode::code_point c2;
+public:
+ CRegexXQuery_char_range_unicode(CRegexXQuery_regex* regex, unicode::code_point c1, unicode::code_point c2);
+ virtual ~CRegexXQuery_char_range_unicode() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+};
+
+class CRegexXQuery_char_range_unicode_i : public CRegexXQuery_char_range_unicode
+{
+public:
+ CRegexXQuery_char_range_unicode_i(CRegexXQuery_regex* regex, unicode::code_point c1, unicode::code_point c2);
+ virtual ~CRegexXQuery_char_range_unicode_i() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+};
+
+class CRegexXQuery_endline : public CRegexXQuery_charmatch
+{
+public:
+ CRegexXQuery_endline(CRegexXQuery_regex* regex);
+ virtual ~CRegexXQuery_endline() {}
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+};
+
+
+class CRegexXQuery_chargroup : public IRegexAtom
+{
+ friend class CRegexXQuery_parser;
+public:
+ CRegexXQuery_chargroup(CRegexXQuery_regex* regex);
+ virtual ~CRegexXQuery_chargroup();
private:
- typedef struct
+/* typedef struct
{
- unsigned char flags;
+ CHARGROUP_t flags;
char c1;
char c2;
}chargroup_t;
- std::list<chargroup_t> chargroup_list;
- CRegexAscii_chargroup *classsub;
-public:
- void addMultiChar(char c);
- void addEndLine();
- void addCharRange(char c1, char c2);
- void addClassSub(CRegexAscii_chargroup* classsub);
-
- virtual bool match(const char *source, int *matched_len);
-};
-
-class CRegexAscii_negchargroup : public CRegexAscii_chargroup
-{
-public:
- CRegexAscii_negchargroup(CRegexAscii_regex* regex);
- virtual ~CRegexAscii_negchargroup();
-
- virtual bool match(const char *source, int *matched_len);
-};
-
-class CRegexAscii_wildchar : public IRegexAtom
-{
-public:
- CRegexAscii_wildchar(CRegexAscii_regex* regex);
- virtual ~CRegexAscii_wildchar();
-
- virtual bool match(const char *source, int *matched_len);
-};
-
-class CRegexAscii_backref : public IRegexAtom
-{
-public:
- CRegexAscii_backref(CRegexAscii_regex* regex, unsigned int backref);
- virtual ~CRegexAscii_backref();
-
- virtual bool match(const char *source, int *matched_len);
+*/
+ std::list<CRegexXQuery_charmatch* > chargroup_list;
+ CRegexXQuery_chargroup *classsub;
+public:
+ //void addMultiChar(char c, CHARGROUP_t multichar_type);
+ //void addEndLine();
+ //void addCharRange(char c1, char c2);
+ //void addOneChar(char c);
+ void addCharMatch(CRegexXQuery_charmatch *charmatch);
+ void addClassSub(CRegexXQuery_chargroup* classsub);
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+};
+
+class CRegexXQuery_negchargroup : public CRegexXQuery_chargroup
+{
+public:
+ CRegexXQuery_negchargroup(CRegexXQuery_regex* regex);
+ virtual ~CRegexXQuery_negchargroup();
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+};
+
+class CRegexXQuery_wildchar : public IRegexAtom
+{
+public:
+ CRegexXQuery_wildchar(CRegexXQuery_regex* regex);
+ virtual ~CRegexXQuery_wildchar();
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+};
+
+class CRegexXQuery_backref : public IRegexAtom
+{
+public:
+ CRegexXQuery_backref(CRegexXQuery_regex* regex, unsigned int backref);
+ virtual ~CRegexXQuery_backref();
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
private:
unsigned int backref;
};
-class CRegexAscii_parser
-{
-public:
- CRegexAscii_parser();
- ~CRegexAscii_parser();
-
-public:
- CRegexAscii_regex* parse(const char *pattern, unsigned int flags);
+class CRegexXQuery_pinstart : public IRegexAtom
+{
+public:
+ CRegexXQuery_pinstart(CRegexXQuery_regex* regex);
+
+ virtual bool match_internal(const char *source, int *start_from_branch, int *matched_len);
+};
+
+class CRegexXQuery_parser
+{
+public:
+ typedef struct
+ {
+ const unicode::code_point cp[2];//in pairs start, end
+ const unicode::code_point *ext_cp;
+ const char *group_name;
+ }block_escape_t;
+
+ CRegexXQuery_parser();
+ ~CRegexXQuery_parser();
+
+public:
+ CRegexXQuery_regex* parse(const char *pattern, unsigned int flags);
protected:
- CRegexAscii_regex* parse_regexp(const char *pattern, int *regex_len);
- CRegexAscii_branch* parse_branch(const char *pattern, int *branch_len);
- CRegexAscii_piece* parse_piece(const char *pattern, int *piece_len);
+ CRegexXQuery_regex* parse_regexp(const char *pattern, int *regex_len);
+ CRegexXQuery_branch* parse_branch(const char *pattern, int *branch_len);
+ CRegexXQuery_piece* parse_piece(const char *pattern, int *piece_len);
char myishex(char c);
bool myisdigit(char c);
- char readChar(const char *pattern, int *char_len, bool *is_multichar);
+ bool myisletterAZ(char c);
+ CRegexXQuery_charmatch* readChar(const char *pattern, int *char_len, CHARGROUP_t *multichar_type);
+ CRegexXQuery_charmatch *create_charmatch(unicode::code_point utf8c,
+ const char *pattern, int utf8len,
+ enum CHARGROUP_t *multichar_type);
IRegexAtom* read_atom(const char *pattern, int *atom_len);
- CRegexAscii_chargroup* readchargroup(const char *pattern, int *chargroup_len);
- void read_quantifier(CRegexAscii_piece *piece, const char *pattern, int *quantif_len);
+ CRegexXQuery_chargroup* readchargroup(const char *pattern, int *chargroup_len);
+ void read_quantifier(CRegexXQuery_piece *piece, const char *pattern, int *quantif_len);
private:
- CRegexAscii_regex *current_regex;
+ CRegexXQuery_regex *current_regex;
int regex_depth;
unsigned int flags;
};
-}}//end namespace zorba::regex_ascii
+}
+}//end namespace zorba::regex_xquery
#endif
/* vim:set et sw=2 ts=2: */
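Reviewer note: the header above only declares the new charmatch hierarchy, so the following is a minimal sketch of how a case-sensitive atom and its case-insensitive subclass could divide the work. The names CharMatch, CharAscii and CharAsciiI are hypothetical stand-ins, and match_internal is simplified to two parameters (the start_from_branch argument from the patch is omitted). It does not reproduce the actual implementation in the branch.

```cpp
#include <cctype>

// Hypothetical, simplified stand-in for CRegexXQuery_charmatch.
struct CharMatch {
  virtual ~CharMatch() {}
  // Returns true and sets *matched_len when the atom matches at source.
  virtual bool match_internal(const char *source, int *matched_len) const = 0;
};

// Stand-in for CRegexXQuery_char_ascii: exact single-byte comparison.
struct CharAscii : CharMatch {
  explicit CharAscii(char ch) : c(ch) {}
  bool match_internal(const char *source, int *matched_len) const override {
    if (*source != c)
      return false;
    *matched_len = 1;
    return true;
  }
protected:
  char c;
};

// Stand-in for CRegexXQuery_char_ascii_i: stores the lower-cased character
// once and lower-cases the input byte on every comparison.
struct CharAsciiI : CharAscii {
  explicit CharAsciiI(char ch)
    : CharAscii(static_cast<char>(std::tolower(static_cast<unsigned char>(ch)))) {}
  bool match_internal(const char *source, int *matched_len) const override {
    const int lowered = std::tolower(static_cast<unsigned char>(*source));
    if (lowered != static_cast<unsigned char>(c))
      return false;
    *matched_len = 1;
    return true;
  }
};
```

Choosing the subclass at parse time (as the separate _i classes in the patch suggest) keeps the flag test out of the matching loop; whether the branch does exactly this is an assumption.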
=== modified file 'src/util/transcode_streambuf.h'
--- src/util/transcode_streambuf.h 2012-02-04 01:26:18 +0000
+++ src/util/transcode_streambuf.h 2012-04-11 15:45:21 +0000
@@ -21,21 +21,21 @@
///////////////////////////////////////////////////////////////////////////////
-#ifdef ZORBA_NO_UNICODE
+#ifdef ZORBA_NO_ICU
# include "passthru_streambuf.h"
#else
# include "icu_streambuf.h"
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
namespace zorba {
namespace internal {
namespace transcode {
-#ifdef ZORBA_NO_UNICODE
-typedef passthru_streambuf streambuf;
+#ifdef ZORBA_NO_ICU
+typedef zorba::passthru_streambuf streambuf;
#else
typedef icu_streambuf streambuf;
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
} // namespace transcode
} // namespace internal
=== modified file 'src/util/unicode_categories.cpp'
--- src/util/unicode_categories.cpp 2012-03-28 05:19:57 +0000
+++ src/util/unicode_categories.cpp 2012-04-11 15:45:21 +0000
@@ -65812,7 +65812,7 @@
{ 0x100000, 0x100000, UNICODE_Co},
};
-bool check_codepoint_category(code_point cp, UnicodeCategoriesEnum categ)
+bool check_codepoint_category(code_point cp, category categ)
{
if(cp < 0x10000)
return codepoints_categories[cp] == categ;
@@ -65824,10 +65824,10 @@
if(cp >= codepoints_categories2[i].cp1)
return codepoints_categories2[i].category == categ;
else
- return false;
+ return categ ? false : true;
}
}
- return false;
+ return categ ? false : true;
}
/*
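Reviewer note: the hunk above changes the fall-through result from false to "categ ? false : true", which reads as "a code point missing from the tables is unassigned" and relies on UNICODE_Cn being the first enumerator (value 0). A minimal sketch of the same lookup with the comparison spelled out; the three-value enum, the two-entry table and the name check_category are hypothetical and only illustrate the idea.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, shortened category list; Cn (not assigned) comes first.
enum category { Cn, Ll, Lu };

struct range {
  std::uint32_t first;
  std::uint32_t last;
  category cat;
};

// Hypothetical table, sorted by 'first', covering only A-Z and a-z.
static const range ranges[] = {
  { 0x41, 0x5A, Lu },
  { 0x61, 0x7A, Ll },
};

bool check_category(std::uint32_t cp, category categ) {
  for (std::size_t i = 0; i < sizeof ranges / sizeof ranges[0]; ++i) {
    if (cp > ranges[i].last)
      continue;                       // cp lies beyond this range
    if (cp >= ranges[i].first)
      return ranges[i].cat == categ;  // cp lies inside this range
    break;                            // cp lies in a gap before this range
  }
  // Not found in any range: the code point only matches "not assigned".
  return categ == Cn;
}
```

Writing "categ == Cn" would keep the behaviour independent of the enumerator order; the patch's form is equivalent as long as UNICODE_Cn stays at 0.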
=== modified file 'src/util/unicode_categories.h'
--- src/util/unicode_categories.h 2012-03-28 05:19:57 +0000
+++ src/util/unicode_categories.h 2012-04-11 15:45:21 +0000
@@ -22,46 +22,53 @@
namespace zorba {
namespace unicode {
-//Unicode codepoint categories, as from http://www.fileformat.info/info/unicode/category/index.htm
+///////////////////////////////////////////////////////////////////////////////
-enum UnicodeCategoriesEnum {
-UNICODE_Cc, //Other, Control
-UNICODE_Cf, //Other, Format
-UNICODE_Co, //Other, Private Use
-UNICODE_Cs, //Other, Surrogate
-UNICODE_Ll, //Letter, Lowercase
-UNICODE_Lm, //Letter, Modifier
-UNICODE_Lo, //Letter, Other
-UNICODE_Lt, //Letter, Titlecase
-UNICODE_Lu, //Letter, Uppercase
-UNICODE_Mc, //Mark, Spacing Combining
-UNICODE_Me, //Mark, Enclosing
-UNICODE_Mn, //Mark, Nonspacing
-UNICODE_Nd, //Number, Decimal Digit
-UNICODE_Nl, //Number, Letter
-UNICODE_No, //Number, Other
-UNICODE_Pc, //Punctuation, Connector
-UNICODE_Pd, //Punctuation, Dash
-UNICODE_Pe, //Punctuation, Close
-UNICODE_Pf, //Punctuation, Final quote (may behave like Ps or Pe depending on usage)
-UNICODE_Pi, //Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
-UNICODE_Po, //Punctuation, Other
-UNICODE_Ps, //Punctuation, Open
-UNICODE_Sc, //Symbol, Currency
-UNICODE_Sk, //Symbol, Modifier
-UNICODE_Sm, //Symbol, Math
-UNICODE_So, //Symbol, Other
-UNICODE_Zl, //Separator, Line
-UNICODE_Zp, //Separator, Paragraph
-UNICODE_Zs //Separator, Space
+/**
+ * Unicode codepoint categories.
+ * See: http://www.fileformat.info/info/unicode/category/
+ */
+enum category {
+ UNICODE_Cn, // Not Assigned
+ UNICODE_Cc, // Other, Control
+ UNICODE_Cf, // Other, Format
+ UNICODE_Co, // Other, Private Use
+ UNICODE_Cs, // Other, Surrogate
+ UNICODE_Ll, // Letter, Lowercase
+ UNICODE_Lm, // Letter, Modifier
+ UNICODE_Lo, // Letter, Other
+ UNICODE_Lt, // Letter, Titlecase
+ UNICODE_Lu, // Letter, Uppercase
+ UNICODE_Mc, // Mark, Spacing Combining
+ UNICODE_Me, // Mark, Enclosing
+ UNICODE_Mn, // Mark, Nonspacing
+ UNICODE_Nd, // Number, Decimal Digit
+ UNICODE_Nl, // Number, Letter
+ UNICODE_No, // Number, Other
+ UNICODE_Pc, // Punctuation, Connector
+ UNICODE_Pd, // Punctuation, Dash
+ UNICODE_Pe, // Punctuation, Close
+ UNICODE_Pf, // Punctuation, Final quote (like Ps or Pe depending on usage)
+ UNICODE_Pi, // Punctuation, Initial quote (like Ps or Pe depending on usage)
+ UNICODE_Po, // Punctuation, Other
+ UNICODE_Ps, // Punctuation, Open
+ UNICODE_Sc, // Symbol, Currency
+ UNICODE_Sk, // Symbol, Modifier
+ UNICODE_Sm, // Symbol, Math
+ UNICODE_So, // Symbol, Other
+ UNICODE_Zl, // Separator, Line
+ UNICODE_Zp, // Separator, Paragraph
+ UNICODE_Zs // Separator, Space
};
bool is_UnicodeNd(code_point cp, code_point *ret_zero);
-bool check_codepoint_category(code_point cp, UnicodeCategoriesEnum categ);
-
-}
-}
-
-#endif
+bool check_codepoint_category(code_point cp, category categ);
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace unicode
+} // namespace zorba
+
+#endif /* ZORBA_UNICODE_CATEGORIES */
/* vim:set et sw=2 ts=2: */
=== modified file 'src/util/unicode_util.cpp'
--- src/util/unicode_util.cpp 2012-03-28 05:19:57 +0000
+++ src/util/unicode_util.cpp 2012-04-11 15:45:21 +0000
@@ -22,15 +22,19 @@
#include <functional> /* for binary_function */
#include <utility> /* for pair */
-#include <unicode/normlzr.h>
-#include <unicode/ustring.h>
+#ifndef ZORBA_NO_ICU
+# include <unicode/normlzr.h>
+# include <unicode/ustring.h>
+#endif /* ZORBA_NO_ICU */
#include "cxx_util.h"
#include "unicode_util.h"
#include "utf8_util.h"
using namespace std;
+#ifndef ZORBA_NO_ICU
U_NAMESPACE_USE
+#endif /* ZORBA_NO_ICU */
namespace zorba {
namespace unicode {
@@ -2208,6 +2212,8 @@
return to_case<upper>( c );
}
+#ifndef ZORBA_NO_ICU
+
bool normalize( string const &in, normalization::type n, string *out ) {
UErrorCode status = U_ZERO_ERROR;
UNormalizationMode icu_mode;
@@ -2230,8 +2236,11 @@
return U_SUCCESS( status ) == TRUE;
}
+#endif /* ZORBA_NO_ICU */
+
bool to_string( char const *in, size_type in_len, char_type **out,
size_type *out_len ) {
+#ifndef ZORBA_NO_ICU
size_type utf16_len;
UErrorCode status = U_ZERO_ERROR;
u_strFromUTF8WithSub( // pre-flight to get utf16_len
@@ -2250,9 +2259,16 @@
}
*out = utf16_buf;
*out_len = utf16_len;
+#else
+ *out = new char_type[ in_len + 1 ];
+ *out_len = in_len;
+ ::strncpy( *out, in, *out_len );
+#endif /* ZORBA_NO_ICU */
return true;
}
+#ifndef ZORBA_NO_ICU
+
bool to_string( char const *in, size_type in_len, string *out ) {
char_type *const buf = out->getBuffer( in_len + 1 );
size_type buf_len;
@@ -2271,6 +2287,8 @@
return U_SUCCESS( status ) == TRUE;
}
+#endif /* ZORBA_NO_ICU */
+
///////////////////////////////////////////////////////////////////////////////
} // namespace unicode
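Reviewer note: in the non-ICU branch of to_string() above, the buffer is allocated with in_len + 1 elements but strncpy is asked to copy in_len bytes, so the final element is only written if the input is shorter than in_len. Whether callers depend on a terminator is not visible in this diff. A minimal sketch of a copy that always terminates, using a hypothetical helper name copy_bytes:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of the patch: copies in_len bytes into a
// freshly allocated buffer and always NUL-terminates it. The caller owns
// the buffer and must release it with delete[].
bool copy_bytes(char const *in, std::size_t in_len,
                char **out, std::size_t *out_len) {
  char *const buf = new char[in_len + 1];
  std::memcpy(buf, in, in_len);
  buf[in_len] = '\0';
  *out = buf;
  *out_len = in_len;
  return true;
}
```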
=== modified file 'src/util/unicode_util.h'
--- src/util/unicode_util.h 2012-03-28 05:19:57 +0000
+++ src/util/unicode_util.h 2012-04-11 15:45:21 +0000
@@ -19,12 +19,18 @@
#include <zorba/config.h>
-#ifndef ZORBA_NO_UNICODE
-
#include <cctype>
#include <cstring>
#include <cwchar>
-#include <unicode/unistr.h>
+
+#include <zorba/internal/ztd.h>
+
+#ifdef ZORBA_NO_ICU
+# include "zorbamisc/config/stdint.h"
+# include "zorbatypes/zstring.h"
+#else
+# include <unicode/unistr.h>
+#endif /* ZORBA_NO_ICU */
#include "stl_util.h"
@@ -37,13 +43,21 @@
* The character type that can hold a Unicode character encoded in UTF-16. Do
* not assume that this is an unsigned type.
*/
-typedef UChar char_type;
+#ifdef ZORBA_NO_ICU
+ typedef char char_type;
+#else
+ typedef /* ICU's */ UChar char_type;
+#endif /* ZORBA_NO_ICU */
/**
* The type that can hold a Unicode code-point. Do not assume that this
* is an unsigned type.
*/
-typedef UChar32 code_point;
+#ifdef ZORBA_NO_ICU
+typedef uint32_t code_point;
+#else
+typedef /* ICU's */ UChar32 code_point;
+#endif /* ZORBA_NO_ICU */
/**
* The type that represents the size of a string. Do not assume that this is
@@ -64,10 +78,17 @@
};
}
+#ifndef ZORBA_NO_ICU
/**
* A Unicode string.
*/
typedef U_NAMESPACE_QUALIFIER UnicodeString string;
+#else
+/**
+ * Since there is no ICU, just use a zstring as a "Unicode" string.
+ */
+typedef zstring string;
+#endif /* ZORBA_NO_ICU */
////////// code-point checking ////////////////////////////////////////////////
@@ -102,7 +123,7 @@
return ascii_c == c && isspace( ascii_c );
#else
return isspace( c );
-#endif
+#endif /* WIN32 */
}
/**
@@ -120,8 +141,10 @@
* @param c The code-point to check.
* @return Returns \c true only if the code-point is valid.
*/
-template<class CodePointType>
-inline bool is_valid( CodePointType c ) {
+template<typename CodePointType> inline
+typename std::enable_if<ZORBA_TR1_NS::is_integral<CodePointType>::value,
+ bool>::type
+is_valid( CodePointType c ) {
return (ztd::ge0( c ) && c <= 0x00D7FF)
|| (c >= 0x00E000 && c <= 0x00FFFD)
|| (c >= 0x010000 && c <= 0x10FFFF);
@@ -165,6 +188,7 @@
////////// normalization //////////////////////////////////////////////////////
+#ifndef ZORBA_NO_ICU
/**
* Normalizes the given string.
*
@@ -173,9 +197,11 @@
* @return Returns \c true only if the normalization succeeded.
*/
bool normalize( string const &in, normalization::type n, string *out );
+#endif /* ZORBA_NO_ICU */
////////// string conversion //////////////////////////////////////////////////
+#ifndef ZORBA_NO_ICU
/**
* Converts a single UTF-8 encoded character into a single Unicode character.
*
@@ -184,6 +210,7 @@
* @return Returns \c true only if the conversion succeeded.
*/
bool to_char( char const *in, char_type *out );
+#endif /* ZORBA_NO_ICU */
/**
* Converts a UTF-8 encoded string into a sequence of Unicode characters.
@@ -206,7 +233,15 @@
* @param out The Unicode string result.
* @return Returns \c true only if the conversion succeeded.
*/
+#ifndef ZORBA_NO_ICU
+ZORBA_DLL_PUBLIC
bool to_string( char const *in, size_type in_len, string *out );
+#else
+inline bool to_string( char const *in, size_type in_len, string *out ) {
+ out->assign( in, in_len );
+ return true;
+}
+#endif /* ZORBA_NO_ICU */
/**
* Converts a C string to a Unicode string.
@@ -219,6 +254,8 @@
return to_string( in, (size_type)std::strlen( in ), out );
}
+#ifndef ZORBA_NO_ICU
+
/**
* Converts a wide-character string to a Unicode string.
*
@@ -240,6 +277,8 @@
return to_string( in, static_cast<size_type>( std::wcslen( in ) ), out );
}
+#endif /* ZORBA_NO_ICU */
+
/**
* Converts a string to a Unicode string.
*
@@ -258,13 +297,6 @@
} // namespace unicode
} // namespace zorba
-#else
-#endif /* ZORBA_NO_UNICODE */
-namespace zorba{
-namespace unicode{
-typedef int32_t size_type;
-} // namespace unicode
-} // namespace zorba
#endif /* ZORBA_UNICODE_UTIL_H */
/*
* Local variables:
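Reviewer note: the is_valid() change above constrains the template with enable_if so that it only participates in overload resolution for integral types. A minimal standalone sketch of the same constraint, assuming a C++11 compiler and using std::is_integral directly instead of ZORBA_TR1_NS; the name is_valid_cp is hypothetical, and the plain "c >= 0" stands in for ztd::ge0 (which the patch presumably uses to avoid a warning on unsigned types).

```cpp
#include <type_traits>

// Only instantiable for integral CodePointType; for any other type the
// return type fails to form and the template is silently discarded.
template<typename CodePointType> inline
typename std::enable_if<std::is_integral<CodePointType>::value, bool>::type
is_valid_cp(CodePointType c) {
  return (c >= 0 && c <= 0x00D7FF)          // below the surrogate block
      || (c >= 0x00E000 && c <= 0x00FFFD)   // up to, excluding FFFE/FFFF
      || (c >= 0x010000 && c <= 0x10FFFF);  // supplementary planes
}
```

With this constraint a call such as is_valid_cp(1.5) is rejected at compile time instead of being truncated to an integer.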
=== modified file 'src/util/utf8_util.cpp'
--- src/util/utf8_util.cpp 2012-03-28 05:19:57 +0000
+++ src/util/utf8_util.cpp 2012-04-11 15:45:21 +0000
@@ -15,17 +15,17 @@
*/
#include "stdafx.h"
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
#include <unicode/ustring.h>
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
#include "cxx_util.h"
#include "utf8_util.h"
using namespace std;
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
U_NAMESPACE_USE
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
unsigned const Mask1Byte = 0x80;
unsigned const Mask2Bytes = 0xC0;
@@ -169,7 +169,7 @@
return len;
}
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
bool to_string( unicode::char_type const *in, unicode::size_type in_len,
storage_type **out, size_type *out_len ) {
@@ -233,7 +233,7 @@
return true;
}
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
storage_type const* validate( storage_type const *s ) {
while ( *s ) {
=== modified file 'src/util/utf8_util.h'
--- src/util/utf8_util.h 2012-03-28 05:19:57 +0000
+++ src/util/utf8_util.h 2012-04-11 15:45:21 +0000
@@ -23,16 +23,20 @@
#include "ascii_util.h"
#include "cxx_util.h"
+#include "string_util.h"
#include "unicode_util.h"
#include "utf8_string.h"
#include "utf8_util_base.h"
+#include "zorbatypes/collation_manager.h"
#include "zorbautils/hashfun.h"
-#ifndef ZORBA_NO_UNICODE
-#include "zorbatypes/collation_manager.h"
-#include "zorbatypes/libicu.h"
-#endif
+#ifdef ZORBA_NO_ICU
+# include "diagnostics/assert.h"
+#else
+# include <unicode/coll.h>
+# include <unicode/sortkey.h>
+#endif /* ZORBA_NO_ICU */
namespace zorba {
namespace utf8 {
@@ -304,7 +308,7 @@
////////// Encoding conversion ////////////////////////////////////////////////
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
/**
* Converts a unicode::char_type array into a UTF-8 encoded string.
@@ -374,6 +378,8 @@
return to_string( in, u_strlen( in ), out );
}
+#endif /* ZORBA_NO_ICU */
+
/**
* Converts a unicode::string into a UTF-8 encoded string.
*
@@ -383,9 +389,16 @@
*/
template<class StringType> inline
bool to_string( unicode::string const &in, StringType *out ) {
+#ifndef ZORBA_NO_ICU
return to_string( in.getBuffer(), in.length(), out );
+#else
+ *out = in.c_str();
+ return true;
+#endif /* ZORBA_NO_ICU */
}
+#ifndef ZORBA_NO_ICU
+
//
// On Windows, UChar == wchar_t, so these functions would multiply define those
// previously.
@@ -507,7 +520,7 @@
return to_wchar_t( in.data(), in.size(), out, out_len );
}
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
////////// HTML URI ///////////////////////////////////////////////////////////
@@ -665,7 +678,7 @@
////////// Unicode normalization //////////////////////////////////////////////
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
/**
* Normalizes the Unicode characters in the string.
*
@@ -677,7 +690,7 @@
template<class InputStringType,class OutputStringType>
bool normalize( InputStringType const &in, unicode::normalization::type n,
OutputStringType *out );
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
////////// Whitespace /////////////////////////////////////////////////////////
@@ -738,7 +751,6 @@
std::reverse_copy( u_in.begin(), u_in.end(), std::back_inserter( u_out ) );
}
-#ifndef ZORBA_NO_UNICODE
/**
* Strips all diacritical marks from all characters converting them to their
* closest ASCII equivalents.
@@ -751,8 +763,6 @@
template<class InputStringType,class OutputStringType>
void strip_diacritics( InputStringType const &in, OutputStringType *out );
-#endif /* ZORBA_NO_UNICODE */
-
/**
*
*/
@@ -760,6 +770,7 @@
int compare(const StringType1 &s1, const StringType2 &s2,
const XQPCollator* collation)
{
+#ifndef ZORBA_NO_ICU
if (collation == NULL || collation->doMemCmp())
return s1.compare(s2);
@@ -770,6 +781,9 @@
unicode::to_string(s2, &us2);
return static_cast<Collator*>( collation->getCollator() )->compare(us1, us2);
+#else
+ return s1.compare(s2);
+#endif /* ZORBA_NO_ICU */
}
@@ -779,7 +793,9 @@
template<class StringType> inline
uint32_t hash(const StringType& s, const XQPCollator* collation = NULL)
{
+#ifndef ZORBA_NO_ICU
if (!collation || collation->doMemCmp())
+#endif
{
const char* str = s.data();
ulong len = (ulong)s.size();
@@ -795,7 +811,7 @@
//return hashfun::h32((void*)(s.data()), s.size());
}
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
CollationKey collKey;
UErrorCode status = U_ZERO_ERROR;
@@ -813,7 +829,7 @@
return collKey.hashCode();
#else
ZORBA_ASSERT(false);
-#endif
+#endif /* ZORBA_NO_ICU */
}
///////////////////////////////////////////////////////////////////////////////
=== modified file 'src/util/utf8_util.tcc'
--- src/util/utf8_util.tcc 2012-03-28 05:19:57 +0000
+++ src/util/utf8_util.tcc 2012-04-11 15:45:21 +0000
@@ -99,7 +99,7 @@
return next_char( temp );
}
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
template<class InputStringType,class OutputStringType>
bool normalize( InputStringType const &in, unicode::normalization::type n,
@@ -120,10 +120,16 @@
return true;
}
+#endif /* ZORBA_NO_ICU */
+
template<class InputStringType,class OutputStringType>
void strip_diacritics( InputStringType const &in, OutputStringType *out ) {
InputStringType in_normalized;
+#ifndef ZORBA_NO_ICU
normalize( in, unicode::normalization::NFKD, &in_normalized );
+#else
+ in_normalized = in.c_str();
+#endif /* ZORBA_NO_ICU */
out->clear();
out->reserve( in_normalized.size() );
std::copy(
@@ -132,6 +138,8 @@
);
}
+#ifndef ZORBA_NO_ICU
+
template<class StringType>
bool to_string( unicode::char_type const *in, size_type in_len,
StringType *out ) {
@@ -161,7 +169,7 @@
}
#endif /* WIN32 */
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
template<class InputStringType,class OutputStringType>
void to_lower( InputStringType const &in, OutputStringType *out ) {
=== modified file 'src/zorbatypes/collation_manager.cpp'
--- src/zorbatypes/collation_manager.cpp 2012-03-28 05:19:57 +0000
+++ src/zorbatypes/collation_manager.cpp 2012-04-11 15:45:21 +0000
@@ -17,9 +17,9 @@
#include "common/common.h"
-#ifndef ZORBA_NO_UNICODE
-#include "zorbatypes/libicu.h"
-#endif
+#ifndef ZORBA_NO_ICU
+# include <unicode/coll.h>
+#endif /* ZORBA_NO_ICU */
#include <vector>
#include <iostream>
@@ -116,7 +116,7 @@
Collator* lCollator;
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
UErrorCode lError = U_ZERO_ERROR;
if (lTokens.size() == 2)
{
@@ -136,37 +136,37 @@
#else
lCollator = new Collator;
-#endif
+#endif /* ZORBA_NO_ICU */
if (lTokens[0].compare("PRIMARY") == 0)
{
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
lCollator->setStrength(Collator::PRIMARY);
-#endif
+#endif /* ZORBA_NO_ICU */
}
else if (lTokens[0].compare("SECONDARY") == 0)
{
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
lCollator->setStrength(Collator::SECONDARY);
-#endif
+#endif /* ZORBA_NO_ICU */
}
else if (lTokens[0].compare("TERTIARY") == 0)
{
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
lCollator->setStrength(Collator::TERTIARY);
-#endif
+#endif /* ZORBA_NO_ICU */
}
else if (lTokens[0].compare("QUATERNARY") == 0)
{
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
lCollator->setStrength(Collator::QUATERNARY);
-#endif
+#endif /* ZORBA_NO_ICU */
}
else if (lTokens[0].compare("IDENTICAL") == 0)
{
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
lCollator->setStrength(Collator::IDENTICAL);
-#endif
+#endif /* ZORBA_NO_ICU */
}
else
{
@@ -181,7 +181,7 @@
CollationFactory::createCollator()
{
Collator* lCollator;
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
UErrorCode lError = U_ZERO_ERROR;
lCollator = Collator::createInstance(Locale("en", "US"), lError);
if( U_FAILURE(lError) ) {
@@ -190,7 +190,7 @@
lCollator->setStrength(Collator::IDENTICAL);
#else
lCollator = new Collator;
-#endif
+#endif /* ZORBA_NO_ICU */
return new XQPCollator(lCollator, (std::string)"");
}
=== modified file 'src/zorbatypes/collation_manager.h'
--- src/zorbatypes/collation_manager.h 2012-03-28 05:19:57 +0000
+++ src/zorbatypes/collation_manager.h 2012-04-11 15:45:21 +0000
@@ -25,13 +25,13 @@
namespace zorba
{
-#ifdef ZORBA_NO_UNICODE
+#ifdef ZORBA_NO_ICU
-class Collator
+class Collator
{
};
-#endif
+#endif /* ZORBA_NO_ICU */
class XQPCollator
{
=== removed file 'src/zorbatypes/libicu.h'
--- src/zorbatypes/libicu.h 2012-03-28 05:19:57 +0000
+++ src/zorbatypes/libicu.h 1970-01-01 00:00:00 +0000
@@ -1,32 +0,0 @@
-/*
- * Copyright 2006-2008 The FLWOR Foundation.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-#pragma once
-#ifndef ZORBA_LIBICU_H
-#if defined CYGWIN
-# define U_HAVE_INTTYPES 0
-# define U_HAVE_INT8_T 1
-# define U_HAVE_INT32_T 1
-# define U_HAVE_UINT32_T 1
-#endif
-
-#include <unicode/utypes.h>
-#include <unicode/coll.h>
-#include <unicode/ustring.h>
-#include <unicode/stsearch.h>
-#include <unicode/ucnv.h>
-#include <unicode/normlzr.h>
-#endif
-/* vim:set et sw=2 ts=2: */
=== modified file 'src/zorbatypes/transcoder.cpp'
--- src/zorbatypes/transcoder.cpp 2012-03-28 05:19:57 +0000
+++ src/zorbatypes/transcoder.cpp 2012-04-11 15:45:21 +0000
@@ -25,17 +25,19 @@
namespace zorba {
+///////////////////////////////////////////////////////////////////////////////
+
transcoder::transcoder( std::ostream& output_stream, bool in_utf16 ) :
os( output_stream ),
utf16( in_utf16 )
{
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
utf8_buf_len_ = 0;
utf8_char_len_ = 1;
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
}
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
void transcoder::write_utf16( char const *s, std::streamsize len ) {
unicode::char_type *u_s;
@@ -76,7 +78,9 @@
}
}
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
+
+///////////////////////////////////////////////////////////////////////////////
} // namespace zorba
/* vim:set et sw=2 ts=2: */
=== modified file 'src/zorbatypes/transcoder.h'
--- src/zorbatypes/transcoder.h 2012-03-28 05:19:57 +0000
+++ src/zorbatypes/transcoder.h 2012-04-11 15:45:21 +0000
@@ -40,21 +40,21 @@
std::ostream &os;
bool const utf16;
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
utf8::encoded_char_type utf8_buf_;
int utf8_buf_len_;
int utf8_char_len_;
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
public:
transcoder(std::ostream& output_stream, bool in_utf16);
transcoder& write( char const *s, std::streamsize n ) {
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
if ( utf16 )
write_utf16( s, n );
else
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
os.write( s, n );
return *this;
}
@@ -68,11 +68,11 @@
}
transcoder& operator<<( char ch ) {
-#ifndef ZORBA_NO_UNICODE
- if (utf16)
+#ifndef ZORBA_NO_ICU
+ if ( utf16 )
write_utf16_char(ch);
else
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
os << ch;
return *this;
}
@@ -97,10 +97,10 @@
}
private:
-#ifndef ZORBA_NO_UNICODE
+#ifndef ZORBA_NO_ICU
void write_utf16(const char* str, std::streamsize n);
void write_utf16_char(char ch);
-#endif /* ZORBA_NO_UNICODE */
+#endif /* ZORBA_NO_ICU */
};
} // namespace zorba
=== modified file 'src/zorbautils/hashmap_itemh.h'
--- src/zorbautils/hashmap_itemh.h 2012-04-10 21:25:59 +0000
+++ src/zorbautils/hashmap_itemh.h 2012-04-11 15:45:21 +0000
@@ -13,6 +13,8 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
+#ifndef HASHMAP_ITEMH_H
+#define HASHMAP_ITEMH_H
#ifndef HASHMAP_ITEMH_H
#define HASHMAP_ITEMH_H
=== modified file 'src/zorbautils/string_util.cpp'
--- src/zorbautils/string_util.cpp 2012-03-28 05:19:57 +0000
+++ src/zorbautils/string_util.cpp 2012-04-11 15:45:21 +0000
@@ -24,16 +24,23 @@
#include "diagnostics/xquery_diagnostics.h"
using namespace std;
+#ifndef ZORBA_NO_ICU
U_NAMESPACE_USE
+#endif /* ZORBA_NO_ICU */
namespace zorba {
namespace utf8 {
+///////////////////////////////////////////////////////////////////////////////
+
size_t find( char const *s, size_t s_len, char const *ss, size_t ss_len,
- XQPCollator const *collator ) {
+ XQPCollator const *collator ) {
+#ifndef ZORBA_NO_ICU
if ( !collator || collator->doMemCmp()) {
+#endif /* ZORBA_NO_ICU */
char const *const result = ::strstr( s, ss );
return result ? result - s : zstring::npos;
+#ifndef ZORBA_NO_ICU
}
unicode::string u_s, u_ss;
@@ -54,28 +61,19 @@
}
}
return zstring::npos;
+#endif /* ZORBA_NO_ICU */
}
-size_t rfind(
- char const *s,
- size_t s_len,
- char const *ss,
- size_t ss_len,
- XQPCollator const *collator )
-{
- if ( ! collator || collator->doMemCmp())
- {
+size_t rfind( char const *s, size_t s_len, char const *ss, size_t ss_len,
+ XQPCollator const *collator ) {
+#ifndef ZORBA_NO_ICU
+ if ( ! collator || collator->doMemCmp()) {
+#endif /* ZORBA_NO_ICU */
zstring_b tmp;
tmp.wrap_memory(const_cast<char*>(s), s_len);
-
- size_t pos = tmp.rfind(ss, ss_len);
-
- //if (pos == zstring::npos)
- // return -1;
- //else
- // return pos;
- return pos;
+ return tmp.rfind(ss, ss_len);
+#ifndef ZORBA_NO_ICU
}
unicode::string u_s, u_ss;
@@ -102,6 +100,7 @@
}
return zstring::npos;
+#endif /* ZORBA_NO_ICU */
}
bool match_part( char const *in, char const *pattern, char const *flags ) {
@@ -116,6 +115,8 @@
return re.match_whole( in );
}
+///////////////////////////////////////////////////////////////////////////////
+
} // namespace utf8
} // namespace zorba
/* vim:set et sw=2 ts=2: */
=== modified file 'src/zorbautils/string_util.h'
--- src/zorbautils/string_util.h 2012-03-28 05:19:57 +0000
+++ src/zorbautils/string_util.h 2012-04-11 15:45:21 +0000
@@ -13,12 +13,15 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
+
#pragma once
#ifndef ZORBA_UTILS_STRING_UTIL_H
#define ZORBA_UTILS_STRING_UTIL_H
#include <cstring>
+#include <zorba/config.h>
+
#include "diagnostics/xquery_diagnostics.h"
#include "zorbatypes/collation_manager.h"
@@ -145,9 +148,13 @@
char const *replacement, OutputStringType *out ) {
unicode::regex re;
re.compile( pattern, flags );
+#ifndef ZORBA_NO_ICU
unicode::string u_out;
return re.replace_all( in, replacement, &u_out ) &&
utf8::to_string( u_out.getBuffer(), u_out.length(), out );
+#else
+ return re.replace_all( in, replacement, out );
+#endif /* ZORBA_NO_ICU */
}
/**
@@ -175,9 +182,13 @@
OutputStringType *out ) {
unicode::regex re;
re.compile( pattern, flags );
+#ifndef ZORBA_NO_ICU
unicode::string u_out;
return re.replace_all( in, replacement, &u_out ) &&
utf8::to_string( u_out.getBuffer(), u_out.length(), out );
+#else
+ return re.replace_all( in, replacement, out );
+#endif /* ZORBA_NO_ICU */
}
/**
@@ -207,9 +218,13 @@
OutputStringType *out ) {
unicode::regex re;
re.compile( pattern, flags );
+#ifndef ZORBA_NO_ICU
unicode::string u_out;
return re.replace_all( in, replacement, &u_out ) &&
utf8::to_string( u_out.getBuffer(), u_out.length(), out );
+#else
+ return re.replace_all( in, replacement, out );
+#endif /* ZORBA_NO_ICU */
}
///////////////////////////////////////////////////////////////////////////////
@@ -217,7 +232,6 @@
} // namespace utf8
} // namespace zorba
#endif /* ZORBA_UTILS_STRING_UTIL_H */
-
/*
* Local variables:
* mode: c++
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a1.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions"><fn:match>aa<fn:group nr="1">a</fn:group></fn:match></fn:analyze-string-result>
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a10.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a10.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a10.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,242 @@
+<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions"><fn:non-match><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
+<head>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
+ <meta name="copyright" content="The FLWOR Foundation"/>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">link</fn:group> rel="shortcut icon" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../favicon.ico</fn:group>"</fn:match><fn:non-match>/>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">link</fn:group> type="text/css" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../css/reset.css</fn:group>"</fn:match><fn:non-match> rel="stylesheet"/>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">link</fn:group> type="text/css" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../css/style.css</fn:group>"</fn:match><fn:non-match> rel="stylesheet"/>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">link</fn:group> type="text/css" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../css/cute_profiles31.css</fn:group>"</fn:match><fn:non-match> rel="stylesheet"/>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">script</fn:group> language="javascript" type="text/javascript" src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../js/jquery-1.6.1.min.js</fn:group>"</fn:match><fn:non-match>></script>
+
+ <script type="text/javascript">
+ SyntaxHighlighter.all()
+ </script>
+
+ <title>Zorba: The XQuery Processor</title>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">script</fn:group> src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">http://www.google.com/js/gweb/analytics/autotrack.js</fn:group>"</fn:match><fn:non-match> type="text/javascript"></script>
+ <script type="text/javascript"> //
+ new gweb.analytics.AutoTrack({profile: 'UA-4281090-1'});
+ // </script>
+</head>
+<body>
+<div id="header">
+ <div class="innerheader text-left">
+ <div id="logo_container"></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">index.html</fn:group>"</fn:match><fn:non-match>><img
+ src="../images/zorba_logo.png"
+ alt="Zorba C++ XQuery Processor"/></a>
+
+ <h1></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">index.html</fn:group>"</fn:match><fn:non-match>>Zorba</a></h1>
+
+ <p></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">index.html</fn:group>"</fn:match><fn:non-match>>The XQuery Processor</a></p></div>
+ <div id="innermenu" class="box">
+ <ul>
+ <li style="width:102px !important;"><a
+ href="../doc/latest/zorba/html/index.html" class="documentation">Documentation</a>
+ </li>
+ <li></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">http://try.zorba-xquery.com</fn:group>"</fn:match><fn:non-match> target="_blank" class="tryzorba">Live Demo</a></li>
+ <li></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">modules.html</fn:group>"</fn:match><fn:non-match> class="modules">Modules</a></li>
+ <li></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">downloads.html</fn:group>"</fn:match><fn:non-match> class="download">Download</a></li>
+ <li></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">tools.html</fn:group>"</fn:match><fn:non-match> class="tools">Tools</a></li>
+ <li></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">blog.html</fn:group>"</fn:match><fn:non-match> class="blog">Blog</a></li>
+ <li></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">code.html</fn:group>"</fn:match><fn:non-match> class="open">Code</a></li>
+ </ul>
+ </div>
+ </div>
+</div>
+<div class="cute_profiles_sprite" style="position: absolute; z-index:100;"><a title="Facebook"
+ class="cute_profiles_facebook"
+ href="http://www.facebook.com/groups/237538576264791"
+ target="_blank"/><a title="Twitter"
+ class="cute_profiles_twitter"
+ href="http://twitter.com/ZorbaXQuery"
+ target="_blank"/><a
+ title="Youtube" class="cute_profiles_youtube" href="http://www.youtube.com/user/xqueryxpath" target="_blank"/><a
+ title="Slideshare" class="cute_profiles_slideshare"
+ href="http://www.slideshare.net/search/slideshow?type=presentations&amp;q=zorba+xquery&amp;searchfrom=basic"
+ target="_blank" style="text-decoration:none;"> </a></div>
+<div id="main">
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">link</fn:group> rel="stylesheet" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../css/slides.css</fn:group>"</fn:match><fn:non-match> type="text/css"/>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">script</fn:group> src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../js/slides.min.jquery.js</fn:group>"</fn:match><fn:non-match> type="text/javascript"></script>
+ <script type="text/javascript"> //
+ $(function()
+ {
+ $('#teaser').slides({
+ preload: true,
+ preloadImage: '../images/slides/loading.gif',
+ play: 10000,
+ pause: 2500,
+ slideSpeed: 600,
+ hoverPause: true,
+ generatePagination: false
+ });
+ });
+ // </script>
+ <div id="teaser">
+ <div class="slides_container">
+ <div class="center">
+ <p class="center" style="margin: 20px;"></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">fun.html</fn:group>"</fn:match><fn:non-match>>XQuery: Less code, less time, better apps.</a>
+ </p>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">fun.html</fn:group>"</fn:match><fn:non-match>></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">img</fn:group> src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../images/ideas_cloud.png</fn:group>"</fn:match><fn:non-match> alt="ideas_cloud.png"/></a>
+ </div>
+ <!--div class="center">
+ <p class="center" style="margin: 20px;"></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">fun.html</fn:group>"</fn:match><fn:non-match>>XQuery: Less code, less time, better apps.</a>
+ </p>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">fun.html</fn:group>"</fn:match><fn:non-match>></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">img</fn:group> src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../images/ideas_flwors.png</fn:group>"</fn:match><fn:non-match> alt="ideas_flwors.png" style=""/></a>
+ </div-->
+ <div class="center">
+ <p class="center" style="margin: 20px;"></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/html/data_converters.html</fn:group>"</fn:match><fn:non-match>>Process the Web's structured and unstructured information</a>
+ </p>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/html/data_converters.html</fn:group>"</fn:match><fn:non-match>></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">img</fn:group> src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../images/data3.png</fn:group>"</fn:match><fn:non-match> alt="hm"/></a>
+ </div>
+ <div class="center">
+ <p class="center" style="margin: 20px;"></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/html/schema_lifecycle.html</fn:group>"</fn:match><fn:non-match>>Understand the Web's vocabularies</a>
+ </p>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/html/schema_lifecycle.html</fn:group>"</fn:match><fn:non-match>></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">img</fn:group> style="" src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../images/schemastag.png</fn:group>"</fn:match><fn:non-match> alt="schemas cloud tag"/></a>
+ </div>
+ <!--div class="center">
+ <p class="center" style="margin: 20px;"></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">fun.html</fn:group>"</fn:match><fn:non-match>>XQuery: stitching together the data from the Web</a>
+ </p>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">fun.html</fn:group>"</fn:match><fn:non-match>></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">img</fn:group> style="height: 345px" src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../images/puzzle.png</fn:group>"</fn:match><fn:non-match> alt="puzzle image"/></a>
+ </div-->
+ <div class="center">
+ <p class="center" style="margin: 20px;"></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/html/overview.html</fn:group>"</fn:match><fn:non-match>>Zorba: The most complete XQuery processor</a>
+ </p>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">img</fn:group> src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../images/zorba-arch.png</fn:group>"</fn:match><fn:non-match> width="540" height="343" border="0" usemap="#map" alt="modules cloud tag"/>
+
+ <map name="map" id="overview_map">
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="17,14,75,58" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/cxx/html/index.html</fn:group>"</fn:match><fn:non-match> alt="C++"/>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="80,14,209,56" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/html/commandline.html</fn:group>"</fn:match><fn:non-match> alt="CLI"/>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="214,14,274,56" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">http://xqdt.org/</fn:group>"</fn:match><fn:non-match> alt="XQDT"/>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="279,14,343,56" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/python/html/index.html</fn:group>"</fn:match><fn:non-match> alt="Python" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="348,12,409,57" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/ruby/html/index.html</fn:group>"</fn:match><fn:non-match> alt="Ruby" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="411,14,470,56" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/java/html/index.html</fn:group>"</fn:match><fn:non-match> alt="Java" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="475,14,524,57" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/php/html/index.html</fn:group>"</fn:match><fn:non-match> alt="PHP" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="191,218,353,246" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/xqdoc/xhtml/index.html</fn:group>"</fn:match><fn:non-match> alt="Mdules" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="136,250,221,277" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_converters_json.html</fn:group>"</fn:match><fn:non-match> alt="Json" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="40,250,126,277" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_http-client.html</fn:group>"</fn:match><fn:non-match> alt="Http" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="231,251,316,278" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_oauth_client.html</fn:group>"</fn:match><fn:non-match> alt="OAuth" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="326,250,412,277" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_reflection.html</fn:group>"</fn:match><fn:non-match> alt="Reflection" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="420,250,505,277" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_image_basic.html</fn:group>"</fn:match><fn:non-match> alt="Image" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="40,286,126,313" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/xqdoc/xhtml/expath.org_ns_file.html</fn:group>"</fn:match><fn:non-match> alt="EXPath" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="136,287,221,314" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_xsl-fo.html</fn:group>"</fn:match><fn:non-match> alt="Xsl-fo" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="231,286,316,313" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/xqdoc/xhtml/expath.org_ns_geo.html</fn:group>"</fn:match><fn:non-match> alt="Geo" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="326,286,412,313" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_cryptography_hmac.html</fn:group>"</fn:match><fn:non-match> alt="Hmac" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="420,286,505,313" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_xqdoc.html</fn:group>"</fn:match><fn:non-match> alt="Xqdoc" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="322,68,520,199" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/html/data_lifecycle.html#dl_zorba_store</fn:group>"</fn:match><fn:non-match> alt="Store" />
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group>rea shape="rect" coords="17,67,216,198" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/html/index.html</fn:group>"</fn:match><fn:non-match> alt="Zorba" />
+ </map>
+ </div>
+ <div class="center">
+ <p class="center" style="margin: 20px;"></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">modules.html</fn:group>"</fn:match><fn:non-match>>An ecosystem of XQuery modules</a>
+ </p>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">modules.html</fn:group>"</fn:match><fn:non-match>></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">img</fn:group> style="margin: 50px 0px 0px 0px;" src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../images/modulestag.png</fn:group>"</fn:match><fn:non-match> alt="modules cloud tag"/></a>
+ </div>
+ <!--div style="background-color:white;">
+ <p class="center" style="margin: 20px;"></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">http://try.zorba-xquery.com</fn:group>"</fn:match><fn:non-match>>&lt;fun&gt;</a>
+ </p>
+ <br>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">http://try.zorba-xquery.com</fn:group>"</fn:match><fn:non-match> style="text-decoration:none;">
+ <div style="float: left; width:40%">
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">img</fn:group> style="width:350px; height:300px; margin:0px 15px 30px 15px;" src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../images/pdash.png</fn:group>"</fn:match><fn:non-match> alt="PDash">
+ </div>
+ <div style="float: left; width:60% !important;height: 150px !important;">
+ <pre class="brush: xquery;toolbar: false;">
+ declare variable $seq := fn:parse-xml("RFID.xml");
+
+ for sliding window $w in $seq/stream/event
+ start $s_curr when fn:true()
+ only end next $next
+ when $next/@time > $s_curr/@time + 3
+ return
+ let $avg := fn:avg($w/@temp)
+ where $avg * 2 lt xs:double($next/@temp) or
+ $avg div 2 gt xs:double($next/@temp)
+ return &lt;alarm&gt;Outlier detected.
+ Event id:{data($next/@time)}&lt;/alarm&gt;
+ </pre>
+ </div>
+
+ </a>
+
+ </div-->
+ <div class="center">
+ <p class="center" style="margin: 20px;"></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> style="font-size:250%;" href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">fun.html</fn:group>"</fn:match><fn:non-match>> &lt;productivity&gt;</a>
+ </p>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">fun.html</fn:group>"</fn:match><fn:non-match>></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">img</fn:group> src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../images/zorba-slide-fun.png</fn:group>"</fn:match><fn:non-match> alt="Zorba fun"/></a>
+ </div>
+ </div>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">index.html#</fn:group>"</fn:match><fn:non-match> class="prev"></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"/><fn:group nr="4"/><fn:group nr="5"><fn:group nr="6">img</fn:group> src</fn:group></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../images/slides/arrow_prev.png</fn:group>"</fn:match><fn:non-match> width="24" height="43"
+ alt="Arrow Prev"></a></fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">index.html#</fn:group>"</fn:match><fn:non-match> class="next"/><img
+ src="../images/slides/arrow_next.png" width="24" height="43" alt="Arrow Next"/></a></div>
+ <div class="box content">
+ <div>
+
+ <table id="table-index">
+ <tr>
+ <td>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/html/overview.html</fn:group>"</fn:match><fn:non-match> class="noDecor">
+ <div class="flavors">All Flavors Available</div>
+ </a>
+ <p>General purpose XQuery processor - written in C++.</p>
+
+ <p>Complete family of W3C familly of specifications: XPath, XQuery, Update, Scripting,
+ Full-Text, XSLT, XQueryX, and more.</p></td>
+ <td>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">modules.html</fn:group>"</fn:match><fn:non-match> class="noDecor">
+ <div class="richmodules">Rich Module Library</div>
+ </a>
+
+ <p>Web mashups, cryptography, image processing, geo projections, emails, data cleaning...
+ there is a module for that.</p></td>
+ </tr>
+ <tr>
+ <td>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/html/index.html</fn:group>"</fn:match><fn:non-match> class="noDecor">
+ <div class="store">Pluggable Store</div>
+ </a>
+
+ <p>Seamlessly process XML data stored in different places.</p>
+
+ <p>Main memory, mobile devices, browsers, disk-based, or cloud-based stores.</p></td>
+ <td>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">../doc/latest/zorba/html/index.html</fn:group>"</fn:match><fn:non-match> class="noDecor">
+ <div class="api">Runs Everywhere</div>
+ </a>
+
+ <p>Available on Windows, Linux, and Mac OS.</p>
+
+ <p>Bindings available for 6 Programming Languages: C++, C, PHP, Ruby, Java and Python.</p></td>
+ </tr>
+ <tr>
+ <td>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">tools.html</fn:group>"</fn:match><fn:non-match> class="noDecor">
+ <div class="tooling">Developer Friendly Tools</div>
+ </a>
+
+ <p>Benefit from a rich ecosystem of tools.</p>
+
+ <p>Eclipse plugins, command-line interface, and debugger.</p></td>
+ <td>
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">fun.html</fn:group>"</fn:match><fn:non-match> class="noDecor">
+ <div class="fun">Fun &amp; Productive</div>
+ </a>
+
+ <p>XQuery unifies development for all tiers; database, content management, application logic,
+ and presentation.</p>
+
+ <p>Check out </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">fun.html</fn:group>"</fn:match><fn:non-match>>examples and demos</a>.</p></td>
+ </tr>
+ </table>
+ </div>
+ <div style="clear: both;"></div>
+ </div>
+</div>
+<div> </div>
+<div id="footer">
+ <div id="innerfooter"><p>Zorba is supported by the </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">http://flworfound.org/</fn:group>"</fn:match><fn:non-match> target="_blank">FLWOR
+ Foundation</a> and distributed under
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">http://www.apache.org/licenses/LICENSE-2.0.html</fn:group>"</fn:match><fn:non-match> target="_blank">Apache Licence, Version 2.0</a>.</p>
+ </div>
+</div>
+</body>
+</html></fn:non-match></fn:analyze-string-result>
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a11.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a11.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a11.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,6 @@
+<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions"><fn:non-match><div id="footer">
+ <div id="innerfooter"><p>Zorba is supported by the </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">http://flworfound.org/</fn:group>"</fn:match><fn:non-match> target="_blank">FLWOR
+ Foundation</a> and distributed under
+ </fn:non-match><fn:match><fn:group nr="1"><</fn:group><fn:group nr="2"><fn:group nr="3"><fn:group nr="4">a</fn:group> href</fn:group><fn:group nr="5"/><fn:group nr="6"/></fn:group>=<fn:group nr="7">"</fn:group><fn:group nr="8">http://www.apache.org/licenses/LICENSE-2.0.html</fn:group>"</fn:match><fn:non-match> target="_blank">Apache Licence, Version 2.0</a>.</p>
+ </div>
+</fn:non-match></fn:analyze-string-result>
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a2.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions"><fn:match>aa<fn:group nr="1"><fn:group nr="2">a</fn:group></fn:group></fn:match></fn:analyze-string-result>
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a3.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a3.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a3.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions"><fn:match>aa<fn:group nr="1"><fn:group nr="2">a</fn:group></fn:group>c</fn:match></fn:analyze-string-result>
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a5.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a5.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a5.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions"><fn:match>aa<fn:group nr="1"><fn:group nr="2">a</fn:group><fn:group nr="3"/></fn:group>c</fn:match></fn:analyze-string-result>
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a6.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a6.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a6.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions"><fn:match>aa<fn:group nr="1"><fn:group nr="2">a</fn:group><fn:group nr="3"/></fn:group>c</fn:match></fn:analyze-string-result>
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a7.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a7.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a7.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions"><fn:match>aa<fn:group nr="1"><fn:group nr="2">a</fn:group><fn:group nr="3">a</fn:group></fn:group>c</fn:match></fn:analyze-string-result>
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a8.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a8.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a8.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions"><fn:non-match>aaaa</fn:non-match><fn:match><fn:group nr="1"></fn:group>c</fn:match></fn:analyze-string-result>
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a9.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a9.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_a9.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions"><fn:non-match>aaaa</fn:non-match><fn:match><fn:group nr="1"></fn:group>c<fn:group nr="2"></fn:group></fn:match></fn:analyze-string-result>
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m1.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m10.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m10.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m10.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m11.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m11.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m11.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m12.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m12.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m12.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m13.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m13.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m13.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m14.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m14.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m14.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m15.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m15.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m15.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m16.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m16.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m16.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m17.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m17.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m17.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m18.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m18.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m18.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m19.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m19.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m19.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m2.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m20.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m20.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m20.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m21.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m21.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m21.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m22.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m22.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m22.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m23.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m23.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m23.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m24.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m24.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m24.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m25.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m25.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m25.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m26.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m26.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m26.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m27.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m27.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m27.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m28.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m28.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m28.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m29.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m29.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m29.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m3.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m3.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m3.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m30.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m30.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m30.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m31.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m31.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m31.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m32.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m32.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m32.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m33.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m33.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m33.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m34.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m34.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m34.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m35.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m35.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m35.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m36.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m36.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m36.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m37.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m37.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m37.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m38.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m38.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m38.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m39.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m39.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m39.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m4.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m4.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m4.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m40.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m40.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m40.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m41.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m41.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m41.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m42.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m42.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m42.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m43.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m43.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m43.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m44.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m44.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m44.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m45.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m45.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m45.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m46.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m46.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m46.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m47.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m47.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m47.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m48.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m48.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m48.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m49.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m49.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m49.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m5.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m5.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m5.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m50.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m50.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m50.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m51.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m51.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m51.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m52.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m52.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m52.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m53.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m53.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m53.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m6.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m6.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m6.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m7.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m7.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m7.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m8.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m8.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m8.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m9.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m9.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_m9.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_prime1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_prime1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_prime1.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+true false
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r1.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+ac1ac1
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r10.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r10.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r10.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+b
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r11.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r11.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r11.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
++-+-+-0-1
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r12.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r12.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r12.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,5 @@
+(:
+ :
+ :
+ :
+:)
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r2.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+1
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r3.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r3.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r3.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+11
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r4.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r4.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r4.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+a-aba-ab
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r5.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r5.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r5.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+acbaacba
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r6.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r6.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r6.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+acaabcab
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r9.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r9.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_r9.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+11
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_t1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_t1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_t1.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+ r c d r
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_t2.xml.res'
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_t4.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_t4.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_t4.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+ 0 1
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/string/Regex/regex_t5.xml.res'
--- test/rbkt/ExpQueryResults/zorba/string/Regex/regex_t5.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/string/Regex/regex_t5.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+The cat sat on the mat
\ No newline at end of file
=== added file 'test/rbkt/ExpQueryResults/zorba/testdriver/bom_bug.xml.res'
--- test/rbkt/ExpQueryResults/zorba/testdriver/bom_bug.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/testdriver/bom_bug.xml.res 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+11
\ No newline at end of file
=== modified file 'test/rbkt/Queries/CMakeLists.txt'
--- test/rbkt/Queries/CMakeLists.txt 2012-03-28 05:19:57 +0000
+++ test/rbkt/Queries/CMakeLists.txt 2012-04-11 15:45:21 +0000
@@ -476,12 +476,14 @@
# EXPECTED_FAILURE (test/rbkt/zorba/file/dirname_basename ????need bugnum???)
#ENDIF ()
+# This test passes only if it fails: it checks for the testdriver BOM bug that gives false positives.
+EXPECTED_FAILURE (test/rbkt/zorba/testdriver/bom_bug 3381121)
+
# Bug 921624. If this test takes more than a couple seconds, it must be
# hitting w3.org for the DTD, which is bad.
SET_TESTS_PROPERTIES(test/rbkt/zorba/schemas/local-xhtml
PROPERTIES TIMEOUT 5)
-
# --------------------------------------------------------------------------
# the list of tests that are failing but can be accepted by the commit queue
# !!! do not abuse this list or you will be prosecuted !!!
@@ -520,6 +522,19 @@
EXPECTED_FAILURE(test/rbkt/zorba/http-client/put/put3_binary_element 3391756)
EXPECTED_FAILURE(test/rbkt/zorba/http-client/post/post3_binary_element 3391756)
+IF(NOT ZORBA_NO_ICU)
+ EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_err10 974474)
+ EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_err15 866874)
+ EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_err16 974477)
+ EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_m11 866874)
+ENDIF(NOT ZORBA_NO_ICU)
+
+IF(ZORBA_NO_ICU)
+ SET_TESTS_PROPERTIES(
+ test/rbkt/zorba/string/CodepointToStringFunc/UnicodeNormalization1
+ PROPERTIES WILL_FAIL TRUE)
+ENDIF(ZORBA_NO_ICU)
+
EXPECTED_FAILURE(test/rbkt/zorba/reference/reference_5 868640)
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_a1.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_a1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_a1.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:analyze-string("aaa", "(a)+")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_a10.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_a10.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_a10.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,11 @@
+
+import module namespace file = "http://expath.org/ns/file";
+
+declare namespace ann = "http://www.zorba-xquery.com/annotations";
+
+variable $http-content;
+$http-content := file:read-text(resolve-uri("zorba.html"));
+
+(: local:get-out-links-unparsed($http-call[2]) :)
+
+fn:analyze-string($http-content, "(<|&lt;|<)(((a|link|area).+?href)|((script|img).+?src))=([""'])(.*?)\7")
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_a11.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_a11.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_a11.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,9 @@
+
+import module namespace file = "http://expath.org/ns/file";
+
+declare namespace ann = "http://www.zorba-xquery.com/annotations";
+
+variable $http-content;
+$http-content := file:read-text(resolve-uri("zorba2.html"));
+
+fn:analyze-string($http-content, "(<|&lt;|<)(((a|link|area).+?href)|((script|img).+?src))=([""'])(.*?)\7")
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_a2.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_a2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_a2.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:analyze-string("aaa", "((a))+")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_a3.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_a3.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_a3.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:analyze-string("aaac", "((a))+?c")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_a5.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_a5.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_a5.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:analyze-string("aaac", "((a)|(c))+c")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_a6.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_a6.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_a6.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:analyze-string("aaac", "((a)|(c))+c")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_a7.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_a7.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_a7.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:analyze-string("aaaac", "((a)(a))+c")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_a8.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_a8.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_a8.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:analyze-string("aaaac", "()c")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_a9.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_a9.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_a9.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:analyze-string("aaaac", "()c($)")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err1.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err1.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err1.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err1.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err1.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "+")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err10.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err10.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err10.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err10.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err10.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err10.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "\p{IsBasic-Latin}")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err11.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err11.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err11.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err11.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err11.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err11.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "\p{IsBasicLatin2}")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err12.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err12.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err12.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err12.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err12.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err12.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "\y")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err13.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err13.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err13.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err13.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err13.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err13.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "\0")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err14.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err14.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err14.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err14.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err14.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err14.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "(1)\2")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err15.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err15.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err15.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err15.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err15.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err15.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "[a-[b] ]")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err16.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err16.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err16.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err16.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err16.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err16.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "[\s-e]")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err17.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err17.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err17.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err17.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err17.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err17.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "[e-\s]")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err18.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err18.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err18.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err18.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err18.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err18.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "[eb")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err19.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err19.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err19.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err19.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err19.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err19.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,3 @@
+(:backref to unended group:)
+
+fn:matches("a", "(a(b(c)\2))")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err2.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err2.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err2.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err2.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err2.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "}")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err20.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err20.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err20.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0001
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err20.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err20.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err20.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,3 @@
+(:unknown flag:)
+
+fn:matches("a", "a", "a")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err21.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err21.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err21.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0004
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err21.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err21.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err21.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,3 @@
+(:$ not followed by 0-9:)
+
+fn:replace("a", "a", "$a")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err22.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err22.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err22.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0004
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err22.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err22.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err22.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,3 @@
+(:\ outside constructs \\ or \$:)
+
+fn:replace("a", "a", "\a")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err23.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err23.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err23.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err23.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err23.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err23.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,3 @@
+(:group with ?: is not used in backreferencing:)
+
+fn:matches("a", "(a(?:b)\2)")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err24.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err24.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err24.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err24.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err24.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err24.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,3 @@
+(:{min,max} min is bigger:)
+
+fn:matches("a", "a{3,2}")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err25.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err25.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err25.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err25.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err25.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err25.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "a^")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err3.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err3.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err3.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err3.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err3.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err3.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "{")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err4.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err4.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err4.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err4.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err4.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err4.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "?")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err5.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err5.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err5.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err5.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err5.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err5.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "*")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err7.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err7.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err7.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err7.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err7.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err7.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "^^")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err8.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err8.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err8.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err8.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err8.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err8.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "\p ")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err9.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_err9.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err9.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0002
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_err9.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_err9.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_err9.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "\P{L ")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m1.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m1.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("abracadabra", "bra")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m10.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m10.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m10.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("ba", "a?b?")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m11.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m11.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m11.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("ba", "[a-z-[ab]]")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m12.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m12.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m12.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("aaaaab", "a*ab")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m13.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m13.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m13.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("aaaaab", "a*?ab")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m14.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m14.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m14.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("abc", "(a|ab)c")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m15.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m15.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m15.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("bbba", "((a)|(b))*\3")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m16.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m16.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m16.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("aaaa", "^a*?$")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m17.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m17.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m17.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("aaabb", "a{1,3}ab")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m18.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m18.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m18.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,3 @@
+
+
+fn:matches("aaaa", "a{1,3}")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m19.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m19.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m19.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("baac", "(?:b)(a)\1c")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m2.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m2.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("abracadabra", "^a.*a$")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m20.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m20.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m20.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("aaaa", "(aaa|a){2,3}")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m21.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m21.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m21.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("aaaa", "(aaa|a){2,3}?")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m22.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m22.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m22.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("aaac", "(aaa|a){2,3}?c")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m23.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m23.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m23.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("aaac", "(aaa|a){2,3}c")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m24.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m24.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m24.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("aaaaab", "(a|b)*ab")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m25.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m25.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m25.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("t1t22t33", "(t.*){3}")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m26.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m26.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m26.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("ac", "ab")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m27.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m27.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m27.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("cat", "cat(aract|erpillar|) ")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m28.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m28.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m28.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("cat", "c()a\1t")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m29.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m29.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m29.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("cat", "cat(aract|erpillar|)")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m3.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m3.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m3.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("abracadabra", "^bra")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m30.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m30.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m30.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("cat", "c()a\1t ")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m31.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m31.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m31.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("cat", "cat(aract||erpillar)")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m32.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m32.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m32.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "|")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m33.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m33.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m33.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "^a")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m34.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m34.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m34.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("ab", "^a$")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m35.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m35.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m35.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,4 @@
+fn:matches(
+"a
+b
+c", "^b", "m")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m36.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m36.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m36.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "b$|^a")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m37.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m37.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m37.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,4 @@
+fn:matches(
+"a
+b
+c", "e$|^c$", "m")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m38.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m38.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m38.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,4 @@
+fn:matches(
+"a
+b
+c", "e$|(^c$)+", "m")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m39.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m39.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m39.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("a", "(^)a")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m4.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m4.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m4.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,6 @@
+let $poem :=
+<poem author="Wilhelm Busch"> Kaum hat dies
+ der Hahn gesehen, Fangt er auch schon an zu krahen: Kikeriki! Kikikerikih!! Tak, tak,
+ tak! - da kommen sie. </poem>
+return
+fn:matches($poem, "Kaum.*krahen")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m40.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m40.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m40.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("ab", "^+a")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m41.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m41.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m41.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("ab", "^?b")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m42.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m42.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m42.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("ab", "(c*)*")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m43.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m43.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m43.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("ab", "(c*)*?e")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m44.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m44.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m44.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("ab", "((c)*?)*?e")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m45.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m45.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m45.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("ab", "(c*){3,}e")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m46.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m46.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m46.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("cabana", "(cab|caba)na")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m47.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m47.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m47.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("cabana", "((a|c)(a|a)(a|b)|(a|c)(a|a)(a|b)(a|a))na")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m48.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m48.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m48.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("abc", "^b")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m49.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m49.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m49.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("abc", "b$")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m5.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m5.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m5.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,6 @@
+let $poem :=
+<poem author="Wilhelm Busch"> Kaum hat dies
+ der Hahn gesehen, Fangt er auch schon an zu krahen: Kikeriki! Kikikerikih!! Tak, tak,
+ tak! - da kommen sie. </poem>
+return
+fn:matches($poem, "Kaum.*krahen", "s")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m50.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m50.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m50.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,2 @@
+fn:matches("abc
+def", "b.*f", "s")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m51.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m51.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m51.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,2 @@
+fn:matches("abc
+def", "b.*f", "m")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m52.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m52.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m52.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("b", "[^B]", "i")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m53.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m53.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m53.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:matches("bc d", "b c[ ]d", "x")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m6.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m6.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m6.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,6 @@
+let $poem :=
+<poem author="Wilhelm Busch"> Kaum hat dies der Hahn gesehen,
+ Fangt er auch schon an zu krahen: Kikeriki! Kikikerikih!! Tak, tak, tak! - da kommen sie.
+</poem>
+return
+fn:matches($poem, "^ Kaum.*gesehen,$", "m")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m7.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m7.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m7.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,6 @@
+let $poem :=
+<poem author="Wilhelm Busch"> Kaum hat dies der Hahn gesehen,
+ Fangt er auch schon an zu krahen: Kikeriki! Kikikerikih!! Tak, tak, tak! - da kommen sie.
+</poem>
+return
+fn:matches($poem, "^Kaum.*gesehen,$")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m8.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m8.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m8.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,7 @@
+let $poem :=
+<poem author="Wilhelm Busch"> Kaum hat dies der Hahn gesehen,
+ Fangt er auch schon an zu krahen: Kikeriki! Kikikerikih!!
+ Tak, tak, tak! - da kommen sie.
+</poem>
+return
+fn:matches($poem, "kiki", "i")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_m9.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_m9.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_m9.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,7 @@
+let $poem :=
+<poem author="Wilhelm Busch"> Kaum hat dies der Hahn gesehen,
+ Fangt er auch schon an zu krahen: Kikeriki! Kikikerikih!!
+ Tak, tak, tak! - da kommen sie.
+</poem>
+return
+fn:matches($poem, "(tak.*){3}", "i")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_prime1.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_prime1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_prime1.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,17 @@
+
+declare function local:string-1-n($nr as xs:integer) as xs:string
+{
+ if($nr eq 0) then
+ ""
+ else
+ concat("1", local:string-1-n($nr - 1))
+};
+
+declare function local:is-prime($nr as xs:integer) as xs:boolean
+{
+ let $str1 := local:string-1-n($nr)
+ return
+ fn:not(fn:matches($str1, "^(11+)\1+$"))
+};
+
+(local:is-prime(13), local:is-prime(24))
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r1.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_r1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r1.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:replace("acabacab", "ab", "1")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r10.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_r10.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r10.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:replace("aba", "a", "")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r11.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_r11.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r11.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:replace("a-b-c-0-1", "\p{Ll}", "+")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r12.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_r12.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r12.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,7 @@
+fn:replace(
+"(:
+ *
+ *
+ *
+:)", "\n\r? *\*", "
+ :")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r2.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_r2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r2.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:replace("aaa", "a+", "1")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r3.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_r3.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r3.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:replace("aa", "a|aa", "1")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r4.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_r4.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r4.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:replace("acabacab", "c(ab)", "-$1")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r5.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_r5.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r5.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:replace("acabacab", "(a)(b)", "$2$1")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r6.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_r6.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r6.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:replace("acabacab", "(a)(b)(a)(c)", "$3$1$2$5$4")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r7_err.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_r7_err.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r7_err.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0003
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r7_err.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_r7_err.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r7_err.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:replace("acabacab", "(a)*(b)*(a)*(c)*", "$3$1$2$5$4")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r8_err.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_r8_err.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r8_err.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0004
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r8_err.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_r8_err.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r8_err.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:replace("acabacab", "(a)(b)(a)(c)", "$$3$1$2$5$4")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_r9.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_r9.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_r9.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:replace("aaaa", "(a|aa){1,2}", "1")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_t1.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_t1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_t1.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:tokenize("abracadabra", "(ab)|(a)")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_t2.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_t2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_t2.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:tokenize("", "(ab)|(a)")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_t3_err.spec'
--- test/rbkt/Queries/zorba/string/Regex/regex_t3_err.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_t3_err.spec 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FORX0003
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_t3_err.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_t3_err.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_t3_err.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:tokenize("", "a*")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_t4.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_t4.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_t4.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,2 @@
+(:extract numbers:)
+fn:tokenize("x=0,y=1", "\P{Nd}+")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/regex_t5.xq'
--- test/rbkt/Queries/zorba/string/Regex/regex_t5.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/regex_t5.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+fn:tokenize("The cat sat on the mat", "\s+")
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/zorba.html'
--- test/rbkt/Queries/zorba/string/Regex/zorba.html 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/zorba.html 2012-04-11 15:45:21 +0000
@@ -0,0 +1,242 @@
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
+<head>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
+ <meta name="copyright" content="The FLWOR Foundation"/>
+ <link rel="shortcut icon" href="../favicon.ico"/>
+ <link type="text/css" href="../css/reset.css" rel="stylesheet"/>
+ <link type="text/css" href="../css/style.css" rel="stylesheet"/>
+ <link type="text/css" href="../css/cute_profiles31.css" rel="stylesheet"/>
+ <script language="javascript" type="text/javascript" src="../js/jquery-1.6.1.min.js"></script>
+
+ <script type="text/javascript">
+ SyntaxHighlighter.all()
+ </script>
+
+ <title>Zorba: The XQuery Processor</title>
+ <script src="http://www.google.com/js/gweb/analytics/autotrack.js" type="text/javascript"></script>
+ <script type="text/javascript"> //
+ new gweb.analytics.AutoTrack({profile: 'UA-4281090-1'});
+ // </script>
+</head>
+<body>
+<div id="header">
+ <div class="innerheader text-left">
+ <div id="logo_container"><a href="index.html"><img
+ src="../images/zorba_logo.png"
+ alt="Zorba C++ XQuery Processor"/></a>
+
+ <h1><a href="index.html">Zorba</a></h1>
+
+ <p><a href="index.html">The XQuery Processor</a></p></div>
+ <div id="innermenu" class="box">
+ <ul>
+ <li style="width:102px !important;"><a
+ href="../doc/latest/zorba/html/index.html" class="documentation">Documentation</a>
+ </li>
+ <li><a href="http://try.zorba-xquery.com" target="_blank" class="tryzorba">Live Demo</a></li>
+ <li><a href="modules.html" class="modules">Modules</a></li>
+ <li><a href="downloads.html" class="download">Download</a></li>
+ <li><a href="tools.html" class="tools">Tools</a></li>
+ <li><a href="blog.html" class="blog">Blog</a></li>
+ <li><a href="code.html" class="open">Code</a></li>
+ </ul>
+ </div>
+ </div>
+</div>
+<div class="cute_profiles_sprite" style="position: absolute; z-index:100;"><a title="Facebook"
+ class="cute_profiles_facebook"
+ href="http://www.facebook.com/groups/237538576264791"
+ target="_blank"/><a title="Twitter"
+ class="cute_profiles_twitter"
+ href="http://twitter.com/ZorbaXQuery"
+ target="_blank"/><a
+ title="Youtube" class="cute_profiles_youtube" href="http://www.youtube.com/user/xqueryxpath" target="_blank"/><a
+ title="Slideshare" class="cute_profiles_slideshare"
+ href="http://www.slideshare.net/search/slideshow?type=presentations&q=zorba+xquery&searchfrom=basic"
+ target="_blank" style="text-decoration:none;"> </a></div>
+<div id="main">
+ <link rel="stylesheet" href="../css/slides.css" type="text/css"/>
+ <script src="../js/slides.min.jquery.js" type="text/javascript"></script>
+ <script type="text/javascript"> //
+ $(function()
+ {
+ $('#teaser').slides({
+ preload: true,
+ preloadImage: '../images/slides/loading.gif',
+ play: 10000,
+ pause: 2500,
+ slideSpeed: 600,
+ hoverPause: true,
+ generatePagination: false
+ });
+ });
+ // </script>
+ <div id="teaser">
+ <div class="slides_container">
+ <div class="center">
+ <p class="center" style="margin: 20px;"><a href="fun.html">XQuery: Less code, less time, better apps.</a>
+ </p>
+ <a href="fun.html"><img src="../images/ideas_cloud.png" alt="ideas_cloud.png"/></a>
+ </div>
+ <!--div class="center">
+ <p class="center" style="margin: 20px;"><a href="fun.html">XQuery: Less code, less time, better apps.</a>
+ </p>
+ <a href="fun.html"><img src="../images/ideas_flwors.png" alt="ideas_flwors.png" style=""/></a>
+ </div-->
+ <div class="center">
+ <p class="center" style="margin: 20px;"><a href="../doc/latest/zorba/html/data_converters.html">Process the Web's structured and unstructured information</a>
+ </p>
+ <a href="../doc/latest/zorba/html/data_converters.html"><img src="../images/data3.png" alt="hm"/></a>
+ </div>
+ <div class="center">
+ <p class="center" style="margin: 20px;"><a href="../doc/latest/zorba/html/schema_lifecycle.html">Understand the Web's vocabularies</a>
+ </p>
+ <a href="../doc/latest/zorba/html/schema_lifecycle.html"><img style="" src="../images/schemastag.png" alt="schemas cloud tag"/></a>
+ </div>
+ <!--div class="center">
+ <p class="center" style="margin: 20px;"><a href="fun.html">XQuery: stitching together the data from the Web</a>
+ </p>
+ <a href="fun.html"><img style="height: 345px" src="../images/puzzle.png" alt="puzzle image"/></a>
+ </div-->
+ <div class="center">
+ <p class="center" style="margin: 20px;"><a href="../doc/latest/zorba/html/overview.html">Zorba: The most complete XQuery processor</a>
+ </p>
+ <img src="../images/zorba-arch.png" width="540" height="343" border="0" usemap="#map" alt="modules cloud tag"/>
+
+ <map name="map" id="overview_map">
+ <area shape="rect" coords="17,14,75,58" href="../doc/latest/cxx/html/index.html" alt="C++"/>
+ <area shape="rect" coords="80,14,209,56" href="../doc/latest/zorba/html/commandline.html" alt="CLI"/>
+ <area shape="rect" coords="214,14,274,56" href="http://xqdt.org/" alt="XQDT"/>
+ <area shape="rect" coords="279,14,343,56" href="../doc/latest/python/html/index.html" alt="Python" />
+ <area shape="rect" coords="348,12,409,57" href="../doc/latest/ruby/html/index.html" alt="Ruby" />
+ <area shape="rect" coords="411,14,470,56" href="../doc/latest/java/html/index.html" alt="Java" />
+ <area shape="rect" coords="475,14,524,57" href="../doc/latest/php/html/index.html" alt="PHP" />
+ <area shape="rect" coords="191,218,353,246" href="../doc/latest/zorba/xqdoc/xhtml/index.html" alt="Mdules" />
+ <area shape="rect" coords="136,250,221,277" href="../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_converters_json.html" alt="Json" />
+ <area shape="rect" coords="40,250,126,277" href="../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_http-client.html" alt="Http" />
+ <area shape="rect" coords="231,251,316,278" href="../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_oauth_client.html" alt="OAuth" />
+ <area shape="rect" coords="326,250,412,277" href="../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_reflection.html" alt="Reflection" />
+ <area shape="rect" coords="420,250,505,277" href="../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_image_basic.html" alt="Image" />
+ <area shape="rect" coords="40,286,126,313" href="../doc/latest/zorba/xqdoc/xhtml/expath.org_ns_file.html" alt="EXPath" />
+ <area shape="rect" coords="136,287,221,314" href="../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_xsl-fo.html" alt="Xsl-fo" />
+ <area shape="rect" coords="231,286,316,313" href="../doc/latest/zorba/xqdoc/xhtml/expath.org_ns_geo.html" alt="Geo" />
+ <area shape="rect" coords="326,286,412,313" href="../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_cryptography_hmac.html" alt="Hmac" />
+ <area shape="rect" coords="420,286,505,313" href="../doc/latest/zorba/xqdoc/xhtml/www.zorba-xquery.com_modules_xqdoc.html" alt="Xqdoc" />
+ <area shape="rect" coords="322,68,520,199" href="../doc/latest/zorba/html/data_lifecycle.html#dl_zorba_store" alt="Store" />
+ <area shape="rect" coords="17,67,216,198" href="../doc/latest/zorba/html/index.html" alt="Zorba" />
+ </map>
+ </div>
+ <div class="center">
+ <p class="center" style="margin: 20px;"><a href="modules.html">An ecosystem of XQuery modules</a>
+ </p>
+ <a href="modules.html"><img style="margin: 50px 0px 0px 0px;" src="../images/modulestag.png" alt="modules cloud tag"/></a>
+ </div>
+ <!--div style="background-color:white;">
+ <p class="center" style="margin: 20px;"><a href="http://try.zorba-xquery.com"><fun></a>
+ </p>
+ <br>
+ <a href="http://try.zorba-xquery.com" style="text-decoration:none;">
+ <div style="float: left; width:40%">
+ <img style="width:350px; height:300px; margin:0px 15px 30px 15px;" src="../images/pdash.png" alt="PDash">
+ </div>
+ <div style="float: left; width:60% !important;height: 150px !important;">
+ <pre class="brush: xquery;toolbar: false;">
+ declare variable $seq := fn:parse-xml("RFID.xml");
+
+ for sliding window $w in $seq/stream/event
+ start $s_curr when fn:true()
+ only end next $next
+ when $next/@time > $s_curr/@time + 3
+ return
+ let $avg := fn:avg($w/@temp)
+ where $avg * 2 lt xs:double($next/@temp) or
+ $avg div 2 gt xs:double($next/@temp)
+ return <alarm>Outlier detected.
+ Event id:{data($next/@time)}</alarm>
+ </pre>
+ </div>
+
+ </a>
+
+ </div-->
+ <div class="center">
+ <p class="center" style="margin: 20px;"><a style="font-size:250%;" href="fun.html"> <productivity></a>
+ </p>
+ <a href="fun.html"><img src="../images/zorba-slide-fun.png" alt="Zorba fun"/></a>
+ </div>
+ </div>
+ <a href="index.html#" class="prev"><img src="../images/slides/arrow_prev.png" width="24" height="43"
+ alt="Arrow Prev"></a><a href="index.html#" class="next"/><img
+ src="../images/slides/arrow_next.png" width="24" height="43" alt="Arrow Next"/></a></div>
+ <div class="box content">
+ <div>
+
+ <table id="table-index">
+ <tr>
+ <td>
+ <a href="../doc/latest/zorba/html/overview.html" class="noDecor">
+ <div class="flavors">All Flavors Available</div>
+ </a>
+ <p>General purpose XQuery processor - written in C++.</p>
+
+ <p>Complete family of W3C familly of specifications: XPath, XQuery, Update, Scripting,
+ Full-Text, XSLT, XQueryX, and more.</p></td>
+ <td>
+ <a href="modules.html" class="noDecor">
+ <div class="richmodules">Rich Module Library</div>
+ </a>
+
+ <p>Web mashups, cryptography, image processing, geo projections, emails, data cleaning...
+ there is a module for that.</p></td>
+ </tr>
+ <tr>
+ <td>
+ <a href="../doc/latest/zorba/html/index.html" class="noDecor">
+ <div class="store">Pluggable Store</div>
+ </a>
+
+ <p>Seamlessly process XML data stored in different places.</p>
+
+ <p>Main memory, mobile devices, browsers, disk-based, or cloud-based stores.</p></td>
+ <td>
+ <a href="../doc/latest/zorba/html/index.html" class="noDecor">
+ <div class="api">Runs Everywhere</div>
+ </a>
+
+ <p>Available on Windows, Linux, and Mac OS.</p>
+
+ <p>Bindings available for 6 Programming Languages: C++, C, PHP, Ruby, Java and Python.</p></td>
+ </tr>
+ <tr>
+ <td>
+ <a href="tools.html" class="noDecor">
+ <div class="tooling">Developer Friendly Tools</div>
+ </a>
+
+ <p>Benefit from a rich ecosystem of tools.</p>
+
+ <p>Eclipse plugins, command-line interface, and debugger.</p></td>
+ <td>
+ <a href="fun.html" class="noDecor">
+ <div class="fun">Fun & Productive</div>
+ </a>
+
+ <p>XQuery unifies development for all tiers; database, content management, application logic,
+ and presentation.</p>
+
+ <p>Check out <a href="fun.html">examples and demos</a>.</p></td>
+ </tr>
+ </table>
+ </div>
+ <div style="clear: both;"></div>
+ </div>
+</div>
+<div> </div>
+<div id="footer">
+ <div id="innerfooter"><p>Zorba is supported by the <a href="http://flworfound.org/" target="_blank">FLWOR
+ Foundation</a> and distributed under
+ <a href="http://www.apache.org/licenses/LICENSE-2.0.html" target="_blank">Apache Licence, Version 2.0</a>.</p>
+ </div>
+</div>
+</body>
+</html>
\ No newline at end of file
=== added file 'test/rbkt/Queries/zorba/string/Regex/zorba2.html'
--- test/rbkt/Queries/zorba/string/Regex/zorba2.html 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/string/Regex/zorba2.html 2012-04-11 15:45:21 +0000
@@ -0,0 +1,5 @@
+<div id="footer">
+ <div id="innerfooter"><p>Zorba is supported by the <a href="http://flworfound.org/" target="_blank">FLWOR
+ Foundation</a> and distributed under
+ <a href="http://www.apache.org/licenses/LICENSE-2.0.html" target="_blank">Apache Licence, Version 2.0</a>.</p>
+ </div>
=== added file 'test/rbkt/Queries/zorba/testdriver/bom_bug.xq'
--- test/rbkt/Queries/zorba/testdriver/bom_bug.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/testdriver/bom_bug.xq 2012-04-11 15:45:21 +0000
@@ -0,0 +1,1 @@
+1
\ No newline at end of file
=== modified file 'test/unit/static_context.cpp'
--- test/unit/static_context.cpp 2012-01-17 19:07:24 +0000
+++ test/unit/static_context.cpp 2012-04-11 15:45:21 +0000
@@ -26,7 +26,9 @@
using namespace std;
using namespace zorba;
+#ifndef ZORBA_NO_FULL_TEXT
using namespace zorba::locale;
+#endif /* ZORBA_NO_FULL_TEXT */
bool
sctx_test_1(Zorba* const zorba)
=== modified file 'test/update/CMakeLists.txt'
--- test/update/CMakeLists.txt 2012-03-28 05:19:57 +0000
+++ test/update/CMakeLists.txt 2012-04-11 15:45:21 +0000
@@ -67,6 +67,15 @@
ENDFOREACH(TESTFILE)
+IF(ZORBA_NO_FULL_TEXT)
+ SET_TESTS_PROPERTIES(
+ test/update/zorba/store/sc1
+ test/update/zorba/store/sc2_ex
+ PROPERTIES WILL_FAIL TRUE)
+ENDIF(ZORBA_NO_FULL_TEXT)
+
+
+
IF (FOUND_XQUTS AND NOT ZORBA_TEST_W3C_TO_SUBMIT_RESULTS)
# We "don't care" that these fail
EXPECTED_FAILURE(test/update/w3c_update_testsuite/XQuery/Put/fn-put-005 3354993)