zorba-coders team mailing list archive
Message #08518
[Merge] lp:~zorba-coders/zorba/feature-ft_module into lp:zorba
Paul J. Lucas has proposed merging lp:~zorba-coders/zorba/feature-ft_module into lp:zorba.
Requested reviews:
Matthias Brantner (matthias-brantner)
Markos Zaharioudakis (markos-za)
Related bugs:
Bug #944795 in Zorba: "XQDoc doesn't handle & in URLs"
https://bugs.launchpad.net/zorba/+bug/944795
For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/feature-ft_module/+merge/103378
1. Added a new full-text module.
2. Fixed semi-broken Thesaurus API.
3. Now supporting many more languages for tokenization including Chinese.
4. Many other full-text improvements.
=== modified file 'ChangeLog'
--- ChangeLog 2012-04-24 12:39:38 +0000
+++ ChangeLog 2012-04-24 20:57:30 +0000
@@ -10,6 +10,7 @@
* fn:unparsed-text-available
* Extended API for Python, Java, PHP and Ruby.
* Add jvm classpath to zorbacmd and to Zorba API. Tracked by #931816
+ * Added full-text module.
* Added support for NO_ICU (to not use ICU for unicode processing)
* Added XQJ support.
@@ -88,6 +89,8 @@
* Fixed bug 867509 (Can not handle largest xs:unsignedLong values)
* Fixed bug 924063 (sentence is incorrectly incremented when token characters end without sentence terminator)
* Fixed bug 909126 (bug in cloning of var_expr)
+ * Fixed bug 928631 (external builtin functions were not executed in the module in which they
+ were declared)
* Fixed bug in destruction of exit_catcher_expr
* Fixed bug #867024 (error messages)
* Fixed bug #957580 (stream read failure in StringToCodepointsIteartor)
=== modified file 'cmake_modules/FindICU.cmake'
--- cmake_modules/FindICU.cmake 2012-04-24 14:35:54 +0000
+++ cmake_modules/FindICU.cmake 2012-04-24 20:57:30 +0000
@@ -28,6 +28,8 @@
# (note: in addition to ICU_LIBRARIES)
# ICU_DATA_LIBRARIES - Libraries to link against for ICU data
#
+# ICU_VERSION - ICU's version number.
+#
# Look for the header file.
find_path(
=== modified file 'doc/zorba/ft_intro.dox'
--- doc/zorba/ft_intro.dox 2012-04-24 12:39:38 +0000
+++ doc/zorba/ft_intro.dox 2012-04-24 20:57:30 +0000
@@ -5,9 +5,9 @@
specification.
Additional documentation:
- - \ref ft_stemmer
- - \ref ft_thesaurus
- - \ref ft_tokenizer
+- \ref ft_stemmer
+- \ref ft_thesaurus
+- \ref ft_tokenizer
\section ft_unimplemented Unimplemented Features
@@ -16,11 +16,11 @@
implemented.
The features that are not (completely) implemented are:
- - The <a href="http://www.w3.org/TR/xpath-full-text-10/#ftignoreoption">Ignore Option</a>
- (bug <a href="https://bugs.launchpad.net/zorba/+bug/sf-3187470">3187470</a>).
- - <a href="http://www.w3.org/TR/xpath-full-text-10/#section-score-variables">Score Variables</a>
- and <a href="http://www.w3.org/TR/xpath-full-text-10/#section-using-weights">Using Weights Within a Scored FTContainsExpr</a>
- (bug <a href="https://bugs.launchpad.net/zorba/+bug/sf-3187462">3187462</a>).
+- The <a href="http://www.w3.org/TR/xpath-full-text-10/#ftignoreoption">Ignore Option</a>
+ (bug <a href="https://bugs.launchpad.net/zorba/+bug/866924">866924</a>).
+- <a href="http://www.w3.org/TR/xpath-full-text-10/#section-score-variables">Score Variables</a>
+ and <a href="http://www.w3.org/TR/xpath-full-text-10/#section-using-weights">Using Weights Within a Scored FTContainsExpr</a>
+ (bug <a href="https://bugs.launchpad.net/zorba/+bug/866923">866923</a>).
*/
/* vim:set et sw=2 ts=2: */
=== modified file 'doc/zorba/ft_stemmer.dox'
--- doc/zorba/ft_stemmer.dox 2012-04-24 12:39:38 +0000
+++ doc/zorba/ft_stemmer.dox 2012-04-24 20:57:30 +0000
@@ -56,7 +56,12 @@
public:
typedef /* implementation-defined */ ptr;
+ struct Properties {
+ char const *uri;
+ };
+
virtual void destroy() const = 0;
+ virtual void properties( Properties *result ) const = 0;
virtual void stem( String const &word, locale::iso639_1::type lang, String *result ) const = 0;
protected:
virtual ~Stemmer();
@@ -89,6 +94,8 @@
Note that \c result should always be set to something.
If your stemmer doesn't know how to stem the given word,
you should set \c result to \c word.
+You also need to implement the \c properties() function
+so that it sets the URI that uniquely identifies your stemmer.
A very simple stemmer
that stems the word "foobar" to "foo"
@@ -98,6 +105,7 @@
class MyStemmer : public Stemmer {
public:
void destroy() const;
+ void properties( Properties *result ) const;
void stem( String const &word, locale::iso639_1::type lang, String *result ) const;
private:
MyStemmer();
@@ -108,6 +116,10 @@
// Do nothing since we statically allocate a singleton instance of our stemmer.
}
+void MyStemmer::properties( Properties *props ) const {
+ props->uri = "http://my.example.com/zorba/full-text/stemmer";
+}
+
void MyStemmer::stem( String const &word, locale::iso639_1::type lang, String *result ) const {
if ( word == "foobar" )
*result = "foo";
@@ -120,7 +132,6 @@
or a dictionary look-up
to stem many words,
of course.
-
Although not used in this simple example,
\c lang can be used to allow a single stemmer instance
to stem words in more than one language.
@@ -135,16 +146,24 @@
class StemmerProvider {
public:
virtual ~StemmerProvider();
- virtual Stemmer::ptr getStemmer( locale::iso639_1::type lang ) const = 0;
+ virtual bool getStemmer( locale::iso639_1::type lang, Stemmer::ptr *s = 0 ) const = 0;
};
\endcode
+The \c getStemmer() function should return \c true
+only if it can provide a \c Stemmer
+for the given language; \c false otherwise.
+If \c s is \c null,
+the caller wants to check only whether the provider
+can provide a stemmer for the given language
+and doesn't want a \c Stemmer instance created or returned.
+
A simple \c StemmerProvider for our simple stemmer can be implemented as:
\code
class MyStemmerProvider : public StemmerProvider {
public:
- Stemmer::ptr getStemmer( locale::iso639_1::type lang ) const;
+ bool getStemmer( locale::iso639_1::type lang, Stemmer::ptr *s = 0 ) const;
};
Stemmer::ptr MyStemmerProvider::getStemmer( locale::iso639_1::type lang ) const {
@@ -154,15 +173,14 @@
case iso639_1::en:
case iso639_1::unknown: // Handle "unknown" language since, in many cases, the language is not known.
result.reset( &stemmer );
- break;
+ return true;
default:
//
- // We have no stemmer for the given language: leave the result as null to indicate this.
+ // We have no stemmer for the given language: return false.
// Zorba will then use the built-in stemmer for the given language.
//
- break;
+ return false;
}
- resturn std::move( result );
}
\endcode
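The revised getStemmer() contract can be sketched in standalone C++. Under that contract, the function returns true only when a stemmer is available for the language, and it fills in the result only when the out-pointer is non-null. This is a hedged sketch, not Zorba's API. Lang, Stemmer, StemmerPtr, DestroyDeleter, MyStemmer, and MyStemmerProvider are hypothetical stand-ins. StemmerPtr assumes that the "implementation-defined" ptr type behaves like a std::unique_ptr whose deleter calls destroy().

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins that mirror the shape of the documented API;
// they are not Zorba's actual headers.
enum class Lang { unknown, en, de };

class Stemmer {
public:
  virtual ~Stemmer() = default;
  virtual void destroy() const = 0;
  virtual void stem( std::string const &word, Lang lang, std::string *result ) const = 0;
};

// Assumption: the implementation-defined ptr type acts like a unique_ptr
// whose deleter calls destroy() rather than delete.
struct DestroyDeleter {
  void operator()( Stemmer const *s ) const {
    if ( s )
      s->destroy();
  }
};
using StemmerPtr = std::unique_ptr<Stemmer const, DestroyDeleter>;

class MyStemmer : public Stemmer {
public:
  void destroy() const override {
    // Nothing to free: the provider hands out a static singleton.
  }
  void stem( std::string const &word, Lang, std::string *result ) const override {
    if ( word == "foobar" )
      *result = "foo";
    else
      *result = word;  // always set result, even for words we can't stem
  }
};

class MyStemmerProvider {
public:
  // Returns true only if a stemmer exists for lang.
  // If s is null, the caller is only asking; nothing is created or returned.
  bool getStemmer( Lang lang, StemmerPtr *s = nullptr ) const {
    static MyStemmer stemmer;
    switch ( lang ) {
      case Lang::en:
      case Lang::unknown:  // the language often isn't known
        if ( s )
          s->reset( &stemmer );
        return true;
      default:
        return false;      // Zorba would then use its built-in stemmer
    }
  }
};
```

A null out-pointer means "just check", so a caller can probe language support without creating a stemmer. A false return leaves any supplied pointer untouched.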
=== modified file 'doc/zorba/ft_thesaurus.dox'
--- doc/zorba/ft_thesaurus.dox 2012-04-24 12:39:38 +0000
+++ doc/zorba/ft_thesaurus.dox 2012-04-24 20:57:30 +0000
@@ -44,16 +44,16 @@
To download and install the WordNet database on a Unix-like system,
follow these steps:
- -# Download the WordNet database from
- <a href="http://wordnet.princeton.edu/wordnet/download/">here</a>.
- All you really need are just the database files
- (<code>WNdb-3.0.tar.gz</code>).
- -# Un-gzip and untar the files.
- This will result in a directory dict containing the database files.
- -# Move the dict directory somewhere of your choosing,
- e.g., <code>/usr/local/wordnet-3.0/dict</code>.
- -# Compile the \c dict directory into a Zorba-compatible binary thesaurus
- as described below.
+-# Download the WordNet database from
+ <a href="http://wordnet.princeton.edu/wordnet/download/">here</a>.
+ All you really need are just the database files
+ (<code>WNdb-3.0.tar.gz</code>).
+-# Un-gzip and untar the files.
+ This will result in a directory dict containing the database files.
+-# Move the dict directory somewhere of your choosing,
+ e.g., <code>/usr/local/wordnet-3.0/dict</code>.
+-# Compile the \c dict directory into a Zorba-compatible binary thesaurus
+ as described below.
To compile the WordNet database files,
use the \c zt-wn-compile script
@@ -65,12 +65,12 @@
zt-wn-compile [-v] wordnet_dict_dir [thesaurus_file]
\endcode
- - The \c -v option specifies verbose output.
- - The \e wordnet_dict_dir specifies the full path
- of the WordNet \c dict directory.
- - The \e thesaurus_file specifies the name of the resulting binary file.
- If none is given, it defaults to \c wordnet-en.zth
- ("en" for English and "zth" for "Zorba Thesaurus file").
+- The \c -v option specifies verbose output.
+- The \e wordnet_dict_dir specifies the full path
+ of the WordNet \c dict directory.
+- The \e thesaurus_file specifies the name of the resulting binary file.
+ If none is given, it defaults to \c wordnet-en.zth
+ ("en" for English and "zth" for "Zorba Thesaurus file").
For example:
@@ -78,33 +78,39 @@
zt-wn-compile -v /usr/local/wordnet-3.0/dict
\endcode
-Move the \c wordnet-en.zth file to a location of your choosing.
+To install the \c wordnet-en.zth file,
+move it onto Zorba's <i>library path</i>:
+
+\code
+LIB_PATH/edu/princeton/wordnet/wordnet-en.zth
+\endcode
\subsection ft_thesaurus_precompiled Downloading a Precompiled WordNet Database
Alternatively,
-you can download a precompiled WordNet database from
+you can download a precompiled WordNet database for little-endian (e.g., Intel) CPUs from
<a href="http://www.zorba-xquery.com/downloads/WordNet-3.0/wordnet-en.zip">here</a>.
\section ft_thesaurus_mappings Thesauri Mappings
In order to use thesauri,
-you need to specify where they are to the Zorba engine
-via one or more thesaurus <i>mappings</i>.
-A <i>mapping</i> maps a symbolic URI to URI for an actual thesaurus.
+you need to specify what symbolic URI(s) <i>map</i>
+to what thesauri.
A mapping is of the form:
-<i>from_uri</i><code>:=</code><b>[</b><i>implementation</i><code>|</code><b>]</b><i>to_uri</i>
+<i>from_uri</i><code>:=</code><i>implementation-scheme</i><code>:</code><i>to_uri</i>
For example:
\code
-http://wordnet.princeton.edu:=wordnet|/usr/local/zorba/thesauri/wordnet-en.zth
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu
\endcode
says that the symbolic URI \c http://wordnet.princeton.edu
maps to the WordNet implementation
-having a database file at the given path.
+having a database file at the given sub-path
+\c edu/princeton/wordnet
+on Zorba's library path.
Once a mapping is established for a symbolic URI,
it can be used in a query:
@@ -114,13 +120,8 @@
using thesaurus at "http://wordnet.princeton.edu"
\endcode
-If the \e implementation is omitted,
-it defaults to \c wordnet.
As a special-case,
-the \e from_uri can be \c default or
-\code
-##default
-\endcode
+the \e from_uri can be \c default or \c ##default
to allow for specifying the default thesaurus
as was done for the first example on this page.
@@ -130,7 +131,7 @@
use one or more --thesaurus options:
\code
-zorba --thesaurus default:=/usr/local/zorba/thesauri/wordnet-en.zth ...
+zorba --thesaurus default:=wordnet://wordnet.princeton.edu ...
\endcode
\section ft_thesaurus_rels Thesaurus Relationships
@@ -423,25 +424,26 @@
If no levels are specified in a query,
Zorba defaults the WordNet implementation to be 2 levels.
-The rationale can be found
-<a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=11444">here</a>.
+(The rationale can be found
+<a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=11444">here</a>.)
\section ft_thesaurus_providing Providing Your Own Thesaurus
Using the Zorba C++ API,
you can provide your own thesaurus
-by deriving from three classes:
+by deriving from four classes:
\c Thesaurus,
\c Thesaurus::iterator,
+\c ThesaurusProvider,
and
-\c ThesaurusProvider.
+\c URLResolver.
\subsection ft_class_thesaurus The Thesaurus Class
The \c Thesaurus class is:
\code
-class Thesaurus : public Resource {
+class Thesaurus {
public:
typedef /* implementation-defined */ ptr;
typedef /* implementation-defined */ range_type;
@@ -457,15 +459,15 @@
virtual iterator::ptr lookup( String const &phrase, String const &relationship, range_type at_least, range_type at_most ) const = 0;
- virtual void destroy() const = 0; // interited from Resource
+ virtual void destroy() const = 0;
protected:
virtual ~Thesaurus();
};
\endcode
-For details about the \c ptr type,
-the \c destroy() function,
-and why the destructor is \c protected,
+For details about the \c ptr types,
+the \c destroy() functions,
+and why the destructors are \c protected,
see the \ref memory_management document.
To implement the \c Thesaurus
@@ -482,18 +484,19 @@
</tr>
<tr>
<td>\c at_least</td>
- <td>The The minimum number of levels within the thesaurus to be traversed.</td>
+ <td>The minimum number of levels within the thesaurus to be traversed.</td>
</tr>
<tr>
<td>\c at_most</td>
- <td>The The maximum number of levels within the thesaurus to be traversed.</td>
+ <td>The maximum number of levels within the thesaurus to be traversed.</td>
</tr>
</table>
The \c lookup() function returns a pointer to an \c iterator
that is used to iterate over the phrase's synonyms.
-
-A very simple thesaurus
+You also need to implement an \c iterator.
+A very simple \c Thesaurus
+and its \c iterator
can be implemented as:
\code
@@ -505,53 +508,49 @@
//
// Define a simple thesaurus data structure as a map from a phrase to a list of its synonyms.
//
- typedef std::list<String> synonyms_t;
- typedef std::map<String,synonyms_t const*> thesaurus_t;
+ typedef std::list<String> synonyms_type;
+ typedef std::map<String,synonyms_type const*> thesaurus_data_type;
- static thesaurus_t const& get_thesaurus();
+ static thesaurus_data_type const& get_thesaurus_data();
class iterator : public Thesaurus::iterator {
public:
- iterator( synonyms_t const &s ) : synonyms_( s ), i_( s.begin() ) { }
+ iterator( synonyms_type const &s ) : synonyms_( s ), i_( s.begin() ) { }
void destroy();
bool next( String *synonym );
private:
- synonyms_t const &synonyms_; // synonyms to iterate over
- synonyms_t::const_iterator i_; // current iterator position
+ synonyms_type const &synonyms_; // synonyms to iterate over
+ synonyms_type::const_iterator i_; // current iterator position
};
};
void MyThesaurus::destroy() const {
- // Do nothing since we statically allocate a singleton instance of our thesaurus.
+ // Do nothing since we statically allocate a singleton instance of our Thesaurus.
}
-MyThesaurus::thesaurus_t const& MyThesaurus::get_thesaurus() {
- static thesaurus_t thesaurus;
- if ( thesaurus.empty() ) {
- //
- // Construct a thesaurus "by hand" for this example. A real thesaurus would probably
- // be read from disk.
- //
+MyThesaurus::thesaurus_data_type const& MyThesaurus::get_thesaurus_data() {
+ static thesaurus_data_type thesaurus_data;
+ if ( thesaurus_data.empty() ) {
+ //
+ // Construct thesaurus data "by hand" for this example. A real thesaurus would probably be read from disk.
// Note that every list of synonyms must always include the original phrase.
//
- static synonyms_t synonyms;
+ static synonyms_type synonyms;
synonyms.push_back( "foo" );
synonyms.push_back( "foobar" );
- thesaurus[ "foo" ] = &synonyms;
- thesaurus[ "foobar" ] = &synonyms;
+ thesaurus_data[ "foo" ] = &synonyms;
+ thesaurus_data[ "foobar" ] = &synonyms;
}
- return thesaurus;
+ return thesaurus_data;
}
-\endcode
-\code
MyThesaurus::iterator::ptr MyThesaurus::lookup( String const &phrase, String const &relationship,
range_type at_least, range_type at_most ) const {
- static thesaurus_t const &thesaurus = get_thesaurus();
- thesaurus_t::const_iterator const i = thesaurus.find( phrase );
+ static thesaurus_data_type const &thesaurus_data = get_thesaurus_data();
+ thesaurus_data_type::const_iterator const entry = thesaurus_data.find( phrase );
iterator::ptr result;
- if ( i != thesaurus.end() )
- result.reset( new iterator( *i->second ) );
+ if ( entry != thesaurus_data.end() )
+ result.reset( new iterator( *entry->second ) );
return std::move( result );
}
@@ -572,13 +571,71 @@
A real thesaurus would load a large number of synonyms,
of course.
+\subsection ft_class_thesaurus_provider The ThesaurusProvider Class
+
+The \c ThesaurusProvider class is:
+
+\code
+class ThesaurusProvider : public Resource {
+public:
+ typedef /* implementation-defined */ ptr;
+
+ virtual bool getThesaurus( locale::iso639_1::type lang, Thesaurus::ptr *thesaurus = 0 ) const = 0;
+ void destroy() const; // inherited from Resource
+};
+\endcode
+
+To implement a \c ThesaurusProvider,
+you need to implement the \c getThesaurus() function where:
+
+<table>
+ <tr>
+ <td>\c lang</td>
+ <td>The desired language of the thesaurus.</td>
+ </tr>
+ <tr>
+ <td>\c thesaurus</td>
+ <td>If not \c null, set to point to a thesaurus for \c lang.</td>
+ </tr>
+</table>
+
+The \c getThesaurus() function returns \c true
+only if it can provide a thesaurus for the given language.
+Continuing with the example,
+a very simple \c ThesaurusProvider
+can be implemented as:
+
+\code
+class MyThesaurusProvider : public ThesaurusProvider {
+public:
+ void destroy() const;
+ bool getThesaurus( iso639_1::type lang, Thesaurus::ptr* = 0 ) const;
+};
+
+void MyThesaurusProvider::destroy() const {
+ // Do nothing since we statically allocate a singleton instance of our ThesaurusProvider.
+}
+
+bool MyThesaurusProvider::getThesaurus( iso639_1::type lang, Thesaurus::ptr *result ) const {
+ //
+ // Since our tiny thesaurus contains only universally known words, we don't bother checking lang
+ // and always return true.
+ //
+ static MyThesaurus thesaurus;
+ if ( result )
+ result->reset( &thesaurus );
+ return true;
+}
+\endcode
+
\subsection ft_class_thesaurus_resolver A Thesaurus URL Resolver Class
-In addition to a \c Thesaurus,
+In addition to a \c Thesaurus
+and \c ThesaurusProvider,
you must also implement a "thesaurus resolver" class
that,
-given a URL and a language,
-provides a \c Thesaurus for that language.
+given a URI,
+provides a \c ThesaurusProvider for that URI.
A simple \c ThesaurusURLResolver
for our simple thesaurus can be implemented as:
@@ -591,23 +648,12 @@
String const url_;
};
-Resource*
-ThesaurusURLResolver::resolveURL( String const &url, EntityData const *data ) const {
- ThesaurusEntityData const *const t_data = dynamic_cast<ThesaurusEntityData const*>( data );
- assert( t_data );
- static MyThesaurus thesaurus;
- if ( url == url_ )
- switch ( t_data->getLanguage() ) {
- case locale::iso639_1::en:
- case locale::iso639_1::unknown:
- //
- // Here, we could test to ensure that the language of our thesaurus matches the
- // language sought, but in our case, we want our thesaurus to be used for all
- // languages since "foo" and "foobar" are universal.
- //
- default:
- return &thesaurus;
- }
+Resource* ThesaurusURLResolver::resolveURL( String const &url, EntityData const *data ) const {
+ if ( data->getKind() == EntityData::THESAURUS ) {
+ static MyThesaurusProvider provider;
+ if ( url == url_ )
+ return &provider;
+ }
return 0;
}
\endcode
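The new mapping syntax, from_uri:=implementation-scheme:to_uri, can be illustrated with a small parser. The same parser shows how wordnet://wordnet.princeton.edu corresponds to the sub-path edu/princeton/wordnet on Zorba's library path. This is a sketch under stated assumptions. ThesaurusMapping, host_to_subpath(), and parse_mapping() are hypothetical. The rule that the sub-path is the host's dot-separated components in reverse order is inferred from the single documented example, not taken from Zorba's source.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type; not part of Zorba's API.
struct ThesaurusMapping {
  std::string from_uri;         // symbolic URI ("default" for the default thesaurus)
  std::string scheme;           // implementation scheme, e.g. "wordnet"
  std::string library_subpath;  // e.g. "edu/princeton/wordnet"
};

// Assumption: the sub-path is the host with its dot-separated components
// reversed, as in wordnet.princeton.edu -> edu/princeton/wordnet.
static std::string host_to_subpath( std::string const &host ) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  for ( ;; ) {
    std::string::size_type const dot = host.find( '.', start );
    parts.push_back( host.substr( start, dot - start ) );  // npos takes the rest
    if ( dot == std::string::npos )
      break;
    start = dot + 1;
  }
  std::string path;
  for ( auto i = parts.rbegin(); i != parts.rend(); ++i ) {
    if ( !path.empty() )
      path += '/';
    path += *i;
  }
  return path;
}

// Parses "from_uri:=scheme://host"; returns nullopt if malformed.
std::optional<ThesaurusMapping> parse_mapping( std::string const &s ) {
  std::string::size_type const assign = s.find( ":=" );
  if ( assign == std::string::npos || assign == 0 )
    return std::nullopt;
  ThesaurusMapping m;
  m.from_uri = s.substr( 0, assign );
  if ( m.from_uri == "##default" )
    m.from_uri = "default";  // both spellings name the default thesaurus
  std::string const to = s.substr( assign + 2 );
  std::string::size_type const sep = to.find( "://" );
  if ( sep == std::string::npos || sep == 0 )
    return std::nullopt;
  m.scheme = to.substr( 0, sep );
  std::string const host = to.substr( sep + 3 );
  if ( host.empty() )
    return std::nullopt;
  m.library_subpath = host_to_subpath( host );
  return m;
}
```

Normalizing ##default to default here mirrors the documented special case, in which either spelling selects the default thesaurus.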
=== modified file 'doc/zorba/ft_tokenizer.dox'
--- doc/zorba/ft_tokenizer.dox 2012-04-24 12:39:38 +0000
+++ doc/zorba/ft_tokenizer.dox 2012-04-24 20:57:30 +0000
@@ -5,14 +5,25 @@
The Zorba XQuery processor implements the
<a href="http://www.w3.org/TR/xpath-full-text-10/">XQuery and XPath Full Text 1.0</a>
specification that, among other things,
-tokenizes a string into a sequence of tokens.
-See
-<a href="http://www.w3.org/TR/xpath-full-text-10/#TokenizationSec">Tokenization</a>.
-
-The initial implementation of the toknenizer
-uses the one provided by the
-<a href="http://site.icu-project.org/">ICU library</a>.
-However, you can provide your own tokenizer instead.
+<a href="http://www.w3.org/TR/xpath-full-text-10/#TokenizationSec">tokenizes</a>
+a string into a sequence of tokens.
+
+\section ft_tokenizer_tokenization Tokenization
+
+Using the
+<a href="http://site.icu-project.org/">ICU library</a>,
+Zorba's implementation of tokenization
+considers only alphanumeric sequences of characters to be part of a token;
+whitespace and punctuation characters are not,
+and instead separate tokens.
+However, a '.' or ',' between two digits
+(i.e., matching the regular expression <code>[0-9][.,][0-9]</code>)
+is retained as part of the token,
+e.g., "98.6" and "1,432.58" are each a single token.
+
+Alternatively,
+you can implement your own tokenizer
+by deriving from the \c Tokenizer class.
\section ft_class_tokenizer The Tokenizer Class
@@ -36,33 +47,43 @@
class Callback {
public:
- typedef Tokenizer::size_type size_type;;
+ typedef Tokenizer::size_type size_type;
virtual ~Callback();
- virtual void operator()( char const *utf8_s, size_type utf8_len,
- size_type token_no, size_type sent_no, size_type para_no,
- void *payload = 0 ) = 0;
- };
-
- enum ElementTraceOptions {
- trace_none = 0x0, // Trace no elements.
- trace_begin = 0x1, // Trace the beginning of elements.
- trace_end = 0x2 // Trace the ending of elements.
- };
+ virtual void token( char const *utf8_s, size_type utf8_len, locale::iso639_1::type lang,
+ size_type token_no, size_type sent_no, size_type para_no,
+ Item const *item = 0 ) = 0;
+ };
+
+ struct Properties {
+ typedef std::vector<locale::iso639_1::type> languages_type;
+
+ bool comments_separate_tokens;
+ bool elements_separate_tokens;
+ bool processing_instructions_separate_tokens;
+ languages_type languages;
+ char const *uri;
+ };
+
+ virtual void properties( Properties *result ) const = 0;
virtual void destroy() const = 0;
- virtual void element( Item const &qname, int trace_options );
Numbers& numbers();
Numbers const& numbers() const;
- int trace_options() const;
-
- virtual void tokenize( char const *utf8_s, size_type utf8_len, locale::iso639_1::type lang,
- bool wildcards, Callback &callback, void *payload = 0 ) = 0;
+
+ void tokenize_node( Item const &node, locale::iso639_1::type lang, Callback &callback );
+
+ virtual void tokenize_string( char const *utf8_s, size_type utf8_len, locale::iso639_1::type lang,
+ bool wildcards, Callback &callback, Item const *item = 0 ) = 0;
protected:
- Tokenizer( Numbers&, int trace_options = trace_none );
+ Tokenizer( Numbers& );
virtual ~Tokenizer();
+
+ bool find_lang_attribute( Item const&, locale::iso639_1::type *lang );
+ virtual void item( Item const&, bool entering );
+ virtual void tokenize_node_impl( Item const&, locale::iso639_1::type, Callback&, bool tokenize_acp );
};
\endcode
@@ -76,8 +97,8 @@
It simply keeps track of the current
token, sentence, and paragraph numbers.
-To implement the \c Tokenizer,
-you need to implement the \c %tokenize() function where:
+To implement a \c Tokenizer,
+you need to implement the \c %tokenize_string() function where:
<table>
<tr>
@@ -115,9 +136,13 @@
</td>
</tr>
<tr>
- <td>\c payload</td>
+ <td>\c item</td>
<td>
- Optional implementation-defined data.
+ The \c Item whence this token came.
+ If the token occurred within an element,
+ the \c Item is the text node.
+ If the token occurred within an attribute,
+ the \c Item is the attribute node.
</td>
</tr>
</table>
@@ -127,21 +152,30 @@
However,
the things a tokenizer should take into consideration include:
- - Detecting sentence termination ('.', '?', and '!' characters).
- - Handling floating-point numbers with possible thousands separators
- in US and European formats, e.g. "98.7", "98,7", "10,000", etc.
- - Distinguishing '.' used as a sentence terminator
- from '.' used as a decimal point.
- - Handling apostrophies, e.g., "men's".
- - Handling acronyms, e.g., "AT&T".
-
-\subsection ft_paragraphs Paragraphs
+- Detecting sentence termination ('.', '?', and '!' characters).
+- Handling floating-point numbers with possible thousands separators
+ in US and European formats, e.g. "98.7", "98,7", "10,000", etc.
+- Distinguishing '.' used as a sentence terminator
+ from '.' used as a decimal point.
+- Handling apostrophes, e.g., "men's".
+- Handling acronyms, e.g., "AT&T".
+
+The task of iterating over an XML element's child nodes
+is done by \c tokenize_node_impl().
+Its default implementation
+treats XML elements, comments, and processing instructions
+as token separators.
+(See \ref ft_tokenizer_properties.)
+If you want to change that,
+you need to override \c tokenize_node_impl().
+
+\subsection ft_tokenizer_paragraphs Paragraphs
By default,
Zorba increments the current paragraph number once
for each XML element encountered.
However,
-this doens't work well for mixed content.
+this doesn't work well for mixed content.
For example, in the XHTML:
\code
<p>The <em>best</em> thing ever!</p>
@@ -150,31 +184,65 @@
but Zorba will consider that 3 paragraphs by default.
Your tokenizer can take control over when the paragraph number is incremented
-by passing the bitwise-or
-of the \c ElementTraceOptions values
-to the constructor
-and overriding the \c element() function.
-The \c element() function is passed the QName of the current XML element
-and (depending on the initial value passed to the constructor)
-one of \c trace_begin or \c trace_end.
-Note that this function is called
-only if the trace options value
-passed to the constructor
-was non-zero.
+by overriding the \c item() function.
+The \c item() function is passed the current \c Item
+and whether the item is being entered or exited.
For example,
-the \c element() function for tokenizing XHTML
+the \c item() function for tokenizing XHTML
would be along the lines of:
\code
-void MyTokenizer::element( Item const &qname, int trace_options ) {
- if ( trace_options & trace_end )
- return;
- String const name( qname.getLocalName() );
- if ( /* qname is an XHTML block-level element */ )
- ++numbers().para;
+void MyTokenizer::item( Item const &item, bool entering ) {
+ if ( entering && item.isNode() && item.getNodeKind() == store::StoreConsts::elementNode ) {
+ Item qname;
+ item.getNodeName( qname );
+ if ( /* qname matches an XHTML block-level element's name */ )
+ ++numbers().para;
+ }
}
\endcode
+\subsection ft_tokenizer_properties Properties
+
+To implement a \c Tokenizer,
+you need also to implement the \c %properties() function
+that fills in the \c Properties struct where:
+
+<table>
+ <tr>
+ <td>\c comments_separate_tokens</td>
+ <td>
+ If \c true, XML comments separate tokens. For example,
+ <code>net&lt;!-- --&gt;work</code> would be 2 tokens instead of 1.
+ </td>
+ </tr>
+ <tr>
+ <td>\c elements_separate_tokens</td>
+ <td>
+ If \c true, XML elements separate tokens. For example,
+ <code>&lt;b&gt;B&lt;/b&gt;old</code> would be 2 tokens instead of 1.
+ </td>
+ </tr>
+ <tr>
+ <td>\c processing_instructions_separate_tokens</td>
+ <td>
+ If \c true, XML processing instructions separate tokens. For example,
+ <code>net&lt;?PI pi?&gt;work</code> would be 2 tokens instead of 1.
+ </td>
+ </tr>
+ <tr>
+ <td>\c languages</td>
+ <td>
+ The list of languages supported by the tokenizer.
+ </td>
+ </tr>
+ <tr>
+ <td>\c uri</td>
+ <td>
+ The URI that uniquely identifies the %Tokenizer.
+ </td>
+ </tr>
+</table>
+
\section ft_class_tokenizer_provider The TokenizerProviderClass
In addition to a \c Tokenizer,
@@ -185,20 +253,51 @@
class TokenizerProvider {
public:
virtual ~TokenizerProvider();
- virtual Tokenizer::ptr getTokenizer( locale::iso639_1::type lang, Tokenizer::Numbers &numbers ) const = 0;
+ virtual bool getTokenizer( locale::iso639_1::type lang, Tokenizer::Numbers *num = 0, Tokenizer::ptr *t = 0 ) const = 0;
};
\endcode
+Specifically, you need to implement the \c getTokenizer() function where:
+
+<table>
+ <tr>
+ <td>\c lang</td>
+ <td>The language to tokenize.</td>
+ </tr>
+ <tr>
+ <td>\c num</td>
+ <td>
+ The \c Numbers to use.
+ If \c null,
+ \a t is not set.
+ </td>
+ </tr>
+ <tr>
+ <td>\c t</td>
+ <td>
+ If not \c null,
+ set to point to a Tokenizer for \a lang.
+ </td>
+ </tr>
+</table>
+
A simple \c TokenizerProvider for our tokenizer can be implemented as:
\code
class MyTokenizerProvider : public TokenizerProvider {
public:
- Tokenizer::ptr getTokenizer( locale::iso639_1::type lang ) const;
+ bool getTokenizer( locale::iso639_1::type lang, Tokenizer::Numbers* = 0, Tokenizer::ptr* = 0 ) const;
};
-Tokenizer::ptr MyTokenizerProvider::getTokenizer( locale::iso639_1::type lang const {
- return Tokenizer::ptr( new MyTokenizer );
+bool MyTokenizerProvider::getTokenizer( locale::iso639_1::type lang, Tokenizer::Numbers *num, Tokenizer::ptr *t ) const {
+ switch ( lang ) {
+ case iso639_1::en:
+ if ( num && t )
+ t->reset( new MyTokenizer( *num ) );
+ return true;
+ default:
+ return false;
+ }
}
\endcode
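The tokenization rule described in the new ft_tokenizer.dox text can be sketched as follows. Under that rule, alphanumeric runs form tokens, and a '.' or ',' between two digits stays inside a token. The tokenize() function here is hypothetical and ASCII-only. Zorba's actual tokenizer is built on ICU, handles Unicode, and also tracks token, sentence, and paragraph numbers, none of which this sketch attempts.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// ASCII-only illustration: a token is a run of alphanumeric characters,
// except that '.' or ',' between two digits ([0-9][.,][0-9]) stays inside it.
std::vector<std::string> tokenize( std::string const &s ) {
  auto is_alnum = []( char c ) {
    return std::isalnum( static_cast<unsigned char>( c ) ) != 0;
  };
  auto is_digit = []( char c ) {
    return std::isdigit( static_cast<unsigned char>( c ) ) != 0;
  };
  std::vector<std::string> tokens;
  std::string token;
  for ( std::string::size_type i = 0; i < s.size(); ++i ) {
    char const c = s[i];
    bool const numeric_separator =
      ( c == '.' || c == ',' ) && i > 0 && i + 1 < s.size() &&
      is_digit( s[i - 1] ) && is_digit( s[i + 1] );
    if ( is_alnum( c ) || numeric_separator ) {
      token += c;
    } else if ( !token.empty() ) {
      tokens.push_back( token );  // whitespace or punctuation ends the token
      token.clear();
    }
  }
  if ( !token.empty() )
    tokens.push_back( token );
  return tokens;
}
```

For example, tokenize("Body temp: 98.6, not 1,432.58!") yields the five tokens Body, temp, 98.6, not, and 1,432.58. The comma after 98.6 is followed by a space, so it ends the token instead of joining it.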
=== modified file 'include/zorba/locale.h'
--- include/zorba/locale.h 2012-04-24 12:39:38 +0000
+++ include/zorba/locale.h 2012-04-24 20:57:30 +0000
@@ -22,24 +22,198 @@
///////////////////////////////////////////////////////////////////////////
+ /**
+ * Defines constants for all ISO 639-1 language codes.
+ */
namespace iso639_1 {
enum type {
unknown,
- da, // Danish
- de, // German
- en, // English
- es, // Spanish
- fi, // Finnish
- fr, // French
- hu, // Hungarian
- it, // Italian
- nl, // Dutch
- no, // Norwegian
- pt, // Portuguese
- ro, // Romanian
- ru, // Russian
- sv, // Swedish
- tr, // Turkish
+ aa, ///< Afar
+ ab, ///< Abkhazian
+ ae, ///< Avestan
+ af, ///< Afrikaans
+ ak, ///< Akan
+ am, ///< Amharic
+ an, ///< Aragonese
+ ar, ///< Arabic
+ as, ///< Assamese
+ av, ///< Avaric
+ ay, ///< Aymara
+ az, ///< Azerbaijani
+ ba, ///< Bashkir
+ be, ///< Byelorussian
+ bg, ///< Bulgarian
+ bh, ///< Bihari
+ bi, ///< Bislama
+ bm, ///< Bambara
+ bn, ///< Bengali; Bangla
+ bo, ///< Tibetan
+ br, ///< Breton
+ bs, ///< Bosnian
+ ca, ///< Catalan
+ ce, ///< Chechen
+ ch, ///< Chamorro
+ co, ///< Corsican
+ cr, ///< Cree
+ cs, ///< Czech
+ cu, ///< Church Slavic; Church Slavonic
+ cv, ///< Chuvash
+ cy, ///< Welsh
+ da, ///< Danish
+ de, ///< German
+ dv, ///< Divehi
+ dz, ///< Bhutani
+ ee, ///< Ewe
+ el, ///< Greek
+ en, ///< English
+ eo, ///< Esperanto
+ es, ///< Spanish
+ et, ///< Estonian
+ eu, ///< Basque
+ fa, ///< Persian
+ ff, ///< Fulah
+ fi, ///< Finnish
+ fj, ///< Fiji
+ fo, ///< Faroese
+ fr, ///< French
+ fy, ///< Frisian
+ ga, ///< Irish
+ gd, ///< Scots Gaelic
+ gl, ///< Galician
+ gn, ///< Guarani
+ gu, ///< Gujarati
+ gv, ///< Manx
+ ha, ///< Hausa
+ he, ///< Hebrew (formerly iw)
+ hi, ///< Hindi
+ ho, ///< Hiri Motu
+ hr, ///< Croatian
+ ht, ///< Haitian Creole
+ hu, ///< Hungarian
+ hy, ///< Armenian
+ hz, ///< Herero
+ ia, ///< Interlingua
+ id, ///< Indonesian (formerly in)
+ ie, ///< Interlingue
+ ig, ///< Igbo
+ ii, ///< Nuosu
+ ik, ///< Inupiak
+ io, ///< Ido
+ is, ///< Icelandic
+ it, ///< Italian
+ iu, ///< Inuktitut
+ ja, ///< Japanese
+ jv, ///< Javanese
+ ka, ///< Georgian
+ kg, ///< Kongo
+ ki, ///< Gikuyu
+ kj, ///< Kuanyama
+ kk, ///< Kazakh
+ kl, ///< Greenlandic
+ km, ///< Cambodian
+ kn, ///< Kannada
+ ko, ///< Korean
+ kr, ///< Kanuri
+ ks, ///< Kashmiri
+ ku, ///< Kurdish
+ kv, ///< Komi
+ kw, ///< Cornish
+ ky, ///< Kirghiz
+ la, ///< Latin
+ lb, ///< Letzeburgesch
+ lg, ///< Ganda
+ li, ///< Limburgan; Limburger; Limburgish
+ ln, ///< Lingala
+ lo, ///< Laothian
+ lt, ///< Lithuanian
+ lu, ///< Luba-Katanga
+ lv, ///< Latvian
+ mg, ///< Malagasy
+ mh, ///< Marshallese
+ mi, ///< Maori
+ mk, ///< Macedonian
+ ml, ///< Malayalam
+ mn, ///< Mongolian
+ mo, ///< Moldavian
+ mr, ///< Marathi
+ ms, ///< Malay
+ mt, ///< Maltese
+ my, ///< Burmese
+ na, ///< Nauru
+ nb, ///< Norwegian Bokmal
+ nd, ///< Ndebele, North
+ ne, ///< Nepali
+ ng, ///< Ndonga
+ nl, ///< Dutch
+ nn, ///< Norwegian Nynorsk
+ no, ///< Norwegian
+ nr, ///< Ndebele, South
+ nv, ///< Navajo; Navaho
+ ny, ///< Chichewa; Chewa; Nyanja
+ oc, ///< Occitan
+ oj, ///< Ojibwa
+ om, ///< Oromo
+ or_, ///< Oriya
+ os, ///< Ossetian; Ossetic
+ pa, ///< Panjabi; Punjabi
+ pi, ///< Pali
+ pl, ///< Polish
+ ps, ///< Pashto, Pushto
+ pt, ///< Portuguese
+ qu, ///< Quechua
+ rm, ///< Romansh
+ rn, ///< Kirundi
+ ro, ///< Romanian
+ ru, ///< Russian
+ rw, ///< Kinyarwanda
+ sa, ///< Sanskrit
+ sc, ///< Sardinian
+ sd, ///< Sindhi
+ se, ///< Northern Sami
+ sg, ///< Sangho
+ sh, ///< Serbo-Croatian
+ si, ///< Sinhalese
+ sk, ///< Slovak
+ sl, ///< Slovenian
+ sm, ///< Samoan
+ sn, ///< Shona
+ so, ///< Somali
+ sq, ///< Albanian
+ sr, ///< Serbian
+ ss, ///< Siswati
+ st, ///< Sesotho
+ su, ///< Sundanese
+ sv, ///< Swedish
+ sw, ///< Swahili
+ ta, ///< Tamil
+ te, ///< Telugu
+ tg, ///< Tajik
+ th, ///< Thai
+ ti, ///< Tigrinya
+ tk, ///< Turkmen
+ tl, ///< Tagalog
+ tn, ///< Setswana
+ to, ///< Tonga
+ tr, ///< Turkish
+ ts, ///< Tsonga
+ tt, ///< Tatar
+ tw, ///< Twi
+ ty, ///< Tahitian
+ ug, ///< Uighur
+ uk, ///< Ukrainian
+ ur, ///< Urdu
+ uz, ///< Uzbek
+ ve, ///< Venda
+ vi, ///< Vietnamese
+ vo, ///< Volapuk
+ wa, ///< Walloon
+ wo, ///< Wolof
+ xh, ///< Xhosa
+ yi, ///< Yiddish
+ yo, ///< Yoruba
+ za, ///< Zhuang
+ zh, ///< Chinese
+ zu, ///< Zulu
NUM_ENTRIES
};
}
=== modified file 'include/zorba/pregenerated/diagnostic_list.h'
--- include/zorba/pregenerated/diagnostic_list.h 2012-04-24 12:39:38 +0000
+++ include/zorba/pregenerated/diagnostic_list.h 2012-04-24 20:57:30 +0000
@@ -454,6 +454,14 @@
extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP8402_THESAURUS_ENDIANNESS_MISMATCH;
extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP8403_THESAURUS_DATA_ERROR;
+
+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP8404_STEM_LANG_NOT_SUPPORTED;
+
+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP8405_STOP_WORDS_LANG_NOT_SUPPORTED;
+
+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP8406_THESAURUS_LANG_NOT_SUPPORTED;
+
+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP8407_TOKENIZER_LANG_NOT_SUPPORTED;
#endif
extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQD0001_PREFIX_NOT_DECLARED;
=== modified file 'include/zorba/stemmer.h'
--- include/zorba/stemmer.h 2012-04-24 12:39:38 +0000
+++ include/zorba/stemmer.h 2012-04-24 20:57:30 +0000
@@ -52,6 +52,23 @@
virtual void destroy() const = 0;
/**
+ * Various properties of this %Stemmer.
+ */
+ struct Properties {
+ /**
+ * The URI that uniquely identifies this %Stemmer.
+ */
+ char const *uri;
+ };
+
+ /**
+ * Gets the Properties of this %Stemmer.
+ *
+ * @param result The Properties to populate.
+ */
+ virtual void properties( Properties *result ) const = 0;
+
+ /**
* Stems the given word.
*
* @param word The word to stem.
@@ -66,7 +83,7 @@
};
/**
- * A %StemmerProvider, given an language, provies a stemmer for it.
+ * A %StemmerProvider, given a language, provides a Stemmer for it.
*/
class ZORBA_DLL_PUBLIC StemmerProvider {
public:
@@ -76,10 +93,12 @@
* Gets a Stemmer for the given language.
*
* @param lang The language to get a Stemmer for.
- * @return The relevant Stemmer or \c NULL if no stemmer for the given
- * language is available.
+ * @param s If not \c null, set to point to a Stemmer for \a lang.
+ * @return Returns \c true only if this provider can provide a stemmer for
+ * \a lang.
*/
- virtual Stemmer::ptr getStemmer( locale::iso639_1::type lang ) const = 0;
+ virtual bool getStemmer( locale::iso639_1::type lang,
+ Stemmer::ptr *s = 0 ) const = 0;
};
///////////////////////////////////////////////////////////////////////////////
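The reworked StemmerProvider::getStemmer() contract above separates two questions: whether a language is supported, and getting a stemmer for it. The bool return reports support, and the optional out-parameter is filled only when the caller passes one. That makes a cheap support check possible, of the kind the new ft:is-stem-lang-supported() function exposes. Below is a minimal, self-contained C++ sketch of that pattern under stated assumptions. Lang, Stemmer, ToyEnglishStemmer, ToyStemmerProvider, and stem_or_empty() are hypothetical stand-ins, not Zorba's classes, and the "stemming" is a toy that strips one trailing "s".

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins: the real zorba::locale::iso639_1::type and
// zorba::Stemmer differ (e.g. Stemmer::ptr uses a destroy()-calling deleter).
enum class Lang { en, fr, zh };

struct Stemmer {
  typedef std::unique_ptr<Stemmer const> ptr;
  virtual ~Stemmer() = default;
  virtual void stem( std::string const &word, std::string *result ) const = 0;
};

// Toy "stemmer" that strips one trailing 's' (illustration only).
struct ToyEnglishStemmer : Stemmer {
  void stem( std::string const &word, std::string *result ) const override {
    if ( word.size() > 1 && word.back() == 's' )
      *result = word.substr( 0, word.size() - 1 );
    else
      *result = word;
  }
};

// Follows the new StemmerProvider contract: the return value says whether a
// stemmer exists for lang; *s is set only if s is non-null.
struct ToyStemmerProvider {
  bool getStemmer( Lang lang, Stemmer::ptr *s = nullptr ) const {
    if ( lang != Lang::en )
      return false;                       // unsupported language
    if ( s )
      s->reset( new ToyEnglishStemmer );  // caller asked for an instance
    return true;
  }
};

// Usage: returns the stem of word, or "" if lang is unsupported.
std::string stem_or_empty( ToyStemmerProvider const &p, Lang lang,
                           std::string const &word ) {
  Stemmer::ptr s;
  if ( !p.getStemmer( lang, &s ) )
    return std::string();
  std::string result;
  s->stem( word, &result );
  return result;
}
```

Calling getStemmer( lang ) with no second argument answers the support question without allocating anything. This is the main practical difference from the old API, which had to construct and return a stemmer or NULL.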
=== modified file 'include/zorba/thesaurus.h'
--- include/zorba/thesaurus.h 2012-04-24 12:39:38 +0000
+++ include/zorba/thesaurus.h 2012-04-24 20:57:30 +0000
@@ -32,25 +32,13 @@
///////////////////////////////////////////////////////////////////////////////
/**
- * Contains additional data for URIMappers and URLResolvers
- * when mapping/resolving a Thesaurus URI.
- */
-class ZORBA_DLL_PUBLIC ThesaurusEntityData : public EntityData {
-public:
- /**
- * Gets the language for which a thesaurus is being requested.
- *
- * @return said language.
- */
- virtual locale::iso639_1::type getLanguage() const = 0;
-};
-
-/**
- * A %Thesaurus is-a Resource for thesaurus implementations.
- */
-class ZORBA_DLL_PUBLIC Thesaurus : public Resource {
-public:
- typedef std::unique_ptr<Thesaurus,internal::ztd::destroy_delete<Thesaurus> >
+ * A %Thesaurus provides a way to look up related phrases for a given phrase.
+ */
+class ZORBA_DLL_PUBLIC Thesaurus {
+public:
+ typedef std::unique_ptr<
+ Thesaurus const,internal::ztd::destroy_delete<Thesaurus const>
+ >
ptr;
/**
@@ -88,11 +76,11 @@
* Destroys this %Thesaurus.
* This function is called by Zorba when the %Thesaurus is no longer needed.
*
- * If your URLResolver dynamically allocates %Thesaurus objects, then the
+ * If your implementation dynamically allocates %Thesaurus objects, then your
+ * destroy() can simply be (and usually is) <code>delete this</code>.
*
- * If your URLResolver returns a pointer to a static %Thesaurus object, then
- * the implementation should do nothing.
+ * If your implementation returns a pointer to a static %Thesaurus object,
+ * then your destroy() should do nothing.
*/
virtual void destroy() const = 0;
@@ -119,6 +107,32 @@
///////////////////////////////////////////////////////////////////////////////
+/**
+ * A %ThesaurusProvider is-a Resource for providing thesauri for a given
+ * language.
+ */
+class ZORBA_DLL_PUBLIC ThesaurusProvider : public Resource {
+public:
+ typedef std::unique_ptr<
+ ThesaurusProvider const,
+ internal::ztd::destroy_delete<ThesaurusProvider const>
+ >
+ ptr;
+
+ /**
+ * Gets a Thesaurus for the given language.
+ *
+ * @param lang The desired language of the thesaurus.
+ * @param t If not \c null, set to point to a Thesaurus for \a lang.
+ * @return Returns \c true only if this provider can provide a thesaurus for
+ * \a lang.
+ */
+ virtual bool getThesaurus( locale::iso639_1::type lang,
+ Thesaurus::ptr *t = 0 ) const = 0;
+};
+
+///////////////////////////////////////////////////////////////////////////////
+
} // namespace zorba
#endif /* ZORBA_NO_FULL_TEXT */
#endif /* ZORBA_THESAURUS_API_H */
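Thesaurus::ptr and ThesaurusProvider::ptr, like Resource::ptr, are std::unique_ptr instances with the deleter internal::ztd::destroy_delete. Judging by its name and the resolveURL() comment ("delete it (by calling destroy() on it)"), that deleter calls destroy() rather than delete. This is what lets destroy() be either <code>delete this</code> for heap objects or a no-op for static ones, as the documentation above describes. The sketch below re-creates the idea. destroy_delete here is a hypothetical re-implementation, and HeapThesaurus, StaticThesaurus, and the heap_destroyed counter are illustrative only.

```cpp
#include <cassert>
#include <memory>

// Hypothetical re-creation of a destroy()-calling deleter; Zorba's own
// internal::ztd::destroy_delete may be defined differently.
template<typename T>
struct destroy_delete {
  void operator()( T *p ) const {
    if ( p )
      p->destroy();  // let the object decide how (or whether) to free itself
  }
};

struct Thesaurus {
  typedef std::unique_ptr<Thesaurus const, destroy_delete<Thesaurus const> >
    ptr;
  virtual void destroy() const = 0;
protected:
  virtual ~Thesaurus() = default;  // deletion must go through destroy()
};

int heap_destroyed = 0;  // counts heap-allocated thesauri freed

// Dynamically allocated: destroy() is simply "delete this".
struct HeapThesaurus : Thesaurus {
  void destroy() const override { ++heap_destroyed; delete this; }
};

// Statically allocated: destroy() must do nothing.
struct StaticThesaurus : Thesaurus {
  void destroy() const override { }
};

StaticThesaurus const& static_thesaurus() {
  static StaticThesaurus instance;
  return instance;
}
```

Both kinds of object can be handed out through the same ptr type. The protected destructor in this sketch stops callers from bypassing destroy(). Whether Zorba's classes do the same is not shown in this diff.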
=== modified file 'include/zorba/tokenizer.h'
--- include/zorba/tokenizer.h 2012-04-24 12:39:38 +0000
+++ include/zorba/tokenizer.h 2012-04-24 20:57:30 +0000
@@ -18,6 +18,8 @@
#ifndef ZORBA_TOKENIZER_API_H
#define ZORBA_TOKENIZER_API_H
+#include <vector>
+
#include <zorba/config.h>
#include <zorba/locale.h>
#include <zorba/internal/unique_ptr.h>
@@ -67,8 +69,6 @@
* A %Callback is called once per token.
* This is used only internally by Zorba.
* You do not need to derive from this class.
- * The only thing you need to do is call the callback's \c operator() once
- * for each token you parse in \c tokenize().
*/
class Callback {
public:
@@ -77,19 +77,75 @@
virtual ~Callback();
/**
+ * This member-function is called whenever an item that is being tokenized
+ * is entered or exited.
+ *
+ * @param item The item being entered or exited.
+ * @param entering If \c true, the item is being entered; if \c false, the
+ * item is being exited.
+ */
+ virtual void item( Item const &item, bool entering );
+
+ /**
* This member-function is called once per token.
*
* @param utf8_s The UTF-8 token string. It is not null-terminated.
* @param utf8_len The number of bytes in the token string.
+ * @param lang The language of the token.
* @param token_no The token number. Token numbers start at 0.
* @param sent_no The sentence number. Sentence numbers start at 1.
* @param para_no The paragraph number. Paragraph numbers start at 1.
- * @param payload Optional user-defined data.
- */
- virtual void operator()( char const *utf8_s, size_type utf8_len,
- size_type token_no, size_type sent_no,
- size_type para_no, void *payload = 0 ) = 0;
- };
+ * @param item The Item this token is from, if any.
+ */
+ virtual void token( char const *utf8_s, size_type utf8_len,
+ locale::iso639_1::type lang,
+ size_type token_no, size_type sent_no,
+ size_type para_no, Item const *item = 0 ) = 0;
+ };
+
+ /////////////////////////////////////////////////////////////////////////////
+
+ /**
+ * Various properties of this %Tokenizer.
+ */
+ struct Properties {
+ typedef std::vector<locale::iso639_1::type> languages_type;
+
+ /**
+ * If \c true, XML comments separate tokens. For example,
+ * \c net<!---->work would be 2 tokens instead of 1.
+ */
+ bool comments_separate_tokens;
+
+ /**
+ * If \c true, XML elements separate tokens. For example,
+ * \c <b>B</b>old would be 2 tokens instead of 1.
+ */
+ bool elements_separate_tokens;
+
+ /**
+ * If \c true, XML processing instructions separate tokens. For example,
+ * <code>net<?PI pi?>work</code> would be 2 tokens instead of 1.
+ */
+ bool processing_instructions_separate_tokens;
+
+ /**
+ * The set of languages supported.
+ */
+ languages_type languages;
+
+ /**
+ * The URI that uniquely identifies this %Tokenizer.
+ */
+ char const *uri;
+ };
+
+ /**
+ * Gets the Properties of this %Tokenizer.
+ *
+ * @param result The Properties to populate.
+ */
+ virtual void properties( Properties *result ) const = 0;
/////////////////////////////////////////////////////////////////////////////
@@ -106,39 +162,6 @@
virtual void destroy() const = 0;
/**
- * Trace options for XML elements combined via bitwise-or.
- */
- enum ElementTraceOptions {
- trace_none = 0x0, ///< Trace no elements.
- trace_begin = 0x1, ///< Trace the beginning of elements.
- trace_end = 0x2 ///< Trace the ending of elements.
- };
-
- /**
- * Gets the trace options. If the value is \c trace_none, then the paragraph
- * number will be incremented upon entering an XML element; if the value is
- * anything other than \c trace_none, then the tokenizer assumes
- * responsibility for incrementing the paragraph number.
- *
- * @return Returns said options.
- */
- int trace_options() const {
- return trace_options_;
- }
-
- /**
- * This function is called whenever an XML element is entered during
- * tokenization. Note that this function is called only if \c
- * trace_options() returns non-zero.
- *
- * @param qname The element's QName.
- * @param trace_options The bitwise-or of the trace option(s) in effect for a
- * particular call.
- * @see trace_options()
- */
- virtual void element( Item const &qname, int trace_options );
-
- /**
* Gets this %Tokenizer's associated Numbers.
*
* @return Returns said Numbers.
@@ -153,6 +176,16 @@
Numbers const& numbers() const;
/**
+ * Tokenizes the given node.
+ *
+ * @param node The node to tokenize.
+ * @param lang The default language to use.
+ * @param callback The Callback to call once per token.
+ */
+ void tokenize_node( Item const &node, locale::iso639_1::type lang,
+ Callback &callback );
+
+ /**
* Tokenizes the given string.
*
* @param utf8_s The UTF-8 string to tokenize. It need not be
@@ -162,11 +195,11 @@
* @param wildcards If \c true, allows XQuery wildcard syntax characters to
* be part of tokens.
* @param callback The Callback to call once per token.
- * @param payload Optional user-defined data.
+ * @param item The Item this string is from, if any.
*/
- virtual void tokenize( char const *utf8_s, size_type utf8_len,
- locale::iso639_1::type lang, bool wildcards,
- Callback &callback, void *payload = 0 ) = 0;
+ virtual void tokenize_string( char const *utf8_s, size_type utf8_len,
+ locale::iso639_1::type lang, bool wildcards,
+ Callback &callback, Item const *item = 0 ) = 0;
/////////////////////////////////////////////////////////////////////////////
@@ -175,27 +208,71 @@
* Constructs a %Tokenizer.
*
* @param numbers the Numbers to use.
- * @param trace_options The bitwise-or of the available trace options, if
- * any.
*/
- Tokenizer( Numbers &numbers, int trace_options = trace_none );
+ Tokenizer( Numbers &numbers );
/**
* Destroys a %Tokenizer.
*/
virtual ~Tokenizer() = 0;
+ /**
+ * Given an element, finds its \c xml:lang attribute, if any, and gets its
+ * value.
+ *
+ * @param element The element to check.
+ * @param lang A pointer to where to put the found language, if any.
+ * @return Returns \c true only if an \c xml:lang attribute is found and the
+ * value is a known language.
+ */
+ bool find_lang_attribute( Item const &element, locale::iso639_1::type *lang );
+
+ /**
+ * This member-function is called whenever an item that is being tokenized is
+ * entered or exited.
+ *
+ * @param item The item being entered or exited.
+ * @param entering If \c true, the item is being entered; if \c false, the
+ * item is being exited.
+ */
+ virtual void item( Item const &item, bool entering );
+
+ /**
+ * Tokenizes the given node and all of its child nodes, if any. For each
+ * node, this function must call the item() member function of both this
+ * %Tokenizer and the Callback twice: once upon entering the node and once
+ * upon exiting it.
+ *
+ * @param node The node to tokenize.
+ * @param lang The default language to use.
+ * @param callback The Callback to call per token.
+ * @param tokenize_acp If \c true, additionally tokenize all attribute,
+ * comment, and processing-instruction nodes encountered;
+ * if \c false, skip them.
+ */
+ virtual void tokenize_node_impl( Item const &node,
+ locale::iso639_1::type lang,
+ Callback &callback, bool tokenize_acp );
+
private:
- int trace_options_;
- Numbers *no_;
+ Numbers *numbers_;
};
+inline Tokenizer::Tokenizer( Numbers &numbers ) : numbers_( &numbers ) {
+}
+
inline Tokenizer::Numbers& Tokenizer::numbers() {
- return *no_;
+ return *numbers_;
}
inline Tokenizer::Numbers const& Tokenizer::numbers() const {
- return *no_;
+ return *numbers_;
+}
+
+inline void Tokenizer::tokenize_node( Item const &node,
+ locale::iso639_1::type lang,
+ Callback &callback ) {
+ tokenize_node_impl( node, lang, callback, true );
}
///////////////////////////////////////////////////////////////////////////////
@@ -211,11 +288,14 @@
* Creates a new %Tokenizer.
*
* @param lang The language of the text that the tokenizer will tokenize.
- * @param numbers The Numbers to use.
- * @return Returns said %Tokenizer.
+ * @param numbers The Numbers to use. If \c null, \a t is not set.
+ * @param t If not \c null, set to point to a Tokenizer for \a lang.
+ * @return Returns \c true only if this provider can provide a tokenizer for
+ * \a lang.
*/
- virtual Tokenizer::ptr getTokenizer( locale::iso639_1::type lang,
- Tokenizer::Numbers &numbers ) const = 0;
+ virtual bool getTokenizer( locale::iso639_1::type lang,
+ Tokenizer::Numbers *numbers = 0,
+ Tokenizer::ptr *t = 0 ) const = 0;
};
///////////////////////////////////////////////////////////////////////////////
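The new Tokenizer interface replaces the trace-options mechanism with item() notifications. tokenize_node() forwards to tokenize_node_impl(), which must call item() on both the Tokenizer and the Callback once on entering and once on exiting each node. token() receives each token together with its language. The sketch below models that traversal contract on a toy tree. Node, Callback, ToyTokenizer, and RecordingCallback are hypothetical simplifications, not Zorba's implementation: languages are plain strings, only token numbers are tracked (no sentence or paragraph numbers), and tokens are split on whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for an XML node (not zorba::Item).
struct Node {
  enum Kind { element, text, comment } kind;
  std::string value;            // text or comment content
  std::string lang;             // xml:lang of an element; "" if absent
  std::vector<Node> children;
};

// Same shape as Tokenizer::Callback: item() on enter/exit, token() per token.
struct Callback {
  virtual ~Callback() = default;
  virtual void item( Node const&, bool /*entering*/ ) { }
  virtual void token( std::string const &tok, std::string const &lang,
                      std::size_t token_no ) = 0;
};

// Toy whitespace tokenizer illustrating the tokenize_node_impl() contract.
class ToyTokenizer {
public:
  std::size_t items_seen = 0;   // number of this tokenizer's own item() calls

  void tokenize_node( Node const &node, std::string const &lang,
                      Callback &cb ) {
    tokenize_node_impl( node, lang, cb, true );
  }

private:
  std::size_t token_no_ = 0;    // token numbers start at 0

  void item( Node const&, bool ) { ++items_seen; }

  void emit_tokens( std::string const &s, std::string const &lang,
                    Callback &cb ) {
    std::istringstream in( s );
    std::string tok;
    while ( in >> tok )
      cb.token( tok, lang, token_no_++ );
  }

  void tokenize_node_impl( Node const &node, std::string lang, Callback &cb,
                           bool tokenize_acp ) {
    item( node, true );         // required: notify tokenizer and callback
    cb.item( node, true );      // on entry...
    switch ( node.kind ) {
      case Node::element:
        if ( !node.lang.empty() )
          lang = node.lang;     // an xml:lang-like value overrides the default
        for ( Node const &child : node.children )
          tokenize_node_impl( child, lang, cb, tokenize_acp );
        break;
      case Node::text:
        emit_tokens( node.value, lang, cb );
        break;
      case Node::comment:
        if ( tokenize_acp )     // comments are tokenized only if requested
          emit_tokens( node.value, lang, cb );
        break;
    }
    item( node, false );        // ...and again on exit
    cb.item( node, false );
  }
};

// Example Callback that records what it is given.
struct RecordingCallback : Callback {
  std::vector<std::string> tokens, langs;
  int enters = 0, exits = 0;
  void item( Node const&, bool entering ) override {
    if ( entering ) ++enters; else ++exits;
  }
  void token( std::string const &tok, std::string const &lang,
              std::size_t ) override {
    tokens.push_back( tok );
    langs.push_back( lang );
  }
};
```

For an element with xml:lang "fr" that contains one text node and one comment, tokenizing with a default language of "en" yields tokens tagged "fr". Each of the three nodes produces one entering and one exiting item() call on both the tokenizer and the callback. Passing tokenize_acp = false would skip the comment's tokens but still report the comment node through item().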
=== modified file 'include/zorba/uri_resolvers.h'
--- include/zorba/uri_resolvers.h 2012-04-24 12:39:38 +0000
+++ include/zorba/uri_resolvers.h 2012-04-24 20:57:30 +0000
@@ -50,7 +50,8 @@
class ZORBA_DLL_PUBLIC Resource
{
public:
- typedef std::unique_ptr<Resource,internal::ztd::destroy_delete<Resource> > ptr;
+ typedef std::unique_ptr<Resource,internal::ztd::destroy_delete<Resource> >
+ ptr;
virtual ~Resource() = 0;
@@ -172,8 +173,8 @@
* object itself will be discarded.
*
* In any case, if they create a Resource, Zorba will take memory
- * ownership of the Resource and delete it when it is no longer
- * needed.
+ * ownership of the Resource and delete it (by calling destroy() on it)
+ * when it is no longer needed.
*/
virtual Resource* resolveURL(const zorba::String& aUrl,
EntityData const* aEntityData) = 0;
=== modified file 'modules/com/zorba-xquery/www/modules/CMakeLists.txt'
--- modules/com/zorba-xquery/www/modules/CMakeLists.txt 2012-04-24 12:39:38 +0000
+++ modules/com/zorba-xquery/www/modules/CMakeLists.txt 2012-04-24 20:57:30 +0000
@@ -72,6 +72,13 @@
DECLARE_ZORBA_MODULE(FILE xqdoc.xq VERSION 2.0
URI "http://www.zorba-xquery.com/modules/xqdoc")
+IF(NOT ZORBA_NO_FULL_TEXT)
+ DECLARE_ZORBA_MODULE(FILE full-text.xq VERSION 2.0
+ URI "http://www.zorba-xquery.com/modules/full-text")
+ DECLARE_ZORBA_SCHEMA(FILE full-text.xsd
+ URI "http://www.zorba-xquery.com/modules/full-text")
+ENDIF(NOT ZORBA_NO_FULL_TEXT)
+
# Subdirectories
DECLARE_ZORBA_MODULE(FILE converters/base64.xq VERSION 2.0
URI "http://www.zorba-xquery.com/modules/converters/base64")
=== added file 'modules/com/zorba-xquery/www/modules/full-text.xq'
--- modules/com/zorba-xquery/www/modules/full-text.xq 1970-01-01 00:00:00 +0000
+++ modules/com/zorba-xquery/www/modules/full-text.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,872 @@
+xquery version "3.0";
+
+(:
+ : Copyright 2006-2011 The FLWOR Foundation.
+ :
+ : Licensed under the Apache License, Version 2.0 (the "License");
+ : you may not use this file except in compliance with the License.
+ : You may obtain a copy of the License at
+ :
+ : http://www.apache.org/licenses/LICENSE-2.0
+ :
+ : Unless required by applicable law or agreed to in writing, software
+ : distributed under the License is distributed on an "AS IS" BASIS,
+ : WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ : See the License for the specific language governing permissions and
+ : limitations under the License.
+ :)
+
+(:===========================================================================:)
+
+(:~
+ : This module provides an XQuery API to full-text functions.
+ : For general information about Zorba's implementation of the
+ : <a href="http://www.w3.org/TR/xpath-full-text-10/">XQuery and XPath Full Text 1.0 specification</a>
+ : as well as instructions for building and installing a thesaurus,
+ : see the <a href="http://www.zorba-xquery.com/html/documentation/latest/zorba/ft_thesaurus">Full Text Thesaurus documentation</a>.
+ : <h2>Notes on languages</h2>
+ : To refer to particular human languages,
+ : Zorba uses both the
+ : <a href="http://en.wikipedia.org/wiki/ISO_639-1">ISO 639-1</a>
+ : and
+ : <a href="http://en.wikipedia.org/wiki/ISO_639-2">ISO 639-2</a>
+ : language codes.
+ : Note that Zorba supports only a subset of the
+ : <a href="http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes">complete list of language codes</a>
+ : and not every function supports the same subset.
+ : <p/>
+ : Most functions in this module take a language as a parameter
+ : using the
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>
+ : XML schema data type.
+ : <h2>Notes on stemming</h2>
+ : The <code>stem()</code> functions return the
+ : <a href="http://en.wikipedia.org/wiki/Word_stem">stem</a>
+ : of a word.
+ : In Zorba,
+ : however, the stem of a word is not guaranteed to be a word itself.
+ : It is best to treat a stem as an opaque byte sequence.
+ : All that is guaranteed about a stem is that,
+ : for a given word,
+ : the stem of that word will always be the same byte sequence.
+ : Hence,
+ : you should never compare the result of one of the <code>stem()</code>
+ : functions against a non-stemmed string,
+ : for example:
+ : <pre>
+ : if ( ft:stem( "apples" ) eq "apple" ) ** WRONG **
+ : </pre>
+ : Instead do:
+ : <pre>
+ : if ( ft:stem( "apples" ) eq ft:stem( "apple" ) ) ** CORRECT **
+ : </pre>
+ : <h2>Notes on the thesaurus</h2>
+ : The <code>thesaurus-lookup()</code> functions have "levels"
+ : and "relationship" parameters.
+ : The values for these are implementation-defined.
+ : Zorba's default implementation uses the
+ : <a href="http://wordnet.princeton.edu/">WordNet lexical database</a>,
+ : version 3.0.
+ : <p/>
+ : In WordNet,
+ : the number of "levels" between two phrases
+ : is the number of hierarchical meanings that separate them.
+ : For example,
+ : "canary" is 5 levels away from "vertebrate"
+ : (canary > finch > oscine > passerine > bird > vertebrate).
+ : <p/>
+ : When using the WordNet implementation,
+ : Zorba supports all of the relationships (and their abbreviations)
+ : specified by
+ : <a href="http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=7776">ISO 2788</a>
+ : and
+ : <a href="http://www.niso.org/kst/reports/standards?step=2&gid=&project_key=7cc9b583cb5a62e8c15d3099e0bb46bbae9cf38a">ANSI/NISO Z39.19-2005</a>
+ : with the exceptions of "HN" (history note)
+ : and "X SN" (see scope note for).
+ : These relationships are:
+ : <table>
+ : <tr>
+ : <th>Rel.</th>
+ : <th>Meaning</th>
+ : <th>WordNet Rel.</th>
+ : </tr>
+ : <tr>
+ : <td>BT</td>
+ : <td>broader term</td>
+ : <td>hypernym</td>
+ : </tr>
+ : <tr>
+ : <td>BTG</td>
+ : <td>broader term generic</td>
+ : <td>hypernym</td>
+ : </tr>
+ : <tr>
+ : <td>BTI</td>
+ : <td>broader term instance</td>
+ : <td>instance hypernym</td>
+ : </tr>
+ : <tr>
+ : <td>BTP</td>
+ : <td>broader term partitive</td>
+ : <td>part meronym</td>
+ : </tr>
+ : <tr>
+ : <td>NT</td>
+ : <td>narrower term</td>
+ : <td>hyponym</td>
+ : </tr>
+ : <tr>
+ : <td>NTG</td>
+ : <td>narrower term generic</td>
+ : <td>hyponym</td>
+ : </tr>
+ : <tr>
+ : <td>NTI</td>
+ : <td>narrower term instance</td>
+ : <td>instance hyponym</td>
+ : </tr>
+ : <tr>
+ : <td>NTP</td>
+ : <td>narrower term partitive</td>
+ : <td>part holonym</td>
+ : </tr>
+ : <tr>
+ : <td>RT</td>
+ : <td>related term</td>
+ : <td>also see</td>
+ : </tr>
+ : <tr>
+ : <td>SN</td>
+ : <td>scope note</td>
+ : <td>n/a</td>
+ : </tr>
+ : <tr>
+ : <td>TT</td>
+ : <td>top term</td>
+ : <td>hypernym</td>
+ : </tr>
+ : <tr>
+ : <td>UF</td>
+ : <td>non-preferred term</td>
+ : <td>n/a</td>
+ : </tr>
+ : <tr>
+ : <td>USE</td>
+ : <td>preferred term</td>
+ : <td>n/a</td>
+ : </tr>
+ : </table>
+ : Note that you can specify relationships
+ : either by their abbreviation
+ : or their meaning.
+ : Relationships are case-insensitive.
+ : <p/>
+ : In addition to the
+ : <a href="http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=7776">ISO 2788</a>
+ : and
+ : <a href="http://www.niso.org/kst/reports/standards?step=2&gid=&project_key=7cc9b583cb5a62e8c15d3099e0bb46bbae9cf38a">ANSI/NISO Z39.19-2005</a>
+ : relationships,
+ : Zorba also supports all of the relationships offered by WordNet.
+ : These relationships are:
+ : <table class="ft_rels">
+ : <tr>
+ : <th>Relationship</th>
+ : <th>Meaning</th>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">also see</td>
+ : <td>
+ : A word that is related to another,
+ : e.g., for "varnished" (furniture)
+ : one should <em>also see</em> "finished."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td>antonym</td>
+ : <td>
+ : A word opposite in meaning to another,
+ : e.g., "light" is an <em>antonym</em> for "heavy."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td>attribute</td>
+ : <td>
+ : A noun for which adjectives express values,
+ : e.g., "weight" is an <em>attribute</em>
+ : for which the adjectives "light" and "heavy"
+ : express values.
+ : </td>
+ : </tr>
+ : <tr>
+ : <td>cause</td>
+ : <td>
+ : A verb that causes another,
+ : e.g., "show" is a <em>cause</em> of "see."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">derivationally related form</td>
+ : <td>
+ : A word that is derived from a root word,
+ : e.g., "metric" is a <em>derivationally related form</em> of "meter."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">derived from adjective</td>
+ : <td>
+ : An adverb that is derived from an adjective,
+ : e.g., "correctly" is <em>derived from the adjective</em> "correct."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td>entailment</td>
+ : <td>
+ : A verb that presupposes another,
+ : e.g., "snoring" <em>entails</em> "sleeping."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td>hypernym</td>
+ : <td>
+ : A word with a broad meaning that more specific words fall under,
+ : e.g., "meal" is a <em>hypernym</em> of "breakfast."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td>hyponym</td>
+ : <td>
+ : A word of more specific meaning than a general term applicable to it,
+ : e.g., "breakfast" is a <em>hyponym</em> of "meal."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">instance hypernym</td>
+ : <td>
+ : A word that denotes a category of some specific instance,
+ : e.g., "author" is an <em>instance hypernym</em> of "Asimov."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">instance hyponym</td>
+ : <td>
+ : A term that denotes a specific instance of some general category,
+ : e.g., "Asimov" is an <em>instance hyponym</em> of "author."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">member holonym</td>
+ : <td>
+ : A word that denotes a collection of individuals,
+ : e.g., "faculty" is a <em>member holonym</em> of "professor."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">member meronym</td>
+ : <td>
+ : A word that denotes a member of a larger group,
+ : e.g., a "person" is a <em>member meronym</em> of a "crowd."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">part holonym</td>
+ : <td>
+ : A word that denotes a larger whole comprised of some part,
+ : e.g., "car" is a <em>part holonym</em> of "engine."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">part meronym</td>
+ : <td>
+ : A word that denotes a part of a larger whole,
+ : e.g., an "engine" is a <em>part meronym</em> of a "car."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">participle of verb</td>
+ : <td>
+ : An adjective that is the participle of some verb,
+ : e.g., "breaking" is the <em>participle of the verb</em> "break."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td>pertainym</td>
+ : <td>
+ : An adjective that classifies its noun,
+ : e.g., "musical" is a <em>pertainym</em> in "musical instrument."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">similar to</td>
+ : <td>
+ : Similar, though not necessarily interchangeable, adjectives.
+ : For example, "shiny" is <em>similar to</em> "bright",
+ : but they have subtle differences.
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">substance holonym</td>
+ : <td>
+ : A word that denotes a larger whole containing some constituent
+ : substance, e.g., "bread" is a <em>substance holonym</em> of "flour."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">substance meronym</td>
+ : <td>
+ : A word that denotes a constituent substance of some larger whole,
+ : e.g., "flour" is a <em>substance meronym</em> of "bread."
+ : </td>
+ : </tr>
+ : <tr>
+ : <td nowrap="nowrap">verb group</td>
+ : <td>
+ : A verb that is a member of a group of similar verbs,
+ : e.g., "live" is in the <em>verb group</em>
+ : of "dwell", "live", "inhabit", etc.
+ : </td>
+ : </tr>
+ : </table>
+ : <h2>Notes on tokenization</h2>
+ : For general information about Zorba's implementation of tokenization,
+ : including what constitutes a token,
+ : see the <a href="http://www.zorba-xquery.com/html/documentation/latest/zorba/ft_tokenizer">Full Text Tokenizer</a> documentation.
+ :)
+
+(:===========================================================================:)
+
+module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+import schema namespace ft-schema =
+ "http://www.zorba-xquery.com/modules/full-text";
+
+declare namespace err = "http://www.w3.org/2005/xqt-errors";
+declare namespace zerr = "http://www.zorba-xquery.com/errors";
+
+declare namespace ver = "http://www.zorba-xquery.com/options/versioning";
+declare option ver:module-version "2.0";
+
+(:===========================================================================:)
+
+(:~
+ : Predeclared constant for the Danish
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-da as xs:language := xs:language("da");
+
+(:~
+ : Predeclared constant for the German
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-de as xs:language := xs:language("de");
+
+(:~
+ : Predeclared constant for the English
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-en as xs:language := xs:language("en");
+
+(:~
+ : Predeclared constant for the Spanish
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-es as xs:language := xs:language("es");
+
+(:~
+ : Predeclared constant for the Finnish
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-fi as xs:language := xs:language("fi");
+
+(:~
+ : Predeclared constant for the French
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-fr as xs:language := xs:language("fr");
+
+(:~
+ : Predeclared constant for the Hungarian
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-hu as xs:language := xs:language("hu");
+
+(:~
+ : Predeclared constant for the Italian
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-it as xs:language := xs:language("it");
+
+(:~
+ : Predeclared constant for the Dutch
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-nl as xs:language := xs:language("nl");
+
+(:~
+ : Predeclared constant for the Norwegian
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-no as xs:language := xs:language("no");
+
+(:~
+ : Predeclared constant for the Portuguese
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-pt as xs:language := xs:language("pt");
+
+(:~
+ : Predeclared constant for the Romanian
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-ro as xs:language := xs:language("ro");
+
+(:~
+ : Predeclared constant for the Russian
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-ru as xs:language := xs:language("ru");
+
+(:~
+ : Predeclared constant for the Swedish
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-sv as xs:language := xs:language("sv");
+
+(:~
+ : Predeclared constant for the Turkish
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-tr as xs:language := xs:language("tr");
+
+(:===========================================================================:)
+
+(:~
+ : Gets the current
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>:
+ : either the language specified by the
+ : <code><a href="http://www.w3.org/TR/xpath-full-text-10/#doc-xquery10-FTOptionDecl">declare ft-option using</a>
+ : <a href="http://www.w3.org/TR/xpath-full-text-10/#ftlanguageoption">language</a></code>
+ : statement (if any)
+ : or the one returned by <code>ft:host-lang()</code> (if none).
+ :
+ : @return said language.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-current-lang-true-1.xq
+ :)
+declare function ft:current-lang()
+ as xs:language external;
+
+(:~
+ : Gets the host's current
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>.
+ : The "host" is the computer on which Zorba is running.
+ : The host's current language is obtained as follows:
+ : <ul>
+ : <li>
+ : For *nix systems:
+ : <ol>
+ : <li>
+ : If <a href="http://www.cplusplus.com/reference/clibrary/clocale/setlocale/"><code>setlocale</code>(3)</a> returns non-null,
+ : the language corresponding to that locale is used.
+ : </li>
+ : <li>
+ : Else, if the <code>LANG</code> environment variable is set,
+ : that language is used.
+ : </li>
+ : <li>
+ : Otherwise, there is no default language.
+ : </li>
+ : </ol>
+ : </li>
+ : <li>
+ : For Windows systems,
+ : the language corresponding to the locale returned by the
+ : <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/dd318101(v=vs.85).aspx"><code>GetLocaleInfo()</code></a>
+ : function is used.
+ : </li>
+ : </ul>
+ :
+ : @return said language.
+ :)
+declare function ft:host-lang()
+ as xs:language external;
+
+(:~
+ : Checks whether the given
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is supported for stemming.
+ :
+ : @param $lang The language to check.
+ : @return <code>true</code> only if the language is supported.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xq
+ :)
+declare function ft:is-stem-lang-supported( $lang as xs:language )
+ as xs:boolean external;
+
+(:~
+ : Checks whether the given word is a stop-word.
+ :
+ : @param $word The word to check.
+ : @param $lang The
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$word</code>.
+ : @return <code>true</code> only if <code>$word</code> is a stop-word.
+ : @error zerr:ZXQP8405 if <code>$lang</code> is not supported for stop-words.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-1.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-3.xq
+ :)
+declare function ft:is-stop-word( $word as xs:string, $lang as xs:language )
+ as xs:boolean external;
+
+(:~
+ : Checks whether the given word is a stop-word.
+ :
+ : @param $word The word to check.
+ : The word's <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @return <code>true</code> only if <code>$word</code> is a stop-word.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
+ : general.
+ : @error zerr:ZXQP8405 if <code>ft:current-lang()</code> is not supported for
+ : stop-words specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-2.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-4.xq
+ :)
+declare function ft:is-stop-word( $word as xs:string )
+ as xs:boolean external;
+
+(:~
+ : Checks whether the given
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is supported for stop words.
+ :
+ : @param $lang The language to check.
+ : @return <code>true</code> only if the language is supported.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xq
+ :)
+declare function ft:is-stop-word-lang-supported( $lang as xs:language )
+ as xs:boolean external;
+
+(:~
+ : Checks whether the given
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is supported for look-up using the default thesaurus.
+ :
+ : @param $lang The language to check.
+ : @return <code>true</code> only if the language is supported.
+ :)
+declare function ft:is-thesaurus-lang-supported( $lang as xs:language )
+ as xs:boolean external;
+
+(:~
+ : Checks whether the given
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is supported for look-up using the thesaurus specified by the given URI.
+ :
+ : @param $uri The URI specifying the thesaurus to use.
+ : @param $lang The language to check.
+ : @return <code>true</code> only if the language is supported.
+ : @error err:FTST0018 if <code>$uri</code> refers to a thesaurus
+ : that is not found in the statically known thesauri.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xq
+ :)
+declare function ft:is-thesaurus-lang-supported( $uri as xs:string,
+ $lang as xs:language )
+ as xs:boolean external;
+
+(:~
+ : Checks whether the given
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is supported for tokenization.
+ :
+ : @param $lang The language to check.
+ : @return <code>true</code> only if the language is supported.
+ :)
+declare function ft:is-tokenizer-lang-supported( $lang as xs:language )
+ as xs:boolean external;
+
+(:~
+ : Stems the given word.
+ :
+ : @param $word The word to stem.
+ : @param $lang The
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$word</code>.
+ : @return the stem of <code>$word</code>.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error zerr:ZXQP8404 if <code>$lang</code> is not supported for stemming
+ : specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-stem-1.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-stem-2.xq
+ :)
+declare function ft:stem( $word as xs:string, $lang as xs:language )
+ as xs:string external;
+
+(:~
+ : Stems the given word.
+ :
+ : @param $word The word to stem.
+ : The word's <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @return the stem of <code>$word</code>.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
+ : general.
+ : @error zerr:ZXQP8404 if <code>ft:current-lang()</code> is not supported for
+ : stemming specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-stem-3.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-stem-4.xq
+ :)
+declare function ft:stem( $word as xs:string )
+ as xs:string external;
+
+(:~
+ : Strips all diacritical marks from all characters.
+ :
+ : @param $string The string to strip diacritical marks from.
+ : @return <code>$string</code> with diacritical marks stripped.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-strip-diacritics-1.xq
+ :)
+declare function ft:strip-diacritics( $string as xs:string )
+ as xs:string external;
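To illustrate what ft:strip-diacritics() is documented to do, here is a deliberately small C++ sketch. It is an assumption-laden approximation, not the module's implementation: it handles only the precomposed Latin-1 letters U+00C0 to U+00FF via a hand-written table and drops combining diacritical marks (U+0300 to U+036F) from already-decomposed input. A complete implementation would more plausibly decompose the text (Unicode NFD) and then remove all combining marks. The function name strip_diacritics() here is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical, Latin-1-only sketch of diacritic stripping.
std::u32string strip_diacritics(std::u32string const &in) {
  // Base letters for U+00C0..U+00FF; '.' means "leave unchanged"
  // (e.g. AE ligature, eth, thorn, sharp s, multiplication/division signs,
  // and O/o with stroke, which have no Unicode decomposition).
  static char const latin1[] =
      "AAAAAA.CEEEEIIII"   // U+00C0..U+00CF
      ".NOOOOO..UUUUY.."   // U+00D0..U+00DF
      "aaaaaa.ceeeeiiii"   // U+00E0..U+00EF
      ".nooooo..uuuuy.y";  // U+00F0..U+00FF
  std::u32string out;
  for (char32_t c : in) {
    if (c >= 0x0300 && c <= 0x036F) continue;  // combining diacritical mark
    if (c >= 0xC0 && c <= 0xFF && latin1[c - 0xC0] != '.')
      c = static_cast<char32_t>(latin1[c - 0xC0]);
    out += c;
  }
  return out;
}
```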
+
+(:~
+ : Looks up the given phrase in the default thesaurus.
+ :
+ : @param $phrase The phrase to look up.
+ : The phrase's
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @return the original and related phrases.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
+ : general.
+ : @error zerr:ZXQP8401 if the thesaurus data file's version is not supported
+ : by the currently running version of Zorba.
+ : @error zerr:ZXQP8402 if the thesaurus data file's endianness does not match
+ : that of the CPU on which Zorba is currently running.
+ : @error zerr:ZXQP8403 if there was an error reading the thesaurus data.
+ : @error zerr:ZXQP8406 if <code>ft:current-lang()</code> is not supported for
+ : thesaurus look-up specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-1.xq
+ :)
+declare function ft:thesaurus-lookup( $phrase as xs:string )
+ as xs:string+ external;
+
+(:~
+ : Looks up the given phrase in the thesaurus specified by the given URI.
+ :
+ : @param $uri The URI specifying the thesaurus to use.
+ : @param $phrase The phrase to look up.
+ : @param $lang The
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$phrase</code>.
+ : @return the original and related phrases.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error err:FTST0018 if <code>$uri</code> refers to a thesaurus
+ : that is not found in the statically known thesauri.
+ : @error zerr:ZOSE0001 if the thesaurus data file could not be found.
+ : @error zerr:ZOSE0002 if the thesaurus data file is not a plain file.
+ : @error zerr:ZXQP8401 if the thesaurus data file's version is not supported
+ : by the currently running version of Zorba.
+ : @error zerr:ZXQP8402 if the thesaurus data file's endianness does not match
+ : that of the CPU on which Zorba is currently running.
+ : @error zerr:ZXQP8403 if there was an error reading the thesaurus data file.
+ : @error zerr:ZXQP8406 if <code>$lang</code> is not supported for thesaurus
+ : look-up specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.xq
+ :)
+declare function ft:thesaurus-lookup( $uri as xs:string, $phrase as xs:string,
+ $lang as xs:language )
+ as xs:string+ external;
+
+(:~
+ : Looks up the given phrase in the thesaurus specified by the given URI.
+ :
+ : @param $uri The URI specifying the thesaurus to use.
+ : @param $phrase The phrase to look up.
+ : The phrase's
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @return the original and related phrases.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is unsupported in
+ : general.
+ : @error err:FTST0018 if <code>$uri</code> refers to a thesaurus
+ : that is not found in the statically known thesauri.
+ : @error zerr:ZOSE0001 if the thesaurus data file could not be found.
+ : @error zerr:ZOSE0002 if the thesaurus data file is not a plain file.
+ : @error zerr:ZXQP8401 if the thesaurus data file's version is not supported
+ : by the currently running version of Zorba.
+ : @error zerr:ZXQP8402 if the thesaurus data file's endianness does not match
+ : that of the CPU on which Zorba is currently running.
+ : @error zerr:ZXQP8403 if there was an error reading the thesaurus data file.
+ : @error zerr:ZXQP8406 if <code>ft:current-lang()</code> is not supported for
+ : thesaurus look-up specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.xq
+ :)
+declare function ft:thesaurus-lookup( $uri as xs:string, $phrase as xs:string )
+ as xs:string+ external;
+
+(:~
+ : Looks up the given phrase in the thesaurus specified by the given URI.
+ :
+ : @param $uri The URI specifying the thesaurus to use.
+ : @param $phrase The phrase to look up.
+ : @param $lang The
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$phrase</code>.
+ : @param $relationship The relationship the results are to have to
+ : <code>$phrase</code>.
+ : @return the original and related phrases.
+ : @error err:FTST0018 if <code>$uri</code> refers to a thesaurus
+ : that is not found in the statically known thesauri.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error zerr:ZOSE0001 if the thesaurus data file could not be found.
+ : @error zerr:ZOSE0002 if the thesaurus data file is not a plain file.
+ : @error zerr:ZXQP8401 if the thesaurus data file's version is not supported
+ : by the currently running version of Zorba.
+ : @error zerr:ZXQP8402 if the thesaurus data file's endianness does not match
+ : that of the CPU on which Zorba is currently running.
+ : @error zerr:ZXQP8403 if there was an error reading the thesaurus data file.
+ : @error zerr:ZXQP8406 if <code>$lang</code> is not supported for thesaurus
+ : look-up specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.xq
+ :)
+declare function ft:thesaurus-lookup( $uri as xs:string, $phrase as xs:string,
+ $lang as xs:language,
+ $relationship as xs:string )
+ as xs:string+ external;
+
+(:~
+ : Looks up the given phrase in the thesaurus specified by the given URI.
+ :
+ : @param $uri The URI specifying the thesaurus to use.
+ : @param $phrase The phrase to look up.
+ : @param $lang The
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$phrase</code>.
+ : @param $relationship The relationship the results are to have to
+ : <code>$phrase</code>.
+ : @param $level-least The minimum number of levels within the thesaurus to be
+ : traversed.
+ : @param $level-most The maximum number of levels within the thesaurus to be
+ : traversed.
+ : @return the original and related phrases.
+ : @error err:FOCA0003 if either <code>$level-least</code> or
+ : <code>$level-most</code> is either negative or too large.
+ : @error err:FTST0018 if <code>$uri</code> refers to a thesaurus
+ : that is not found in the statically known thesauri.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error zerr:ZOSE0001 if the thesaurus data file could not be found.
+ : @error zerr:ZOSE0002 if the thesaurus data file is not a plain file.
+ : @error zerr:ZXQP8401 if the thesaurus data file's version is not supported
+ : by the currently running version of Zorba.
+ : @error zerr:ZXQP8402 if the thesaurus data file's endianness does not match
+ : that of the CPU on which Zorba is currently running.
+ : @error zerr:ZXQP8403 if there was an error reading the thesaurus data file.
+ : @error zerr:ZXQP8406 if <code>$lang</code> is not supported for thesaurus
+ : look-up specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.xq
+ :)
+declare function ft:thesaurus-lookup( $uri as xs:string, $phrase as xs:string,
+ $lang as xs:language,
+ $relationship as xs:string,
+ $level-least as xs:integer,
+ $level-most as xs:integer )
+ as xs:string+ external;
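The $level-least / $level-most parameters bound how deep the look-up walks the thesaurus. The C++ sketch below shows one way such a bounded traversal could work. It rests on explicit assumptions: the thesaurus is modeled as a hypothetical in-memory map from phrase to relationship to related phrases, traversal is breadth-first, and the original phrase is always returned first (matching "the original and related phrases" above). The negative-level check stands in for err:FOCA0003 as a plain assert. None of these names come from Zorba.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical thesaurus model: phrase -> (relationship -> related phrases).
using Thesaurus =
    std::map<std::string, std::map<std::string, std::vector<std::string>>>;

// Returns `phrase` followed by every phrase reachable through `relationship`
// at a depth d with level_least <= d <= level_most (breadth-first).
std::vector<std::string> lookup(Thesaurus const &t, std::string const &phrase,
                                std::string const &relationship,
                                long level_least, long level_most) {
  assert(level_least >= 0 && level_most >= 0);  // cf. err:FOCA0003
  std::vector<std::string> result{phrase};      // original phrase first
  std::set<std::string> seen{phrase};
  std::vector<std::string> frontier{phrase};
  for (long level = 1; level <= level_most && !frontier.empty(); ++level) {
    std::vector<std::string> next;
    for (auto const &p : frontier) {
      auto entry = t.find(p);
      if (entry == t.end()) continue;
      auto rel = entry->second.find(relationship);
      if (rel == entry->second.end()) continue;
      for (auto const &q : rel->second) {
        if (!seen.insert(q).second) continue;  // skip cycles and duplicates
        next.push_back(q);
        if (level >= level_least) result.push_back(q);
      }
    }
    frontier = std::move(next);
  }
  return result;
}
```

Phrases above $level-least are still expanded (they stay in the frontier) even though they are not returned, so deeper levels remain reachable.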
+
+(:~
+ : Tokenizes the given node.
+ :
+ : @param $node The node to tokenize.
+ : @param $lang The default
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$node</code>.
+ : @return a (possibly empty) sequence of tokens.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-1.xq
+ :)
+declare function ft:tokenize( $node as node(), $lang as xs:language )
+ as element(ft-schema:token)* external;
+
+(:~
+ : Tokenizes the given node.
+ :
+ : @param $node The node to tokenize.
+ : The node's default
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @return a (possibly empty) sequence of tokens.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
+ : general.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-2.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-3.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-4.xq
+ :)
+declare function ft:tokenize( $node as node() )
+ as element(ft-schema:token)* external;
+
+(:~
+ : Tokenizes the given string.
+ :
+ : @param $string The string to tokenize.
+ : @param $lang The default
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$string</code>.
+ : @return a (possibly empty) sequence of tokens.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error zerr:ZXQP8407 if <code>$lang</code> is not supported for
+ : tokenization specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-1.xq
+ :)
+declare function ft:tokenize-string( $string as xs:string,
+ $lang as xs:language )
+ as xs:string* external;
+
+(:~
+ : Tokenizes the given string.
+ :
+ : @param $string The string to tokenize.
+ : The string's default
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @return a (possibly empty) sequence of tokens.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
+ : general.
+ : @error zerr:ZXQP8407 if <code>ft:current-lang()</code> is not supported for
+ : tokenization specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-2.xq
+ :)
+declare function ft:tokenize-string( $string as xs:string )
+ as xs:string* external;
+
+(:~
+ : Gets properties of the tokenizer for the given
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>.
+ :
+ : @param $lang The language of the tokenizer to get the properties of.
+ : @return said properties.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error zerr:ZXQP8407 if <code>$lang</code> is not supported for
+ : tokenization specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-1.xq
+ :)
+declare function ft:tokenizer-properties( $lang as xs:language )
+ as element(ft-schema:tokenizer-properties) external;
+
+(:~
+ : Gets properties of the tokenizer for the
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : returned by <code>ft:current-lang()</code>.
+ :
+ : @return said properties.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
+ : general.
+ : @error zerr:ZXQP8407 if <code>ft:current-lang()</code> is not supported for
+ : tokenization specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-2.xq
+ :)
+declare function ft:tokenizer-properties()
+ as element(ft-schema:tokenizer-properties) external;
+
+(:===========================================================================:)
+
+(: vim:set et sw=2 ts=2: :)
=== added file 'modules/com/zorba-xquery/www/modules/full-text.xsd'
--- modules/com/zorba-xquery/www/modules/full-text.xsd 1970-01-01 00:00:00 +0000
+++ modules/com/zorba-xquery/www/modules/full-text.xsd 2012-04-24 20:57:30 +0000
@@ -0,0 +1,134 @@
+<?xml version="1.0"?>
+<!--
+ ! Copyright 2006-2011 The FLWOR Foundation.
+ !
+ ! Licensed under the Apache License, Version 2.0 (the "License");
+ ! you may not use this file except in compliance with the License.
+ ! You may obtain a copy of the License at
+ !
+ ! http://www.apache.org/licenses/LICENSE-2.0
+ !
+ ! Unless required by applicable law or agreed to in writing, software
+ ! distributed under the License is distributed on an "AS IS" BASIS,
+ ! WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ ! See the License for the specific language governing permissions and
+ ! limitations under the License.
+-->
+
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+ targetNamespace="http://www.zorba-xquery.com/modules/full-text"
+ xmlns="http://www.zorba-xquery.com/modules/full-text"
+ elementFormDefault="qualified"
+ attributeFormDefault="unqualified">
+
+ <!--======================================================================-->
+
+ <xs:element name="compare-options">
+ <xs:complexType>
+ <xs:attributeGroup ref="compare-attributes"/>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:attributeGroup name="compare-attributes">
+ <xs:attribute name="case" type="sensitivity" default="insensitive"/>
+ <xs:attribute name="diacritics" type="sensitivity" default="insensitive"/>
+ <xs:attribute name="stem" type="yes-no-both" default="no"/>
+ </xs:attributeGroup>
+
+ <xs:simpleType name="sensitivity">
+ <xs:restriction base="xs:string">
+ <xs:enumeration value="insensitive"/>
+ <xs:enumeration value="sensitive"/>
+ <xs:enumeration value="both"/>
+ </xs:restriction>
+ </xs:simpleType>
+
+ <xs:simpleType name="yes-no-both">
+ <xs:restriction base="xs:string">
+ <xs:enumeration value="yes"/>
+ <xs:enumeration value="no"/>
+ <xs:enumeration value="both"/>
+ </xs:restriction>
+ </xs:simpleType>
+
+ <xs:complexType name="boolean-value">
+ <xs:attribute name="value" type="xs:boolean" use="required"/>
+ </xs:complexType>
+
+ <!--======================================================================-->
+
+ <xs:element name="token">
+ <xs:complexType>
+
+ <!-- The language of the token. -->
+ <xs:attribute name="lang" type="xs:language"/>
+
+ <!-- The sentence number. -->
+ <xs:attribute name="sentence" type="xs:nonNegativeInteger" use="required"/>
+
+ <!-- The paragraph number. -->
+ <xs:attribute name="paragraph" type="xs:nonNegativeInteger" use="required"/>
+
+ <!-- The token string value. -->
+ <xs:attribute name="value" type="xs:string" use="required"/>
+
+ <!--
+ ! A reference to the originating node. If the token occurred within an
+ ! element, the reference refers to the text node. If the token occurred
+ ! within an attribute, the reference refers to the attribute node.
+ -->
+ <xs:attribute name="node-ref" type="xs:anyURI"/>
+
+ </xs:complexType>
+ </xs:element>
+
+ <!--======================================================================-->
+
+ <xs:element name="tokenizer-properties">
+ <xs:complexType>
+ <xs:all>
+
+ <!--
+ ! If true, XML comments separate tokens. (No example can be provided
+ ! here because it is illegal to nest an XML comment inside an XML
+ ! comment.)
+ -->
+ <xs:element name="comments-separate-tokens" type="boolean-value"/>
+
+ <!--
+ ! If true, XML elements separate tokens. For example,
+ ! <b>B</b>old would be 2 tokens instead of 1.
+ -->
+ <xs:element name="elements-separate-tokens" type="boolean-value"/>
+
+ <!--
+ ! If true, XML processing instructions separate tokens. For example,
+ ! net<?PI pi?>work would be 2 tokens instead of 1.
+ -->
+ <xs:element name="processing-instructions-separate-tokens" type="boolean-value"/>
+
+ <!--
+ ! The list of languages that the tokenizer can tokenize.
+ -->
+ <xs:element name="supported-languages">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element name="lang" type="xs:language" maxOccurs="unbounded"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+
+ </xs:all>
+
+ <!--
+ ! The tokenizer's identifying URI.
+ -->
+ <xs:attribute name="uri" type="xs:anyURI"/>
+
+ </xs:complexType>
+ </xs:element>
+
+ <!--======================================================================-->
+
+</xs:schema>
+<!-- vim:set et sw=2 ts=2: -->
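The <token> element in full-text.xsd carries a value plus required sentence and paragraph numbers. The C++ sketch below shows how a tokenizer might assign those numbers. It is a naive, ASCII-only illustration and not Zorba's tokenizer: the Token struct and tokenize() function are hypothetical, tokens are maximal alphanumeric runs, '.', '!' and '?' each end a sentence, a blank line ends a paragraph, and numbering starting at 1 is chosen purely for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Mirrors the required attributes of <token> in full-text.xsd.
struct Token {
  std::string value;
  unsigned sentence;
  unsigned paragraph;
};

// Naive ASCII tokenizer (hypothetical). Note: repeated terminators such as
// "..." each bump the sentence counter; a real tokenizer would not do this.
std::vector<Token> tokenize(std::string const &text) {
  std::vector<Token> tokens;
  unsigned sentence = 1, paragraph = 1;
  std::string word;
  auto flush = [&] {
    if (!word.empty()) tokens.push_back({word, sentence, paragraph});
    word.clear();
  };
  for (std::size_t i = 0; i < text.size(); ++i) {
    unsigned char const c = static_cast<unsigned char>(text[i]);
    if (std::isalnum(c)) {
      word += text[i];
      continue;
    }
    flush();  // any non-alphanumeric character ends the current token
    if (c == '.' || c == '!' || c == '?') {
      ++sentence;
    } else if (c == '\n' && i + 1 < text.size() && text[i + 1] == '\n') {
      ++paragraph;
      ++i;  // consume the second newline of the blank line
    }
  }
  flush();
  return tokens;
}
```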
=== modified file 'modules/com/zorba-xquery/www/modules/http-client.xq.src/http_request_handler.cpp'
--- modules/com/zorba-xquery/www/modules/http-client.xq.src/http_request_handler.cpp 2012-04-24 12:39:38 +0000
+++ modules/com/zorba-xquery/www/modules/http-client.xq.src/http_request_handler.cpp 2012-04-24 20:57:30 +0000
@@ -39,7 +39,6 @@
theSerStream(NULL),
thePost(NULL),
theLast(NULL),
- theLastSerializerOptions(NULL),
theIsHeadRequest(false)
{
theHeaderLists.push_back(NULL);
@@ -260,6 +259,7 @@
void HttpRequestHandler::cleanUpBody()
{
delete theSerStream;
+ theSerStream = 0;
theLastBodyHadContent = false;
}
=== modified file 'modules/com/zorba-xquery/www/modules/pregenerated/errors.xq'
--- modules/com/zorba-xquery/www/modules/pregenerated/errors.xq 2012-04-24 12:39:38 +0000
+++ modules/com/zorba-xquery/www/modules/pregenerated/errors.xq 2012-04-24 20:57:30 +0000
@@ -188,6 +188,7 @@
(:~
:
+ : The thesaurus data file's endianness does not match that of the CPU.
:
:)
declare variable $zerr:ZXQP8402 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP8402");
@@ -201,6 +202,22 @@
(:~
:)
+declare variable $zerr:ZXQP8404 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP8404");
+
+(:~
+:)
+declare variable $zerr:ZXQP8405 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP8405");
+
+(:~
+:)
+declare variable $zerr:ZXQP8406 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP8406");
+
+(:~
+:)
+declare variable $zerr:ZXQP8407 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP8407");
+
+(:~
+:)
declare variable $zerr:ZXQD0001 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQD0001");
(:~
=== modified file 'modules/com/zorba-xquery/www/modules/xqdoc2xhtml/index.xq'
--- modules/com/zorba-xquery/www/modules/xqdoc2xhtml/index.xq 2012-04-24 12:39:38 +0000
+++ modules/com/zorba-xquery/www/modules/xqdoc2xhtml/index.xq 2012-04-24 20:57:30 +0000
@@ -839,9 +839,7 @@
if(fn:matches($specLine, "Args:")) then
let $arg_split := fn:substring-after($specLine, "-x")
return
- if(fn:string-length($arg_split) eq 0) then
- fn:error($err:UE008, fn:concat("Unknown Args: in spec file for example <", $exampleSource,"> .
- Add the example input and expected output by hand in the example, in a commentary that should also include the word 'output'."))
+ if(fn:string-length($arg_split) eq 0) then string-join($specLines, " ")
else
let $var_value := fn:tokenize($arg_split, "=")
let $var_name := fn:normalize-space(fn:replace($var_value[1], ":$", ""))
=== modified file 'scripts/zt-wn-get'
--- scripts/zt-wn-get 2012-04-24 12:39:38 +0000
+++ scripts/zt-wn-get 2012-04-24 20:57:30 +0000
@@ -22,7 +22,7 @@
echo 'Arguments: [--workdir <workdir>] [--builddir <builddir>]'
echo ' [--thesaurusurl <thesaurusurl>]'
echo ' <zorba_repository>'
- echo '<zorba_repository> is the top-level SVN working copy.'
+ echo '<zorba_repository> is the top-level BZR working copy.'
echo '<workdir> is a temp directory to download and unzip XQTS (default: /tmp).'
echo '<builddir> is the directory Zorba has been built in'
echo ' (default: <zorba_repository>/build)'
@@ -71,8 +71,8 @@
echo Build dir is at $BUILD
# Compile thesaurus to binary format
-mkdir -p $BUILD/test/rbkt/thesauri
-THESAURUS_DEST="$BUILD/test/rbkt/thesauri/wordnet-en.zth"
+mkdir -p $BUILD/LIB_PATH/edu/princeton/wordnet
+THESAURUS_DEST="$BUILD/LIB_PATH/edu/princeton/wordnet/wordnet-en.zth"
echo "Compiling thesaurus to $THESAURUS_DEST..."
untar_dir=`mktemp -d "$WORK/thesaurus.XXXXXX"`
cd "$untar_dir"
=== modified file 'src/api/CMakeLists.txt'
--- src/api/CMakeLists.txt 2012-04-24 12:39:38 +0000
+++ src/api/CMakeLists.txt 2012-04-24 20:57:30 +0000
@@ -62,8 +62,9 @@
IF (NOT ZORBA_NO_FULL_TEXT)
LIST(APPEND API_SRCS
stemmer.cpp
- stemmer_wrapper.cpp
- thesaurus.cpp)
+ stemmer_wrappers.cpp
+ thesaurus.cpp
+ thesaurus_wrappers.cpp)
ENDIF (NOT ZORBA_NO_FULL_TEXT)
ADD_SRC_SUBFOLDER(API_SRCS serialization API_SERIALIZATION_SRCS)
=== modified file 'src/api/staticcontextimpl.cpp'
--- src/api/staticcontextimpl.cpp 2012-04-24 12:39:38 +0000
+++ src/api/staticcontextimpl.cpp 2012-04-24 20:57:30 +0000
@@ -42,8 +42,8 @@
#include "context/static_context.h"
#include "context/static_context_consts.h"
#ifndef ZORBA_NO_FULL_TEXT
-#include "context/stemmer_wrappers.h"
-#include "context/thesaurus_wrappers.h"
+#include "stemmer_wrappers.h"
+#include "thesaurus_wrappers.h"
#endif /* ZORBA_NO_FULL_TEXT */
#include "uri_resolver_wrappers.h"
@@ -65,7 +65,6 @@
namespace zorba {
-
/*******************************************************************************
Create a StaticContextImpl obj as well as an internal static_context obj S.
S is created as a child of the zorba root sctx. This constructor is used
=== renamed file 'src/api/stemmer_wrapper.cpp' => 'src/api/stemmer_wrappers.cpp'
--- src/api/stemmer_wrapper.cpp 2012-04-24 12:39:38 +0000
+++ src/api/stemmer_wrappers.cpp 2012-04-24 20:57:30 +0000
@@ -23,7 +23,7 @@
#include "diagnostics/assert.h"
#include "util/cxx_util.h"
-#include "stemmer_wrapper.h"
+#include "stemmer_wrappers.h"
using namespace zorba::locale;
@@ -32,8 +32,8 @@
///////////////////////////////////////////////////////////////////////////////
-StemmerWrapper::StemmerWrapper( zorba::Stemmer::ptr p ) :
- api_stemmer_( std::move( p ) )
+StemmerWrapper::StemmerWrapper( zorba::Stemmer::ptr api_stemmer ) :
+ api_stemmer_( std::move( api_stemmer ) )
{
ZORBA_ASSERT( api_stemmer_.get() );
}
@@ -42,6 +42,12 @@
api_stemmer_.release()->destroy();
}
+void StemmerWrapper::properties( Properties *props ) const {
+ zorba::Stemmer::Properties api_props;
+ api_stemmer_->properties( &api_props );
+ props->uri = api_props.uri;
+}
+
void StemmerWrapper::stem( zstring const &word, iso639_1::type lang,
zstring *result ) const {
String const api_word( Unmarshaller::newString( word ) );
@@ -52,19 +58,22 @@
///////////////////////////////////////////////////////////////////////////////
StemmerProviderWrapper::
-StemmerProviderWrapper( zorba::StemmerProvider const *p ) :
- api_stemmer_provider_( p )
+StemmerProviderWrapper( zorba::StemmerProvider const *api_stemmer_provider ) :
+ api_stemmer_provider_( api_stemmer_provider )
{
ZORBA_ASSERT( api_stemmer_provider_ );
}
-Stemmer::ptr
-StemmerProviderWrapper::get_stemmer( iso639_1::type lang ) const {
- zorba::Stemmer::ptr p( api_stemmer_provider_->getStemmer( lang ) );
- Stemmer::ptr result;
- if ( p.get() )
- result.reset( new StemmerWrapper( std::move( p ) ) );
- return std::move( result );
+bool StemmerProviderWrapper::getStemmer( iso639_1::type lang,
+ Stemmer::ptr *result ) const {
+ zorba::Stemmer::ptr api_ptr;
+ zorba::Stemmer::ptr *const api_ptr_ptr = result ? &api_ptr : nullptr;
+ if ( api_stemmer_provider_->getStemmer( lang, api_ptr_ptr ) ) {
+ if ( result )
+ result->reset( new StemmerWrapper( std::move( api_ptr ) ) );
+ return true;
+ }
+ return false;
}
///////////////////////////////////////////////////////////////////////////////
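The new StemmerProviderWrapper::getStemmer() (and likewise ThesaurusProviderWrapper::getThesaurus()) returns a bool and only builds an object when the out-pointer is non-null. Presumably this lets a caller such as ft:is-stem-lang-supported() ask whether a language is supported without constructing a stemmer. The C++ sketch below isolates that pattern. All class names are hypothetical stand-ins, std::unique_ptr replaces Zorba's ptr typedefs, and the "strip a trailing s" stemmer is a toy.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the API-level stemmer types.
struct Stemmer {
  virtual ~Stemmer() = default;
  virtual std::string stem(std::string const &word) const = 0;
};

struct EnglishStemmer : Stemmer {
  // Toy rule for illustration only: strip one trailing "s".
  std::string stem(std::string const &word) const override {
    return word.size() > 1 && word.back() == 's'
               ? word.substr(0, word.size() - 1)
               : word;
  }
};

struct StemmerProvider {
  // Returns true iff a stemmer exists for `lang`. Only when `result` is
  // non-null is one actually constructed, so passing nullptr doubles as a
  // cheap "is this language supported?" query.
  bool getStemmer(std::string const &lang,
                  std::unique_ptr<Stemmer> *result = nullptr) const {
    if (lang != "en") return false;
    if (result) result->reset(new EnglishStemmer);
    return true;
  }
};
```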
=== renamed file 'src/api/stemmer_wrapper.h' => 'src/api/stemmer_wrappers.h'
--- src/api/stemmer_wrapper.h 2012-04-24 12:39:38 +0000
+++ src/api/stemmer_wrappers.h 2012-04-24 20:57:30 +0000
@@ -35,6 +35,7 @@
// inherited
void destroy() const;
+ void properties( Properties* ) const;
void stem( zstring const &word, locale::iso639_1::type lang,
zstring *result ) const;
private:
@@ -50,7 +51,7 @@
}
// inherited
- Stemmer::ptr get_stemmer( locale::iso639_1::type lang ) const;
+ bool getStemmer( locale::iso639_1::type, Stemmer::ptr* = 0 ) const;
private:
zorba::StemmerProvider const *const api_stemmer_provider_;
};
=== modified file 'src/api/thesaurus.cpp'
--- src/api/thesaurus.cpp 2012-04-24 12:39:38 +0000
+++ src/api/thesaurus.cpp 2012-04-24 20:57:30 +0000
@@ -25,9 +25,11 @@
// out-of-line since it's virtual
}
-//Thesaurus::iterator::~iterator() {
-// // out-of-line since it's virtual
-//}
+#if 0
+Thesaurus::iterator::~iterator() {
+ // out-of-line since it's virtual
+}
+#endif
///////////////////////////////////////////////////////////////////////////////
=== renamed file 'src/context/thesaurus_wrappers.cpp' => 'src/api/thesaurus_wrappers.cpp'
--- src/context/thesaurus_wrappers.cpp 2012-04-24 12:39:38 +0000
+++ src/api/thesaurus_wrappers.cpp 2012-04-24 20:57:30 +0000
@@ -87,6 +87,27 @@
///////////////////////////////////////////////////////////////////////////////
+ThesaurusProviderWrapper::
+ThesaurusProviderWrapper( zorba::ThesaurusProvider const *p ) :
+ api_thesaurus_provider_( p )
+{
+ ZORBA_ASSERT( api_thesaurus_provider_ );
+}
+
+bool ThesaurusProviderWrapper::getThesaurus( iso639_1::type lang,
+ Thesaurus::ptr *result ) const {
+ zorba::Thesaurus::ptr api_ptr;
+ zorba::Thesaurus::ptr *const api_ptr_ptr = result ? &api_ptr : nullptr;
+ if ( api_thesaurus_provider_->getThesaurus( lang, api_ptr_ptr ) ) {
+ if ( result )
+ result->reset( new ThesaurusWrapper( std::move( api_ptr ) ) );
+ return true;
+ }
+ return false;
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
} // namespace internal
} // namespace zorba
=== renamed file 'src/context/thesaurus_wrappers.h' => 'src/api/thesaurus_wrappers.h'
--- src/context/thesaurus_wrappers.h 2012-04-24 12:39:38 +0000
+++ src/api/thesaurus_wrappers.h 2012-04-24 20:57:30 +0000
@@ -22,6 +22,7 @@
#ifndef ZORBA_NO_FULL_TEXT
#include <zorba/thesaurus.h>
+
#include "runtime/full_text/thesaurus.h"
namespace zorba {
@@ -54,6 +55,17 @@
zorba::Thesaurus::ptr api_thesaurus_;
};
+class ThesaurusProviderWrapper : public ThesaurusProvider {
+public:
+ ThesaurusProviderWrapper( zorba::ThesaurusProvider const* );
+
+ // inherited
+ bool getThesaurus( locale::iso639_1::type, Thesaurus::ptr* ) const;
+
+private:
+ zorba::ThesaurusProvider::ptr const api_thesaurus_provider_;
+};
+
///////////////////////////////////////////////////////////////////////////////
} // namespace internal
=== modified file 'src/api/uri_resolver_wrappers.cpp'
--- src/api/uri_resolver_wrappers.cpp 2012-04-24 12:39:38 +0000
+++ src/api/uri_resolver_wrappers.cpp 2012-04-24 20:57:30 +0000
@@ -15,24 +15,20 @@
*/
#include "stdafx.h"
+#include <zorba/thesaurus.h>
+
+#include "runtime/full_text/thesaurus.h"
+
+#include "thesaurus_wrappers.h"
+#include "unmarshaller.h"
#include "uri_resolver_wrappers.h"
#include "uriresolverimpl.h"
-#include "unmarshaller.h"
-#include <zorba/thesaurus.h>
-#include <runtime/full_text/thesaurus.h>
-#include <context/thesaurus_wrappers.h>
namespace zorba
{
// "Convenience" class for passing an internal EntityData object to
- // external mappers/resolvers. This can serve as a plain EntityData or
- // a ThesaurusEntityData. However, when there's another EntityData subclass
- // in future, this won't work as EntityData becomes an ambiguous base class...
-#ifndef ZORBA_NO_FULL_TEXT
- class EntityDataWrapper : public ThesaurusEntityData
-#else
+ // external mappers/resolvers.
class EntityDataWrapper : public EntityData
-#endif /* ZORBA_NO_FULL_TEXT */
{
public:
static EntityDataWrapper const* create(internal::EntityData const* aData) {
@@ -45,12 +41,7 @@
return new EntityDataWrapper(EntityData::SCHEMA);
#ifndef ZORBA_NO_FULL_TEXT
case internal::EntityData::THESAURUS:
- {
- EntityDataWrapper* retval = new EntityDataWrapper(EntityData::THESAURUS);
- retval->theThesaurusLang =
- dynamic_cast<const internal::ThesaurusEntityData*>(aData)->getLanguage();
- return retval;
- }
+ return new EntityDataWrapper(EntityData::THESAURUS);
case internal::EntityData::STOP_WORDS:
return new EntityDataWrapper(EntityData::STOP_WORDS);
#endif /* ZORBA_NO_FULL_TEXT */
@@ -67,21 +58,12 @@
return theKind;
}
-#ifndef ZORBA_NO_FULL_TEXT
- virtual zorba::locale::iso639_1::type getLanguage() const {
- return theThesaurusLang;
- }
-#endif /* ZORBA_NO_FULL_TEXT */
-
private:
EntityDataWrapper(EntityData::Kind aKind)
: theKind(aKind)
{}
EntityData::Kind const theKind;
-#ifndef ZORBA_NO_FULL_TEXT
- zorba::locale::iso639_1::type theThesaurusLang;
-#endif /* ZORBA_NO_FULL_TEXT */
};
URIMapperWrapper::URIMapperWrapper(zorba::URIMapper& aUserMapper)
@@ -169,13 +151,13 @@
}
#ifndef ZORBA_NO_FULL_TEXT
else {
- Thesaurus* lUserThesaurus = dynamic_cast<Thesaurus*>(lUserPtr.get());
- if (lUserThesaurus != NULL) {
- // Here we pass memory ownership of the actual Thesaurus to the
- // internal ThesaurusWrapper.
- lRetval = new internal::ThesaurusWrapper
- (Thesaurus::ptr(lUserThesaurus));
- lUserPtr.release();
+ ThesaurusProvider* lUserThesaurusProvider =
+ dynamic_cast<ThesaurusProvider*>(lUserPtr.get());
+ if (lUserThesaurusProvider) {
+ // Here we pass memory ownership of the actual ThesaurusProvider to
+ // the internal ThesaurusProviderWrapper.
+ lRetval = new internal::ThesaurusProviderWrapper
+ (lUserThesaurusProvider);
}
else {
assert(false);
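The comment in this hunk says ownership of the user's ThesaurusProvider passes to the internal wrapper, yet the new code no longer calls lUserPtr.release() the way the removed lines did. If the wrapper really does delete the provider it is given (an assumption, since ThesaurusProviderWrapper is not shown here), the owning smart pointer has to give the pointer up, or the provider would be deleted twice. A minimal sketch of that handoff follows. It uses std::unique_ptr in place of std::auto_ptr, and Resource, ThesaurusProvider and ProviderWrapper are hypothetical stand-ins, not Zorba's classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; not Zorba's real classes.
struct Resource {
  virtual ~Resource() = default;
};
struct ThesaurusProvider : Resource {};

// Assumed to own (and eventually delete) the provider it is given.
struct ProviderWrapper {
  explicit ProviderWrapper(ThesaurusProvider* p) : provider(p) {}
  std::unique_ptr<ThesaurusProvider> provider;
};

// Wraps `res` if it is a ThesaurusProvider and transfers ownership to the
// wrapper; otherwise returns nullptr and leaves `res` untouched.
std::unique_ptr<ProviderWrapper> wrap(std::unique_ptr<Resource>& res) {
  ThesaurusProvider* p = dynamic_cast<ThesaurusProvider*>(res.get());
  if (!p)
    return nullptr;
  // Build the wrapper first: if this throws, `res` still owns p.
  std::unique_ptr<ProviderWrapper> w(new ProviderWrapper(p));
  res.release();  // the wrapper owns p now; without this, p is deleted twice
  return w;
}
```

Releasing only after the wrapper has been constructed keeps the provider from leaking if the allocation fails.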
=== modified file 'src/api/xmldatamanagerimpl.cpp'
--- src/api/xmldatamanagerimpl.cpp 2012-04-24 12:39:38 +0000
+++ src/api/xmldatamanagerimpl.cpp 2012-04-24 20:57:30 +0000
@@ -47,7 +47,7 @@
#include "runtime/util/flowctl_exception.h"
#ifndef ZORBA_NO_FULL_TEXT
-#include "stemmer_wrapper.h"
+#include "stemmer_wrappers.h"
#endif /* ZORBA_NO_FULL_TEXT */
namespace zorba {
=== modified file 'src/api/xmldatamanagerimpl.h'
--- src/api/xmldatamanagerimpl.h 2012-04-24 12:39:38 +0000
+++ src/api/xmldatamanagerimpl.h 2012-04-24 20:57:30 +0000
@@ -27,7 +27,7 @@
#include "util/singleton.h"
#ifndef ZORBA_NO_FULL_TEXT
-#include "stemmer_wrapper.h"
+#include "stemmer_wrappers.h"
#endif /* ZORBA_NO_FULL_TEXT */
namespace zorba {
=== modified file 'src/compiler/codegen/plan_visitor.cpp'
--- src/compiler/codegen/plan_visitor.cpp 2012-04-24 12:39:38 +0000
+++ src/compiler/codegen/plan_visitor.cpp 2012-04-24 20:57:30 +0000
@@ -250,7 +250,7 @@
class plan_ftnode_visitor : public ftnode_visitor
{
public:
- typedef std::list<PlanIter_t> PlanIter_list_t;
+ typedef std::vector<PlanIter_t> PlanIter_list_t;
plan_ftnode_visitor( plan_visitor* v ) : plan_visitor_( v ) { }
=== modified file 'src/compiler/expression/expr_put.cpp'
--- src/compiler/expression/expr_put.cpp 2012-04-24 12:39:38 +0000
+++ src/compiler/expression/expr_put.cpp 2012-04-24 20:57:30 +0000
@@ -41,6 +41,7 @@
#include "compiler/expression/function_item_expr.h"
#include "compiler/parser/parse_constants.h"
+#include "diagnostics/assert.h"
#include "functions/function.h"
#include "functions/udf.h"
=== modified file 'src/compiler/translator/translator.cpp'
--- src/compiler/translator/translator.cpp 2012-04-24 12:39:38 +0000
+++ src/compiler/translator/translator.cpp 2012-04-24 20:57:30 +0000
@@ -68,6 +68,7 @@
#include "functions/signature.h"
#include "functions/udf.h"
#include "functions/external_function.h"
+#include "functions/func_ft_module.h"
#include "annotations/annotations.h"
@@ -859,7 +860,7 @@
{
ZORBA_ASSERT(count >= 0);
- ftnode *n = NULL;
+ ftnode *n = nullptr;
while ( count-- > 0 )
{
ZORBA_FATAL( !theFTNodeStack.empty(), "" );
@@ -3294,6 +3295,41 @@
qnameItem->getLocalName())));
}
+#ifndef ZORBA_NO_FULL_TEXT
+ if (qnameItem->getNamespace() == static_context::ZORBA_FULL_TEXT_FN_NS &&
+ (qnameItem->getLocalName() == "tokenizer-properties" ||
+ qnameItem->getLocalName() == "tokenize"))
+ {
+ FunctionConsts::FunctionKind kind;
+
+ if (qnameItem->getLocalName() == "tokenizer-properties")
+ {
+ assert(numParams <= 1);
+
+ if (numParams == 1)
+ kind = FunctionConsts::FULL_TEXT_TOKENIZER_PROPERTIES_1;
+ else
+ kind = FunctionConsts::FULL_TEXT_TOKENIZER_PROPERTIES_0;
+
+ f = new full_text_tokenizer_properties(f->getSignature(), kind);
+ }
+ else
+ {
+ assert(numParams == 1 || numParams == 2);
+
+ if (numParams == 2)
+ kind = FunctionConsts::FULL_TEXT_TOKENIZE_2;
+ else
+ kind = FunctionConsts::FULL_TEXT_TOKENIZE_1;
+
+ f = new full_text_tokenize(f->getSignature(), kind);
+ }
+
+ f->setStaticContext(theRootSctx);
+ bind_fn(f, numParams, loc);
+ }
+#endif /* ZORBA_NO_FULL_TEXT */
+
f->setAnnotations(theAnnotations);
theAnnotations = NULL; // important to reset
@@ -12512,7 +12548,7 @@
{
TRACE_VISIT ();
#ifndef ZORBA_NO_FULL_TEXT
- push_ftstack( NULL ); // sentinel
+ push_ftstack( nullptr ); // sentinel
#endif /* ZORBA_NO_FULL_TEXT */
return no_state;
}
@@ -12756,7 +12792,7 @@
void *begin_visit (const FTMildNot& v) {
TRACE_VISIT ();
#ifndef ZORBA_NO_FULL_TEXT
- push_ftstack( NULL ); // sentinel
+ push_ftstack( nullptr ); // sentinel
#endif /* ZORBA_NO_FULL_TEXT */
return no_state;
}
@@ -12799,7 +12835,7 @@
void *begin_visit (const FTOr& v) {
TRACE_VISIT ();
#ifndef ZORBA_NO_FULL_TEXT
- push_ftstack( NULL ); // sentinel
+ push_ftstack( nullptr ); // sentinel
#endif /* ZORBA_NO_FULL_TEXT */
return no_state;
}
@@ -13058,7 +13094,7 @@
levels = dynamic_cast<ftrange*>( pop_ftstack() );
ZORBA_ASSERT( levels );
} else
- levels = NULL;
+ levels = nullptr;
ftthesaurus_id *const tid = new ftthesaurus_id(
loc, v.get_uri(), v.get_relationship(), levels
@@ -13070,7 +13106,7 @@
void *begin_visit (const FTThesaurusOption& v) {
TRACE_VISIT ();
#ifndef ZORBA_NO_FULL_TEXT
- push_ftstack( NULL ); // sentinel
+ push_ftstack( nullptr ); // sentinel
#endif /* ZORBA_NO_FULL_TEXT */
return no_state;
}
@@ -13078,10 +13114,8 @@
void end_visit (const FTThesaurusOption& v, void* /*visit_state*/) {
TRACE_VISIT_OUT ();
#ifndef ZORBA_NO_FULL_TEXT
- ftthesaurus_id *default_tid = NULL;
- if ( v.includes_default() ) {
- default_tid = new ftthesaurus_id( loc, "##default" );
- }
+ ftthesaurus_id *const default_tid = v.includes_default() ?
+ new ftthesaurus_id( loc, "##default" ) : nullptr;
ftthesaurus_option::thesaurus_id_list_t list;
while ( true ) {
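The translator hunk that binds the full-text module functions chooses a FunctionKind from the local name and the number of parameters. The sketch below shows that mapping in isolation. FtKind and ft_kind are hypothetical names. Unlike the translator, which relies on assert(), this version throws on an undeclared arity, so the check still applies in release builds.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the FunctionConsts kinds used by the translator.
enum class FtKind {
  TOKENIZER_PROPERTIES_0,
  TOKENIZER_PROPERTIES_1,
  TOKENIZE_1,
  TOKENIZE_2
};

// Maps a full-text module function's local name and arity to its kind,
// rejecting any name/arity pair the module does not declare.
FtKind ft_kind(const std::string& local_name, unsigned num_params) {
  if (local_name == "tokenizer-properties") {
    if (num_params == 0) return FtKind::TOKENIZER_PROPERTIES_0;
    if (num_params == 1) return FtKind::TOKENIZER_PROPERTIES_1;
  } else if (local_name == "tokenize") {
    if (num_params == 1) return FtKind::TOKENIZE_1;
    if (num_params == 2) return FtKind::TOKENIZE_2;
  }
  throw std::invalid_argument(local_name + "#" + std::to_string(num_params));
}
```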
=== modified file 'src/context/CMakeLists.txt'
--- src/context/CMakeLists.txt 2012-04-24 12:39:38 +0000
+++ src/context/CMakeLists.txt 2012-04-24 20:57:30 +0000
@@ -32,11 +32,6 @@
features.cpp
)
-IF (NOT ZORBA_NO_FULL_TEXT)
- LIST(APPEND CONTEXT_SRCS
- thesaurus_wrappers.cpp)
-ENDIF (NOT ZORBA_NO_FULL_TEXT)
-
SET(CONTEXT_BUILD_SRCS
${CMAKE_CURRENT_BINARY_DIR}/context/root_static_context_init.cpp
)
=== modified file 'src/context/default_url_resolvers.cpp'
--- src/context/default_url_resolvers.cpp 2012-04-24 12:39:38 +0000
+++ src/context/default_url_resolvers.cpp 2012-04-24 20:57:30 +0000
@@ -17,6 +17,7 @@
#include "context/default_url_resolvers.h"
+#include "util/cxx_util.h"
#include "util/uri_util.h"
#include "util/http_util.h"
#include "util/fs_util.h"
@@ -41,8 +42,15 @@
HTTPURLResolver::resolveURL
(zstring const& aUrl, EntityData const* aEntityData)
{
- if (aEntityData->getKind() == EntityData::COLLECTION)
- return NULL;
+ switch ( aEntityData->getKind() ) {
+ case EntityData::COLLECTION:
+#ifndef ZORBA_NO_FULL_TEXT
+ case EntityData::THESAURUS:
+#endif /* ZORBA_NO_FULL_TEXT */
+ return nullptr;
+ default:
+ break;
+ }
uri::scheme lScheme = uri::get_scheme(aUrl);
switch (lScheme) {
@@ -82,8 +90,15 @@
FileURLResolver::resolveURL
(zstring const& aUrl, EntityData const* aEntityData)
{
- if (aEntityData->getKind() == EntityData::COLLECTION)
- return NULL;
+ switch ( aEntityData->getKind() ) {
+ case EntityData::COLLECTION:
+#ifndef ZORBA_NO_FULL_TEXT
+ case EntityData::THESAURUS:
+#endif /* ZORBA_NO_FULL_TEXT */
+ return nullptr;
+ default:
+ break;
+ }
uri::scheme lScheme = uri::get_scheme(aUrl);
if (lScheme != uri::file) {
@@ -111,7 +126,6 @@
{
if (aEntityData->getKind() != EntityData::COLLECTION)
return NULL;
-
store::Item_t lName;
GENV_STORE.getItemFactory()->createQName(lName, aUrl.c_str(), "", "zorba-internal-name-for-w3c-collections");
store::Collection_t lColl = GENV_STORE.getCollection(lName.getp(), true);
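Both default resolvers (HTTP and file) now decline thesaurus requests as well as collection requests by returning nullptr, presumably so that a more specific mechanism can handle them. The new switch relies on deliberate case fall-through. The following reduced sketch uses a hypothetical EntityKind enum rather than Zorba's EntityData::Kind.

```cpp
#include <cassert>

// Hypothetical subset of entity kinds; not Zorba's real enum.
enum class EntityKind { SCHEMA, MODULE, COLLECTION, THESAURUS, STOP_WORDS };

// Returns true if a generic HTTP/file URL resolver should decline the
// request, mirroring the switch in the hunks above.
bool resolver_declines(EntityKind kind) {
  switch (kind) {
    case EntityKind::COLLECTION:
    case EntityKind::THESAURUS:  // falls through: same outcome as COLLECTION
      return true;
    default:
      return false;
  }
}
```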
=== modified file 'src/context/static_context.cpp'
--- src/context/static_context.cpp 2012-04-24 12:39:38 +0000
+++ src/context/static_context.cpp 2012-04-24 20:57:30 +0000
@@ -378,11 +378,16 @@
static_context::ZORBA_XML_FN_NS =
"http://www.zorba-xquery.com/modules/xml";
+#ifndef ZORBA_NO_FULL_TEXT
+const char*
+static_context::ZORBA_FULL_TEXT_FN_NS =
+"http://www.zorba-xquery.com/modules/full-text";
+#endif /* ZORBA_NO_FULL_TEXT */
+
const char*
static_context::ZORBA_XML_FN_OPTIONS_NS =
"http://www.zorba-xquery.com/modules/xml-options";
-
/***************************************************************************//**
Target namespaces of zorba reserved modules
********************************************************************************/
@@ -451,8 +456,11 @@
ns == ZORBA_JSON_FN_NS ||
ns == ZORBA_FETCH_FN_NS ||
ns == ZORBA_NODE_FN_NS ||
+#ifndef ZORBA_NO_FULL_TEXT
+ ns == ZORBA_FULL_TEXT_FN_NS ||
+#endif /* ZORBA_NO_FULL_TEXT */
ns == ZORBA_XML_FN_NS);
- }
+ }
else if (ns == W3C_FN_NS || ns == XQUERY_MATH_FN_NS)
{
return true;
@@ -1585,7 +1593,7 @@
std::auto_ptr<internal::Resource>& oResource,
zstring& oErrorMessage) const
{
- oErrorMessage = "";
+ oErrorMessage.clear();
// Iterate through all candidate URLs...
for (std::vector<zstring>::iterator url = aUrls.begin();
@@ -1621,7 +1629,7 @@
}
catch (const std::exception& e)
{
- if (oErrorMessage == "")
+ if (oErrorMessage.empty())
{
// Really no point in saving anything more than the first message
oErrorMessage = e.what();
=== modified file 'src/context/static_context.h'
--- src/context/static_context.h 2012-04-24 12:39:38 +0000
+++ src/context/static_context.h 2012-04-24 20:57:30 +0000
@@ -471,6 +471,9 @@
static const char* ZORBA_FETCH_FN_NS;
static const char* ZORBA_NODE_FN_NS;
static const char* ZORBA_XML_FN_NS;
+#ifndef ZORBA_NO_FULL_TEXT
+ static const char* ZORBA_FULL_TEXT_FN_NS;
+#endif /* ZORBA_NO_FULL_TEXT */
static const char* ZORBA_XML_FN_OPTIONS_NS;
// Namespaces of virtual modules declaring zorba builtin functions
=== removed file 'src/context/stemmer_wrappers.cpp'
--- src/context/stemmer_wrappers.cpp 2012-04-24 12:39:38 +0000
+++ src/context/stemmer_wrappers.cpp 1970-01-01 00:00:00 +0000
@@ -1,74 +0,0 @@
-/*
- * Copyright 2006-2008 The FLWOR Foundation.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-#include "stdafx.h"
-
-#include <zorba/config.h>
-
-#ifndef ZORBA_NO_FULL_TEXT
-
-#include "api/unmarshaller.h"
-#include "diagnostics/assert.h"
-#include "util/cxx_util.h"
-
-#include "stemmer_wrappers.h"
-
-using namespace zorba::locale;
-
-namespace zorba {
-namespace internal {
-
-///////////////////////////////////////////////////////////////////////////////
-
-StemmerWrapper::StemmerWrapper( zorba::Stemmer const *s ) :
- api_stemmer_( s )
-{
- ZORBA_ASSERT( api_stemmer_ );
-}
-
-void StemmerWrapper::stem( zstring const &word, iso639_1::type lang,
- zstring *result ) const {
- String const api_word( Unmarshaller::newString( word ) );
- String api_result( Unmarshaller::newString( *result ) );
- api_stemmer_->stem( api_word, lang, &api_result );
-}
-
-///////////////////////////////////////////////////////////////////////////////
-
-StemmerProviderWrapper::
-StemmerProviderWrapper( zorba::StemmerProvider const *p ) :
- api_stemmer_provider_( p )
-{
- ZORBA_ASSERT( api_stemmer_provider_ );
-}
-
-Stemmer const*
-StemmerProviderWrapper::get_stemmer( iso639_1::type lang ) const {
- zorba::Stemmer const *const s = api_stemmer_provider_->getStemmer( lang );
- return s ? new StemmerWrapper( s ) : nullptr;
-}
-
-///////////////////////////////////////////////////////////////////////////////
-
-} // namespace internal
-} // namespace zorba
-
-#endif /* ZORBA_NO_FULL_TEXT */
-/*
- * Local variables:
- * mode: c++
- * End:
- */
-/* vim:set et sw=2 ts=2: */
=== removed file 'src/context/stemmer_wrappers.h'
--- src/context/stemmer_wrappers.h 2012-04-24 12:39:38 +0000
+++ src/context/stemmer_wrappers.h 1970-01-01 00:00:00 +0000
@@ -1,63 +0,0 @@
-/*
- * Copyright 2006-2008 The FLWOR Foundation.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-#pragma once
-#ifndef ZORBA_STEMMER_WRAPPERS_H
-#define ZORBA_STEMMER_WRAPPERS_H
-
-#include <zorba/config.h>
-
-#if 0
-#ifndef ZORBA_NO_FULL_TEXT
-
-#include <zorba/stemmer.h>
-#include "zorbautils/stemmer.h"
-
-namespace zorba {
-namespace internal {
-
-///////////////////////////////////////////////////////////////////////////////
-
-class StemmerWrapper : public Stemmer {
-public:
- StemmerWrapper( zorba::Stemmer const *api_stemmer );
- void stem( zstring const &word, locale::iso639_1::type lang,
- zstring *result ) const;
-private:
- zorba::Stemmer const *const api_stemmer_;
-};
-
-class StemmerProviderWrapper : public StemmerProvider {
-public:
- StemmerProviderWrapper( zorba::StemmerProvider const *p );
- Stemmer const* get_stemmer( locale::iso639_1::type lang ) const;
-private:
- zorba::StemmerProvider const *const api_stemmer_provider_;
-};
-
-///////////////////////////////////////////////////////////////////////////////
-
-} // namespace internal
-} // namespace zorba
-
-#endif /* ZORBA_NO_FULL_TEXT */
-#endif
-#endif /* ZORBA_STEMMER_WRAPPERS_H */
-/*
- * Local variables:
- * mode: c++
- * End:
- */
-/* vim:set et sw=2 ts=2: */
=== modified file 'src/context/uri_resolver.cpp'
--- src/context/uri_resolver.cpp 2012-04-24 12:39:38 +0000
+++ src/context/uri_resolver.cpp 2012-04-24 20:57:30 +0000
@@ -117,19 +117,6 @@
{
}
-#ifndef ZORBA_NO_FULL_TEXT
- ThesaurusEntityData::ThesaurusEntityData(locale::iso639_1::type aLang)
- : EntityData(EntityData::THESAURUS),
- theLang(aLang)
- {
- }
-
- locale::iso639_1::type ThesaurusEntityData::getLanguage() const
- {
- return theLang;
- }
-#endif /* ZORBA_NO_FULL_TEXT */
-
/*************
* URIMapper is an abstract class, but we have to define its vtbl and
* base destructor somewhere.
=== modified file 'src/context/uri_resolver.h'
--- src/context/uri_resolver.h 2012-04-24 12:39:38 +0000
+++ src/context/uri_resolver.h 2012-04-24 20:57:30 +0000
@@ -55,21 +55,21 @@
/**
* @brief Return the URL used to load this Resource.
*/
- zstring getUrl() { return theUrl; }
+ zstring const& getUrl() const { return theUrl; }
virtual ~Resource() = 0;
- protected:
+protected:
Resource();
- private:
+private:
/**
* Used by static_context to populate the URL.
*/
+ void setUrl(zstring const &aUrl) { theUrl = aUrl; }
friend class zorba::static_context;
- void setUrl(zstring aUrl) { theUrl = aUrl; }
zstring theUrl;
};
@@ -193,25 +193,6 @@
Kind const theKind;
};
-#ifndef ZORBA_NO_FULL_TEXT
-/**
- * @brief The class containing additional data for URIMappers and URLResolvers
- * when mapping/resolving a Thesaurus URI.
- */
-class ThesaurusEntityData : public EntityData
-{
-public:
- ThesaurusEntityData(locale::iso639_1::type aLang);
- /**
- * @brief Return the language for which a thesaurus is being requested.
- */
- virtual locale::iso639_1::type getLanguage() const;
-
-private:
- locale::iso639_1::type const theLang;
-};
-#endif /* ZORBA_NO_FULL_TEXT */
-
/**
* @brief Interface for URL resolving.
*
=== modified file 'src/diagnostics/assert.cpp'
--- src/diagnostics/assert.cpp 2012-04-24 12:39:38 +0000
+++ src/diagnostics/assert.cpp 2012-04-24 20:57:30 +0000
@@ -68,7 +68,7 @@
file,
line,
zerr::ZXQP0002_ASSERT_FAILED,
- ( msg ? ERROR_PARAMS( condition, msg ) : ERROR_PARAMS( condition ))
+ ( msg ? ERROR_PARAMS( condition, msg ) : ERROR_PARAMS( condition ) )
);
}
=== modified file 'src/diagnostics/assert.h'
--- src/diagnostics/assert.h 2012-04-24 12:39:38 +0000
+++ src/diagnostics/assert.h 2012-04-24 20:57:30 +0000
@@ -20,6 +20,10 @@
#ifndef ZORBA_ASSERT_H
#define ZORBA_ASSERT_H
+#include <sstream>
+
+#include "util/cxx_util.h"
+
namespace zorba {
/**
@@ -35,7 +39,7 @@
void assertion_failed( char const *condition,
char const *file,
int line,
- char const *msg = 0);
+ char const *msg = nullptr );
/**
* Zorba version of the standard assert(3) macro.
=== modified file 'src/diagnostics/diagnostic_en.xml'
--- src/diagnostics/diagnostic_en.xml 2012-04-24 12:39:38 +0000
+++ src/diagnostics/diagnostic_en.xml 2012-04-24 20:57:30 +0000
@@ -1746,7 +1746,7 @@
<diagnostic code="ZXQP8401" name="THESAURUS_VERSION_MISMATCH"
if="!defined(ZORBA_NO_FULL_TEXT)">
<comment>
- The version of the thesaurus is not the expected version.
+ The version of the thesaurus is not the expected version.
</comment>
<value>"$1": wrong WordNet file version; should be "$2"</value>
</diagnostic>
@@ -1754,19 +1754,39 @@
<diagnostic code="ZXQP8402" name="THESAURUS_ENDIANNESS_MISMATCH"
if="!defined(ZORBA_NO_FULL_TEXT)">
<comment>
+ The thesaurus data file's endianness does not match that of the CPU.
</comment>
<value>thesaurus data endianness does not match CPU</value>
- The thesaurus data file's endianness does not match that of the CPU.
</diagnostic>
<diagnostic code="ZXQP8403" name="THESAURUS_DATA_ERROR"
if="!defined(ZORBA_NO_FULL_TEXT)">
<comment>
- The thesaurus data contains an unexpected value.
+ The thesaurus data contains an unexpected value.
</comment>
<value>thesaurus data error${: 1}</value>
</diagnostic>
+ <diagnostic code="ZXQP8404" name="STEM_LANG_NOT_SUPPORTED"
+ if="!defined(ZORBA_NO_FULL_TEXT)">
+ <value>"$1": language not supported for stemming</value>
+ </diagnostic>
+
+ <diagnostic code="ZXQP8405" name="STOP_WORDS_LANG_NOT_SUPPORTED"
+ if="!defined(ZORBA_NO_FULL_TEXT)">
+ <value>"$1": language not supported for stop-words</value>
+ </diagnostic>
+
+ <diagnostic code="ZXQP8406" name="THESAURUS_LANG_NOT_SUPPORTED"
+ if="!defined(ZORBA_NO_FULL_TEXT)">
+ <value>"$1": language not supported for thesaurus</value>
+ </diagnostic>
+
+ <diagnostic code="ZXQP8407" name="TOKENIZER_LANG_NOT_SUPPORTED"
+ if="!defined(ZORBA_NO_FULL_TEXT)">
+ <value>"$1": language not supported for tokenizer</value>
+ </diagnostic>
+
<diagnostic code="ZXQD0001" name="PREFIX_NOT_DECLARED">
<value>"$1": prefix not declared when calling function "$2" from $3</value>
</diagnostic>
=== modified file 'src/diagnostics/pregenerated/diagnostic_list.cpp'
--- src/diagnostics/pregenerated/diagnostic_list.cpp 2012-04-24 12:39:38 +0000
+++ src/diagnostics/pregenerated/diagnostic_list.cpp 2012-04-24 20:57:30 +0000
@@ -660,6 +660,18 @@
ZorbaErrorCode ZXQP8403_THESAURUS_DATA_ERROR( "ZXQP8403" );
+
+
+ZorbaErrorCode ZXQP8404_STEM_LANG_NOT_SUPPORTED( "ZXQP8404" );
+
+
+ZorbaErrorCode ZXQP8405_STOP_WORDS_LANG_NOT_SUPPORTED( "ZXQP8405" );
+
+
+ZorbaErrorCode ZXQP8406_THESAURUS_LANG_NOT_SUPPORTED( "ZXQP8406" );
+
+
+ZorbaErrorCode ZXQP8407_TOKENIZER_LANG_NOT_SUPPORTED( "ZXQP8407" );
#endif
=== modified file 'src/diagnostics/pregenerated/dict_en.cpp'
--- src/diagnostics/pregenerated/dict_en.cpp 2012-04-24 12:39:38 +0000
+++ src/diagnostics/pregenerated/dict_en.cpp 2012-04-24 20:57:30 +0000
@@ -434,6 +434,18 @@
#if !defined(ZORBA_NO_FULL_TEXT)
{ "ZXQP8403", "thesaurus data error${: 1}" },
#endif
+#if !defined(ZORBA_NO_FULL_TEXT)
+ { "ZXQP8404", "\"$1\": language not supported for stemming" },
+#endif
+#if !defined(ZORBA_NO_FULL_TEXT)
+ { "ZXQP8405", "\"$1\": language not supported for stop-words" },
+#endif
+#if !defined(ZORBA_NO_FULL_TEXT)
+ { "ZXQP8406", "\"$1\": language not supported for thesaurus" },
+#endif
+#if !defined(ZORBA_NO_FULL_TEXT)
+ { "ZXQP8407", "\"$1\": language not supported for tokenizer" },
+#endif
{ "~AllMatchesHasExcludes", "AllMatches contains StringExclude" },
{ "~AlreadySpecified", "already specified" },
{ "~ArithOpNotDefinedBetween_23", "arithmetic operation not defined between types \"$2\" and \"$3\"" },
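The new diagnostic messages receive the unsupported language as the $1 parameter. As a rough illustration only, substituting positional $1 to $9 markers could look like the hypothetical helper below. Zorba's real formatter also understands forms such as ${: 1}, which this sketch does not handle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replaces each "$N" (N = 1..9) in a message template
// with the N-th parameter; markers without a matching parameter are copied
// unchanged.
std::string substitute(const std::string& tmpl,
                       const std::vector<std::string>& params) {
  std::string out;
  for (std::size_t i = 0; i < tmpl.size(); ++i) {
    if (tmpl[i] == '$' && i + 1 < tmpl.size() &&
        tmpl[i + 1] >= '1' && tmpl[i + 1] <= '9') {
      std::size_t n = static_cast<std::size_t>(tmpl[i + 1] - '1');
      if (n < params.size()) {
        out += params[n];
        ++i;  // skip the digit that was just consumed
        continue;
      }
    }
    out += tmpl[i];
  }
  return out;
}
```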
=== modified file 'src/functions/CMakeLists.txt'
--- src/functions/CMakeLists.txt 2012-04-24 12:39:38 +0000
+++ src/functions/CMakeLists.txt 2012-04-24 20:57:30 +0000
@@ -83,3 +83,7 @@
func_apply.cpp
func_serialize_impl.cpp
)
+
+IF (NOT ZORBA_NO_FULL_TEXT)
+ LIST(APPEND FUNCTIONS_SRCS func_ft_module_impl.cpp)
+ENDIF (NOT ZORBA_NO_FULL_TEXT)
=== modified file 'src/functions/external_function.cpp'
--- src/functions/external_function.cpp 2012-04-24 12:39:38 +0000
+++ src/functions/external_function.cpp 2012-04-24 20:57:30 +0000
@@ -45,12 +45,12 @@
:
function(sig, FunctionConsts::FN_UNKNOWN),
theLoc(loc),
- theModuleSctx(modSctx),
theNamespace(ns),
theScriptingKind(scriptingType),
theImpl(impl)
{
resetFlag(FunctionConsts::isBuiltin);
+ theModuleSctx = modSctx;
}
@@ -62,7 +62,6 @@
zorba::serialization::serialize_baseclass(ar, (function*)this);
ar & theLoc;
- ar & theModuleSctx;
ar & theNamespace;
ar & theScriptingKind;
=== added file 'src/functions/func_ft_module_impl.cpp'
--- src/functions/func_ft_module_impl.cpp 1970-01-01 00:00:00 +0000
+++ src/functions/func_ft_module_impl.cpp 2012-04-24 20:57:30 +0000
@@ -0,0 +1,110 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include "stdafx.h"
+
+#include "functions/func_ft_module.h"
+
+#include "runtime/full_text/ft_module.h"
+
+#define FT_MODULE_NS "http://www.zorba-xquery.com/modules/full-text"
+
+namespace zorba {
+
+///////////////////////////////////////////////////////////////////////////////
+
+void populate_context_ft_module_impl( static_context *sctx ) {
+
+ xqtref_t tokenize_return_type =
+ GENV_TYPESYSTEM.create_node_type(
+ store::StoreConsts::elementNode,
+ createQName( FT_MODULE_NS, "", "token" ),
+ NULL,
+ TypeConstants::QUANT_STAR,
+ false,
+ false
+ );
+ {
+ DECL_WITH_KIND( sctx, full_text_tokenize,
+ (createQName( FT_MODULE_NS, "", "tokenize"),
+ GENV_TYPESYSTEM.ANY_NODE_TYPE_ONE,
+ tokenize_return_type),
+ FunctionConsts::FULL_TEXT_TOKENIZE_1
+ );
+ }
+ {
+ DECL_WITH_KIND( sctx, full_text_tokenize,
+ (createQName( FT_MODULE_NS, "", "tokenize"),
+ GENV_TYPESYSTEM.ANY_NODE_TYPE_ONE,
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ tokenize_return_type),
+ FunctionConsts::FULL_TEXT_TOKENIZE_2
+ );
+ }
+
+ xqtref_t tokenizer_properties_return_type =
+ GENV_TYPESYSTEM.create_node_type(
+ store::StoreConsts::elementNode,
+ createQName( FT_MODULE_NS, "", "tokenizer-properties" ),
+ NULL,
+ TypeConstants::QUANT_ONE,
+ false,
+ false
+ );
+ {
+ DECL_WITH_KIND( sctx, full_text_tokenizer_properties,
+ (createQName( FT_MODULE_NS, "", "tokenizer-properties"),
+ tokenizer_properties_return_type),
+ FunctionConsts::FULL_TEXT_TOKENIZER_PROPERTIES_0
+ );
+ }
+ {
+ DECL_WITH_KIND( sctx, full_text_tokenizer_properties,
+ (createQName( FT_MODULE_NS, "", "tokenizer-properties"),
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ tokenizer_properties_return_type),
+ FunctionConsts::FULL_TEXT_TOKENIZER_PROPERTIES_1
+ );
+ }
+
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+PlanIter_t full_text_tokenizer_properties::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new TokenizerPropertiesIterator(theModuleSctx, loc, argv);
+}
+
+
+PlanIter_t full_text_tokenize::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new TokenizeIterator(theModuleSctx, loc, argv);
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace zorba
+/* vim:set et sw=2 ts=2: */
=== modified file 'src/functions/function.cpp'
--- src/functions/function.cpp 2012-04-24 12:39:38 +0000
+++ src/functions/function.cpp 2012-04-24 20:57:30 +0000
@@ -43,6 +43,7 @@
theSignature(sig),
theKind(kind),
theFlags(0),
+ theModuleSctx(NULL),
theXQueryVersion(StaticContextConsts::xquery_version_1_0)
{
setFlag(FunctionConsts::isBuiltin);
@@ -70,6 +71,7 @@
SERIALIZE_ENUM(FunctionConsts::FunctionKind, theKind);
ar & theFlags;
ar & theAnnotationList;
+ ar & theModuleSctx;
SERIALIZE_ENUM(StaticContextConsts::xquery_version_t, theXQueryVersion);
}
@@ -92,6 +94,7 @@
return n == VARIADIC_SIG_SIZE || argv.size() == n;
}
+
/*******************************************************************************
********************************************************************************/
=== modified file 'src/functions/function.h'
--- src/functions/function.h 2012-04-24 12:39:38 +0000
+++ src/functions/function.h 2012-04-24 20:57:30 +0000
@@ -42,7 +42,10 @@
/*******************************************************************************
-
+ theModuleSctx:
+ --------------
+ The root sctx of the module containing the declaration. It is NULL for
+ functions that must be executed in the static context of the caller.
********************************************************************************/
class function : public SimpleRCObject
{
@@ -51,6 +54,7 @@
FunctionConsts::FunctionKind theKind;
uint32_t theFlags;
AnnotationList_t theAnnotationList;
+ static_context * theModuleSctx;
StaticContextConsts::xquery_version_t theXQueryVersion;
@@ -89,6 +93,10 @@
bool isVariadic() const { return theSignature.isVariadic(); }
+ static_context* getStaticContext() const { return theModuleSctx; }
+
+ void setStaticContext(static_context* sctx) { theModuleSctx = sctx; }
+
void setFlag(FunctionConsts::AnnotationFlags flag)
{
theFlags |= flag;
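The new theModuleSctx member lets a function carry the root static context of the module that declared it. NULL means the function runs in the caller's context. This appears to be what the fix for bug 928631 (external builtin functions not executed in their declaring module) relies on. The reduced sketch below shows that selection rule using hypothetical StaticContext and Function stand-ins.

```cpp
#include <cassert>

// Hypothetical stand-in for zorba::static_context.
struct StaticContext {};

// Hypothetical reduced view of zorba::function and its new member.
struct Function {
  StaticContext* module_sctx = nullptr;  // NULL: use the caller's context
};

// Picks the static context a call should be evaluated in.
StaticContext* effective_sctx(const Function& f, StaticContext* caller_sctx) {
  return f.module_sctx ? f.module_sctx : caller_sctx;
}
```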
=== modified file 'src/functions/library.cpp'
--- src/functions/library.cpp 2012-04-24 12:39:38 +0000
+++ src/functions/library.cpp 2012-04-24 20:57:30 +0000
@@ -68,6 +68,10 @@
#include "functions/func_reflection.h"
#include "functions/func_apply.h"
#include "functions/func_fetch.h"
+#ifndef ZORBA_NO_FULL_TEXT
+#include "functions/func_ft_module.h"
+#include "runtime/full_text/ft_module_impl.h"
+#endif /* ZORBA_NO_FULL_TEXT */
#include "functions/func_function_item_iter.h"
@@ -144,6 +148,10 @@
populate_context_apply(sctx);
populate_context_fetch(sctx);
+#ifndef ZORBA_NO_FULL_TEXT
+ populate_context_ft_module(sctx);
+ populate_context_ft_module_impl(sctx);
+#endif /* ZORBA_NO_FULL_TEXT */
ar.set_loading_hardcoded_objects(false);
}
=== added file 'src/functions/pregenerated/func_ft_module.cpp'
--- src/functions/pregenerated/func_ft_module.cpp 1970-01-01 00:00:00 +0000
+++ src/functions/pregenerated/func_ft_module.cpp 2012-04-24 20:57:30 +0000
@@ -0,0 +1,496 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// ******************************************
+// * *
+// * THIS IS A GENERATED FILE. DO NOT EDIT! *
+// * SEE .xml FILE WITH SAME NAME *
+// * *
+// ******************************************
+
+
+#include "stdafx.h"
+#include "runtime/full_text/ft_module.h"
+#include "functions/func_ft_module.h"
+
+
+namespace zorba{
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_current_lang::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new CurrentLangIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_host_lang::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new HostLangIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_is_stem_lang_supported::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new IsStemLangSupportedIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_is_stop_word::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new IsStopWordIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_is_stop_word_lang_supported::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new IsStopWordLangSupportedIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_is_thesaurus_lang_supported::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new IsThesaurusLangSupportedIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_is_tokenizer_lang_supported::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new IsTokenizerLangSupportedIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_stem::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new StemIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_strip_diacritics::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new StripDiacriticsIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_thesaurus_lookup::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new ThesaurusLookupIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_tokenize_string::codegen(
+ CompilerCB*,
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& argv,
+ expr& ann) const
+{
+ return new TokenizeStringIterator(sctx, loc, argv);
+}
+
+#endif
+
+void populate_context_ft_module(static_context* sctx)
+{
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_current_lang,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","current-lang"),
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE),
+ FunctionConsts::FULL_TEXT_CURRENT_LANG_0);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_host_lang,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","host-lang"),
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE),
+ FunctionConsts::FULL_TEXT_HOST_LANG_0);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_is_stem_lang_supported,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","is-stem-lang-supported"),
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+ FunctionConsts::FULL_TEXT_IS_STEM_LANG_SUPPORTED_1);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_is_stop_word,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","is-stop-word"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+ FunctionConsts::FULL_TEXT_IS_STOP_WORD_1);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_is_stop_word,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","is-stop-word"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+ FunctionConsts::FULL_TEXT_IS_STOP_WORD_2);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_is_stop_word_lang_supported,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","is-stop-word-lang-supported"),
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+ FunctionConsts::FULL_TEXT_IS_STOP_WORD_LANG_SUPPORTED_1);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_is_thesaurus_lang_supported,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","is-thesaurus-lang-supported"),
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+ FunctionConsts::FULL_TEXT_IS_THESAURUS_LANG_SUPPORTED_1);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_is_thesaurus_lang_supported,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","is-thesaurus-lang-supported"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+ FunctionConsts::FULL_TEXT_IS_THESAURUS_LANG_SUPPORTED_2);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_is_tokenizer_lang_supported,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","is-tokenizer-lang-supported"),
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+ FunctionConsts::FULL_TEXT_IS_TOKENIZER_LANG_SUPPORTED_1);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_stem,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","stem"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_ONE),
+ FunctionConsts::FULL_TEXT_STEM_1);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_stem,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","stem"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_ONE),
+ FunctionConsts::FULL_TEXT_STEM_2);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_strip_diacritics,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","strip-diacritics"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_ONE),
+ FunctionConsts::FULL_TEXT_STRIP_DIACRITICS_1);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_thesaurus_lookup,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","thesaurus-lookup"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_PLUS),
+ FunctionConsts::FULL_TEXT_THESAURUS_LOOKUP_1);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_thesaurus_lookup,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","thesaurus-lookup"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_PLUS),
+ FunctionConsts::FULL_TEXT_THESAURUS_LOOKUP_2);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_thesaurus_lookup,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","thesaurus-lookup"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_PLUS),
+ FunctionConsts::FULL_TEXT_THESAURUS_LOOKUP_3);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_thesaurus_lookup,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","thesaurus-lookup"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_PLUS),
+ FunctionConsts::FULL_TEXT_THESAURUS_LOOKUP_4);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_thesaurus_lookup,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","thesaurus-lookup"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.INTEGER_TYPE_ONE,
+ GENV_TYPESYSTEM.INTEGER_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_PLUS),
+ FunctionConsts::FULL_TEXT_THESAURUS_LOOKUP_6);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_tokenize_string,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","tokenize-string"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_STAR),
+ FunctionConsts::FULL_TEXT_TOKENIZE_STRING_1);
+
+ }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+ {
+
+
+ DECL_WITH_KIND(sctx, full_text_tokenize_string,
+ (createQName("http://www.zorba-xquery.com/modules/full-text","","tokenize-string"),
+ GENV_TYPESYSTEM.STRING_TYPE_ONE,
+ GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+ GENV_TYPESYSTEM.STRING_TYPE_STAR),
+ FunctionConsts::FULL_TEXT_TOKENIZE_STRING_2);
+
+ }
+
+
+#endif
+}
+
+
+}
+
+
+
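The five `thesaurus-lookup` registrations above differ only in arity (1, 2, 3, 4 and 6 arguments). `ThesaurusLookupIterator::nextImpl`, later in this patch, maps those positional arguments onto lookup parameters. When arguments are omitted it uses these defaults:

- URI `##default`
- the static-context language, falling back to the host language
- an unset relationship
- levels from 0 up to the maximum of `internal::Thesaurus::level_type`

The following is a minimal standalone sketch of that mapping, not Zorba code. The `LookupArgs` struct and the `bind_args` function are hypothetical. Passing every argument as a `std::string` is a simplification for brevity.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the values ThesaurusLookupIterator collects.
struct LookupArgs {
  std::string uri = "##default";   // default thesaurus URI
  std::string phrase;
  std::string lang;                // context language unless given explicitly
  std::string relationship;        // empty: no relationship restriction
  std::int64_t at_least = 0;
  std::int64_t at_most = std::numeric_limits<std::int64_t>::max();
};

// Binds positional arguments following the layout of the arity-1, 2, 3, 4
// and 6 signatures: ($phrase) or ($uri, $phrase [, $lang [, $relationship
// [, $at-least, $at-most]]]). Level bounds arrive as strings only to keep
// this sketch short.
LookupArgs bind_args(const std::vector<std::string>& argv,
                     const std::string& context_lang) {
  LookupArgs a;
  a.lang = context_lang;
  switch (argv.size()) {
    case 1:                        // ($phrase)
      a.phrase = argv[0];
      return a;
    case 6:                        // (..., $at-least, $at-most)
      a.at_least = std::stoll(argv[4]);
      a.at_most = std::stoll(argv[5]);
      [[fallthrough]];
    case 4:                        // (..., $relationship)
      a.relationship = argv[3];
      [[fallthrough]];
    case 3:                        // (..., $lang)
      a.lang = argv[2];
      [[fallthrough]];
    case 2:                        // ($uri, $phrase)
      a.uri = argv[0];
      a.phrase = argv[1];
      return a;
    default:                       // arity 0 and 5 are not registered
      throw std::invalid_argument("no thesaurus-lookup overload for this arity");
  }
}
```

Falling through from the longest signature to the shortest mirrors the nested `if ( theChildren.size() > N )` checks in the iterator. Each longer overload only adds trailing parameters.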
=== added file 'src/functions/pregenerated/func_ft_module.h'
--- src/functions/pregenerated/func_ft_module.h 1970-01-01 00:00:00 +0000
+++ src/functions/pregenerated/func_ft_module.h 2012-04-24 20:57:30 +0000
@@ -0,0 +1,259 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// ******************************************
+// * *
+// * THIS IS A GENERATED FILE. DO NOT EDIT! *
+// * SEE .xml FILE WITH SAME NAME *
+// * *
+// ******************************************
+
+
+#ifndef ZORBA_FUNCTIONS_FT_MODULE_H
+#define ZORBA_FUNCTIONS_FT_MODULE_H
+
+
+#include "common/shared_types.h"
+#include "functions/function_impl.h"
+
+
+namespace zorba {
+
+
+void populate_context_ft_module(static_context* sctx);
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:current-lang
+class full_text_current_lang : public function
+{
+public:
+ full_text_current_lang(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:host-lang
+class full_text_host_lang : public function
+{
+public:
+ full_text_host_lang(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:is-stem-lang-supported
+class full_text_is_stem_lang_supported : public function
+{
+public:
+ full_text_is_stem_lang_supported(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:is-stop-word
+class full_text_is_stop_word : public function
+{
+public:
+ full_text_is_stop_word(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:is-stop-word-lang-supported
+class full_text_is_stop_word_lang_supported : public function
+{
+public:
+ full_text_is_stop_word_lang_supported(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:is-thesaurus-lang-supported
+class full_text_is_thesaurus_lang_supported : public function
+{
+public:
+ full_text_is_thesaurus_lang_supported(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:is-tokenizer-lang-supported
+class full_text_is_tokenizer_lang_supported : public function
+{
+public:
+ full_text_is_tokenizer_lang_supported(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:stem
+class full_text_stem : public function
+{
+public:
+ full_text_stem(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:strip-diacritics
+class full_text_strip_diacritics : public function
+{
+public:
+ full_text_strip_diacritics(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:thesaurus-lookup
+class full_text_thesaurus_lookup : public function
+{
+public:
+ full_text_thesaurus_lookup(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:tokenize
+class full_text_tokenize : public function
+{
+public:
+ full_text_tokenize(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:tokenizer-properties
+class full_text_tokenizer_properties : public function
+{
+public:
+ full_text_tokenizer_properties(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ bool accessesDynCtx() const { return true; }
+
+ CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:tokenize-string
+class full_text_tokenize_string : public function
+{
+public:
+ full_text_tokenize_string(const signature& sig, FunctionConsts::FunctionKind kind)
+ :
+ function(sig, kind)
+ {
+
+ }
+
+ CODEGEN_DECL();
+};
+#endif
+
+
+} //namespace zorba
+
+
+#endif
+/*
+ * Local variables:
+ * mode: c++
+ * End:
+ */
=== modified file 'src/functions/pregenerated/function_enum.h'
--- src/functions/pregenerated/function_enum.h 2012-04-24 12:39:38 +0000
+++ src/functions/pregenerated/function_enum.h 2012-04-24 20:57:30 +0000
@@ -138,6 +138,29 @@
FN_ZORBA_FETCH_CONTENT_2,
FN_ZORBA_FETCH_CONTENT_TYPE_1,
FN_PUT_2,
+ FULL_TEXT_CURRENT_LANG_0,
+ FULL_TEXT_HOST_LANG_0,
+ FULL_TEXT_IS_STEM_LANG_SUPPORTED_1,
+ FULL_TEXT_IS_STOP_WORD_1,
+ FULL_TEXT_IS_STOP_WORD_2,
+ FULL_TEXT_IS_STOP_WORD_LANG_SUPPORTED_1,
+ FULL_TEXT_IS_THESAURUS_LANG_SUPPORTED_1,
+ FULL_TEXT_IS_THESAURUS_LANG_SUPPORTED_2,
+ FULL_TEXT_IS_TOKENIZER_LANG_SUPPORTED_1,
+ FULL_TEXT_STEM_1,
+ FULL_TEXT_STEM_2,
+ FULL_TEXT_STRIP_DIACRITICS_1,
+ FULL_TEXT_THESAURUS_LOOKUP_1,
+ FULL_TEXT_THESAURUS_LOOKUP_2,
+ FULL_TEXT_THESAURUS_LOOKUP_3,
+ FULL_TEXT_THESAURUS_LOOKUP_4,
+ FULL_TEXT_THESAURUS_LOOKUP_6,
+ FULL_TEXT_TOKENIZE_1,
+ FULL_TEXT_TOKENIZE_2,
+ FULL_TEXT_TOKENIZER_PROPERTIES_0,
+ FULL_TEXT_TOKENIZER_PROPERTIES_1,
+ FULL_TEXT_TOKENIZE_STRING_1,
+ FULL_TEXT_TOKENIZE_STRING_2,
FN_FUNCTION_NAME_1,
FN_FUNCTION_ARITY_1,
FN_PARTIAL_APPLY_2,
=== modified file 'src/runtime/full_text/CMakeLists.txt'
--- src/runtime/full_text/CMakeLists.txt 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/CMakeLists.txt 2012-04-24 20:57:30 +0000
@@ -13,6 +13,7 @@
# limitations under the License.
SET(FULLTEXT_SRCS
+ ft_util.cpp
ft_match.cpp
ft_query_item.cpp
ft_single_token_iterator.cpp
@@ -40,6 +41,7 @@
thesaurus.cpp
tokenizer.cpp
default_tokenizer.cpp
+ ft_module.cpp
)
IF (ZORBA_NO_ICU)
@@ -51,5 +53,5 @@
ADD_SRC_SUBFOLDER(FULLTEXT_SRCS stemmer LIBSTEMMER_SRCS)
IF (ZORBA_WITH_FILE_ACCESS)
- ADD_SRC_SUBFOLDER(FULLTEXT_SRCS thesauri THESAURUS_SRCS)
+ ADD_SRC_SUBFOLDER(FULLTEXT_SRCS thesauri THESAURUS_SRCS)
ENDIF (ZORBA_WITH_FILE_ACCESS)
=== modified file 'src/runtime/full_text/apply.cpp'
--- src/runtime/full_text/apply.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/apply.cpp 2012-04-24 20:57:30 +0000
@@ -26,13 +26,14 @@
#include "diagnostics/dict.h"
#include "diagnostics/xquery_diagnostics.h"
#include "store/api/item.h"
+#include "store/api/item_factory.h"
#include "store/api/store.h"
-#include "store/api/item_factory.h"
#include "system/globalenv.h"
#include "util/cxx_util.h"
#include "util/indent.h"
#include "util/stl_util.h"
#include "zorbamisc/ns_consts.h"
+#include "zorbautils/locale.h"
#ifndef NDEBUG
# include "system/properties.h"
@@ -1184,11 +1185,10 @@
{
}
- void operator()( char const *utf8_s, size_type utf8_len, size_type,
- size_type, size_type, void* ) {
- FTToken const t( utf8_s, (int)utf8_len, token_no_, lang_ );
- tokens_.push_back( t );
- }
+ // inherited
+ void item( Item const&, bool );
+ void token( char const*, size_type, iso639_1::type, size_type, size_type,
+ size_type, Item const* );
private:
FTTokenSeqIterator::FTTokens &tokens_;
@@ -1196,51 +1196,72 @@
iso639_1::type const lang_;
};
+void thesaurus_callback::item( Item const&, bool ) {
+ // out-of-line since it's virtual
+}
+
+void thesaurus_callback::token( char const *utf8_s, size_type utf8_len,
+ iso639_1::type, size_type, size_type,
+ size_type, Item const* ) {
+ FTToken const t( utf8_s, (int)utf8_len, token_no_, lang_ );
+ tokens_.push_back( t );
+}
+
} // anonymous namespace
void ftcontains_visitor::
-lookup_thesaurus( ftthesaurus_id const &tid, zstring const &query_phrase,
+lookup_thesaurus( ftthesaurus_id const &t_id, zstring const &query_phrase,
FTToken const &qt0, query_item_star_t &result ) {
ft_int at_least, at_most;
- if ( ftrange const *const levels = tid.get_levels() )
+ if ( ftrange const *const levels = t_id.get_levels() )
eval_ftrange( *levels, &at_least, &at_most );
else
at_least = 0, at_most = numeric_limits<ft_int>::max();
- zstring const &uri = tid.get_uri();
+ zstring const &uri = t_id.get_uri();
zstring error_msg;
auto_ptr<internal::Resource> rsrc = static_ctx_.resolve_uri(
- uri, internal::ThesaurusEntityData( qt0.lang() ), error_msg
+ uri, internal::EntityData::THESAURUS, error_msg
);
if ( !rsrc.get() )
throw XQUERY_EXCEPTION( err::FTST0018, ERROR_PARAMS( uri ) );
- internal::Thesaurus::ptr thesaurus(
- dynamic_cast<internal::Thesaurus*>( rsrc.release() )
- );
- if ( !thesaurus )
- throw XQUERY_EXCEPTION( err::FTST0018, ERROR_PARAMS( uri ) );
-
- internal::Thesaurus::iterator::ptr tresult(
+ internal::ThesaurusProvider const *const t_provider =
+ dynamic_cast<internal::ThesaurusProvider const*>( rsrc.get() );
+ ZORBA_ASSERT( t_provider );
+
+ internal::Thesaurus::ptr thesaurus;
+ if ( !t_provider->getThesaurus( qt0.lang(), &thesaurus ) )
+ throw XQUERY_EXCEPTION(
+ zerr::ZXQP8406_THESAURUS_LANG_NOT_SUPPORTED,
+ ERROR_PARAMS( iso639_1::string_of[ qt0.lang() ] )
+ );
+
+ internal::Thesaurus::iterator::ptr t_synonyms(
thesaurus->lookup(
- query_phrase, tid.get_relationship(), at_least, at_most
+ query_phrase, t_id.get_relationship(), at_least, at_most
)
);
- if ( !tresult )
+ if ( !t_synonyms )
return;
FTTokenSeqIterator::FTTokens synonyms;
thesaurus_callback cb( qt0.pos(), qt0.lang(), synonyms );
- Tokenizer::Numbers tno;
- Tokenizer::ptr tokenizer(
- GENV_STORE.getTokenizerProvider()->getTokenizer( qt0.lang(), tno )
- );
+ Tokenizer::Numbers t_num;
+ TokenizerProvider const *const provider = GENV_STORE.getTokenizerProvider();
+ ZORBA_ASSERT( provider );
+ Tokenizer::ptr tokenizer;
+ if ( !provider->getTokenizer( qt0.lang(), &t_num, &tokenizer ) )
+ throw XQUERY_EXCEPTION(
+ zerr::ZXQP8407_TOKENIZER_LANG_NOT_SUPPORTED,
+ ERROR_PARAMS( iso639_1::string_of[ qt0.lang() ] )
+ );
- for ( zstring synonym; tresult->next( &synonym ); ) {
+ for ( zstring synonym; t_synonyms->next( &synonym ); ) {
synonyms.clear();
- tokenizer->tokenize(
+ tokenizer->tokenize_string(
synonym.data(), synonym.size(), qt0.lang(), false, cb
);
query_item_t const query_item( new FTTokenSeqIterator( synonyms ) );
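The reworked `lookup_thesaurus` above no longer down-casts the resolved resource directly to a `Thesaurus`. Instead, it asks an `internal::ThesaurusProvider` for a language-specific thesaurus. When `getThesaurus()` returns false, it raises `ZXQP8406_THESAURUS_LANG_NOT_SUPPORTED`. The tokenizer lookup has the same shape and raises `ZXQP8407`. The new `is-thesaurus-lang-supported` iterator reuses the call with only the language argument as a yes/no probe.

The following is a minimal standalone sketch of that bool-returning provider pattern. `Provider`, `Thesaurus`, `get_thesaurus` and `lookup_or_throw` are hypothetical names. `std::runtime_error` stands in for `XQUERY_EXCEPTION`. The real provider interface is only known from the calls in this patch.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-language thesaurus: phrase -> synonyms.
struct Thesaurus {
  std::map<std::string, std::vector<std::string>> entries;
};

// Hypothetical provider mirroring getThesaurus(lang, &out): returns false
// when the language is not supported; the out-parameter is optional so the
// same call can serve as a pure "is supported?" probe.
class Provider {
 public:
  void add(const std::string& lang, std::shared_ptr<Thesaurus> t) {
    by_lang_[lang] = std::move(t);
  }
  bool get_thesaurus(const std::string& lang,
                     std::shared_ptr<Thesaurus>* out = nullptr) const {
    auto it = by_lang_.find(lang);
    if (it == by_lang_.end()) return false;
    if (out) *out = it->second;    // only fill the result when asked
    return true;
  }
 private:
  std::map<std::string, std::shared_ptr<Thesaurus>> by_lang_;
};

// Caller side: an unsupported language is an error (cf. ZXQP8406 above);
// an unknown phrase simply yields no synonyms.
std::vector<std::string> lookup_or_throw(const Provider& p,
                                         const std::string& lang,
                                         const std::string& phrase) {
  std::shared_ptr<Thesaurus> t;
  if (!p.get_thesaurus(lang, &t))
    throw std::runtime_error("thesaurus language not supported: " + lang);
  auto it = t->entries.find(phrase);
  return it == t->entries.end() ? std::vector<std::string>{} : it->second;
}
```

Separating "resolve the resource" from "get the thesaurus for a language" lets a single thesaurus URI serve several languages. It also lets callers distinguish an unknown URI (`FTST0018`) from an unsupported language.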
=== added file 'src/runtime/full_text/ft_module_impl.cpp'
--- src/runtime/full_text/ft_module_impl.cpp 1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/ft_module_impl.cpp 2012-04-24 20:57:30 +0000
@@ -0,0 +1,843 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <zorba/config.h>
+
+#ifndef ZORBA_NO_FULL_TEXT
+
+# include <limits>
+# include <typeinfo>
+
+# include <zorba/diagnostic_list.h>
+
+# include "api/unmarshaller.h"
+# include "context/namespace_context.h"
+# include "context/static_context.h"
+# include "diagnostics/assert.h"
+# include "diagnostics/xquery_diagnostics.h"
+# include "store/api/index.h"
+# include "store/api/item.h"
+# include "store/api/item_factory.h"
+# include "store/api/iterator.h"
+# include "store/api/store.h"
+# include "system/globalenv.h"
+# include "types/casting.h"
+# include "types/typeimpl.h"
+# include "types/typeops.h"
+# include "util/utf8_util.h"
+# include "zorbatypes/URI.h"
+# include "zorbautils/locale.h"
+
+# include "ft_stop_words_set.h"
+# include "ft_token_seq_iterator.h"
+# include "ft_util.h"
+# include "thesaurus.h"
+
+#endif /* ZORBA_NO_FULL_TEXT */
+
+#include "runtime/full_text/ft_module.h"
+
+using namespace std;
+using namespace zorba::locale;
+
+namespace zorba {
+
+///////////////////////////////////////////////////////////////////////////////
+
+#ifndef ZORBA_NO_FULL_TEXT
+inline iso639_1::type get_lang_from( static_context const *sctx ) {
+ iso639_1::type const lang = get_lang_from( sctx->get_match_options() );
+ return lang ? lang : get_host_lang();
+}
+
+static iso639_1::type get_lang_from( store::Item_t lang_item,
+ QueryLoc const &loc ) {
+ zstring lang_string;
+ lang_item->getStringValue2( lang_string );
+
+ if ( !GenericCast::instance()->castableToLanguage( lang_string ) )
+ throw XQUERY_EXCEPTION(
+ err::XPTY0004,
+ ERROR_PARAMS(
+ ZED( BadType_23o ), lang_string, ZED( NoCastTo_45o ), "xs:language"
+ ),
+ ERROR_LOC( loc )
+ );
+ if ( iso639_1::type const lang = find_lang( lang_string.c_str() ) )
+ return lang;
+ throw XQUERY_EXCEPTION(
+ err::FTST0009, ERROR_PARAMS( lang_string ), ERROR_LOC( loc )
+ );
+}
+#endif /* ZORBA_NO_FULL_TEXT */
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool CurrentLangIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ iso639_1::type const lang = get_lang_from( getStaticContext() );
+ zstring lang_string( iso639_1::string_of[ lang ] );
+
+ PlanIteratorState *state;
+ DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+ GENV_ITEMFACTORY->createLanguage( result, lang_string );
+ STACK_PUSH( true, state );
+
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool HostLangIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ iso639_1::type const lang = get_host_lang();
+ zstring lang_string = iso639_1::string_of[ lang ];
+
+ PlanIteratorState *state;
+ DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+ GENV_ITEMFACTORY->createLanguage( result, lang_string );
+ STACK_PUSH( true, state );
+
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool IsStemLangSupportedIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ bool is_supported;
+ store::Item_t item;
+
+ PlanIteratorState *state;
+ DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+ consumeNext( item, theChildren[0], plan_state );
+ try {
+ internal::StemmerProvider const *const provider =
+ GENV_STORE.getStemmerProvider();
+ is_supported = provider->getStemmer( get_lang_from( item, loc ) );
+ }
+ catch ( XQueryException const &e ) {
+ if ( e.diagnostic() != err::FTST0009 )
+ throw;
+ is_supported = false;
+ }
+
+ GENV_ITEMFACTORY->createBoolean( result, is_supported );
+ STACK_PUSH( true, state );
+
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool IsStopWordIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ store::Item_t item;
+ iso639_1::type lang;
+ ft_stop_words_set::ptr stop_words;
+ zstring word;
+
+ PlanIteratorState *state;
+ DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+ lang = get_lang_from( getStaticContext() );
+
+ consumeNext( item, theChildren[0], plan_state );
+ item->getStringValue2( word );
+
+ if ( theChildren.size() > 1 ) {
+ consumeNext( item, theChildren[1], plan_state );
+ lang = get_lang_from( item, loc );
+ }
+
+ stop_words.reset( ft_stop_words_set::get_default( lang ) );
+ if ( !stop_words )
+ throw XQUERY_EXCEPTION(
+ zerr::ZXQP8405_STOP_WORDS_LANG_NOT_SUPPORTED,
+ ERROR_PARAMS( lang ),
+ ERROR_LOC( loc )
+ );
+ GENV_ITEMFACTORY->createBoolean( result, stop_words->contains( word ) );
+ STACK_PUSH( true, state );
+
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool IsStopWordLangSupportedIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ bool is_supported;
+ store::Item_t item;
+
+ PlanIteratorState *state;
+ DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+ consumeNext( item, theChildren[0], plan_state );
+ try {
+ is_supported = ft_stop_words_set::get_default( get_lang_from( item, loc ) );
+ }
+ catch ( XQueryException const &e ) {
+ if ( e.diagnostic() != err::FTST0009 )
+ throw;
+ is_supported = false;
+ }
+
+ GENV_ITEMFACTORY->createBoolean( result, is_supported );
+ STACK_PUSH( true, state );
+
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool IsThesaurusLangSupportedIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ bool is_supported;
+ store::Item_t item;
+ zstring uri;
+
+ PlanIteratorState *state;
+ DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+ consumeNext( item, theChildren[0], plan_state );
+ if ( theChildren.size() > 1 ) {
+ item->getStringValue2( uri );
+ consumeNext( item, theChildren[1], plan_state );
+ } else {
+ uri = "##default";
+ }
+
+ try {
+ iso639_1::type const lang = get_lang_from( item, loc );
+ static_context const *const sctx = getStaticContext();
+
+ vector<zstring> comp_uris;
+ sctx->get_component_uris(
+ uri, internal::EntityData::THESAURUS, comp_uris
+ );
+ if ( comp_uris.size() != 1 )
+ throw XQUERY_EXCEPTION(
+ err::FTST0018, ERROR_PARAMS( uri ), ERROR_LOC( loc )
+ );
+
+ zstring error_msg;
+ auto_ptr<internal::Resource> rsrc = sctx->resolve_uri(
+ comp_uris.front(), internal::EntityData::THESAURUS, error_msg
+ );
+ if ( !rsrc.get() )
+ throw XQUERY_EXCEPTION(
+ err::FTST0018, ERROR_PARAMS( uri ), ERROR_LOC( loc )
+ );
+#if 0
+ if ( !error_msg.empty() )
+ cerr << "error_msg=" << error_msg << endl;
+#endif
+ internal::ThesaurusProvider const *const provider =
+ dynamic_cast<internal::ThesaurusProvider const*>( rsrc.get() );
+ ZORBA_ASSERT( provider );
+ is_supported = provider->getThesaurus( lang );
+ }
+ catch ( XQueryException const &e ) {
+ if ( e.diagnostic() != err::FTST0009 /* lang not supported by Zorba */ )
+ throw;
+ is_supported = false;
+ }
+
+ GENV_ITEMFACTORY->createBoolean( result, is_supported );
+ STACK_PUSH( true, state );
+
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool IsTokenizerLangSupportedIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ bool is_supported;
+ store::Item_t item;
+
+ PlanIteratorState *state;
+ DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+ consumeNext( item, theChildren[0], plan_state );
+ try {
+ TokenizerProvider const *const p = GENV_STORE.getTokenizerProvider();
+ is_supported = p && p->getTokenizer( get_lang_from( item, loc ) );
+ }
+ catch ( XQueryException const &e ) {
+ if ( e.diagnostic() != err::FTST0009 )
+ throw;
+ is_supported = false;
+ }
+
+ GENV_ITEMFACTORY->createBoolean( result, is_supported );
+ STACK_PUSH( true, state );
+
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool StemIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ store::Item_t item;
+ iso639_1::type lang;
+ internal::StemmerProvider const *provider;
+ internal::Stemmer::ptr stemmer;
+ zstring word, stem;
+
+ PlanIteratorState *state;
+ DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+ lang = get_lang_from( getStaticContext() );
+
+ consumeNext( item, theChildren[0], plan_state );
+ item->getStringValue2( word );
+ utf8::to_lower( word );
+
+ if ( theChildren.size() > 1 ) {
+ consumeNext( item, theChildren[1], plan_state );
+ lang = get_lang_from( item, loc );
+ }
+
+ // TODO: why is this always the default StemmerProvider?
+ provider = GENV_STORE.getStemmerProvider();
+ ZORBA_ASSERT( provider );
+ if ( provider->getStemmer( lang, &stemmer ) ) {
+ stemmer->stem( word, lang, &stem );
+ GENV_ITEMFACTORY->createString( result, stem );
+ STACK_PUSH( true, state );
+ } else {
+ throw XQUERY_EXCEPTION(
+ zerr::ZXQP8404_STEM_LANG_NOT_SUPPORTED,
+ ERROR_PARAMS( lang ),
+ ERROR_LOC( loc )
+ );
+ }
+
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool StripDiacriticsIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ store::Item_t item;
+ zstring phrase, stripped_phrase;
+
+ PlanIteratorState *state;
+ DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+ consumeNext( item, theChildren[0], plan_state );
+ item->getStringValue2( phrase );
+ utf8::strip_diacritics( phrase, &stripped_phrase );
+ GENV_ITEMFACTORY->createString( result, stripped_phrase );
+ STACK_PUSH( true, state );
+
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool ThesaurusLookupIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ vector<zstring> comp_uris;
+ zstring error_msg;
+ store::Item_t item;
+ iso639_1::type lang;
+ auto_ptr<internal::Resource> rsrc;
+ zstring uri = "##default";
+ static_context const *sctx;
+ zstring synonym;
+ internal::ThesaurusProvider const *provider;
+
+ ThesaurusLookupIteratorState *state;
+ DEFAULT_STACK_INIT( ThesaurusLookupIteratorState, state, plan_state );
+
+ sctx = getStaticContext();
+ lang = get_lang_from( sctx );
+ state->at_least_ = 0;
+ state->at_most_ = numeric_limits<internal::Thesaurus::level_type>::max();
+
+ if ( theChildren.size() == 1 ) {
+ consumeNext( item, theChildren[0], plan_state );
+ item->getStringValue2( state->phrase_ );
+ } else if ( theChildren.size() > 1 ) {
+ consumeNext( item, theChildren[0], plan_state );
+ item->getStringValue2( uri );
+ consumeNext( item, theChildren[1], plan_state );
+ item->getStringValue2( state->phrase_ );
+ if ( theChildren.size() > 2 ) {
+ consumeNext( item, theChildren[2], plan_state );
+ lang = get_lang_from( item, loc );
+ if ( theChildren.size() > 3 ) {
+ consumeNext( item, theChildren[3], plan_state );
+ item->getStringValue2( state->relationship_ );
+ if ( theChildren.size() > 4 ) {
+ ZORBA_ASSERT( theChildren.size() == 6 );
+ consumeNext( item, theChildren[4], plan_state );
+ state->at_least_ = to_ft_int( item->getIntegerValue() );
+ consumeNext( item, theChildren[5], plan_state );
+ state->at_most_ = to_ft_int( item->getIntegerValue() );
+ }
+ }
+ }
+ }
+
+ sctx->get_component_uris(
+ uri, internal::EntityData::THESAURUS, comp_uris
+ );
+ if ( comp_uris.size() != 1 )
+ throw XQUERY_EXCEPTION(
+ err::FTST0018, ERROR_PARAMS( uri ), ERROR_LOC( loc )
+ );
+
+ rsrc = sctx->resolve_uri(
+ comp_uris.front(), internal::EntityData::THESAURUS, error_msg
+ );
+ if ( !rsrc.get() )
+ throw XQUERY_EXCEPTION(
+ err::FTST0018, ERROR_PARAMS( uri ), ERROR_LOC( loc )
+ );
+
+ provider = dynamic_cast<internal::ThesaurusProvider const*>( rsrc.get() );
+ ZORBA_ASSERT( provider );
+ if ( !provider->getThesaurus( lang, &state->thesaurus_ ) )
+ throw XQUERY_EXCEPTION(
+ zerr::ZXQP8406_THESAURUS_LANG_NOT_SUPPORTED,
+ ERROR_PARAMS( lang ),
+ ERROR_LOC( loc )
+ );
+
+ state->tresult_ = std::move(
+ state->thesaurus_->lookup(
+ state->phrase_, state->relationship_, state->at_least_, state->at_most_
+ )
+ );
+ ZORBA_ASSERT( state->tresult_.get() );
+
+ while ( state->tresult_->next( &synonym ) ) {
+ GENV_ITEMFACTORY->createString( result, synonym );
+ STACK_PUSH( true, state );
+ }
+
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+void ThesaurusLookupIterator::resetImpl( PlanState &plan_state ) const {
+#ifndef ZORBA_NO_FULL_TEXT
+ NaryBaseIterator<ThesaurusLookupIterator,ThesaurusLookupIteratorState>::
+ resetImpl( plan_state );
+ ThesaurusLookupIteratorState *const state =
+ StateTraitsImpl<ThesaurusLookupIteratorState>::getState(
+ plan_state, this->theStateOffset
+ );
+ state->tresult_ = std::move(
+ state->thesaurus_->lookup(
+ state->phrase_, state->relationship_, state->at_least_, state->at_most_
+ )
+ );
+ ZORBA_ASSERT( state->tresult_.get() );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool TokenizeIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ store::Item_t attr_name, attr_node;
+ zstring base_uri;
+ store::Item_t item;
+ iso639_1::type lang;
+ Tokenizer::Numbers no;
+ store::NsBindings const ns_bindings;
+ static_context const *sctx;
+ TokenizerProvider const *tokenizer_provider;
+ store::Item_t type_name;
+ zstring value_string;
+
+ sctx = getStaticContext();
+
+ TokenizeIteratorState *state;
+ DEFAULT_STACK_INIT( TokenizeIteratorState, state, plan_state );
+
+ lang = get_lang_from( sctx );
+
+ if ( consumeNext( state->doc_item_, theChildren[0], plan_state ) ) {
+ if ( theChildren.size() > 1 ) {
+ consumeNext( item, theChildren[1], plan_state );
+ lang = get_lang_from( item, loc );
+ }
+
+ tokenizer_provider = GENV_STORE.getTokenizerProvider();
+ state->doc_tokens_ =
+ state->doc_item_->getTokens( *tokenizer_provider, no, lang );
+
+ while ( state->doc_tokens_->hasNext() ) {
+ FTToken const *token;
+ token = state->doc_tokens_->next();
+ ZORBA_ASSERT( token );
+
+ if ( state->token_qname_.isNull() )
+ GENV_ITEMFACTORY->createQName(
+ state->token_qname_, static_context::ZORBA_FULL_TEXT_FN_NS, "",
+ "token"
+ );
+
+ base_uri = static_context::ZORBA_FULL_TEXT_FN_NS;
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ GENV_ITEMFACTORY->createElementNode(
+ result, nullptr, state->token_qname_, type_name, false, false,
+ ns_bindings, base_uri
+ );
+
+ if ( token->lang() ) {
+ value_string = iso639_1::string_of[ token->lang() ];
+ GENV_ITEMFACTORY->createQName( attr_name, "", "", "lang" );
+ GENV_ITEMFACTORY->createString( item, value_string );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ GENV_ITEMFACTORY->createAttributeNode(
+ attr_node, result, attr_name, type_name, item
+ );
+ }
+
+ ztd::to_string( token->para(), &value_string );
+ GENV_ITEMFACTORY->createQName( attr_name, "", "", "paragraph" );
+ GENV_ITEMFACTORY->createString( item, value_string );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ GENV_ITEMFACTORY->createAttributeNode(
+ attr_node, result, attr_name, type_name, item
+ );
+
+ ztd::to_string( token->sent(), &value_string );
+ GENV_ITEMFACTORY->createQName( attr_name, "", "", "sentence" );
+ GENV_ITEMFACTORY->createString( item, value_string );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ GENV_ITEMFACTORY->createAttributeNode(
+ attr_node, result, attr_name, type_name, item
+ );
+
+ value_string = token->value();
+ GENV_ITEMFACTORY->createQName( attr_name, "", "", "value" );
+ GENV_ITEMFACTORY->createString( item, value_string );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ GENV_ITEMFACTORY->createAttributeNode(
+ attr_node, result, attr_name, type_name, item
+ );
+
+ if ( store::Item const *const token_item = token->item() ) {
+ if ( GENV_STORE.getNodeReference( item, token_item ) ) {
+ item->getStringValue2( value_string );
+ GENV_ITEMFACTORY->createQName( attr_name, "", "", "node-ref" );
+ GENV_ITEMFACTORY->createString( item, value_string );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ GENV_ITEMFACTORY->createAttributeNode(
+ attr_node, result, attr_name, type_name, item
+ );
+ }
+ }
+
+#ifndef ZORBA_NO_XMLSCHEMA
+ sctx->validate( result, result, StaticContextConsts::strict_validation );
+#endif /* ZORBA_NO_XMLSCHEMA */
+
+ STACK_PUSH( true, state );
+ } // while
+ }
+
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+void TokenizeIterator::resetImpl( PlanState &plan_state ) const {
+#ifndef ZORBA_NO_FULL_TEXT
+ NaryBaseIterator<TokenizeIterator,TokenizeIteratorState>::
+ resetImpl( plan_state );
+ TokenizeIteratorState *const state =
+ StateTraitsImpl<TokenizeIteratorState>::getState(
+ plan_state, this->theStateOffset
+ );
+ state->doc_tokens_->reset();
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool TokenizerPropertiesIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ store::Item_t element, item, junk, name;
+ zstring base_uri;
+ iso639_1::type lang;
+ Tokenizer::Numbers no;
+ store::NsBindings const ns_bindings;
+ static_context const *sctx;
+ Tokenizer::ptr tokenizer;
+ store::Item_t type_name;
+ Tokenizer::Properties props;
+ TokenizerProvider const *tokenizer_provider;
+ zstring value_string;
+
+ PlanIteratorState *state;
+ DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+ sctx = getStaticContext();
+ lang = get_lang_from( getStaticContext() );
+
+ if ( theChildren.size() > 0 ) {
+ consumeNext( item, theChildren[0], plan_state );
+ lang = get_lang_from( item, loc );
+ }
+
+ tokenizer_provider = GENV_STORE.getTokenizerProvider();
+ ZORBA_ASSERT( tokenizer_provider );
+ if ( !tokenizer_provider->getTokenizer( lang, &no, &tokenizer ) )
+ throw XQUERY_EXCEPTION(
+ zerr::ZXQP8407_TOKENIZER_LANG_NOT_SUPPORTED,
+ ERROR_PARAMS( iso639_1::string_of[ lang ] )
+ );
+ tokenizer->properties( &props );
+
+ GENV_ITEMFACTORY->createQName(
+ name, static_context::ZORBA_FULL_TEXT_FN_NS, "", "tokenizer-properties"
+ );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+ GENV_ITEMFACTORY->createElementNode(
+ result, nullptr, name, type_name, false, false, ns_bindings, base_uri
+ );
+
+ // uri="..."
+ GENV_ITEMFACTORY->createQName( name, "", "", "uri" );
+ GENV_ITEMFACTORY->createAnyURI( item, props.uri );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+ GENV_ITEMFACTORY->createAttributeNode( junk, result, name, type_name, item );
+
+ // <comments-separate-tokens value="..."/>
+ GENV_ITEMFACTORY->createQName(
+ name, static_context::ZORBA_FULL_TEXT_FN_NS, "", "comments-separate-tokens"
+ );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+ GENV_ITEMFACTORY->createElementNode(
+ element, result, name, type_name, false, false, ns_bindings, base_uri
+ );
+ GENV_ITEMFACTORY->createQName( name, "", "", "value" );
+ GENV_ITEMFACTORY->createBoolean( item, props.comments_separate_tokens );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+ GENV_ITEMFACTORY->createAttributeNode( junk, element, name, type_name, item );
+
+ // <elements-separate-tokens value="..."/>
+ GENV_ITEMFACTORY->createQName(
+ name, static_context::ZORBA_FULL_TEXT_FN_NS, "", "elements-separate-tokens"
+ );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+ GENV_ITEMFACTORY->createElementNode(
+ element, result, name, type_name, false, false, ns_bindings, base_uri
+ );
+ GENV_ITEMFACTORY->createQName( name, "", "", "value" );
+ GENV_ITEMFACTORY->createBoolean( item, props.elements_separate_tokens );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+ GENV_ITEMFACTORY->createAttributeNode( junk, element, name, type_name, item );
+
+ // <processing-instructions-separate-tokens value="..."/>
+ GENV_ITEMFACTORY->createQName(
+ name, static_context::ZORBA_FULL_TEXT_FN_NS, "",
+ "processing-instructions-separate-tokens"
+ );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+ GENV_ITEMFACTORY->createElementNode(
+ element, result, name, type_name, false, false, ns_bindings, base_uri
+ );
+ GENV_ITEMFACTORY->createQName( name, "", "", "value" );
+ GENV_ITEMFACTORY->createBoolean( item, props.processing_instructions_separate_tokens );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+ GENV_ITEMFACTORY->createAttributeNode( junk, element, name, type_name, item );
+
+ // <supported-languages>...</supported-languages>
+ GENV_ITEMFACTORY->createQName(
+ name, static_context::ZORBA_FULL_TEXT_FN_NS, "", "supported-languages"
+ );
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+ GENV_ITEMFACTORY->createElementNode(
+ element, result, name, type_name, false, false, ns_bindings, base_uri
+ );
+
+ // <lang>...</lang>
+ FOR_EACH( Tokenizer::Properties::languages_type, i, props.languages ) {
+ store::Item_t lang_element;
+ type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+ GENV_ITEMFACTORY->createQName(
+ name, static_context::ZORBA_FULL_TEXT_FN_NS, "", "lang"
+ );
+ GENV_ITEMFACTORY->createElementNode(
+ lang_element, element, name, type_name, false, false, ns_bindings,
+ base_uri
+ );
+ value_string = iso639_1::string_of[ *i ];
+ GENV_ITEMFACTORY->createTextNode( junk, lang_element.getp(), value_string );
+ }
+
+#ifndef ZORBA_NO_XMLSCHEMA
+ sctx->validate( result, result, StaticContextConsts::strict_validation );
+#endif /* ZORBA_NO_XMLSCHEMA */
+
+ STACK_PUSH( true, state );
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+#ifndef ZORBA_NO_FULL_TEXT
+struct TokenizeStringIteratorCallback : Tokenizer::Callback {
+ void token( char const*, size_type, iso639_1::type, size_type, size_type,
+ size_type, Item const* );
+
+ FTTokenSeqIterator::FTTokens tokens_;
+};
+
+void TokenizeStringIteratorCallback::
+token( char const *utf8_s, size_type utf8_len, iso639_1::type lang,
+ size_type token_no, size_type sent_no, size_type para_no,
+ Item const *item ) {
+ store::Item const *const store_item =
+ item ? Unmarshaller::getInternalItem( *item ) : nullptr;
+
+ FTToken const token(
+ utf8_s, utf8_len, token_no, sent_no, para_no, store_item, lang
+ );
+ tokens_.push_back( token );
+}
+#endif /* ZORBA_NO_FULL_TEXT */
+
+bool TokenizeStringIterator::nextImpl( store::Item_t &result,
+ PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+ return false;
+#else
+ store::Item_t item;
+ iso639_1::type lang;
+ zstring value_string;
+
+ TokenizeStringIteratorState *state;
+ DEFAULT_STACK_INIT( TokenizeStringIteratorState, state, plan_state );
+
+ lang = get_lang_from( getStaticContext() );
+
+ if ( consumeNext( item, theChildren[0], plan_state ) ) {
+ item->getStringValue2( value_string );
+ if ( theChildren.size() > 1 ) {
+ consumeNext( item, theChildren[1], plan_state );
+ lang = get_lang_from( item, loc );
+ }
+
+ { // local scope
+ TokenizerProvider const *const tokenizer_provider =
+ GENV_STORE.getTokenizerProvider();
+ ZORBA_ASSERT( tokenizer_provider );
+ Tokenizer::Numbers no;
+ Tokenizer::ptr tokenizer;
+ if ( !tokenizer_provider->getTokenizer( lang, &no, &tokenizer ) )
+ throw XQUERY_EXCEPTION(
+ zerr::ZXQP8407_TOKENIZER_LANG_NOT_SUPPORTED,
+ ERROR_PARAMS( iso639_1::string_of[ lang ] )
+ );
+
+ TokenizeStringIteratorCallback callback;
+ tokenizer->tokenize_string(
+ value_string.data(), value_string.size(), lang, false, callback
+ );
+ state->string_tokens_.take( callback.tokens_ );
+ } // local scope
+
+ while ( state->string_tokens_.hasNext() ) {
+ FTToken const *token;
+ token = state->string_tokens_.next();
+ ZORBA_ASSERT( token );
+ value_string = token->value();
+ GENV_ITEMFACTORY->createString( result, value_string );
+ STACK_PUSH( true, state );
+ }
+ }
+
+ STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+void TokenizeStringIterator::resetImpl( PlanState &plan_state ) const {
+#ifndef ZORBA_NO_FULL_TEXT
+ NaryBaseIterator<TokenizeStringIterator,TokenizeStringIteratorState>::
+ resetImpl( plan_state );
+ TokenizeStringIteratorState *const state =
+ StateTraitsImpl<TokenizeStringIteratorState>::getState(
+ plan_state, this->theStateOffset
+ );
+ state->string_tokens_.reset();
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace zorba
+/* vim:set et sw=2 ts=2: */
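TokenizeStringIterator passes a callback object to tokenize_string(), lets the callback collect one FTToken per reported token, and then moves the collected vector into its state to iterate over it. The sketch below shows only that collect-then-iterate pattern. It uses a simplified `Callback` interface and a whitespace-only `tokenize_string()`, both of which are hypothetical stand-ins rather than Zorba's actual Tokenizer API:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for Zorba's Tokenizer::Callback.
struct Callback {
  virtual ~Callback() { }
  virtual void token( char const *s, std::size_t len, std::size_t token_no ) = 0;
};

// Collects every reported token into a vector, much as
// TokenizeStringIteratorCallback collects FTToken objects.
struct CollectingCallback : Callback {
  std::vector<std::string> tokens_;
  void token( char const *s, std::size_t len, std::size_t ) override {
    tokens_.push_back( std::string( s, len ) );
  }
};

// Hypothetical tokenizer: splits on whitespace only and numbers the tokens.
void tokenize_string( std::string const &s, Callback &callback ) {
  std::size_t token_no = 0, i = 0;
  while ( i < s.size() ) {
    while ( i < s.size() && std::isspace( static_cast<unsigned char>( s[i] ) ) )
      ++i;                                    // skip separators
    std::size_t const begin = i;
    while ( i < s.size() && !std::isspace( static_cast<unsigned char>( s[i] ) ) )
      ++i;                                    // scan one token
    if ( i > begin )
      callback.token( s.data() + begin, i - begin, token_no++ );
  }
}
```

Because the tokenizer only reports tokens through the callback, the same tokenizer can feed a vector (as here), an FTTokenSeqIterator, or any other consumer.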
=== added file 'src/runtime/full_text/ft_module_impl.h'
--- src/runtime/full_text/ft_module_impl.h 1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/ft_module_impl.h 2012-04-24 20:57:30 +0000
@@ -0,0 +1,32 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef ZORBA_FT_MODULE_IMPL_H
+#define ZORBA_FT_MODULE_IMPL_H
+
+namespace zorba {
+
+class static_context;
+
+///////////////////////////////////////////////////////////////////////////////
+
+void populate_context_ft_module_impl( static_context *sctx );
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace zorba
+#endif /* ZORBA_FT_MODULE_IMPL_H */
+/* vim:set et sw=2 ts=2: */
=== modified file 'src/runtime/full_text/ft_query_item.h'
--- src/runtime/full_text/ft_query_item.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_query_item.h 2012-04-24 20:57:30 +0000
@@ -18,6 +18,7 @@
#define ZORBA_FULL_TEXT_FT_QUERY_ITEM_H
#include <list>
+#include <vector>
#include "store/api/ft_token_iterator.h"
@@ -59,7 +60,7 @@
void reset();
private:
- typedef std::list<Mark_t> MarkSeq;
+ typedef std::vector<Mark_t> MarkSeq;
struct LocalMark : Mark {
MarkSeq marks_;
=== modified file 'src/runtime/full_text/ft_single_token_iterator.h'
--- src/runtime/full_text/ft_single_token_iterator.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_single_token_iterator.h 2012-04-24 20:57:30 +0000
@@ -17,8 +17,6 @@
#ifndef ZORBA_FULL_TEXT_SINGLE_TOKEN_ITERATOR_H
#define ZORBA_FULL_TEXT_SINGLE_TOKEN_ITERATOR_H
-#include <list>
-
#include "store/api/ft_token_iterator.h"
namespace zorba {
=== modified file 'src/runtime/full_text/ft_stop_words_set.cpp'
--- src/runtime/full_text/ft_stop_words_set.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_stop_words_set.cpp 2012-04-24 20:57:30 +0000
@@ -72,7 +72,7 @@
///////////////////////////////////////////////////////////////////////////////
-void ft_stop_words_set::apply_word( zstring const &word, set_t &word_set,
+void ft_stop_words_set::apply_word( zstring const &word, word_set_t &word_set,
ft_stop_words_unex::type mode ) {
// TODO: should "word" be converted to lower-case?
std::cout << "applying word " << word << std::endl;
@@ -87,33 +87,33 @@
}
void ft_stop_words_set::apply_word( char const *begin, char const *end,
- set_t &word_set,
+ word_set_t &word_set,
ft_stop_words_unex::type mode ) {
- set_t::value_type const word( begin, end - begin );
+ word_set_t::value_type const word( begin, end - begin );
apply_word( word, word_set, mode );
}
-ft_stop_words_set const*
+ft_stop_words_set::ptr
ft_stop_words_set::construct( ftstop_word_option const &option,
iso639_1::type lang,
static_context const& sctx ) {
bool must_delete = false;
- set_t *word_set = nullptr; // pointless init. to stifle warning
+ word_set_t *word_set = nullptr; // pointless init. to stifle warning
switch ( option.get_mode() ) {
case ft_stop_words_mode::with:
- word_set = new set_t;
+ word_set = new word_set_t;
must_delete = true;
break;
case ft_stop_words_mode::with_default:
word_set = get_default_word_set_for( lang );
if ( !word_set ) {
// TODO: throw exception?
- return 0;
+ return ptr();
}
break;
case ft_stop_words_mode::without:
- return 0;
+ return ptr();
}
FOR_EACH( ftstop_word_option::list_t, ftsw, option.get_stop_words() ) {
@@ -122,31 +122,30 @@
if ( !uri.empty() ) {
if ( !must_delete ) {
- word_set = new set_t( *word_set );
+ word_set = new word_set_t( *word_set );
must_delete = true;
}
zstring error_msg;
std::auto_ptr<internal::Resource> rsrc =
- sctx.resolve_uri(uri, internal::EntityData::STOP_WORDS, error_msg);
- internal::StreamResource* stream_rsrc =
- dynamic_cast<internal::StreamResource*>(rsrc.get());
+ sctx.resolve_uri( uri, internal::EntityData::STOP_WORDS, error_msg );
+ internal::StreamResource *const stream_rsrc =
+ dynamic_cast<internal::StreamResource*>( rsrc.get() );
if ( !stream_rsrc ) {
// Technically this should be thrown during static analysis.
- throw ZORBA_EXCEPTION(err::FTST0008, ERROR_PARAMS(uri));
+ throw ZORBA_EXCEPTION( err::FTST0008, ERROR_PARAMS( uri ) );
}
- std::istream* stream = stream_rsrc->getStream();
+ std::istream *const stream = stream_rsrc->getStream();
bool in_word = false;
zstring cur_word;
- cur_word.reserve(128);
+ cur_word.reserve( 128 );
char c;
- while (stream->good()) {
- stream->get(c);
+ while ( stream->good() ) {
+ stream->get( c );
// Have to check for EOF *after* attempting the read
- if (stream->eof()) {
+ if ( stream->eof() )
break;
- }
if ( is_word_char( c ) ) {
if ( !in_word ) {
cur_word.clear();
@@ -167,25 +166,31 @@
ftstop_words::list_t const &word_list = (*ftsw)->get_list();
if ( !word_list.empty() ) {
if ( !must_delete ) {
- word_set = new set_t( *word_set );
+ word_set = new word_set_t( *word_set );
must_delete = true;
}
FOR_EACH( ftstop_words::list_t, word, word_list )
apply_word( *word, *word_set, mode );
}
}
- return new ft_stop_words_set( word_set, must_delete );
-}
-
-ft_stop_words_set::set_t*
+ return ptr( new ft_stop_words_set( word_set, must_delete ) );
+}
+
+ft_stop_words_set const*
+ft_stop_words_set::get_default( iso639_1::type lang ) {
+ word_set_t const *const word_set = get_default_word_set_for( lang );
+ return word_set ? new ft_stop_words_set( word_set, false ) : nullptr;
+}
+
+ft_stop_words_set::word_set_t*
ft_stop_words_set::get_default_word_set_for( iso639_1::type lang ) {
- static set_t* cached_word_sets[ iso639_1::NUM_ENTRIES ];
+ static word_set_t *cached_word_sets[ iso639_1::NUM_ENTRIES ];
if ( !lang )
lang = get_host_lang();
- set_t *&word_set = cached_word_sets[ lang ];
+ word_set_t *&word_set = cached_word_sets[ lang ];
if ( !word_set ) {
if ( ft_stop_table const table = get_table_for( lang ) ) {
- word_set = new set_t;
+ word_set = new word_set_t;
for ( ft_stop_table word = table; *word; ++word )
word_set->insert( *word );
}
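The stop-word loading code reads a stop-word file one character at a time and checks for EOF only *after* each read attempt, because `std::istream` sets its EOF state only when a read fails. Below is a standalone sketch of that read loop. It assumes a hypothetical `is_word_char()` (Zorba's own definition may differ) and assumes that a word ending exactly at end of file is still kept:

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical word-character test; Zorba's is_word_char() may differ.
inline bool is_word_char( char c ) {
  return std::isalnum( static_cast<unsigned char>( c ) ) || c == '\'';
}

// Reads words separated by non-word characters from a stop-word stream.
std::set<std::string> read_stop_words( std::istream &is ) {
  std::set<std::string> words;
  std::string cur_word;
  char c;
  while ( is.good() ) {
    is.get( c );
    if ( is.eof() )           // EOF is only known after a read was attempted
      break;
    if ( is_word_char( c ) )
      cur_word += c;
    else if ( !cur_word.empty() ) {
      words.insert( cur_word );
      cur_word.clear();
    }
  }
  if ( !cur_word.empty() )    // assumed: keep a word that ends at EOF
    words.insert( cur_word );
  return words;
}
```

Testing `eof()` before calling `get()` instead would process the last character twice, or process a garbage character, whenever the stream ends without a trailing separator.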
=== modified file 'src/runtime/full_text/ft_stop_words_set.h'
--- src/runtime/full_text/ft_stop_words_set.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_stop_words_set.h 2012-04-24 20:57:30 +0000
@@ -20,6 +20,7 @@
#include <set>
#include <zorba/locale.h>
+#include <zorba/internal/unique_ptr.h>
#include "compiler/expression/ftnode.h"
#include "zorbatypes/zstring.h"
@@ -27,26 +28,29 @@
namespace zorba {
/**
- * An %ft_stop_words_set is (as its name suggests) a set of stop-wors.
+ * An %ft_stop_words_set is (as its name suggests) a set of stop-words.
*/
class ft_stop_words_set {
public:
+ typedef std::unique_ptr<ft_stop_words_set const> ptr;
+
~ft_stop_words_set() {
if ( delete_ )
delete word_set_;
}
/**
- * Constructs an %ft_stop_words_set.
+ * Constructs an %ft_stop_words_set for the given language.
*
* @param option The ftstop_word_option to use to possibly add or remove
* stop-words.
* @param lang The language of the stop-words.
- * @return Returns a new %ft_stop_words_set.
+ * @return Returns a new %ft_stop_words_set or a null pointer if stop-words
+ * are not to be used or stop-words for \a lang are unsupported.
*/
- static ft_stop_words_set const* construct( ftstop_word_option const &option,
- locale::iso639_1::type lang,
- static_context const& sctx );
+ static ptr construct( ftstop_word_option const &option,
+ locale::iso639_1::type lang,
+ static_context const& sctx );
/**
* Checks whether this %ft_stop_words_set contains the given word.
@@ -60,22 +64,33 @@
return word_set_->find( word ) != word_set_->end();
}
+ /**
+ * Gets the default %ft_stop_words_set.
+ *
+ * @param lang The language of the stop-words.
+ * @return Returns said default or \c nullptr if stop-words for \a lang are
+ * unsupported.
+ */
+ static ft_stop_words_set const* get_default( locale::iso639_1::type lang );
+
private:
- typedef std::set<zstring> set_t;
+ typedef std::set<zstring> word_set_t;
- set_t const *const word_set_;
+ word_set_t const *const word_set_;
bool const delete_;
- ft_stop_words_set( set_t const *word_set, bool must_delete ) :
+ ft_stop_words_set( word_set_t const *word_set, bool must_delete ) :
word_set_( word_set ), delete_( must_delete )
{
}
- static void apply_word( zstring const&, set_t&, ft_stop_words_unex::type );
- static void apply_word( char const*, char const*, set_t&,
- ft_stop_words_unex::type );
-
- static set_t* get_default_word_set_for( locale::iso639_1::type );
+ static void apply_word( zstring const&, word_set_t&,
+ ft_stop_words_unex::type );
+
+ static void apply_word( char const*, char const*, word_set_t&,
+ ft_stop_words_unex::type );
+
+ static word_set_t* get_default_word_set_for( locale::iso639_1::type );
// forbid these
ft_stop_words_set( ft_stop_words_set const& );
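With this change, construct() returns a `std::unique_ptr`, and each set records whether it owns its underlying word set. The cached per-language default set is borrowed and never deleted. It is copied only when words must be added or removed. The sketch below illustrates that borrow-or-copy ownership scheme. The names `StopWords` and `get_default_set()` are hypothetical, and the hard-coded default list stands in for Zorba's per-language tables:

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

typedef std::set<std::string> word_set_t;

// Hypothetical cached default set (Zorba caches one per language).
word_set_t const* get_default_set() {
  static word_set_t const defaults = { "a", "an", "the" };
  return &defaults;
}

class StopWords {
public:
  typedef std::unique_ptr<StopWords const> ptr;

  // Borrows the default set; copies it only if it has to be modified.
  static ptr construct( std::vector<std::string> const &extra ) {
    word_set_t const *word_set = get_default_set();
    bool owned = false;
    if ( !extra.empty() ) {
      word_set_t *const copy = new word_set_t( *word_set );  // copy-on-write
      copy->insert( extra.begin(), extra.end() );
      word_set = copy;
      owned = true;
    }
    return ptr( new StopWords( word_set, owned ) );
  }

  ~StopWords() {
    if ( owned_ )
      delete word_set_;       // never delete the shared default set
  }

  bool contains( std::string const &w ) const {
    return word_set_->count( w ) != 0;
  }

private:
  StopWords( word_set_t const *ws, bool owned ) : word_set_( ws ), owned_( owned ) { }
  StopWords( StopWords const& ) = delete;
  StopWords& operator=( StopWords const& ) = delete;

  word_set_t const *const word_set_;
  bool const owned_;
};
```

Returning a smart pointer moves the responsibility for deleting the object to the type system. This is why ft_token_matcher no longer needs an explicit `delete stop_words_` in its destructor.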
=== modified file 'src/runtime/full_text/ft_token_matcher.cpp'
--- src/runtime/full_text/ft_token_matcher.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_token_matcher.cpp 2012-04-24 20:57:30 +0000
@@ -47,12 +47,12 @@
return false;
}
-inline ft_stop_words_set const* get_stop_words( ftmatch_options const &options,
- iso639_1::type lang,
- static_context const& sctx ) {
+inline ft_stop_words_set::ptr
+get_stop_words( ftmatch_options const &options, iso639_1::type lang,
+ static_context const& sctx ) {
if ( ftstop_word_option const *const sw = options.get_stop_word_option() )
return ft_stop_words_set::construct( *sw, lang, sctx );
- return nullptr;
+ return ft_stop_words_set::ptr();
}
///////////////////////////////////////////////////////////////////////////////
@@ -69,7 +69,7 @@
}
ft_token_matcher::~ft_token_matcher() {
- delete stop_words_;
+ // out-of-line since it's virtual
}
///////////////////////////////////////////////////////////////////////////////
@@ -83,8 +83,8 @@
void ft_token_matcher::match_stemmer::
operator()( string_t const &word, iso639_1::type lang,
string_t *result ) const {
- internal::Stemmer::ptr stemmer( provider_->get_stemmer( lang ) );
- if ( stemmer )
+ internal::Stemmer::ptr stemmer;
+ if ( provider_->getStemmer( lang, &stemmer ) )
stemmer->stem( word, lang, result );
else
*result = word;
=== modified file 'src/runtime/full_text/ft_token_matcher.h'
--- src/runtime/full_text/ft_token_matcher.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_token_matcher.h 2012-04-24 20:57:30 +0000
@@ -62,7 +62,7 @@
locale::iso639_1::type const lang_;
bool const stemming_;
match_stemmer const stemmer_;
- ft_stop_words_set const *const stop_words_;
+ ft_stop_words_set::ptr stop_words_;
bool const wildcards_;
};
=== modified file 'src/runtime/full_text/ft_token_seq_iterator.cpp'
--- src/runtime/full_text/ft_token_seq_iterator.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_token_seq_iterator.cpp 2012-04-24 20:57:30 +0000
@@ -25,8 +25,7 @@
namespace zorba {
FTTokenSeqIterator::FTTokenSeqIterator( FTTokens &tokens ) {
- tokens_.swap( tokens );
- pos_ = 0;
+ take( tokens );
}
FTTokenSeqIterator::~FTTokenSeqIterator() {
@@ -38,7 +37,7 @@
}
FTTokenIterator::index_t FTTokenSeqIterator::end() const {
- return (FTTokenIterator::index_t)tokens_.size();;
+ return static_cast<FTTokenIterator::index_t>( tokens_.size() );
}
bool FTTokenSeqIterator::hasNext() const {
@@ -61,5 +60,10 @@
pos_ = 0;
}
+void FTTokenSeqIterator::take( FTTokens &tokens ) {
+ tokens.swap( tokens_ );
+ pos_ = 0;
+}
+
} // namespace zorba
/* vim:set et sw=2 ts=2: */
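take() moves the caller's tokens into the iterator in constant time by swapping vectors rather than copying each token. It then rewinds to the first token, and the caller is left holding the iterator's previous contents. A simplified sketch, using a hypothetical `TokenSeq` class that holds strings instead of FTToken objects:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class TokenSeq {
public:
  typedef std::vector<std::string> tokens_t;

  TokenSeq() : pos_( 0 ) { }

  // O(1): swaps buffers instead of copying each token, then rewinds.
  void take( tokens_t &tokens ) {
    tokens.swap( tokens_ );
    pos_ = 0;
  }

  bool hasNext() const { return pos_ < tokens_.size(); }

  // Returns the next token, or nullptr when the sequence is exhausted.
  std::string const* next() { return hasNext() ? &tokens_[ pos_++ ] : nullptr; }

  void reset() { pos_ = 0; }

private:
  tokens_t tokens_;
  std::size_t pos_;
};
```

Because the default constructor is now used for iterator state that is filled later through take(), the position member should be initialized in that constructor as well.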
=== modified file 'src/runtime/full_text/ft_token_seq_iterator.h'
--- src/runtime/full_text/ft_token_seq_iterator.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_token_seq_iterator.h 2012-04-24 20:57:30 +0000
@@ -33,9 +33,12 @@
public:
typedef std::vector<FTToken> FTTokens;
+ FTTokenSeqIterator() : pos_( 0 ) { }
FTTokenSeqIterator( FTTokens& );
~FTTokenSeqIterator();
+ void take( FTTokens& );
+
// inherited
index_t begin() const;
index_t end() const;
=== modified file 'src/runtime/full_text/ft_token_span.h'
--- src/runtime/full_text/ft_token_span.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_token_span.h 2012-04-24 20:57:30 +0000
@@ -20,7 +20,7 @@
#ifndef NDEBUG
#include <iostream>
#endif /* NDEBUG */
-#include <list>
+#include <vector>
#include "zorbatypes/ft_token.h"
@@ -51,7 +51,7 @@
/**
* An %ft_token_spans contains zero or more ft_token_span objects.
*/
-typedef std::list<ft_token_span> ft_token_spans;
+typedef std::vector<ft_token_span> ft_token_spans;
////////// Comparison operators ///////////////////////////////////////////////
=== added file 'src/runtime/full_text/ft_util.cpp'
--- src/runtime/full_text/ft_util.cpp 1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/ft_util.cpp 2012-04-24 20:57:30 +0000
@@ -0,0 +1,42 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "stdafx.h"
+
+#include <stdexcept>
+
+#include "diagnostics/xquery_diagnostics.h"
+#include "zorbatypes/numconversions.h"
+
+#include "ft_util.h"
+
+namespace zorba {
+
+///////////////////////////////////////////////////////////////////////////////
+
+ft_int to_ft_int( xs_integer const &i ) {
+ try {
+ return to_xs_unsignedInt( i );
+ }
+ catch ( std::range_error const& ) {
+ throw XQUERY_EXCEPTION( err::FOCA0003, ERROR_PARAMS( i.toString() ) );
+ }
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace zorba
+/* vim:set et sw=2 ts=2: */
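to_ft_int() now lives in ft_util.cpp and is shared through ft_util.h, instead of being local to ftcontains_visitor.cpp. Its contract is to narrow an arbitrary-precision `xs:integer` to an `xs:unsignedInt` and to report `FOCA0003` if that is impossible. The sketch below expresses the same contract with standard types. It assumes `long long` in place of `xs_integer`, a 32-bit unsigned `ft_int`, and a hypothetical `XQueryError` exception in place of `XQUERY_EXCEPTION`:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

typedef std::uint32_t ft_int;  // assumed: xs:unsignedInt range 0 .. 4294967295

// Hypothetical stand-in for XQUERY_EXCEPTION( err::FOCA0003, ... ).
struct XQueryError : std::runtime_error {
  explicit XQueryError( std::string const &msg ) : std::runtime_error( msg ) { }
};

// Mirrors to_xs_unsignedInt(): throws std::range_error if out of range.
ft_int to_unsigned_int( long long i ) {
  if ( i < 0 ||
       static_cast<unsigned long long>( i ) > std::numeric_limits<ft_int>::max() )
    throw std::range_error( "out of xs:unsignedInt range" );
  return static_cast<ft_int>( i );
}

// Translates the low-level range error into the XQuery error FOCA0003.
ft_int to_ft_int( long long i ) {
  try {
    return to_unsigned_int( i );
  }
  catch ( std::range_error const& ) {
    throw XQueryError( "FOCA0003: " + std::to_string( i ) );
  }
}
```

Catching the C++ exception at this single boundary keeps callers from seeing `std::range_error` and ensures that every full-text integer conversion reports the same XQuery error.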
=== modified file 'src/runtime/full_text/ft_util.h'
--- src/runtime/full_text/ft_util.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_util.h 2012-04-24 20:57:30 +0000
@@ -21,6 +21,7 @@
#include "compiler/expression/ftnode.h"
#include "zorbatypes/schema_types.h"
+#include "util/cxx_util.h"
#include "ft_match.h"
@@ -70,7 +71,7 @@
if ( ftthesaurus_option const *const t = options->get_thesaurus_option() )
if ( !t->no_thesaurus() )
return t;
- return 0;
+ return nullptr;
}
/**
@@ -87,6 +88,16 @@
return false;
}
+/**
+ * Attempts to convert an \c xs:integer to an \c xs:unsignedInt.
+ *
+ * @param i The \c xs:integer to convert.
+ * @return Returns the value converted to an \c xs:unsignedInt.
+ * @throws \c err::FOCA0003 if the value cannot be represented as an \c
+ * xs:unsignedInt.
+ */
+ft_int to_ft_int( xs_integer const &i );
+
} // namespace zorba
#endif /* ZORBA_FULL_TEXT_UTIL_H */
/* vim:set et sw=2 ts=2: */
=== modified file 'src/runtime/full_text/ftcontains_visitor.cpp'
--- src/runtime/full_text/ftcontains_visitor.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ftcontains_visitor.cpp 2012-04-24 20:57:30 +0000
@@ -27,7 +27,6 @@
#include "util/cxx_util.h"
#include "util/indent.h"
#include "util/stl_util.h"
-#include "zorbatypes/numconversions.h"
#ifndef NDEBUG
#include "system/properties.h"
@@ -77,15 +76,6 @@
return d.getNumber();
}
-inline ft_int to_ft_int( xs_integer const &i ) {
- try {
- return to_xs_unsignedInt( i );
- }
- catch ( std::range_error const& ) {
- throw XQUERY_EXCEPTION( err::FOCA0003, ERROR_PARAMS( i.toString() ) );
- }
-}
-
////////// PUSH/POP ///////////////////////////////////////////////////////////
/**
=== modified file 'src/runtime/full_text/full_text.h'
--- src/runtime/full_text/full_text.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/full_text.h 2012-04-24 20:57:30 +0000
@@ -40,7 +40,7 @@
SERIALIZABLE_CLASS_CONSTRUCTOR2(FTContainsIterator,base_type);
void serialize( serialization::Archiver& );
- typedef std::list<PlanIter_t> sub_iter_list_t;
+ typedef std::vector<PlanIter_t> sub_iter_list_t;
FTContainsIterator(
static_context*,
=== modified file 'src/runtime/full_text/icu_tokenizer.cpp'
--- src/runtime/full_text/icu_tokenizer.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/icu_tokenizer.cpp 2012-04-24 20:57:30 +0000
@@ -16,6 +16,7 @@
#include "stdafx.h"
#include <cctype>
+#include <cstring>
#include <unicode/unistr.h>
#define DEBUG_TOKENIZER 0
@@ -54,6 +55,8 @@
public:
typedef Tokenizer::size_type size_type;
+ temp_token( iso639_1::type lang ) : lang_( lang ) { }
+
void append( char const *s, size_type slen ) {
value_.append( s, slen );
}
@@ -66,12 +69,14 @@
return value_.empty();
}
- void send( void *payload, Tokenizer::Callback &callback ) {
+ void send( Item const *item, Tokenizer::Callback &callback ) {
if ( !empty() ) {
# if DEBUG_TOKENIZER
cout << "TOKEN: \"" << value_ << "\" (" << pos_ << ',' << sent_ << ',' << para_ << ")\n";
# endif
- callback( value_.data(), value_.size(), pos_, sent_, para_, payload );
+ callback.token(
+ value_.data(), value_.size(), lang_, pos_, sent_, para_, item
+ );
clear();
}
}
@@ -87,6 +92,7 @@
private:
string value_;
+ iso639_1::type const lang_;
size_type pos_, sent_, para_;
};
@@ -158,6 +164,21 @@
delete this;
}
+void ICU_Tokenizer::properties( Properties *p ) const {
+ p->comments_separate_tokens = true;
+ p->elements_separate_tokens = true;
+ p->processing_instructions_separate_tokens = true;
+
+ p->languages.clear();
+ for ( int32_t n = ubrk_countAvailable(), i = 0; i < n; ++i ) {
+ if ( char const *const icu_locale = ubrk_getAvailable( i ) )
+ if ( iso639_1::type const lang = find_lang( icu_locale ) )
+ p->languages.push_back( lang );
+ }
+
+ p->uri = "http://www.zorba-xquery.com/full-text/tokenizer/icu";
+}
+
#define HANDLE_BACKSLASH() \
if ( !got_backslash ) ; else { \
got_backslash = in_wild = false; \
@@ -174,9 +195,9 @@
#define IS_WORD_BREAK(TYPE,STATUS) \
( (STATUS) >= UBRK_WORD_##TYPE && (STATUS) < UBRK_WORD_##TYPE##_LIMIT )
-void ICU_Tokenizer::tokenize( char const *utf8_s, size_type utf8_len,
- iso639_1::type lang, bool wildcards,
- Callback &callback, void *payload ) {
+void ICU_Tokenizer::tokenize_string( char const *utf8_s, size_type utf8_len,
+ iso639_1::type lang, bool wildcards,
+ Callback &callback, Item const *item ) {
ZORBA_ASSERT( lang == lang_ );
unicode::char_type *utf16_buf;
@@ -206,7 +227,7 @@
sent_it_->setText( utf16_s );
unicode::size_type sent_end = sent_it_->first(); sent_end = sent_it_->next();
- temp_token t;
+ temp_token t( lang );
// True only if the previous token was a backslash ('\').
bool got_backslash = false;
@@ -295,8 +316,8 @@
else if ( IS_WORD_BREAK( NUMBER, rule_status ) ) {
//
// "NUMBER" tokens are obviously for numbers. Note that a sequence of
- // digits containing a ',' (e.g., "1,2") is considered a single token by
- // ICU.
+ // digits containing either a '.' (e.g., "98.6") or a ',' (e.g., "1,2")
+ // is considered a single token by ICU.
//
# if DEBUG_TOKENIZER
cout << "(NUMBER)" << endl;
@@ -346,7 +367,7 @@
}
if ( !in_wild && !got_backslash )
- t.send( payload, callback );
+ t.send( item, callback );
set_token:
# if DEBUG_TOKENIZER
@@ -395,7 +416,7 @@
throw XQUERY_EXCEPTION(
err::FTDY0020, ERROR_PARAMS( "", ZED( UnbalancedChar_3 ), '}' )
);
- t.send( payload, callback );
+ t.send( item, callback );
// Incrementing "sent" here fixes:
// https://bugs.launchpad.net/bugs/897800
++numbers().sent;
@@ -406,10 +427,18 @@
///////////////////////////////////////////////////////////////////////////////
-Tokenizer::ptr
-ICU_TokenizerProvider::getTokenizer( iso639_1::type lang,
- Tokenizer::Numbers &no ) const {
- return Tokenizer::ptr( new ICU_Tokenizer( lang, no ) );
+bool ICU_TokenizerProvider::getTokenizer( iso639_1::type lang,
+ Tokenizer::Numbers *num,
+ Tokenizer::ptr *t ) const {
+ for ( int32_t n = ubrk_countAvailable(), i = 0; i < n; ++i ) {
+ if ( char const *const icu_locale = ubrk_getAvailable( i ) )
+ if ( lang == find_lang( icu_locale ) ) {
+ if ( num && t )
+ t->reset( new ICU_Tokenizer( lang, *num ) );
+ return true;
+ }
+ }
+ return false;
}
///////////////////////////////////////////////////////////////////////////////
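getTokenizer() now reports whether `lang` is supported and constructs a tokenizer only when both out-parameters are non-null. The same call therefore doubles as a capability query, which TokenizerPropertiesIterator and TokenizeStringIterator rely on to raise ZXQP8407 for unsupported languages. The following minimal sketch shows that contract using hypothetical `Lang`, `Numbers`, and `Tokenizer` types. Note that the ICU provider scans the locales returned by `ubrk_getAvailable()` rather than a fixed list like the one here:

```cpp
#include <cassert>
#include <memory>

enum Lang { lang_none, lang_en, lang_zh };

struct Numbers { unsigned token = 0, sent = 0, para = 0; };

struct Tokenizer {
  Tokenizer( Lang lang, Numbers &no ) : lang_( lang ), no_( no ) { }
  Lang lang_;
  Numbers &no_;  // shared counters, updated as tokens are produced
};
typedef std::unique_ptr<Tokenizer> TokenizerPtr;

// Returns true if lang is supported; creates a tokenizer only if asked to.
bool getTokenizer( Lang lang, Numbers *num = nullptr, TokenizerPtr *t = nullptr ) {
  switch ( lang ) {
    case lang_en:
    case lang_zh:
      if ( num && t )
        t->reset( new Tokenizer( lang, *num ) );
      return true;
    default:
      return false;
  }
}
```

Returning `bool` rather than a possibly-null pointer lets callers that only need a yes/no answer avoid allocating a tokenizer at all.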
=== modified file 'src/runtime/full_text/icu_tokenizer.h'
--- src/runtime/full_text/icu_tokenizer.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/icu_tokenizer.h 2012-04-24 20:57:30 +0000
@@ -48,8 +48,9 @@
// inherited
void destroy() const;
- void tokenize( char const*, size_type, locale::iso639_1::type, bool,
- Callback&, void* );
+ void properties( Properties* ) const;
+ void tokenize_string( char const*, size_type, locale::iso639_1::type, bool,
+ Callback&, Item const* );
private:
typedef std::unique_ptr<RuleBasedBreakIterator> rbbi_ptr;
@@ -63,10 +64,11 @@
class ICU_TokenizerProvider : public TokenizerProvider {
public:
- ICU_TokenizerProvider () {}
+ ICU_TokenizerProvider() { } // needed to work around compiler bug
+
// inherited
- Tokenizer::ptr
- getTokenizer( locale::iso639_1::type, Tokenizer::Numbers& ) const;
+ bool getTokenizer( locale::iso639_1::type, Tokenizer::Numbers* = 0,
+ Tokenizer::ptr* = 0 ) const;
};
///////////////////////////////////////////////////////////////////////////////
=== modified file 'src/runtime/full_text/latin_tokenizer.cpp'
--- src/runtime/full_text/latin_tokenizer.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/latin_tokenizer.cpp 2012-04-24 20:57:30 +0000
@@ -82,15 +82,26 @@
return ++s < end ? *s : '\0';
}
+void LatinTokenizer::properties( Properties *p ) const {
+ p->comments_separate_tokens = true;
+ p->elements_separate_tokens = true;
+ p->processing_instructions_separate_tokens = true;
+
+ p->languages.clear();
+ p->languages.push_back( iso639_1::en );
+
+ p->uri = "http://www.zorba-xquery.com/full-text/tokenizer/latin";
+}
+
#define HANDLE_BACKSLASH() \
if ( !got_backslash ) ; else { \
got_backslash = in_wild = false; \
break; \
}
-void LatinTokenizer::tokenize( char const *s, size_type s_len,
- iso639_1::type lang, bool wildcards,
- Callback &callback, void *payload ) {
+void LatinTokenizer::tokenize_string( char const *s, size_type s_len,
+ iso639_1::type lang, bool wildcards,
+ Callback &callback, Item const *item ) {
bool got_backslash = false;
bool in_wild = false;
string_type token;
@@ -167,7 +178,7 @@
} else {
if ( is_word_char( *s ) )
token += *s;
- else if ( send_token( token, callback, payload ) ) {
+ else if ( send_token( token, lang, callback, item ) ) {
token.clear();
t_type_ = t_generic;
}
@@ -203,13 +214,13 @@
}
} // for
- send_token( token, callback, payload );
+ send_token( token, lang, callback, item );
}
#define PRINT_TOKENS 0
-bool LatinTokenizer::send_token( string_type const &token,
- Callback &callback, void *payload ) {
+bool LatinTokenizer::send_token( string_type const &token, iso639_1::type lang,
+ Callback &callback, Item const *item ) {
if ( !token.empty() ) {
#if PRINT_TOKENS
cout << "t=" << setw(2) << numbers().token
@@ -219,8 +230,8 @@
#endif /* PRINT_TOKENS */
callback(
- token.data(), token.size(),
- numbers().token, numbers().sent, numbers().para, payload
+ token.data(), token.size(), lang,
+ numbers().token, numbers().sent, numbers().para, item
);
++numbers().token;
return true;
@@ -230,10 +241,17 @@
///////////////////////////////////////////////////////////////////////////////
-Tokenizer::ptr
-LatinTokenizerProvider::getTokenizer( iso639_1::type lang,
- Tokenizer::Numbers &num ) const {
- return Tokenizer::ptr( new LatinTokenizer( num ) );
+bool LatinTokenizerProvider::getTokenizer( iso639_1::type lang,
+ Tokenizer::Numbers *num,
+ Tokenizer::ptr *t ) const {
+ switch ( lang ) {
+ case iso639_1::en:
+ if ( num && t )
+ t->reset( new LatinTokenizer( *num ) );
+ return true;
+ default:
+ return false;
+ }
}
///////////////////////////////////////////////////////////////////////////////
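Review note: `LatinTokenizer::properties()` fills a caller-supplied struct and clears `languages` before adding `en`, so stale entries from a reused struct do not leak through. The sketch below reproduces that pattern. The `Properties` layout is a hypothetical mirror of the fields assigned in the diff. The real struct is declared in the tokenizer header, which is not part of this excerpt.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Lang { en, de };

// Hypothetical mirror of the fields LatinTokenizer::properties() assigns.
struct Properties {
  bool comments_separate_tokens = false;
  bool elements_separate_tokens = false;
  bool processing_instructions_separate_tokens = false;
  std::vector<Lang> languages;
  char const *uri = nullptr;
};

void latin_properties( Properties *p ) {
  p->comments_separate_tokens = true;
  p->elements_separate_tokens = true;
  p->processing_instructions_separate_tokens = true;
  p->languages.clear();                 // drop anything left by the caller
  p->languages.push_back( Lang::en );
  p->uri = "http://www.zorba-xquery.com/full-text/tokenizer/latin";
}
```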
=== modified file 'src/runtime/full_text/latin_tokenizer.h'
--- src/runtime/full_text/latin_tokenizer.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/latin_tokenizer.h 2012-04-24 20:57:30 +0000
@@ -38,8 +38,9 @@
// inherited
void destroy() const;
- void tokenize( char const*, size_type, locale::iso639_1::type, bool,
- Callback&, void* );
+ void properties( Properties* ) const;
+ void tokenize_string( char const*, size_type, iso639_1::type, bool, Callback&,
+ Item const* );
private:
typedef zstring string_type;
@@ -56,7 +57,8 @@
static bool is_word_begin_char( char );
bool is_word_char( char );
static char peek( char const *s, char const *end );
- bool send_token( string_type const &token, Callback&, void* );
+ bool send_token( string_type const &token, locale::iso639_1::type, Callback&,
+ Item const* );
};
///////////////////////////////////////////////////////////////////////////////
@@ -64,8 +66,8 @@
class LatinTokenizerProvider : public TokenizerProvider {
public:
// inherited
- Tokenizer::ptr getTokenizer( locale::iso639_1::type,
- Tokenizer::Numbers& ) const;
+ bool getTokenizer( locale::iso639_1::type, Tokenizer::Numbers* = 0,
+ Tokenizer::ptr* = 0 ) const;
};
///////////////////////////////////////////////////////////////////////////////
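Review note: `send_token()` now passes the token's language and an `Item const*` instead of an opaque `void*` payload. The callback therefore knows which language produced each token and which item it came from. The sketch below shows a callback of that shape that collects tokens. The `Callback` interface and its parameter order are a hypothetical reconstruction from the call site in `latin_tokenizer.cpp`, not Zorba's declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Lang { en, de };
struct Item;  // opaque here; the tokenizer only passes the pointer through

// Hypothetical callback interface shaped like the call in send_token().
struct Callback {
  virtual ~Callback() { }
  virtual void operator()( char const *utf8, std::size_t len, Lang lang,
                           unsigned token_no, unsigned sent_no,
                           unsigned para_no, Item const *item ) = 0;
};

// Collects each token together with its language and token number.
struct Collector : Callback {
  struct Token { std::string text; Lang lang; unsigned token_no; };
  std::vector<Token> tokens;

  void operator()( char const *utf8, std::size_t len, Lang lang,
                   unsigned token_no, unsigned, unsigned,
                   Item const * ) override {
    tokens.push_back( Token{ std::string( utf8, len ), lang, token_no } );
  }
};
```

The `Item const*` is typed, which documents intent better than `void*`, and callers that do not care about it can pass `nullptr`.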
=== added directory 'src/runtime/full_text/pregenerated'
=== added file 'src/runtime/full_text/pregenerated/ft_module.cpp'
--- src/runtime/full_text/pregenerated/ft_module.cpp 1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/pregenerated/ft_module.cpp 2012-04-24 20:57:30 +0000
@@ -0,0 +1,362 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// ******************************************
+// * *
+// * THIS IS A GENERATED FILE. DO NOT EDIT! *
+// * SEE .xml FILE WITH SAME NAME *
+// * *
+// ******************************************
+
+#include "stdafx.h"
+#include "zorbatypes/rchandle.h"
+#include "zorbatypes/zstring.h"
+#include "runtime/visitors/planiter_visitor.h"
+#include "runtime/full_text/ft_module.h"
+#include "system/globalenv.h"
+
+
+#include "store/api/iterator.h"
+
+namespace zorba {
+
+#ifndef ZORBA_NO_FULL_TEXT
+// <CurrentLangIterator>
+CurrentLangIterator::class_factory<CurrentLangIterator>
+CurrentLangIterator::g_class_factory;
+
+
+void CurrentLangIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+CurrentLangIterator::~CurrentLangIterator() {}
+
+// </CurrentLangIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <HostLangIterator>
+HostLangIterator::class_factory<HostLangIterator>
+HostLangIterator::g_class_factory;
+
+
+void HostLangIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+HostLangIterator::~HostLangIterator() {}
+
+// </HostLangIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStemLangSupportedIterator>
+IsStemLangSupportedIterator::class_factory<IsStemLangSupportedIterator>
+IsStemLangSupportedIterator::g_class_factory;
+
+
+void IsStemLangSupportedIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+IsStemLangSupportedIterator::~IsStemLangSupportedIterator() {}
+
+// </IsStemLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStopWordIterator>
+IsStopWordIterator::class_factory<IsStopWordIterator>
+IsStopWordIterator::g_class_factory;
+
+
+void IsStopWordIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+IsStopWordIterator::~IsStopWordIterator() {}
+
+// </IsStopWordIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStopWordLangSupportedIterator>
+IsStopWordLangSupportedIterator::class_factory<IsStopWordLangSupportedIterator>
+IsStopWordLangSupportedIterator::g_class_factory;
+
+
+void IsStopWordLangSupportedIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+IsStopWordLangSupportedIterator::~IsStopWordLangSupportedIterator() {}
+
+// </IsStopWordLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsThesaurusLangSupportedIterator>
+IsThesaurusLangSupportedIterator::class_factory<IsThesaurusLangSupportedIterator>
+IsThesaurusLangSupportedIterator::g_class_factory;
+
+
+void IsThesaurusLangSupportedIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+IsThesaurusLangSupportedIterator::~IsThesaurusLangSupportedIterator() {}
+
+// </IsThesaurusLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsTokenizerLangSupportedIterator>
+IsTokenizerLangSupportedIterator::class_factory<IsTokenizerLangSupportedIterator>
+IsTokenizerLangSupportedIterator::g_class_factory;
+
+
+void IsTokenizerLangSupportedIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+IsTokenizerLangSupportedIterator::~IsTokenizerLangSupportedIterator() {}
+
+// </IsTokenizerLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <StemIterator>
+StemIterator::class_factory<StemIterator>
+StemIterator::g_class_factory;
+
+
+void StemIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+StemIterator::~StemIterator() {}
+
+// </StemIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <StripDiacriticsIterator>
+StripDiacriticsIterator::class_factory<StripDiacriticsIterator>
+StripDiacriticsIterator::g_class_factory;
+
+
+void StripDiacriticsIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+StripDiacriticsIterator::~StripDiacriticsIterator() {}
+
+// </StripDiacriticsIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <ThesaurusLookupIterator>
+ThesaurusLookupIterator::class_factory<ThesaurusLookupIterator>
+ThesaurusLookupIterator::g_class_factory;
+
+
+void ThesaurusLookupIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+ThesaurusLookupIterator::~ThesaurusLookupIterator() {}
+
+ThesaurusLookupIteratorState::ThesaurusLookupIteratorState() {}
+
+ThesaurusLookupIteratorState::~ThesaurusLookupIteratorState() {}
+
+
+void ThesaurusLookupIteratorState::reset(PlanState& planState) {
+ PlanIteratorState::reset(planState);
+}
+// </ThesaurusLookupIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeIterator>
+TokenizeIterator::class_factory<TokenizeIterator>
+TokenizeIterator::g_class_factory;
+
+
+void TokenizeIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+TokenizeIterator::~TokenizeIterator() {}
+
+TokenizeIteratorState::TokenizeIteratorState() {}
+
+TokenizeIteratorState::~TokenizeIteratorState() {}
+
+
+void TokenizeIteratorState::reset(PlanState& planState) {
+ PlanIteratorState::reset(planState);
+}
+// </TokenizeIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizerPropertiesIterator>
+TokenizerPropertiesIterator::class_factory<TokenizerPropertiesIterator>
+TokenizerPropertiesIterator::g_class_factory;
+
+
+void TokenizerPropertiesIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+TokenizerPropertiesIterator::~TokenizerPropertiesIterator() {}
+
+// </TokenizerPropertiesIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeStringIterator>
+TokenizeStringIterator::class_factory<TokenizeStringIterator>
+TokenizeStringIterator::g_class_factory;
+
+
+void TokenizeStringIterator::accept(PlanIterVisitor& v) const {
+ v.beginVisit(*this);
+
+ std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+ std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+ for ( ; lIter != lEnd; ++lIter ){
+ (*lIter)->accept(v);
+ }
+
+ v.endVisit(*this);
+}
+
+TokenizeStringIterator::~TokenizeStringIterator() {}
+
+TokenizeStringIteratorState::TokenizeStringIteratorState() {}
+
+TokenizeStringIteratorState::~TokenizeStringIteratorState() {}
+
+
+void TokenizeStringIteratorState::reset(PlanState& planState) {
+ PlanIteratorState::reset(planState);
+}
+// </TokenizeStringIterator>
+
+#endif
+
+}
+
+
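Review note: every generated `accept()` above follows the same visitor shape. It calls `beginVisit(*this)`, recurses into `theChildren`, then calls `endVisit(*this)`. The result is a pre/post-order walk of the plan tree. The sketch below demonstrates that walk with hypothetical `Iter` and `Visitor` types that record the visit order as a string.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Iter;

// Hypothetical visitor that records begin/end events as a string.
struct Visitor {
  std::string trace;
  void beginVisit( Iter const &i );
  void endVisit( Iter const &i );
};

// Hypothetical plan iterator with children, like theChildren above.
struct Iter {
  std::string name;
  std::vector<std::shared_ptr<Iter>> children;

  void accept( Visitor &v ) const {
    v.beginVisit( *this );
    for ( auto const &child : children )
      child->accept( v );
    v.endVisit( *this );
  }
};

void Visitor::beginVisit( Iter const &i ) { trace += "<" + i.name + ">"; }
void Visitor::endVisit( Iter const &i ) { trace += "</" + i.name + ">"; }
```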
=== added file 'src/runtime/full_text/pregenerated/ft_module.h'
--- src/runtime/full_text/pregenerated/ft_module.h 1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/pregenerated/ft_module.h 2012-04-24 20:57:30 +0000
@@ -0,0 +1,561 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// ******************************************
+// * *
+// * THIS IS A GENERATED FILE. DO NOT EDIT! *
+// * SEE .xml FILE WITH SAME NAME *
+// * *
+// ******************************************
+#ifndef ZORBA_RUNTIME_FULL_TEXT_FT_MODULE_H
+#define ZORBA_RUNTIME_FULL_TEXT_FT_MODULE_H
+
+
+#include "common/shared_types.h"
+
+
+
+#include "runtime/base/narybase.h"
+#include "runtime/full_text/ft_token_seq_iterator.h"
+#include "runtime/full_text/thesaurus.h"
+
+
+namespace zorba {
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class CurrentLangIterator : public NaryBaseIterator<CurrentLangIterator, PlanIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(CurrentLangIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(CurrentLangIterator,
+ NaryBaseIterator<CurrentLangIterator, PlanIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<CurrentLangIterator, PlanIteratorState>*)this);
+ }
+
+ CurrentLangIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<CurrentLangIterator, PlanIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~CurrentLangIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class HostLangIterator : public NaryBaseIterator<HostLangIterator, PlanIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(HostLangIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(HostLangIterator,
+ NaryBaseIterator<HostLangIterator, PlanIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<HostLangIterator, PlanIteratorState>*)this);
+ }
+
+ HostLangIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<HostLangIterator, PlanIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~HostLangIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class IsStemLangSupportedIterator : public NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(IsStemLangSupportedIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(IsStemLangSupportedIterator,
+ NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>*)this);
+ }
+
+ IsStemLangSupportedIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~IsStemLangSupportedIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class IsStopWordIterator : public NaryBaseIterator<IsStopWordIterator, PlanIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(IsStopWordIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(IsStopWordIterator,
+ NaryBaseIterator<IsStopWordIterator, PlanIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<IsStopWordIterator, PlanIteratorState>*)this);
+ }
+
+ IsStopWordIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<IsStopWordIterator, PlanIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~IsStopWordIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class IsStopWordLangSupportedIterator : public NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(IsStopWordLangSupportedIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(IsStopWordLangSupportedIterator,
+ NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>*)this);
+ }
+
+ IsStopWordLangSupportedIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~IsStopWordLangSupportedIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class IsThesaurusLangSupportedIterator : public NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(IsThesaurusLangSupportedIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(IsThesaurusLangSupportedIterator,
+ NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>*)this);
+ }
+
+ IsThesaurusLangSupportedIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~IsThesaurusLangSupportedIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class IsTokenizerLangSupportedIterator : public NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(IsTokenizerLangSupportedIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(IsTokenizerLangSupportedIterator,
+ NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>*)this);
+ }
+
+ IsTokenizerLangSupportedIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~IsTokenizerLangSupportedIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class StemIterator : public NaryBaseIterator<StemIterator, PlanIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(StemIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(StemIterator,
+ NaryBaseIterator<StemIterator, PlanIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<StemIterator, PlanIteratorState>*)this);
+ }
+
+ StemIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<StemIterator, PlanIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~StemIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class StripDiacriticsIterator : public NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(StripDiacriticsIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(StripDiacriticsIterator,
+ NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>*)this);
+ }
+
+ StripDiacriticsIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~StripDiacriticsIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class ThesaurusLookupIteratorState : public PlanIteratorState
+{
+public:
+ zstring phrase_; //
+ zstring relationship_; //
+ internal::Thesaurus::level_type at_least_; //
+ internal::Thesaurus::level_type at_most_; //
+ internal::Thesaurus::ptr thesaurus_; //
+ internal::Thesaurus::iterator::ptr tresult_; //
+
+ ThesaurusLookupIteratorState();
+
+ ~ThesaurusLookupIteratorState();
+
+ void reset(PlanState&);
+};
+
+class ThesaurusLookupIterator : public NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(ThesaurusLookupIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(ThesaurusLookupIterator,
+ NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>*)this);
+ }
+
+ ThesaurusLookupIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~ThesaurusLookupIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+
+ void resetImpl(PlanState&) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class TokenizeIteratorState : public PlanIteratorState
+{
+public:
+ store::Item_t doc_item_; //
+ FTTokenIterator_t doc_tokens_; //
+ store::Item_t token_qname_; //
+
+ TokenizeIteratorState();
+
+ ~TokenizeIteratorState();
+
+ void reset(PlanState&);
+};
+
+class TokenizeIterator : public NaryBaseIterator<TokenizeIterator, TokenizeIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(TokenizeIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(TokenizeIterator,
+ NaryBaseIterator<TokenizeIterator, TokenizeIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<TokenizeIterator, TokenizeIteratorState>*)this);
+ }
+
+ TokenizeIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<TokenizeIterator, TokenizeIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~TokenizeIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+
+ void resetImpl(PlanState&) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class TokenizerPropertiesIterator : public NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(TokenizerPropertiesIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(TokenizerPropertiesIterator,
+ NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>*)this);
+ }
+
+ TokenizerPropertiesIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~TokenizerPropertiesIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ *
+ * Author:
+ */
+class TokenizeStringIteratorState : public PlanIteratorState
+{
+public:
+ FTTokenSeqIterator string_tokens_; //
+
+ TokenizeStringIteratorState();
+
+ ~TokenizeStringIteratorState();
+
+ void reset(PlanState&);
+};
+
+class TokenizeStringIterator : public NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>
+{
+public:
+ SERIALIZABLE_CLASS(TokenizeStringIterator);
+
+ SERIALIZABLE_CLASS_CONSTRUCTOR2T(TokenizeStringIterator,
+ NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>);
+
+ void serialize( ::zorba::serialization::Archiver& ar)
+ {
+ serialize_baseclass(ar,
+ (NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>*)this);
+ }
+
+ TokenizeStringIterator(
+ static_context* sctx,
+ const QueryLoc& loc,
+ std::vector<PlanIter_t>& children)
+ :
+ NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>(sctx, loc, children)
+ {}
+
+ virtual ~TokenizeStringIterator();
+
+ void accept(PlanIterVisitor& v) const;
+
+ bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+
+ void resetImpl(PlanState&) const;
+};
+
+#endif
+
+}
+#endif
+/*
+ * Local variables:
+ * mode: c++
+ * End:
+ */
=== modified file 'src/runtime/full_text/stemmer.cpp'
--- src/runtime/full_text/stemmer.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/stemmer.cpp 2012-04-24 20:57:30 +0000
@@ -43,7 +43,8 @@
return default_provider;
}
-Stemmer::ptr StemmerProvider::get_stemmer( iso639_1::type lang ) const {
+bool StemmerProvider::getStemmer( iso639_1::type lang,
+ Stemmer::ptr *result ) const {
typedef unique_ptr<SnowballStemmer const> cache_ptr;
static cache_ptr cached_stemmers[ iso639_1::NUM_ENTRIES ];
@@ -56,7 +57,12 @@
cache_ptr &ptr_ref = cached_stemmers[ lang ];
if ( !ptr_ref )
ptr_ref.reset( SnowballStemmer::create( lang ) );
- return Stemmer::ptr( ptr_ref.get() );
+ if ( ptr_ref ) {
+ if ( result )
+ result->reset( ptr_ref.get() );
+ return true;
+ }
+ return false;
}
///////////////////////////////////////////////////////////////////////////////
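Review note: `getStemmer()` now keeps the per-language cache but reports availability as a `bool`. The stemmer handle is returned only through the optional out-parameter. The sketch below shows the lazy-cache-plus-bool pattern. `Lang`, `Stemmer` and `create_stemmer()` are hypothetical, and a raw pointer stands in for `Stemmer::ptr` with its no-op `destroy()`. Like the excerpt, the sketch does not lock around the cache slot.

```cpp
#include <array>
#include <cassert>
#include <memory>

enum Lang { da, en, fr, zh, NUM_LANGS };

struct Stemmer {
  explicit Stemmer( Lang l ) : lang( l ) { }
  Lang lang;
};

// Hypothetical factory: returns nullptr for unsupported languages.
static Stemmer *create_stemmer( Lang lang ) {
  return lang == zh ? nullptr : new Stemmer( lang );
}

// Lazily creates and caches one stemmer per language. Returns true only if a
// stemmer exists for lang; *result (if given) points into the cache and must
// not be deleted by the caller.
bool get_stemmer( Lang lang, Stemmer const **result = nullptr ) {
  static std::array<std::unique_ptr<Stemmer>, NUM_LANGS> cache;
  std::unique_ptr<Stemmer> &slot = cache[ lang ];
  if ( !slot )
    slot.reset( create_stemmer( lang ) );
  if ( !slot )
    return false;
  if ( result )
    *result = slot.get();
  return true;
}
```

Unsupported languages are retried on every call, as in the diff; caching a "not available" marker would avoid repeated factory calls.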
=== modified file 'src/runtime/full_text/stemmer.h'
--- src/runtime/full_text/stemmer.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/stemmer.h 2012-04-24 20:57:30 +0000
@@ -41,11 +41,28 @@
ztd::destroy_delete<Stemmer const> > ptr;
/**
+ * Various properties of this %Stemmer.
+ */
+ struct Properties {
+ /**
+ * The URI that uniquely identifies this %Stemmer.
+ */
+ char const * uri;
+ };
+
+ /**
* Destroys this %Stemmer.
*/
virtual void destroy() const = 0;
/**
+ * Gets the Properties of this %Stemmer.
+ *
+ * @param result The Properties to populate.
+ */
+ virtual void properties( Properties *result ) const = 0;
+
+ /**
* Gets the stem of the given word.
*
* @param word The word to stem.
@@ -74,13 +91,15 @@
static StemmerProvider const& get_default();
/**
- * Gets an instance of a Stemmer for the given language.
+ * Gets a Stemmer for the given language.
*
- * @param lang The language for the stemmer.
- * @return Returns said Stemmer or \c nullptr if no stemmer is availabe for
- * the given language.
+ * @param lang The language to get a Stemmer for.
+ * @param s If not \c null, set to point to a Stemmer for \a lang.
+ * @return Returns \c true only if this provider can provide a stemmer for
+ * \a lang.
*/
- virtual Stemmer::ptr get_stemmer( locale::iso639_1::type lang ) const;
+ virtual bool getStemmer( locale::iso639_1::type lang,
+ Stemmer::ptr *s = 0 ) const;
};
///////////////////////////////////////////////////////////////////////////////
=== modified file 'src/runtime/full_text/stemmer/sb_stemmer.cpp'
--- src/runtime/full_text/stemmer/sb_stemmer.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/stemmer/sb_stemmer.cpp 2012-04-24 20:57:30 +0000
@@ -32,18 +32,21 @@
static bool is_lang_supported( iso639_1::type lang ) {
using namespace iso639_1;
switch ( lang ) {
- case da:
- case de:
- case en:
- case es:
- case fi:
- case hu:
- case it:
- case nl:
- case no:
- case pt:
- case sv:
- case ru:
+ case da: // Danish
+ case de: // German
+ case en: // English
+ case es: // Spanish
+ case fi: // Finnish
+ case fr: // French
+ case hu: // Hungarian
+ case it: // Italian
+ case nl: // Dutch
+ case no: // Norwegian
+ case pt: // Portuguese
+ case ro: // Romanian
+ case ru: // Russian
+ case sv: // Swedish
+ case tr: // Turkish
return true;
default:
return false;
@@ -70,7 +73,11 @@
// Do nothing since built-in stemmers are cached for re-use.
}
-void SnowballStemmer::stem( zstring const &word, iso639_1::type lang,
+void SnowballStemmer::properties( Properties *p ) const {
+ p->uri = "http://www.zorba-xquery.com/full-text/stemmer/snowball";
+}
+
+void SnowballStemmer::stem( zstring const &word, iso639_1::type,
zstring *result ) const {
//
// We need a mutex since the libstemmer library is not thread-safe.
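Review note: the comment above explains that `SnowballStemmer::stem()` serializes calls because libstemmer is not thread-safe. The usual pattern is to hold a lock across both the library call and the copy of its result out of any shared buffer. The sketch below shows that pattern. `unsafe_stem()` is a hypothetical toy stand-in, not libstemmer's API.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for a non-thread-safe C routine that returns a
// pointer into a shared static buffer.
static char const *unsafe_stem( char const *word ) {
  static std::string buf;
  buf = word;
  if ( buf.size() > 1 && buf.back() == 's' )
    buf.pop_back();                     // toy rule: strip a trailing 's'
  return buf.c_str();
}

// Holds the lock until the result has been copied out of the shared buffer,
// so another thread cannot overwrite it in between.
std::string stem( std::string const &word ) {
  static std::mutex m;
  std::lock_guard<std::mutex> lock( m );
  return std::string( unsafe_stem( word.c_str() ) );
}
```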
=== modified file 'src/runtime/full_text/stemmer/sb_stemmer.h'
--- src/runtime/full_text/stemmer/sb_stemmer.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/stemmer/sb_stemmer.h 2012-04-24 20:57:30 +0000
@@ -43,6 +43,7 @@
// inherited
void destroy() const;
+ void properties( Properties* ) const;
void stem( zstring const &word, locale::iso639_1::type lang,
zstring *result ) const;
=== modified file 'src/runtime/full_text/thesauri/wn_thesaurus.cpp'
--- src/runtime/full_text/thesauri/wn_thesaurus.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/thesauri/wn_thesaurus.cpp 2012-04-24 20:57:30 +0000
@@ -23,6 +23,8 @@
#include <zorba/util/path.h>
+#include <context/static_context.h>
+
#include "util/cxx_util.h"
#include "util/fs_util.h"
#include "util/less.h"
@@ -56,6 +58,18 @@
////////// Helper functions ///////////////////////////////////////////////////
/**
+ * Appends the name of the file Zorba uses for a WordNet thesaurus.
+ *
+ * @param path The path to append to.
+ * @param lang The language of the thesaurus file.
+ */
+static void append_wordnet_file( zstring &path, iso639_1::type lang ) {
+ fs::append( path, "wordnet-" );
+ path += iso639_1::string_of[ lang ];
+ path += ".zth";
+}
+
+/**
* "Fixes" the "at most" parameter. The Full Text specification section 3.4.3
* says in part:
*
@@ -70,8 +84,10 @@
* broad, hence if at_most specifies "all levels" (max int), clamp it at 2
* (which seems to work well in practice).
*/
-inline ft_int fix_at_most( ft_int at_most ) {
- return at_most == numeric_limits<ft_int>::max() ? 2 : at_most;
+inline internal::Thesaurus::level_type
+fix_at_most( internal::Thesaurus::level_type at_most ) {
+ return at_most == numeric_limits<internal::Thesaurus::level_type>::max() ?
+ 2 : at_most;
}
/**
@@ -191,9 +207,7 @@
for ( bool loop = true; loop; ) {
switch ( fs::get_type( path ) ) {
case fs::directory:
- fs::append( path, "wordnet-" );
- path += iso639_1::string_of[ iso639_1::en ];
- path += ".zth";
+ append_wordnet_file( path, iso639_1::en );
break;
case fs::file:
loop = false;
@@ -216,7 +230,7 @@
*
* @param relationship The XQuery thesaurus relationship.
* @param lang The language of the relationship.
- * @return Returns the corresponding Wordnet pointer type.
+ * @return Returns the corresponding WordNet pointer type.
*/
static pointer::type map_xquery_rel( zstring const &relationship,
iso639_1::type lang ) {
@@ -233,8 +247,8 @@
thesaurus::iterator::LevelMarker = make_pair( ~0u, iso2788::neutral );
thesaurus::iterator::iterator( thesaurus const &t, char const *p,
- pointer::type ptr_type, ft_int at_least,
- ft_int at_most ) :
+ pointer::type ptr_type, level_type at_least,
+ level_type at_most ) :
thesaurus_( t ), query_ptr_type_( ptr_type ),
at_least_( at_least ), at_most_( fix_at_most( at_most ) ), level_( 0 )
{
@@ -506,7 +520,7 @@
thesaurus::iterator::ptr
thesaurus::lookup( zstring const &phrase, zstring const &relationship,
- ft_int at_least, ft_int at_most ) const {
+ level_type at_least, level_type at_most ) const {
iterator::ptr result;
# if DEBUG_FT_THESAURUS
cout << "==================================================" << endl;
@@ -524,6 +538,62 @@
///////////////////////////////////////////////////////////////////////////////
+provider::provider( zstring const &path ) : path_( path ) {
+ ZORBA_ASSERT( !path.empty() );
+}
+
+bool provider::getThesaurus( iso639_1::type lang,
+ internal::Thesaurus::ptr *t ) const {
+#ifdef ZORBA_WITH_FILE_ACCESS
+ zstring file_path;
+
+ switch ( lang ) {
+ case iso639_1::unknown:
+ lang = iso639_1::en;
+ // no break;
+ case iso639_1::en:
+ file_path = path_;
+ append_wordnet_file( file_path, lang );
+ break;
+ default:
+ return false;
+ }
+
+ //
+ // We want to look for the WordNet thesaurus file on the library path.
+ // Unfortunately every static_context can have its own library path and we
+ // don't have direct access to the query's static_context here. So, for now
+ // we only look on the root static_context's library path.
+ //
+ static_context &sctx = GENV.getRootStaticContext();
+ std::vector<zstring> lib_path_components;
+ sctx.get_full_lib_path( lib_path_components );
+ MUTATE_EACH( std::vector<zstring>, path, lib_path_components ) {
+ fs::append( *path, file_path );
+ if ( fs::get_type( *path ) == fs::file ) {
+ if ( t )
+ t->reset( new thesaurus( *path, lang ) );
+ return true;
+ }
+ }
+ return false;
+#else
+ switch ( lang ) {
+ case iso639_1::unknown:
+ lang = iso639_1::en;
+ // no break;
+ case iso639_1::en:
+ if ( t )
+ t->reset();
+ return true;
+ default:
+ return false;
+ }
+#endif /* ZORBA_WITH_FILE_ACCESS */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
} // namespace wordnet
} // namespace zorba
/* vim:set et sw=2 ts=2: */
=== modified file 'src/runtime/full_text/thesauri/wn_thesaurus.h'
--- src/runtime/full_text/thesauri/wn_thesaurus.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/thesauri/wn_thesaurus.h 2012-04-24 20:57:30 +0000
@@ -34,7 +34,7 @@
///////////////////////////////////////////////////////////////////////////////
/**
- * A %wordnet::thesaurus is an ft_thesaurus for Wordnet.
+ * A %wordnet::thesaurus is a Thesaurus for WordNet.
* See: http://wordnet.princeton.edu/
*/
class thesaurus : public internal::Thesaurus {
@@ -44,7 +44,8 @@
// inherited
void destroy() const;
- iterator::ptr lookup( zstring const&, zstring const&, ft_int, ft_int ) const;
+ iterator::ptr lookup( zstring const&, zstring const&, level_type,
+ level_type ) const;
private:
/**
@@ -86,7 +87,7 @@
private:
iterator( thesaurus const&, char const *lemma, pointer::type,
- ft_int at_least, ft_int at_most );
+ level_type at_least, level_type at_most );
~iterator();
thesaurus const &thesaurus_;
@@ -97,8 +98,8 @@
*/
pointer::type query_ptr_type_;
- ft_int const at_least_, at_most_;
- ft_int level_;
+ level_type const at_least_, at_most_;
+ level_type level_;
typedef std::pair<synset_id_t,iso2788::rel_dir> candidate_t;
typedef std::deque<candidate_t> candidate_queue_t;
@@ -124,6 +125,29 @@
///////////////////////////////////////////////////////////////////////////////
+/**
+ * A %wordnet::provider is a ThesaurusProvider for WordNet.
+ */
+class provider : public internal::ThesaurusProvider {
+public:
+ /**
+ * Constructs a %provider.
+ *
+ * @param path The relative path of the directory containing the
+ * wordnet-LL.zth file (where LL is the ISO 639-1 code of the language).
+ */
+ provider( zstring const &path );
+
+ // inherited
+ bool getThesaurus( locale::iso639_1::type,
+ internal::Thesaurus::ptr* = nullptr ) const;
+
+private:
+ zstring const path_;
+};
+
+///////////////////////////////////////////////////////////////////////////////
+
} // namespace wordnet
} // namespace zorba
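The iterator's `at_least_`/`at_most_` members now use `level_type`, and `fix_at_most()` in wn_thesaurus.cpp clamps an "all levels" request (the type's maximum value) to 2 because WordNet's hierarchy is so broad. A standalone sketch, assuming `level_type` is an unsigned integer (the actual `ft_int` definition is not shown in this diff); `level_in_range()` is a hypothetical helper added only to show how the clamped bound would be used.

```cpp
#include <limits>

using level_type = unsigned;   // assumption: stands in for ft_int

// "All levels" is encoded as the maximum value; clamp it to 2 levels.
inline level_type fix_at_most(level_type at_most) {
  return at_most == std::numeric_limits<level_type>::max() ? 2 : at_most;
}

// Hypothetical: does a relation found at `level` fall inside the request?
inline bool level_in_range(level_type level, level_type at_least,
                           level_type at_most) {
  return level >= at_least && level <= fix_at_most(at_most);
}
```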
=== modified file 'src/runtime/full_text/thesauri/xqftts_thesaurus.cpp'
--- src/runtime/full_text/thesauri/xqftts_thesaurus.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/thesauri/xqftts_thesaurus.cpp 2012-04-24 20:57:30 +0000
@@ -60,8 +60,8 @@
make_pair( static_cast<synonym*>(0), iso2788::neutral );
thesaurus::iterator::iterator( thesaurus_t const &t, zstring const &phrase,
- zstring const &rel_string, ft_int at_least,
- ft_int at_most ) :
+ zstring const &rel_string, level_type at_least,
+ level_type at_most ) :
thesaurus_( t ), at_least_( at_least ), at_most_( at_most ), level_( 1 )
{
using namespace iso2788;
@@ -217,7 +217,7 @@
thesaurus::iterator::ptr
thesaurus::lookup( zstring const &phrase, zstring const &relationship,
- ft_int at_least, ft_int at_most ) const {
+ level_type at_least, level_type at_most ) const {
# if DEBUG_THESAURUS
cout << "==================================================" << endl;
cout << "query phrase: " << phrase << endl;
@@ -364,6 +364,31 @@
///////////////////////////////////////////////////////////////////////////////
+provider::provider( zstring const &path ) : path_( path ) {
+}
+
+bool provider::getThesaurus( iso639_1::type lang,
+ internal::Thesaurus::ptr *t ) const {
+ switch ( lang ) {
+ case iso639_1::unknown:
+ lang = iso639_1::en;
+ // no break;
+ case iso639_1::en:
+ if ( t ) {
+#ifdef ZORBA_WITH_FILE_ACCESS
+ t->reset( new thesaurus( path_, lang ) );
+#else
+ t->reset();
+#endif /* ZORBA_WITH_FILE_ACCESS */
+ }
+ return true;
+ default:
+ return false;
+ }
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
} // namespace xqftts
} // namespace zorba
/* vim:set et sw=2 ts=2: */
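`getThesaurus()` takes an optional out-parameter: with a null pointer it only answers "is this language supported?", and it constructs a thesaurus only when the caller actually asks for one. A minimal sketch of that contract, assuming `std::unique_ptr` with the default deleter; `XmlThesaurusProvider`, `Thesaurus` and `Lang` here are hypothetical simplifications of the XQFTTS provider.

```cpp
#include <memory>
#include <string>
#include <utility>

enum class Lang { en, fr, unknown };

// Hypothetical minimal thesaurus: it only remembers where its data lives.
struct Thesaurus {
  explicit Thesaurus(std::string p) : path(std::move(p)) {}
  std::string path;
};

class XmlThesaurusProvider {
public:
  explicit XmlThesaurusProvider(std::string path) : path_(std::move(path)) {}

  // Returns true only if a thesaurus for `lang` is available. The object is
  // built only when `t` is non-null, so support queries stay cheap.
  bool getThesaurus(Lang lang, std::unique_ptr<Thesaurus> *t = nullptr) const {
    switch (lang) {
      case Lang::unknown:          // unknown falls back to English
      case Lang::en:
        if (t)
          t->reset(new Thesaurus(path_));
        return true;
      default:
        return false;
    }
  }

private:
  std::string const path_;
};
```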
=== modified file 'src/runtime/full_text/thesauri/xqftts_thesaurus.h'
--- src/runtime/full_text/thesauri/xqftts_thesaurus.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/thesauri/xqftts_thesaurus.h 2012-04-24 20:57:30 +0000
@@ -44,7 +44,8 @@
// inherited
void destroy() const;
- iterator::ptr lookup( zstring const&, zstring const&, ft_int, ft_int ) const;
+ iterator::ptr lookup( zstring const&, zstring const&, level_type,
+ level_type ) const;
private:
//
@@ -123,13 +124,14 @@
private:
iterator( thesaurus_t const&, zstring const &phrase,
- zstring const &relationship, ft_int at_least, ft_int at_most );
+ zstring const &relationship, level_type at_least,
+ level_type at_most );
~iterator();
thesaurus_t const &thesaurus_;
- ft_int const at_least_, at_most_;
- ft_int level_;
+ level_type const at_least_, at_most_;
+ level_type level_;
typedef std::pair<synonym const*,iso2788::rel_dir> candidate_t;
typedef std::deque<candidate_t> candidate_queue_t;
@@ -155,6 +157,28 @@
///////////////////////////////////////////////////////////////////////////////
+/**
+ * A %xqftts::provider is a ThesaurusProvider for XQFTTS.
+ */
+class provider : public internal::ThesaurusProvider {
+public:
+ /**
+ * Constructs a %provider.
+ *
+ * @param path The absolute path of the thesaurus XML file.
+ */
+ provider( zstring const &path );
+
+ // inherited
+ bool getThesaurus( locale::iso639_1::type,
+ internal::Thesaurus::ptr* = nullptr ) const;
+
+private:
+ zstring const path_;
+};
+
+///////////////////////////////////////////////////////////////////////////////
+
} // namespace xqftts
} // namespace zorba
=== modified file 'src/runtime/full_text/thesaurus.cpp'
--- src/runtime/full_text/thesaurus.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/thesaurus.cpp 2012-04-24 20:57:30 +0000
@@ -31,6 +31,7 @@
#include "thesaurus.h"
#ifdef ZORBA_WITH_FILE_ACCESS
# include "thesauri/wn_thesaurus.h"
+# include "zorbatypes/URI.h"
#endif
#include "thesauri/xqftts_thesaurus.h"
@@ -57,7 +58,8 @@
type const DEFAULT = wordnet;
/**
- * Given a thesaurus implementation name, finds its corresponding type.
+ * Given a thesaurus implementation name (as identified by the URI scheme),
+ * finds its corresponding type.
*
* @param name The thesaurus implementation's name.
* @return Returns the implementation's type or \c unknown.
@@ -66,7 +68,6 @@
typedef map<char const*,type> impl_map_t;
static impl_map_t impl_map;
if ( impl_map.empty() ) {
- impl_map[ "default" ] = DEFAULT;
impl_map[ "wordnet" ] = wordnet;
impl_map[ "xqftts" ] = xqftts;
}
@@ -78,28 +79,6 @@
///////////////////////////////////////////////////////////////////////////////
-/**
- * Parses a thesaurus mapping string. A mapping string is of the form:
- *
- * [implementation_name|]URI
- *
- * @param mapping The mapping to parse.
- * @param t A pointer to receive the implementation type.
- * @param uri A pointer to the string to receive the URI.
- */
-static void parse_mapping( zstring const &mapping, thesaurus_impl::type *t,
- zstring *uri ) {
- zstring impl_name;
- if ( zorba::ztd::split( mapping, '|', &impl_name, uri ) ) {
- *t = thesaurus_impl::find( impl_name );
- } else {
- *t = thesaurus_impl::DEFAULT;
- *uri = mapping;
- }
-}
-
-///////////////////////////////////////////////////////////////////////////////
-
Thesaurus::iterator::~iterator() {
// out-of-line since it's virtual
}
@@ -112,36 +91,41 @@
Resource*
ThesaurusURLResolver::resolveURL( zstring const &url, EntityData const *data ) {
+ // Only resolve thesaurus URLs
if ( data->getKind() != internal::EntityData::THESAURUS )
return nullptr;
- ThesaurusEntityData const *const t_data =
- dynamic_cast<ThesaurusEntityData const*>( data );
- iso639_1::type const lang = t_data->getLanguage();
-
- thesaurus_impl::type t_impl;
- zstring mapped_url;
- parse_mapping( url, &t_impl, &mapped_url );
-
- zstring t_path;
- switch ( uri::get_scheme( mapped_url ) ) {
- case uri::file:
- case uri::none:
- t_path = fs::get_normalized_path( mapped_url );
- break;
- default:
- throw XQUERY_EXCEPTION(
- zerr::ZXQP0004_NOT_IMPLEMENTED,
- ERROR_PARAMS( ZED( NonFileThesaurusURI ) )
- );
- }
-
- switch ( t_impl ) {
+
+ zstring const url_copy(
+ url == "##default" ? "wordnet://wordnet.princeton.edu/": url
+ );
+
+ zstring scheme_name;
+ if ( !uri::get_scheme( url_copy, &scheme_name ) )
+ return nullptr;
+
+ switch ( thesaurus_impl::find( scheme_name ) ) {
+ case thesaurus_impl::xqftts: {
+ //
+ // Currently, we presume that an "xqftts:" URL should be used exactly
+ // like a "file:" URL.
+ //
+ zstring t_uri( url_copy );
+ t_uri.replace( 0, 6, "file" ); // xqftts -> file
+ zstring const t_path( fs::get_normalized_path( t_uri ) );
+ return new xqftts::provider( t_path );
+ }
# ifdef ZORBA_WITH_FILE_ACCESS
- case thesaurus_impl::wordnet:
- return new wordnet::thesaurus( t_path, lang );
+ case thesaurus_impl::wordnet: {
+ //
+ // WordNet, on the other hand, needs to find its data file in Zorba's
+ // library path using the mangled form of the original URI. So, mangle
+ // here for convenience.
+ //
+ URI const t_uri( url_copy );
+ zstring const t_path( t_uri.toPathNotation() );
+ return new wordnet::provider( t_path );
+ }
# endif /* ZORBA_WITH_FILE_ACCESS */
- case thesaurus_impl::xqftts:
- return new xqftts::thesaurus( t_path, lang );
default:
throw XQUERY_EXCEPTION( err::FTST0018, ERROR_PARAMS( url ) );
}
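The new `resolveURL()` replaces the old `impl|URI` mapping syntax with dispatch on the URI scheme: `##default` becomes a `wordnet:` URL, a URL without a scheme is left to other resolvers, `xqftts:` is rewritten to `file:`, and any other scheme raises FTST0018. A sketch of that dispatch, assuming `std::invalid_argument` as a stand-in for the Zorba error; `resolve_thesaurus_url()`, `uri_scheme()` and `Resolved` are hypothetical, and the WordNet branch returns the URL untouched instead of applying `URI::toPathNotation()`.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

enum class Impl { wordnet, xqftts };

struct Resolved {
  Impl impl;
  std::string location;   // "file:" URL for xqftts; original URL for wordnet
};

// Extracts an RFC 3986-style scheme (ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )).
std::optional<std::string> uri_scheme(std::string const &u) {
  std::size_t const colon = u.find(':');
  if (colon == std::string::npos || colon == 0 ||
      !std::isalpha(static_cast<unsigned char>(u[0])))
    return std::nullopt;
  for (std::size_t i = 1; i < colon; ++i) {
    unsigned char const c = static_cast<unsigned char>(u[i]);
    if (!std::isalnum(c) && c != '+' && c != '-' && c != '.')
      return std::nullopt;
  }
  return u.substr(0, colon);
}

// nullopt: not ours to resolve. Throws for an unknown thesaurus scheme.
std::optional<Resolved> resolve_thesaurus_url(std::string const &url) {
  std::string const u =
      url == "##default" ? "wordnet://wordnet.princeton.edu/" : url;
  std::optional<std::string> const scheme = uri_scheme(u);
  if (!scheme)
    return std::nullopt;
  if (*scheme == "xqftts")                     // xqftts:... -> file:...
    return Resolved{Impl::xqftts, "file" + u.substr(scheme->size())};
  if (*scheme == "wordnet")
    return Resolved{Impl::wordnet, u};
  throw std::invalid_argument("unsupported thesaurus URI: " + url);
}
```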
=== modified file 'src/runtime/full_text/thesaurus.h'
--- src/runtime/full_text/thesaurus.h 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/thesaurus.h 2012-04-24 20:57:30 +0000
@@ -34,9 +34,14 @@
/**
* A %Thesaurus is the abstract base class for thesaurus implementations.
*/
-class Thesaurus : public internal::Resource {
+class Thesaurus {
public:
- typedef std::unique_ptr<Thesaurus,ztd::destroy_delete<Thesaurus> > ptr;
+ typedef ft_int level_type;
+
+ typedef std::unique_ptr<
+ Thesaurus const,ztd::destroy_delete<Thesaurus const>
+ >
+ ptr;
/**
* An %iterator is used to iterate over lookup results.
@@ -82,8 +87,9 @@
* the phrase was not found.
*/
virtual iterator::ptr lookup( zstring const &phrase,
- zstring const &relationship, ft_int at_least,
- ft_int at_most ) const = 0;
+ zstring const &relationship,
+ level_type at_least,
+ level_type at_most ) const = 0;
protected:
Thesaurus() { }
@@ -97,6 +103,26 @@
///////////////////////////////////////////////////////////////////////////////
+/**
+ * A %ThesaurusProvider is-a Resource for providing thesauri for a given
+ * language.
+ */
+class ThesaurusProvider : public internal::Resource {
+public:
+ /**
+ * Gets a Thesaurus for the given language.
+ *
+ * @param lang The desired language of the thesaurus.
+ * @param t If not \c null, set to point to a Thesaurus for \a lang.
+ * @return Returns \c true only if this provider can provide a thesaurus for
+ * \a lang.
+ */
+ virtual bool getThesaurus( locale::iso639_1::type lang,
+ Thesaurus::ptr *t = nullptr ) const = 0;
+};
+
+///////////////////////////////////////////////////////////////////////////////
+
} // namespace internal
} // namespace zorba
#endif /* ZORBA_THESAURUS_H */
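`Thesaurus::ptr` is now a `std::unique_ptr<Thesaurus const>` whose deleter calls `destroy()` rather than `delete`, which lets an implementation decide whether to free the object or keep it (the Snowball stemmer's `destroy()`, for instance, does nothing because built-in stemmers are cached). A sketch of such a deleter, assuming it simply forwards to `destroy()`; this `destroy_delete` is a hypothetical re-creation, not Zorba's `ztd::destroy_delete`, and `HeapThesaurus` is an illustrative implementation.

```cpp
#include <memory>

// Hypothetical deleter: asks the object to dispose of itself.
template<typename T>
struct destroy_delete {
  void operator()(T *p) const { if (p) p->destroy(); }
};

class Thesaurus {
public:
  using ptr = std::unique_ptr<Thesaurus const, destroy_delete<Thesaurus const>>;
  virtual void destroy() const = 0;
protected:
  virtual ~Thesaurus() = default;   // only destroy() may end the lifetime
};

// Illustrative implementation that frees itself and counts destructions.
class HeapThesaurus : public Thesaurus {
public:
  explicit HeapThesaurus(int *destroyed) : destroyed_(destroyed) {}
  void destroy() const override {
    ++*destroyed_;
    delete this;
  }
private:
  int *destroyed_;
};
```

The protected destructor means nobody can `delete` a `Thesaurus` through the base pointer by accident; ownership always flows through `destroy()`.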
=== modified file 'src/runtime/full_text/tokenizer.cpp'
--- src/runtime/full_text/tokenizer.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/tokenizer.cpp 2012-04-24 20:57:30 +0000
@@ -15,24 +15,98 @@
*/
#include "stdafx.h"
+#include <zorba/item.h>
+#include <zorba/iterator.h>
+#include <zorba/store_consts.h>
#include <zorba/tokenizer.h>
+#include <zorba/zorba_string.h>
+
+#include "diagnostics/assert.h"
+#include "store/api/store.h"
+#include "system/globalenv.h"
+#include "zorbamisc/ns_consts.h"
+#include "zorbautils/locale.h"
+
+using namespace zorba::locale;
namespace zorba {
///////////////////////////////////////////////////////////////////////////////
-Tokenizer::Tokenizer( Numbers &no, int trace_options ) :
- trace_options_( trace_options ),
- no_( &no )
-{
-}
-
Tokenizer::~Tokenizer() {
// out-of-line since it's virtual
}
-void Tokenizer::element( Item const&, int ) {
- // do nothing
+bool Tokenizer::find_lang_attribute( Item const &item, iso639_1::type *lang ) {
+ bool found_lang = false;
+ if ( item.getNodeKind() == store::StoreConsts::elementNode ) {
+ Iterator_t i( item.getAttributes() );
+ i->open();
+ for ( Item attr; i->next( attr ); ) {
+ Item qname;
+ if ( attr.getNodeName( qname ) &&
+ qname.getLocalName() == "lang" && qname.getNamespace() == XML_NS ) {
+ *lang = locale::find_lang( attr.getStringValue().c_str() );
+ found_lang = true;
+ break;
+ }
+ }
+ i->close();
+ }
+ return found_lang;
+}
+
+void Tokenizer::item( Item const &item, bool entering ) {
+ if ( entering && item.isNode() &&
+ item.getNodeKind() == store::StoreConsts::elementNode ) {
+ ++numbers().para;
+ }
+}
+
+void Tokenizer::tokenize_node_impl( Item const &item, iso639_1::type lang,
+ Callback &callback, bool tokenize_acp ) {
+ if ( item.isNode() ) {
+ Iterator_t i;
+ Tokenizer *t_raw = this;
+ Tokenizer::ptr t_ptr;
+
+ this->item( item, true );
+ callback.item( item, true );
+
+ switch ( item.getNodeKind() ) {
+ case store::StoreConsts::elementNode:
+ if ( find_lang_attribute( item, &lang ) ) {
+ TokenizerProvider const *const p = GENV_STORE.getTokenizerProvider();
+ ZORBA_ASSERT( p );
+ if ( !p->getTokenizer( lang, numbers_, &t_ptr ) )
+ break;
+ t_raw = t_ptr.get();
+ }
+ // no break;
+
+ case store::StoreConsts::documentNode:
+ i = item.getChildren();
+ i->open();
+ for ( Item child; i->next( child ); )
+ t_raw->tokenize_node_impl( child, lang, callback, false );
+ i->close();
+ break;
+
+ case store::StoreConsts::attributeNode:
+ case store::StoreConsts::commentNode:
+ case store::StoreConsts::piNode:
+ if ( !tokenize_acp )
+ break;
+ case store::StoreConsts::textNode: {
+ String const s( item.getStringValue() );
+ tokenize_string( s.data(), s.size(), lang, false, callback, &item );
+ break;
+ }
+ } // switch
+
+ this->item( item, false );
+ callback.item( item, false );
+ }
}
Tokenizer::Numbers::Numbers() {
@@ -44,6 +118,10 @@
// out-of-line since it's virtual
}
+void Tokenizer::Callback::item( Item const&, bool ) {
+ // out-of-line since it's virtual
+}
+
///////////////////////////////////////////////////////////////////////////////
TokenizerProvider::~TokenizerProvider() {
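`tokenize_node_impl()` walks a node tree: entering an element bumps the paragraph number, an `xml:lang` attribute switches the language for that subtree, and attribute, comment and processing-instruction nodes are tokenized only when passed in directly (`tokenize_acp`), never as descendants. A sketch of that walk over a hypothetical in-memory `Node` type with whitespace tokenization; it does not model swapping to a different tokenizer per language, nor skipping a subtree whose language has no tokenizer.

```cpp
#include <sstream>
#include <string>
#include <vector>

enum class Kind { document, element, attribute, text, comment };

// Hypothetical node model; `lang` is the element's xml:lang, if any.
struct Node {
  Kind kind;
  std::string value;
  std::string lang;
  std::vector<Node> children;
};

struct Token { std::string word; std::string lang; int para; };

struct Walker {
  int para = 0;
  std::vector<Token> tokens;

  void tokenize_string(std::string const &s, std::string const &lang) {
    std::istringstream in(s);
    for (std::string w; in >> w; )
      tokens.push_back({w, lang, para});
  }

  void walk(Node const &n, std::string lang, bool tokenize_acp) {
    switch (n.kind) {
      case Kind::element:
        ++para;                              // each element starts a paragraph
        if (!n.lang.empty())
          lang = n.lang;                     // xml:lang overrides inherited lang
        [[fallthrough]];
      case Kind::document:
        for (Node const &c : n.children)
          walk(c, lang, false);              // descendants never get acp
        break;
      case Kind::attribute:
      case Kind::comment:
        if (!tokenize_acp)
          break;
        [[fallthrough]];
      case Kind::text:
        tokenize_string(n.value, lang);
        break;
    }
  }
};
```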
=== modified file 'src/runtime/spec/codegen-cpp.xq'
--- src/runtime/spec/codegen-cpp.xq 2012-04-24 12:39:38 +0000
+++ src/runtime/spec/codegen-cpp.xq 2012-04-24 20:57:30 +0000
@@ -95,7 +95,8 @@
if (exists($iter/@preprocessorGuard))
then
- concat($gen:newline, "#endif")
+ concat($gen:newline, "#endif
+")
else
""
)
@@ -194,7 +195,7 @@
if (count($function/zorba:signature) = 0)
then
(: TODO user fn:error :)
- 'Error: could not find "prefix" and "localname" attributes for "zorba:function" element'
+ 'Error: could not find \"prefix\" and \"localname\" attributes for \"zorba:function\" element'
else
let $name := concat(local:function-name($function), $suffix)
let $ret := if($iter/@name = "") then "return NULL;"
@@ -275,7 +276,8 @@
($gen:newline,
if (exists($iter/@preprocessorGuard))
then
- concat($gen:newline, $iter/@preprocessorGuard)
+ concat($gen:newline, $iter/@preprocessorGuard, "
+")
else
"",
$gen:indent,
@@ -336,7 +338,13 @@
'),', $gen:newline, gen:indent(4),
'FunctionConsts::', gen:function-kind($sig) ,');',
$gen:newline, $gen:newline, $gen:indent,
- '}', $gen:newline, $gen:newline
+ '}', $gen:newline, $gen:newline,
+ if (exists($iter/@preprocessorGuard))
+ then
+ concat($gen:newline, "#endif
+")
+ else
+ ""
),
''),
'')
@@ -351,7 +359,7 @@
then
$tmp/@uri
else (: TODO user fn:error :)
- 'Error: could not find "prefix" and "localname" attributes for "zorba:function" element'
+ 'Error: could not find \"prefix\" and \"localname\" attributes for \"zorba:function\" element'
};
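The codegen-cpp.xq changes make every emitted `preprocessorGuard` line and `#endif` end with a newline; without it, the next generated declaration would land on the directive line itself. A sketch of the intended output shape, assuming the generator simply concatenates strings; `wrap_in_guard()` is a hypothetical helper, not part of the XQuery generator.

```cpp
#include <string>

// Wraps generated code in a guard such as "#ifndef ZORBA_NO_FULL_TEXT".
// Both directives end with '\n' so that whatever is emitted next starts on
// a fresh line instead of trailing the "#endif".
std::string wrap_in_guard(std::string const &directive, std::string const &body) {
  if (directive.empty())
    return body;
  return directive + "\n" + body + "\n#endif\n";
}
```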
=== modified file 'src/runtime/spec/codegen-h.xq'
--- src/runtime/spec/codegen-h.xq 2012-04-24 12:39:38 +0000
+++ src/runtime/spec/codegen-h.xq 2012-04-24 20:57:30 +0000
@@ -146,7 +146,7 @@
if(count($function/zorba:signature) = 0)
then
(: TODO user fn:error :)
- 'Error: could not find "prefix" and "localname" attributes for "zorba:function" element'
+ 'Error: could not find \"prefix\" and \"localname\" attributes for \"zorba:function\" element'
else
local:create-function-XQuery-30($iter, $function)
(: local:create-function-arity($iter, $function, xs:integer(1)) :)
=== added directory 'src/runtime/spec/full_text'
=== added file 'src/runtime/spec/full_text/ft_module.xml'
--- src/runtime/spec/full_text/ft_module.xml 1970-01-01 00:00:00 +0000
+++ src/runtime/spec/full_text/ft_module.xml 2012-04-24 20:57:30 +0000
@@ -0,0 +1,247 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<zorba:iterators
+ xmlns:zorba="http://www.zorba-xquery.com"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.zorba-xquery.com ../runtime.xsd">
+
+<zorba:header>
+ <zorba:include form="Quoted">runtime/full_text/ft_token_seq_iterator.h</zorba:include>
+ <zorba:include form="Quoted">runtime/full_text/thesaurus.h</zorba:include>
+</zorba:header>
+
+<zorba:source>
+ <zorba:include form="Quoted">store/api/iterator.h</zorba:include>
+</zorba:source>
+
+<zorba:iterator name="CurrentLangIterator"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+ <zorba:function>
+ <zorba:signature localname="current-lang" prefix="full-text">
+ <zorba:output>xs:language</zorba:output>
+ </zorba:signature>
+ </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="HostLangIterator"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+ <zorba:function>
+ <zorba:signature localname="host-lang" prefix="full-text">
+ <zorba:output>xs:language</zorba:output>
+ </zorba:signature>
+ </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="IsStemLangSupportedIterator"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+ <zorba:function>
+ <zorba:signature localname="is-stem-lang-supported" prefix="full-text">
+ <zorba:param>xs:language</zorba:param>
+ <zorba:output>xs:boolean</zorba:output>
+ </zorba:signature>
+ </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="IsStopWordIterator"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+ <zorba:function>
+ <zorba:signature localname="is-stop-word" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- word -->
+ <zorba:output>xs:boolean</zorba:output>
+ </zorba:signature>
+ <zorba:signature localname="is-stop-word" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- word -->
+ <zorba:param>xs:language</zorba:param> <!-- lang -->
+ <zorba:output>xs:boolean</zorba:output>
+ </zorba:signature>
+ </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="IsStopWordLangSupportedIterator"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+ <zorba:function>
+ <zorba:signature localname="is-stop-word-lang-supported" prefix="full-text">
+ <zorba:param>xs:language</zorba:param>
+ <zorba:output>xs:boolean</zorba:output>
+ </zorba:signature>
+ </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="IsThesaurusLangSupportedIterator"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+ <zorba:function>
+ <zorba:signature localname="is-thesaurus-lang-supported" prefix="full-text">
+ <zorba:param>xs:language</zorba:param>
+ <zorba:output>xs:boolean</zorba:output>
+ </zorba:signature>
+ <zorba:signature localname="is-thesaurus-lang-supported" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- URI -->
+ <zorba:param>xs:language</zorba:param>
+ <zorba:output>xs:boolean</zorba:output>
+ </zorba:signature>
+ </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="IsTokenizerLangSupportedIterator"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+ <zorba:function>
+ <zorba:signature localname="is-tokenizer-lang-supported" prefix="full-text">
+ <zorba:param>xs:language</zorba:param>
+ <zorba:output>xs:boolean</zorba:output>
+ </zorba:signature>
+ </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="StemIterator"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+ <zorba:function>
+ <zorba:signature localname="stem" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- word -->
+ <zorba:output>xs:string</zorba:output>
+ </zorba:signature>
+ <zorba:signature localname="stem" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- word -->
+ <zorba:param>xs:language</zorba:param> <!-- lang -->
+ <zorba:output>xs:string</zorba:output>
+ </zorba:signature>
+ </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="StripDiacriticsIterator"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+ <zorba:function>
+ <zorba:signature localname="strip-diacritics" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- phrase -->
+ <zorba:output>xs:string</zorba:output>
+ </zorba:signature>
+ </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="ThesaurusLookupIterator"
+ generateResetImpl="true"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+ <zorba:function>
+ <zorba:signature localname="thesaurus-lookup" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- phrase -->
+ <zorba:output>xs:string+</zorba:output>
+ </zorba:signature>
+ <zorba:signature localname="thesaurus-lookup" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- URI -->
+ <zorba:param>xs:string</zorba:param> <!-- phrase -->
+ <zorba:output>xs:string+</zorba:output>
+ </zorba:signature>
+ <zorba:signature localname="thesaurus-lookup" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- URI -->
+ <zorba:param>xs:string</zorba:param> <!-- phrase -->
+ <zorba:param>xs:language</zorba:param> <!-- lang -->
+ <zorba:output>xs:string+</zorba:output>
+ </zorba:signature>
+ <zorba:signature localname="thesaurus-lookup" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- URI -->
+ <zorba:param>xs:string</zorba:param> <!-- phrase -->
+ <zorba:param>xs:language</zorba:param> <!-- lang -->
+ <zorba:param>xs:string</zorba:param> <!-- relationship -->
+ <zorba:output>xs:string+</zorba:output>
+ </zorba:signature>
+ <zorba:signature localname="thesaurus-lookup" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- URI -->
+ <zorba:param>xs:string</zorba:param> <!-- phrase -->
+ <zorba:param>xs:language</zorba:param> <!-- lang -->
+ <zorba:param>xs:string</zorba:param> <!-- relationship -->
+ <zorba:param>xs:integer</zorba:param> <!-- level-least -->
+ <zorba:param>xs:integer</zorba:param> <!-- level-most -->
+ <zorba:output>xs:string+</zorba:output>
+ </zorba:signature>
+ </zorba:function>
+ <zorba:state generateInit="use-default">
+ <zorba:member type="zstring" name="phrase_"/>
+ <zorba:member type="zstring" name="relationship_"/>
+ <zorba:member type="internal::Thesaurus::level_type" name="at_least_"/>
+ <zorba:member type="internal::Thesaurus::level_type" name="at_most_"/>
+ <zorba:member type="internal::Thesaurus::ptr" name="thesaurus_"/>
+ <zorba:member type="internal::Thesaurus::iterator::ptr" name="tresult_"/>
+ </zorba:state>
+</zorba:iterator>
+
+<zorba:iterator name="TokenizeIterator"
+ generateResetImpl="true"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+
+ <zorba:function generateCodegen="false" generateDECL="false">
+
+ <zorba:signature localname="tokenize" prefix="full-text">
+ <zorba:param>node()</zorba:param> <!-- doc -->
+ <zorba:output>node()*</zorba:output>
+ </zorba:signature>
+
+ <zorba:signature localname="tokenize" prefix="full-text">
+ <zorba:param>node()</zorba:param> <!-- doc -->
+ <zorba:param>xs:language</zorba:param> <!-- lang -->
+ <zorba:output>node()*</zorba:output>
+ </zorba:signature>
+
+ </zorba:function>
+
+ <zorba:state generateInit="use-default">
+ <zorba:member type="store::Item_t" name="doc_item_"/>
+ <zorba:member type="FTTokenIterator_t" name="doc_tokens_"/>
+ <zorba:member type="store::Item_t" name="token_qname_"/>
+ </zorba:state>
+
+</zorba:iterator>
+
+<zorba:iterator name="TokenizerPropertiesIterator"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+
+ <zorba:function generateCodegen="false" generateDECL="false">
+
+ <zorba:signature localname="tokenizer-properties" prefix="full-text">
+ <zorba:output>node()</zorba:output>
+ </zorba:signature>
+
+ <zorba:signature localname="tokenizer-properties" prefix="full-text">
+ <zorba:param>xs:language</zorba:param> <!-- lang -->
+ <zorba:output>node()</zorba:output>
+ </zorba:signature>
+
+ <zorba:methods>
+ <!--
+ ! Mark the function as accessing the dyn ctx so that it won't be
+ ! const-folded. We must prevent const-folding because the function
+ ! returns a node that is validated with a schema that may not be
+ ! imported in the module where the function is invoked from.
+ -->
+ <zorba:accessesDynCtx returnValue="true"/>
+ </zorba:methods>
+
+ </zorba:function>
+
+</zorba:iterator>
+
+<zorba:iterator name="TokenizeStringIterator"
+ generateResetImpl="true"
+ preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+
+ <zorba:function>
+
+ <zorba:signature localname="tokenize-string" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- string -->
+ <zorba:output>xs:string*</zorba:output>
+ </zorba:signature>
+
+ <zorba:signature localname="tokenize-string" prefix="full-text">
+ <zorba:param>xs:string</zorba:param> <!-- string -->
+ <zorba:param>xs:language</zorba:param> <!-- lang -->
+ <zorba:output>xs:string*</zorba:output>
+ </zorba:signature>
+
+ </zorba:function>
+
+ <zorba:state generateInit="use-default">
+ <zorba:member type="FTTokenSeqIterator" name="string_tokens_"/>
+ </zorba:state>
+
+</zorba:iterator>
+
+</zorba:iterators>
+<!-- vim:set et sw=2 ts=2: -->
=== modified file 'src/runtime/spec/mappings.xml'
--- src/runtime/spec/mappings.xml 2012-04-24 12:39:38 +0000
+++ src/runtime/spec/mappings.xml 2012-04-24 20:57:30 +0000
@@ -82,6 +82,11 @@
define="ZORBA_STORE_DYNAMIC_UNORDERED_MAP_FN_NS"
prefix="zorba-store-data-structure-unordered-map"/>
+ <zorba:namespace
+ uri="http://www.zorba-xquery.com/modules/full-text"
+ define="ZORBA_FULL_TEXT_FN_NS"
+ prefix="full-text"/>
+
<zorba:namespace uri="http://www.zorba-xquery.com/modules/xqdoc"
define="ZORBA_XQDOC_FN_NS"
prefix="fn-zorba-xqdoc"/>
@@ -150,9 +155,9 @@
<zorba:type zorbaType="ANY_NODE">node()</zorba:type>
<zorba:type zorbaType="ELEMENT">element()</zorba:type>
-
<zorba:type zorbaType="ANY_ATOMIC">xs:anyAtomicType</zorba:type>
<zorba:type zorbaType="UNTYPED_ATOMIC">xs:untypedAtomic</zorba:type>
+
<zorba:type zorbaType="STRING">xs:string</zorba:type>
<zorba:type zorbaType="NORMALIZED_STRING">xs:normalizedString</zorba:type>
<zorba:type zorbaType="TOKEN">xs:token</zorba:type>
@@ -160,21 +165,25 @@
<zorba:type zorbaType="NMTOKEN">xs:NMTOKEN</zorba:type>
<zorba:type zorbaType="NAME">xs:Name</zorba:type>
<zorba:type zorbaType="NCNAME">xs:NCName</zorba:type>
+
<zorba:type zorbaType="ID">xs:ID</zorba:type>
<zorba:type zorbaType="IDREF">xs:IDREF</zorba:type>
+
<zorba:type zorbaType="ENTITY">xs:ENTITY</zorba:type>
+
<zorba:type zorbaType="DATETIME">xs:dateTime</zorba:type>
<zorba:type zorbaType="DATE">xs:date</zorba:type>
<zorba:type zorbaType="TIME">xs:time</zorba:type>
<zorba:type zorbaType="DURATION">xs:duration</zorba:type>
<zorba:type zorbaType="DT_DURATION">xs:dayTimeDuration</zorba:type>
<zorba:type zorbaType="YM_DURATION">xs:yearMonthDuration</zorba:type>
+
<zorba:type zorbaType="FLOAT">xs:float</zorba:type>
<zorba:type zorbaType="DOUBLE">xs:double</zorba:type>
<zorba:type zorbaType="DECIMAL">xs:decimal</zorba:type>
<zorba:type zorbaType="INTEGER">xs:integer</zorba:type>
<zorba:type zorbaType="NON_POSITIVE_INTEGER">xs:nonPositiveInteger</zorba:type>
- <zorba:type zorbaType="NEGATIVE_INTEGER">xs:nonNegativeInteger</zorba:type>
+ <zorba:type zorbaType="NEGATIVE_INTEGER">xs:negativeInteger</zorba:type>
<zorba:type zorbaType="LONG">xs:long</zorba:type>
<zorba:type zorbaType="INT">xs:int</zorba:type>
<zorba:type zorbaType="SHORT">xs:short</zorba:type>
@@ -185,14 +194,17 @@
<zorba:type zorbaType="UNSIGNED_SHORT">xs:unsignedShort</zorba:type>
<zorba:type zorbaType="UNSIGNED_BYTE">xs:unsignedByte</zorba:type>
<zorba:type zorbaType="POSITIVE_INTEGER">xs:positiveInteger</zorba:type>
+
<zorba:type zorbaType="GYEAR_MONTH">xs:gYearMonth</zorba:type>
<zorba:type zorbaType="GYEAR">xs:gYear</zorba:type>
<zorba:type zorbaType="GMONTH_DAY">xs:gMonthDay</zorba:type>
<zorba:type zorbaType="GDAY">xs:gDay</zorba:type>
<zorba:type zorbaType="GMONTH">xs:gMonth</zorba:type>
+
<zorba:type zorbaType="BOOLEAN">xs:boolean</zorba:type>
<zorba:type zorbaType="BASE64BINARY">xs:base64Binary</zorba:type>
<zorba:type zorbaType="HEXBINARY">xs:hexBinary</zorba:type>
+
<zorba:type zorbaType="ANY_URI">xs:anyURI</zorba:type>
<zorba:type zorbaType="QNAME">xs:QName</zorba:type>
<zorba:type zorbaType="NOTATION">xs:NOTATION</zorba:type>
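The mappings.xml hunk fixes `NEGATIVE_INTEGER`, which was mapped to `xs:nonNegativeInteger`. Most entries follow a mechanical naming convention, so a small consistency check can flag that kind of slip. A sketch, assuming the convention "drop underscores, camel-case the rest, prefix `xs:`"; `snake_to_camel()` and `follows_convention()` are hypothetical helpers, and entries such as `DT_DURATION` (`xs:dayTimeDuration`) are legitimate exceptions that a real check would need to allow-list.

```cpp
#include <cctype>
#include <string>

// "NEGATIVE_INTEGER" -> "negativeInteger"
std::string snake_to_camel(std::string const &s) {
  std::string out;
  bool upper_next = false;
  for (char c : s) {
    if (c == '_') {
      upper_next = true;
      continue;
    }
    unsigned char const uc = static_cast<unsigned char>(c);
    out += static_cast<char>(upper_next ? std::toupper(uc) : std::tolower(uc));
    upper_next = false;
  }
  return out;
}

// True if `xs_name` is what the naming convention predicts for `zorba_type`.
bool follows_convention(std::string const &zorba_type, std::string const &xs_name) {
  return xs_name == "xs:" + snake_to_camel(zorba_type);
}
```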
=== modified file 'src/runtime/visitors/pregenerated/planiter_visitor.h'
--- src/runtime/visitors/pregenerated/planiter_visitor.h 2012-04-24 12:39:38 +0000
+++ src/runtime/visitors/pregenerated/planiter_visitor.h 2012-04-24 20:57:30 +0000
@@ -193,6 +193,45 @@
class FnPutIterator;
+#ifndef ZORBA_NO_FULL_TEXT
+ class CurrentLangIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ class HostLangIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ class IsStemLangSupportedIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ class IsStopWordIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ class IsStopWordLangSupportedIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ class IsThesaurusLangSupportedIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ class IsTokenizerLangSupportedIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ class StemIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ class StripDiacriticsIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ class ThesaurusLookupIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ class TokenizeIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ class TokenizerPropertiesIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ class TokenizeStringIterator;
+#endif
class FunctionNameIterator;
class FunctionArityIterator;
@@ -862,6 +901,58 @@
virtual void beginVisit ( const FnPutIterator& ) = 0;
virtual void endVisit ( const FnPutIterator& ) = 0;
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const CurrentLangIterator& ) = 0;
+ virtual void endVisit ( const CurrentLangIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const HostLangIterator& ) = 0;
+ virtual void endVisit ( const HostLangIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const IsStemLangSupportedIterator& ) = 0;
+ virtual void endVisit ( const IsStemLangSupportedIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const IsStopWordIterator& ) = 0;
+ virtual void endVisit ( const IsStopWordIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const IsStopWordLangSupportedIterator& ) = 0;
+ virtual void endVisit ( const IsStopWordLangSupportedIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const IsThesaurusLangSupportedIterator& ) = 0;
+ virtual void endVisit ( const IsThesaurusLangSupportedIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const IsTokenizerLangSupportedIterator& ) = 0;
+ virtual void endVisit ( const IsTokenizerLangSupportedIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const StemIterator& ) = 0;
+ virtual void endVisit ( const StemIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const StripDiacriticsIterator& ) = 0;
+ virtual void endVisit ( const StripDiacriticsIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const ThesaurusLookupIterator& ) = 0;
+ virtual void endVisit ( const ThesaurusLookupIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const TokenizeIterator& ) = 0;
+ virtual void endVisit ( const TokenizeIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const TokenizerPropertiesIterator& ) = 0;
+ virtual void endVisit ( const TokenizerPropertiesIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+ virtual void beginVisit ( const TokenizeStringIterator& ) = 0;
+ virtual void endVisit ( const TokenizeStringIterator& ) = 0;
+#endif
virtual void beginVisit ( const FunctionNameIterator& ) = 0;
virtual void endVisit ( const FunctionNameIterator& ) = 0;
=== modified file 'src/runtime/visitors/pregenerated/printer_visitor.cpp'
--- src/runtime/visitors/pregenerated/printer_visitor.cpp 2012-04-24 12:39:38 +0000
+++ src/runtime/visitors/pregenerated/printer_visitor.cpp 2012-04-24 20:57:30 +0000
@@ -47,6 +47,7 @@
#include "runtime/errors_and_diagnostics/other_diagnostics.h"
#include "runtime/fetch/fetch.h"
#include "runtime/fnput/fnput.h"
+#include "runtime/full_text/ft_module.h"
#include "runtime/function_item/function_item_iter.h"
#include "runtime/indexing/ic_ddl.h"
#include "runtime/introspection/sctx.h"
@@ -1245,6 +1246,201 @@
}
// </FnPutIterator>
+#ifndef ZORBA_NO_FULL_TEXT
+// <CurrentLangIterator>
+void PrinterVisitor::beginVisit ( const CurrentLangIterator& a) {
+ thePrinter.startBeginVisit("CurrentLangIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const CurrentLangIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </CurrentLangIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <HostLangIterator>
+void PrinterVisitor::beginVisit ( const HostLangIterator& a) {
+ thePrinter.startBeginVisit("HostLangIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const HostLangIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </HostLangIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStemLangSupportedIterator>
+void PrinterVisitor::beginVisit ( const IsStemLangSupportedIterator& a) {
+ thePrinter.startBeginVisit("IsStemLangSupportedIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const IsStemLangSupportedIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </IsStemLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStopWordIterator>
+void PrinterVisitor::beginVisit ( const IsStopWordIterator& a) {
+ thePrinter.startBeginVisit("IsStopWordIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const IsStopWordIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </IsStopWordIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStopWordLangSupportedIterator>
+void PrinterVisitor::beginVisit ( const IsStopWordLangSupportedIterator& a) {
+ thePrinter.startBeginVisit("IsStopWordLangSupportedIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const IsStopWordLangSupportedIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </IsStopWordLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsThesaurusLangSupportedIterator>
+void PrinterVisitor::beginVisit ( const IsThesaurusLangSupportedIterator& a) {
+ thePrinter.startBeginVisit("IsThesaurusLangSupportedIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const IsThesaurusLangSupportedIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </IsThesaurusLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsTokenizerLangSupportedIterator>
+void PrinterVisitor::beginVisit ( const IsTokenizerLangSupportedIterator& a) {
+ thePrinter.startBeginVisit("IsTokenizerLangSupportedIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const IsTokenizerLangSupportedIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </IsTokenizerLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <StemIterator>
+void PrinterVisitor::beginVisit ( const StemIterator& a) {
+ thePrinter.startBeginVisit("StemIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const StemIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </StemIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <StripDiacriticsIterator>
+void PrinterVisitor::beginVisit ( const StripDiacriticsIterator& a) {
+ thePrinter.startBeginVisit("StripDiacriticsIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const StripDiacriticsIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </StripDiacriticsIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <ThesaurusLookupIterator>
+void PrinterVisitor::beginVisit ( const ThesaurusLookupIterator& a) {
+ thePrinter.startBeginVisit("ThesaurusLookupIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const ThesaurusLookupIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </ThesaurusLookupIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeIterator>
+void PrinterVisitor::beginVisit ( const TokenizeIterator& a) {
+ thePrinter.startBeginVisit("TokenizeIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const TokenizeIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </TokenizeIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizerPropertiesIterator>
+void PrinterVisitor::beginVisit ( const TokenizerPropertiesIterator& a) {
+ thePrinter.startBeginVisit("TokenizerPropertiesIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const TokenizerPropertiesIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </TokenizerPropertiesIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeStringIterator>
+void PrinterVisitor::beginVisit ( const TokenizeStringIterator& a) {
+ thePrinter.startBeginVisit("TokenizeStringIterator", ++theId);
+ printCommons( &a, theId );
+ thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const TokenizeStringIterator& ) {
+ thePrinter.startEndVisit();
+ thePrinter.endEndVisit();
+}
+// </TokenizeStringIterator>
+
+#endif
// <FunctionNameIterator>
void PrinterVisitor::beginVisit ( const FunctionNameIterator& a) {
=== modified file 'src/runtime/visitors/pregenerated/printer_visitor.h'
--- src/runtime/visitors/pregenerated/printer_visitor.h 2012-04-24 12:39:38 +0000
+++ src/runtime/visitors/pregenerated/printer_visitor.h 2012-04-24 20:57:30 +0000
@@ -292,6 +292,71 @@
void beginVisit( const FnPutIterator& );
void endVisit ( const FnPutIterator& );
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const CurrentLangIterator& );
+ void endVisit ( const CurrentLangIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const HostLangIterator& );
+ void endVisit ( const HostLangIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const IsStemLangSupportedIterator& );
+ void endVisit ( const IsStemLangSupportedIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const IsStopWordIterator& );
+ void endVisit ( const IsStopWordIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const IsStopWordLangSupportedIterator& );
+ void endVisit ( const IsStopWordLangSupportedIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const IsThesaurusLangSupportedIterator& );
+ void endVisit ( const IsThesaurusLangSupportedIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const IsTokenizerLangSupportedIterator& );
+ void endVisit ( const IsTokenizerLangSupportedIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const StemIterator& );
+ void endVisit ( const StemIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const StripDiacriticsIterator& );
+ void endVisit ( const StripDiacriticsIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const ThesaurusLookupIterator& );
+ void endVisit ( const ThesaurusLookupIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const TokenizeIterator& );
+ void endVisit ( const TokenizeIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const TokenizerPropertiesIterator& );
+ void endVisit ( const TokenizerPropertiesIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+ void beginVisit( const TokenizeStringIterator& );
+ void endVisit ( const TokenizeStringIterator& );
+#endif
+
void beginVisit( const FunctionNameIterator& );
void endVisit ( const FunctionNameIterator& );
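The pregenerated visitor headers above use double dispatch. The base visitor declares one pure-virtual `beginVisit`/`endVisit` pair per iterator class, and the printer implements each pair. Both sides are wrapped in the same `#ifndef ZORBA_NO_FULL_TEXT` guard, so a build without full-text support drops the declarations and the definitions together. Below is a minimal, self-contained sketch of that idea. The `accept()` method, the `std::ostringstream` output and the `<Name id="n">` format are assumptions for illustration. They are not Zorba's actual `PlanIterVisitor` or `thePrinter` API.

```cpp
#include <cassert>
#include <sstream>
#include <string>

class FnPutIterator;
#ifndef ZORBA_NO_FULL_TEXT
class StemIterator;
#endif

// Base visitor: one pure-virtual begin/end pair per iterator class. The
// full-text pair only exists when full-text support is compiled in.
class PlanIterVisitorSketch {
public:
  virtual ~PlanIterVisitorSketch() = default;
  virtual void beginVisit( const FnPutIterator& ) = 0;
  virtual void endVisit( const FnPutIterator& ) = 0;
#ifndef ZORBA_NO_FULL_TEXT
  virtual void beginVisit( const StemIterator& ) = 0;
  virtual void endVisit( const StemIterator& ) = 0;
#endif
};

// Each iterator dispatches to the overload for its own static type.
class FnPutIterator {
public:
  void accept( PlanIterVisitorSketch &v ) const {
    v.beginVisit( *this );
    v.endVisit( *this );
  }
};

#ifndef ZORBA_NO_FULL_TEXT
class StemIterator {
public:
  void accept( PlanIterVisitorSketch &v ) const {
    v.beginVisit( *this );
    v.endVisit( *this );
  }
};
#endif

// Printer: gives every visited iterator an increasing id (cf. ++theId).
class PrinterVisitorSketch : public PlanIterVisitorSketch {
public:
  void beginVisit( const FnPutIterator& ) override { open( "FnPutIterator" ); }
  void endVisit( const FnPutIterator& ) override { close( "FnPutIterator" ); }
#ifndef ZORBA_NO_FULL_TEXT
  void beginVisit( const StemIterator& ) override { open( "StemIterator" ); }
  void endVisit( const StemIterator& ) override { close( "StemIterator" ); }
#endif
  std::string str() const { return out_.str(); }

private:
  void open( const char *name ) {
    out_ << '<' << name << " id=\"" << ++id_ << "\">";
    ++depth_;
  }
  void close( const char *name ) {
    assert( depth_ > 0 );  // endVisit must follow a matching beginVisit
    --depth_;
    out_ << "</" << name << '>';
  }
  std::ostringstream out_;
  int id_ = 0;
  int depth_ = 0;
};
```

When the sketch is built without `ZORBA_NO_FULL_TEXT`, visiting an `FnPutIterator` and then a `StemIterator` prints `<FnPutIterator id="1"></FnPutIterator><StemIterator id="2"></StemIterator>`. When the macro is defined, the full-text overloads disappear from both classes at once, so the printer cannot fall out of sync with the base visitor.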
=== modified file 'src/store/naive/atomic_items.cpp'
--- src/store/naive/atomic_items.cpp 2012-04-24 12:39:38 +0000
+++ src/store/naive/atomic_items.cpp 2012-04-24 20:57:30 +0000
@@ -1657,10 +1657,13 @@
{
typedef NaiveFTTokenIterator::container_type tokens_t;
unique_ptr<tokens_t> tokens( new tokens_t );
+ AtomicItemTokenizerCallback callback( *tokens );
- Tokenizer::ptr t( provider.getTokenizer( lang, numbers ) );
- AtomicItemTokenizerCallback cb( *t, lang, *tokens );
- cb.tokenize( theValue.data(), theValue.size(), wildcards );
+ Tokenizer::ptr tokenizer;
+ if ( provider.getTokenizer( lang, &numbers, &tokenizer ) )
+ tokenizer->tokenize_string(
+ theValue.data(), theValue.size(), lang, wildcards, callback
+ );
return FTTokenIterator_t( new NaiveFTTokenIterator( tokens.release() ) );
}
@@ -3588,25 +3591,22 @@
********************************************************************************/
AtomicItemTokenizerCallback::AtomicItemTokenizerCallback(
- Tokenizer &tokenizer,
- locale::iso639_1::type lang,
container_type &tokens
) :
- tokenizer_( tokenizer ),
- lang_( lang ),
tokens_( tokens )
{
}
-void AtomicItemTokenizerCallback::operator()(
+void AtomicItemTokenizerCallback::token(
char const *utf8_s,
size_type utf8_len,
+ iso639_1::type lang,
size_type token_no,
size_type sent_no,
size_type para_no,
- void*
+ Item const*
) {
- FTToken const t( utf8_s, utf8_len, token_no, lang_ );
+ FTToken const t( utf8_s, utf8_len, token_no, lang );
tokens_.push_back( t );
}
=== modified file 'src/store/naive/atomic_items.h'
--- src/store/naive/atomic_items.h 2012-04-24 12:39:38 +0000
+++ src/store/naive/atomic_items.h 2012-04-24 20:57:30 +0000
@@ -1461,7 +1461,7 @@
xs_integer getIntegerValue() const { return theValue; }
- xs_long getLongValue() const;
+ xs_long getLongValue() const;
xs_unsignedInt getUnsignedIntValue() const;
@@ -2603,28 +2603,15 @@
public:
typedef FTTokenStore::container_type container_type;
- AtomicItemTokenizerCallback(
- Tokenizer &tokenizer,
- locale::iso639_1::type lang,
- container_type &tokens );
-
- void operator()(
- char const *utf8_s,
- size_type utf8_len,
- size_type token_no,
- size_type sent_no,
- size_type para_no,
- void* = 0 );
-
- void tokenize( char const *utf8_s, size_t len, bool wildcards = false )
- {
- tokenizer_.tokenize( utf8_s, len, lang_, wildcards, *this );
- }
+ AtomicItemTokenizerCallback( container_type &tokens );
+
+ // inherited
+ void token( char const *utf8_s, size_type utf8_len, locale::iso639_1::type,
+ size_type token_no, size_type sent_no, size_type para_no,
+ Item const* );
private:
- Tokenizer & tokenizer_;
- locale::iso639_1::type const lang_;
- container_type & tokens_;
+ container_type &tokens_;
};
#endif /* ZORBA_NO_FULL_TEXT */
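This change takes the language out of the callback's state. `AtomicItemTokenizerCallback` no longer stores a `Tokenizer&` and a fixed `lang_`. Instead it receives the language with every `token()` call, and the tokenizer (`tokenize_string`) drives the callback. The sketch below shows that inversion under simplifying assumptions. It uses a tokenizer that splits only on ASCII whitespace, and the names `Lang`, `Token`, `CallbackSketch`, `CollectingCallback` and this `tokenize_string` are hypothetical stand-ins, not Zorba's `Tokenizer::Callback`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class Lang { unknown, en, de };  // hypothetical stand-in for iso639_1::type

struct Token {
  std::string value;
  std::size_t token_no;
  Lang lang;
};

// The callback only receives tokens; it owns neither a tokenizer nor a language.
class CallbackSketch {
public:
  virtual ~CallbackSketch() = default;
  virtual void token( const char *utf8_s, std::size_t utf8_len, Lang lang,
                      std::size_t token_no ) = 0;
};

// Collects tokens into a caller-owned container, like AtomicItemTokenizerCallback.
class CollectingCallback : public CallbackSketch {
public:
  explicit CollectingCallback( std::vector<Token> &tokens ) : tokens_( tokens ) { }
  void token( const char *utf8_s, std::size_t utf8_len, Lang lang,
              std::size_t token_no ) override {
    tokens_.push_back( Token{ std::string( utf8_s, utf8_len ), token_no, lang } );
  }
private:
  std::vector<Token> &tokens_;
};

// Hypothetical tokenizer: splits on ASCII whitespace and passes the language
// along with every token it reports.
void tokenize_string( const char *s, std::size_t len, Lang lang,
                      CallbackSketch &callback ) {
  assert( s != nullptr || len == 0 );
  std::size_t token_no = 0;
  std::size_t i = 0;
  while ( i < len ) {
    while ( i < len && std::isspace( static_cast<unsigned char>( s[i] ) ) )
      ++i;
    std::size_t const begin = i;
    while ( i < len && !std::isspace( static_cast<unsigned char>( s[i] ) ) )
      ++i;
    if ( i > begin )
      callback.token( s + begin, i - begin, lang, token_no++ );
  }
}
```

Because the language arrives per token, a single callback object can collect tokens from text in several languages. This is what lets the store drop the per-language tokenizer stack.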
=== modified file 'src/store/naive/node_items.cpp'
--- src/store/naive/node_items.cpp 2012-04-24 12:39:38 +0000
+++ src/store/naive/node_items.cpp 2012-04-24 20:57:30 +0000
@@ -21,6 +21,7 @@
#include <zorba/config.h>
#include <zorba/item.h>
+#include "api/unmarshaller.h"
#include "diagnostics/assert.h"
#include "diagnostics/xquery_diagnostics.h"
#include "zorbatypes/URI.h"
@@ -4761,108 +4762,57 @@
******************************************************************************/
XmlNodeTokenizerCallback::XmlNodeTokenizerCallback(
- TokenizerProvider const &provider,
- Tokenizer::Numbers &numbers,
- iso639_1::type lang,
FTTokenStore &token_store
) :
- provider_( provider ),
- numbers_( numbers ),
token_store_( &token_store ),
tokens_( token_store.getDocumentTokens() )
{
- push_lang( lang );
}
XmlNodeTokenizerCallback::XmlNodeTokenizerCallback(
- TokenizerProvider const &provider,
- Tokenizer::Numbers &numbers,
- iso639_1::type lang,
container_type &tokens
) :
- provider_( provider ),
- numbers_( numbers ),
- token_store_( NULL ),
+ token_store_( nullptr ),
tokens_( tokens )
{
- push_lang( lang );
-}
-
-
-XmlNodeTokenizerCallback::~XmlNodeTokenizerCallback()
-{
- while ( !tokenizer_stack_.empty() )
- ztd::pop_stack( tokenizer_stack_ )->destroy();
-}
-
-
-inline XmlNodeTokenizerCallback::begin_type
-XmlNodeTokenizerCallback::beginTokenization() const
-{
- return token_store_->getDocumentTokens().size();
-}
-
-
-inline void XmlNodeTokenizerCallback::endTokenization(
- XmlNode const *node,
- XmlNodeTokenizerCallback::begin_type begin )
-{
- token_store_->putRange(node, begin, token_store_->getDocumentTokens().size());
-}
-
-
-void XmlNodeTokenizerCallback::pop_lang()
-{
- lang_stack_.pop();
- ztd::pop_stack( tokenizer_stack_ )->destroy();
-}
-
-
-void XmlNodeTokenizerCallback::push_lang( iso639_1::type lang )
-{
- lang_stack_.push( lang );
- Tokenizer::ptr t( provider_.getTokenizer( lang, numbers_ ) );
- ZORBA_ASSERT( t.get() );
- tokenizer_stack_.push( t.get() );
- t.release();
+}
+
+
+void XmlNodeTokenizerCallback::item( Item const &api_item, bool entering ) {
+ if ( token_store_ ) {
+ store::Item const *const item = Unmarshaller::getInternalItem( api_item );
+ if ( entering ) {
+ push_item( item );
+ range_stack_.push( token_store_->getDocumentTokens().size() );
+ } else {
+ pop_item();
+ token_store_->putRange(
+ item,
+ ztd::pop_stack( range_stack_ ),
+ token_store_->getDocumentTokens().size()
+ );
+ }
+ }
}
void XmlNodeTokenizerCallback::
-operator()( char const *utf8_s, size_type utf8_len, size_type pos,
- size_type sent, size_type para, void *payload )
+token( char const *utf8_s, size_type utf8_len, iso639_1::type lang,
+ size_type pos, size_type sent, size_type para, Item const *api_item )
{
- store::Item const *const item = static_cast<store::Item*>( payload );
- FTToken t( utf8_s, utf8_len, pos, sent, para, item, get_lang() );
+ store::Item const *const item = Unmarshaller::getInternalItem( *api_item );
+ FTToken t( utf8_s, utf8_len, pos, sent, para, item, lang );
tokens_.push_back( t );
}
-inline void XmlNodeTokenizerCallback::tokenize( char const *utf8_s,
- size_t len )
-{
- tokenizer().tokenize(
- utf8_s, len, get_lang(), false, *this,
- element_stack_.empty() ? NULL : static_cast<void*>( get_element() )
- );
-}
-
-
void XmlNode::tokenize( XmlNodeTokenizerCallback& )
{
// do nothing
}
-void AttributeNode::tokenize( XmlNodeTokenizerCallback &cb )
-{
- zstring text;
- getStringValue2( text );
- cb.tokenize( text.data(), text.size() );
-}
-
-
FTTokenIterator_t
AttributeNode::getTokens( TokenizerProvider const &provider,
Tokenizer::Numbers &numbers, iso639_1::type lang,
@@ -4875,62 +4825,21 @@
return FTTokenIterator_t(
new NaiveFTTokenIterator( *tokens, 0, tokens->size() )
);
+
FTTokenStore::container_type att_tokens;
- XmlNodeTokenizerCallback cb( provider, numbers, lang, att_tokens );
- const_cast<AttributeNode*>( this )->tokenize( cb );
- token_store.putAttr( this, att_tokens );
- }
-}
-
-
-void InternalNode::tokenize( XmlNodeTokenizerCallback& cb )
-{
- XmlNodeTokenizerCallback::begin_type const begin = cb.beginTokenization();
- for ( csize i = 0; i < numChildren(); ++i )
- getChild( i )->tokenize( cb );
- cb.endTokenization( this, begin );
-}
-
-
-void ElementNode::tokenize( XmlNodeTokenizerCallback& cb )
-{
- Tokenizer &tokenizer = cb.tokenizer();
-
- zorba::Item element_name;
- if ( tokenizer.trace_options() )
- element_name = getNodeName();
-
- if ( tokenizer.trace_options() & Tokenizer::trace_begin )
- tokenizer.element( element_name, Tokenizer::trace_begin );
- else if ( !tokenizer.trace_options() )
- ++tokenizer.numbers().para;
-
- //
- // See if this XML element has an xml:lang attribute: if so, switch to that
- // language.
- //
- bool pushed_lang = false;
- for ( ulong i = 0; i < numAttrs(); ++i ) {
- AttributeNode *const at = getAttr( i );
- Item const *const name = at->getNodeName();
- if ( name->getLocalName() == "lang" && name->getNamespace() == XML_NS ) {
- cb.push_lang( locale::find_lang( at->getStringValue().c_str() ) );
- pushed_lang = true;
- break;
+ XmlNodeTokenizerCallback callback( att_tokens );
+
+ zorba::Item const api_attr( this );
+ Tokenizer::ptr tokenizer;
+ if ( provider.getTokenizer( lang, &numbers, &tokenizer ) ) {
+ tokenizer->tokenize_node( api_attr, lang, callback );
+ token_store.putAttr( this, att_tokens );
}
}
-
- cb.push_element( this );
- InternalNode::tokenize( cb );
- cb.pop_element();
- if ( pushed_lang )
- cb.pop_lang();
-
- if ( tokenizer.trace_options() & Tokenizer::trace_end )
- tokenizer.element( element_name, Tokenizer::trace_end );
}
+#if 0
void TextNode::tokenize( XmlNodeTokenizerCallback &cb )
{
const zstring* text;
@@ -4986,6 +4895,7 @@
cb.tokenize( text->data(), text->size() );
cb.endTokenization( this, begin );
}
+#endif
FTTokenIterator_t
@@ -4998,8 +4908,11 @@
if ( tokens.empty() )
{
- XmlNodeTokenizerCallback cb( provider, numbers, lang, token_store );
- getRoot()->tokenize( cb );
+ zorba::Item const api_root( getRoot() );
+ XmlNodeTokenizerCallback callback( token_store );
+ Tokenizer::ptr tokenizer;
+ if ( provider.getTokenizer( lang, &numbers, &tokenizer ) )
+ tokenizer->tokenize_node( api_root, lang, callback );
}
FTTokenStore::range_type const &r = token_store.getRange( this );
=== modified file 'src/store/naive/node_items.h'
--- src/store/naive/node_items.h 2012-04-24 12:39:38 +0000
+++ src/store/naive/node_items.h 2012-04-24 20:57:30 +0000
@@ -884,10 +884,6 @@
const OrdPath* getFirstChildOrdPathAfter(csize pos) const;
const OrdPath* getFirstChildOrdPathBefore(csize pos) const;
-
-#ifndef ZORBA_NO_FULL_TEXT
- void tokenize( XmlNodeTokenizerCallback& );
-#endif /* ZORBA_NO_FULL_TEXT */
};
@@ -1147,10 +1143,6 @@
zstring& absUri,
zstring& relUri);
-#ifndef ZORBA_NO_FULL_TEXT
- void tokenize( XmlNodeTokenizerCallback& );
-#endif /* ZORBA_NO_FULL_TEXT */
-
private:
//disable default copy constructor
ElementNode(const ElementNode& src);
@@ -1264,7 +1256,8 @@
#ifndef ZORBA_NO_FULL_TEXT
FTTokenIterator_t getTokens( TokenizerProvider const&, Tokenizer::Numbers&,
- locale::iso639_1::type, bool = false ) const;
+ locale::iso639_1::type,
+ bool wildcards = false ) const;
#endif /* ZORBA_NO_FULL_TEXT */
protected:
@@ -1279,10 +1272,6 @@
{
return *reinterpret_cast<ItemVector*>(theTypedValue.getp());
}
-
-#ifndef ZORBA_NO_FULL_TEXT
- void tokenize( XmlNodeTokenizerCallback& );
-#endif
store::Iterator_t getChildren() const;
};
@@ -1441,10 +1430,6 @@
void setValue(store::Item_t& val) { theContent.setValue(val); }
void setValue(store::Item* val) { theContent.setValue(val); }
-
-#ifndef ZORBA_NO_FULL_TEXT
- void tokenize( XmlNodeTokenizerCallback& );
-#endif /* ZORBA_NO_FULL_TEXT */
store::Iterator_t getChildren() const;
};
@@ -1680,59 +1665,26 @@
{
public:
typedef FTTokenStore::container_type container_type;
- typedef FTTokenStore::size_type begin_type;
-
- XmlNodeTokenizerCallback( TokenizerProvider const &provider,
- Tokenizer::Numbers &numbers,
- locale::iso639_1::type lang,
- FTTokenStore &token_store );
-
- XmlNodeTokenizerCallback( TokenizerProvider const &provider,
- Tokenizer::Numbers &numbers,
- locale::iso639_1::type lang,
- container_type &tokens );
-
- ~XmlNodeTokenizerCallback();
-
- begin_type beginTokenization() const;
-
- void endTokenization( XmlNode const*, begin_type );
-
- void push_element( ElementNode *element ) { element_stack_.push( element ); }
-
- void pop_element() { element_stack_.pop(); }
-
- void push_lang( locale::iso639_1::type lang );
-
- void pop_lang();
-
- void tokenize( char const *utf8_s, size_t len );
-
- Tokenizer& tokenizer() const { return *tokenizer_stack_.top(); }
+
+ XmlNodeTokenizerCallback( FTTokenStore &token_store );
+ XmlNodeTokenizerCallback( container_type &tokens );
// inherited
- void operator()( char const *utf8_s, size_type utf8_len,
- size_type pos, size_type sent, size_type para, void* );
+ void item( Item const&, bool );
+ void token( char const *utf8_s, size_type utf8_len, locale::iso639_1::type,
+ size_type pos, size_type sent, size_type para, Item const* );
private:
- typedef std::stack<ElementNode*> element_stack_t;
- typedef std::stack<locale::iso639_1::type> lang_stack_t;
- typedef std::stack<Tokenizer*> tokenizer_stack_t;
-
- ElementNode* get_element() const {
- return element_stack_.top();
- }
-
- locale::iso639_1::type get_lang() const {
- return lang_stack_.top();
- }
-
- TokenizerProvider const &provider_;
- Tokenizer::Numbers &numbers_;
+ typedef std::stack<store::Item const*> item_stack_t;
+ typedef std::stack<FTTokenStore::size_type> range_begin_stack_t;
+
+ store::Item const* get_item() const { return item_stack_.top(); }
+ void push_item( store::Item const *item ) { item_stack_.push( item ); }
+ void pop_item() { item_stack_.pop(); }
+
FTTokenStore *token_store_;
container_type &tokens_;
- element_stack_t element_stack_;
- lang_stack_t lang_stack_;
- tokenizer_stack_t tokenizer_stack_;
+ item_stack_t item_stack_;
+ range_begin_stack_t range_stack_;
};
#endif /* ZORBA_NO_FULL_TEXT */
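The new `XmlNodeTokenizerCallback::item()` replaces `beginTokenization()`/`endTokenization()`. When the tokenizer enters a node, the callback pushes the current token count onto `range_stack_`. When the tokenizer leaves the node, the callback pops that start index and records the half-open range `[begin, end)` with `putRange()`. The real code does this only when a `token_store_` is present. The sketch below isolates that bookkeeping. The `Node` type, the `RangeRecorder` class and the `std::map` standing in for `FTTokenStore` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stack>
#include <string>
#include <utility>
#include <vector>

struct Node { std::string name; };  // hypothetical stand-in for store::Item

class RangeRecorder {
public:
  typedef std::pair<std::size_t, std::size_t> range_type;  // [begin, end)

  // Called by the tokenizer when it enters (true) or leaves (false) a node.
  void item( const Node &node, bool entering ) {
    if ( entering ) {
      range_stack_.push( tokens_.size() );
    } else {
      assert( !range_stack_.empty() );  // every leave must match an enter
      std::size_t const begin = range_stack_.top();
      range_stack_.pop();
      ranges_[ &node ] = range_type( begin, tokens_.size() );
    }
  }

  // Called by the tokenizer for every token found inside the current node.
  void token( const std::string &t ) { tokens_.push_back( t ); }

  range_type getRange( const Node &node ) const { return ranges_.at( &node ); }

private:
  std::vector<std::string> tokens_;
  std::stack<std::size_t> range_stack_;
  std::map<const Node*, range_type> ranges_;
};
```

Take the fragment `<p>a <b>b c</b> d</p>` as an example. Entering `p`, emitting `a`, entering `b`, emitting `b` and `c`, leaving `b`, emitting `d` and leaving `p` gives `b` the range `[1, 3)` and `p` the range `[0, 4)`. Nested ranges fall out of the stack discipline without the callback knowing anything about element types.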
=== modified file 'src/unit_tests/stemmer.cpp'
--- src/unit_tests/stemmer.cpp 2012-04-24 12:39:38 +0000
+++ src/unit_tests/stemmer.cpp 2012-04-24 20:57:30 +0000
@@ -37,6 +37,7 @@
public:
// inherited
void destroy() const;
+ void properties( Properties* ) const;
void stem( String const &word, iso639_1::type lang, String *result ) const;
};
@@ -44,6 +45,10 @@
destroy_called = true;
}
+void TestStemmer::properties( Properties *p ) const {
+ p->uri = "http://www.zorba-xquery.com/full-text/unit-tests/stemmer";
+}
+
void TestStemmer::stem( String const &word, iso639_1::type lang,
String *result ) const {
if ( word == "foobar" )
@@ -56,21 +61,20 @@
class TestStemmerProvider : public StemmerProvider {
public:
- Stemmer::ptr getStemmer( iso639_1::type lang ) const;
+ bool getStemmer( iso639_1::type lang, Stemmer::ptr* = 0 ) const;
};
-Stemmer::ptr TestStemmerProvider::getStemmer( iso639_1::type lang ) const {
+bool TestStemmerProvider::getStemmer( iso639_1::type lang,
+ Stemmer::ptr *result ) const {
static TestStemmer stemmer;
- Stemmer::ptr result;
switch ( lang ) {
case iso639_1::en:
case iso639_1::unknown:
- result.reset( &stemmer );
- break;
+ if ( result ) result->reset( &stemmer );
+ return true;
default:
- break;
+ return false;
}
- return std::move( result );
}
///////////////////////////////////////////////////////////////////////////////
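The provider interfaces now report support with a `bool` and fill in the result only through an optional out-parameter. One call can therefore either just ask whether a language is supported (presumably what the new `Is*LangSupportedIterator`s need) or also fetch the object. The pointer defaults to null, so an implementation should check it before writing through it, as `TestThesaurusProvider::getThesaurus` does. The sketch below follows that pattern. `Lang`, `Stemmer`, `EnglishStemmer` and `StemmerProviderSketch` are hypothetical, and a raw `const Stemmer**` stands in for `Stemmer::ptr*`.

```cpp
#include <cassert>

enum class Lang { unknown, en, fr };  // hypothetical stand-in for iso639_1::type

class Stemmer {
public:
  virtual ~Stemmer() = default;
  virtual const char* name() const = 0;
};

class EnglishStemmer : public Stemmer {
public:
  const char* name() const override { return "en-stemmer"; }
};

class StemmerProviderSketch {
public:
  // Returns true if `lang` is supported. If `result` is non-null, also stores
  // the stemmer there; passing no pointer only queries support.
  bool getStemmer( Lang lang, const Stemmer **result = nullptr ) const {
    static const EnglishStemmer stemmer{};
    switch ( lang ) {
      case Lang::en:
      case Lang::unknown:
        if ( result )
          *result = &stemmer;
        return true;
      default:
        return false;
    }
  }
};
```

In this design, `provider.getStemmer( Lang::fr )` answers "is French supported?" without allocating or dereferencing anything. A caller that needs the stemmer passes the address of a pointer.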
=== modified file 'src/unit_tests/string.cpp'
--- src/unit_tests/string.cpp 2012-04-24 12:39:38 +0000
+++ src/unit_tests/string.cpp 2012-04-24 20:57:30 +0000
@@ -542,6 +542,19 @@
}
template<class StringType>
+static void test_strip_diacritics() {
+ StringType result;
+
+ StringType const s1( "x " utf8_aeiou_acute " x" );
+ utf8::strip_diacritics( s1, &result );
+ ASSERT_TRUE( result == "x aeiou x" );
+
+ StringType const s2( "x " utf8_AEIOU_acute " x" );
+ utf8::strip_diacritics( s2, &result );
+ ASSERT_TRUE( result == "x AEIOU x" );
+}
+
+template<class StringType>
static void test_to_codepoints( char const *s ) {
StringType const s1( s );
@@ -866,6 +879,9 @@
test_split<zstring>( "a", "" );
test_split<String>( "a", "" );
+ test_strip_diacritics<string>();
+ test_strip_diacritics<zstring>();
+
test_to_codepoints<string>( "hello" );
test_to_codepoints<string>( utf8_aeiou_acute );
test_to_codepoints<zstring>( "hello" );
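The new `test_strip_diacritics` cases expect `utf8::strip_diacritics` to turn "x áéíóú x" into "x aeiou x", and the uppercase acute vowels into "x AEIOU x". Zorba's implementation is not part of this diff. A general approach usually decomposes text to Unicode NFD and drops the combining marks. The sketch below is deliberately much narrower. A hypothetical `strip_acute_vowels` handles only the ten precomposed Latin-1 acute vowels, which is just enough to reproduce the two test expectations.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces the 2-byte UTF-8 encodings of U+00C1, U+00C9,
// U+00CD, U+00D3, U+00DA and U+00E1, U+00E9, U+00ED, U+00F3, U+00FA with
// their unaccented ASCII base letters. Every other byte is copied unchanged.
void strip_acute_vowels( const std::string &in, std::string *out ) {
  assert( out != nullptr );
  out->clear();
  for ( std::string::size_type i = 0; i < in.size(); ++i ) {
    unsigned char const c = static_cast<unsigned char>( in[i] );
    if ( c == 0xC3 && i + 1 < in.size() ) {
      // 0xC3 leads the 2-byte sequences for U+00C0..U+00FF.
      char base = '\0';
      switch ( static_cast<unsigned char>( in[i + 1] ) ) {
        case 0x81: base = 'A'; break;  // U+00C1
        case 0x89: base = 'E'; break;  // U+00C9
        case 0x8D: base = 'I'; break;  // U+00CD
        case 0x93: base = 'O'; break;  // U+00D3
        case 0x9A: base = 'U'; break;  // U+00DA
        case 0xA1: base = 'a'; break;  // U+00E1
        case 0xA9: base = 'e'; break;  // U+00E9
        case 0xAD: base = 'i'; break;  // U+00ED
        case 0xB3: base = 'o'; break;  // U+00F3
        case 0xBA: base = 'u'; break;  // U+00FA
      }
      if ( base ) {
        *out += base;
        ++i;  // skip the continuation byte
        continue;
      }
    }
    *out += in[i];
  }
}
```

A table like this does not scale to other scripts or to decomposed input that already contains combining marks. That gap is why a normalization-based approach is the usual choice for a real implementation.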
=== modified file 'src/unit_tests/thesaurus.cpp'
--- src/unit_tests/thesaurus.cpp 2012-04-24 12:39:38 +0000
+++ src/unit_tests/thesaurus.cpp 2012-04-24 20:57:30 +0000
@@ -42,47 +42,48 @@
iterator::ptr lookup( String const &phrase, String const &relationship,
range_type at_least, range_type at_most ) const;
private:
- typedef std::list<String> synonyms_t;
- typedef std::map<String,synonyms_t const*> thesaurus_t;
+ typedef std::list<String> synonyms_type;
+ typedef std::map<String,synonyms_type const*> thesaurus_data_type;
- static thesaurus_t const& get_thesaurus();
+ static thesaurus_data_type const& get_thesaurus_data();
class iterator : public Thesaurus::iterator {
public:
- iterator( synonyms_t const &s ) : synonyms_( s ), i_( s.begin() ) { }
+ iterator( synonyms_type const &s ) : synonyms_( s ), i_( s.begin() ) { }
void destroy() const;
bool next( String *synonym );
private:
- synonyms_t const &synonyms_;
- synonyms_t::const_iterator i_;
+ synonyms_type const &synonyms_;
+ synonyms_type::const_iterator i_;
};
};
-TestThesaurus::thesaurus_t const& TestThesaurus::get_thesaurus() {
- static thesaurus_t thesaurus;
- if ( thesaurus.empty() ) {
- static synonyms_t synonyms;
+void TestThesaurus::destroy() const {
+ destroy_called = true;
+}
+
+TestThesaurus::thesaurus_data_type const& TestThesaurus::get_thesaurus_data() {
+ static thesaurus_data_type thesaurus_data;
+ if ( thesaurus_data.empty() ) {
+ static synonyms_type synonyms;
synonyms.push_back( "foo" );
synonyms.push_back( "foobar" );
- thesaurus[ "foo" ] = &synonyms;
- thesaurus[ "foobar" ] = &synonyms;
+ thesaurus_data[ "foo" ] = &synonyms;
+ thesaurus_data[ "foobar" ] = &synonyms;
}
- return thesaurus;
-}
-
-void TestThesaurus::destroy() const {
- destroy_called = true;
+ return thesaurus_data;
}
Thesaurus::iterator::ptr
TestThesaurus::lookup( String const &phrase, String const &relationship,
range_type at_least, range_type at_most ) const {
- static thesaurus_t const &thesaurus = get_thesaurus();
- thesaurus_t::const_iterator const i = thesaurus.find( phrase );
- Thesaurus::iterator::ptr result;
- if ( i != thesaurus.end() )
- result.reset( new iterator( *i->second ) );
+ static thesaurus_data_type const &thesaurus_data = get_thesaurus_data();
+ thesaurus_data_type::const_iterator const entry =
+ thesaurus_data.find( phrase );
+ iterator::ptr result;
+ if ( entry != thesaurus_data.end() )
+ result.reset( new iterator( *entry->second ) );
return std::move( result );
}
@@ -101,6 +102,28 @@
///////////////////////////////////////////////////////////////////////////////
+class TestThesaurusProvider : public ThesaurusProvider {
+public:
+ bool getThesaurus( iso639_1::type lang, Thesaurus::ptr* = 0 ) const;
+
+ // inherited
+ void destroy() const;
+};
+
+void TestThesaurusProvider::destroy() const {
+ // do nothing
+}
+
+bool TestThesaurusProvider::getThesaurus( iso639_1::type lang,
+ Thesaurus::ptr *result ) const {
+ static TestThesaurus thesaurus;
+ if ( result )
+ result->reset( &thesaurus );
+ return true;
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
class TestThesaurusResolver : public URLResolver {
public:
TestThesaurusResolver( String const &uri ) : uri_( uri ) { }
@@ -112,9 +135,13 @@
};
Resource*
-TestThesaurusResolver::resolveURL( String const &uri, EntityData const *ed ) {
- static TestThesaurus thesaurus;
- return uri == uri_ ? &thesaurus : 0;
+TestThesaurusResolver::resolveURL( String const &uri, EntityData const *data ) {
+ if ( data->getKind() == EntityData::THESAURUS ) {
+ static TestThesaurusProvider provider;
+ if ( uri == uri_ )
+ return &provider;
+ }
+ return 0;
}
///////////////////////////////////////////////////////////////////////////////
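After this change, the test resolver no longer hands back a `Thesaurus` directly. It first checks that the request is for a thesaurus (`EntityData::THESAURUS`) and that the URI matches, and then returns a `ThesaurusProvider`. The provider in turn yields a `Thesaurus` whose `lookup()` gives access to the synonyms of a known phrase. The sketch below mirrors that chain with simplified, hypothetical types (`Kind`, `Resource`, `ThesaurusSketch`, `ProviderSketch`, `resolve`). It is not Zorba's `URLResolver` API, and it omits the language argument of `getThesaurus()`.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>

enum class Kind { THESAURUS, STOP_WORDS };  // hypothetical EntityData kinds

struct Resource {
  virtual ~Resource() = default;
};

// Maps a phrase to its synonyms; lookup() returns nullptr for unknown phrases.
class ThesaurusSketch {
public:
  typedef std::list<std::string> synonyms_type;

  const synonyms_type* lookup( const std::string &phrase ) const {
    static const std::map<std::string, synonyms_type> data = {
      { "foo",    { "foo", "foobar" } },
      { "foobar", { "foo", "foobar" } },
    };
    auto const entry = data.find( phrase );
    return entry == data.end() ? nullptr : &entry->second;
  }
};

// The resolvable resource is a provider, not the thesaurus itself.
class ProviderSketch : public Resource {
public:
  const ThesaurusSketch& getThesaurus() const { return thesaurus_; }
private:
  ThesaurusSketch thesaurus_;
};

// Returns the provider only for thesaurus requests on the expected URI.
Resource* resolve( const std::string &uri, Kind kind,
                   const std::string &expected_uri ) {
  static ProviderSketch provider;
  if ( kind == Kind::THESAURUS && uri == expected_uri )
    return &provider;
  return nullptr;
}
```

Checking the entity kind first means the same resolver can be registered without accidentally answering stop-word or other requests that happen to use the same URI.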
=== modified file 'src/unit_tests/tokenizer.cpp'
--- src/unit_tests/tokenizer.cpp 2012-04-24 12:39:38 +0000
+++ src/unit_tests/tokenizer.cpp 2012-04-24 20:57:30 +0000
@@ -24,6 +24,7 @@
#include <iostream>
#include <zorba/diagnostic_list.h>
+#include <zorba/store_consts.h>
#include <zorba/store_manager.h>
#include <zorba/tokenizer.h>
#include <zorba/user_exception.h>
@@ -59,14 +60,18 @@
class TestTokenizer : public Tokenizer {
public:
- TestTokenizer( Numbers &num ) : Tokenizer( num, trace_begin ) { }
+ TestTokenizer( Numbers &num ) : Tokenizer( num ) { }
~TestTokenizer();
// inherited
void destroy() const;
- void element( Item const&, int );
- void tokenize( char const*, size_type, iso639_1::type, bool, Callback&,
- void* );
+ void properties( Properties* ) const;
+ void tokenize_string( char const*, size_type, iso639_1::type, bool,
+ Callback&, Item const* );
+
+protected:
+ // inherited
+ void item( Item const&, bool );
private:
typedef std::string token_t;
@@ -83,7 +88,8 @@
static bool is_word_begin_char( char );
bool is_word_char( char );
static char peek( char const *s, char const *end );
- bool send_token( token_t const &token, Callback&, void* );
+ bool send_token( token_t const &token, iso639_1::type, Callback&,
+ Item const* );
};
TestTokenizer::~TestTokenizer() {
@@ -95,10 +101,7 @@
delete this;
}
-void TestTokenizer::element( Item const &qname, int trace_options ) {
- if ( trace_options & trace_end )
- return;
-
+void TestTokenizer::item( Item const &item, bool entering ) {
static char const *const block_elements[] = {
"address",
"blockquote",
@@ -116,10 +119,14 @@
static char const *const *const end =
block_elements + sizeof( block_elements ) / sizeof( char* );
- String const name( qname.getLocalName() );
- if ( ::binary_search( block_elements, end, name.c_str(),
- less<char const*>() ) ) {
- ++numbers().para;
+ if ( entering && item.isNode() &&
+ item.getNodeKind() == store::StoreConsts::elementNode ) {
+ Item qname;
+ item.getNodeName( qname );
+ if ( ::binary_search( block_elements, end, qname.getLocalName().c_str(),
+ less<char const*>() ) ) {
+ ++numbers().para;
+ }
}
}
@@ -170,15 +177,24 @@
return ++s < end ? *s : '\0';
}
+void TestTokenizer::properties( Properties *p ) const {
+ p->comments_separate_tokens = true;
+ p->elements_separate_tokens = true;
+ p->processing_instructions_separate_tokens = true;
+ p->languages.clear();
+ p->languages.push_back( iso639_1::en );
+ p->uri = "http://www.zorba-xquery.com/full-text/tokenizer/unit-test";
+}
+
#define HANDLE_BACKSLASH() \
if ( !got_backslash ) ; else { \
got_backslash = in_wild = false; \
break; \
}
-void TestTokenizer::tokenize( char const *s, size_type s_len,
- iso639_1::type lang, bool wildcards,
- Callback &callback, void *payload ) {
+void TestTokenizer::tokenize_string( char const *s, size_type s_len,
+ iso639_1::type lang, bool wildcards,
+ Callback &callback, Item const *item ) {
bool got_backslash = false;
bool in_wild = false;
token_t token;
@@ -247,7 +263,7 @@
} else {
if ( is_word_char( *s ) )
token += *s;
- else if ( send_token( token, callback, payload ) ) {
+ else if ( send_token( token, lang, callback, item ) ) {
token.clear();
t_type_ = t_generic;
}
@@ -279,7 +295,7 @@
}
} // for
- send_token( token, callback, payload );
+ send_token( token, lang, callback, item );
}
static char const *const tokens[] = {
@@ -304,8 +320,8 @@
#define PRINT_TOKENS 0
-bool TestTokenizer::send_token( token_t const &token, Callback &callback,
- void *payload ) {
+bool TestTokenizer::send_token( token_t const &token, iso639_1::type lang,
+ Callback &callback, Item const *item ) {
if ( !token.empty() ) {
#if PRINT_TOKENS
cout << "t=" << setw(2) << numbers().token
@@ -316,9 +332,9 @@
check_token( token.c_str(), numbers().token );
- callback(
- token.data(), token.size(),
- numbers().token, numbers().sent, numbers().para, payload
+ callback.token(
+ token.data(), token.size(), lang,
+ numbers().token, numbers().sent, numbers().para, item
);
++numbers().token;
return true;
@@ -331,13 +347,16 @@
class TestTokenizerProvider : public TokenizerProvider {
public:
// inherited
- Tokenizer::ptr getTokenizer( iso639_1::type, Tokenizer::Numbers& ) const;
+ bool getTokenizer( iso639_1::type, Tokenizer::Numbers* = 0,
+ Tokenizer::ptr* = 0 ) const;
};
-Tokenizer::ptr
-TestTokenizerProvider::getTokenizer( iso639_1::type lang,
- Tokenizer::Numbers &num ) const {
- return Tokenizer::ptr( new TestTokenizer( num ) );
+bool TestTokenizerProvider::getTokenizer( iso639_1::type lang,
+ Tokenizer::Numbers *num,
+ Tokenizer::ptr *t ) const {
+ if ( num && t )
+ t->reset( new TestTokenizer( *num ) );
+ return true;
}
///////////////////////////////////////////////////////////////////////////////
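The new `getTokenizer()` signature turns the provider into a query. Called with only a language, it reports whether that language is supported. Called with both out-arguments, it also builds a tokenizer. Below is a minimal self-contained sketch of this "optional out-parameter" pattern. `Lang`, `Numbers`, `Tok` and `SketchProvider` are hypothetical stand-ins for `iso639_1::type`, `Tokenizer::Numbers`, `Tokenizer` and a `TokenizerProvider`. Rejecting `Lang::unknown` is an assumption made only for illustration; the unit-test provider above accepts every language.

```cpp
#include <memory>

// Hypothetical stand-ins; these are NOT Zorba's real types.
enum class Lang { unknown, en, de };

struct Numbers { int token = 0, sent = 0, para = 0; };

struct Tok {
  explicit Tok( Numbers &n ) : numbers( n ) { }
  Numbers &numbers;   // counters shared with the caller
};

// Sketch of a provider whose out-parameters are optional:
//  - getTokenizer( lang ) only answers "is lang supported?"
//  - getTokenizer( lang, &num, &t ) additionally creates a tokenizer.
class SketchProvider {
public:
  bool getTokenizer( Lang lang, Numbers *num = nullptr,
                     std::unique_ptr<Tok> *t = nullptr ) const {
    if ( lang == Lang::unknown )      // assumption: unknown is unsupported
      return false;
    if ( num && t )
      t->reset( new Tok( *num ) );
    return true;
  }
};
```

This design lets callers probe language support cheaply without allocating a tokenizer they would immediately discard.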
=== modified file 'src/util/fs_util.h'
--- src/util/fs_util.h 2012-04-24 12:39:38 +0000
+++ src/util/fs_util.h 2012-04-24 20:57:30 +0000
@@ -503,6 +503,7 @@
* @param path The path to normalize.
* @param base The base path, if any.
* @return Returns the normalized path.
+ * @throws XQueryException err::XPTY0004 for malformed paths.
*/
zstring get_normalized_path( char const *path, char const *base = nullptr );
@@ -513,6 +514,7 @@
* @param path The path to normalize.
* @param base The base path, if any.
* @return Returns the normalized path.
+ * @throws XQueryException err::XPTY0004 for malformed paths.
*/
template<class PathStringType> inline
zstring get_normalized_path( PathStringType const &path,
@@ -527,6 +529,7 @@
* @param path The path to normalize.
* @param base The base path, if any.
* @return Returns the normalized path.
+ * @throws XQueryException err::XPTY0004 for malformed paths.
*/
template<class PathStringType> inline
void normalize_path( PathStringType &path, PathStringType const &base = "" ) {
=== modified file 'src/util/unicode_util.cpp'
--- src/util/unicode_util.cpp 2012-04-24 12:39:38 +0000
+++ src/util/unicode_util.cpp 2012-04-24 20:57:30 +0000
@@ -24,6 +24,7 @@
#ifndef ZORBA_NO_ICU
# include <unicode/normlzr.h>
+# include <unicode/uchar.h>
# include <unicode/ustring.h>
#endif /* ZORBA_NO_ICU */
@@ -2228,6 +2229,19 @@
return U_SUCCESS( status ) == TRUE;
}
+bool strip_diacritics( string const &in, string *out ) {
+ string in_normalized;
+ if ( !normalize( in, normalization::NFKD, &in_normalized ) )
+ return false;
+ out->truncate( 0 );
+ for ( size_type len = in_normalized.length(), i = 0; i < len; ) {
+ UChar32 const uc32 = in_normalized.char32At( i );
+ if ( u_charType( uc32 ) != U_NON_SPACING_MARK )
+ out->append( uc32 );
+ i += U16_LENGTH( uc32 ); // skip both units of a surrogate pair
+ }
+ return true;
+}
+
bool to_char( char const *in, char_type *out ) {
UErrorCode status = U_ZERO_ERROR;
u_strFromUTF8WithSub(
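`strip_diacritics()` filters an NFKD-normalized string one code point at a time. In UTF-16, which is what ICU's `UnicodeString` uses, a code point outside the BMP occupies two code units (a surrogate pair). A loop of this kind therefore has to advance by the length of each code point rather than by one unit. The following is a self-contained sketch of that iteration over `std::u16string` that does not depend on ICU. `is_nonspacing_mark()` is a hypothetical stand-in for `u_charType( c ) == U_NON_SPACING_MARK` that only recognises the Combining Diacritical Marks block (U+0300..U+036F).

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for ICU's u_charType(c) == U_NON_SPACING_MARK:
// recognises only the Combining Diacritical Marks block.
static bool is_nonspacing_mark( char32_t c ) {
  return c >= 0x0300 && c <= 0x036F;
}

// Decodes the code point starting at s[i] and stores its length in code
// units (1 or 2) in *len. Unpaired surrogates are passed through as-is.
static char32_t code_point_at( std::u16string const &s, std::size_t i,
                               std::size_t *len ) {
  char16_t const hi = s[i];
  if ( hi >= 0xD800 && hi <= 0xDBFF && i + 1 < s.size() ) {
    char16_t const lo = s[i + 1];
    if ( lo >= 0xDC00 && lo <= 0xDFFF ) {
      *len = 2;
      return 0x10000 + ( ( char32_t( hi ) - 0xD800 ) << 10 )
                     + ( char32_t( lo ) - 0xDC00 );
    }
  }
  *len = 1;
  return hi;
}

// Copies every code point of an (already NFKD-decomposed) string except
// non-spacing marks, advancing by whole code points.
std::u16string drop_marks( std::u16string const &in ) {
  std::u16string out;
  for ( std::size_t i = 0, len = 0; i < in.size(); i += len ) {
    char32_t const c = code_point_at( in, i, &len );
    if ( !is_nonspacing_mark( c ) )
      out.append( in, i, len );   // copy the original code units
  }
  return out;
}
```

If the loop advanced by one unit instead, a supplementary character such as U+1D11E would be visited twice, once per surrogate, and could be emitted twice.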
=== modified file 'src/util/unicode_util.h'
--- src/util/unicode_util.h 2012-04-24 12:39:38 +0000
+++ src/util/unicode_util.h 2012-04-24 20:57:30 +0000
@@ -189,6 +189,7 @@
////////// normalization //////////////////////////////////////////////////////
#ifndef ZORBA_NO_ICU
+
/**
* Normalizes the given string.
*
@@ -197,6 +198,17 @@
* @return Returns \c true only if the normalization succeeded.
*/
bool normalize( string const &in, normalization::type n, string *out );
+
+/**
+ * Strips all diacritical marks from all characters, converting each to its
+ * closest non-diacritical equivalent.
+ *
+ * @param in The input string.
+ * @param out The output string.
+ * @return Returns \c true only if the strip succeeded.
+ */
+bool strip_diacritics( string const &in, string *out );
+
#endif /* ZORBA_NO_ICU */
////////// string conversion //////////////////////////////////////////////////
=== modified file 'src/util/uri_util.h'
--- src/util/uri_util.h 2012-04-24 12:39:38 +0000
+++ src/util/uri_util.h 2012-04-24 20:57:30 +0000
@@ -54,8 +54,8 @@
* @param uri The URI to get the scheme of.
* @param colon If not \c nullptr, this pointer is set to the position of the
* ':' (if any) that follows the scheme name.
- * @return Returns the URI's scheme, or scheme::none if none, or
- * scheme::unknown if unknown.
+ * @return Returns the URI's scheme (if known), scheme::unknown (if unknown),
+ * or scheme::none (if none).
*/
scheme get_scheme( char const *uri, char const **colon = nullptr );
@@ -64,10 +64,10 @@
*
* @tparam StringType The string type.
* @param uri The URI to get the scheme of.
- * @param sname If not \c nullptr and the scheme is known, this is set to the
- * scheme's name.
- * @return Returns the URI's scheme, or scheme::none if none, or
- * scheme::unknown if unknown.
+ * @param sname If not \c nullptr and the scheme is not \c none, this is set to
+ * the scheme's name.
+ * @return Returns the URI's scheme (if known), scheme::unknown (if unknown),
+ * or scheme::none (if none).
*/
template<class StringType> inline
scheme get_scheme( StringType const &uri, StringType *sname = nullptr ) {
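The revised `get_scheme()` documentation separates three outcomes: a known scheme, `scheme::unknown` (syntactically a scheme but not recognised), and `scheme::none` (no scheme at all). It also says the name is reported whenever the result is not `none`. The sketch below illustrates those semantics using the RFC 3986 scheme grammar, `ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )` followed by `:`. The `scheme` enum here is a hypothetical three-member subset, and `get_scheme_sketch()` is an illustration, not Zorba's implementation. Scheme names are compared case-insensitively, as RFC 3986 specifies.

```cpp
#include <cctype>
#include <string>

// Hypothetical subset of schemes; Zorba's real enum has more members.
enum class scheme { none, unknown, file, http, https };

// Returns the URI's scheme (if known), scheme::unknown (if unknown), or
// scheme::none (if none). If sname is not null and the result is not none,
// *sname receives the scheme's name as written in the URI.
scheme get_scheme_sketch( std::string const &uri, std::string *sname = nullptr ) {
  std::string::size_type const colon = uri.find( ':' );
  if ( colon == std::string::npos ||
       !std::isalpha( static_cast<unsigned char>( uri[0] ) ) )
    return scheme::none;
  for ( std::string::size_type i = 1; i < colon; ++i ) {
    unsigned char const c = static_cast<unsigned char>( uri[i] );
    if ( !std::isalnum( c ) && c != '+' && c != '-' && c != '.' )
      return scheme::none;
  }
  std::string const name( uri, 0, colon );
  if ( sname )
    *sname = name;
  std::string lower( name );                 // schemes are case-insensitive
  for ( char &c : lower )
    c = static_cast<char>( std::tolower( static_cast<unsigned char>( c ) ) );
  if ( lower == "file" )  return scheme::file;
  if ( lower == "http" )  return scheme::http;
  if ( lower == "https" ) return scheme::https;
  return scheme::unknown;
}
```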
=== modified file 'src/util/utf8_util.h'
--- src/util/utf8_util.h 2012-04-24 12:39:38 +0000
+++ src/util/utf8_util.h 2012-04-24 20:57:30 +0000
@@ -759,9 +759,10 @@
* @tparam OutputStringType The output string type.
* @param in The input string.
* @param out The output string.
+ * @return Returns \c true only if the strip succeeded.
*/
template<class InputStringType,class OutputStringType>
-void strip_diacritics( InputStringType const &in, OutputStringType *out );
+bool strip_diacritics( InputStringType const &in, OutputStringType *out );
/**
*
=== modified file 'src/util/utf8_util.tcc'
--- src/util/utf8_util.tcc 2012-04-24 12:39:38 +0000
+++ src/util/utf8_util.tcc 2012-04-24 20:57:30 +0000
@@ -123,19 +123,26 @@
#endif /* ZORBA_NO_ICU */
template<class InputStringType,class OutputStringType>
-void strip_diacritics( InputStringType const &in, OutputStringType *out ) {
- InputStringType in_normalized;
+bool strip_diacritics( InputStringType const &in, OutputStringType *out ) {
#ifndef ZORBA_NO_ICU
- normalize( in, unicode::normalization::NFKD, &in_normalized );
+ unicode::string u_in;
+ if ( !unicode::to_string( in, &u_in ) )
+ return false;
+ unicode::string u_out;
+ if ( !unicode::strip_diacritics( u_in, &u_out ) )
+ return false;
+ storage_type *temp;
+ size_type temp_len;
+ if ( !utf8::to_string( u_out.getBuffer(), u_out.length(), &temp, &temp_len ) )
+ return false;
+ out->assign( temp, temp_len );
+ if ( !string_traits<OutputStringType>::takes_pointer_ownership )
+ delete[] temp;
#else
- in_normalized = in.c_str();
-#endif /* ZORBA_NO_ICU */
out->clear();
- out->reserve( in_normalized.size() );
- std::copy(
- in_normalized.begin(), in_normalized.end(),
- ascii::back_ascii_inserter( *out )
- );
+ out->reserve( in.size() );
+ std::copy( in.begin(), in.end(), ascii::back_ascii_inserter( *out ) );
+#endif /* ZORBA_NO_ICU */
+ return true;
}
#ifndef ZORBA_NO_ICU
=== modified file 'src/zorbatypes/ft_token.cpp'
--- src/zorbatypes/ft_token.cpp 2012-04-24 12:39:38 +0000
+++ src/zorbatypes/ft_token.cpp 2012-04-24 20:57:30 +0000
@@ -175,7 +175,7 @@
///////////////////////////////////////////////////////////////////////////////
std::ostream& operator<<( ostream &o, FTToken const &t ) {
- return o << "[FTToken: \"" << t.value() << "\" ("
+ return o << "[\"" << t.value() << "\" ("
<< iso639_1::string_of[ t.lang() ] << ") "
<< t.pos() << ',' << t.sent() << ',' << t.para() << ']';
}
=== modified file 'src/zorbatypes/ft_token.h'
--- src/zorbatypes/ft_token.h 2012-04-24 12:39:38 +0000
+++ src/zorbatypes/ft_token.h 2012-04-24 20:57:30 +0000
@@ -286,7 +286,7 @@
*/
mutable mod_values_t *mod_values_;
- inline bool is_query_token() const {
+ bool is_query_token() const {
return sent_ == QueryTokenMagicValue;
}
=== modified file 'src/zorbatypes/numconversions.cpp'
--- src/zorbatypes/numconversions.cpp 2012-04-24 12:39:38 +0000
+++ src/zorbatypes/numconversions.cpp 2012-04-24 20:57:30 +0000
@@ -15,6 +15,8 @@
*/
#include "stdafx.h"
+#include <stdexcept>
+
#include "common/common.h"
#include "util/string_util.h"
#include "zorbatypes/numconversions.h"
@@ -23,6 +25,9 @@
///////////////////////////////////////////////////////////////////////////////
+#define RANGE_ERROR(N,TYPE) \
+ std::range_error( BUILD_STRING( '"', (N), "\": number can not be represented as an " TYPE ) )
+
xs_int to_xs_int( xs_double const &d ) {
zstring const temp( d.toIntegerString() );
return ztd::aton<xs_int>( temp.c_str() );
@@ -33,7 +38,9 @@
zstring const temp( i.toString() );
return ztd::aton<xs_int>( temp.c_str() );
#else
- return static_cast<xs_int>( i.value_ );
+ if ( i.is_xs_int() )
+ return static_cast<xs_int>( i.value_ );
+ throw RANGE_ERROR( i, "xs:int" );
#endif /* ZORBA_WITH_BIG_INTEGER */
}
@@ -42,9 +49,7 @@
zstring const temp( d.toString() );
return ztd::aton<xs_long>( temp.c_str() );
}
- throw std::range_error(
- BUILD_STRING( '"', d, "\": number can not be represented as an xs:long" )
- );
+ throw RANGE_ERROR( d, "xs:long" );
}
xs_long to_xs_long( xs_integer const &i ) {
@@ -52,7 +57,9 @@
zstring const temp( i.toString() );
return ztd::aton<xs_long>( temp.c_str() );
#else
- return static_cast<xs_long>( i.value_ );
+ if ( i.is_xs_long() )
+ return static_cast<xs_long>( i.value_ );
+ throw RANGE_ERROR( i, "xs:long" );
#endif /* ZORBA_WITH_BIG_INTEGER */
}
@@ -68,7 +75,9 @@
zstring const temp( i.toString() );
return ztd::aton<xs_unsignedInt>( temp.c_str() );
#else
- return static_cast<xs_unsignedInt>( i.value_ );
+ if ( i.is_xs_unsignedInt() )
+ return static_cast<xs_unsignedInt>( i.value_ );
+ throw RANGE_ERROR( i, "xs:unsignedInt" );
#endif /* ZORBA_WITH_BIG_INTEGER */
}
@@ -77,7 +86,9 @@
zstring const temp( i.toString() );
return ztd::aton<xs_unsignedLong>( temp.c_str() );
#else
- return static_cast<xs_unsignedLong>( i.value_ );
+ if ( i.is_xs_unsignedLong() )
+ return static_cast<xs_unsignedLong>( i.value_ );
+ throw RANGE_ERROR( i, "xs:unsignedLong" );
#endif /* ZORBA_WITH_BIG_INTEGER */
}
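The `RANGE_ERROR` checks above replace a silent, truncating `static_cast` with an explicit range test that throws `std::range_error`. The generic sketch below shows the same idea. `checked_narrow()` is a hypothetical helper. Its round-trip-plus-sign test is the idiom used by the C++ Core Guidelines' `gsl::narrow`, not Zorba's approach, which relies on per-type predicates such as `is_xs_int()`. Before C++20, converting an out-of-range value to a signed type is implementation-defined (modular on mainstream compilers), and the round-trip test relies on that behaviour.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Converts an integer to another integral type, throwing std::range_error
// (with a message in the style of RANGE_ERROR) instead of silently
// truncating when the value does not fit.
template<typename To, typename From>
To checked_narrow( From value, char const *type_name ) {
  To const result = static_cast<To>( value );
  // The value fits iff converting back yields the original value and the
  // sign is preserved (the sign test catches e.g. -1 -> unsigned).
  bool const fits = static_cast<From>( result ) == value &&
                    ( result < To{} ) == ( value < From{} );
  if ( !fits ) {
    std::ostringstream msg;
    msg << '"' << value << "\": number can not be represented as an "
        << type_name;
    throw std::range_error( msg.str() );
  }
  return result;
}
```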
=== modified file 'src/zorbautils/locale.cpp'
--- src/zorbautils/locale.cpp 2012-04-24 12:39:38 +0000
+++ src/zorbautils/locale.cpp 2012-04-24 20:57:30 +0000
@@ -36,10 +36,10 @@
#define DEF_END(CHAR_ARRAY) \
static char const *const *const end = \
- CHAR_ARRAY + sizeof( CHAR_ARRAY ) / sizeof( char* );
+ CHAR_ARRAY + sizeof( CHAR_ARRAY ) / sizeof( char* )
-#define FIND(what) \
- static_cast<type>( find_index( string_of, end, what ) )
+#define FIND(WHAT) \
+ static_cast<type>( find_index( string_of, end, WHAT ) )
using namespace std;
@@ -70,10 +70,10 @@
static char* get_win32_locale_info( int constant ) {
int bytes = ::GetLocaleInfoA( LOCALE_USER_DEFAULT, constant, NULL, 0 );
ZORBA_FATAL( bytes, "GetLocaleInfoA() failed" );
- char *const info = new char[ bytes ];
- bytes = ::GetLocaleInfoA( LOCALE_USER_DEFAULT, constant, info, bytes );
+ unique_ptr<char[]> info( new char[ bytes ] );
+ bytes = ::GetLocaleInfoA( LOCALE_USER_DEFAULT, constant, info.get(), bytes );
ZORBA_FATAL( bytes, "GetLocaleInfoA() failed" );
- return info;
+ return info.release();
}
#else /* WIN32 */
@@ -379,21 +379,192 @@
char const *const string_of[] = {
"#UNKNOWN", // starts with '#' for sorting
+ "aa", // Afar
+ "ab", // Abkhazian
+ "ae", // Avestan
+ "af", // Afrikaans
+ "ak", // Akan
+ "am", // Amharic
+ "an", // Aragonese
+ "ar", // Arabic
+ "as", // Assamese
+ "av", // Avaric
+ "ay", // Aymara
+ "az", // Azerbaijani
+ "ba", // Bashkir
+ "be", // Byelorussian
+ "bg", // Bulgarian
+ "bh", // Bihari
+ "bi", // Bislama
+ "bm", // Bambara
+ "bn", // Bengali; Bangla
+ "bo", // Tibetan
+ "br", // Breton
+ "bs", // Bosnian
+ "ca", // Catalan
+ "ce", // Chechen
+ "ch", // Chamorro
+ "co", // Corsican
+ "cr", // Cree
+ "cs", // Czech
+ "cu", // Church Slavic; Church Slavonic
+ "cv", // Chuvash
+ "cy", // Welsh
"da", // Danish
"de", // German
+ "dv", // Divehi
+ "dz", // Bhutani
+ "ee", // Ewe
+ "el", // Greek
"en", // English
+ "eo", // Esperanto
"es", // Spanish
+ "et", // Estonian
+ "eu", // Basque
+ "fa", // Persian
+ "ff", // Fulah
"fi", // Finnish
+ "fj", // Fiji
+ "fo", // Faroese
"fr", // French
+ "fy", // Frisian
+ "ga", // Irish
+ "gd", // Scots Gaelic
+ "gl", // Galician
+ "gn", // Guarani
+ "gu", // Gujarati
+ "gv", // Manx
+ "ha", // Hausa
+ "he", // Hebrew (formerly iw)
+ "hi", // Hindi
+ "ho", // Hiri Motu
+ "hr", // Croatian
+ "ht", // Haitian Creole
"hu", // Hungarian
+ "hy", // Armenian
+ "hz", // Herero
+ "ia", // Interlingua
+ "id", // Indonesian (formerly in)
+ "ie", // Interlingue
+ "ig", // Igbo
+ "ii", // Nuosu
+ "ik", // Inupiak
+ "io", // Ido
+ "is", // Icelandic
"it", // Italian
+ "iu", // Inuktitut
+ "ja", // Japanese
+ "jv", // Javanese
+ "ka", // Georgian
+ "kg", // Kongo
+ "ki", // Gikuyu
+ "kj", // Kuanyama
+ "kk", // Kazakh
+ "kl", // Greenlandic
+ "km", // Cambodian
+ "kn", // Kannada
+ "ko", // Korean
+ "kr", // Kanuri
+ "ks", // Kashmiri
+ "ku", // Kurdish
+ "kv", // Komi
+ "kw", // Cornish
+ "ky", // Kirghiz
+ "la", // Latin
+ "lb", // Letzeburgesch
+ "lg", // Ganda
+ "li", // Limburgan; Limburger; Limburgish
+ "ln", // Lingala
+ "lo", // Laothian
+ "lt", // Lithuanian
+ "lu", // Luba-Katanga
+ "lv", // Latvian, Lettish
+ "mg", // Malagasy
+ "mh", // Marshallese
+ "mi", // Maori
+ "mk", // Macedonian
+ "ml", // Malayalam
+ "mn", // Mongolian
+ "mo", // Moldavian
+ "mr", // Marathi
+ "ms", // Malay
+ "mt", // Maltese
+ "my", // Burmese
+ "na", // Nauru
+ "nb", // Norwegian Bokmal
+ "nd", // Ndebele, North
+ "ne", // Nepali
+ "ng", // Ndonga
"nl", // Dutch
+ "nn", // Norwegian Nynorsk
"no", // Norwegian
+ "nr", // Ndebele, South
+ "nv", // Navajo; Navaho
+ "ny", // Chichewa; Chewa; Nyanja
+ "oc", // Occitan
+ "oj", // Ojibwa
+ "om", // (Afan) Oromo
+ "or", // Oriya
+ "os", // Ossetian; Ossetic
+ "pa", // Panjabi; Punjabi
+ "pi", // Pali
+ "pl", // Polish
+ "ps", // Pashto; Pushto
"pt", // Portuguese
+ "qu", // Quechua
+ "rm", // Romansh
+ "rn", // Kirundi
"ro", // Romanian
"ru", // Russian
+ "rw", // Kinyarwanda
+ "sa", // Sanskrit
+ "sc", // Sardinian
+ "sd", // Sindhi
+ "se", // Northern Sami
+ "sg", // Sangho
+ "sh", // Serbo-Croatian
+ "si", // Sinhalese
+ "sk", // Slovak
+ "sl", // Slovenian
+ "sm", // Samoan
+ "sn", // Shona
+ "so", // Somali
+ "sq", // Albanian
+ "sr", // Serbian
+ "ss", // Siswati
+ "st", // Sesotho
+ "su", // Sundanese
"sv", // Swedish
+ "sw", // Swahili
+ "ta", // Tamil
+ "te", // Telugu
+ "tg", // Tajik
+ "th", // Thai
+ "ti", // Tigrinya
+ "tk", // Turkmen
+ "tl", // Tagalog
+ "tn", // Setswana
+ "to", // Tonga
"tr", // Turkish
+ "ts", // Tsonga
+ "tt", // Tatar
+ "tw", // Twi
+ "ty", // Tahitian
+ "ug", // Uighur
+ "uk", // Ukrainian
+ "ur", // Urdu
+ "uz", // Uzbek
+ "ve", // Venda
+ "vi", // Vietnamese
+ "vo", // Volapuk
+ "wa", // Walloon
+ "wo", // Wolof
+ "xh", // Xhosa
+ "yi", // Yiddish (formerly ji)
+ "yo", // Yoruba
+ "za", // Zhuang
+ "zh", // Chinese
+ "zu", // Zulu
};
type find( char const *lang ) {
@@ -409,18 +580,110 @@
char const *const string_of[] = {
"#UNKNOWN", // starts with '#' for sorting
+ "aar", // Afar
+ "abk", // Abkhazian
+ "afr", // Afrikaans
+ "aka", // Akan
+ "alb", // Albanian
+ "amh", // Amharic
+ "ara", // Arabic
+ "arg", // Aragonese
+ "arm", // Armenian
+ "asm", // Assamese [enum value is asm_ since asm is a C++ keyword]
+ "ava", // Avaric
+ "ave", // Avestan
+ "aym", // Aymara
+ "aze", // Azerbaijani
+ "bak", // Bashkir
+ "bam", // Bambara
+ "baq", // Basque
+ "bel", // Belarusian
+ "ben", // Bengali
+ "bih", // Bihari
+ "bis", // Bislama
+ "bos", // Bosnian
+ "bre", // Breton
+ "bul", // Bulgarian
+ "bur", // Burmese
+ "cat", // Catalan
+ "cha", // Chamorro
+ "che", // Chechen
+ "chi", // Chinese
+ "chu", // Church Slavic; Old Slavonic; Church Slavonic
+ "cym", // Welsh
"dan", // Danish
"deu", // German (T)
+ "div", // Divehi; Dhivehi; Maldivian
"dut", // Dutch (B)
+ "dzo", // Dzongkha
+ "ell", // Modern Greek
"eng", // English
+ "epo", // Esperanto
+ "est", // Estonian
+ "ewe", // Ewe
+ "fao", // Faroese
+ "fij", // Fijian
"fin", // Finnish
"fra", // French (T)
"fre", // French (B)
+ "fry", // Western Frisian
+ "ful", // Fulah
+ "geo", // Georgian
"ger", // German (B)
+ "gla", // Scottish Gaelic; Gaelic
+ "gle", // Irish
+ "glg", // Galician
+ "glv", // Manx
+ "gre", // Modern Greek
+ "grn", // Guarani
+ "guj", // Gujarati
+ "hat", // Haitian Creole; Haitian
+ "hau", // Hausa
+ "heb", // Hebrew
+ "her", // Herero
+ "hin", // Hindi
+ "hmo", // Hiri Motu
+ "hrv", // Croatian
"hun", // Hungarian
+ "ibo", // Igbo
+ "ice", // Icelandic
+ "ido", // Ido
+ "iku", // Inuktitut
+ "ile", // Interlingue; Occidental
+ "ina", // Interlingua
+ "ind", // Indonesian
+ "ipk", // Inupiaq
+ "isl", // Icelandic
"ita", // Italian
+ "jav", // Javanese
+ "jpn", // Japanese
+ "kal", // Kalaallisut; Greenlandic
+ "kan", // Kannada
+ "kas", // Kashmiri
+ "kat", // Georgian
+ "kau", // Kanuri
+ "kaz", // Kazakh
+ "khm", // Central Khmer
+ "kik", // Kikuyu; Gikuyu
+ "kin", // Kinyarwanda
+ "kir", // Kirghiz; Kyrgyz
+ "kom", // Komi
+ "kon", // Kongo
+ "kor", // Korean
+ "kua", // Kuanyama; Kwanyama
+ "kur", // Kurdish
+ "lao", // Lao
+ "lat", // Latin
+ "lav", // Latvian
+ "lim", // Limburgan; Limburger; Limburgish
+ "lin", // Lingala
+ "lit", // Lithuanian
+ "ltz", // Luxembourgish; Letzeburgesch
+ "lub", // Luba-Katanga
+ "mya", // Burmese
"nld", // Dutch (T)
"nor", // Norwegian
+ "nya", // Chichewa; Chewa; Nyanja
"por", // Portuguese
"ron", // Romanian (T)
"rum", // Romanian (B)
@@ -428,6 +691,18 @@
"spa", // Spanish
"swe", // Swedish
"tur", // Turkish
+ "ven", // Venda
+ "vie", // Vietnamese
+ "vol", // Volapuk
+ "wel", // Welsh
+ "wln", // Walloon
+ "wol", // Wolof
+ "xho", // Xhosa
+ "yid", // Yiddish
+ "yor", // Yoruba
+ "zha", // Zhuang; Chuang
+ "zho", // Chinese
+ "zul", // Zulu
};
type find( char const *lang ) {
@@ -447,18 +722,110 @@
static type const iso639_2_to_639_1[] = {
unknown,
+ aa, // aar
+ ab, // abk
+ af, // afr
+ ak, // aka
+ sq, // alb
+ am, // amh
+ ar, // ara
+ an, // arg
+ hy, // arm
+ as, // asm
+ av, // ava
+ ae, // ave
+ ay, // aym
+ az, // aze
+ ba, // bak
+ bm, // bam
+ eu, // baq
+ be, // bel
+ bn, // ben
+ bh, // bih
+ bi, // bis
+ bs, // bos
+ br, // bre
+ bg, // bul
+ my, // bur
+ ca, // cat
+ ch, // cha
+ ce, // che
+ zh, // chi
+ cu, // chu
+ cy, // cym
da, // dan
de, // deu
+ dv, // div
nl, // dut
+ dz, // dzo
+ el, // ell
en, // eng
+ eo, // epo
+ et, // est
+ ee, // ewe
+ fo, // fao
+ fj, // fij
fi, // fin
fr, // fra
fr, // fre
+ fy, // fry
+ ff, // ful
+ ka, // geo
de, // ger
+ gd, // gla
+ ga, // gle
+ gl, // glg
+ gv, // glv
+ el, // gre
+ gn, // grn
+ gu, // guj
+ ht, // hat
+ ha, // hau
+ he, // heb
+ hz, // her
+ hi, // hin
+ ho, // hmo
+ hr, // hrv
hu, // hun
+ ig, // ibo
+ is, // ice
+ io, // ido
+ iu, // iku
+ ie, // ile
+ ia, // ina
+ id, // ind
+ ik, // ipk
+ is, // isl
it, // ita
+ jv, // jav
+ ja, // jpn
+ kl, // kal
+ kn, // kan
+ ks, // kas
+ ka, // kat
+ kr, // kau
+ kk, // kaz
+ km, // khm
+ ki, // kik
+ rw, // kin
+ ky, // kir
+ kv, // kom
+ kg, // kon
+ ko, // kor
+ kj, // kua
+ ku, // kur
+ lo, // lao
+ la, // lat
+ lv, // lav
+ li, // lim
+ ln, // lin
+ lt, // lit
+ lb, // ltz
+ lu, // lub
+ my, // mya
nl, // nld
no, // nor
+ ny, // nya
pt, // por
ro, // ron
ro, // rum
@@ -466,6 +833,18 @@
es, // spa
sv, // swe
tr, // tur
+ ve, // ven
+ vi, // vie
+ vo, // vol
+ cy, // wel
+ wa, // wln
+ wo, // wol
+ xh, // xho
+ yi, // yid
+ yo, // yor
+ za, // zha
+ zh, // zho
+ zu, // zul
};
return iso639_2_to_639_1[ iso639_2::find( lang ) ];
}
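The `"#UNKNOWN", // starts with '#' for sorting` entries suggest that `find_index()` searches `string_of` by binary search. If so, each table must stay strictly sorted, and `iso639_2_to_639_1[]` must line up with `iso639_2::string_of[]` index for index. The sketch below shows that scheme on a hypothetical five-code excerpt. `lang1`, `codes2`, `codes2_to_1` and the helper functions are illustrative names, not Zorba's. It also includes a sortedness check of the kind a unit test could run, since an out-of-order entry makes binary-search lookups fail without any error.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <iterator>

// Hypothetical excerpt: sorted ISO 639-2 codes and a parallel table of
// their ISO 639-1 equivalents (index 0 means "unknown").
enum lang1 { unknown, bg, br, lu, ve, vi };

static char const *const codes2[] = {
  "#UNKNOWN",  // '#' sorts before any lowercase letter
  "bre", "bul", "lub", "ven", "vie"
};
static lang1 const codes2_to_1[] = { unknown, br, bg, lu, ve, vi };

static_assert( sizeof codes2 / sizeof *codes2 ==
               sizeof codes2_to_1 / sizeof *codes2_to_1,
               "tables must be parallel" );

struct str_less {
  bool operator()( char const *a, char const *b ) const {
    return std::strcmp( a, b ) < 0;
  }
};

// True only if the table is strictly ascending, as binary search requires.
bool table_is_sorted() {
  return std::adjacent_find( std::begin( codes2 ), std::end( codes2 ),
           []( char const *a, char const *b ) {
             return std::strcmp( a, b ) >= 0;
           } ) == std::end( codes2 );
}

// Binary-searches the sorted table; returns 0 ("unknown") if not found.
std::size_t find_index( char const *code ) {
  char const *const *const end = std::end( codes2 );
  char const *const *const it =
    std::lower_bound( std::begin( codes2 ), end, code, str_less() );
  if ( it != end && std::strcmp( *it, code ) == 0 )
    return static_cast<std::size_t>( it - std::begin( codes2 ) );
  return 0;
}

lang1 to_639_1( char const *code ) {
  return codes2_to_1[ find_index( code ) ];
}
```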
=== modified file 'src/zorbautils/locale.h'
--- src/zorbautils/locale.h 2012-04-24 12:39:38 +0000
+++ src/zorbautils/locale.h 2012-04-24 20:57:30 +0000
@@ -29,252 +29,252 @@
namespace iso3166_1 {
enum type {
unknown,
- AD, // Andorra
- AE, // United Arab Emirates
- AF, // Afghanistan
- AG, // Antigua and Barbuda
- AI, // Anguilla
- AL, // Albania
- AM, // Armenia
- AN, // Netherlands Antilles
- AO, // Angola
- AQ, // AntarcticA
- AR, // ArgentinA
- AS, // American Samoa
- AT, // Austria
- AU, // Australia
- AW, // Aruba
- AX, // Aland Islands
- AZ, // Azerbaijan
- BA, // Bosnia and Herzegovina
- BB, // Barbados
- BD, // Bangladesh
- BE, // Belgium
- BF, // Burkina Faso
- BG, // Bulgaria
- BH, // Bahrain
- BI, // Burundi
- BJ, // Benin
- BL, // Saint Barthelemy
- BM, // Bermuda
- BN, // Brunei Darussalam
- BO, // Bolivia
- BR, // Brazil
- BS, // Bahamas
- BT, // Bhutan
- BV, // Bouvet Island
- BW, // Botswana
- BY, // Belarus
- BZ, // Belize
- CA, // Canada
- CC, // Cocos Islands
- CD, // Congo
- CF, // Central African Republic
- CG, // Congo
- CH, // Switzerland
- CI, // Cote D'Ivoire
- CK, // Cook Islands
- CL, // Chile
- CM, // Cameroon
- CN, // China
- CO, // Colombia
- CR, // Costa Rica
- CU, // Cuba
- CV, // Cape Verde
- CX, // Christmas Island
- CY, // Cyprus
- CZ, // Czech Republic
- DE, // Germany
- DJ, // Djibouti
- DK, // Denmark
- DM, // Dominica
- DO, // Dominican Republic
- DZ, // Algeria
- EC, // Ecuador
- EE, // Estonia
- EG, // Egypt
- EH, // Western Sahara
- ER, // Eritrea
- ES, // Spain
- ET, // Ethiopia
- FI, // Finland
- FJ, // Fiji
- FK, // Falkland Islands
- FM, // Micronesia
- FO, // Faroe Islands
- FR, // France
- GA, // Gabon
- GB, // United Kingdom
- GD, // Grenada
- GE, // Georgia
- GF, // French Guiana
- GG, // Guernsey
- GH, // Ghana
- GI, // Gibraltar
- GL, // Greenland
- GM, // Gambia
- GN, // Guinea
- GP, // Guadeloupe
- GQ, // Equatorial Guinea
- GR, // Greece
- GS, // South Georgia and the South Sandwich Islands
- GT, // Guatemala
- GU, // Guam
- GW, // Guinea-Bissau
- GY, // Guyana
- HK, // Hong Kong
- HM, // Heard Island and Mcdonald Islands
- HN, // Honduras
- HR, // Croatia
- HT, // Haiti
- HU, // Hungary
- ID, // Indonesia
- IE, // Ireland
- IL, // Israel
- IM, // Isle of Man
- IN_, // India [without '_', it clashes with an identifier on Windows]
- IO, // British Indian Ocean Territory
- IQ, // Iraq
- IR, // Iran
- IS, // Iceland
- IT, // Italy
- JE, // Jersey
- JM, // Jamaica
- JO, // Jordan
- JP, // Japan
- KE, // Kenya
- KG, // Kyrgyzstan
- KH, // Cambodia
- KI, // Kiribati
- KM, // Comoros
- KN, // Saint Kitts and Nevis
- KP, // Korea (Democratic People's Republic)
- KR, // Korea
- KW, // Kuwait
- KY, // Cayman Islands
- KZ, // Kazakhstan
- LA, // Lao
- LB, // Lebanon
- LC, // Saint Lucia
- LI, // Liechtenstein
- LK, // Sri Lanka
- LR, // Liberia
- LS, // Lesotho
- LT, // Lithuania
- LU, // Luxembourg
- LV, // Latvia
- LY, // Libyan Arab Jamahiriya
- MA, // Morocco
- MC, // Monaco
- MD, // Moldova
- ME, // Montenegro
- MF, // Saint Martin
- MG, // Madagascar
- MH, // Marshall Islands
- MK, // Macedonia
- ML, // Mali
- MM, // Myanmar
- MN, // Mongolia
- MO, // Macao
- MP, // Northern Mariana Islands
- MQ, // Martinique
- MR, // Mauritania
- MS, // Montserrat
- MT, // Malta
- MU, // Mauritius
- MV, // Maldives
- MW, // Malawi
- MX, // Mexico
- MY, // Malaysia
- MZ, // Mozambique
- NA, // Namibia
- NC, // New Caledonia
- NE, // Niger
- NF, // Norfolk Island
- NG, // Nigeria
- NI, // Nicaragua
- NL, // Netherlands
- NO, // Norway
- NP, // Nepal
- NR, // Nauru
- NU, // Niue
- NZ, // New Zealand
- OM, // Oman
- PA, // Panama
- PE, // Peru
- PF, // French Polynesia
- PG, // Papua New Guinea
- PH, // Philippines
- PK, // Pakistan
- PL, // Poland
- PM, // Saint Pierre and Miquelon
- PN, // Pitcairn
- PR, // Puerto Rico
- PS, // Palestinian Territory
- PT, // Portugal
- PW, // Palau
- PY, // Paraguay
- QA, // Qatar
- RE, // Reunion
- RO, // Romania
- RS, // Serbia
- RU, // Russian Federation
- RW, // Rwanda
- SA, // Saudi Arabia
- SB, // Solomon Islands
- SC, // Seychelles
- SD, // Sudan
- SE, // Sweden
- SG, // Singapore
- SH, // Saint Helena
- SI, // Slovenia
- SJ, // Svalbard and Jan Mayen
- SK, // Slovakia
- SL, // Sierra Leone
- SM, // San Marino
- SN, // Senegal
- SO, // Somalia
- SR, // Suriname
- ST, // Sao Tome and Principe
- SV, // El Salvador
- SY, // Syria
- SZ, // Swaziland
- TC, // Turks and Caicos Islands
- TD, // Chad
- TF, // French Southern Territories
- TG, // Togo
- TH, // Thailand
- TJ, // Tajikistan
- TK, // Tokelau
- TL, // Timor-Leste
- TM, // Turkmenistan
- TN, // Tunisia
- TO, // Tonga
- TR, // Turkey
- TT, // Trinidad and Tobago
- TV, // Tuvalu
- TW, // Taiwan
- TZ, // Tanzania
- UA, // Ukraine
- UG, // Uganda
- UM, // United states Minor Outlying Islands
- US, // United States
- UY, // Uruguay
- UZ, // Uzbekistan
- VA, // Vatican
- VC, // Saint Vincent and the Grenadines
- VE, // Venezuela
- VG, // Virgin Islands (British)
- VI, // Virgin Islands (USA)
- VN, // Viet Nam
- VU, // Vanuatu
- WF, // Wallis and Futuna
- WS, // Samoa
- YE, // Yemen
- YT, // Mayotte
- ZA, // South Africa
- ZM, // Zambia
- ZW, // Zimbabwe
+ AD, ///< Andorra
+ AE, ///< United Arab Emirates
+ AF, ///< Afghanistan
+ AG, ///< Antigua and Barbuda
+ AI, ///< Anguilla
+ AL, ///< Albania
+ AM, ///< Armenia
+ AN, ///< Netherlands Antilles
+ AO, ///< Angola
+ AQ, ///< Antarctica
+ AR, ///< Argentina
+ AS, ///< American Samoa
+ AT, ///< Austria
+ AU, ///< Australia
+ AW, ///< Aruba
+ AX, ///< Aland Islands
+ AZ, ///< Azerbaijan
+ BA, ///< Bosnia and Herzegovina
+ BB, ///< Barbados
+ BD, ///< Bangladesh
+ BE, ///< Belgium
+ BF, ///< Burkina Faso
+ BG, ///< Bulgaria
+ BH, ///< Bahrain
+ BI, ///< Burundi
+ BJ, ///< Benin
+ BL, ///< Saint Barthelemy
+ BM, ///< Bermuda
+ BN, ///< Brunei Darussalam
+ BO, ///< Bolivia
+ BR, ///< Brazil
+ BS, ///< Bahamas
+ BT, ///< Bhutan
+ BV, ///< Bouvet Island
+ BW, ///< Botswana
+ BY, ///< Belarus
+ BZ, ///< Belize
+ CA, ///< Canada
+ CC, ///< Cocos Islands
+ CD, ///< Congo
+ CF, ///< Central African Republic
+ CG, ///< Congo
+ CH, ///< Switzerland
+ CI, ///< Cote D'Ivoire
+ CK, ///< Cook Islands
+ CL, ///< Chile
+ CM, ///< Cameroon
+ CN, ///< China
+ CO, ///< Colombia
+ CR, ///< Costa Rica
+ CU, ///< Cuba
+ CV, ///< Cape Verde
+ CX, ///< Christmas Island
+ CY, ///< Cyprus
+ CZ, ///< Czech Republic
+ DE, ///< Germany
+ DJ, ///< Djibouti
+ DK, ///< Denmark
+ DM, ///< Dominica
+ DO, ///< Dominican Republic
+ DZ, ///< Algeria
+ EC, ///< Ecuador
+ EE, ///< Estonia
+ EG, ///< Egypt
+ EH, ///< Western Sahara
+ ER, ///< Eritrea
+ ES, ///< Spain
+ ET, ///< Ethiopia
+ FI, ///< Finland
+ FJ, ///< Fiji
+ FK, ///< Falkland Islands
+ FM, ///< Micronesia
+ FO, ///< Faroe Islands
+ FR, ///< France
+ GA, ///< Gabon
+ GB, ///< United Kingdom
+ GD, ///< Grenada
+ GE, ///< Georgia
+ GF, ///< French Guiana
+ GG, ///< Guernsey
+ GH, ///< Ghana
+ GI, ///< Gibraltar
+ GL, ///< Greenland
+ GM, ///< Gambia
+ GN, ///< Guinea
+ GP, ///< Guadeloupe
+ GQ, ///< Equatorial Guinea
+ GR, ///< Greece
+ GS, ///< South Georgia and the South Sandwich Islands
+ GT, ///< Guatemala
+ GU, ///< Guam
+ GW, ///< Guinea-Bissau
+ GY, ///< Guyana
+ HK, ///< Hong Kong
+ HM, ///< Heard Island and Mcdonald Islands
+ HN, ///< Honduras
+ HR, ///< Croatia
+ HT, ///< Haiti
+ HU, ///< Hungary
+ ID, ///< Indonesia
+ IE, ///< Ireland
+ IL, ///< Israel
+ IM, ///< Isle of Man
+ IN_, ///< India [without '_', it clashes with an identifier on Windows]
+ IO, ///< British Indian Ocean Territory
+ IQ, ///< Iraq
+ IR, ///< Iran
+ IS, ///< Iceland
+ IT, ///< Italy
+ JE, ///< Jersey
+ JM, ///< Jamaica
+ JO, ///< Jordan
+ JP, ///< Japan
+ KE, ///< Kenya
+ KG, ///< Kyrgyzstan
+ KH, ///< Cambodia
+ KI, ///< Kiribati
+ KM, ///< Comoros
+ KN, ///< Saint Kitts and Nevis
+ KP, ///< Korea (Democratic People's Republic)
+ KR, ///< Korea
+ KW, ///< Kuwait
+ KY, ///< Cayman Islands
+ KZ, ///< Kazakhstan
+ LA, ///< Lao
+ LB, ///< Lebanon
+ LC, ///< Saint Lucia
+ LI, ///< Liechtenstein
+ LK, ///< Sri Lanka
+ LR, ///< Liberia
+ LS, ///< Lesotho
+ LT, ///< Lithuania
+ LU, ///< Luxembourg
+ LV, ///< Latvia
+ LY, ///< Libyan Arab Jamahiriya
+ MA, ///< Morocco
+ MC, ///< Monaco
+ MD, ///< Moldova
+ ME, ///< Montenegro
+ MF, ///< Saint Martin
+ MG, ///< Madagascar
+ MH, ///< Marshall Islands
+ MK, ///< Macedonia
+ ML, ///< Mali
+ MM, ///< Myanmar
+ MN, ///< Mongolia
+ MO, ///< Macao
+ MP, ///< Northern Mariana Islands
+ MQ, ///< Martinique
+ MR, ///< Mauritania
+ MS, ///< Montserrat
+ MT, ///< Malta
+ MU, ///< Mauritius
+ MV, ///< Maldives
+ MW, ///< Malawi
+ MX, ///< Mexico
+ MY, ///< Malaysia
+ MZ, ///< Mozambique
+ NA, ///< Namibia
+ NC, ///< New Caledonia
+ NE, ///< Niger
+ NF, ///< Norfolk Island
+ NG, ///< Nigeria
+ NI, ///< Nicaragua
+ NL, ///< Netherlands
+ NO, ///< Norway
+ NP, ///< Nepal
+ NR, ///< Nauru
+ NU, ///< Niue
+ NZ, ///< New Zealand
+ OM, ///< Oman
+ PA, ///< Panama
+ PE, ///< Peru
+ PF, ///< French Polynesia
+ PG, ///< Papua New Guinea
+ PH, ///< Philippines
+ PK, ///< Pakistan
+ PL, ///< Poland
+ PM, ///< Saint Pierre and Miquelon
+ PN, ///< Pitcairn
+ PR, ///< Puerto Rico
+ PS, ///< Palestinian Territory
+ PT, ///< Portugal
+ PW, ///< Palau
+ PY, ///< Paraguay
+ QA, ///< Qatar
+ RE, ///< Reunion
+ RO, ///< Romania
+ RS, ///< Serbia
+ RU, ///< Russian Federation
+ RW, ///< Rwanda
+ SA, ///< Saudi Arabia
+ SB, ///< Solomon Islands
+ SC, ///< Seychelles
+ SD, ///< Sudan
+ SE, ///< Sweden
+ SG, ///< Singapore
+ SH, ///< Saint Helena
+ SI, ///< Slovenia
+ SJ, ///< Svalbard and Jan Mayen
+ SK, ///< Slovakia
+ SL, ///< Sierra Leone
+ SM, ///< San Marino
+ SN, ///< Senegal
+ SO, ///< Somalia
+ SR, ///< Suriname
+ ST, ///< Sao Tome and Principe
+ SV, ///< El Salvador
+ SY, ///< Syria
+ SZ, ///< Swaziland
+ TC, ///< Turks and Caicos Islands
+ TD, ///< Chad
+ TF, ///< French Southern Territories
+ TG, ///< Togo
+ TH, ///< Thailand
+ TJ, ///< Tajikistan
+ TK, ///< Tokelau
+ TL, ///< Timor-Leste
+ TM, ///< Turkmenistan
+ TN, ///< Tunisia
+ TO, ///< Tonga
+ TR, ///< Turkey
+ TT, ///< Trinidad and Tobago
+ TV, ///< Tuvalu
+ TW, ///< Taiwan
+ TZ, ///< Tanzania
+ UA, ///< Ukraine
+ UG, ///< Uganda
+ UM, ///< United States Minor Outlying Islands
+ US, ///< United States
+ UY, ///< Uruguay
+ UZ, ///< Uzbekistan
+ VA, ///< Vatican
+ VC, ///< Saint Vincent and the Grenadines
+ VE, ///< Venezuela
+ VG, ///< Virgin Islands (British)
+ VI, ///< Virgin Islands (USA)
+ VN, ///< Viet Nam
+ VU, ///< Vanuatu
+ WF, ///< Wallis and Futuna
+ WS, ///< Samoa
+ YE, ///< Yemen
+ YT, ///< Mayotte
+ ZA, ///< South Africa
+ ZM, ///< Zambia
+ ZW, ///< Zimbabwe
NUM_ENTRIES
};
extern char const *const string_of[];
@@ -294,7 +294,7 @@
* Finds the ISO 3166-1 country code enumeration from the given string.
*
* @param country An ISO 3166-1 country code.
- * @return Returns said enumeration or <code>unknown</code>.
+ * @return Returns said enumeration or \c unknown.
*/
type find( char const *country );
}
@@ -319,7 +319,7 @@
* Finds the ISO 639-1 language code enumeration from the given string.
*
* @param lang An ISO 639-1 language code.
- * @return Returns said enumeration or <code>unknown</code>.
+ * @return Returns said enumeration or \c unknown.
*/
type find( char const *lang );
}
@@ -329,25 +329,129 @@
namespace iso639_2 {
enum type {
unknown,
- dan, // Danish
- deu, // German (T)
- dut, // Dutch (B)
- eng, // English
- fin, // Finnish
- fra, // French (T)
- fre, // French (B)
- ger, // German (B)
- hun, // Hungarian
- ita, // Italian
- nld, // Dutch (T)
- nor, // Norwegian
- por, // Portuguese
- ron, // Romanian (T)
- rum, // Romanian (B)
- rus, // Russian
- spa, // Spanish
- swe, // Swedish
- tur, // Turkish
+ aar, ///< Afar
+ abk, ///< Abkhazian
+ afr, ///< Afrikaans
+ aka, ///< Akan
+ alb, ///< Albanian (B)
+ amh, ///< Amharic
+ ara, ///< Arabic
+ arg, ///< Aragonese
+ arm, ///< Armenian (B)
+ asm_, ///< Assamese [without '_', it's a C++ keyword]
+ ava, ///< Avaric
+ ave, ///< Avestan
+ aym, ///< Aymara
+ aze, ///< Azerbaijani
+ bak, ///< Bashkir
+ bam, ///< Bambara
+ baq, ///< Basque (B)
+ bel, ///< Belarusian
+ ben, ///< Bengali
+ bih, ///< Bihari
+ bis, ///< Bislama
+ bos, ///< Bosnian
+ bre, ///< Breton
+ bul, ///< Bulgarian
+ bur, ///< Burmese (B)
+ cat, ///< Catalan
+ cha, ///< Chamorro
+ che, ///< Chechen
+ chi, ///< Chinese (B)
+ chu, ///< Church Slavic; Old Slavonic; Church Slavonic
+ cym, ///< Welsh (T)
+ dan, ///< Danish
+ deu, ///< German (T)
+ div, ///< Divehi; Dhivehi; Maldivian
+ dut, ///< Dutch (B)
+ dzo, ///< Dzongkha
+ ell, ///< Modern Greek (T)
+ eng, ///< English
+ epo, ///< Esperanto
+ est, ///< Estonian
+ ewe, ///< Ewe
+ fao, ///< Faroese
+ fij, ///< Fijian
+ fin, ///< Finnish
+ fra, ///< French (T)
+ fre, ///< French (B)
+ fry, ///< Western Frisian
+ ful, ///< Fulah
+ geo, ///< Georgian (B)
+ ger, ///< German (B)
+ gla, ///< Scottish Gaelic; Gaelic
+ gle, ///< Irish
+ glg, ///< Galician
+ glv, ///< Manx
+ gre, ///< Modern Greek (B)
+ grn, ///< Guarani
+ guj, ///< Gujarati
+ hat, ///< Haitian Creole; Haitian
+ hau, ///< Hausa
+ heb, ///< Hebrew
+ her, ///< Herero
+ hin, ///< Hindi
+ hmo, ///< Hiri Motu
+ hrv, ///< Croatian
+ hun, ///< Hungarian
+ ibo, ///< Igbo
+ ice, ///< Icelandic (B)
+ ido, ///< Ido
+ iku, ///< Inuktitut
+ ile, ///< Interlingue; Occidental
+ ina, ///< Interlingua
+ ind, ///< Indonesian
+ ipk, ///< Inupiaq
+ isl, ///< Icelandic (T)
+ ita, ///< Italian
+ jav, ///< Javanese
+ jpn, ///< Japanese
+ kal, ///< Kalaallisut; Greenlandic
+ kan, ///< Kannada
+ kas, ///< Kashmiri
+ kat, ///< Georgian (T)
+ kau, ///< Kanuri
+ kaz, ///< Kazakh
+ khm, ///< Central Khmer
+ kik, ///< Kikuyu; Gikuyu
+ kin, ///< Kinyarwanda
+ kir, ///< Kirghiz; Kyrgyz
+ kom, ///< Komi
+ kon, ///< Kongo
+ kor, ///< Korean
+ kua, ///< Kuanyama; Kwanyama
+ kur, ///< Kurdish
+ lao, ///< Lao
+ lat, ///< Latin
+ lav, ///< Latvian
+ lim, ///< Limburgan; Limburger; Limburgish
+ lin, ///< Lingala
+ lit, ///< Lithuanian
+ ltz, ///< Luxembourgish; Letzeburgesch
+ lub, ///< Luba-Katanga
+ mya, ///< Burmese (T)
+ nld, ///< Dutch (T)
+ nor, ///< Norwegian
+ nya, ///< Chichewa; Chewa; Nyanja
+ por, ///< Portuguese
+ ron, ///< Romanian (T)
+ rum, ///< Romanian (B)
+ rus, ///< Russian
+ spa, ///< Spanish
+ swe, ///< Swedish
+ tur, ///< Turkish
+ ven, ///< Venda
+ vie, ///< Vietnamese
+ vol, ///< Volapuk
+ wel, ///< Welsh (B)
+ wln, ///< Walloon
+ wol, ///< Wolof
+ xho, ///< Xhosa
+ yid, ///< Yiddish
+ yor, ///< Yoruba
+ zha, ///< Zhuang; Chuang
+ zho, ///< Chinese (T)
+ zul, ///< Zulu
NUM_ENTRIES
};
extern char const *const string_of[];
@@ -367,7 +471,7 @@
* Finds the ISO 639-2 language code enumeration from the given string.
*
* @param lang An ISO 639-2 language code.
- * @return Returns said enumeration or <code>unknown</code>.
+ * @return Returns said enumeration or \c unknown.
*/
type find( char const *lang );
}
@@ -378,21 +482,21 @@
* Finds the ISO 639-1 language code enumeration from the given string.
*
* @param lang Either an ISO 639-1 or an ISO 639-2 language code.
- * @return Returns said enumeration or <code>unknown</code>.
+ * @return Returns said enumeration or \c unknown.
*/
iso639_1::type find_lang( char const *lang );
/**
* Gets the ISO 3166-1 country code enumeration for the host system.
*
- * @return Returns said enumeration or <code>unknown</code>.
+ * @return Returns said enumeration or \c unknown.
*/
iso3166_1::type get_host_country();
/**
* Gets the ISO 639-1 language code enumeration for the host system.
*
- * @return Returns said enumeration defaulting to <code>en</code>.
+ * @return Returns said enumeration defaulting to \c en.
*/
iso639_1::type get_host_lang();
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-current-lang-true-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-current-lang-true-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-current-lang-true-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-da-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-da-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-da-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-de-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-de-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-de-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-en-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-en-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-en-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-fi-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-fi-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-fi-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-hu-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-hu-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-hu-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-it-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-it-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-it-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-nl-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-nl-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-nl-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-no-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-no-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-no-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-pt-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-pt-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-pt-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-ru-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-ru-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-ru-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-supported-false-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-supported-false-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-supported-false-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+false
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-supported-false-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-supported-false-2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-supported-false-2.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+false
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-sv-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-sv-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-sv-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-false-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-false-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-false-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+false
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-da-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-da-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-da-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-de-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-de-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-de-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-es-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-es-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-es-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-fi-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-fi-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-fi-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-fr-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-fr-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-fr-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-hu-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-hu-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-hu-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-it-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-it-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-it-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-nl-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-nl-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-nl-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-no-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-no-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-no-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-pt-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-pt-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-pt-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+false
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+false
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-sv-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-sv-supported-true.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-sv-supported-true.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-2.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-3.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-3.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-3.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-4.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-4.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-4.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+false
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-2.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+false
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+flavor
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-2.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+chic
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-3.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-3.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-3.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+flavor
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-4.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-4.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-4.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+chic
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-strip-diacritics-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-strip-diacritics-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-strip-diacritics-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+e
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-2.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-3.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-3.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-3.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-4.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-4.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-4.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-5.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-5.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-5.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-2.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-3.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-3.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-3.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-4.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-4.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-4.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-string-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-string-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-string-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-string-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-string-2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-string-2.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+true
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenizer-properties-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenizer-properties-1.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenizer-properties-1.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,112 @@
+<tokenizer-properties xmlns="http://www.zorba-xquery.com/modules/full-text" uri="http://www.zorba-xquery.com/full-text/tokenizer/icu">
+ <comments-separate-tokens value="true"/>
+ <elements-separate-tokens value="true"/>
+ <processing-instructions-separate-tokens value="true"/>
+ <supported-languages>
+ <lang>af</lang>
+ <lang>ak</lang>
+ <lang>am</lang>
+ <lang>ar</lang>
+ <lang>as</lang>
+ <lang>az</lang>
+ <lang>be</lang>
+ <lang>bg</lang>
+ <lang>bm</lang>
+ <lang>bn</lang>
+ <lang>bo</lang>
+ <lang>br</lang>
+ <lang>bs</lang>
+ <lang>ca</lang>
+ <lang>cs</lang>
+ <lang>cy</lang>
+ <lang>da</lang>
+ <lang>de</lang>
+ <lang>ee</lang>
+ <lang>el</lang>
+ <lang>en</lang>
+ <lang>eo</lang>
+ <lang>es</lang>
+ <lang>et</lang>
+ <lang>eu</lang>
+ <lang>fa</lang>
+ <lang>ff</lang>
+ <lang>fi</lang>
+ <lang>fo</lang>
+ <lang>fr</lang>
+ <lang>ga</lang>
+ <lang>gl</lang>
+ <lang>gu</lang>
+ <lang>gv</lang>
+ <lang>ha</lang>
+ <lang>he</lang>
+ <lang>hi</lang>
+ <lang>hr</lang>
+ <lang>hu</lang>
+ <lang>hy</lang>
+ <lang>id</lang>
+ <lang>ig</lang>
+ <lang>ii</lang>
+ <lang>is</lang>
+ <lang>it</lang>
+ <lang>ja</lang>
+ <lang>ka</lang>
+ <lang>ki</lang>
+ <lang>kk</lang>
+ <lang>kl</lang>
+ <lang>km</lang>
+ <lang>kn</lang>
+ <lang>ko</lang>
+ <lang>kw</lang>
+ <lang>lg</lang>
+ <lang>ln</lang>
+ <lang>lt</lang>
+ <lang>lu</lang>
+ <lang>lv</lang>
+ <lang>mg</lang>
+ <lang>mk</lang>
+ <lang>ml</lang>
+ <lang>mr</lang>
+ <lang>ms</lang>
+ <lang>mt</lang>
+ <lang>my</lang>
+ <lang>nb</lang>
+ <lang>nd</lang>
+ <lang>ne</lang>
+ <lang>nl</lang>
+ <lang>nn</lang>
+ <lang>om</lang>
+ <lang>or</lang>
+ <lang>pa</lang>
+ <lang>pl</lang>
+ <lang>ps</lang>
+ <lang>pt</lang>
+ <lang>rm</lang>
+ <lang>rn</lang>
+ <lang>ro</lang>
+ <lang>ru</lang>
+ <lang>rw</lang>
+ <lang>sg</lang>
+ <lang>si</lang>
+ <lang>sk</lang>
+ <lang>sl</lang>
+ <lang>sn</lang>
+ <lang>so</lang>
+ <lang>sq</lang>
+ <lang>sr</lang>
+ <lang>sv</lang>
+ <lang>sw</lang>
+ <lang>ta</lang>
+ <lang>te</lang>
+ <lang>th</lang>
+ <lang>ti</lang>
+ <lang>to</lang>
+ <lang>tr</lang>
+ <lang>uk</lang>
+ <lang>ur</lang>
+ <lang>uz</lang>
+ <lang>vi</lang>
+ <lang>yo</lang>
+ <lang>zh</lang>
+ <lang>zu</lang>
+ </supported-languages>
+</tokenizer-properties>
=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenizer-properties-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenizer-properties-2.xml.res 1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenizer-properties-2.xml.res 2012-04-24 20:57:30 +0000
@@ -0,0 +1,112 @@
+<tokenizer-properties xmlns="http://www.zorba-xquery.com/modules/full-text" uri="http://www.zorba-xquery.com/full-text/tokenizer/icu">
+ <comments-separate-tokens value="true"/>
+ <elements-separate-tokens value="true"/>
+ <processing-instructions-separate-tokens value="true"/>
+ <supported-languages>
+ <lang>af</lang>
+ <lang>ak</lang>
+ <lang>am</lang>
+ <lang>ar</lang>
+ <lang>as</lang>
+ <lang>az</lang>
+ <lang>be</lang>
+ <lang>bg</lang>
+ <lang>bm</lang>
+ <lang>bn</lang>
+ <lang>bo</lang>
+ <lang>br</lang>
+ <lang>bs</lang>
+ <lang>ca</lang>
+ <lang>cs</lang>
+ <lang>cy</lang>
+ <lang>da</lang>
+ <lang>de</lang>
+ <lang>ee</lang>
+ <lang>el</lang>
+ <lang>en</lang>
+ <lang>eo</lang>
+ <lang>es</lang>
+ <lang>et</lang>
+ <lang>eu</lang>
+ <lang>fa</lang>
+ <lang>ff</lang>
+ <lang>fi</lang>
+ <lang>fo</lang>
+ <lang>fr</lang>
+ <lang>ga</lang>
+ <lang>gl</lang>
+ <lang>gu</lang>
+ <lang>gv</lang>
+ <lang>ha</lang>
+ <lang>he</lang>
+ <lang>hi</lang>
+ <lang>hr</lang>
+ <lang>hu</lang>
+ <lang>hy</lang>
+ <lang>id</lang>
+ <lang>ig</lang>
+ <lang>ii</lang>
+ <lang>is</lang>
+ <lang>it</lang>
+ <lang>ja</lang>
+ <lang>ka</lang>
+ <lang>ki</lang>
+ <lang>kk</lang>
+ <lang>kl</lang>
+ <lang>km</lang>
+ <lang>kn</lang>
+ <lang>ko</lang>
+ <lang>kw</lang>
+ <lang>lg</lang>
+ <lang>ln</lang>
+ <lang>lt</lang>
+ <lang>lu</lang>
+ <lang>lv</lang>
+ <lang>mg</lang>
+ <lang>mk</lang>
+ <lang>ml</lang>
+ <lang>mr</lang>
+ <lang>ms</lang>
+ <lang>mt</lang>
+ <lang>my</lang>
+ <lang>nb</lang>
+ <lang>nd</lang>
+ <lang>ne</lang>
+ <lang>nl</lang>
+ <lang>nn</lang>
+ <lang>om</lang>
+ <lang>or</lang>
+ <lang>pa</lang>
+ <lang>pl</lang>
+ <lang>ps</lang>
+ <lang>pt</lang>
+ <lang>rm</lang>
+ <lang>rn</lang>
+ <lang>ro</lang>
+ <lang>ru</lang>
+ <lang>rw</lang>
+ <lang>sg</lang>
+ <lang>si</lang>
+ <lang>sk</lang>
+ <lang>sl</lang>
+ <lang>sn</lang>
+ <lang>so</lang>
+ <lang>sq</lang>
+ <lang>sr</lang>
+ <lang>sv</lang>
+ <lang>sw</lang>
+ <lang>ta</lang>
+ <lang>te</lang>
+ <lang>th</lang>
+ <lang>ti</lang>
+ <lang>to</lang>
+ <lang>tr</lang>
+ <lang>uk</lang>
+ <lang>ur</lang>
+ <lang>uz</lang>
+ <lang>vi</lang>
+ <lang>yo</lang>
+ <lang>zh</lang>
+ <lang>zu</lang>
+ </supported-languages>
+</tokenizer-properties>
=== modified file 'test/rbkt/Queries/CMakeLists.txt'
--- test/rbkt/Queries/CMakeLists.txt 2012-04-24 12:39:38 +0000
+++ test/rbkt/Queries/CMakeLists.txt 2012-04-24 20:57:30 +0000
@@ -109,10 +109,15 @@
# depend on module features into the modules themselves.
-# Check if WordNet thesaurus is installed in the location tests expect
-IF(EXISTS "${CMAKE_BINARY_DIR}/test/rbkt/thesauri/wordnet-en.zth")
+# Check if WordNet thesaurus is installed
+SET(WORDNET_THESAURUS_FILE
+ "${CMAKE_BINARY_DIR}/LIB_PATH/edu/princeton/wordnet/wordnet-en.zth")
+IF(EXISTS "${WORDNET_THESAURUS_FILE}")
SET(ZORBA_WORDNET_FOUND 1)
-ENDIF(EXISTS "${CMAKE_BINARY_DIR}/test/rbkt/thesauri/wordnet-en.zth")
+ # Kind of a weird place to put this directive, but convenient
+ INSTALL(FILES "${WORDNET_THESAURUS_FILE}"
+ DESTINATION "${ZORBA_CORE_LIB_DIR}/edu/princeton/wordnet")
+ENDIF(EXISTS "${WORDNET_THESAURUS_FILE}")
IF(ZORBA_SUPPRESS_CURL)
MESSAGE(STATUS "ZORBA_SUPPRESS_CURL is true - not searching for cURL library")
@@ -273,6 +278,7 @@
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-SCORE/score-queries-results-q1 866923)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-SCORE/score-queries-results-q3 866923)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-SCORE/score-queries-results-q3b 866923)
+ EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-FULL-TEXT-COMPOSABILITY/full-text-composability-queries-results-q1 987632)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-FULL-TEXT-COMPOSABILITY/full-text-composability-queries-results-q4 866926)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-XQUERY-XPATH-COMPOSABILITY/xquery-xpath-composability-queries-results-q9 866926)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-XQUERY-XPATH-COMPOSABILITY/xquery-xpath-composability-queries-results-q9b 866926)
@@ -355,6 +361,7 @@
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-SCORE/score-queries-results-q1 866923)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-SCORE/score-queries-results-q3 866923)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-SCORE/score-queries-results-q3b 866923)
+ EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-FULL-TEXT-COMPOSABILITY/full-text-composability-queries-results-q1 987632)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-FULL-TEXT-COMPOSABILITY/full-text-composability-queries-results-q4 866926)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-XQUERY-XPATH-COMPOSABILITY/xquery-xpath-composability-queries-results-q9 866926)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-XQUERY-XPATH-COMPOSABILITY/xquery-xpath-composability-queries-results-q9b 866926)
@@ -432,6 +439,7 @@
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-STOP-WORD/stop-word-queries-results-q3 909375)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/Examples/3.4.7-StopWordOption/ft-5.2.11-examples-q5 909375)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/Examples/3.4.7-StopWordOption/ft-5.2.11-examples-q4 909375)
+ EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/Examples/3.4.3-ThesaurusOption/ft-3.4.3-examples-q4 909375)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/Examples/3.4.3-ThesaurusOption/ft-3.4.3-examples-q3 909375)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/Examples/3.4.3-ThesaurusOption/ft-3.4.3-examples-q2 909375)
EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/Examples/3.4.3-ThesaurusOption/ft-3.4.3-examples-q1 909375)
@@ -523,7 +531,9 @@
EXPECTED_FAILURE(test/rbkt/zorba/http-client/put/put3_binary_element 3391756)
EXPECTED_FAILURE(test/rbkt/zorba/http-client/post/post3_binary_element 3391756)
IF(NOT ZORBA_NO_ICU)
- EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_err17 974477)
+ IF ( ICU_VERSION VERSION_LESS 4.0.0 )
+ EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_err17 974477)
+ ENDIF ( ICU_VERSION VERSION_LESS 4.0.0 )
EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_m11 866874)
EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_m40 866874)
EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_m41 866874)
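The `regex_err17` expectation above now depends on `VERSION_LESS`. Per the CMake documentation, that operator compares dotted versions component by component as integers and treats omitted components as zero. The Python sketch below models that rule. The helper `version_less` is hypothetical and for illustration only, and it assumes purely numeric components:

```python
from itertools import zip_longest


def version_less(a: str, b: str) -> bool:
    """Return True if dotted version `a` sorts before `b`.

    Mirrors CMake's documented VERSION_LESS rule: components are compared
    left to right as integers, and missing components count as 0, so
    "4" and "4.0.0" are equal. Assumes purely numeric components
    (no suffixes such as "rc1").
    """
    pa = [int(x) for x in a.split(".")]
    pb = [int(x) for x in b.split(".")]
    for x, y in zip_longest(pa, pb, fillvalue=0):
        if x != y:
            return x < y
    return False  # equal versions are not "less"


for icu_version in ("3.8.1", "4.0", "4.8.1"):
    print(icu_version, version_less(icu_version, "4.0.0"))
```

A plain string comparison would get some of these cases wrong. For example, `"10.0" < "4.0.0"` is true lexically but false numerically, which is why a version-aware comparison is needed.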
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-current-lang-true-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-current-lang-true-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-current-lang-true-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,5 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "zu";
+
+ft:current-lang() eq "zu"
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-da-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-da-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-da-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("da") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-de-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-de-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-de-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("de") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-en-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-en-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-en-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("en") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("es") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-fi-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-fi-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-fi-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("fi") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-hu-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-hu-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-hu-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("hu") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-it-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-it-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-it-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("it") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-nl-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-nl-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-nl-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("nl") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-no-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-no-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-no-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("no") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-pt-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-pt-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-pt-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("pt") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-ru-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-ru-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-ru-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("ru") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-supported-false-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-supported-false-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-supported-false-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Valid, but unsupported ISO 639-1 code. :)
+ft:is-stem-lang-supported( xs:language("zu") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-supported-false-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-supported-false-2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-supported-false-2.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Invalid ISO 639-1 code. :)
+ft:is-stem-lang-supported( xs:language("XX") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-sv-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-sv-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-sv-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("sv") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-false-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-false-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-false-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word( "flavor", xs:language("en") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-da-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-da-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-da-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("da") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-de-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-de-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-de-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("de") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("en") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-es-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-es-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-es-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("es") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-fi-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-fi-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-fi-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("fi") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-fr-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-fr-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-fr-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("fr") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-hu-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-hu-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-hu-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("hu") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-it-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-it-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-it-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("it") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-nl-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-nl-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-nl-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("nl") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-no-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-no-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-no-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("no") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-pt-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-pt-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-pt-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("pt") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Valid, but unsupported ISO 639-1 code. :)
+ft:is-stop-word-lang-supported( xs:language("zu") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Invalid ISO 639-1 code. :)
+ft:is-stop-word-lang-supported( xs:language("XX") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-sv-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-sv-supported-true.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-sv-supported-true.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("sv") )
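Each `-lang-XX-supported-true` query is meant to test the language named in its filename. A copy-paste slip, such as an argument left as `"en"` after duplicating the English test, would still pass silently wherever English is supported. A small standalone check like the following could catch such slips. It is a hypothetical helper, not part of Zorba's rbkt harness, and it only inspects the first `xs:language(...)` call in each query:

```python
import re

# Language code embedded in names like
# "ft-module-is-stop-word-lang-da-supported-true.xq" -> "da".
FILENAME_LANG = re.compile(r"-lang-([a-z]{2})-supported-true\.xq$")
# Argument of the first xs:language("..") constructor in the query body.
QUERY_LANG = re.compile(r'xs:language\(\s*"([^"]*)"\s*\)')


def mismatches(files: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (filename, code in filename, code in query) for every
    "-lang-XX-supported-true" query whose xs:language() argument differs."""
    bad = []
    for name, body in files.items():
        from_name = FILENAME_LANG.search(name)
        from_query = QUERY_LANG.search(body)
        if from_name and from_query and from_name.group(1) != from_query.group(1):
            bad.append((name, from_name.group(1), from_query.group(1)))
    return bad


if __name__ == "__main__":
    # Hypothetical in-memory sample; a real check would read the .xq files
    # from test/rbkt/Queries/zorba/fulltext/.
    sample = {
        "ft-module-is-stop-word-lang-da-supported-true.xq":
            'ft:is-stop-word-lang-supported( xs:language("en") )',
        "ft-module-is-stop-word-lang-sv-supported-true.xq":
            'ft:is-stop-word-lang-supported( xs:language("sv") )',
    }
    for name, expected, actual in mismatches(sample):
        print(f"{name}: expected {expected!r}, found {actual!r}")
```

The filename pattern also matches the `ft-module-is-stem-lang-XX-supported-true.xq` queries, so the same check covers both families. The `-false-N` queries are deliberately skipped.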
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word( "the", xs:language("en") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-2.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,5 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "en";
+
+ft:is-stop-word( "the" )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-3.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-3.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-3.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word( "el", xs:language("es") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-4.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-4.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-4.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,5 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "es";
+
+ft:is-stop-word( "el" )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Valid, but unsupported ISO 639-1 code. :)
+ft:is-thesaurus-lang-supported( xs:language("zu") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-2.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Invalid ISO 639-1 code. :)
+ft:is-thesaurus-lang-supported( xs:language("XX") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-3.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-3.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-3.spec 2012-04-24 20:57:30 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FTST0018
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-3.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-3.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-3.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Thesaurus URI that is not statically known. :)
+ft:is-thesaurus-lang-supported( "http://www.example.com/", xs:language("en") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-thesaurus-lang-supported( xs:language("en") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.spec 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+Args:
+--thesaurus
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,6 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-thesaurus-lang-supported( "http://wordnet.princeton.edu",
+ xs:language("en") )
+
+(: vim:set et sw=2 ts=2: :)
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-stem-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-stem-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-stem-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:stem( "flavoring", xs:language("en") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-stem-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-stem-2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-stem-2.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:stem( "chico", xs:language("es") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-stem-3.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-stem-3.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-stem-3.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,5 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "en";
+
+ft:stem( "flavoring" )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-stem-4.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-stem-4.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-stem-4.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,5 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "es";
+
+ft:stem( "chico" )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-strip-diacritics-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-strip-diacritics-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-strip-diacritics-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:strip-diacritics( "é" )
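`ft:strip-diacritics` is presumably expected to map `"é"` to `"e"`. A common way to strip diacritics is to decompose the string to Unicode NFD and then drop the resulting nonspacing combining marks (general category `Mn`). The Python sketch below shows that technique. It is an assumption for illustration, not Zorba's actual implementation:

```python
import unicodedata


def strip_diacritics(s: str) -> str:
    # NFD turns a precomposed "é" (U+00E9) into "e" + U+0301 COMBINING
    # ACUTE ACCENT; dropping every nonspacing mark ("Mn") leaves the base letter.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


print(strip_diacritics("é"))  # e
```

Letters with no canonical decomposition, such as `ø` or `ł`, pass through unchanged under this approach. A production implementation may therefore need an extra mapping table.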
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,6 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "en";
+
+let $synonyms := ft:thesaurus-lookup( "marmite" )
+return $synonyms = "pot"
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.spec 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+Args:
+--thesaurus
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,6 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+let $synonyms := ft:thesaurus-lookup( "http://wordnet.princeton.edu",
+ "marmite",
+ xs:language("en") )
+return $synonyms = "pot"
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.spec 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+Args:
+--thesaurus
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,7 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "en";
+
+let $synonyms := ft:thesaurus-lookup( "http://wordnet.princeton.edu",
+ "marmite" )
+return $synonyms = "pot"
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.spec 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+Args:
+--thesaurus
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,7 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+let $synonyms := ft:thesaurus-lookup( "http://wordnet.princeton.edu",
+ "breakfast",
+ xs:language("en"),
+ "BT" )
+return $synonyms = "meal"
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.spec 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+Args:
+--thesaurus
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,8 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+let $synonyms := ft:thesaurus-lookup( "http://wordnet.princeton.edu",
+ "breakfast",
+ xs:language("en"),
+ "USE",
+ 2, 2 )
+return $synonyms = "nourishment"
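In the `ft:thesaurus-lookup` calls above, the relationship (`"BT"`, `"USE"`) selects which links to follow, and the trailing integers form the at-least/at-most level range. The `FOCA0003` test in this diff labels `-1, 2` an invalid range. A level-bounded breadth-first traversal over a toy graph illustrates the idea. `TOY_USE` and `lookup` are hypothetical, the graph is not WordNet data, and Python's `ValueError` stands in for whatever error Zorba actually raises:

```python
from collections import deque

# Hypothetical single-relationship thesaurus: term -> directly related terms.
TOY_USE = {
    "breakfast": ["meal"],
    "meal": ["nourishment"],
    "nourishment": ["food"],
}


def lookup(graph, term, at_least=1, at_most=None):
    """Return terms reachable from `term` in at_least..at_most hops.

    The query term itself (0 hops) is never returned. at_most=None
    means "no upper bound".
    """
    if at_least < 0 or (at_most is not None and at_most < at_least):
        raise ValueError(f"invalid level range: {at_least}..{at_most}")
    depth = {term: 0}          # hops from `term` to each visited node
    queue = deque([term])
    found = []
    while queue:
        current = queue.popleft()
        if at_most is not None and depth[current] >= at_most:
            continue           # expanding further would exceed at_most
        for related in graph.get(current, []):
            if related in depth:
                continue       # already reached by a path at least as short
            depth[related] = depth[current] + 1
            if depth[related] >= at_least:
                found.append(related)
            queue.append(related)
    return found


print(lookup(TOY_USE, "breakfast", 2, 2))  # ['nourishment']
```

Breadth-first order guarantees that each term is first reached by a shortest path, so its recorded level is its minimum distance from the query term.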
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,18 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+let $doc := <msg>hello, world</msg>
+let $tokens := ft:tokenize( $doc, xs:language("en") )
+let $t1 := $tokens[1]
+let $t2 := $tokens[2]
+
+return $t1/@value = "hello"
+ and $t1/@lang = "en"
+ and $t1/@paragraph = 1
+ and $t1/@sentence = 1
+
+ and $t2/@value = "world"
+ and $t2/@lang = "en"
+ and $t2/@paragraph = 1
+ and $t2/@sentence = 1
+
+(: vim:set et sw=2 ts=2: :)
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-2.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,18 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+let $doc := <msg xml:lang="es">hola, mundo</msg>
+let $tokens := ft:tokenize( $doc )
+let $t1 := $tokens[1]
+let $t2 := $tokens[2]
+
+return $t1/@value = "hola"
+ and $t1/@lang = "es"
+ and $t1/@paragraph = 1
+ and $t1/@sentence = 1
+
+ and $t2/@value = "mundo"
+ and $t2/@lang = "es"
+ and $t2/@paragraph = 1
+ and $t2/@sentence = 1
+
+(: vim:set et sw=2 ts=2: :)
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-3.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-3.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-3.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,10 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+import module namespace ref = "http://www.zorba-xquery.com/modules/node-reference";
+
+let $x := <p xml:lang="en">Houston, we have a <em>problem</em>!</p>
+let $tokens := ft:tokenize( $x )
+let $node-ref := $tokens[5]/@node-ref
+let $node := ref:node-by-reference( $node-ref )
+return $node instance of text()
+
+(: vim:set et sw=2 ts=2: :)
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-4.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-4.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-4.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,10 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+import module namespace ref = "http://www.zorba-xquery.com/modules/node-reference";
+
+let $x := <msg xml:lang="en" content="Houston, we have a problem!"/>
+let $tokens := ft:tokenize( $x/@content )
+let $node-ref := $tokens[5]/@node-ref
+let $node := ref:node-by-reference( $node-ref )
+return $node instance of attribute(content)
+
+(: vim:set et sw=2 ts=2: :)
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,8 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+let $x := "hello, world"
+let $tokens := ft:tokenize-string( $x, xs:language("en") )
+return $tokens[1] = "hello"
+ and $tokens[2] = "world"
+
+(: vim:set et sw=2 ts=2: :)
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-2.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,10 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "en";
+
+let $x := "hello, world"
+let $tokens := ft:tokenize-string( $x )
+return $tokens[1] = "hello"
+ and $tokens[2] = "world"
+
+(: vim:set et sw=2 ts=2: :)
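The `ft:tokenize` tests above check that each token carries a `value`, `lang`, `paragraph`, and `sentence`. The Python sketch below is a toy tokenizer that produces records of the same shape. The paragraph rule (blank lines), the sentence rule (`.`, `!`, or `?` followed by whitespace), the document-wide sentence numbering, and the `Token` type are all assumptions for illustration, not Zorba's tokenizer:

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    value: str
    lang: str
    paragraph: int  # 1-based paragraph number
    sentence: int   # 1-based, counted across the whole text (assumption)


def tokenize(text: str, lang: str) -> list[Token]:
    tokens = []
    sentence_no = 0
    # Paragraphs are separated by one or more blank lines.
    for para_no, para in enumerate(re.split(r"\n\s*\n", text.strip()), start=1):
        # A sentence ends at ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", para.strip()):
            words = re.findall(r"\w+", sentence)  # punctuation is dropped
            if not words:
                continue
            sentence_no += 1
            tokens.extend(Token(w, lang, para_no, sentence_no) for w in words)
    return tokens


for t in tokenize("hello, world", "en"):
    print(t)
```

Splitting on `\w+` only works for scripts that separate words with spaces. Text such as Chinese needs a dedicated word segmenter instead.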
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:tokenizer-properties( xs:language("en") )
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-2.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-2.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,5 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "en";
+
+ft:tokenizer-properties()
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-thesaurus-FOCA0003-1.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-thesaurus-FOCA0003-1.spec 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-thesaurus-FOCA0003-1.spec 2012-04-24 20:57:30 +0000
@@ -0,0 +1,4 @@
+Error: http://www.w3.org/2005/xqt-errors:FOCA0003
+Args:
+--thesaurus
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu
=== added file 'test/rbkt/Queries/zorba/fulltext/ft-thesaurus-FOCA0003-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-thesaurus-FOCA0003-1.xq 1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-thesaurus-FOCA0003-1.xq 2012-04-24 20:57:30 +0000
@@ -0,0 +1,10 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Invalid at least/most range. :)
+ft:thesaurus-lookup( "http://wordnet.princeton.edu",
+ "affluent",
+ xs:language("en"),
+ "use",
+ -1, 2 )
+
+(: vim:set et sw=2 ts=2: :)
=== removed file 'test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-1.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-1.spec 2012-04-24 12:39:38 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-1.spec 1970-01-01 00:00:00 +0000
@@ -1,3 +0,0 @@
-Args:
---thesaurus
-##default:=$RBKT_BINARY_DIR/thesauri/wordnet-en.zth
=== removed file 'test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-2.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-2.spec 2012-04-24 12:39:38 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-2.spec 1970-01-01 00:00:00 +0000
@@ -1,3 +0,0 @@
-Args:
---thesaurus
-##default:=$RBKT_BINARY_DIR/thesauri/wordnet-en.zth
=== modified file 'test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-3.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-3.spec 2012-04-24 12:39:38 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-3.spec 2012-04-24 20:57:30 +0000
@@ -1,3 +1,3 @@
Args:
--thesaurus
-http://wordnet.princeton.edu:=$RBKT_BINARY_DIR/thesauri/wordnet-en.zth
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu
=== modified file 'test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-4.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-4.spec 2012-04-24 12:39:38 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-4.spec 2012-04-24 20:57:30 +0000
@@ -1,3 +1,3 @@
Args:
--thesaurus
-http://wordnet.princeton.edu:=$RBKT_BINARY_DIR/thesauri/wordnet-en.zth
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu
=== modified file 'test/rbkt/Scripts/w3c/import_w3c_full_text_testsuite.sh'
--- test/rbkt/Scripts/w3c/import_w3c_full_text_testsuite.sh 2012-04-24 12:39:38 +0000
+++ test/rbkt/Scripts/w3c/import_w3c_full_text_testsuite.sh 2012-04-24 20:57:30 +0000
@@ -148,7 +148,7 @@
# does not understand $RBKT_SRC_DIR. Should change specification.h to
# do that replacement universally and eliminate the numerous other places
# that do it.
- $thesauri {$id} = "$uri:=xqftts|$test_src_path/$path";
+ $thesauri {$id} = "$uri:=xqftts://$test_src_path/$path";
next;
}
if (m/^%stop /) {
=== modified file 'test/rbkt/testdriver.cpp'
--- test/rbkt/testdriver.cpp 2012-04-24 12:39:38 +0000
+++ test/rbkt/testdriver.cpp 2012-04-24 20:57:30 +0000
@@ -26,7 +26,7 @@
#include <time.h>
#endif
-//#define ZORBA_TEST_PLAN_SERIALIZATION
+/*#define ZORBA_TEST_PLAN_SERIALIZATION /**/
#include "testdriverconfig.h" // SRC and BIN dir definitions
#include "specification.h" // parsing spec files