zorba-coders team mailing list archive

[Merge] lp:~zorba-coders/zorba/feature-ft_module into lp:zorba

To: mp+103386@xxxxxxxxxxxxxxxxxx
From: "Paul J. Lucas" <paul@xxxxxxxxxxxxx>
Date: Tue, 24 Apr 2012 22:20:26 -0000
Reply-to: mp+103386@xxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

Paul J. Lucas has proposed merging lp:~zorba-coders/zorba/feature-ft_module into lp:zorba.

Requested reviews:
  Markos Zaharioudakis (markos-za)
  Matthias Brantner (matthias-brantner)
Related bugs:
  Bug #944795 in Zorba: "XQDoc doesn't handle & in URLs"
  https://bugs.launchpad.net/zorba/+bug/944795

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/feature-ft_module/+merge/103386

1. Added a new full-text module.
2. Fixed semi-broken Thesaurus API.
3. Now supporting many more languages for tokenization including Chinese.
4. Many other full-text improvements.
-- 
https://code.launchpad.net/~zorba-coders/zorba/feature-ft_module/+merge/103386
Your team Zorba Coders is subscribed to branch lp:zorba.

=== modified file 'ChangeLog'
--- ChangeLog	2012-04-24 12:39:38 +0000
+++ ChangeLog	2012-04-24 22:19:24 +0000
@@ -10,6 +10,7 @@
   * fn:unparsed-text-available
   * Extended API for Python, Java, PHP and Ruby.
   * Add jvm classpath to zorbacmd and to Zorba API. Tracked by #931816
+  * Added full-text module.
   * Added support for NO_ICU (to not use ICU for unicode processing)
   * Added XQJ support.
 
@@ -88,6 +89,8 @@
   * Fixed bug 867509 (Can not handle largest xs:unsignedLong values)
   * Fixed bug 924063 (sentence is incorrectly incremented when token characters end without sentence terminator)
   * Fixed bug 909126 (bug in cloning of var_expr)
+  * Fixed bug 928631 (external builtin functions were not executed in the module
+    in which they were declared)
   * Fixed bug in destruction of exit_catcher_expr
   * Fixed bug #867024 (error messages)
   * Fixed bug #957580 (stream read failure in StringToCodepointsIteartor)

=== modified file 'cmake_modules/FindICU.cmake'
--- cmake_modules/FindICU.cmake	2012-04-24 14:35:54 +0000
+++ cmake_modules/FindICU.cmake	2012-04-24 22:19:24 +0000
@@ -28,6 +28,8 @@
 #                       (note: in addition to ICU_LIBRARIES)
 #  ICU_DATA_LIBRARIES - Libraries to link against for ICU data
 #
+#  ICU_VERSION        - ICU's version number.
+#
 
 # Look for the header file.
 find_path(

=== modified file 'doc/zorba/ft_intro.dox'
--- doc/zorba/ft_intro.dox	2012-04-24 12:39:38 +0000
+++ doc/zorba/ft_intro.dox	2012-04-24 22:19:24 +0000
@@ -5,9 +5,9 @@
 specification.
 Additional documentation:
 
-  - \ref ft_stemmer
-  - \ref ft_thesaurus
-  - \ref ft_tokenizer
+- \ref ft_stemmer
+- \ref ft_thesaurus
+- \ref ft_tokenizer
 
 \section ft_unimplemented Unimplemented Features
 
@@ -16,11 +16,11 @@
 implemented.
 The features that are not (completely) implemented are:
 
-  - The <a href="http://www.w3.org/TR/xpath-full-text-10/#ftignoreoption">Ignore Option</a>
-    (bug <a href="https://bugs.launchpad.net/zorba/+bug/sf-3187470">3187470</a>).
-  - <a href="http://www.w3.org/TR/xpath-full-text-10/#section-score-variables">Score Variables</a>
-    and <a href="http://www.w3.org/TR/xpath-full-text-10/#section-using-weights">Using Weights Within a Scored FTContainsExpr</a>
-    (bug <a href="https://bugs.launchpad.net/zorba/+bug/sf-3187462">3187462</a>).
+- The <a href="http://www.w3.org/TR/xpath-full-text-10/#ftignoreoption">Ignore Option</a>
+  (bug <a href="https://bugs.launchpad.net/zorba/+bug/866924">866924</a>).
+- <a href="http://www.w3.org/TR/xpath-full-text-10/#section-score-variables">Score Variables</a>
+  and <a href="http://www.w3.org/TR/xpath-full-text-10/#section-using-weights">Using Weights Within a Scored FTContainsExpr</a>
+  (bug <a href="https://bugs.launchpad.net/zorba/+bug/866923">866923</a>).
 
 */
 /* vim:set et sw=2 ts=2: */

=== modified file 'doc/zorba/ft_stemmer.dox'
--- doc/zorba/ft_stemmer.dox	2012-04-24 12:39:38 +0000
+++ doc/zorba/ft_stemmer.dox	2012-04-24 22:19:24 +0000
@@ -56,7 +56,12 @@
 public:
   typedef /* implementation-defined */ ptr;
 
+  struct Properties {
+    char const *uri;
+  };
+
   virtual void destroy() const = 0;
+  virtual void properties( Properties *result ) const = 0;
   virtual void stem( String const &word, locale::iso639_1::type lang, String *result ) const = 0;
 protected:
   virtual ~Stemmer();
@@ -89,6 +94,8 @@
 Note that \c result should always be set to something.
 If your stemmer doesn't know how to stem the given word,
 you should set \c result to \c word.
+You also need to implement the \c properties() function
+and set the identifying URI of your stemmer.
 
 A very simple stemmer
 that stems the word "foobar" to "foo"
@@ -98,6 +105,7 @@
 class MyStemmer : public Stemmer {
 public:
   void destroy() const;
+  void properties( Properties *result ) const;
   void stem( String const &word, locale::iso639_1::type lang, String *result ) const;
 private:
   MyStemmer();
@@ -108,6 +116,10 @@
   // Do nothing since we statically allocate a singleton instance of our stemmer.
 }
 
+void MyStemmer::properties( Properties *props ) const {
+  props->uri = "http://my.example.com/zorba/full-text/stemmer";
+}
+
 void MyStemmer::stem( String const &word, locale::iso639_1::type lang, String *result ) const {
   if ( word == "foobar" )
     *result = "foo";
@@ -120,7 +132,6 @@
 or a dictionary look-up
 to stem many words,
 of course.
-
 Although not used in this simple example,
 \c lang can be used to allow a single stemmer instance
 to stem words in more than one language.
@@ -135,16 +146,24 @@
 class StemmerProvider {
 public:
   virtual ~StemmerProvider();
-  virtual Stemmer::ptr getStemmer( locale::iso639_1::type lang ) const = 0;
+  virtual bool getStemmer( locale::iso639_1::type lang, Stemmer::ptr *s = 0 ) const = 0;
 };
 \endcode
 
+The \c getStemmer() function should return \c true
+only if it can provide a \c Stemmer
+for the given language; \c false otherwise.
+If the \c Stemmer::ptr argument is \c null,
+the caller wants to check only whether the provider
+can provide a stemmer for the given language
+and doesn't want a \c Stemmer instance created or returned.
+
 A simple \c StemmerProvider for our simple stemmer can be implemented as:
 
 \code
 class MyStemmerProvider : public StemmerProvider {
 public:
-  Stemmer::ptr getStemmer( locale::iso639_1::type lang ) const;
+  bool getStemmer( locale::iso639_1::type lang, Stemmer::ptr *s = 0 ) const;
 };
 
 Stemmer::ptr MyStemmerProvider::getStemmer( locale::iso639_1::type lang ) const {
@@ -154,15 +173,14 @@
     case iso639_1::en:
     case iso639_1::unknown: // Handle "unknown" language since, in many cases, the language is not known.
       result.reset( &stemmer );
-      break;
+      return true;
     default: 
       //
-      // We have no stemmer for the given language: leave the result as null to indicate this.
+      // We have no stemmer for the given language: return false.
       // Zorba will then use the built-in stemmer for the given language.
       //
-      break;
+      return false;
   }
-  resturn std::move( result );
 }
 \endcode
 

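The revised getStemmer() contract documented above (return a bool, and treat a
null pointer argument as a "probe only" call) can be sketched in isolation.
This is a minimal sketch under stated assumptions: Lang, Stemmer, StemmerPtr,
FooStemmerProvider, and stem_or_keep are stand-ins invented for illustration
and are not Zorba's types. In particular, Zorba's Stemmer::ptr is
implementation-defined, whereas a raw pointer is used here.

```cpp
#include <string>

// Hypothetical stand-in for locale::iso639_1::type; not Zorba's real enum.
enum Lang { unknown, en, fr };

// Hypothetical stand-in for Zorba's Stemmer class.
struct Stemmer {
  // Stems "foobar" to "foo"; any other word is returned unchanged.
  void stem( std::string const &word, std::string *result ) const {
    *result = ( word == "foobar" ) ? "foo" : word;
  }
};

// Assumption: a raw pointer replaces the implementation-defined Stemmer::ptr.
typedef Stemmer const *StemmerPtr;

struct FooStemmerProvider {
  // Returns true only if a stemmer exists for lang.
  // If s is null, the caller is only probing and no instance is handed out.
  bool getStemmer( Lang lang, StemmerPtr *s = 0 ) const {
    static Stemmer stemmer;         // statically allocated singleton
    switch ( lang ) {
      case en:
      case unknown:                 // the language is often not known
        if ( s )
          *s = &stemmer;
        return true;
      default:
        return false;               // the caller falls back to another stemmer
    }
  }
};

// Stems word for lang, keeping the word unchanged when no stemmer is available.
std::string stem_or_keep( FooStemmerProvider const &provider, Lang lang,
                          std::string const &word ) {
  StemmerPtr s = 0;
  if ( !provider.getStemmer( lang, &s ) )
    return word;
  std::string result;
  s->stem( word, &result );
  return result;
}
```

Calling getStemmer( lang ) with no second argument answers only the question
"is this language supported?", which is the case the null-pointer rule exists for.
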
=== modified file 'doc/zorba/ft_thesaurus.dox'
--- doc/zorba/ft_thesaurus.dox	2012-04-24 12:39:38 +0000
+++ doc/zorba/ft_thesaurus.dox	2012-04-24 22:19:24 +0000
@@ -44,16 +44,16 @@
 To download and install the WordNet database on a Unix-like system,
 follow these steps:
 
-  -# Download the WordNet database from
-     <a href="http://wordnet.princeton.edu/wordnet/download/">here</a>.
-     All you really need are just the database files
-     (<code>WNdb-3.0.tar.gz</code>).
-  -# Un-gzip and untar the files.
-     This will result in a directory dict containing the database files.
-  -# Move the dict directory somewhere of your choosing,
-     e.g., <code>/usr/local/wordnet-3.0/dict</code>.
-  -# Compile the \c dict directory into a Zorba-compatible binary thesaurus
-     as described below.
+-# Download the WordNet database from
+   <a href="http://wordnet.princeton.edu/wordnet/download/">here</a>.
+   All you really need are just the database files
+   (<code>WNdb-3.0.tar.gz</code>).
+-# Un-gzip and untar the files.
+   This will result in a directory dict containing the database files.
+-# Move the dict directory somewhere of your choosing,
+   e.g., <code>/usr/local/wordnet-3.0/dict</code>.
+-# Compile the \c dict directory into a Zorba-compatible binary thesaurus
+   as described below.
 
 To compile the WordNet database files,
 use the \c zt-wn-compile script
@@ -65,12 +65,12 @@
 zt-wn-compile [-v] wordnet_dict_dir [thesaurus_file]
 \endcode
 
-  - The \c -v option specifies verbose output.
-  - The \e wordnet_dict_dir specifies the full path
-    of the WordNet \c dict directory.
-  - The \e thesaurus_file specifies the name of the resulting binary file.
-    If none is given, it defaults to \c wordnet-en.zth
-    ("en" for English and "zth" for "Zorba Thesaurus file").
+- The \c -v option specifies verbose output.
+- The \e wordnet_dict_dir specifies the full path
+  of the WordNet \c dict directory.
+- The \e thesaurus_file specifies the name of the resulting binary file.
+  If none is given, it defaults to \c wordnet-en.zth
+  ("en" for English and "zth" for "Zorba Thesaurus file").
 
 For example:
 
@@ -78,33 +78,39 @@
 zt-wn-compile -v /usr/local/wordnet-3.0/dict
 \endcode
 
-Move the \c wordnet-en.zth file to a location of your choosing.
+To install the \c wordnet-en.zth file,
+move it onto Zorba's <i>library path</i>:
+
+\code
+LIB_PATH/edu/princeton/wordnet/wordnet-en.zth
+\endcode
 
 \subsection ft_thesaurus_precompiled Downloading a Precompiled WordNet Database
 
 Alternatively,
-you can download a precompiled WordNet database from
+you can download a precompiled WordNet database for little-endian (Intel) CPUs from
 <a href="http://www.zorba-xquery.com/downloads/WordNet-3.0/wordnet-en.zip">here</a>.
 
 \section ft_thesaurus_mappings Thesauri Mappings
 
 In order to use thesauri,
-you need to specify where they are to the Zorba engine
-via one or more thesaurus <i>mappings</i>.
-A <i>mapping</i> maps a symbolic URI to URI for an actual thesaurus.
+you need to specify what symbolic URI(s) <i>map</i>
+to what thesauri.
 A mapping is of the form:
 
-<i>from_uri</i><code>:=</code><b>[</b><i>implementation</i><code>|</code><b>]</b><i>to_uri</i>
+<i>from_uri</i><code>:=</code><i>implementation-scheme</i><code>:</code><i>to_uri</i>
 
 For example:
 
 \code
-http://wordnet.princeton.edu:=wordnet|/usr/local/zorba/thesauri/wordnet-en.zth
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu
 \endcode
 
 says that the symbolic URI \c http://wordnet.princeton.edu
 maps to the WordNet implementation
-having a database file at the given path.
+having a database file at the given sub-path
+\c edu/princeton/wordnet
+on Zorba's library path.
 Once a mapping is established for a symbolic URI,
 it can be used in a query:
 
@@ -114,13 +120,8 @@
   using thesaurus at "http://wordnet.princeton.edu"
 \endcode
 
-If the \e implementation is omitted,
-it defaults to \c wordnet.
 As a special-case,
-the \e from_uri can be \c default or 
-\code 
-##default
-\endcode
+the \e from_uri can be \c default or \c ##default
 to allow for specifying the default thesaurus
 as was done for the first example on this page.
 
@@ -130,7 +131,7 @@
 use one or more –thesaurus options:
 
 \code
-zorba --thesaurus default:=/usr/local/zorba/thesauri/wordnet-en.zth ...
+zorba --thesaurus default:=wordnet://wordnet.princeton.edu ...
 \endcode
 
 \section ft_thesaurus_rels Thesaurus Relationships
@@ -423,25 +424,26 @@
 
 If no levels are specified in a query,
 Zorba defaults the WordNet implementation to be 2 levels.
-The rationale can be found
-<a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=11444">here</a>.
+(The rationale can be found
+<a href="http://www.w3.org/Bugs/Public/show_bug.cgi?id=11444">here</a>.)
 
 \section ft_thesaurus_providing Providing Your Own Thesaurus
 
 Using the Zorba C++ API,
 you can provide your own thesaurus
-by deriving from three classes:
+by deriving from four classes:
 \c Thesaurus,
 \c Thesaurus::iterator,
+\c ThesaurusProvider,
 and
-\c ThesaurusProvider.
+\c URLResolver.
 
 \subsection ft_class_thesaurus The Thesaurus Class
 
 The \c Thesaurus class is:
 
 \code
-class Thesaurus : public Resource {
+class Thesaurus {
 public:
   typedef /* implementation-defined */ ptr;
   typedef /* implementation-defined */ range_type;
@@ -457,15 +459,15 @@
 
   virtual iterator::ptr lookup( String const &phrase, String const &relationship, range_type at_least, range_type at_most ) const = 0;
 
-  virtual void destroy() const = 0;     // interited from Resource
+  virtual void destroy() const = 0;
 protected:
   virtual ~Thesaurus();
 };
 \endcode
 
-For details about the \c ptr type,
-the \c destroy() function,
-and why the destructor is \c protected,
+For details about the \c ptr types,
+the \c destroy() functions,
+and why the destructors are \c protected,
 see the \ref memory_management document.
 
 To implement the \c Thesaurus
@@ -482,18 +484,19 @@
   </tr>
   <tr>
     <td>\c at_least</td>
-    <td>The The minimum number of levels within the thesaurus to be traversed.</td>
+    <td>The minimum number of levels within the thesaurus to be traversed.</td>
   </tr>
   <tr>
     <td>\c at_most</td>
-    <td>The The maximum number of levels within the thesaurus to be traversed.</td>
+    <td>The maximum number of levels within the thesaurus to be traversed.</td>
   </tr>
 </table>
 
 The \c lookup() function returns a pointer to an \c iterator
 that is used to iterate over the phrase's synonyms.
-
-A very simple thesaurus
+You also need to implement an \c iterator.
+A very simple \c Thesaurus
+and its \c iterator
 can be implemented as:
 
 \code
@@ -505,53 +508,49 @@
   //
   // Define a simple thesaurus data structure as a map from a phrase to a list of its synonyms.
   //
-  typedef std::list<String> synonyms_t;
-  typedef std::map<String,synonyms_t const*> thesaurus_t;
+  typedef std::list<String> synonyms_type;
+  typedef std::map<String,synonyms_type const*> thesaurus_data_type;
 
-  static thesaurus_t const& get_thesaurus();
+  static thesaurus_data_type const& get_thesaurus_data();
 
   class iterator : public Thesaurus::iterator {
   public:
-    iterator( synonyms_t const &s ) : synonyms_( s ), i_( s.begin() ) { }
+    iterator( synonyms_type const &s ) : synonyms_( s ), i_( s.begin() ) { }
     void destroy();
     bool next( String *synonym );
   private:
-    synonyms_t const &synonyms_;      // synonyms to iterate over
-    synonyms_t::const_iterator i_;    // current iterator position
+    synonyms_type const &synonyms_;     // synonyms to iterate over
+    synonyms_type::const_iterator i_;   // current iterator position
   };
 };
 
 void MyThesaurus::destroy() const {
-  // Do nothing since we statically allocate a singleton instance of our thesaurus.
+  // Do nothing since we statically allocate a singleton instance of our Thesaurus.
 }
 
-MyThesaurus::thesaurus_t const& MyThesaurus::get_thesaurus() {
-  static thesaurus_t thesaurus;
-  if ( thesaurus.empty() ) {
-    //
-    // Construct a thesaurus "by hand" for this example.  A real thesaurus would probably
-    // be read from disk.
-    //
+MyThesaurus::thesaurus_data_type const& MyThesaurus::get_thesaurus_data() {
+  static thesaurus_data_type thesaurus_data;
+  if ( thesaurus_data.empty() ) {
+    //
+    // Construct thesaurus data "by hand" for this example.  A real thesaurus would probably be read from disk.
     // Note that every list of synonyms must always include the original phrase.
     //
-    static synonyms_t synonyms;
+    static synonyms_type synonyms;
     synonyms.push_back( "foo" );
     synonyms.push_back( "foobar" );
-    thesaurus[ "foo"    ] = &synonyms;
-    thesaurus[ "foobar" ] = &synonyms;
+    thesaurus_data[ "foo"    ] = &synonyms;
+    thesaurus_data[ "foobar" ] = &synonyms;
   }
-  return thesaurus;
+  return thesaurus_data;
 }
-\endcode
 
-\code
 MyThesaurus::iterator::ptr MyThesaurus::lookup( String const &phrase, String const &relationship,
                                                 range_type at_least, range_type at_most ) const {
-  static thesaurus_t const &thesaurus = get_thesaurus();
-  thesaurus_t::const_iterator const i = thesaurus.find( phrase );
+  static thesaurus_data_type const &thesaurus_data = get_thesaurus_data();
+  thesaurus_data_type::const_iterator const entry = thesaurus_data.find( phrase );
   iterator::ptr result;
-  if ( i != thesaurus.end() )
-    result.reset( new iterator( *i->second ) );
+  if ( entry != thesaurus_data.end() )
+    result.reset( new iterator( *entry->second ) );
   return std::move( result );
 }
 
@@ -572,13 +571,71 @@
 A real thesaurus would load a large number of synonyms,
 of course.
 
+\subsection ft_class_thesaurus_provider The ThesaurusProvider Class
+
+The \c ThesaurusProvider class is:
+
+\code
+class ThesaurusProvider : public Resource {
+public:
+  typedef /* implementation-defined */ ptr;
+
+  virtual bool getThesaurus( locale::iso639_1::type lang, Thesaurus::ptr *thesaurus = 0 ) const = 0;
+  void destroy() const;                 // inherited from Resource
+};
+\endcode
+
+To implement a \c ThesaurusProvider,
+you need to implement the \c getThesaurus() function where:
+
+<table>
+  <tr>
+    <td>\c lang</td>
+    <td>The desired language of the thesaurus.</td>
+  </tr>
+  <tr>
+    <td>\c thesaurus</td>
+    <td>If not \c null, set to point to a thesaurus for \c lang.</td>
+  </tr>
+</table>
+
+The \c getThesaurus() function returns \c true
+only if it can provide a thesaurus for the given language.
+Continuing with the example,
+a very simple \c ThesaurusProvider
+can be implemented as:
+
+\code
+class MyThesaurusProvider : public ThesaurusProvider {
+public:
+  void destroy() const;
+  bool getThesaurus( iso639_1::type lang, Thesaurus::ptr* = 0 ) const;
+};
+
+void MyThesaurusProvider::destroy() const {
+  // Do nothing since we statically allocate a singleton instance of our ThesaurusProvider.
+}
+
+bool MyThesaurusProvider::getThesaurus( iso639_1::type lang, Thesaurus::ptr *result ) const {
+  //
+  // Since our tiny thesaurus contains only universally known words, we don't bother checking lang
+  // and always return true.
+  //
+  static MyThesaurus thesaurus;
+  if ( result )
+    result->reset( &thesaurus );
+  return true;
+}
+\endcode
+
 \subsection ft_class_thesaurus_resolver A Thesaurus URL Resolver Class
 
-In addition to a \c Thesaurus,
+In addition to a \c Thesaurus
+and \c ThesaurusProvider,
 you must also implement a "thesaurus resolver" class
 that,
-given a URL and a language,
-provides a \c Thesaurus for that language.
+given a URI,
+provides a \c ThesaurusProvider for that URI.
 A simple \c ThesaurusURLResolver
 for our simple thesaurus can be implemented as:
 
@@ -591,23 +648,12 @@
   String const url_;
 };
 
-Resource*
-ThesaurusURLResolver::resolveURL( String const &url, EntityData const *data ) const {
-  ThesaurusEntityData const *const t_data = dynamic_cast<ThesaurusEntityData const*>( data );
-  assert( t_data );
-  static MyThesaurus thesaurus;
-  if ( url == url_ )
-    switch ( t_data->getLanguage() ) {
-      case locale::iso639_1::en:
-      case locale::iso639_1::unknown:
-        //
-        // Here, we could test to ensure that the language of our thesaurus matches the
-        // language sought, but in our case, we want our thesaurus to be used for all
-        // languages since "foo" and "foobar" are universal.
-        //
-      default:
-        return &thesaurus;
-    }
+Resource* ThesaurusURLResolver::resolveURL( String const &url, EntityData const *data ) const {
+  if ( data->getKind() == EntityData::THESAURUS ) {
+    static MyThesaurusProvider provider;
+    if ( url == url_ )
+      return &provider;
+  }
   return 0;
 }
 \endcode

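The new mapping syntax documented above, from_uri:=implementation-scheme:to_uri,
can be illustrated with a small parser. This is only a sketch, not Zorba's
parsing code: ThesaurusMapping, parse_mapping, and host_to_subpath are
hypothetical names. The rule that the library sub-path is the host name with
its labels reversed is an assumption inferred from the single documented
example (wordnet.princeton.edu giving edu/princeton/wordnet).

```cpp
#include <string>

// Parsed form of "from_uri:=implementation-scheme:to_uri" (hypothetical type).
struct ThesaurusMapping {
  std::string from_uri;
  std::string scheme;
  std::string to_uri;
};

// Splits a mapping at the first ":=" and then at the first ':' of the
// right-hand side.  Returns false if a separator is missing or a part is empty.
bool parse_mapping( std::string const &s, ThesaurusMapping *m ) {
  std::string::size_type const sep = s.find( ":=" );
  if ( sep == std::string::npos || sep == 0 )
    return false;
  std::string const rhs = s.substr( sep + 2 );
  std::string::size_type const colon = rhs.find( ':' );
  if ( colon == std::string::npos || colon == 0 || colon + 1 == rhs.size() )
    return false;
  m->from_uri = s.substr( 0, sep );
  m->scheme = rhs.substr( 0, colon );
  m->to_uri = rhs.substr( colon + 1 );
  return true;
}

// Assumption: the sub-path is the host name with its dot-separated labels
// reversed and joined by '/'.
std::string host_to_subpath( std::string const &host ) {
  std::string path;
  std::string::size_type end = host.size();
  while ( end > 0 ) {
    std::string::size_type const dot = host.rfind( '.', end - 1 );
    std::string::size_type const begin = dot == std::string::npos ? 0 : dot + 1;
    if ( !path.empty() )
      path += '/';
    path += host.substr( begin, end - begin );
    if ( dot == std::string::npos )
      break;
    end = dot;
  }
  return path;
}
```

Splitting at ":=" first matters because from_uri may itself contain ':' (as in
"http://..."), while the special from_uri value "default" contains none.
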
=== modified file 'doc/zorba/ft_tokenizer.dox'
--- doc/zorba/ft_tokenizer.dox	2012-04-24 12:39:38 +0000
+++ doc/zorba/ft_tokenizer.dox	2012-04-24 22:19:24 +0000
@@ -5,14 +5,25 @@
 The Zorba XQuery processor implements the
 <a href="http://www.w3.org/TR/xpath-full-text-10/">XQuery and XPath Full Text 1.0</a>
 specification that, among other things,
-tokenizes a string into a sequence of tokens.
-See
-<a href="http://www.w3.org/TR/xpath-full-text-10/#TokenizationSec">Tokenization</a>.
-
-The initial implementation of the toknenizer
-uses the one provided by the
-<a href="http://site.icu-project.org/">ICU library</a>.
-However, you can provide your own tokenizer instead.
+<a href="http://www.w3.org/TR/xpath-full-text-10/#TokenizationSec">tokenizes</a>
+a string into a sequence of tokens.
+
+\section ft_tokenizer_tokization Tokenization
+
+Using the
+<a href="http://site.icu-project.org/">ICU library</a>,
+Zorba's implementation of tokenization
+considers only alpha-numeric sequences of characters to be part of a token;
+whitespace and punctuation characters are not part of a token
+and instead separate tokens.
+However, a '.' or ',' between two digits, i.e., matching the regular expression
+<code>[0-9][.,][0-9]</code>,
+is retained as part of a token, e.g.,
+"98.6" and "1,432.58" are each a single token.
+
+Alternatively,
+you can implement your own tokenizer
+by deriving from the \c Tokenizer class.
 
 \section ft_class_tokenizer The Tokenizer Class
 
@@ -36,33 +47,43 @@
 
   class Callback {
   public:
-    typedef Tokenizer::size_type size_type;;
+    typedef Tokenizer::size_type size_type;
 
     virtual ~Callback();
 
-    virtual void operator()( char const *utf8_s, size_type utf8_len,
-                             size_type token_no, size_type sent_no, size_type para_no,
-                             void *payload = 0 ) = 0;
-  };
-
-  enum ElementTraceOptions {
-    trace_none  = 0x0,  // Trace no elements.
-    trace_begin = 0x1,  // Trace the beginning of elements.
-    trace_end   = 0x2   // Trace the ending of elements.
-  };
+    virtual void token( char const *utf8_s, size_type utf8_len, locale::iso639_1::type lang,
+                        size_type token_no, size_type sent_no, size_type para_no,
+                        Item const *item = 0 ) = 0;
+  };
+
+  struct Properties {
+    typedef std::vector<locale::iso639_1::type> languages_type;
+
+    bool comments_separate_tokens;
+    bool elements_separate_tokens;
+    bool processing_instructions_separate_tokens;
+    languages_type languages;
+    char const *uri;
+  };
+
+  virtual void properties( Properties *result ) const = 0;
 
   virtual void destroy() const = 0;
-  virtual void element( Item const &qname, int trace_options );
   Numbers& numbers();
   Numbers const& numbers() const;
-  int trace_options() const;
-
-  virtual void tokenize( char const *utf8_s, size_type utf8_len, locale::iso639_1::type lang,
-                         bool wildcards, Callback &callback, void *payload = 0 ) = 0;
+
+  void tokenize_node( Item const &node, locale::iso639_1::type lang, Callback &callback );
+
+  virtual void tokenize_string( char const *utf8_s, size_type utf8_len, locale::iso639_1::type lang,
+                                bool wildcards, Callback &callback, Item const *item = 0 ) = 0;
 
 protected:
-  Tokenizer( Numbers&, int trace_options = trace_none );
+  Tokenizer( Numbers& );
   virtual ~Tokenizer();
+
+  bool find_lang_attribute( Item const&, locale::iso639_1::type *lang );
+  virtual void item( Item const&, bool entering );
+  virtual void tokenize_node_impl( Item const&, locale::iso639_1::type, Callback&, bool tokenize_acp );
 };
 \endcode
 
@@ -76,8 +97,8 @@
 It simply keeps track of the current
 token, sentence, and paragraph numbers.
 
-To implement the \c Tokenizer,
-you need to implement the \c %tokenize() function where:
+To implement a \c Tokenizer,
+you need to implement the \c %tokenize_string() function where:
 
 <table>
   <tr>
@@ -115,9 +136,13 @@
     </td>
   </tr>
   <tr>
-    <td>\c payload</td>
+    <td>\c item</td>
     <td>
-      Optional implementation-defined data.
+      The \c Item whence this token came.
+      If the token occurred within an element,
+      the \c Item is the text node.
+      If the token occurred within an attribute,
+      the \c Item is the attribute node.
     </td>
   </tr>
 </table>
@@ -127,21 +152,30 @@
 However,
 the things a tokenizer should take into consideration include:
 
-  - Detecting sentence termination ('.', '?', and '!' characters).
-  - Handling floating-point numbers with possible thousands separators
-    in US and European formats, e.g. "98.7", "98,7", "10,000", etc.
-  - Distinguishing '.' used as a sentence terminator
-    from '.' used as a decimal point.
-  - Handling apostrophies, e.g., "men's".
-  - Handling acronyms, e.g., "AT&T".
-
-\subsection ft_paragraphs Paragraphs
+- Detecting sentence termination ('.', '?', and '!' characters).
+- Handling floating-point numbers with possible thousands separators
+  in US and European formats, e.g. "98.7", "98,7", "10,000", etc.
+- Distinguishing '.' used as a sentence terminator
+  from '.' used as a decimal point.
+- Handling apostrophes, e.g., "men's".
+- Handling acronyms, e.g., "AT&T".
+
+The task of iterating over an XML element's child nodes
+is done by \c tokenize_node_impl().
+Its default implementation
+treats XML elements, comments, and processing instructions
+as token separators.
+(See \ref ft_tokenizer_properties.)
+If you want to change that,
+you need to override \c tokenize_node_impl().
+
+\subsection ft_tokenizer_paragraphs Paragraphs
 
 By default,
 Zorba increments the current paragraph number once
 for each XML element encountered.
 However,
-this doens't work well for mixed content.
+this doesn't work well for mixed content.
 For example, in the XHTML:
 \code
 <p>The <em>best</em> thing ever!</p>
@@ -150,31 +184,65 @@
 but Zorba will consider that 3 paragraphs by default.
 
 Your tokenizer can take control over when the paragraph number is incremented
-by passing the bitwise-or
-of the \c ElementTraceOptions values
-to the constructor
-and overriding the \c element() function.
-The \c element() function is passed the QName of the current XML element
-and (depending on the initial value passed to the constructor)
-one of \c trace_begin or \c trace_end.
-Note that this function is called
-only if the trace options value
-passed to the constructor
-was non-zero.
+by overriding the \c item() function.
+The \c item() function is passed the \c Item of the current XML element
+and whether the item is being entered or exited.
 
 For example,
-the \c element() function for tokenizing XHTML
+the \c item() function for tokenizing XHTML
 would be along the lines of:
 \code
-void MyTokenizer::element( Item const &qname, int trace_options ) {
-  if ( trace_options & trace_end )
-    return;
-  String const name( qname.getLocalName() );
-  if ( /* qname is an XHTML block-level element */ )
-    ++numbers().para;
+void MyTokenizer::item( Item const &item, bool entering ) {
+  if ( entering && item.isNode() && item.getNodeKind() == store::StoreConsts::elementNode ) {
+    Item qname;
+    item.getNodeName( qname );
+    if ( /* qname matches an XHTML block-level element's name */ ) ++numbers().para;
+  }
 }
 \endcode
 
+\subsection ft_tokenizer_properties Properties
+
+To implement a \c Tokenizer,
+you need also to implement the \c %properties() function
+that fills in the \c Properties struct where:
+
+<table>
+  <tr>
+    <td>\c comments_separate_tokens</td>
+    <td>
+      If \c true, XML comments separate tokens.  For example,
+      <code>net&lt;!-- --&gt;work</code> would be 2 tokens instead of 1.
+    </td>
+  </tr>
+  <tr>
+    <td>\c elements_separate_tokens</td>
+    <td>
+      If \c true, XML elements separate tokens.  For example,
+      <code>&lt;b&gt;B&lt;/b&gt;old</code> would be 2 tokens instead of 1.
+    </td>
+  </tr>
+  <tr>
+    <td>\c processing_instructions_separate_tokens</td>
+    <td>
+      If \c true, XML processing instructions separate tokens.  For example,
+      <code>net&lt;?PI pi?&gt;work</code> would be 2 tokens instead of 1.
+    </td>
+  </tr>
+  <tr>
+    <td>\c languages</td>
+    <td>
+      The list of languages supported by the tokenizer.
+    </td>
+  </tr>
+  <tr>
+    <td>\c uri</td>
+    <td>
+      The URI that uniquely identifies the %Tokenizer.
+    </td>
+  </tr>
+</table>
+
 \section ft_class_tokenizer_provider The TokenizerProviderClass
 
 In addition to a \c Tokenizer,
@@ -185,20 +253,51 @@
 class TokenizerProvider {
 public:
   virtual ~TokenizerProvider();
-  virtual Tokenizer::ptr getTokenizer( locale::iso639_1::type lang, Tokenizer::Numbers &numbers ) const = 0;
+  virtual bool getTokenizer( locale::iso639_1::type lang, Tokenizer::Numbers *num = 0, Tokenizer::ptr *t = 0 ) const = 0;
 };
 \endcode
 
+Specifically, you need to implement the \c getTokenizer() function where:
+
+<table>
+  <tr>
+    <td>\c lang</td>
+    <td>The language to tokenize.</td>
+  </tr>
+  <tr>
+    <td>\c num</td>
+    <td>
+      The \c Numbers to use.
+      If \c null,
+      \a t is not set.
+    </td>
+  </tr>
+  <tr>
+    <td>\c t</td>
+    <td>
+      If not \c null,
+      set to point to a Tokenizer for \a lang.
+    </td>
+  </tr>
+</table>
+
 A simple \c TokenizerProvider for our tokenizer can be implemented as:
 
 \code
 class MyTokenizerProvider : public TokenizerProvider {
 public:
-  Tokenizer::ptr getTokenizer( locale::iso639_1::type lang ) const;
+  bool getTokenizer( locale::iso639_1::type lang, Tokenizer::Numbers* = 0, Tokenizer::ptr* = 0 ) const;
 };
 
-Tokenizer::ptr MyTokenizerProvider::getTokenizer( locale::iso639_1::type lang const {
-  return Tokenizer::ptr( new MyTokenizer );
+bool MyTokenizerProvider::getTokenizer( locale::iso639_1::type lang, Tokenizer::Numbers *num, Tokenizer::ptr *t ) const {
+  switch ( lang ) {
+    case iso639_1::en:
+      if ( num && t )
+        t->reset( new MyTokenizer );
+      return true;
+    default:
+      return false;
+  }
 }
 \endcode
 

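The default tokenization rule described in ft_tokenizer.dox above (alpha-numeric
runs form tokens, whitespace and punctuation separate them, and a '.' or ','
between two digits stays inside the token) can be approximated in a few lines.
This is a rough ASCII-only sketch with a hypothetical function name,
tokenize_ascii. It does not use ICU and is not Zorba's implementation, which
also tracks sentence and paragraph numbers.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits ASCII text into tokens: runs of alphanumeric characters, where a
// '.' or ',' is kept inside a token only when it sits between two digits.
std::vector<std::string> tokenize_ascii( std::string const &text ) {
  std::vector<std::string> tokens;
  std::string token;
  for ( std::string::size_type i = 0; i < text.size(); ++i ) {
    unsigned char const c = static_cast<unsigned char>( text[i] );
    bool keep = std::isalnum( c ) != 0;
    if ( !keep && ( c == '.' || c == ',' ) && i > 0 && i + 1 < text.size() ) {
      // Mirrors the [0-9][.,][0-9] rule: a digit on both sides of the separator.
      keep = std::isdigit( static_cast<unsigned char>( text[i - 1] ) ) &&
             std::isdigit( static_cast<unsigned char>( text[i + 1] ) );
    }
    if ( keep ) {
      token += text[i];
    } else if ( !token.empty() ) {
      tokens.push_back( token );    // a separator ends the current token
      token.clear();
    }
  }
  if ( !token.empty() )
    tokens.push_back( token );
  return tokens;
}
```

With this rule the '.' that ends a sentence still separates tokens, because it
is not followed by a digit, while the '.' in "98.6" does not.
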
=== modified file 'include/zorba/locale.h'
--- include/zorba/locale.h	2012-04-24 12:39:38 +0000
+++ include/zorba/locale.h	2012-04-24 22:19:24 +0000
@@ -22,24 +22,198 @@
 
     /////////////////////////////////////////////////////////////////////////// 
 
+    /**
+     * Defines constants for all ISO 639-1 language codes.
+     */
     namespace iso639_1 {
       enum type {
         unknown,
-        da,   // Danish
-        de,   // German
-        en,   // English
-        es,   // Spanish
-        fi,   // Finnish
-        fr,   // French
-        hu,   // Hungarian
-        it,   // Italian
-        nl,   // Dutch
-        no,   // Norwegian
-        pt,   // Portuguese
-        ro,   // Romanian
-        ru,   // Russian
-        sv,   // Swedish
-        tr,   // Turkish
+        aa,   ///< Afar
+        ab,   ///< Abkhazian
+        ae,   ///< Avestan
+        af,   ///< Afrikaans
+        ak,   ///< Akan
+        am,   ///< Amharic
+        an,   ///< Aragonese
+        ar,   ///< Arabic
+        as,   ///< Assamese
+        av,   ///< Avaric
+        ay,   ///< Aymara
+        az,   ///< Azerbaijani
+        ba,   ///< Bashkir
+        be,   ///< Byelorussian
+        bg,   ///< Bulgarian
+        bh,   ///< Bihari
+        bi,   ///< Bislama
+        bm,   ///< Bambara
+        bn,   ///< Bengali; Bangla
+        bo,   ///< Tibetan
+        br,   ///< Breton
+        bs,   ///< Bosnian
+        ca,   ///< Catalan
+        ce,   ///< Chechen
+        ch,   ///< Chamorro
+        co,   ///< Corsican
+        cr,   ///< Cree
+        cs,   ///< Czech
+        cu,   ///< Church Slavic; Church Slavonic
+        cv,   ///< Chuvash
+        cy,   ///< Welsh
+        da,   ///< Danish
+        de,   ///< German
+        dv,   ///< Divehi
+        dz,   ///< Bhutani
+        ee,   ///< Ewe
+        el,   ///< Greek
+        en,   ///< English
+        eo,   ///< Esperanto
+        es,   ///< Spanish
+        et,   ///< Estonian
+        eu,   ///< Basque
+        fa,   ///< Persian
+        ff,   ///< Fulah
+        fi,   ///< Finnish
+        fj,   ///< Fiji
+        fo,   ///< Faroese
+        fr,   ///< French
+        fy,   ///< Frisian
+        ga,   ///< Irish
+        gd,   ///< Scots Gaelic
+        gl,   ///< Galician
+        gn,   ///< Guarani
+        gu,   ///< Gujarati
+        gv,   ///< Manx
+        ha,   ///< Hausa
+        he,   ///< Hebrew (formerly iw)
+        hi,   ///< Hindi
+        ho,   ///< Hiri Motu
+        hr,   ///< Croatian
+        ht,   ///< Haitian Creole
+        hu,   ///< Hungarian
+        hy,   ///< Armenian
+        hz,   ///< Herero
+        ia,   ///< Interlingua
+        id,   ///< Indonesian (formerly in)
+        ie,   ///< Interlingue
+        ig,   ///< Igbo
+        ii,   ///< Nuosu
+        ik,   ///< Inupiak
+        io,   ///< Ido
+        is,   ///< Icelandic
+        it,   ///< Italian
+        iu,   ///< Inuktitut
+        ja,   ///< Japanese
+        jv,   ///< Javanese
+        ka,   ///< Georgian
+        kg,   ///< Kongo
+        ki,   ///< Gikuyu
+        kj,   ///< Kuanyama
+        kk,   ///< Kazakh
+        kl,   ///< Greenlandic
+        km,   ///< Cambodian
+        kn,   ///< Kannada
+        ko,   ///< Korean
+        kr,   ///< Kanuri
+        ks,   ///< Kashmiri
+        ku,   ///< Kurdish
+        kv,   ///< Komi
+        kw,   ///< Cornish
+        ky,   ///< Kirghiz
+        la,   ///< Latin
+        lb,   ///< Letzeburgesch
+        lg,   ///< Ganda
+        li,   ///< Limburgan; Limburger; Limburgish
+        ln,   ///< Lingala
+        lo,   ///< Laothian
+        lt,   ///< Lithuanian
+        lu,   ///< Luba-Katanga
+        lv,   ///< Latvian
+        mg,   ///< Malagasy
+        mh,   ///< Marshallese
+        mi,   ///< Maori
+        mk,   ///< Macedonian
+        ml,   ///< Malayalam
+        mn,   ///< Mongolian
+        mo,   ///< Moldavian
+        mr,   ///< Marathi
+        ms,   ///< Malay
+        mt,   ///< Maltese
+        my,   ///< Burmese
+        na,   ///< Nauru
+        nb,   ///< Norwegian Bokmal
+        nd,   ///< Ndebele, North
+        ne,   ///< Nepali
+        ng,   ///< Ndonga
+        nl,   ///< Dutch
+        nn,   ///< Norwegian Nynorsk
+        no,   ///< Norwegian
+        nr,   ///< Ndebele, South
+        nv,   ///< Navajo; Navaho
+        ny,   ///< Chichewa; Chewa; Nyanja
+        oc,   ///< Occitan
+        oj,   ///< Ojibwa
+        om,   ///< Oromo
+        or_,  ///< Oriya
+        os,   ///< Ossetian; Ossetic
+        pa,   ///< Panjabi; Punjabi
+        pi,   ///< Pali
+        pl,   ///< Polish
+        ps,   ///< Pashto, Pushto
+        pt,   ///< Portuguese
+        qu,   ///< Quechua
+        rm,   ///< Romansh
+        rn,   ///< Kirundi
+        ro,   ///< Romanian
+        ru,   ///< Russian
+        rw,   ///< Kinyarwanda
+        sa,   ///< Sanskrit
+        sc,   ///< Sardinian
+        sd,   ///< Sindhi
+        se,   ///< Northern Sami
+        sg,   ///< Sangho
+        sh,   ///< Serbo-Croatian
+        si,   ///< Sinhalese
+        sk,   ///< Slovak
+        sl,   ///< Slovenian
+        sm,   ///< Samoan
+        sn,   ///< Shona
+        so,   ///< Somali
+        sq,   ///< Albanian
+        sr,   ///< Serbian
+        ss,   ///< Siswati
+        st,   ///< Sesotho
+        su,   ///< Sundanese
+        sv,   ///< Swedish
+        sw,   ///< Swahili
+        ta,   ///< Tamil
+        te,   ///< Telugu
+        tg,   ///< Tajik
+        th,   ///< Thai
+        ti,   ///< Tigrinya
+        tk,   ///< Turkmen
+        tl,   ///< Tagalog
+        tn,   ///< Setswana
+        to,   ///< Tonga
+        tr,   ///< Turkish
+        ts,   ///< Tsonga
+        tt,   ///< Tatar
+        tw,   ///< Twi
+        ty,   ///< Tahitian
+        ug,   ///< Uighur
+        uk,   ///< Ukrainian
+        ur,   ///< Urdu
+        uz,   ///< Uzbek
+        ve,   ///< Venda
+        vi,   ///< Vietnamese
+        vo,   ///< Volapuk
+        wa,   ///< Walloon
+        wo,   ///< Wolof
+        xh,   ///< Xhosa
+        yi,   ///< Yiddish
+        yo,   ///< Yoruba
+        za,   ///< Zhuang
+        zh,   ///< Chinese
+        zu,   ///< Zulu
         NUM_ENTRIES
       };
     }

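The enum above keeps unknown as its first enumerator and NUM_ENTRIES as its last, and spells Oriya as or_ because "or" is a reserved alternative token in C++. One common use of such a layout is a parallel table of code strings whose size is checked against NUM_ENTRIES. A minimal sketch, assuming an abbreviated stand-in enum and a hypothetical find_lang() helper (neither is part of this patch):

```cpp
#include <cassert>
#include <cstring>

namespace sketch_lang {

// Abbreviated stand-in for zorba::locale::iso639_1::type (assumption: only a
// few of the real enumerators are reproduced here).
enum type { unknown, en, or_, zh, NUM_ENTRIES };

// One string per enumerator, in declaration order.  Note "or" maps to or_.
static char const *const code_string[] = { "", "en", "or", "zh" };

// The sentinel lets the compiler catch a table that is out of sync.
static_assert( sizeof code_string / sizeof code_string[0] == NUM_ENTRIES,
               "code_string is out of sync with the enum" );

// Hypothetical helper: maps a two-letter code to its enumerator, or unknown.
inline type find_lang( char const *code ) {
  for ( int i = unknown + 1; i < NUM_ENTRIES; ++i )
    if ( std::strcmp( code, code_string[i] ) == 0 )
      return static_cast<type>( i );
  return unknown;
}

} // namespace sketch_lang
```

Keeping NUM_ENTRIES last means that adding a language without updating the table fails at compile time rather than at lookup time.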
=== modified file 'include/zorba/pregenerated/diagnostic_list.h'
--- include/zorba/pregenerated/diagnostic_list.h	2012-04-24 12:39:38 +0000
+++ include/zorba/pregenerated/diagnostic_list.h	2012-04-24 22:19:24 +0000
@@ -454,6 +454,14 @@
 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP8402_THESAURUS_ENDIANNESS_MISMATCH;
 
 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP8403_THESAURUS_DATA_ERROR;
+
+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP8404_STEM_LANG_NOT_SUPPORTED;
+
+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP8405_STOP_WORDS_LANG_NOT_SUPPORTED;
+
+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP8406_THESAURUS_LANG_NOT_SUPPORTED;
+
+extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQP8407_TOKENIZER_LANG_NOT_SUPPORTED;
 #endif
 
 extern ZORBA_DLL_PUBLIC ZorbaErrorCode ZXQD0001_PREFIX_NOT_DECLARED;

=== modified file 'include/zorba/stemmer.h'
--- include/zorba/stemmer.h	2012-04-24 12:39:38 +0000
+++ include/zorba/stemmer.h	2012-04-24 22:19:24 +0000
@@ -52,6 +52,23 @@
   virtual void destroy() const = 0;
 
   /**
+   * Various properties of this %Stemmer.
+   */
+  struct Properties {
+    /**
+     * The URI that uniquely identifies this %Stemmer.
+     */
+    char const *uri;
+  };
+
+  /**
+   * Gets the Properties of this %Stemmer.
+   *
+   * @param result The Properties to populate.
+   */
+  virtual void properties( Properties *result ) const = 0;
+
+  /**
    * Stems the given word.
    *
    * @param word The word to stem.
@@ -66,7 +83,7 @@
 };
 
 /**
- * A %StemmerProvider, given an language, provies a stemmer for it.
+ * A %StemmerProvider, given a language, provides a Stemmer for it.
  */
 class ZORBA_DLL_PUBLIC StemmerProvider {
 public:
@@ -76,10 +93,12 @@
    * Gets a Stemmer for the given language.
    *
    * @param lang The language to get a Stemmer for.
-   * @return The relevant Stemmer or \c NULL if no stemmer for the given
-   * language is available.
+   * @param s If not \c null, set to point to a Stemmer for \a lang.
+   * @return Returns \c true only if this provider can provide a stemmer for
+   * \a lang.
    */
-  virtual Stemmer::ptr getStemmer( locale::iso639_1::type lang ) const = 0;
+  virtual bool getStemmer( locale::iso639_1::type lang,
+                           Stemmer::ptr *s = 0 ) const = 0;
 };
 
 ///////////////////////////////////////////////////////////////////////////////

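The new getStemmer() signature returns a bool and fills an optional out-parameter, so a caller can either probe for language support (passing no pointer) or actually obtain a Stemmer. A minimal sketch of that calling pattern follows. Every type in it is a simplified stand-in written for illustration: the stem() signature, ToyStemmer, EnglishOnlyProvider, stem_word() and the URI string are assumptions, not Zorba's API.

```cpp
#include <cassert>
#include <memory>
#include <string>

namespace sketch_stem {

enum lang_type { unknown, en, fr };   // stand-in for locale::iso639_1::type

// Stand-in for internal::ztd::destroy_delete: calls destroy(), not delete.
template<class T>
struct destroy_delete {
  void operator()( T *p ) const { if ( p ) p->destroy(); }
};

class Stemmer {                       // simplified stand-in
public:
  typedef std::unique_ptr<Stemmer const,destroy_delete<Stemmer const> > ptr;
  struct Properties { char const *uri; };
  virtual ~Stemmer() {}
  virtual void destroy() const = 0;
  virtual void properties( Properties *result ) const = 0;
  virtual void stem( std::string const &word, std::string *result ) const = 0;
};

// Hypothetical stemmer that merely strips one trailing 's'.
class ToyStemmer : public Stemmer {
public:
  ToyStemmer() {}
  void destroy() const {}             // static object: nothing to free
  void properties( Properties *result ) const {
    result->uri = "urn:example:toy-stemmer";   // made-up URI
  }
  void stem( std::string const &word, std::string *result ) const {
    *result = word;
    if ( result->size() > 1 && (*result)[ result->size() - 1 ] == 's' )
      result->erase( result->size() - 1 );
  }
};

class EnglishOnlyProvider {           // hypothetical provider
public:
  bool getStemmer( lang_type lang, Stemmer::ptr *s = 0 ) const {
    if ( lang != en )
      return false;
    if ( s ) {                        // only hand out a Stemmer when asked
      static ToyStemmer const stemmer;
      s->reset( &stemmer );
    }
    return true;
  }
};

// Probes first, then fetches; returns the word unchanged if unsupported.
inline std::string stem_word( EnglishOnlyProvider const &provider,
                              lang_type lang, std::string const &word ) {
  if ( !provider.getStemmer( lang ) )
    return word;
  Stemmer::ptr stemmer;
  provider.getStemmer( lang, &stemmer );
  std::string result;
  stemmer->stem( word, &result );
  return result;
}

} // namespace sketch_stem
```

Because the stand-in provider returns a pointer to a static object, destroy() does nothing; a provider that allocates each Stemmer would instead implement destroy() as "delete this".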
=== modified file 'include/zorba/thesaurus.h'
--- include/zorba/thesaurus.h	2012-04-24 12:39:38 +0000
+++ include/zorba/thesaurus.h	2012-04-24 22:19:24 +0000
@@ -32,25 +32,13 @@
 ///////////////////////////////////////////////////////////////////////////////
 
 /**
- * Contains additional data for URIMappers and URLResolvers
- * when mapping/resolving a Thesaurus URI.
- */
-class ZORBA_DLL_PUBLIC ThesaurusEntityData : public EntityData {
-public:
-  /**
-   * Gets the language for which a thesaurus is being requested.
-   *
-   * @return said language.
-   */
-  virtual locale::iso639_1::type getLanguage() const = 0;
-};
-
-/**
- * A %Thesaurus is-a Resource for thesaurus implementations.
- */
-class ZORBA_DLL_PUBLIC Thesaurus : public Resource {
-public:
-  typedef std::unique_ptr<Thesaurus,internal::ztd::destroy_delete<Thesaurus> >
+ * A %Thesaurus provides a way to look up related phrases for a given phrase.
+ */
+class ZORBA_DLL_PUBLIC Thesaurus {
+public:
+  typedef std::unique_ptr<
+            Thesaurus const,internal::ztd::destroy_delete<Thesaurus const>
+          >
           ptr;
 
   /**
@@ -88,11 +76,11 @@
    * Destroys this %Thesaurus.
    * This function is called by Zorba when the %Thesaurus is no longer needed.
    *
-   * If your URLResolver dynamically allocates %Thesaurus objects, then the
+   * If your implementation dynamically allocates %Thesaurus objects, then your
    * implementation can simply be (and usually is) <code>delete this</code>.
    *
-   * If your URLResolver returns a pointer to a static %Thesaurus object, then
-   * the implementation should do nothing.
+   * If your implementation returns a pointer to a static %Thesaurus object,
+   * then your implementation should do nothing.
    */
   virtual void destroy() const = 0;
 
@@ -119,6 +107,32 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
+/**
+ * A %ThesaurusProvider is-a Resource for providing thesauri for a given
+ * language.
+ */
+class ZORBA_DLL_PUBLIC ThesaurusProvider : public Resource {
+public:
+  typedef std::unique_ptr<
+            ThesaurusProvider const,
+            internal::ztd::destroy_delete<ThesaurusProvider const>
+          >
+          ptr;
+
+  /**
+   * Gets a Thesaurus for the given language.
+   *
+   * @param lang The desired language of the thesaurus.
+   * @param t If not \c null, set to point to a Thesaurus for \a lang.
+   * @return Returns \c true only if this provider can provide a thesaurus for
+   * \a lang.
+   */
+  virtual bool getThesaurus( locale::iso639_1::type lang,
+                             Thesaurus::ptr *t = 0 ) const = 0;
+};
+
+///////////////////////////////////////////////////////////////////////////////
+
 } // namespace zorba
 #endif /* ZORBA_NO_FULL_TEXT */
 #endif /* ZORBA_THESAURUS_API_H */

=== modified file 'include/zorba/tokenizer.h'
--- include/zorba/tokenizer.h	2012-04-24 12:39:38 +0000
+++ include/zorba/tokenizer.h	2012-04-24 22:19:24 +0000
@@ -18,6 +18,8 @@
 #ifndef ZORBA_TOKENIZER_API_H
 #define ZORBA_TOKENIZER_API_H
 
+#include <vector>
+
 #include <zorba/config.h>
 #include <zorba/locale.h>
 #include <zorba/internal/unique_ptr.h>
@@ -67,8 +69,6 @@
    * A %Callback is called once per token.
    * This is only internally by Zorba.
    * You do not need to derive from this class.
-   * The only thing you need to do is call the callback's \c operator() once
-   * for each token you parse in \c tokenize().
    */
   class Callback {
   public:
@@ -77,19 +77,75 @@
     virtual ~Callback();
 
     /**
+     * This member-function is called whenever an item that is being tokenized
+     * is entered or exited.
+     *
+     * @param item The item being entered or exited.
+     * @param entering If \c true, the item is being entered; if \c false, the
+     * item is being exited.
+     */
+    virtual void item( Item const &item, bool entering );
+
+    /**
      * This member-function is called once per token.
      *
      * @param utf8_s    The UTF-8 token string.  It is not null-terminated.
      * @param utf8_len  The number of bytes in the token string.
+     * @param lang      The language of the token.
      * @param token_no  The token number.  Token numbers start at 0.
      * @param sent_no   The sentence number.  Sentence numbers start at 1.
      * @param para_no   The paragraph number.  Paragraph numbers start at 1.
-     * @param payload   Optional user-defined data.
-     */
-    virtual void operator()( char const *utf8_s, size_type utf8_len,
-                             size_type token_no, size_type sent_no,
-                             size_type para_no, void *payload = 0 ) = 0;
-  };
+     * @param item      The Item this token is from, if any.
+     */
+    virtual void token( char const *utf8_s, size_type utf8_len,
+                        locale::iso639_1::type lang,
+                        size_type token_no, size_type sent_no,
+                        size_type para_no, Item const *item = 0 ) = 0;
+  };
+
+  /////////////////////////////////////////////////////////////////////////////
+
+  /**
+   * Various properties of this %Tokenizer.
+   */
+  struct Properties {
+    typedef std::vector<locale::iso639_1::type> languages_type;
+
+    /**
+     * If \c true, XML comments separate tokens.  For example,
+     * \c net&lt;!----&gt;work would be 2 tokens instead of 1.
+     */
+    bool comments_separate_tokens;
+
+    /**
+     * If \c true, XML elements separate tokens.  For example,
+     * \c &lt;b&gt;B&lt;/b&gt;old would be 2 tokens instead of 1.
+     */
+    bool elements_separate_tokens;
+
+    /**
+     * If \c true, XML processing instructions separate tokens.  For example,
+     * <code>net<?PI pi?>work</code> would be 2 tokens instead of 1.
+     */
+    bool processing_instructions_separate_tokens;
+
+    /**
+     * The set of languages supported.
+     */
+    languages_type languages;
+
+    /**
+     * The URI that uniquely identifies this %Tokenizer.
+     */
+    char const* uri;
+  };
+
+  /**
+   * Gets the Properties of this %Tokenizer.
+   *
+   * @param result The Properties to populate.
+   */
+  virtual void properties( Properties *result ) const = 0;
 
   /////////////////////////////////////////////////////////////////////////////
 
@@ -106,39 +162,6 @@
   virtual void destroy() const = 0;
 
   /**
-   * Trace options for XML elements combined via bitwise-or.
-   */
-  enum ElementTraceOptions {
-    trace_none  = 0x0,  ///< Trace no elements.
-    trace_begin = 0x1,  ///< Trace the beginning of elements.
-    trace_end   = 0x2   ///< Trace the ending of elements.
-  };
-
-  /**
-   * Gets the trace options.  If the value is \c trace_none, then the paragraph
-   * number will be incremented upon entering an XML element; if the value is
-   * anything other than \c trace_none, then the tokenizer assumes
-   * responsibility for incrementing the paragraph number.
-   *
-   * @return Returns said options.
-   */
-  int trace_options() const {
-    return trace_options_;
-  }
-
-  /**
-   * This function is called whenever an XML element is entered during
-   * tokenization.  Note that this function is called only if \c
-   * trace_options() returns non-zero.
-   *
-   * @param qname The element's QName.
-   * @param trace_options The bitwise-or of the trace option(s) in effect for a
-   * particular call.
-   * @see trace_options()
-   */
-  virtual void element( Item const &qname, int trace_options );
-
-  /**
    * Gets this %Tokenizer's associated Numbers.
    *
    * @return Returns said Numbers.
@@ -153,6 +176,16 @@
   Numbers const& numbers() const;
 
   /**
+   * Tokenizes the given node.
+   *
+   * @param node      The node to tokenize.
+   * @param lang      The default language to use.
+   * @param callback  The Callback to call once per token.
+   */
+  void tokenize_node( Item const &node, locale::iso639_1::type lang,
+                      Callback &callback );
+
+  /**
    * Tokenizes the given string.
    *
    * @param utf8_s    The UTF-8 string to tokenize.  It need not be
@@ -162,11 +195,11 @@
    * @param wildcards If \c true, allows XQuery wildcard syntax characters to
    *                  be part of tokens.
    * @param callback  The Callback to call once per token.
-   * @param payload   Optional user-defined data.
+   * @param item      The Item this string is from, if any.
    */
-  virtual void tokenize( char const *utf8_s, size_type utf8_len,
-                         locale::iso639_1::type lang, bool wildcards,
-                         Callback &callback, void *payload = 0 ) = 0;
+  virtual void tokenize_string( char const *utf8_s, size_type utf8_len,
+                                locale::iso639_1::type lang, bool wildcards,
+                                Callback &callback, Item const *item = 0 ) = 0;
 
   /////////////////////////////////////////////////////////////////////////////
 
@@ -175,27 +208,71 @@
    * Constructs a %Tokenizer.
    *
    * @param numbers the Numbers to use.
-   * @param trace_options The bitwise-or of the available trace options, if
-   * any.
    */
-  Tokenizer( Numbers &numbers, int trace_options = trace_none );
+  Tokenizer( Numbers &numbers );
 
   /**
    * Destroys a %Tokenizer.
    */
   virtual ~Tokenizer() = 0;
 
+  /**
+   * Given an element, finds its \c xml:lang attribute, if any, and gets its
+   * value.
+   *
+   * @param element The element to check.
+   * @param lang A pointer to where to put the found language, if any.
+   * @return Returns \c true only if an \c xml:lang attribute is found and the
+   * value is a known language.
+   */
+  bool find_lang_attribute( Item const &element, locale::iso639_1::type *lang );
+
+  /**
+   * This member-function is called whenever an item that is being tokenized is
+   * entered or exited.
+   *
+   * @param item      The item being entered or exited.
+   * @param entering  If \c true, the item is being entered; if \c false, the
+   *                  item is being exited.
+   */
+  virtual void item( Item const &item, bool entering );
+
+  /**
+   * Tokenizes the given node and all of its child nodes, if any.  For each
+   * node, it is required that this function call the item() member function of
+   * both this %Tokenizer and of the Callback twice, once each for entrance and
+   * exit.
+   *
+   * @param node          The node to tokenize.
+   * @param lang          The default language to use.
+   * @param callback      The Callback to call per token.
+   * @param tokenize_acp  If \c true, additionally tokenize all attribute,
+   *                      comment, and processing-instruction nodes encountered;
+   *                      if \c false, skip them.
+   */
+  virtual void tokenize_node_impl( Item const &node,
+                                   locale::iso639_1::type lang,
+                                   Callback &callback, bool tokenize_acp );
+
 private:
-  int trace_options_;
-  Numbers *no_;
+  Numbers *numbers_;
 };
 
+inline Tokenizer::Tokenizer( Numbers &numbers ) : numbers_( &numbers ) {
+}
+
 inline Tokenizer::Numbers& Tokenizer::numbers() {
-  return *no_;
+  return *numbers_;
 }
 
 inline Tokenizer::Numbers const& Tokenizer::numbers() const {
-  return *no_;
+  return *numbers_;
+}
+
+inline void Tokenizer::tokenize_node( Item const &item,
+                                      locale::iso639_1::type lang,
+                                      Callback &callback ) {
+  tokenize_node_impl( item, lang, callback, true );
 }
 
 ///////////////////////////////////////////////////////////////////////////////
@@ -211,11 +288,14 @@
    * Creates a new %Tokenizer.
    *
    * @param lang The language of the text that the tokenizer will tokenize.
-   * @param numbers The Numbers to use.
-   * @return Returns said %Tokenizer.
+   * @param numbers The Numbers to use.  If \c null, \a t is not set.
+   * @param t If not \c null, set to point to a Tokenizer for \a lang.
+   * @return Returns \c true only if this provider can provide a tokenizer for
+   * \a lang.
    */
-  virtual Tokenizer::ptr getTokenizer( locale::iso639_1::type lang,
-                                       Tokenizer::Numbers &numbers ) const = 0;
+  virtual bool getTokenizer( locale::iso639_1::type lang,
+                             Tokenizer::Numbers *numbers = 0,
+                             Tokenizer::ptr *t = 0 ) const = 0;
 };
 
 ///////////////////////////////////////////////////////////////////////////////

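The reworked Callback receives each token through token(), which now carries the token's language and an optional Item instead of an opaque payload. A minimal sketch of a tokenizer driving such a callback follows. The types are simplified stand-ins, and the whitespace-only splitting, the free function tokenize_string() and CollectingCallback are assumptions made for illustration; Zorba's real tokenizer also tracks sentences and paragraphs.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch_tok {

typedef std::size_t size_type;
enum lang_type { unknown, en };       // stand-in for locale::iso639_1::type
struct Item {};                       // placeholder for zorba::Item

class Callback {                      // simplified stand-in
public:
  virtual ~Callback() {}
  virtual void item( Item const&, bool /*entering*/ ) {}
  virtual void token( char const *utf8_s, size_type utf8_len, lang_type lang,
                      size_type token_no, size_type sent_no,
                      size_type para_no, Item const *item = 0 ) = 0;
};

// Hypothetical callback that records each token and its number.
struct CollectingCallback : Callback {
  std::vector<std::string> tokens;
  std::vector<size_type> numbers;
  void token( char const *utf8_s, size_type utf8_len, lang_type,
              size_type token_no, size_type, size_type, Item const* ) {
    // The token is not null-terminated, so the length must be used.
    tokens.push_back( std::string( utf8_s, utf8_len ) );
    numbers.push_back( token_no );
  }
};

// Toy tokenizer: splits on ASCII whitespace only and reports every token as
// sentence 1, paragraph 1.  Token numbers start at 0.
inline void tokenize_string( char const *utf8_s, size_type utf8_len,
                             lang_type lang, Callback &callback,
                             Item const *item = 0 ) {
  size_type token_no = 0, start = 0;
  bool in_token = false;
  for ( size_type i = 0; i <= utf8_len; ++i ) {
    bool const is_sep = i == utf8_len ||
      std::isspace( static_cast<unsigned char>( utf8_s[i] ) );
    if ( !is_sep && !in_token ) {
      start = i;
      in_token = true;
    } else if ( is_sep && in_token ) {
      callback.token( utf8_s + start, i - start, lang, token_no++, 1, 1, item );
      in_token = false;
    }
  }
}

} // namespace sketch_tok
```

Passing the Item pointer through to token() is what lets a callback relate each token back to the node it came from, which the old void* payload could only do by convention.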
=== modified file 'include/zorba/uri_resolvers.h'
--- include/zorba/uri_resolvers.h	2012-04-24 12:39:38 +0000
+++ include/zorba/uri_resolvers.h	2012-04-24 22:19:24 +0000
@@ -50,7 +50,8 @@
 class ZORBA_DLL_PUBLIC Resource
 {
 public:
-  typedef std::unique_ptr<Resource,internal::ztd::destroy_delete<Resource> > ptr;
+  typedef std::unique_ptr<Resource,internal::ztd::destroy_delete<Resource> >
+          ptr;
 
   virtual ~Resource() = 0;
 
@@ -172,8 +173,8 @@
    * object itself will be discarded.
    *
    * In any case, if they create a Resource, Zorba will take memory
-   * ownership of the Resource and delete it when it is no longer
-   * needed.
+   * ownership of the Resource and delete it (by calling destroy() on it)
+   * when it is no longer needed.
    */
   virtual Resource* resolveURL(const zorba::String& aUrl,
     EntityData const* aEntityData) = 0;

=== modified file 'modules/com/zorba-xquery/www/modules/CMakeLists.txt'
--- modules/com/zorba-xquery/www/modules/CMakeLists.txt	2012-04-24 12:39:38 +0000
+++ modules/com/zorba-xquery/www/modules/CMakeLists.txt	2012-04-24 22:19:24 +0000
@@ -72,6 +72,13 @@
 DECLARE_ZORBA_MODULE(FILE xqdoc.xq VERSION 2.0
   URI "http://www.zorba-xquery.com/modules/xqdoc")
 
+IF(NOT ZORBA_NO_FULL_TEXT)
+  DECLARE_ZORBA_MODULE(FILE full-text.xq VERSION 2.0
+    URI "http://www.zorba-xquery.com/modules/full-text")
+  DECLARE_ZORBA_SCHEMA(FILE full-text.xsd
+    URI "http://www.zorba-xquery.com/modules/full-text")
+ENDIF(NOT ZORBA_NO_FULL_TEXT)
+
 # Subdirectories
 DECLARE_ZORBA_MODULE(FILE converters/base64.xq VERSION 2.0
   URI "http://www.zorba-xquery.com/modules/converters/base64")

=== added file 'modules/com/zorba-xquery/www/modules/full-text.xq'
--- modules/com/zorba-xquery/www/modules/full-text.xq	1970-01-01 00:00:00 +0000
+++ modules/com/zorba-xquery/www/modules/full-text.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,872 @@
+xquery version "3.0";
+
+(:
+ : Copyright 2006-2011 The FLWOR Foundation.
+ :
+ : Licensed under the Apache License, Version 2.0 (the "License");
+ : you may not use this file except in compliance with the License.
+ : You may obtain a copy of the License at
+ :
+ : http://www.apache.org/licenses/LICENSE-2.0
+ :
+ : Unless required by applicable law or agreed to in writing, software
+ : distributed under the License is distributed on an "AS IS" BASIS,
+ : WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ : See the License for the specific language governing permissions and
+ : limitations under the License.
+ :)
+
+(:===========================================================================:)
+
+(:~
+ : This module provides an XQuery API to full-text functions.
+ : For general information about Zorba's implementation of the
+ : <a href="http://www.w3.org/TR/xpath-full-text-10/">XQuery and XPath Full Text 1.0 specification</a>
+ : as well as instructions for building and installing a thesaurus,
+ : see the <a href="http://www.zorba-xquery.com/html/documentation/latest/zorba/ft_thesaurus">Full Text Thesaurus documentation</a>.
+ : <h2>Notes on languages</h2>
+ : To refer to particular human languages,
+ : Zorba uses both the
+ : <a href="http://en.wikipedia.org/wiki/ISO_639-1">ISO 639-1</a>
+ : and
+ : <a href="http://en.wikipedia.org/wiki/ISO_639-2">ISO 639-2</a>
+ : language codes.
+ : Note that Zorba supports only a subset of the
+ : <a href="http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes">complete list of language codes</a>
+ : and not every function supports the same subset.
+ : <p/>
+ : Most functions in this module take a language as a parameter
+ : using the
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>
+ : XML schema data type.
+ : <h2>Notes on stemming</h2>
+ : The <code>stem()</code> functions return the
+ : <a href="http://en.wikipedia.org/wiki/Word_stem";>stem</a>
+ : of a word.
+ : In Zorba,
+ : the stem of a word itself, however, is not guaranteed to be a word.
+ : It is best to consider a stem as an opaque byte sequence.
+ : All that is guaranteed about a stem is that,
+ : for a given word,
+ : the stem of that word will always be the same byte sequence.
+ : Hence,
+ : you should never compare the result of one of the <code>stem()</code>
+ : functions against a non-stemmed string,
+ : for example:
+ : <pre>
+ :  if ( ft:stem( "apples" ) eq "apple" )             ** WRONG **
+ : </pre>
+ : Instead do:
+ : <pre>
+ :  if ( ft:stem( "apples" ) eq ft:stem( "apple" ) )  ** CORRECT **
+ : </pre>
+ : <h2>Notes on the thesaurus</h2>
+ : The <code>thesaurus-lookup()</code> functions have "levels"
+ : and "relationship" parameters.
+ : The values for these are implementation-defined.
+ : Zorba's default implementation uses the
+ : <a href="http://wordnet.princeton.edu/">WordNet lexical database</a>,
+ : version 3.0.
+ : <p/>
+ : In WordNet,
+ : the number of "levels" that two phrases are apart
+ : is how many hierarchical meanings apart they are.
+ : For example,
+ : "canary" is 5 levels away from "vertebrate"
+ : (canary &gt; finch &gt; oscine &gt; passerine &gt; bird &gt; vertebrate).
+ : <p/>
+ : When using the WordNet implementation,
+ : Zorba supports all of the relationships (and their abbreviations)
+ : specified by
+ : <a href="http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=7776">ISO 2788</a>
+ : and
+ : <a href="http://www.niso.org/kst/reports/standards?step=2&amp;gid=&amp;project_key=7cc9b583cb5a62e8c15d3099e0bb46bbae9cf38a">ANSI/NISO Z39.19-2005</a>
+ : with the exceptions of "HN" (history note)
+ : and "X SN" (see scope note for).
+ : These relationships are:
+ :  <table>
+ :    <tr>
+ :      <th>Rel.</th>
+ :      <th>Meaning</th>
+ :      <th>WordNet Rel.</th>
+ :    </tr>
+ :    <tr>
+ :      <td>BT</td>
+ :      <td>broader term</td>
+ :      <td>hypernym</td>
+ :    </tr>
+ :    <tr>
+ :      <td>BTG</td>
+ :      <td>broader term generic</td>
+ :      <td>hypernym</td>
+ :    </tr>
+ :    <tr>
+ :      <td>BTI</td>
+ :      <td>broader term instance</td>
+ :      <td>instance hypernym</td>
+ :    </tr>
+ :    <tr>
+ :      <td>BTP</td>
+ :      <td>broader term partitive</td>
+ :      <td>part meronym</td>
+ :    </tr>
+ :    <tr>
+ :      <td>NT</td>
+ :      <td>narrower term</td>
+ :      <td>hyponym</td>
+ :    </tr>
+ :    <tr>
+ :      <td>NTG</td>
+ :      <td>narrower term generic</td>
+ :      <td>hyponym</td>
+ :    </tr>
+ :    <tr>
+ :      <td>NTI</td>
+ :      <td>narrower term instance</td>
+ :      <td>instance hyponym</td>
+ :    </tr>
+ :    <tr>
+ :      <td>NTP</td>
+ :      <td>narrower term partitive</td>
+ :      <td>part holonym</td>
+ :    </tr>
+ :    <tr>
+ :      <td>RT</td>
+ :      <td>related term</td>
+ :      <td>also see</td>
+ :    </tr>
+ :    <tr>
+ :      <td>SN</td>
+ :      <td>scope note</td>
+ :      <td>n/a</td>
+ :    </tr>
+ :    <tr>
+ :      <td>TT</td>
+ :      <td>top term</td>
+ :      <td>hypernym</td>
+ :    </tr>
+ :    <tr>
+ :      <td>UF</td>
+ :      <td>non-preferred term</td>
+ :      <td>n/a</td>
+ :    </tr>
+ :    <tr>
+ :      <td>USE</td>
+ :      <td>preferred term</td>
+ :      <td>n/a</td>
+ :    </tr>
+ :  </table>
+ : Note that you can specify relationships
+ : either by their abbreviation
+ : or their meaning.
+ : Relationships are case-insensitive.
+ :
+ : In addition to the
+ : <a href="http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=7776">ISO 2788</a>
+ : and
+ : <a href="http://www.niso.org/kst/reports/standards?step=2&amp;gid=&amp;project_key=7cc9b583cb5a62e8c15d3099e0bb46bbae9cf38a">ANSI/NISO Z39.19-2005</a>
+ : relationships,
+ : Zorba also supports all of the relationships offered by WordNet.
+ : These relationships are:
+ :  <table class="ft_rels">
+ :    <tr>
+ :      <th>Relationship</th>
+ :      <th>Meaning</th>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">also see</td>
+ :      <td>
+ :        A word that is related to another,
+ :        e.g., for "varnished" (furniture)
+ :        one should <em>also see</em> "finished."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td>antonym</td>
+ :      <td>
+ :        A word opposite in meaning to another,
+ :        e.g., "light" is an <em>antonym</em> for "heavy."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td>attribute</td>
+ :      <td>
+ :        A noun for which adjectives express values,
+ :        e.g., "weight" is an <em>attribute</em>
+ :        for which the adjectives "light" and "heavy"
+ :        express values.
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td>cause</td>
+ :      <td>
+ :        A verb that causes another,
+ :        e.g., "show" is a <em>cause</em> of "see."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">derivationally related form</td>
+ :      <td>
+ :        A word that is derived from a root word,
+ :        e.g., "metric" is a <em>derivationally related form</em> of "meter."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">derived from adjective</td>
+ :      <td>
+ :        An adverb that is derived from an adjective,
+ :        e.g., "correctly" is <em>derived from the adjective</em> "correct."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td>entailment</td>
+ :      <td>
+ :        A verb that presupposes another,
+ :        e.g., "snoring" <em>entails</em> "sleeping."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td>hypernym</td>
+ :      <td>
+ :        A word with a broad meaning that more specific words fall under,
+ :        e.g., "meal" is a <em>hypernym</em> of "breakfast."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td>hyponym</td>
+ :      <td>
+ :        A word of more specific meaning than a general term applicable to it,
+ :        e.g., "breakfast" is a <em>hyponym</em> of "meal."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">instance hypernym</td>
+ :      <td>
+ :        A word that denotes a category of some specific instance,
+ :        e.g., "author" is an <em>instance hypernym</em> of "Asimov."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">instance hyponym</td>
+ :      <td>
+ :        A term that denotes a specific instance of some general category,
+ :        e.g., "Asimov" is an <em>instance hyponym</em> of "author."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">member holonym</td>
+ :      <td>
+ :        A word that denotes a collection of individuals,
+ :        e.g., "faculty" is a <em>member holonym</em> of "professor."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">member meronym</td>
+ :      <td>
+ :        A word that denotes a member of a larger group,
+ :        e.g., a "person" is a <em>member meronym</em> of a "crowd."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">part holonym</td>
+ :      <td>
+ :        A word that denotes a larger whole comprised of some part,
+ :        e.g., "car" is a <em>part holonym</em> of "engine."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">part meronym</td>
+ :      <td>
+ :        A word that denotes a part of a larger whole,
+ :        e.g., an "engine" is a <em>part meronym</em> of a "car."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">participle of verb</td>
+ :      <td>
+ :        An adjective that is the participle of some verb,
+ :        e.g., "breaking" is the <em>participle of the verb</em> "break."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td>pertainym</td>
+ :      <td>
+ :        An adjective that classifies its noun,
+ :        e.g., "musical" is a <em>pertainym</em> in "musical instrument."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">similar to</td>
+ :      <td>
+ :        Similar, though not necessarily interchangeable, adjectives.
+ :        For example, "shiny" is <em>similar to</em> "bright",
+ :        but they have subtle differences.
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">substance holonym</td>
+ :      <td>
+ :        A word that denotes a larger whole containing some constituent
+ :        substance, e.g., "bread" is a <em>substance holonym</em> of "flour."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">substance meronym</td>
+ :      <td>
+ :        A word that denotes a constituent substance of some larger whole,
+ :        e.g., "flour" is a <em>substance meronym</em> of "bread."
+ :      </td>
+ :    </tr>
+ :    <tr>
+ :      <td nowrap="nowrap">verb group</td>
+ :      <td>
+ :        A verb that is a member of a group of similar verbs,
+ :        e.g., "live" is in the <em>verb group</em>
+ :        of "dwell", "live", "inhabit", etc.
+ :      </td>
+ :    </tr>
+ :  </table>
+ : <h2>Notes on tokenization</h2>
+ : For general information about Zorba's implementation of tokenization,
+ : including what constitutes a token,
+ : see the <a href="http://www.zorba-xquery.com/html/documentation/latest/zorba/ft_tokenizer">Full Text Tokenizer</a> documentation.
+ :)
+
+(:===========================================================================:)
+
+module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+import schema namespace ft-schema =
+  "http://www.zorba-xquery.com/modules/full-text";
+
+declare namespace err = "http://www.w3.org/2005/xqt-errors";
+declare namespace zerr = "http://www.zorba-xquery.com/errors";
+
+declare namespace ver = "http://www.zorba-xquery.com/options/versioning";
+declare option ver:module-version "2.0";
+
+(:===========================================================================:)
+
+(:~
+ : Predeclared constant for the Danish
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-da as xs:language := xs:language("da");
+
+(:~
+ : Predeclared constant for the German
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-de as xs:language := xs:language("de");
+
+(:~
+ : Predeclared constant for the English
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-en as xs:language := xs:language("en");
+
+(:~
+ : Predeclared constant for the Spanish
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-es as xs:language := xs:language("es");
+
+(:~
+ : Predeclared constant for the Finnish
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-fi as xs:language := xs:language("fi");
+
+(:~
+ : Predeclared constant for the French
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-fr as xs:language := xs:language("fr");
+
+(:~
+ : Predeclared constant for the Hungarian
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-hu as xs:language := xs:language("hu");
+
+(:~
+ : Predeclared constant for the Italian
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-it as xs:language := xs:language("it");
+
+(:~
+ : Predeclared constant for the Dutch
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-nl as xs:language := xs:language("nl");
+
+(:~
+ : Predeclared constant for the Norwegian
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-no as xs:language := xs:language("no");
+
+(:~
+ : Predeclared constant for the Portuguese
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-pt as xs:language := xs:language("pt");
+
+(:~
+ : Predeclared constant for the Romanian
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-ro as xs:language := xs:language("ro");
+
+(:~
+ : Predeclared constant for the Russian
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-ru as xs:language := xs:language("ru");
+
+(:~
+ : Predeclared constant for the Swedish
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-sv as xs:language := xs:language("sv");
+
+(:~
+ : Predeclared constant for the Turkish
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language"><code>xs:language</code></a>.
+ :)
+declare variable $ft:lang-tr as xs:language := xs:language("tr");
+
+(:===========================================================================:)
+
+(:~
+ : Gets the current
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>:
+ : either the language specified by the
+ : <code><a href="http://www.w3.org/TR/xpath-full-text-10/#doc-xquery10-FTOptionDecl">declare ft-option using</a>
+ : <a href="http://www.w3.org/TR/xpath-full-text-10/#ftlanguageoption">language</a></code>
+ : statement (if any)
+ : or the one returned by <code>ft:host-lang()</code> (if none).
+ :
+ : @return said language.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-current-lang-true-1.xq
+ :)
+declare function ft:current-lang()
+  as xs:language external;
+
+(:~
+ : Gets the host's current
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>.
+ : The "host" is the computer on which Zorba is running.
+ : The host's current language is obtained as follows:
+ :  <ul>
+ :    <li>
+ :      For *nix systems:
+ :      <ol>
+ :        <li>
+ :          If <a href="http://www.cplusplus.com/reference/clibrary/clocale/setlocale/"><code>setlocale</code>(3)</a> returns non-null,
+ :          the language corresponding to that locale is used.
+ :        </li>
+ :        <li>
+ :          Else, if the <code>LANG</code> environment variable is set,
+ :          that language is used.
+ :        </li>
+ :        <li>
+ :          Otherwise, there is no default language.
+ :        </li>
+ :      </ol>
+ :    </li>
+ :    <li>
+ :      For Windows systems,
+ :      the language corresponding to the locale returned by the
+ :      <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/dd318101(v=vs.85).aspx"><code>GetLocaleInfo()</code></a>
+ :      function is used.
+ :    </li>
+ :  </ul>
+ :
+ : @return said language.
+ :)
+declare function ft:host-lang()
+  as xs:language external;
+
+(:~
+ : Checks whether the given
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is supported for stemming.
+ :
+ : @param $lang The language to check.
+ : @return <code>true</code> only if the language is supported.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xq
+ :)
+declare function ft:is-stem-lang-supported( $lang as xs:language )
+  as xs:boolean external;
+
+(:~
+ : Checks whether the given word is a stop-word.
+ :
+ : @param $word The word to check.
+ : @param $lang The
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$word</code>.
+ : @return <code>true</code> only if <code>$word</code> is a stop-word.
+ : @error zerr:ZXQP8405 if <code>$lang</code> is not supported for stop-words.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-1.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-3.xq
+ :)
+declare function ft:is-stop-word( $word as xs:string, $lang as xs:language )
+  as xs:boolean external;
+
+(:~
+ : Checks whether the given word is a stop-word.
+ :
+ : @param $word The word to check.
+ : The word's <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @return <code>true</code> only if <code>$word</code> is a stop-word.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
+ : general.
+ : @error zerr:ZXQP8405 if <code>ft:current-lang()</code> is not supported for
+ : stop-words specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-2.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-4.xq
+ :)
+declare function ft:is-stop-word( $word as xs:string )
+  as xs:boolean external;
+
+(:~
+ : Checks whether the given
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is supported for stop words.
+ :
+ : @param $lang The language to check.
+ : @return <code>true</code> only if the language is supported.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xq
+ :)
+declare function ft:is-stop-word-lang-supported( $lang as xs:language )
+  as xs:boolean external;
+
+(:~
+ : Checks whether the given
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is supported for look-up using the default thesaurus.
+ :
+ : @param $lang The language to check.
+ : @return <code>true</code> only if the language is supported.
+ :)
+declare function ft:is-thesaurus-lang-supported( $lang as xs:language )
+  as xs:boolean external;
+
+(:~
+ : Checks whether the given
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is supported for look-up using the thesaurus specified by the given URI.
+ :
+ : @param $uri The URI specifying the thesaurus to use.
+ : @param $lang The language to check.
+ : @return <code>true</code> only if the language is supported.
+ : @error err:FTST0018 if <code>$uri</code> refers to a thesaurus
+ : that is not found in the statically known thesauri.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xq
+ :)
+declare function ft:is-thesaurus-lang-supported( $uri as xs:string,
+                                                 $lang as xs:language )
+  as xs:boolean external;
+
+(:~
+ : Checks whether the given
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is supported for tokenization.
+ :
+ : @param $lang The language to check.
+ : @return <code>true</code> only if the language is supported.
+ :)
+declare function ft:is-tokenizer-lang-supported( $lang as xs:language )
+  as xs:boolean external;
+
+(:~
+ : Stems the given word.
+ :
+ : @param $word The word to stem.
+ : @param $lang The
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$word</code>.
+ : @return the stem of <code>$word</code>.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error zerr:ZXQP8404 if <code>$lang</code> is not supported for stemming
+ : specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-stem-1.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-stem-2.xq
+ :)
+declare function ft:stem( $word as xs:string, $lang as xs:language )
+  as xs:string external;
+
+(:~
+ : Stems the given word.
+ :
+ : @param $word The word to stem.
+ : The word's <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @return the stem of <code>$word</code>.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
+ : general.
+ : @error zerr:ZXQP8404 if <code>ft:current-lang()</code> is not supported for
+ : stemming specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-stem-3.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-stem-4.xq
+ :)
+declare function ft:stem( $word as xs:string )
+  as xs:string external;
+
+(:~
+ : Strips all diacritical marks from all characters.
+ :
+ : @param $string The string to strip diacritical marks from.
+ : @return <code>$string</code> with diacritical marks stripped.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-strip-diacritics-1.xq
+ :)
+declare function ft:strip-diacritics( $string as xs:string )
+  as xs:string external;
+
+(:~
+ : Looks up the given phrase in the default thesaurus.
+ :
+ : @param $phrase The phrase to look up.
+ : The phrase's
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @return the original and related phrases.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
+ : general.
+ : @error zerr:ZXQP8401 if the thesaurus data file's version is not supported
+ : by the currently running version of Zorba.
+ : @error zerr:ZXQP8402 if the thesaurus data file's endianness does not match
+ : that of the CPU on which Zorba is currently running.
+ : @error zerr:ZXQP8403 if there was an error reading the thesaurus data.
+ : @error zerr:ZXQP8406 if <code>ft:current-lang()</code> is not supported for
+ : thesaurus look-up specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-1.xq
+ :)
+declare function ft:thesaurus-lookup( $phrase as xs:string )
+  as xs:string+ external;
+
+(:~
+ : Looks up the given phrase in the thesaurus specified by the given URI.
+ :
+ : @param $uri The URI specifying the thesaurus to use.
+ : @param $phrase The phrase to look up.
+ : @param $lang The
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$phrase</code>.
+ : @return the original and related phrases.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error err:FTST0018 if <code>$uri</code> refers to a thesaurus
+ : that is not found in the statically known thesauri.
+ : @error zerr:ZOSE0001 if the thesaurus data file could not be found.
+ : @error zerr:ZOSE0002 if the thesaurus data file is not a plain file.
+ : @error zerr:ZXQP8401 if the thesaurus data file's version is not supported
+ : by the currently running version of Zorba.
+ : @error zerr:ZXQP8402 if the thesaurus data file's endianness does not match
+ : that of the CPU on which Zorba is currently running.
+ : @error zerr:ZXQP8403 if there was an error reading the thesaurus data file.
+ : @error zerr:ZXQP8406 if <code>$lang</code> is not supported for thesaurus
+ : look-up specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.xq
+ :)
+declare function ft:thesaurus-lookup( $uri as xs:string, $phrase as xs:string,
+                                      $lang as xs:language )
+  as xs:string+ external;
+
+(:~
+ : Looks up the given phrase in a thesaurus.
+ :
+ : @param $uri The URI specifying the thesaurus to use.
+ : @param $phrase The phrase to look up.
+ : The phrase's
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @return the original and related phrases.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is unsupported in
+ : general.
+ : @error err:FTST0018 if <code>$uri</code> refers to a thesaurus
+ : that is not found in the statically known thesauri.
+ : @error zerr:ZOSE0001 if the thesaurus data file could not be found.
+ : @error zerr:ZOSE0002 if the thesaurus data file is not a plain file.
+ : @error zerr:ZXQP8401 if the thesaurus data file's version is not supported
+ : by the currently running version of Zorba.
+ : @error zerr:ZXQP8402 if the thesaurus data file's endianness does not match
+ : that of the CPU on which Zorba is currently running.
+ : @error zerr:ZXQP8403 if there was an error reading the thesaurus data file.
+ : @error zerr:ZXQP8406 if <code>ft:current-lang()</code> is not supported for
+ : thesaurus look-up specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.xq
+ :)
+declare function ft:thesaurus-lookup( $uri as xs:string, $phrase as xs:string )
+  as xs:string+ external;
+
+(:~
+ : Looks up the given phrase in a thesaurus.
+ :
+ : @param $uri The URI specifying the thesaurus to use.
+ : @param $phrase The phrase to look up.
+ : @param $lang The
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$phrase</code>.
+ : @param $relationship The relationship the results are to have to
+ : <code>$phrase</code>.
+ : @return the original and related phrases.
+ : @error err:FTST0018 if <code>$uri</code> refers to a thesaurus
+ : that is not found in the statically known thesauri.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error zerr:ZOSE0001 if the thesaurus data file could not be found.
+ : @error zerr:ZOSE0002 if the thesaurus data file is not a plain file.
+ : @error zerr:ZXQP8401 if the thesaurus data file's version is not supported
+ : by the currently running version of Zorba.
+ : @error zerr:ZXQP8402 if the thesaurus data file's endianness does not match
+ : that of the CPU on which Zorba is currently running.
+ : @error zerr:ZXQP8403 if there was an error reading the thesaurus data file.
+ : @error zerr:ZXQP8406 if <code>$lang</code> is not supported for thesaurus
+ : look-up specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.xq
+ :)
+declare function ft:thesaurus-lookup( $uri as xs:string, $phrase as xs:string,
+                                      $lang as xs:language,
+                                      $relationship as xs:string )
+  as xs:string+ external;
+
+(:~
+ : Looks up the given phrase in a thesaurus.
+ :
+ : @param $uri The URI specifying the thesaurus to use.
+ : @param $phrase The phrase to look up.
+ : @param $lang The
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$phrase</code>.
+ : @param $relationship The relationship the results are to have to
+ : <code>$phrase</code>.
+ : @param $level-least The minimum number of levels within the thesaurus to be
+ : traversed.
+ : @param $level-most The maximum number of levels within the thesaurus to be
+ : traversed.
+ : @return the original and related phrases.
+ : @error err:FOCA0003 if <code>$level-least</code> or
+ : <code>$level-most</code> is negative or too large.
+ : @error err:FTST0018 if <code>$uri</code> refers to a thesaurus
+ : that is not found in the statically known thesauri.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error zerr:ZOSE0001 if the thesaurus data file could not be found.
+ : @error zerr:ZOSE0002 if the thesaurus data file is not a plain file.
+ : @error zerr:ZXQP8401 if the thesaurus data file's version is not supported
+ : by the currently running version of Zorba.
+ : @error zerr:ZXQP8402 if the thesaurus data file's endianness does not match
+ : that of the CPU on which Zorba is currently running.
+ : @error zerr:ZXQP8403 if there was an error reading the thesaurus data file.
+ : @error zerr:ZXQP8406 if <code>$lang</code> is not supported for thesaurus
+ : look-up specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.xq
+ :)
+declare function ft:thesaurus-lookup( $uri as xs:string, $phrase as xs:string,
+                                      $lang as xs:language,
+                                      $relationship as xs:string,
+                                      $level-least as xs:integer,
+                                      $level-most as xs:integer )
+  as xs:string+ external;
+
+(:~
+ : Tokenizes the given node.
+ :
+ : @param $node The node to tokenize.
+ : @param $lang The default
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$node</code>.
+ : @return a (possibly empty) sequence of tokens.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-1.xq
+ :)
+declare function ft:tokenize( $node as node(), $lang as xs:language )
+  as element(ft-schema:token)* external;
+
+(:~
+ : Tokenizes the given node.
+ :
+ : @param $node The node to tokenize.
+ : The document's default
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @return a (possibly empty) sequence of tokens.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
+ : general.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-2.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-3.xq
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-4.xq
+ :)
+declare function ft:tokenize( $node as node() )
+  as element(ft-schema:token)* external;
+
+(:~
+ : Tokenizes the given string.
+ :
+ : @param $string The string to tokenize.
+ : @param $lang The default
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : of <code>$string</code>.
+ : @return a (possibly empty) sequence of tokens.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error zerr:ZXQP8407 if <code>$lang</code> is not supported for
+ : tokenization specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-1.xq
+ :)
+declare function ft:tokenize-string( $string as xs:string,
+                                     $lang as xs:language )
+  as xs:string* external;
+
+(:~
+ : Tokenizes the given string.
+ :
+ : @param $string The string to tokenize.
+ : The string's default
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @return a (possibly empty) sequence of tokens.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
+ : general.
+ : @error zerr:ZXQP8407 if <code>ft:current-lang()</code> is not supported for
+ : tokenization specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-2.xq
+ :)
+declare function ft:tokenize-string( $string as xs:string )
+  as xs:string* external;
+
+(:~
+ : Gets properties of the tokenizer for the given
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>.
+ :
+ : @param $lang The language of the tokenizer to get the properties of.
+ : @return said properties.
+ : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error zerr:ZXQP8407 if <code>$lang</code> is not supported for
+ : tokenization specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-1.xq
+ :)
+declare function ft:tokenizer-properties( $lang as xs:language )
+  as element(ft-schema:tokenizer-properties) external;
+
+(:~
+ : Gets properties of the tokenizer for the
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : returned by <code>ft:current-lang()</code>.
+ :
+ : @return said properties.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
+ : general.
+ : @error zerr:ZXQP8407 if <code>ft:current-lang()</code> is not supported for
+ : tokenization specifically.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-2.xq
+ :)
+declare function ft:tokenizer-properties()
+  as element(ft-schema:tokenizer-properties) external;
+
+(:===========================================================================:)
+
+(: vim:set et sw=2 ts=2: :)

=== added file 'modules/com/zorba-xquery/www/modules/full-text.xsd'
--- modules/com/zorba-xquery/www/modules/full-text.xsd	1970-01-01 00:00:00 +0000
+++ modules/com/zorba-xquery/www/modules/full-text.xsd	2012-04-24 22:19:24 +0000
@@ -0,0 +1,134 @@
+<?xml version="1.0"?>
+<!--
+ ! Copyright 2006-2011 The FLWOR Foundation.
+ ! 
+ ! Licensed under the Apache License, Version 2.0 (the "License");
+ ! you may not use this file except in compliance with the License.
+ ! You may obtain a copy of the License at
+ ! 
+ ! http://www.apache.org/licenses/LICENSE-2.0
+ ! 
+ ! Unless required by applicable law or agreed to in writing, software
+ ! distributed under the License is distributed on an "AS IS" BASIS,
+ ! WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ ! See the License for the specific language governing permissions and
+ ! limitations under the License.
+-->
+
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+  targetNamespace="http://www.zorba-xquery.com/modules/full-text"
+  xmlns="http://www.zorba-xquery.com/modules/full-text"
+  elementFormDefault="qualified"
+  attributeFormDefault="unqualified">
+
+  <!--======================================================================-->
+
+  <xs:element name="compare-options">
+    <xs:complexType>
+      <xs:attributeGroup ref="compare-attributes"/>
+    </xs:complexType>
+  </xs:element>
+
+  <xs:attributeGroup name="compare-attributes">
+    <xs:attribute name="case" type="sensitivity" default="insensitive"/>
+    <xs:attribute name="diacritics" type="sensitivity" default="insensitive"/>
+    <xs:attribute name="stem" type="yes-no-both" default="no"/>
+  </xs:attributeGroup>
+
+  <xs:simpleType name="sensitivity">
+    <xs:restriction base="xs:string">
+      <xs:enumeration value="insensitive"/>
+      <xs:enumeration value="sensitive"/>
+      <xs:enumeration value="both"/>
+    </xs:restriction>
+  </xs:simpleType>
+
+  <xs:simpleType name="yes-no-both">
+    <xs:restriction base="xs:string">
+      <xs:enumeration value="yes"/>
+      <xs:enumeration value="no"/>
+      <xs:enumeration value="both"/>
+    </xs:restriction>
+  </xs:simpleType>
+
+  <xs:complexType name="boolean-value">
+    <xs:attribute name="value" type="xs:boolean" use="required"/>
+  </xs:complexType>
+
+  <!--======================================================================-->
+
+  <xs:element name="token">
+    <xs:complexType>
+
+      <!-- The language of the token. -->
+      <xs:attribute name="lang" type="xs:language"/>
+
+      <!-- The sentence number. -->
+      <xs:attribute name="sentence" type="xs:nonNegativeInteger" use="required"/>
+
+      <!-- The paragraph number. -->
+      <xs:attribute name="paragraph" type="xs:nonNegativeInteger" use="required"/>
+
+      <!-- The token string value. -->
+      <xs:attribute name="value" type="xs:string" use="required"/>
+
+      <!--
+       ! A reference to the originating node.  If the token occurred within an
+       ! element, the reference refers to the text node.  If the token occurred
+       ! within an attribute, the reference refers to the attribute node.
+      -->
+      <xs:attribute name="node-ref" type="xs:anyURI"/>
+
+    </xs:complexType>
+  </xs:element>
+
+  <!--======================================================================-->
+
+  <xs:element name="tokenizer-properties">
+    <xs:complexType>
+      <xs:all>
+
+        <!--
+         ! If true, XML comments separate tokens.  (No example can be provided
+         ! here because it is illegal to nest an XML comment inside an XML
+         ! comment.)
+        -->
+        <xs:element name="comments-separate-tokens" type="boolean-value"/>
+
+        <!--
+         ! If true, XML elements separate tokens.  For example,
+         ! <b>B</b>old would be 2 tokens instead of 1.
+        -->
+        <xs:element name="elements-separate-tokens" type="boolean-value"/>
+
+        <!--
+         ! If true, XML processing instructions separate tokens.  For example,
+         ! net<?PI pi?>work would be 2 tokens instead of 1.
+        -->
+        <xs:element name="processing-instructions-separate-tokens" type="boolean-value"/>
+
+        <!--
+         ! The list of languages that the tokenizer can tokenize.
+        -->
+        <xs:element name="supported-languages">
+          <xs:complexType>
+            <xs:sequence>
+              <xs:element name="lang" type="xs:language" maxOccurs="unbounded"/>
+            </xs:sequence>
+          </xs:complexType>
+        </xs:element>
+
+      </xs:all>
+
+      <!--
+       !  The tokenizer's identifying URI.
+      -->
+      <xs:attribute name="uri" type="xs:anyURI"/>
+
+    </xs:complexType>
+  </xs:element>
+
+  <!--======================================================================-->
+
+</xs:schema>
+<!-- vim:set et sw=2 ts=2: -->

=== modified file 'modules/com/zorba-xquery/www/modules/http-client.xq.src/http_request_handler.cpp'
--- modules/com/zorba-xquery/www/modules/http-client.xq.src/http_request_handler.cpp	2012-04-24 12:39:38 +0000
+++ modules/com/zorba-xquery/www/modules/http-client.xq.src/http_request_handler.cpp	2012-04-24 22:19:24 +0000
@@ -39,7 +39,6 @@
       theSerStream(NULL),
       thePost(NULL),
       theLast(NULL),
-      theLastSerializerOptions(NULL),
       theIsHeadRequest(false)
   {
     theHeaderLists.push_back(NULL);
@@ -260,6 +259,7 @@
   void HttpRequestHandler::cleanUpBody()
   {
     delete theSerStream;
+    theSerStream = 0;
     theLastBodyHadContent = false;
   }
 

=== modified file 'modules/com/zorba-xquery/www/modules/pregenerated/errors.xq'
--- modules/com/zorba-xquery/www/modules/pregenerated/errors.xq	2012-04-24 12:39:38 +0000
+++ modules/com/zorba-xquery/www/modules/pregenerated/errors.xq	2012-04-24 22:19:24 +0000
@@ -188,6 +188,7 @@
 
 (:~
  :
+ : The thesaurus data file's endianness does not match that of the CPU.
  : 
 :)
 declare variable $zerr:ZXQP8402 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP8402");
@@ -201,6 +202,22 @@
 
 (:~
 :)
+declare variable $zerr:ZXQP8404 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP8404");
+
+(:~
+:)
+declare variable $zerr:ZXQP8405 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP8405");
+
+(:~
+:)
+declare variable $zerr:ZXQP8406 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP8406");
+
+(:~
+:)
+declare variable $zerr:ZXQP8407 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQP8407");
+
+(:~
+:)
 declare variable $zerr:ZXQD0001 as xs:QName := fn:QName($zerr:NS, "zerr:ZXQD0001");
 
 (:~

=== modified file 'modules/com/zorba-xquery/www/modules/xqdoc2xhtml/index.xq'
--- modules/com/zorba-xquery/www/modules/xqdoc2xhtml/index.xq	2012-04-24 12:39:38 +0000
+++ modules/com/zorba-xquery/www/modules/xqdoc2xhtml/index.xq	2012-04-24 22:19:24 +0000
@@ -839,9 +839,7 @@
     if(fn:matches($specLine, "Args:")) then
       let $arg_split := fn:substring-after($specLine, "-x")
       return
-      if(fn:string-length($arg_split) eq 0) then
-        fn:error($err:UE008, fn:concat("Unknown Args: in spec file for example <", $exampleSource,"> .
-        Add the example input and expected output by hand in the example, in a commentary that should also include the word 'output'."))
+      if(fn:string-length($arg_split) eq 0) then string-join($specLines, " ")
       else
         let $var_value := fn:tokenize($arg_split, "=")
         let $var_name := fn:normalize-space(fn:replace($var_value[1], ":$", ""))

=== modified file 'scripts/zt-wn-get'
--- scripts/zt-wn-get	2012-04-24 12:39:38 +0000
+++ scripts/zt-wn-get	2012-04-24 22:19:24 +0000
@@ -22,7 +22,7 @@
   echo 'Arguments: [--workdir <workdir>] [--builddir <builddir>]'
   echo '           [--thesaurusurl <thesaurusurl>]'
   echo '           <zorba_repository>'
-  echo '<zorba_repository> is the top-level SVN working copy.'
+  echo '<zorba_repository> is the top-level BZR working copy.'
   echo '<workdir> is a temp directory to download and unzip XQTS (default: /tmp).'
   echo '<builddir> is the directory Zorba has been built in'
   echo '           (default: <zorba_repository>/build)'
@@ -71,8 +71,8 @@
 echo Build dir is at $BUILD
 
 # Compile thesaurus to binary format
-mkdir -p $BUILD/test/rbkt/thesauri
-THESAURUS_DEST="$BUILD/test/rbkt/thesauri/wordnet-en.zth"
+mkdir -p $BUILD/LIB_PATH/edu/princeton/wordnet
+THESAURUS_DEST="$BUILD/LIB_PATH/edu/princeton/wordnet/wordnet-en.zth"
 echo "Compiling thesaurus to $THESAURUS_DEST..."
 untar_dir=`mktemp -d "$WORK/thesaurus.XXXXXX"`
 cd "$untar_dir"

=== modified file 'src/api/CMakeLists.txt'
--- src/api/CMakeLists.txt	2012-04-24 12:39:38 +0000
+++ src/api/CMakeLists.txt	2012-04-24 22:19:24 +0000
@@ -62,8 +62,9 @@
 IF (NOT ZORBA_NO_FULL_TEXT)
   LIST(APPEND API_SRCS
     stemmer.cpp
-    stemmer_wrapper.cpp
-    thesaurus.cpp)
+    stemmer_wrappers.cpp
+    thesaurus.cpp
+    thesaurus_wrappers.cpp)
 ENDIF (NOT ZORBA_NO_FULL_TEXT)
 
 ADD_SRC_SUBFOLDER(API_SRCS serialization API_SERIALIZATION_SRCS)

=== modified file 'src/api/staticcontextimpl.cpp'
--- src/api/staticcontextimpl.cpp	2012-04-24 12:39:38 +0000
+++ src/api/staticcontextimpl.cpp	2012-04-24 22:19:24 +0000
@@ -42,8 +42,8 @@
 #include "context/static_context.h"
 #include "context/static_context_consts.h"
 #ifndef ZORBA_NO_FULL_TEXT
-#include "context/stemmer_wrappers.h"
-#include "context/thesaurus_wrappers.h"
+#include "stemmer_wrappers.h"
+#include "thesaurus_wrappers.h"
 #endif /* ZORBA_NO_FULL_TEXT */
 #include "uri_resolver_wrappers.h"
 
@@ -65,7 +65,6 @@
 
 namespace zorba {
 
-
 /*******************************************************************************
   Create a StaticContextImpl obj as well as an internal static_context obj S.
   S is created as a child of the zorba root sctx. This constructor is used

=== renamed file 'src/api/stemmer_wrapper.cpp' => 'src/api/stemmer_wrappers.cpp'
--- src/api/stemmer_wrapper.cpp	2012-04-24 12:39:38 +0000
+++ src/api/stemmer_wrappers.cpp	2012-04-24 22:19:24 +0000
@@ -23,7 +23,7 @@
 #include "diagnostics/assert.h"
 #include "util/cxx_util.h"
 
-#include "stemmer_wrapper.h"
+#include "stemmer_wrappers.h"
 
 using namespace zorba::locale;
 
@@ -32,8 +32,8 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
-StemmerWrapper::StemmerWrapper( zorba::Stemmer::ptr p ) :
-  api_stemmer_( std::move( p ) )
+StemmerWrapper::StemmerWrapper( zorba::Stemmer::ptr api_stemmer ) :
+  api_stemmer_( std::move( api_stemmer ) )
 {
   ZORBA_ASSERT( api_stemmer_.get() );
 }
@@ -42,6 +42,12 @@
   api_stemmer_.release()->destroy();
 }
 
+void StemmerWrapper::properties( Properties *props ) const {
+  zorba::Stemmer::Properties api_props;
+  api_stemmer_->properties( &api_props );
+  props->uri = api_props.uri;
+}
+
 void StemmerWrapper::stem( zstring const &word, iso639_1::type lang,
                            zstring *result ) const {
   String const api_word( Unmarshaller::newString( word ) );
@@ -52,19 +58,22 @@
 ///////////////////////////////////////////////////////////////////////////////
 
 StemmerProviderWrapper::
-StemmerProviderWrapper( zorba::StemmerProvider const *p ) :
-  api_stemmer_provider_( p )
+StemmerProviderWrapper( zorba::StemmerProvider const *api_stemmer_provider ) :
+  api_stemmer_provider_( api_stemmer_provider )
 {
   ZORBA_ASSERT( api_stemmer_provider_ );
 }
 
-Stemmer::ptr
-StemmerProviderWrapper::get_stemmer( iso639_1::type lang ) const {
-  zorba::Stemmer::ptr p( api_stemmer_provider_->getStemmer( lang ) );
-  Stemmer::ptr result;
-  if ( p.get() )
-    result.reset( new StemmerWrapper( std::move( p ) ) );
-  return std::move( result );
+bool StemmerProviderWrapper::getStemmer( iso639_1::type lang,
+                                         Stemmer::ptr *result ) const {
+  zorba::Stemmer::ptr api_ptr;
+  zorba::Stemmer::ptr *const api_ptr_ptr = result ? &api_ptr : nullptr;
+  if ( api_stemmer_provider_->getStemmer( lang, api_ptr_ptr ) ) {
+    if ( result )
+      result->reset( new StemmerWrapper( std::move( api_ptr ) ) );
+    return true;
+  }
+  return false;
 }
 
 ///////////////////////////////////////////////////////////////////////////////
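
For readers following the wrapper change above, here is a minimal, self-contained sketch of the "probe or fetch" shape that the new getStemmer() signature takes (getThesaurus() follows the same shape later in this diff). The return value says whether a language is supported, and the optional out-parameter receives an instance only when the caller asks for one. Stemmer, StemmerProvider, ToyStemmer, ToyProvider, and stem_if_supported below are hypothetical stand-ins, not the real Zorba classes. Language codes are plain strings here, and std::unique_ptr is assumed as a stand-in for the Stemmer::ptr type.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a stemmer interface (not the Zorba class).
struct Stemmer {
  virtual ~Stemmer() {}
  virtual std::string stem( std::string const &word ) const = 0;
};

// Hypothetical provider using the same "bool + optional out-pointer" shape
// as the getStemmer() signature in the diff.
struct StemmerProvider {
  virtual ~StemmerProvider() {}
  virtual bool getStemmer( std::string const &lang,
                           std::unique_ptr<Stemmer> *result = nullptr ) const = 0;
};

// Toy stemmer: strips a single trailing 's'.
struct ToyStemmer : Stemmer {
  std::string stem( std::string const &word ) const override {
    if ( word.size() > 1 && word[ word.size() - 1 ] == 's' )
      return word.substr( 0, word.size() - 1 );
    return word;
  }
};

// Toy provider: supports only "en".
struct ToyProvider : StemmerProvider {
  bool getStemmer( std::string const &lang,
                   std::unique_ptr<Stemmer> *result = nullptr ) const override {
    if ( lang != "en" )
      return false;               // unsupported: *result is left untouched
    if ( result )
      result->reset( new ToyStemmer );
    return true;
  }
};

// Stems word into *out if lang is supported; otherwise leaves *out untouched.
bool stem_if_supported( StemmerProvider const &provider,
                        std::string const &lang,
                        std::string const &word, std::string *out ) {
  assert( out );
  std::unique_ptr<Stemmer> stemmer;
  if ( !provider.getStemmer( lang, &stemmer ) )
    return false;
  *out = stemmer->stem( word );
  return true;
}
```

A caller that only needs to know whether a language is supported can call provider.getStemmer( "en" ) with no second argument, which avoids constructing a stemmer at all.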

=== renamed file 'src/api/stemmer_wrapper.h' => 'src/api/stemmer_wrappers.h'
--- src/api/stemmer_wrapper.h	2012-04-24 12:39:38 +0000
+++ src/api/stemmer_wrappers.h	2012-04-24 22:19:24 +0000
@@ -35,6 +35,7 @@
 
   // inherited
   void destroy() const;
+  void properties( Properties* ) const;
   void stem( zstring const &word, locale::iso639_1::type lang,
              zstring *result ) const;
 private:
@@ -50,7 +51,7 @@
   }
 
   // inherited
-  Stemmer::ptr get_stemmer( locale::iso639_1::type lang ) const;
+  bool getStemmer( locale::iso639_1::type, Stemmer::ptr* = 0 ) const;
 private:
   zorba::StemmerProvider const *const api_stemmer_provider_;
 };

=== modified file 'src/api/thesaurus.cpp'
--- src/api/thesaurus.cpp	2012-04-24 12:39:38 +0000
+++ src/api/thesaurus.cpp	2012-04-24 22:19:24 +0000
@@ -25,9 +25,11 @@
   // out-of-line since it's virtual
 }
 
-//Thesaurus::iterator::~iterator() {
-//  // out-of-line since it's virtual
-//}
+#if 0
+Thesaurus::iterator::~iterator() {
+  // out-of-line since it's virtual
+}
+#endif
 
 ///////////////////////////////////////////////////////////////////////////////
 

=== renamed file 'src/context/thesaurus_wrappers.cpp' => 'src/api/thesaurus_wrappers.cpp'
--- src/context/thesaurus_wrappers.cpp	2012-04-24 12:39:38 +0000
+++ src/api/thesaurus_wrappers.cpp	2012-04-24 22:19:24 +0000
@@ -87,6 +87,27 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
+ThesaurusProviderWrapper::
+ThesaurusProviderWrapper( zorba::ThesaurusProvider const *p ) :
+  api_thesaurus_provider_( p )
+{
+  ZORBA_ASSERT( api_thesaurus_provider_ );
+}
+
+bool ThesaurusProviderWrapper::getThesaurus( iso639_1::type lang,
+                                             Thesaurus::ptr *result ) const {
+  zorba::Thesaurus::ptr api_ptr;
+  zorba::Thesaurus::ptr *const api_ptr_ptr = result ? &api_ptr : nullptr;
+  if ( api_thesaurus_provider_->getThesaurus( lang, api_ptr_ptr ) ) {
+    if ( result )
+      result->reset( new ThesaurusWrapper( std::move( api_ptr ) ) );
+    return true;
+  }
+  return false;
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
 } // namespace internal
 } // namespace zorba
 

=== renamed file 'src/context/thesaurus_wrappers.h' => 'src/api/thesaurus_wrappers.h'
--- src/context/thesaurus_wrappers.h	2012-04-24 12:39:38 +0000
+++ src/api/thesaurus_wrappers.h	2012-04-24 22:19:24 +0000
@@ -22,6 +22,7 @@
 #ifndef ZORBA_NO_FULL_TEXT
 
 #include <zorba/thesaurus.h>
+
 #include "runtime/full_text/thesaurus.h"
 
 namespace zorba {
@@ -54,6 +55,17 @@
   zorba::Thesaurus::ptr api_thesaurus_;
 };
 
+class ThesaurusProviderWrapper : public ThesaurusProvider {
+public:
+  ThesaurusProviderWrapper( zorba::ThesaurusProvider const* );
+
+  // inherited
+  bool getThesaurus( locale::iso639_1::type, Thesaurus::ptr* ) const;
+
+private:
+  zorba::ThesaurusProvider::ptr const api_thesaurus_provider_;
+};
+
 ///////////////////////////////////////////////////////////////////////////////
 
 } // namespace internal

=== modified file 'src/api/uri_resolver_wrappers.cpp'
--- src/api/uri_resolver_wrappers.cpp	2012-04-24 12:39:38 +0000
+++ src/api/uri_resolver_wrappers.cpp	2012-04-24 22:19:24 +0000
@@ -15,24 +15,20 @@
  */
 #include "stdafx.h"
 
+#include <zorba/thesaurus.h>
+
+#include "runtime/full_text/thesaurus.h"
+
+#include "thesaurus_wrappers.h"
+#include "unmarshaller.h"
 #include "uri_resolver_wrappers.h"
 #include "uriresolverimpl.h"
-#include "unmarshaller.h"
-#include <zorba/thesaurus.h>
-#include <runtime/full_text/thesaurus.h>
-#include <context/thesaurus_wrappers.h>
 
 namespace zorba
 {
   // "Convenience" class for passing an internal EntityData object to
-  // external mappers/resolvers. This can serve as a plain EntityData or
-  // a ThesaurusEntityData. However, when there's another EntityData subclass
-  // in future, this won't work as EntityData becomes an ambiguous base class...
-#ifndef ZORBA_NO_FULL_TEXT
-  class EntityDataWrapper : public ThesaurusEntityData
-#else
+  // external mappers/resolvers.
   class EntityDataWrapper : public EntityData
-#endif /* ZORBA_NO_FULL_TEXT */
   {
   public:
     static EntityDataWrapper const* create(internal::EntityData const* aData) {
@@ -45,12 +41,7 @@
         return new EntityDataWrapper(EntityData::SCHEMA);
 #ifndef ZORBA_NO_FULL_TEXT
       case internal::EntityData::THESAURUS:
-      {
-        EntityDataWrapper* retval = new EntityDataWrapper(EntityData::THESAURUS);
-        retval->theThesaurusLang =
-            dynamic_cast<const internal::ThesaurusEntityData*>(aData)->getLanguage();
-        return retval;
-      }
+        return new EntityDataWrapper(EntityData::THESAURUS);
       case internal::EntityData::STOP_WORDS:
         return new EntityDataWrapper(EntityData::STOP_WORDS);
 #endif /* ZORBA_NO_FULL_TEXT */
@@ -67,21 +58,12 @@
       return theKind;
     }
 
-#ifndef ZORBA_NO_FULL_TEXT
-    virtual zorba::locale::iso639_1::type getLanguage() const {
-      return theThesaurusLang;
-    }
-#endif /* ZORBA_NO_FULL_TEXT */
-
   private:
     EntityDataWrapper(EntityData::Kind aKind)
       : theKind(aKind)
     {}
 
     EntityData::Kind const theKind;
-#ifndef ZORBA_NO_FULL_TEXT
-    zorba::locale::iso639_1::type theThesaurusLang;
-#endif /* ZORBA_NO_FULL_TEXT */
   };
 
   URIMapperWrapper::URIMapperWrapper(zorba::URIMapper& aUserMapper)
@@ -169,13 +151,13 @@
       }
 #ifndef ZORBA_NO_FULL_TEXT
       else {
-        Thesaurus* lUserThesaurus = dynamic_cast<Thesaurus*>(lUserPtr.get());
-        if (lUserThesaurus != NULL) {
-          // Here we pass memory ownership of the actual Thesaurus to the
-          // internal ThesaurusWrapper.
-          lRetval = new internal::ThesaurusWrapper
-              (Thesaurus::ptr(lUserThesaurus));
-          lUserPtr.release();
+        ThesaurusProvider* lUserThesaurusProvider =
+          dynamic_cast<ThesaurusProvider*>(lUserPtr.get());
+        if (lUserThesaurusProvider) {
+          // Here we pass memory ownership of the actual ThesaurusProvider to
+          // the internal ThesaurusProviderWrapper.
+          lRetval = new internal::ThesaurusProviderWrapper
+              (lUserThesaurusProvider);
         }
         else {
           assert(false);

=== modified file 'src/api/xmldatamanagerimpl.cpp'
--- src/api/xmldatamanagerimpl.cpp	2012-04-24 12:39:38 +0000
+++ src/api/xmldatamanagerimpl.cpp	2012-04-24 22:19:24 +0000
@@ -47,7 +47,7 @@
 #include "runtime/util/flowctl_exception.h"
 
 #ifndef ZORBA_NO_FULL_TEXT
-#include "stemmer_wrapper.h"
+#include "stemmer_wrappers.h"
 #endif /* ZORBA_NO_FULL_TEXT */
 
 namespace zorba {

=== modified file 'src/api/xmldatamanagerimpl.h'
--- src/api/xmldatamanagerimpl.h	2012-04-24 12:39:38 +0000
+++ src/api/xmldatamanagerimpl.h	2012-04-24 22:19:24 +0000
@@ -27,7 +27,7 @@
 #include "util/singleton.h"
 
 #ifndef ZORBA_NO_FULL_TEXT
-#include "stemmer_wrapper.h"
+#include "stemmer_wrappers.h"
 #endif /* ZORBA_NO_FULL_TEXT */
 
 namespace zorba {

=== modified file 'src/compiler/codegen/plan_visitor.cpp'
--- src/compiler/codegen/plan_visitor.cpp	2012-04-24 12:39:38 +0000
+++ src/compiler/codegen/plan_visitor.cpp	2012-04-24 22:19:24 +0000
@@ -250,7 +250,7 @@
 class plan_ftnode_visitor : public ftnode_visitor 
 {
 public:
-  typedef std::list<PlanIter_t> PlanIter_list_t;
+  typedef std::vector<PlanIter_t> PlanIter_list_t;
 
   plan_ftnode_visitor( plan_visitor* v ) : plan_visitor_( v ) { }
 

=== modified file 'src/compiler/expression/expr_put.cpp'
--- src/compiler/expression/expr_put.cpp	2012-04-24 12:39:38 +0000
+++ src/compiler/expression/expr_put.cpp	2012-04-24 22:19:24 +0000
@@ -41,6 +41,7 @@
 #include "compiler/expression/function_item_expr.h"
 #include "compiler/parser/parse_constants.h"
 
+#include "diagnostics/assert.h"
 #include "functions/function.h"
 #include "functions/udf.h"
 

=== modified file 'src/compiler/translator/translator.cpp'
--- src/compiler/translator/translator.cpp	2012-04-24 12:39:38 +0000
+++ src/compiler/translator/translator.cpp	2012-04-24 22:19:24 +0000
@@ -68,6 +68,7 @@
 #include "functions/signature.h"
 #include "functions/udf.h"
 #include "functions/external_function.h"
+#include "functions/func_ft_module.h"
 
 #include "annotations/annotations.h"
 
@@ -859,7 +860,7 @@
 {
   ZORBA_ASSERT(count >= 0);
 
-  ftnode *n = NULL;
+  ftnode *n = nullptr;
   while ( count-- > 0 )
   {
     ZORBA_FATAL( !theFTNodeStack.empty(), "" );
@@ -3294,6 +3295,41 @@
                                     qnameItem->getLocalName())));
         }
 
+#ifndef ZORBA_NO_FULL_TEXT
+        if (qnameItem->getNamespace() == static_context::ZORBA_FULL_TEXT_FN_NS &&
+            (qnameItem->getLocalName() == "tokenizer-properties" ||
+             qnameItem->getLocalName() == "tokenize"))
+        {
+          FunctionConsts::FunctionKind kind;
+
+          if (qnameItem->getLocalName() == "tokenizer-properties")
+          {
+            assert(numParams <= 1);
+
+            if (numParams == 1)
+              kind = FunctionConsts::FULL_TEXT_TOKENIZER_PROPERTIES_1;
+            else
+              kind = FunctionConsts::FULL_TEXT_TOKENIZER_PROPERTIES_0;
+
+            f = new full_text_tokenizer_properties(f->getSignature(), kind);
+          }
+          else 
+          {
+            assert(numParams == 1 || numParams == 2);
+
+            if (numParams == 2)
+              kind = FunctionConsts::FULL_TEXT_TOKENIZE_2;
+            else
+              kind = FunctionConsts::FULL_TEXT_TOKENIZE_1;
+
+            f = new full_text_tokenize(f->getSignature(), kind);
+          }
+
+          f->setStaticContext(theRootSctx);
+          bind_fn(f, numParams, loc);
+        }
+#endif /* ZORBA_NO_FULL_TEXT */
+
         f->setAnnotations(theAnnotations);
         theAnnotations = NULL; // important to reset
 
@@ -12512,7 +12548,7 @@
 {
   TRACE_VISIT ();
 #ifndef ZORBA_NO_FULL_TEXT
-  push_ftstack( NULL ); // sentinel
+  push_ftstack( nullptr ); // sentinel
 #endif /* ZORBA_NO_FULL_TEXT */
   return no_state;
 }
@@ -12756,7 +12792,7 @@
 void *begin_visit (const FTMildNot& v) {
   TRACE_VISIT ();
 #ifndef ZORBA_NO_FULL_TEXT
-  push_ftstack( NULL ); // sentinel
+  push_ftstack( nullptr ); // sentinel
 #endif /* ZORBA_NO_FULL_TEXT */
   return no_state;
 }
@@ -12799,7 +12835,7 @@
 void *begin_visit (const FTOr& v) {
   TRACE_VISIT ();
 #ifndef ZORBA_NO_FULL_TEXT
-  push_ftstack( NULL ); // sentinel
+  push_ftstack( nullptr ); // sentinel
 #endif /* ZORBA_NO_FULL_TEXT */
   return no_state;
 }
@@ -13058,7 +13094,7 @@
     levels = dynamic_cast<ftrange*>( pop_ftstack() );
     ZORBA_ASSERT( levels );
   } else
-    levels = NULL;
+    levels = nullptr;
 
   ftthesaurus_id *const tid = new ftthesaurus_id(
     loc, v.get_uri(), v.get_relationship(), levels
@@ -13070,7 +13106,7 @@
 void *begin_visit (const FTThesaurusOption& v) {
   TRACE_VISIT ();
 #ifndef ZORBA_NO_FULL_TEXT
-  push_ftstack( NULL ); // sentinel
+  push_ftstack( nullptr ); // sentinel
 #endif /* ZORBA_NO_FULL_TEXT */
   return no_state;
 }
@@ -13078,10 +13114,8 @@
 void end_visit (const FTThesaurusOption& v, void* /*visit_state*/) {
   TRACE_VISIT_OUT ();
 #ifndef ZORBA_NO_FULL_TEXT
-  ftthesaurus_id *default_tid = NULL;
-  if ( v.includes_default() ) {
-    default_tid = new ftthesaurus_id( loc, "##default" );
-  }
+  ftthesaurus_id *const default_tid = v.includes_default() ?
+    new ftthesaurus_id( loc, "##default" ) : nullptr;
 
   ftthesaurus_option::thesaurus_id_list_t list;
   while ( true ) {

=== modified file 'src/context/CMakeLists.txt'
--- src/context/CMakeLists.txt	2012-04-24 12:39:38 +0000
+++ src/context/CMakeLists.txt	2012-04-24 22:19:24 +0000
@@ -32,11 +32,6 @@
     features.cpp
     )
 
-IF (NOT ZORBA_NO_FULL_TEXT)
-  LIST(APPEND CONTEXT_SRCS
-    thesaurus_wrappers.cpp)
-ENDIF (NOT ZORBA_NO_FULL_TEXT)
-
 SET(CONTEXT_BUILD_SRCS
   ${CMAKE_CURRENT_BINARY_DIR}/context/root_static_context_init.cpp
   )

=== modified file 'src/context/default_url_resolvers.cpp'
--- src/context/default_url_resolvers.cpp	2012-04-24 12:39:38 +0000
+++ src/context/default_url_resolvers.cpp	2012-04-24 22:19:24 +0000
@@ -17,6 +17,7 @@
 
 
 #include "context/default_url_resolvers.h"
+#include "util/cxx_util.h"
 #include "util/uri_util.h"
 #include "util/http_util.h"
 #include "util/fs_util.h"
@@ -41,8 +42,15 @@
 HTTPURLResolver::resolveURL
 (zstring const& aUrl, EntityData const* aEntityData)
 {
-  if (aEntityData->getKind() == EntityData::COLLECTION)
-    return NULL;
+  switch ( aEntityData->getKind() ) {
+    case EntityData::COLLECTION:
+#ifndef ZORBA_NO_FULL_TEXT
+    case EntityData::THESAURUS:
+#endif /* ZORBA_NO_FULL_TEXT */
+      return nullptr;
+    default:
+      break;
+  }
 
   uri::scheme lScheme = uri::get_scheme(aUrl);
   switch (lScheme) {
@@ -82,8 +90,15 @@
 FileURLResolver::resolveURL
 (zstring const& aUrl, EntityData const* aEntityData)
 {
-  if (aEntityData->getKind() == EntityData::COLLECTION)
-    return NULL;
+  switch ( aEntityData->getKind() ) {
+    case EntityData::COLLECTION:
+#ifndef ZORBA_NO_FULL_TEXT
+    case EntityData::THESAURUS:
+#endif /* ZORBA_NO_FULL_TEXT */
+      return nullptr;
+    default:
+      break;
+  }
 
   uri::scheme lScheme = uri::get_scheme(aUrl);
   if (lScheme != uri::file) {
@@ -111,7 +126,6 @@
 {
   if (aEntityData->getKind() != EntityData::COLLECTION)
     return NULL;
-
   store::Item_t lName;
   GENV_STORE.getItemFactory()->createQName(lName, aUrl.c_str(), "", "zorba-internal-name-for-w3c-collections");
   store::Collection_t lColl = GENV_STORE.getCollection(lName.getp(), true);
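
The first two hunks above make the HTTP and file resolvers return nullptr for thesaurus requests as well as collection requests. A minimal sketch of why declining is useful follows. It assumes (this is not shown in the diff) that resolvers are consulted in order until one of them produces a result. Kind, Resolver, FileLikeResolver, ThesaurusOnlyResolver, and resolve_first are hypothetical names, and an empty string stands in for the null Resource pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the entity kinds (not the Zorba enum).
enum Kind { SCHEMA, MODULE, COLLECTION, THESAURUS };

// Hypothetical resolver interface: an empty string means "declined".
struct Resolver {
  virtual ~Resolver() {}
  virtual std::string resolve( std::string const &url, Kind kind ) const = 0;
};

// Declines collections and thesauri, like the switch added in the diff.
struct FileLikeResolver : Resolver {
  std::string resolve( std::string const &url, Kind kind ) const override {
    switch ( kind ) {
      case COLLECTION:
      case THESAURUS:
        return std::string();     // decline; a later resolver may handle it
      default:
        break;
    }
    return "file:" + url;
  }
};

// Handles only thesaurus requests.
struct ThesaurusOnlyResolver : Resolver {
  std::string resolve( std::string const &url, Kind kind ) const override {
    return kind == THESAURUS ? "thesaurus:" + url : std::string();
  }
};

// Returns the first non-empty result in chain order, or "" if all decline.
std::string resolve_first( std::vector<Resolver const*> const &chain,
                           std::string const &url, Kind kind ) {
  for ( std::size_t i = 0; i < chain.size(); ++i ) {
    assert( chain[i] );
    std::string const r = chain[i]->resolve( url, kind );
    if ( !r.empty() )
      return r;
  }
  return std::string();
}
```

With the file-like resolver placed first in the chain, a THESAURUS request falls through to the thesaurus-only resolver, while a SCHEMA request is still answered by the file-like one.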

=== modified file 'src/context/static_context.cpp'
--- src/context/static_context.cpp	2012-04-24 12:39:38 +0000
+++ src/context/static_context.cpp	2012-04-24 22:19:24 +0000
@@ -378,11 +378,16 @@
 static_context::ZORBA_XML_FN_NS =
 "http://www.zorba-xquery.com/modules/xml";
 
+#ifndef ZORBA_NO_FULL_TEXT
+const char*
+static_context::ZORBA_FULL_TEXT_FN_NS =
+"http://www.zorba-xquery.com/modules/full-text";
+#endif /* ZORBA_NO_FULL_TEXT */
+
 const char*
 static_context::ZORBA_XML_FN_OPTIONS_NS =
 "http://www.zorba-xquery.com/modules/xml-options";
 
-
 /***************************************************************************//**
   Target namespaces of zorba reserved modules
 ********************************************************************************/
@@ -451,8 +456,11 @@
             ns == ZORBA_JSON_FN_NS ||
             ns == ZORBA_FETCH_FN_NS ||
             ns == ZORBA_NODE_FN_NS ||
+#ifndef ZORBA_NO_FULL_TEXT
+            ns == ZORBA_FULL_TEXT_FN_NS ||
+#endif /* ZORBA_NO_FULL_TEXT */
             ns == ZORBA_XML_FN_NS);
-  }
+  } 
   else if (ns == W3C_FN_NS || ns == XQUERY_MATH_FN_NS)
   {
     return true;
@@ -1585,7 +1593,7 @@
     std::auto_ptr<internal::Resource>& oResource,
     zstring& oErrorMessage) const
 {
-  oErrorMessage = "";
+  oErrorMessage.clear();
 
   // Iterate through all candidate URLs...
   for (std::vector<zstring>::iterator url = aUrls.begin();
@@ -1621,7 +1629,7 @@
         }
         catch (const std::exception& e)
         {
-          if (oErrorMessage == "")
+          if (oErrorMessage.empty()) 
           {
             // Really no point in saving anything more than the first message
             oErrorMessage = e.what();

=== modified file 'src/context/static_context.h'
--- src/context/static_context.h	2012-04-24 12:39:38 +0000
+++ src/context/static_context.h	2012-04-24 22:19:24 +0000
@@ -471,6 +471,9 @@
   static const char* ZORBA_FETCH_FN_NS;
   static const char* ZORBA_NODE_FN_NS;
   static const char* ZORBA_XML_FN_NS;
+#ifndef ZORBA_NO_FULL_TEXT
+  static const char* ZORBA_FULL_TEXT_FN_NS;
+#endif /* ZORBA_NO_FULL_TEXT */
   static const char* ZORBA_XML_FN_OPTIONS_NS;
 
   // Namespaces of virtual modules declaring zorba builtin functions

=== removed file 'src/context/stemmer_wrappers.cpp'
--- src/context/stemmer_wrappers.cpp	2012-04-24 12:39:38 +0000
+++ src/context/stemmer_wrappers.cpp	1970-01-01 00:00:00 +0000
@@ -1,74 +0,0 @@
-/*
- * Copyright 2006-2008 The FLWOR Foundation.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-#include "stdafx.h"
-
-#include <zorba/config.h>
-
-#ifndef ZORBA_NO_FULL_TEXT
-
-#include "api/unmarshaller.h"
-#include "diagnostics/assert.h"
-#include "util/cxx_util.h"
-
-#include "stemmer_wrappers.h"
-
-using namespace zorba::locale;
-
-namespace zorba {
-namespace internal {
-
-///////////////////////////////////////////////////////////////////////////////
-
-StemmerWrapper::StemmerWrapper( zorba::Stemmer const *s ) :
-  api_stemmer_( s )
-{
-  ZORBA_ASSERT( api_stemmer_ );
-}
-
-void StemmerWrapper::stem( zstring const &word, iso639_1::type lang,
-                           zstring *result ) const {
-  String const api_word( Unmarshaller::newString( word ) );
-  String api_result( Unmarshaller::newString( *result ) );
-  api_stemmer_->stem( api_word, lang, &api_result );
-}
-
-///////////////////////////////////////////////////////////////////////////////
-
-StemmerProviderWrapper::
-StemmerProviderWrapper( zorba::StemmerProvider const *p ) :
-  api_stemmer_provider_( p )
-{
-  ZORBA_ASSERT( api_stemmer_provider_ );
-}
-
-Stemmer const*
-StemmerProviderWrapper::get_stemmer( iso639_1::type lang ) const {
-  zorba::Stemmer const *const s = api_stemmer_provider_->getStemmer( lang );
-  return s ? new StemmerWrapper( s ) : nullptr;
-}
-
-///////////////////////////////////////////////////////////////////////////////
-
-} // namespace internal
-} // namespace zorba
-
-#endif /* ZORBA_NO_FULL_TEXT */
-/*
- * Local variables:
- * mode: c++
- * End:
- */
-/* vim:set et sw=2 ts=2: */

=== removed file 'src/context/stemmer_wrappers.h'
--- src/context/stemmer_wrappers.h	2012-04-24 12:39:38 +0000
+++ src/context/stemmer_wrappers.h	1970-01-01 00:00:00 +0000
@@ -1,63 +0,0 @@
-/*
- * Copyright 2006-2008 The FLWOR Foundation.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-#pragma once
-#ifndef ZORBA_STEMMER_WRAPPERS_H
-#define ZORBA_STEMMER_WRAPPERS_H
-
-#include <zorba/config.h>
-
-#if 0
-#ifndef ZORBA_NO_FULL_TEXT
-
-#include <zorba/stemmer.h>
-#include "zorbautils/stemmer.h"
-
-namespace zorba {
-namespace internal {
-
-///////////////////////////////////////////////////////////////////////////////
-
-class StemmerWrapper : public Stemmer {
-public:
-  StemmerWrapper( zorba::Stemmer const *api_stemmer );
-  void stem( zstring const &word, locale::iso639_1::type lang,
-             zstring *result ) const;
-private:
-  zorba::Stemmer const *const api_stemmer_;
-};
-
-class StemmerProviderWrapper : public StemmerProvider {
-public:
-  StemmerProviderWrapper( zorba::StemmerProvider const *p );
-  Stemmer const* get_stemmer( locale::iso639_1::type lang ) const;
-private:
-  zorba::StemmerProvider const *const api_stemmer_provider_;
-};
-
-///////////////////////////////////////////////////////////////////////////////
-
-} // namespace internal
-} // namespace zorba
-
-#endif /* ZORBA_NO_FULL_TEXT */
-#endif
-#endif /* ZORBA_STEMMER_WRAPPERS_H */
-/*
- * Local variables:
- * mode: c++
- * End:
- */
-/* vim:set et sw=2 ts=2: */

=== modified file 'src/context/uri_resolver.cpp'
--- src/context/uri_resolver.cpp	2012-04-24 12:39:38 +0000
+++ src/context/uri_resolver.cpp	2012-04-24 22:19:24 +0000
@@ -117,19 +117,6 @@
   {
   }
 
-#ifndef ZORBA_NO_FULL_TEXT
-  ThesaurusEntityData::ThesaurusEntityData(locale::iso639_1::type aLang)
-    : EntityData(EntityData::THESAURUS),
-      theLang(aLang)
-  {
-  }
-
-  locale::iso639_1::type ThesaurusEntityData::getLanguage() const
-  {
-    return theLang;
-  }
-#endif /* ZORBA_NO_FULL_TEXT */
-
 /*************
  * URIMapper is an abstract class, but we have to define its vtbl and
  * base destructor somewhere.

=== modified file 'src/context/uri_resolver.h'
--- src/context/uri_resolver.h	2012-04-24 12:39:38 +0000
+++ src/context/uri_resolver.h	2012-04-24 22:19:24 +0000
@@ -55,21 +55,21 @@
   /**
    * @brief Return the URL used to load this Resource.
    */
-  zstring getUrl() { return theUrl; }
+  zstring const& getUrl() const { return theUrl; }
 
   virtual ~Resource() = 0;
 
-  protected:
+protected:
 
   Resource();
 
-  private:
+private:
 
   /**
    * Used by static_context to populate the URL.
    */
+  void setUrl(zstring const &aUrl) { theUrl = aUrl; }
   friend class zorba::static_context;
-  void setUrl(zstring aUrl) { theUrl = aUrl; }
 
   zstring theUrl;
 };
@@ -193,25 +193,6 @@
   Kind const theKind;
 };
 
-#ifndef ZORBA_NO_FULL_TEXT
-/**
- * @brief The class containing additional data for URIMappers and URLResolvers
- * when mapping/resolving a Thesaurus URI.
- */
-class ThesaurusEntityData : public EntityData
-{
-public:
-  ThesaurusEntityData(locale::iso639_1::type aLang);
-  /**
-   * @brief Return the language for which a thesaurus is being requested.
-   */
-  virtual locale::iso639_1::type getLanguage() const;
-
-private:
-  locale::iso639_1::type const theLang;
-};
-#endif /* ZORBA_NO_FULL_TEXT */
-
 /**
  * @brief Interface for URL resolving.
  *

=== modified file 'src/diagnostics/assert.cpp'
--- src/diagnostics/assert.cpp	2012-04-24 12:39:38 +0000
+++ src/diagnostics/assert.cpp	2012-04-24 22:19:24 +0000
@@ -68,7 +68,7 @@
     file, 
     line, 
     zerr::ZXQP0002_ASSERT_FAILED, 
-    ( msg ? ERROR_PARAMS( condition, msg ) : ERROR_PARAMS( condition ))
+    ( msg ? ERROR_PARAMS( condition, msg ) : ERROR_PARAMS( condition ) )
   );
 }
 

=== modified file 'src/diagnostics/assert.h'
--- src/diagnostics/assert.h	2012-04-24 12:39:38 +0000
+++ src/diagnostics/assert.h	2012-04-24 22:19:24 +0000
@@ -20,6 +20,10 @@
 #ifndef ZORBA_ASSERT_H
 #define ZORBA_ASSERT_H
 
+#include <sstream>
+
+#include "util/cxx_util.h"
+
 namespace zorba {
 
 /**
@@ -35,7 +39,7 @@
 void assertion_failed( char const *condition,
                        char const *file, 
                        int line, 
-                       char const *msg = 0);
+                       char const *msg = nullptr );
 
 /**
  * Zorba version of the standard assert(3) macro.

=== modified file 'src/diagnostics/diagnostic_en.xml'
--- src/diagnostics/diagnostic_en.xml	2012-04-24 12:39:38 +0000
+++ src/diagnostics/diagnostic_en.xml	2012-04-24 22:19:24 +0000
@@ -1746,7 +1746,7 @@
     <diagnostic code="ZXQP8401" name="THESAURUS_VERSION_MISMATCH"
       if="!defined(ZORBA_NO_FULL_TEXT)">
       <comment>
-       The version of the thesaurus is not the expected version.
+        The version of the thesaurus is not the expected version.
       </comment>
      <value>"$1": wrong WordNet file version; should be "$2"</value>
     </diagnostic>
@@ -1754,19 +1754,39 @@
     <diagnostic code="ZXQP8402" name="THESAURUS_ENDIANNESS_MISMATCH"
       if="!defined(ZORBA_NO_FULL_TEXT)">
       <comment>
+        The thesaurus data file's endianness does not match that of the CPU.
       </comment>
      <value>thesaurus data endianness does not match CPU</value>
-       The thesaurus data file's endianness does not match that of the CPU.
     </diagnostic>
 
     <diagnostic code="ZXQP8403" name="THESAURUS_DATA_ERROR"
       if="!defined(ZORBA_NO_FULL_TEXT)">
       <comment>
-       The thesaurus data contains an unexpected value.
+        The thesaurus data contains an unexpected value.
       </comment>
      <value>thesaurus data error${: 1}</value>
     </diagnostic>
 
+    <diagnostic code="ZXQP8404" name="STEM_LANG_NOT_SUPPORTED"
+      if="!defined(ZORBA_NO_FULL_TEXT)">
+      <value>"$1": language not supported for stemming</value>
+    </diagnostic>
+
+    <diagnostic code="ZXQP8405" name="STOP_WORDS_LANG_NOT_SUPPORTED"
+      if="!defined(ZORBA_NO_FULL_TEXT)">
+      <value>"$1": language not supported for stop-words</value>
+    </diagnostic>
+
+    <diagnostic code="ZXQP8406" name="THESAURUS_LANG_NOT_SUPPORTED"
+      if="!defined(ZORBA_NO_FULL_TEXT)">
+      <value>"$1": language not supported for thesaurus</value>
+    </diagnostic>
+
+    <diagnostic code="ZXQP8407" name="TOKENIZER_LANG_NOT_SUPPORTED"
+      if="!defined(ZORBA_NO_FULL_TEXT)">
+      <value>"$1": language not supported for tokenizer</value>
+    </diagnostic>
+
     <diagnostic code="ZXQD0001" name="PREFIX_NOT_DECLARED">
       <value>"$1": prefix not declared when calling function "$2" from $3</value>
     </diagnostic>

=== modified file 'src/diagnostics/pregenerated/diagnostic_list.cpp'
--- src/diagnostics/pregenerated/diagnostic_list.cpp	2012-04-24 12:39:38 +0000
+++ src/diagnostics/pregenerated/diagnostic_list.cpp	2012-04-24 22:19:24 +0000
@@ -660,6 +660,18 @@
 
 
 ZorbaErrorCode ZXQP8403_THESAURUS_DATA_ERROR( "ZXQP8403" );
+
+
+ZorbaErrorCode ZXQP8404_STEM_LANG_NOT_SUPPORTED( "ZXQP8404" );
+
+
+ZorbaErrorCode ZXQP8405_STOP_WORDS_LANG_NOT_SUPPORTED( "ZXQP8405" );
+
+
+ZorbaErrorCode ZXQP8406_THESAURUS_LANG_NOT_SUPPORTED( "ZXQP8406" );
+
+
+ZorbaErrorCode ZXQP8407_TOKENIZER_LANG_NOT_SUPPORTED( "ZXQP8407" );
 #endif
 
 

=== modified file 'src/diagnostics/pregenerated/dict_en.cpp'
--- src/diagnostics/pregenerated/dict_en.cpp	2012-04-24 12:39:38 +0000
+++ src/diagnostics/pregenerated/dict_en.cpp	2012-04-24 22:19:24 +0000
@@ -434,6 +434,18 @@
 #if !defined(ZORBA_NO_FULL_TEXT)
   { "ZXQP8403", "thesaurus data error${: 1}" },
 #endif
+#if !defined(ZORBA_NO_FULL_TEXT)
+  { "ZXQP8404", "\"$1\": language not supported for stemming" },
+#endif
+#if !defined(ZORBA_NO_FULL_TEXT)
+  { "ZXQP8405", "\"$1\": language not supported for stop-words" },
+#endif
+#if !defined(ZORBA_NO_FULL_TEXT)
+  { "ZXQP8406", "\"$1\": language not supported for thesaurus" },
+#endif
+#if !defined(ZORBA_NO_FULL_TEXT)
+  { "ZXQP8407", "\"$1\": language not supported for tokenizer" },
+#endif
   { "~AllMatchesHasExcludes", "AllMatches contains StringExclude" },
   { "~AlreadySpecified", "already specified" },
   { "~ArithOpNotDefinedBetween_23", "arithmetic operation not defined between types \"$2\" and \"$3\"" },

=== modified file 'src/functions/CMakeLists.txt'
--- src/functions/CMakeLists.txt	2012-04-24 12:39:38 +0000
+++ src/functions/CMakeLists.txt	2012-04-24 22:19:24 +0000
@@ -83,3 +83,7 @@
     func_apply.cpp
     func_serialize_impl.cpp
 )
+
+IF (NOT ZORBA_NO_FULL_TEXT)
+  LIST(APPEND FUNCTIONS_SRCS func_ft_module_impl.cpp)
+ENDIF (NOT ZORBA_NO_FULL_TEXT)

=== modified file 'src/functions/external_function.cpp'
--- src/functions/external_function.cpp	2012-04-24 12:39:38 +0000
+++ src/functions/external_function.cpp	2012-04-24 22:19:24 +0000
@@ -45,12 +45,12 @@
   :
   function(sig, FunctionConsts::FN_UNKNOWN),
   theLoc(loc),
-  theModuleSctx(modSctx),
   theNamespace(ns),
   theScriptingKind(scriptingType),
   theImpl(impl)
 {
   resetFlag(FunctionConsts::isBuiltin);
+  theModuleSctx = modSctx;
 }
 
 
@@ -62,7 +62,6 @@
   zorba::serialization::serialize_baseclass(ar, (function*)this);
 
   ar & theLoc;
-  ar & theModuleSctx;
   ar & theNamespace;
   ar & theScriptingKind;
 

=== added file 'src/functions/func_ft_module_impl.cpp'
--- src/functions/func_ft_module_impl.cpp	1970-01-01 00:00:00 +0000
+++ src/functions/func_ft_module_impl.cpp	2012-04-24 22:19:24 +0000
@@ -0,0 +1,110 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include "stdafx.h"
+
+#include "functions/func_ft_module.h"
+
+#include "runtime/full_text/ft_module.h"
+
+#define FT_MODULE_NS "http://www.zorba-xquery.com/modules/full-text"
+
+namespace zorba {
+
+///////////////////////////////////////////////////////////////////////////////
+
+void populate_context_ft_module_impl( static_context *sctx ) {
+
+  xqtref_t tokenize_return_type =
+    GENV_TYPESYSTEM.create_node_type(
+      store::StoreConsts::elementNode,
+      createQName( FT_MODULE_NS, "", "token" ),
+      NULL,
+      TypeConstants::QUANT_STAR,
+      false,
+      false
+    );
+  {
+    DECL_WITH_KIND( sctx, full_text_tokenize,
+      (createQName( FT_MODULE_NS, "", "tokenize"),
+        GENV_TYPESYSTEM.ANY_NODE_TYPE_ONE,
+        tokenize_return_type),
+      FunctionConsts::FULL_TEXT_TOKENIZE_1
+    );
+  }
+  {
+    DECL_WITH_KIND( sctx, full_text_tokenize,
+      (createQName( FT_MODULE_NS, "", "tokenize"),
+        GENV_TYPESYSTEM.ANY_NODE_TYPE_ONE,
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+        tokenize_return_type),
+      FunctionConsts::FULL_TEXT_TOKENIZE_2
+    );
+  }
+
+  xqtref_t tokenizer_properties_return_type =
+    GENV_TYPESYSTEM.create_node_type(
+      store::StoreConsts::elementNode,
+      createQName( FT_MODULE_NS, "", "tokenizer-properties" ),
+      NULL,
+      TypeConstants::QUANT_ONE,
+      false,
+      false
+    );
+  {
+    DECL_WITH_KIND( sctx, full_text_tokenizer_properties,
+      (createQName( FT_MODULE_NS, "", "tokenizer-properties"),
+        tokenizer_properties_return_type),
+      FunctionConsts::FULL_TEXT_TOKENIZER_PROPERTIES_0
+    );
+  }
+  {
+    DECL_WITH_KIND( sctx, full_text_tokenizer_properties,
+      (createQName( FT_MODULE_NS, "", "tokenizer-properties"),
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+        tokenizer_properties_return_type),
+      FunctionConsts::FULL_TEXT_TOKENIZER_PROPERTIES_1
+    );
+  }
+
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+PlanIter_t full_text_tokenizer_properties::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new TokenizerPropertiesIterator(theModuleSctx, loc, argv);
+}
+
+
+PlanIter_t full_text_tokenize::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new TokenizeIterator(theModuleSctx, loc, argv);
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace zorba
+/* vim:set et sw=2 ts=2: */

=== modified file 'src/functions/function.cpp'
--- src/functions/function.cpp	2012-04-24 12:39:38 +0000
+++ src/functions/function.cpp	2012-04-24 22:19:24 +0000
@@ -43,6 +43,7 @@
   theSignature(sig),
   theKind(kind),
   theFlags(0),
+  theModuleSctx(NULL),
   theXQueryVersion(StaticContextConsts::xquery_version_1_0)
 {
   setFlag(FunctionConsts::isBuiltin);
@@ -70,6 +71,7 @@
   SERIALIZE_ENUM(FunctionConsts::FunctionKind, theKind);
   ar & theFlags;
   ar & theAnnotationList;
+  ar & theModuleSctx;
   SERIALIZE_ENUM(StaticContextConsts::xquery_version_t, theXQueryVersion);
 }
 
@@ -92,6 +94,7 @@
   return n == VARIADIC_SIG_SIZE || argv.size() == n;
 }
 
+
 /*******************************************************************************
 
 ********************************************************************************/

=== modified file 'src/functions/function.h'
--- src/functions/function.h	2012-04-24 12:39:38 +0000
+++ src/functions/function.h	2012-04-24 22:19:24 +0000
@@ -42,7 +42,10 @@
 
 
 /*******************************************************************************
-
+  theModuleSctx:
+  --------------
+  The root sctx of the module containing the declaration. It is NULL for 
+  functions that must be executed in the static context of the caller.
 ********************************************************************************/
 class function : public SimpleRCObject
 {
@@ -51,6 +54,7 @@
   FunctionConsts::FunctionKind theKind;
   uint32_t                     theFlags;
   AnnotationList_t             theAnnotationList;
+  static_context             * theModuleSctx;
 
   StaticContextConsts::xquery_version_t theXQueryVersion;
 
@@ -89,6 +93,10 @@
 
   bool isVariadic() const { return theSignature.isVariadic(); }
 
+  static_context* getStaticContext() const { return theModuleSctx; }
+
+  void setStaticContext(static_context* sctx) { theModuleSctx = sctx; }
+
   void setFlag(FunctionConsts::AnnotationFlags flag)
   {
     theFlags |= flag;
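
Note on the external_function.cpp and function.h hunks above: once theModuleSctx is a member of the base class `function`, a derived constructor can no longer name it in its mem-initializer list, which is presumably why the patch assigns it in the constructor body instead. The following is only a minimal sketch of that pattern. The names static_context_sketch, function_sketch, external_function_sketch and effective_context are hypothetical stand-ins, not Zorba's classes, and the caller fallback is an assumption drawn from the comment that a NULL module context means "executed in the static context of the caller".

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for zorba::static_context (not the real class).
struct static_context_sketch { int id; };

// Hypothetical stand-in for zorba::function: the module context now
// lives in the base class and defaults to NULL.
class function_sketch {
protected:
  static_context_sketch *theModuleSctx;

public:
  function_sketch() : theModuleSctx(NULL) { }
  virtual ~function_sketch() { }

  static_context_sketch* getStaticContext() const { return theModuleSctx; }
  void setStaticContext(static_context_sketch *sctx) { theModuleSctx = sctx; }
};

// Hypothetical stand-in for external_function.
class external_function_sketch : public function_sketch {
public:
  explicit external_function_sketch(static_context_sketch *modSctx)
    : function_sketch()
    // Writing theModuleSctx(modSctx) here would not compile: a
    // mem-initializer may name only direct bases and this class's own members.
  {
    theModuleSctx = modSctx;  // inherited member, so assigned in the body
  }
};

// Assumed policy: use the module's context when one is set, otherwise
// fall back to the caller's context.
static_context_sketch* effective_context(const function_sketch &f,
                                         static_context_sketch *caller) {
  assert(caller != NULL);
  static_context_sketch *const module_sctx = f.getStaticContext();
  return module_sctx ? module_sctx : caller;
}
```

Under these assumptions, a default-constructed function_sketch resolves to the caller's context, while an external_function_sketch built with a module context resolves to that module's context.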

=== modified file 'src/functions/library.cpp'
--- src/functions/library.cpp	2012-04-24 12:39:38 +0000
+++ src/functions/library.cpp	2012-04-24 22:19:24 +0000
@@ -68,6 +68,10 @@
 #include "functions/func_reflection.h"
 #include "functions/func_apply.h"
 #include "functions/func_fetch.h"
+#ifndef ZORBA_NO_FULL_TEXT
+#include "functions/func_ft_module.h"
+#include "runtime/full_text/ft_module_impl.h"
+#endif /* ZORBA_NO_FULL_TEXT */
 
 #include "functions/func_function_item_iter.h"
 
@@ -144,6 +148,10 @@
   populate_context_apply(sctx);
 
   populate_context_fetch(sctx);
+#ifndef ZORBA_NO_FULL_TEXT
+  populate_context_ft_module(sctx);
+  populate_context_ft_module_impl(sctx);
+#endif /* ZORBA_NO_FULL_TEXT */
 
   ar.set_loading_hardcoded_objects(false);
 }

=== added file 'src/functions/pregenerated/func_ft_module.cpp'
--- src/functions/pregenerated/func_ft_module.cpp	1970-01-01 00:00:00 +0000
+++ src/functions/pregenerated/func_ft_module.cpp	2012-04-24 22:19:24 +0000
@@ -0,0 +1,496 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+ 
+// ******************************************
+// *                                        *
+// * THIS IS A GENERATED FILE. DO NOT EDIT! *
+// * SEE .xml FILE WITH SAME NAME           *
+// *                                        *
+// ******************************************
+
+
+#include "stdafx.h"
+#include "runtime/full_text/ft_module.h"
+#include "functions/func_ft_module.h"
+
+
+namespace zorba{
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_current_lang::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new CurrentLangIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_host_lang::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new HostLangIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_is_stem_lang_supported::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new IsStemLangSupportedIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_is_stop_word::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new IsStopWordIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_is_stop_word_lang_supported::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new IsStopWordLangSupportedIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_is_thesaurus_lang_supported::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new IsThesaurusLangSupportedIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_is_tokenizer_lang_supported::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new IsTokenizerLangSupportedIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_stem::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new StemIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_strip_diacritics::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new StripDiacriticsIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_thesaurus_lookup::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new ThesaurusLookupIterator(sctx, loc, argv);
+}
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+PlanIter_t full_text_tokenize_string::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new TokenizeStringIterator(sctx, loc, argv);
+}
+
+#endif
+
+void populate_context_ft_module(static_context* sctx)
+{
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_current_lang,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","current-lang"), 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE),
+        FunctionConsts::FULL_TEXT_CURRENT_LANG_0);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_host_lang,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","host-lang"), 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE),
+        FunctionConsts::FULL_TEXT_HOST_LANG_0);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_is_stem_lang_supported,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","is-stem-lang-supported"), 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE, 
+        GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+        FunctionConsts::FULL_TEXT_IS_STEM_LANG_SUPPORTED_1);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_is_stop_word,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","is-stop-word"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+        FunctionConsts::FULL_TEXT_IS_STOP_WORD_1);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_is_stop_word,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","is-stop-word"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE, 
+        GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+        FunctionConsts::FULL_TEXT_IS_STOP_WORD_2);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_is_stop_word_lang_supported,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","is-stop-word-lang-supported"), 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE, 
+        GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+        FunctionConsts::FULL_TEXT_IS_STOP_WORD_LANG_SUPPORTED_1);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_is_thesaurus_lang_supported,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","is-thesaurus-lang-supported"), 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE, 
+        GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+        FunctionConsts::FULL_TEXT_IS_THESAURUS_LANG_SUPPORTED_1);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_is_thesaurus_lang_supported,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","is-thesaurus-lang-supported"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE, 
+        GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+        FunctionConsts::FULL_TEXT_IS_THESAURUS_LANG_SUPPORTED_2);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_is_tokenizer_lang_supported,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","is-tokenizer-lang-supported"), 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE, 
+        GENV_TYPESYSTEM.BOOLEAN_TYPE_ONE),
+        FunctionConsts::FULL_TEXT_IS_TOKENIZER_LANG_SUPPORTED_1);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_stem,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","stem"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE),
+        FunctionConsts::FULL_TEXT_STEM_1);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_stem,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","stem"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE),
+        FunctionConsts::FULL_TEXT_STEM_2);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_strip_diacritics,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","strip-diacritics"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE),
+        FunctionConsts::FULL_TEXT_STRIP_DIACRITICS_1);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_thesaurus_lookup,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","thesaurus-lookup"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_PLUS),
+        FunctionConsts::FULL_TEXT_THESAURUS_LOOKUP_1);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_thesaurus_lookup,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","thesaurus-lookup"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_PLUS),
+        FunctionConsts::FULL_TEXT_THESAURUS_LOOKUP_2);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_thesaurus_lookup,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","thesaurus-lookup"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_PLUS),
+        FunctionConsts::FULL_TEXT_THESAURUS_LOOKUP_3);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_thesaurus_lookup,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","thesaurus-lookup"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_PLUS),
+        FunctionConsts::FULL_TEXT_THESAURUS_LOOKUP_4);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_thesaurus_lookup,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","thesaurus-lookup"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.INTEGER_TYPE_ONE, 
+        GENV_TYPESYSTEM.INTEGER_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_PLUS),
+        FunctionConsts::FULL_TEXT_THESAURUS_LOOKUP_6);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_tokenize_string,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","tokenize-string"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_STAR),
+        FunctionConsts::FULL_TEXT_TOKENIZE_STRING_1);
+
+  }
+
+
+#endif
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+  {
+    
+
+    DECL_WITH_KIND(sctx, full_text_tokenize_string,
+        (createQName("http://www.zorba-xquery.com/modules/full-text","","tokenize-string"), 
+        GENV_TYPESYSTEM.STRING_TYPE_ONE, 
+        GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE, 
+        GENV_TYPESYSTEM.STRING_TYPE_STAR),
+        FunctionConsts::FULL_TEXT_TOKENIZE_STRING_2);
+
+  }
+
+
+#endif
+}
+
+
+}
+
+
+
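
Note on the registrations above: the FunctionConsts::FULL_TEXT_* constants appear to follow a naming convention. The function's local name is upper-cased with hyphens turned into underscores, prefixed with FULL_TEXT_, and suffixed with the arity, where the arity is the number of types listed minus the trailing return type. The sketch below only models that convention as read from this diff. signature_sketch and kind_name are hypothetical names, and nothing here is claimed to be how Zorba's generator actually derives the enumerators.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model of a signature: parameter types, then the return type.
struct signature_sketch {
  std::string local_name;
  std::vector<std::string> types;

  // Arity is the number of listed types minus the return type.
  std::size_t arity() const {
    assert(!types.empty());  // a return type is always present
    return types.size() - 1;
  }
};

// Derives an enumerator-style name such as "FULL_TEXT_STEM_2".
std::string kind_name(const signature_sketch &sig) {
  std::ostringstream out;
  out << "FULL_TEXT_";
  for (std::string::size_type i = 0; i < sig.local_name.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(sig.local_name[i]);
    out << (c == '-' ? '_' : static_cast<char>(std::toupper(c)));
  }
  out << '_' << sig.arity();
  return out.str();
}
```

With this model, the thesaurus-lookup overload registered with six parameter types plus a return type maps to FULL_TEXT_THESAURUS_LOOKUP_6, matching the constant used in its registration.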

=== added file 'src/functions/pregenerated/func_ft_module.h'
--- src/functions/pregenerated/func_ft_module.h	1970-01-01 00:00:00 +0000
+++ src/functions/pregenerated/func_ft_module.h	2012-04-24 22:19:24 +0000
@@ -0,0 +1,259 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+ 
+// ******************************************
+// *                                        *
+// * THIS IS A GENERATED FILE. DO NOT EDIT! *
+// * SEE .xml FILE WITH SAME NAME           *
+// *                                        *
+// ******************************************
+
+
+#ifndef ZORBA_FUNCTIONS_FT_MODULE_H
+#define ZORBA_FUNCTIONS_FT_MODULE_H
+
+
+#include "common/shared_types.h"
+#include "functions/function_impl.h"
+
+
+namespace zorba {
+
+
+void populate_context_ft_module(static_context* sctx);
+
+
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:current-lang
+class full_text_current_lang : public function
+{
+public:
+  full_text_current_lang(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:host-lang
+class full_text_host_lang : public function
+{
+public:
+  full_text_host_lang(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:is-stem-lang-supported
+class full_text_is_stem_lang_supported : public function
+{
+public:
+  full_text_is_stem_lang_supported(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:is-stop-word
+class full_text_is_stop_word : public function
+{
+public:
+  full_text_is_stop_word(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:is-stop-word-lang-supported
+class full_text_is_stop_word_lang_supported : public function
+{
+public:
+  full_text_is_stop_word_lang_supported(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:is-thesaurus-lang-supported
+class full_text_is_thesaurus_lang_supported : public function
+{
+public:
+  full_text_is_thesaurus_lang_supported(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:is-tokenizer-lang-supported
+class full_text_is_tokenizer_lang_supported : public function
+{
+public:
+  full_text_is_tokenizer_lang_supported(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:stem
+class full_text_stem : public function
+{
+public:
+  full_text_stem(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:strip-diacritics
+class full_text_strip_diacritics : public function
+{
+public:
+  full_text_strip_diacritics(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:thesaurus-lookup
+class full_text_thesaurus_lookup : public function
+{
+public:
+  full_text_thesaurus_lookup(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:tokenize
+class full_text_tokenize : public function
+{
+public:
+  full_text_tokenize(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:tokenizer-properties
+class full_text_tokenizer_properties : public function
+{
+public:
+  full_text_tokenizer_properties(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  bool accessesDynCtx() const { return true; }
+
+  CODEGEN_DECL();
+};
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+
+//full-text:tokenize-string
+class full_text_tokenize_string : public function
+{
+public:
+  full_text_tokenize_string(const signature& sig, FunctionConsts::FunctionKind kind)
+    : 
+    function(sig, kind)
+  {
+
+  }
+
+  CODEGEN_DECL();
+};
+#endif
+
+
+} //namespace zorba
+
+
+#endif
+/*
+ * Local variables:
+ * mode: c++
+ * End:
+ */ 

=== modified file 'src/functions/pregenerated/function_enum.h'
--- src/functions/pregenerated/function_enum.h	2012-04-24 12:39:38 +0000
+++ src/functions/pregenerated/function_enum.h	2012-04-24 22:19:24 +0000
@@ -138,6 +138,29 @@
   FN_ZORBA_FETCH_CONTENT_2,
   FN_ZORBA_FETCH_CONTENT_TYPE_1,
   FN_PUT_2,
+  FULL_TEXT_CURRENT_LANG_0,
+  FULL_TEXT_HOST_LANG_0,
+  FULL_TEXT_IS_STEM_LANG_SUPPORTED_1,
+  FULL_TEXT_IS_STOP_WORD_1,
+  FULL_TEXT_IS_STOP_WORD_2,
+  FULL_TEXT_IS_STOP_WORD_LANG_SUPPORTED_1,
+  FULL_TEXT_IS_THESAURUS_LANG_SUPPORTED_1,
+  FULL_TEXT_IS_THESAURUS_LANG_SUPPORTED_2,
+  FULL_TEXT_IS_TOKENIZER_LANG_SUPPORTED_1,
+  FULL_TEXT_STEM_1,
+  FULL_TEXT_STEM_2,
+  FULL_TEXT_STRIP_DIACRITICS_1,
+  FULL_TEXT_THESAURUS_LOOKUP_1,
+  FULL_TEXT_THESAURUS_LOOKUP_2,
+  FULL_TEXT_THESAURUS_LOOKUP_3,
+  FULL_TEXT_THESAURUS_LOOKUP_4,
+  FULL_TEXT_THESAURUS_LOOKUP_6,
+  FULL_TEXT_TOKENIZE_1,
+  FULL_TEXT_TOKENIZE_2,
+  FULL_TEXT_TOKENIZER_PROPERTIES_0,
+  FULL_TEXT_TOKENIZER_PROPERTIES_1,
+  FULL_TEXT_TOKENIZE_STRING_1,
+  FULL_TEXT_TOKENIZE_STRING_2,
   FN_FUNCTION_NAME_1,
   FN_FUNCTION_ARITY_1,
   FN_PARTIAL_APPLY_2,

=== modified file 'src/runtime/full_text/CMakeLists.txt'
--- src/runtime/full_text/CMakeLists.txt	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/CMakeLists.txt	2012-04-24 22:19:24 +0000
@@ -13,6 +13,7 @@
 # limitations under the License.
 
 SET(FULLTEXT_SRCS
+    ft_util.cpp
     ft_match.cpp
     ft_query_item.cpp
     ft_single_token_iterator.cpp
@@ -40,6 +41,7 @@
     thesaurus.cpp
     tokenizer.cpp
     default_tokenizer.cpp
+    ft_module.cpp
     )
 
 IF (ZORBA_NO_ICU)
@@ -51,5 +53,5 @@
 ADD_SRC_SUBFOLDER(FULLTEXT_SRCS stemmer LIBSTEMMER_SRCS)
 
 IF (ZORBA_WITH_FILE_ACCESS)
-    ADD_SRC_SUBFOLDER(FULLTEXT_SRCS thesauri THESAURUS_SRCS)
+  ADD_SRC_SUBFOLDER(FULLTEXT_SRCS thesauri THESAURUS_SRCS)
 ENDIF (ZORBA_WITH_FILE_ACCESS)

=== modified file 'src/runtime/full_text/apply.cpp'
--- src/runtime/full_text/apply.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/apply.cpp	2012-04-24 22:19:24 +0000
@@ -26,13 +26,14 @@
 #include "diagnostics/dict.h"
 #include "diagnostics/xquery_diagnostics.h"
 #include "store/api/item.h"
+#include "store/api/item_factory.h"
 #include "store/api/store.h"
-#include "store/api/item_factory.h"
 #include "system/globalenv.h"
 #include "util/cxx_util.h"
 #include "util/indent.h"
 #include "util/stl_util.h"
 #include "zorbamisc/ns_consts.h"
+#include "zorbautils/locale.h"
 
 #ifndef NDEBUG
 # include "system/properties.h"
@@ -1184,11 +1185,10 @@
   {
   }
 
-  void operator()( char const *utf8_s, size_type utf8_len, size_type,
-                   size_type, size_type, void* ) {
-    FTToken const t( utf8_s, (int)utf8_len, token_no_, lang_ );
-    tokens_.push_back( t );
-  }
+  // inherited
+  void item( Item const&, bool );
+  void token( char const*, size_type, iso639_1::type, size_type, size_type,
+               size_type, Item const* );
 
 private:
   FTTokenSeqIterator::FTTokens &tokens_;
@@ -1196,51 +1196,72 @@
   iso639_1::type const lang_;
 };
 
+void thesaurus_callback::item( Item const&, bool ) {
+  // out-of-line since it's virtual
+}
+
+void thesaurus_callback::token( char const *utf8_s, size_type utf8_len,
+                                iso639_1::type, size_type, size_type,
+                                size_type, Item const* ) {
+  FTToken const t( utf8_s, (int)utf8_len, token_no_, lang_ );
+  tokens_.push_back( t );
+}
+
 } // anonymous namespace
 
 void ftcontains_visitor::
-lookup_thesaurus( ftthesaurus_id const &tid, zstring const &query_phrase,
+lookup_thesaurus( ftthesaurus_id const &t_id, zstring const &query_phrase,
                   FTToken const &qt0, query_item_star_t &result ) {
   ft_int at_least, at_most;
-  if ( ftrange const *const levels = tid.get_levels() )
+  if ( ftrange const *const levels = t_id.get_levels() )
     eval_ftrange( *levels, &at_least, &at_most );
   else
     at_least = 0, at_most = numeric_limits<ft_int>::max();
 
-  zstring const &uri = tid.get_uri();
+  zstring const &uri = t_id.get_uri();
 
   zstring error_msg;
   auto_ptr<internal::Resource> rsrc = static_ctx_.resolve_uri(
-    uri, internal::ThesaurusEntityData( qt0.lang() ), error_msg
+    uri, internal::EntityData::THESAURUS, error_msg
   );
   if ( !rsrc.get() )
     throw XQUERY_EXCEPTION( err::FTST0018, ERROR_PARAMS( uri ) );
 
-  internal::Thesaurus::ptr thesaurus(
-    dynamic_cast<internal::Thesaurus*>( rsrc.release() )
-  );
-  if ( !thesaurus )
-    throw XQUERY_EXCEPTION( err::FTST0018, ERROR_PARAMS( uri ) );
-
-  internal::Thesaurus::iterator::ptr tresult(
+  internal::ThesaurusProvider const *const t_provider =
+    dynamic_cast<internal::ThesaurusProvider const*>( rsrc.get() );
+  ZORBA_ASSERT( t_provider );
+
+  internal::Thesaurus::ptr thesaurus;
+  if ( !t_provider->getThesaurus( qt0.lang(), &thesaurus ) )
+    throw XQUERY_EXCEPTION(
+      zerr::ZXQP8406_THESAURUS_LANG_NOT_SUPPORTED,
+      ERROR_PARAMS( iso639_1::string_of[ qt0.lang() ] )
+    );
+
+  internal::Thesaurus::iterator::ptr t_synonyms(
     thesaurus->lookup(
-      query_phrase, tid.get_relationship(), at_least, at_most
+      query_phrase, t_id.get_relationship(), at_least, at_most
     )
   );
-  if ( !tresult )
+  if ( !t_synonyms )
     return;
 
   FTTokenSeqIterator::FTTokens synonyms;
   thesaurus_callback cb( qt0.pos(), qt0.lang(), synonyms );
 
-  Tokenizer::Numbers tno;
-  Tokenizer::ptr tokenizer(
-    GENV_STORE.getTokenizerProvider()->getTokenizer( qt0.lang(), tno )
-  );
+  Tokenizer::Numbers t_num;
+  TokenizerProvider const *const provider = GENV_STORE.getTokenizerProvider();
+  ZORBA_ASSERT( provider );
+  Tokenizer::ptr tokenizer;
+  if ( !provider->getTokenizer( qt0.lang(), &t_num, &tokenizer ) )
+    throw XQUERY_EXCEPTION(
+      zerr::ZXQP8407_TOKENIZER_LANG_NOT_SUPPORTED,
+      ERROR_PARAMS( iso639_1::string_of[ qt0.lang() ] )
+    );
 
-  for ( zstring synonym; tresult->next( &synonym ); ) {
+  for ( zstring synonym; t_synonyms->next( &synonym ); ) {
     synonyms.clear();
-    tokenizer->tokenize(
+    tokenizer->tokenize_string(
       synonym.data(), synonym.size(), qt0.lang(), false, cb
     );
     query_item_t const query_item( new FTTokenSeqIterator( synonyms ) );

=== added file 'src/runtime/full_text/ft_module_impl.cpp'
--- src/runtime/full_text/ft_module_impl.cpp	1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/ft_module_impl.cpp	2012-04-24 22:19:24 +0000
@@ -0,0 +1,843 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <zorba/config.h>
+
+#ifndef ZORBA_NO_FULL_TEXT
+
+# include <limits>
+# include <typeinfo>
+
+# include <zorba/diagnostic_list.h>
+
+# include "api/unmarshaller.h"
+# include "context/namespace_context.h"
+# include "context/static_context.h"
+# include "diagnostics/assert.h"
+# include "diagnostics/xquery_diagnostics.h"
+# include "store/api/index.h"
+# include "store/api/item.h"
+# include "store/api/item_factory.h"
+# include "store/api/iterator.h"
+# include "store/api/store.h"
+# include "system/globalenv.h"
+# include "types/casting.h"
+# include "types/typeimpl.h"
+# include "types/typeops.h"
+# include "util/utf8_util.h"
+# include "zorbatypes/URI.h"
+# include "zorbautils/locale.h"
+
+# include "ft_stop_words_set.h"
+# include "ft_token_seq_iterator.h"
+# include "ft_util.h"
+# include "thesaurus.h"
+
+#endif /* ZORBA_NO_FULL_TEXT */
+
+#include "runtime/full_text/ft_module.h"
+
+using namespace std;
+using namespace zorba::locale;
+
+namespace zorba {
+
+///////////////////////////////////////////////////////////////////////////////
+
+#ifndef ZORBA_NO_FULL_TEXT
+inline iso639_1::type get_lang_from( static_context const *sctx ) {
+  iso639_1::type const lang = get_lang_from( sctx->get_match_options() );
+  return lang ? lang : get_host_lang();
+}
+
+static iso639_1::type get_lang_from( store::Item_t lang_item,
+                                     QueryLoc const &loc ) {
+  zstring lang_string;
+  lang_item->getStringValue2( lang_string );
+
+  if ( !GenericCast::instance()->castableToLanguage( lang_string ) )
+    throw XQUERY_EXCEPTION(
+      err::XPTY0004,
+      ERROR_PARAMS(
+        ZED( BadType_23o ), lang_string, ZED( NoCastTo_45o ), "xs:language"
+      ),
+      ERROR_LOC( loc )
+    );
+  if ( iso639_1::type const lang = find_lang( lang_string.c_str() ) )
+    return lang;
+  throw XQUERY_EXCEPTION(
+    err::FTST0009, ERROR_PARAMS( lang_string ), ERROR_LOC( loc )
+  );
+}
+#endif /* ZORBA_NO_FULL_TEXT */
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool CurrentLangIterator::nextImpl( store::Item_t &result,
+                                    PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  iso639_1::type const lang = get_lang_from( getStaticContext() );
+  zstring lang_string( iso639_1::string_of[ lang ] );
+
+  PlanIteratorState *state;
+  DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+  GENV_ITEMFACTORY->createLanguage( result, lang_string );
+  STACK_PUSH( true, state );
+
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool HostLangIterator::nextImpl( store::Item_t &result,
+                                 PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  iso639_1::type const lang = get_host_lang();
+  zstring lang_string = iso639_1::string_of[ lang ];
+
+  PlanIteratorState *state;
+  DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+  GENV_ITEMFACTORY->createLanguage( result, lang_string );
+  STACK_PUSH( true, state );
+
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool IsStemLangSupportedIterator::nextImpl( store::Item_t &result,
+                                            PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  bool is_supported;
+  store::Item_t item;
+
+  PlanIteratorState *state;
+  DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+  consumeNext( item, theChildren[0], plan_state );
+  try {
+    internal::StemmerProvider const *const provider =
+      GENV_STORE.getStemmerProvider();
+    is_supported = provider && provider->getStemmer( get_lang_from( item, loc ) );
+  }
+  catch ( XQueryException const &e ) {
+    if ( e.diagnostic() != err::FTST0009 )
+      throw;
+    is_supported = false;
+  }
+
+  GENV_ITEMFACTORY->createBoolean( result, is_supported );
+  STACK_PUSH( true, state );
+
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool IsStopWordIterator::nextImpl( store::Item_t &result,
+                                   PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  store::Item_t item;
+  iso639_1::type lang;
+  ft_stop_words_set::ptr stop_words;
+  zstring word;
+
+  PlanIteratorState *state;
+  DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+  lang = get_lang_from( getStaticContext() );
+
+  consumeNext( item, theChildren[0], plan_state );
+  item->getStringValue2( word );
+
+  if ( theChildren.size() > 1 ) {
+    consumeNext( item, theChildren[1], plan_state );
+    lang = get_lang_from( item, loc );
+  }
+
+  stop_words.reset( ft_stop_words_set::get_default( lang ) );
+  if ( !stop_words )
+    throw XQUERY_EXCEPTION(
+      zerr::ZXQP8405_STOP_WORDS_LANG_NOT_SUPPORTED,
+      ERROR_PARAMS( lang ),
+      ERROR_LOC( loc )
+    );
+  GENV_ITEMFACTORY->createBoolean( result, stop_words->contains( word ) );
+  STACK_PUSH( true, state );
+
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool IsStopWordLangSupportedIterator::nextImpl( store::Item_t &result,
+                                                PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  bool is_supported;
+  store::Item_t item;
+
+  PlanIteratorState *state;
+  DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+  consumeNext( item, theChildren[0], plan_state );
+  try {
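+    // get_default() returns a heap-allocated set: own it so it is freed.
+    ft_stop_words_set::ptr const stop_words(
+      ft_stop_words_set::get_default( get_lang_from( item, loc ) )
+    );
+    is_supported = stop_words.get() != nullptr;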
+  }
+  catch ( XQueryException const &e ) {
+    if ( e.diagnostic() != err::FTST0009 )
+      throw;
+    is_supported = false;
+  }
+
+  GENV_ITEMFACTORY->createBoolean( result, is_supported );
+  STACK_PUSH( true, state );
+
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool IsThesaurusLangSupportedIterator::nextImpl( store::Item_t &result,
+                                                 PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  bool is_supported;
+  store::Item_t item;
+  zstring uri;
+
+  PlanIteratorState *state;
+  DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+  consumeNext( item, theChildren[0], plan_state );
+  if ( theChildren.size() > 1 ) {
+    item->getStringValue2( uri );
+    consumeNext( item, theChildren[1], plan_state );
+  } else {
+    uri = "##default";
+  }
+
+  try {
+    iso639_1::type const lang = get_lang_from( item, loc );
+    static_context const *const sctx = getStaticContext();
+
+    vector<zstring> comp_uris;
+    sctx->get_component_uris(
+      uri, internal::EntityData::THESAURUS, comp_uris
+    );
+    if ( comp_uris.size() != 1 )
+      throw XQUERY_EXCEPTION(
+        err::FTST0018, ERROR_PARAMS( uri ), ERROR_LOC( loc )
+      );
+
+    zstring error_msg;
+    auto_ptr<internal::Resource> rsrc = sctx->resolve_uri(
+      comp_uris.front(), internal::EntityData::THESAURUS, error_msg
+    );
+    if ( !rsrc.get() )
+      throw XQUERY_EXCEPTION(
+        err::FTST0018, ERROR_PARAMS( uri ), ERROR_LOC( loc )
+      );
+#if 0
+    if ( !error_msg.empty() )
+      cerr << "error_msg=" << error_msg << endl;
+#endif
+    internal::ThesaurusProvider const *const provider =
+      dynamic_cast<internal::ThesaurusProvider const*>( rsrc.get() );
+    ZORBA_ASSERT( provider );
+    is_supported = provider->getThesaurus( lang );
+  }
+  catch ( XQueryException const &e ) {
+    if ( e.diagnostic() != err::FTST0009 /* lang not supported by Zorba */ )
+      throw;
+    is_supported = false;
+  }
+
+  GENV_ITEMFACTORY->createBoolean( result, is_supported );
+  STACK_PUSH( true, state );
+
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool IsTokenizerLangSupportedIterator::nextImpl( store::Item_t &result,
+                                                 PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  bool is_supported;
+  store::Item_t item;
+
+  PlanIteratorState *state;
+  DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+  consumeNext( item, theChildren[0], plan_state );
+  try {
+    TokenizerProvider const *const p = GENV_STORE.getTokenizerProvider();
+    is_supported = p && p->getTokenizer( get_lang_from( item, loc ) );
+  }
+  catch ( XQueryException const &e ) {
+    if ( e.diagnostic() != err::FTST0009 )
+      throw;
+    is_supported = false;
+  }
+
+  GENV_ITEMFACTORY->createBoolean( result, is_supported );
+  STACK_PUSH( true, state );
+
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool StemIterator::nextImpl( store::Item_t &result,
+                             PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  store::Item_t item;
+  iso639_1::type lang;
+  internal::StemmerProvider const *provider;
+  internal::Stemmer::ptr stemmer;
+  zstring word, stem;
+
+  PlanIteratorState *state;
+  DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+  lang = get_lang_from( getStaticContext() );
+
+  consumeNext( item, theChildren[0], plan_state );
+  item->getStringValue2( word );
+  utf8::to_lower( word );
+
+  if ( theChildren.size() > 1 ) {
+    consumeNext( item, theChildren[1], plan_state );
+    lang = get_lang_from( item, loc );
+  }
+
+  // TODO: why is this always the default StemmerProvider?
+  provider = GENV_STORE.getStemmerProvider();
+  ZORBA_ASSERT( provider );
+  if ( provider->getStemmer( lang, &stemmer ) ) {
+    stemmer->stem( word, lang, &stem );
+    GENV_ITEMFACTORY->createString( result, stem );
+    STACK_PUSH( true, state );
+  } else {
+    throw XQUERY_EXCEPTION(
+      zerr::ZXQP8404_STEM_LANG_NOT_SUPPORTED,
+      ERROR_PARAMS( lang ),
+      ERROR_LOC( loc )
+    );
+  }
+
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool StripDiacriticsIterator::nextImpl( store::Item_t &result,
+                                        PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  store::Item_t item;
+  zstring phrase, stripped_phrase;
+
+  PlanIteratorState *state;
+  DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+  consumeNext( item, theChildren[0], plan_state );
+  item->getStringValue2( phrase );
+  utf8::strip_diacritics( phrase, &stripped_phrase );
+  GENV_ITEMFACTORY->createString( result, stripped_phrase );
+  STACK_PUSH( true, state );
+
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool ThesaurusLookupIterator::nextImpl( store::Item_t &result,
+                                        PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  vector<zstring> comp_uris;
+  zstring error_msg;
+  store::Item_t item;
+  iso639_1::type lang;
+  auto_ptr<internal::Resource> rsrc;
+  zstring uri = "##default";
+  static_context const *sctx;
+  zstring synonym;
+  internal::ThesaurusProvider const *provider;
+
+  ThesaurusLookupIteratorState *state;
+  DEFAULT_STACK_INIT( ThesaurusLookupIteratorState, state, plan_state );
+
+  sctx = getStaticContext();
+  lang = get_lang_from( sctx );
+  state->at_least_ = 0;
+  state->at_most_ = numeric_limits<internal::Thesaurus::level_type>::max();
+
+  if ( theChildren.size() == 1 ) {
+    consumeNext( item, theChildren[0], plan_state );
+    item->getStringValue2( state->phrase_ );
+  } else if ( theChildren.size() > 1 ) {
+    consumeNext( item, theChildren[0], plan_state );
+    item->getStringValue2( uri );
+    consumeNext( item, theChildren[1], plan_state );
+    item->getStringValue2( state->phrase_ );
+    if ( theChildren.size() > 2 ) {
+      consumeNext( item, theChildren[2], plan_state );
+      lang = get_lang_from( item, loc );
+      if ( theChildren.size() > 3 ) {
+        consumeNext( item, theChildren[3], plan_state );
+        item->getStringValue2( state->relationship_ );
+        if ( theChildren.size() > 4 ) {
+          ZORBA_ASSERT( theChildren.size() == 6 );
+          consumeNext( item, theChildren[4], plan_state );
+          state->at_least_ = to_ft_int( item->getIntegerValue() );
+          consumeNext( item, theChildren[5], plan_state );
+          state->at_most_ = to_ft_int( item->getIntegerValue() );
+        }
+      }
+    }
+  }
+
+  sctx->get_component_uris(
+    uri, internal::EntityData::THESAURUS, comp_uris
+  );
+  if ( comp_uris.size() != 1 )
+    throw XQUERY_EXCEPTION(
+      err::FTST0018, ERROR_PARAMS( uri ), ERROR_LOC( loc )
+    );
+
+  rsrc = sctx->resolve_uri(
+    comp_uris.front(), internal::EntityData::THESAURUS, error_msg
+  );
+  if ( !rsrc.get() )
+    throw XQUERY_EXCEPTION(
+      err::FTST0018, ERROR_PARAMS( uri ), ERROR_LOC( loc )
+    );
+
+  provider = dynamic_cast<internal::ThesaurusProvider const*>( rsrc.get() );
+  ZORBA_ASSERT( provider );
+  if ( !provider->getThesaurus( lang, &state->thesaurus_ ) )
+    throw XQUERY_EXCEPTION(
+      zerr::ZXQP8406_THESAURUS_LANG_NOT_SUPPORTED,
+      ERROR_PARAMS( lang ),
+      ERROR_LOC( loc )
+    );
+
+  state->tresult_ = std::move(
+    state->thesaurus_->lookup(
+      state->phrase_, state->relationship_, state->at_least_, state->at_most_
+    )
+  );
+  ZORBA_ASSERT( state->tresult_.get() );
+
+  while ( state->tresult_->next( &synonym ) ) {
+    GENV_ITEMFACTORY->createString( result, synonym );
+    STACK_PUSH( true, state );
+  }
+
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+void ThesaurusLookupIterator::resetImpl( PlanState &plan_state ) const {
+#ifndef ZORBA_NO_FULL_TEXT
+  NaryBaseIterator<ThesaurusLookupIterator,ThesaurusLookupIteratorState>::
+    resetImpl( plan_state );
+  ThesaurusLookupIteratorState *const state =
+    StateTraitsImpl<ThesaurusLookupIteratorState>::getState(
+      plan_state, this->theStateOffset
+    );
+  state->tresult_ = std::move(
+    state->thesaurus_->lookup(
+      state->phrase_, state->relationship_, state->at_least_, state->at_most_
+    )
+  );
+  ZORBA_ASSERT( state->tresult_.get() );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool TokenizeIterator::nextImpl( store::Item_t &result,
+                                 PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  store::Item_t attr_name, attr_node;
+  zstring base_uri;
+  store::Item_t item;
+  iso639_1::type lang;
+  Tokenizer::Numbers no;
+  store::NsBindings const ns_bindings;
+  static_context const *sctx;
+  TokenizerProvider const *tokenizer_provider;
+  store::Item_t type_name;
+  zstring value_string;
+
+  sctx = getStaticContext();
+
+  TokenizeIteratorState *state;
+  DEFAULT_STACK_INIT( TokenizeIteratorState, state, plan_state );
+
+  lang = get_lang_from( sctx );
+
+  if ( consumeNext( state->doc_item_, theChildren[0], plan_state ) ) {
+    if ( theChildren.size() > 1 ) {
+      consumeNext( item, theChildren[1], plan_state );
+      lang = get_lang_from( item, loc );
+    }
+
+    tokenizer_provider = GENV_STORE.getTokenizerProvider();
+    state->doc_tokens_ =
+      state->doc_item_->getTokens( *tokenizer_provider, no, lang );
+
+    while ( state->doc_tokens_->hasNext() ) {
+      FTToken const *token;
+      token = state->doc_tokens_->next();
+      ZORBA_ASSERT( token );
+
+      if ( state->token_qname_.isNull() )
+        GENV_ITEMFACTORY->createQName(
+          state->token_qname_, static_context::ZORBA_FULL_TEXT_FN_NS, "",
+          "token"
+        );
+
+      base_uri = static_context::ZORBA_FULL_TEXT_FN_NS;
+      type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+      GENV_ITEMFACTORY->createElementNode(
+        result, nullptr, state->token_qname_, type_name, false, false,
+        ns_bindings, base_uri
+      );
+
+      if ( token->lang() ) {
+        value_string = iso639_1::string_of[ token->lang() ];
+        GENV_ITEMFACTORY->createQName( attr_name, "", "", "lang" );
+        GENV_ITEMFACTORY->createString( item, value_string );
+        type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+        GENV_ITEMFACTORY->createAttributeNode(
+          attr_node, result, attr_name, type_name, item
+        );
+      }
+
+      ztd::to_string( token->para(), &value_string );
+      GENV_ITEMFACTORY->createQName( attr_name, "", "", "paragraph" );
+      GENV_ITEMFACTORY->createString( item, value_string );
+      type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+      GENV_ITEMFACTORY->createAttributeNode(
+        attr_node, result, attr_name, type_name, item
+      );
+
+      ztd::to_string( token->sent(), &value_string );
+      GENV_ITEMFACTORY->createQName( attr_name, "", "", "sentence" );
+      GENV_ITEMFACTORY->createString( item, value_string );
+      type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+      GENV_ITEMFACTORY->createAttributeNode(
+        attr_node, result, attr_name, type_name, item
+      );
+
+      value_string = token->value();
+      GENV_ITEMFACTORY->createQName( attr_name, "", "", "value" );
+      GENV_ITEMFACTORY->createString( item, value_string );
+      type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+      GENV_ITEMFACTORY->createAttributeNode(
+        attr_node, result, attr_name, type_name, item
+      );
+
+      if ( store::Item const *const token_item = token->item() ) {
+        if ( GENV_STORE.getNodeReference( item, token_item ) ) {
+          item->getStringValue2( value_string );
+          GENV_ITEMFACTORY->createQName( attr_name, "", "", "node-ref" );
+          GENV_ITEMFACTORY->createString( item, value_string );
+          type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+          GENV_ITEMFACTORY->createAttributeNode(
+            attr_node, result, attr_name, type_name, item
+          );
+        }
+      }
+
+#ifndef ZORBA_NO_XMLSCHEMA
+      sctx->validate( result, result, StaticContextConsts::strict_validation );
+#endif /* ZORBA_NO_XMLSCHEMA */
+
+      STACK_PUSH( true, state );
+    } // while
+  }
+
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+void TokenizeIterator::resetImpl( PlanState &plan_state ) const {
+#ifndef ZORBA_NO_FULL_TEXT
+  NaryBaseIterator<TokenizeIterator,TokenizeIteratorState>::
+    resetImpl( plan_state );
+  TokenizeIteratorState *const state =
+    StateTraitsImpl<TokenizeIteratorState>::getState(
+      plan_state, this->theStateOffset
+    );
+  state->doc_tokens_->reset();
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+bool TokenizerPropertiesIterator::nextImpl( store::Item_t &result,
+                                            PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  store::Item_t element, item, junk, name;
+  zstring base_uri;
+  iso639_1::type lang;
+  Tokenizer::Numbers no;
+  store::NsBindings const ns_bindings;
+  static_context const *sctx;
+  Tokenizer::ptr tokenizer;
+  store::Item_t type_name;
+  Tokenizer::Properties props;
+  TokenizerProvider const *tokenizer_provider;
+  zstring value_string;
+
+  PlanIteratorState *state;
+  DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
+
+  sctx = getStaticContext();
+  lang = get_lang_from( getStaticContext() );
+
+  if ( theChildren.size() > 0 ) {
+    consumeNext( item, theChildren[0], plan_state );
+    lang = get_lang_from( item, loc );
+  }
+
+  tokenizer_provider = GENV_STORE.getTokenizerProvider();
+  ZORBA_ASSERT( tokenizer_provider );
+  if ( !tokenizer_provider->getTokenizer( lang, &no, &tokenizer ) )
+    throw XQUERY_EXCEPTION(
+      zerr::ZXQP8407_TOKENIZER_LANG_NOT_SUPPORTED,
+      ERROR_PARAMS( iso639_1::string_of[ lang ] )
+    );
+  tokenizer->properties( &props );
+
+  GENV_ITEMFACTORY->createQName(
+    name, static_context::ZORBA_FULL_TEXT_FN_NS, "", "tokenizer-properties"
+  );
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+  GENV_ITEMFACTORY->createElementNode(
+    result, nullptr, name, type_name, false, false, ns_bindings, base_uri
+  );
+
+  // uri="..."
+  GENV_ITEMFACTORY->createQName( name, "", "", "uri" );
+  GENV_ITEMFACTORY->createAnyURI( item, props.uri );
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+  GENV_ITEMFACTORY->createAttributeNode( junk, result, name, type_name, item );
+
+  // <comments-separate-tokens value="..."/>
+  GENV_ITEMFACTORY->createQName(
+    name, static_context::ZORBA_FULL_TEXT_FN_NS, "", "comments-separate-tokens"
+  );
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+  GENV_ITEMFACTORY->createElementNode(
+    element, result, name, type_name, false, false, ns_bindings, base_uri
+  );
+  GENV_ITEMFACTORY->createQName( name, "", "", "value" );
+  GENV_ITEMFACTORY->createBoolean( item, props.comments_separate_tokens );
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+  GENV_ITEMFACTORY->createAttributeNode( junk, element, name, type_name, item );
+
+  // <elements-separate-tokens value="..."/>
+  GENV_ITEMFACTORY->createQName(
+    name, static_context::ZORBA_FULL_TEXT_FN_NS, "", "elements-separate-tokens"
+  );
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+  GENV_ITEMFACTORY->createElementNode(
+    element, result, name, type_name, false, false, ns_bindings, base_uri
+  );
+  GENV_ITEMFACTORY->createQName( name, "", "", "value" );
+  GENV_ITEMFACTORY->createBoolean( item, props.elements_separate_tokens );
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+  GENV_ITEMFACTORY->createAttributeNode( junk, element, name, type_name, item );
+
+  // <processing-instructions-separate-tokens value="..."/>
+  GENV_ITEMFACTORY->createQName(
+    name, static_context::ZORBA_FULL_TEXT_FN_NS, "",
+    "processing-instructions-separate-tokens"
+  );
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+  GENV_ITEMFACTORY->createElementNode(
+    element, result, name, type_name, false, false, ns_bindings, base_uri
+  );
+  GENV_ITEMFACTORY->createQName( name, "", "", "value" );
+  GENV_ITEMFACTORY->createBoolean( item, props.processing_instructions_separate_tokens );
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+  GENV_ITEMFACTORY->createAttributeNode( junk, element, name, type_name, item );
+
+  // <supported-languages>...</supported-languages>
+  GENV_ITEMFACTORY->createQName(
+    name, static_context::ZORBA_FULL_TEXT_FN_NS, "", "supported-languages"
+  );
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+  GENV_ITEMFACTORY->createElementNode(
+    element, result, name, type_name, false, false, ns_bindings, base_uri
+  );
+
+  // <lang>...</lang>
+  FOR_EACH( Tokenizer::Properties::languages_type, i, props.languages ) {
+    store::Item_t lang_element;
+    type_name = GENV_TYPESYSTEM.XS_UNTYPED_ATOMIC_QNAME;
+    GENV_ITEMFACTORY->createQName(
+      name, static_context::ZORBA_FULL_TEXT_FN_NS, "", "lang"
+    );
+    GENV_ITEMFACTORY->createElementNode(
+      lang_element, element, name, type_name, false, false, ns_bindings,
+      base_uri
+    );
+    value_string = iso639_1::string_of[ *i ];
+    GENV_ITEMFACTORY->createTextNode( junk, lang_element.getp(), value_string );
+  }
+
+#ifndef ZORBA_NO_XMLSCHEMA
+  sctx->validate( result, result, StaticContextConsts::strict_validation );
+#endif /* ZORBA_NO_XMLSCHEMA */
+
+  STACK_PUSH( true, state );
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+#ifndef ZORBA_NO_FULL_TEXT
+struct TokenizeStringIteratorCallback : Tokenizer::Callback {
+  void token( char const*, size_type, iso639_1::type, size_type, size_type,
+              size_type, Item const* );
+
+  FTTokenSeqIterator::FTTokens tokens_;
+};
+
+void TokenizeStringIteratorCallback::
+token( char const *utf8_s, size_type utf8_len, iso639_1::type lang,
+       size_type token_no, size_type sent_no, size_type para_no,
+       Item const *item ) {
+  store::Item const *const store_item =
+    item ? Unmarshaller::getInternalItem( *item ) : nullptr;
+
+  FTToken const token(
+    utf8_s, utf8_len, token_no, sent_no, para_no, store_item, lang
+  );
+  tokens_.push_back( token );
+}
+#endif /* ZORBA_NO_FULL_TEXT */
+
+bool TokenizeStringIterator::nextImpl( store::Item_t &result,
+                                       PlanState &plan_state ) const {
+#ifdef ZORBA_NO_FULL_TEXT
+  return false;
+#else
+  store::Item_t item;
+  iso639_1::type lang;
+  zstring value_string;
+
+  TokenizeStringIteratorState *state;
+  DEFAULT_STACK_INIT( TokenizeStringIteratorState, state, plan_state );
+
+  lang = get_lang_from( getStaticContext() );
+
+  if ( consumeNext( item, theChildren[0], plan_state ) ) {
+    item->getStringValue2( value_string );
+    if ( theChildren.size() > 1 ) {
+      consumeNext( item, theChildren[1], plan_state );
+      lang = get_lang_from( item, loc );
+    }
+
+    { // local scope
+    TokenizerProvider const *const tokenizer_provider =
+      GENV_STORE.getTokenizerProvider();
+    ZORBA_ASSERT( tokenizer_provider );
+    Tokenizer::Numbers no;
+    Tokenizer::ptr tokenizer;
+    if ( !tokenizer_provider->getTokenizer( lang, &no, &tokenizer ) )
+      throw XQUERY_EXCEPTION(
+        zerr::ZXQP8407_TOKENIZER_LANG_NOT_SUPPORTED,
+        ERROR_PARAMS( iso639_1::string_of[ lang ] )
+      );
+
+    TokenizeStringIteratorCallback callback;
+    tokenizer->tokenize_string(
+      value_string.data(), value_string.size(), lang, false, callback
+    );
+    state->string_tokens_.take( callback.tokens_ );
+    } // local scope
+
+    while ( state->string_tokens_.hasNext() ) {
+      FTToken const *token;
+      token = state->string_tokens_.next();
+      ZORBA_ASSERT( token );
+      value_string = token->value();
+      GENV_ITEMFACTORY->createString( result, value_string );
+      STACK_PUSH( true, state );
+    }
+  }
+
+  STACK_END( state );
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+void TokenizeStringIterator::resetImpl( PlanState &plan_state ) const {
+#ifndef ZORBA_NO_FULL_TEXT
+  NaryBaseIterator<TokenizeStringIterator,TokenizeStringIteratorState>::
+    resetImpl( plan_state );
+  TokenizeStringIteratorState *const state =
+    StateTraitsImpl<TokenizeStringIteratorState>::getState(
+      plan_state, this->theStateOffset
+    );
+  state->string_tokens_.reset();
+#endif /* ZORBA_NO_FULL_TEXT */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace zorba
+/* vim:set et sw=2 ts=2: */
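
Note on the is-*-lang-supported iterators in the file above: each one appears to treat err::FTST0009 (language not known to Zorba) as a plain "false" answer and rethrows every other diagnostic. A minimal standalone sketch of that pattern follows. Everything in it (Diag, QueryError, lang_id_of, has_stemmer, is_stem_lang_supported, and the fixed language table) is a hypothetical stand-in chosen for illustration, not Zorba's API.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical diagnostics; stand-ins for err::FTST0009 and err::XPTY0004.
enum class Diag { UnknownLanguage, BadType };

struct QueryError : std::runtime_error {
  Diag code;
  QueryError( Diag c, std::string const &msg ) :
    std::runtime_error( msg ), code( c ) { }
};

// Maps a language tag to an internal id (assumed table, for illustration).
// Throws BadType for an empty tag and UnknownLanguage for an unlisted one.
inline int lang_id_of( std::string const &tag ) {
  static std::map<std::string,int> const known = {
    { "en", 1 }, { "de", 2 }, { "zh", 3 }
  };
  if ( tag.empty() )
    throw QueryError( Diag::BadType, "not castable to a language" );
  auto const it = known.find( tag );
  if ( it == known.end() )
    throw QueryError( Diag::UnknownLanguage, tag );
  return it->second;
}

// Pretend that only "en" and "de" have a stemmer.
inline bool has_stemmer( int lang_id ) {
  return lang_id == 1 || lang_id == 2;
}

// An unknown language means "not supported"; any other error propagates.
inline bool is_stem_lang_supported( std::string const &tag ) {
  try {
    return has_stemmer( lang_id_of( tag ) );
  }
  catch ( QueryError const &e ) {
    if ( e.code != Diag::UnknownLanguage )
      throw;                            // genuine error: let the caller see it
    return false;                       // unknown language: just unsupported
  }
}
```

The point of filtering on the diagnostic code is that a malformed argument still surfaces as an error, while a well-formed but unknown language yields a boolean answer.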

=== added file 'src/runtime/full_text/ft_module_impl.h'
--- src/runtime/full_text/ft_module_impl.h	1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/ft_module_impl.h	2012-04-24 22:19:24 +0000
@@ -0,0 +1,32 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef ZORBA_FT_MODULE_IMPL_H
+#define ZORBA_FT_MODULE_IMPL_H
+
+namespace zorba {
+
+class static_context;
+
+///////////////////////////////////////////////////////////////////////////////
+
+void populate_context_ft_module_impl( static_context *sctx );
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace zorba
+#endif /* ZORBA_FT_MODULE_IMPL_H */
+/* vim:set et sw=2 ts=2: */

=== modified file 'src/runtime/full_text/ft_query_item.h'
--- src/runtime/full_text/ft_query_item.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_query_item.h	2012-04-24 22:19:24 +0000
@@ -18,6 +18,7 @@
 #define ZORBA_FULL_TEXT_FT_QUERY_ITEM_H
 
 #include <list>
+#include <vector>
 
 #include "store/api/ft_token_iterator.h"
 
@@ -59,7 +60,7 @@
   void reset();
 
 private:
-  typedef std::list<Mark_t> MarkSeq;
+  typedef std::vector<Mark_t> MarkSeq;
 
   struct LocalMark : Mark {
     MarkSeq marks_;

=== modified file 'src/runtime/full_text/ft_single_token_iterator.h'
--- src/runtime/full_text/ft_single_token_iterator.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_single_token_iterator.h	2012-04-24 22:19:24 +0000
@@ -17,8 +17,6 @@
 #ifndef ZORBA_FULL_TEXT_SINGLE_TOKEN_ITERATOR_H
 #define ZORBA_FULL_TEXT_SINGLE_TOKEN_ITERATOR_H
 
-#include <list>
-
 #include "store/api/ft_token_iterator.h"
 
 namespace zorba {

=== modified file 'src/runtime/full_text/ft_stop_words_set.cpp'
--- src/runtime/full_text/ft_stop_words_set.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_stop_words_set.cpp	2012-04-24 22:19:24 +0000
@@ -72,7 +72,7 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
-void ft_stop_words_set::apply_word( zstring const &word, set_t &word_set,
+void ft_stop_words_set::apply_word( zstring const &word, word_set_t &word_set,
                                     ft_stop_words_unex::type mode ) {
   // TODO: should "word" be converted to lower-case?
   std::cout << "applying word " << word << std::endl;
@@ -87,33 +87,33 @@
 }
 
 void ft_stop_words_set::apply_word( char const *begin, char const *end,
-                                    set_t &word_set,
+                                    word_set_t &word_set,
                                     ft_stop_words_unex::type mode ) {
-  set_t::value_type const word( begin, end - begin );
+  word_set_t::value_type const word( begin, end - begin );
   apply_word( word, word_set, mode );
 }
 
-ft_stop_words_set const*
+ft_stop_words_set::ptr
 ft_stop_words_set::construct( ftstop_word_option const &option,
                               iso639_1::type lang,
                               static_context const& sctx ) {
   bool must_delete = false;
-  set_t *word_set = nullptr;            // pointless init. to stifle warning
+  word_set_t *word_set = nullptr;       // pointless init. to stifle warning
 
   switch ( option.get_mode() ) {
     case ft_stop_words_mode::with:
-      word_set = new set_t;
+      word_set = new word_set_t;
       must_delete = true;
       break;
     case ft_stop_words_mode::with_default:
       word_set = get_default_word_set_for( lang );
       if ( !word_set ) {
         // TODO: throw exception?
-        return 0;
+        return ptr();
       }
       break;
     case ft_stop_words_mode::without:
-      return 0;
+      return ptr();
   }
 
   FOR_EACH( ftstop_word_option::list_t, ftsw, option.get_stop_words() ) {
@@ -122,31 +122,30 @@
 
     if ( !uri.empty() ) {
       if ( !must_delete ) {
-        word_set = new set_t( *word_set );
+        word_set = new word_set_t( *word_set );
         must_delete = true;
       }
 
       zstring error_msg;
       std::auto_ptr<internal::Resource> rsrc =
-          sctx.resolve_uri(uri, internal::EntityData::STOP_WORDS, error_msg);
-      internal::StreamResource* stream_rsrc =
-          dynamic_cast<internal::StreamResource*>(rsrc.get());
+        sctx.resolve_uri( uri, internal::EntityData::STOP_WORDS, error_msg );
+      internal::StreamResource *const stream_rsrc =
+        dynamic_cast<internal::StreamResource*>( rsrc.get() );
       if ( !stream_rsrc ) {
         // Technically this should be thrown during static analysis.
-        throw ZORBA_EXCEPTION(err::FTST0008, ERROR_PARAMS(uri));
+        throw ZORBA_EXCEPTION( err::FTST0008, ERROR_PARAMS( uri ) );
       }
-      std::istream* stream = stream_rsrc->getStream();
+      std::istream *const stream = stream_rsrc->getStream();
 
       bool in_word = false;
       zstring cur_word;
-      cur_word.reserve(128);
+      cur_word.reserve( 128 );
       char c;
-      while (stream->good()) {
-        stream->get(c);
+      while ( stream->good() ) {
+        stream->get( c );
         // Have to check for EOF *after* attempting the read
-        if (stream->eof()) {
+        if ( stream->eof() )
           break;
-        }
         if ( is_word_char( c ) ) {
           if ( !in_word ) {
             cur_word.clear();
@@ -167,25 +166,31 @@
     ftstop_words::list_t const &word_list = (*ftsw)->get_list();
     if ( !word_list.empty() ) {
       if ( !must_delete ) {
-        word_set = new set_t( *word_set );
+        word_set = new word_set_t( *word_set );
         must_delete = true;
       }
       FOR_EACH( ftstop_words::list_t, word, word_list )
         apply_word( *word, *word_set, mode );
     }
   }
-  return new ft_stop_words_set( word_set, must_delete );
-}
-
-ft_stop_words_set::set_t*
+  return ptr( new ft_stop_words_set( word_set, must_delete ) );
+}
+
+ft_stop_words_set const*
+ft_stop_words_set::get_default( iso639_1::type lang ) {
+  word_set_t const *const word_set = get_default_word_set_for( lang );
+  return word_set ? new ft_stop_words_set( word_set, false ) : nullptr;
+}
+
+ft_stop_words_set::word_set_t*
 ft_stop_words_set::get_default_word_set_for( iso639_1::type lang ) {
-  static set_t* cached_word_sets[ iso639_1::NUM_ENTRIES ];
+  static word_set_t *cached_word_sets[ iso639_1::NUM_ENTRIES ];
   if ( !lang )
     lang = get_host_lang();
-  set_t *&word_set = cached_word_sets[ lang ];
+  word_set_t *&word_set = cached_word_sets[ lang ];
   if ( !word_set ) {
     if ( ft_stop_table const table = get_table_for( lang ) ) {
-      word_set = new set_t;
+      word_set = new word_set_t;
       for ( ft_stop_table word = table; *word; ++word )
         word_set->insert( *word );
     }
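
Note on ownership in the file above: construct() now returns a smart pointer, while get_default() still returns a raw pointer to a wrapper that does not own the shared default word set. The sketch below models that split in a simplified form, with both factories returning a smart pointer. It is an assumption-laden illustration: StopWords, with_extra, default_set_for, the integer language ids and the three-word English list are all hypothetical, and the single function-local static stands in for the per-language cache array used in the patch.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>

// Hypothetical, simplified model of a stop-word set wrapper.
class StopWords {
public:
  typedef std::set<std::string> word_set_t;
  typedef std::unique_ptr<StopWords const> ptr;

  ~StopWords() {
    if ( owns_ )
      delete words_;                    // only private copies are deleted
  }
  StopWords( StopWords const& ) = delete;
  StopWords& operator=( StopWords const& ) = delete;

  bool contains( std::string const &word ) const {
    return words_->count( word ) != 0;
  }

  // Non-owning wrapper around the shared default set, or null if unsupported.
  static ptr get_default( int lang ) {
    word_set_t const *const ws = default_set_for( lang );
    return ptr( ws ? new StopWords( ws, false ) : nullptr );
  }

  // Owning wrapper: copies the default set, then adds one word to the copy.
  static ptr with_extra( int lang, std::string const &extra ) {
    word_set_t const *const base = default_set_for( lang );
    if ( !base )
      return ptr();
    std::unique_ptr<word_set_t> copy( new word_set_t( *base ) );
    copy->insert( extra );
    ptr result( new StopWords( copy.get(), true ) );
    copy.release();                     // ownership moved into the wrapper
    return result;
  }

private:
  StopWords( word_set_t const *ws, bool owns ) : words_( ws ), owns_( owns ) { }

  // Language 1 is "English" here; every other id is unsupported.
  static word_set_t const* default_set_for( int lang ) {
    if ( lang != 1 )
      return nullptr;
    static word_set_t const english = { "a", "an", "the" };  // built on first use
    return &english;
  }

  word_set_t const *const words_;
  bool const owns_;
};
```

Because every factory hands back a smart pointer, a caller that only wants a yes/no answer (for example `bool ok = !!StopWords::get_default( 1 );`) cannot leak the wrapper, which is the hazard a raw-pointer return leaves open.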

=== modified file 'src/runtime/full_text/ft_stop_words_set.h'
--- src/runtime/full_text/ft_stop_words_set.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_stop_words_set.h	2012-04-24 22:19:24 +0000
@@ -20,6 +20,7 @@
 #include <set>
 
 #include <zorba/locale.h>
+#include <zorba/internal/unique_ptr.h>
 
 #include "compiler/expression/ftnode.h"
 #include "zorbatypes/zstring.h"
@@ -27,26 +28,29 @@
 namespace zorba {
 
 /**
- * An %ft_stop_words_set is (as its name suggests) a set of stop-wors.
+ * An %ft_stop_words_set is (as its name suggests) a set of stop-words.
  */
 class ft_stop_words_set {
 public:
+  typedef std::unique_ptr<ft_stop_words_set const> ptr;
+
   ~ft_stop_words_set() {
     if ( delete_ )
       delete word_set_;
   }
 
   /**
-   * Constructs an %ft_stop_words_set.
+   * Constructs an %ft_stop_words_set for the given language.
    *
    * @param option The ftstop_word_option to use to possibly add or remove
    * stop-words.
    * @param lang The language of the stop-words.
-   * @return Returns a new %ft_stop_words_set.
+   * @return Returns a new %ft_stop_words_set or \c nullptr if stop-words for
+   * \a lang are unsupported.
    */
-  static ft_stop_words_set const* construct( ftstop_word_option const &option,
-                                             locale::iso639_1::type lang,
-                                             static_context const& sctx );
+  static ptr construct( ftstop_word_option const &option,
+                        locale::iso639_1::type lang,
+                        static_context const& sctx );
 
   /**
    * Checks whether this %ft_stop_words_set contains the given word.
@@ -60,22 +64,33 @@
     return word_set_->find( word ) != word_set_->end();
   }
 
+  /**
+   * Gets the default %ft_stop_words_set.
+   *
+   * @param lang The language of the stop-words.
+   * @return Returns said default or \c nullptr if stop-words for \a lang are
+   * unsupported.
+   */
+  static ft_stop_words_set const* get_default( locale::iso639_1::type lang );
+
 private:
-  typedef std::set<zstring> set_t;
+  typedef std::set<zstring> word_set_t;
 
-  set_t const *const word_set_;
+  word_set_t const *const word_set_;
   bool const delete_;
 
-  ft_stop_words_set( set_t const *word_set, bool must_delete ) :
+  ft_stop_words_set( word_set_t const *word_set, bool must_delete ) :
     word_set_( word_set ), delete_( must_delete )
   {
   }
 
-  static void apply_word( zstring const&, set_t&, ft_stop_words_unex::type );
-  static void apply_word( char const*, char const*, set_t&,
-                          ft_stop_words_unex::type );
-
-  static set_t* get_default_word_set_for( locale::iso639_1::type );
+  static void apply_word( zstring const&, word_set_t&,
+                          ft_stop_words_unex::type );
+
+  static void apply_word( char const*, char const*, word_set_t&,
+                          ft_stop_words_unex::type );
+
+  static word_set_t* get_default_word_set_for( locale::iso639_1::type );
 
   // forbid these
   ft_stop_words_set( ft_stop_words_set const& );
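
Reviewer note: the ownership rule in the header above (the object deletes its word set only when the flag says so, and callers hold it through a unique_ptr typedef) can be sketched in isolation. This is a minimal stand-alone illustration under stated assumptions, not Zorba code: `stop_words`, `make_owned` and `make_shared_view` are hypothetical names, and `std::string` stands in for `zstring`.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-in for ft_stop_words_set; std::string replaces zstring.
class stop_words {
public:
  typedef std::set<std::string> word_set_t;
  typedef std::unique_ptr<stop_words const> ptr;

  ~stop_words() {
    if ( delete_ )              // only free word sets this object owns
      delete word_set_;
  }

  // Builds an object that owns a private copy of the given words.
  static ptr make_owned( word_set_t const &words ) {
    return ptr( new stop_words( new word_set_t( words ), true ) );
  }

  // Wraps a cached word set that must outlive the wrapper; a null input
  // yields an empty ptr, mirroring the "unsupported language" case.
  static ptr make_shared_view( word_set_t const *cached ) {
    return cached ? ptr( new stop_words( cached, false ) ) : ptr();
  }

  bool contains( std::string const &word ) const {
    return word_set_->find( word ) != word_set_->end();
  }

private:
  stop_words( word_set_t const *word_set, bool must_delete ) :
    word_set_( word_set ), delete_( must_delete ) { }

  // forbid these
  stop_words( stop_words const& );
  stop_words& operator=( stop_words const& );

  word_set_t const *const word_set_;
  bool const delete_;
};
```

With the typedef in place, a holder such as ft_token_matcher no longer needs an explicit delete in its destructor; an empty `ptr` plays the role the raw null pointer played before.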

=== modified file 'src/runtime/full_text/ft_token_matcher.cpp'
--- src/runtime/full_text/ft_token_matcher.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_token_matcher.cpp	2012-04-24 22:19:24 +0000
@@ -47,12 +47,12 @@
   return false;
 }
 
-inline ft_stop_words_set const* get_stop_words( ftmatch_options const &options,
-                                                iso639_1::type lang,
-                                                static_context const& sctx ) {
+inline ft_stop_words_set::ptr
+get_stop_words( ftmatch_options const &options, iso639_1::type lang,
+                static_context const& sctx ) {
   if ( ftstop_word_option const *const sw = options.get_stop_word_option() )
     return ft_stop_words_set::construct( *sw, lang, sctx );
-  return nullptr;
+  return ft_stop_words_set::ptr();
 }
 
 ///////////////////////////////////////////////////////////////////////////////
@@ -69,7 +69,7 @@
 }
 
 ft_token_matcher::~ft_token_matcher() {
-  delete stop_words_;
+  // out-of-line since it's virtual
 }
 
 ///////////////////////////////////////////////////////////////////////////////
@@ -83,8 +83,8 @@
 void ft_token_matcher::match_stemmer::
 operator()( string_t const &word, iso639_1::type lang,
             string_t *result ) const {
-  internal::Stemmer::ptr stemmer( provider_->get_stemmer( lang ) );
-  if ( stemmer )
+  internal::Stemmer::ptr stemmer;
+  if ( provider_->getStemmer( lang, &stemmer ) )
     stemmer->stem( word, lang, result );
   else
     *result = word;

=== modified file 'src/runtime/full_text/ft_token_matcher.h'
--- src/runtime/full_text/ft_token_matcher.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_token_matcher.h	2012-04-24 22:19:24 +0000
@@ -62,7 +62,7 @@
   locale::iso639_1::type const lang_;
   bool const stemming_;
   match_stemmer const stemmer_;
-  ft_stop_words_set const *const stop_words_;
+  ft_stop_words_set::ptr stop_words_;
   bool const wildcards_;
 };
 

=== modified file 'src/runtime/full_text/ft_token_seq_iterator.cpp'
--- src/runtime/full_text/ft_token_seq_iterator.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_token_seq_iterator.cpp	2012-04-24 22:19:24 +0000
@@ -25,8 +25,7 @@
 namespace zorba {
 
 FTTokenSeqIterator::FTTokenSeqIterator( FTTokens &tokens ) {
-  tokens_.swap( tokens );
-  pos_ = 0;
+  take( tokens );
 }
 
 FTTokenSeqIterator::~FTTokenSeqIterator() {
@@ -38,7 +37,7 @@
 }
 
 FTTokenIterator::index_t FTTokenSeqIterator::end() const {
-  return (FTTokenIterator::index_t)tokens_.size();;
+  return static_cast<FTTokenIterator::index_t>( tokens_.size() );
 }
 
 bool FTTokenSeqIterator::hasNext() const {
@@ -61,5 +60,10 @@
   pos_ = 0;
 }
 
+void FTTokenSeqIterator::take( FTTokens &tokens ) {
+  tokens.swap( tokens_ );
+  pos_ = 0;
+}
+
 } // namespace zorba
 /* vim:set et sw=2 ts=2: */

=== modified file 'src/runtime/full_text/ft_token_seq_iterator.h'
--- src/runtime/full_text/ft_token_seq_iterator.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_token_seq_iterator.h	2012-04-24 22:19:24 +0000
@@ -33,9 +33,12 @@
 public:
   typedef std::vector<FTToken> FTTokens;
 
+  FTTokenSeqIterator() { }
   FTTokenSeqIterator( FTTokens& );
   ~FTTokenSeqIterator();
 
+  void take( FTTokens& );
+
   // inherited
   index_t begin() const;
   index_t end() const;
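
Reviewer note: the new default constructor plus take() lets an iterator be created empty and filled later by swapping in a token vector. A rough stand-alone sketch of that pattern follows; `token_seq_sketch` and its members are hypothetical, `std::string` stands in for FTToken, and initializing the position in the default constructor is a choice made in this sketch only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for FTTokenSeqIterator.
class token_seq_sketch {
public:
  typedef std::vector<std::string> tokens_t;

  token_seq_sketch() : pos_( 0 ) { }

  explicit token_seq_sketch( tokens_t &tokens ) {
    take( tokens );
  }

  // Steals the caller's tokens by swapping (no element copies) and rewinds.
  void take( tokens_t &tokens ) {
    tokens_.swap( tokens );
    pos_ = 0;
  }

  bool has_next() const {
    return pos_ < tokens_.size();
  }

  std::string const& next() {
    return tokens_[ pos_++ ];
  }

private:
  tokens_t tokens_;
  tokens_t::size_type pos_;
};
```

Because take() swaps rather than copies, the caller's vector ends up holding whatever the iterator held before, which is empty for a freshly default-constructed iterator.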

=== modified file 'src/runtime/full_text/ft_token_span.h'
--- src/runtime/full_text/ft_token_span.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_token_span.h	2012-04-24 22:19:24 +0000
@@ -20,7 +20,7 @@
 #ifndef NDEBUG
 #include <iostream>
 #endif /* NDEBUG */
-#include <list>
+#include <vector>
 
 #include "zorbatypes/ft_token.h"
 
@@ -51,7 +51,7 @@
 /**
  * An %ft_token_spans contains zero or more ft_token_span objects.
  */
-typedef std::list<ft_token_span> ft_token_spans;
+typedef std::vector<ft_token_span> ft_token_spans;
 
 ////////// Comparison operators ///////////////////////////////////////////////
 

=== added file 'src/runtime/full_text/ft_util.cpp'
--- src/runtime/full_text/ft_util.cpp	1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/ft_util.cpp	2012-04-24 22:19:24 +0000
@@ -0,0 +1,42 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "stdafx.h"
+
+#include <stdexcept>
+
+#include "diagnostics/xquery_diagnostics.h"
+#include "zorbatypes/numconversions.h"
+
+#include "ft_util.h"
+
+namespace zorba {
+
+///////////////////////////////////////////////////////////////////////////////
+
+ft_int to_ft_int( xs_integer const &i ) {
+  try {
+    return to_xs_unsignedInt( i );
+  }
+  catch ( std::range_error const& ) {
+    throw XQUERY_EXCEPTION( err::FOCA0003, ERROR_PARAMS( i.toString() ) );
+  }
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace zorba
+/* vim:set et sw=2 ts=2: */

=== modified file 'src/runtime/full_text/ft_util.h'
--- src/runtime/full_text/ft_util.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ft_util.h	2012-04-24 22:19:24 +0000
@@ -21,6 +21,7 @@
 
 #include "compiler/expression/ftnode.h"
 #include "zorbatypes/schema_types.h"
+#include "util/cxx_util.h"
 
 #include "ft_match.h"
 
@@ -70,7 +71,7 @@
     if ( ftthesaurus_option const *const t = options->get_thesaurus_option() )
       if ( !t->no_thesaurus() )
         return t;
-  return 0;
+  return nullptr;
 }
 
 /**
@@ -87,6 +88,16 @@
   return false;
 }
 
+/**
+ * Attempts to convert an \c xs:integer to an \c xs:unsignedInt.
+ *
+ * @param i The \c xs:integer to convert.
+ * @return Returns the value converted to an \c xs:unsignedInt.
+ * @throws \c err::FOCA0003 if the value cannot be represented as an \c
+ * xs:unsignedInt.
+ */
+ft_int to_ft_int( xs_integer const &i );
+
 } // namespace zorba
 #endif /* ZORBA_FULL_TEXT_UTIL_H */
 /* vim:set et sw=2 ts=2: */
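
Reviewer note: to_ft_int() follows a "checked conversion, then translate the exception" pattern. The sketch below shows that pattern without any Zorba dependencies. All names are hypothetical: `to_unsigned_checked` stands in for to_xs_unsignedInt(), `conversion_error` stands in for the XQuery exception carrying err::FOCA0003, and `long long` stands in for xs_integer. It also assumes ft_int is an unsigned int.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>
#include <string>

typedef unsigned int ft_int_sketch;   // assumption: ft_int is an unsigned int

// Stand-in for to_xs_unsignedInt(): throws std::range_error when the value
// does not fit into an unsigned int.
inline unsigned int to_unsigned_checked( long long i ) {
  long long const max = std::numeric_limits<unsigned int>::max();
  if ( i < 0 || i > max )
    throw std::range_error( "value out of range for unsigned int" );
  return static_cast<unsigned int>( i );
}

// Stand-in for the XQuery error raised with code FOCA0003.
struct conversion_error : std::runtime_error {
  explicit conversion_error( std::string const &value ) :
    std::runtime_error( "FOCA0003: " + value ) { }
};

// Converts, translating the low-level range error into the domain error.
ft_int_sketch to_ft_int_sketch( long long i ) {
  try {
    return to_unsigned_checked( i );
  }
  catch ( std::range_error const& ) {
    throw conversion_error( std::to_string( i ) );
  }
}
```

Moving the function out of ftcontains_visitor.cpp and into ft_util means the new module code can share the same translation instead of duplicating the try/catch.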

=== modified file 'src/runtime/full_text/ftcontains_visitor.cpp'
--- src/runtime/full_text/ftcontains_visitor.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/ftcontains_visitor.cpp	2012-04-24 22:19:24 +0000
@@ -27,7 +27,6 @@
 #include "util/cxx_util.h"
 #include "util/indent.h"
 #include "util/stl_util.h"
-#include "zorbatypes/numconversions.h"
 
 #ifndef NDEBUG
 #include "system/properties.h"
@@ -77,15 +76,6 @@
   return d.getNumber();
 }
 
-inline ft_int to_ft_int( xs_integer const &i ) {
-  try {
-    return to_xs_unsignedInt( i );
-  }
-  catch ( std::range_error const& ) {
-    throw XQUERY_EXCEPTION( err::FOCA0003, ERROR_PARAMS( i.toString() ) );
-  }
-}
-
 ////////// PUSH/POP ///////////////////////////////////////////////////////////
 
 /**

=== modified file 'src/runtime/full_text/full_text.h'
--- src/runtime/full_text/full_text.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/full_text.h	2012-04-24 22:19:24 +0000
@@ -40,7 +40,7 @@
   SERIALIZABLE_CLASS_CONSTRUCTOR2(FTContainsIterator,base_type);
   void serialize( serialization::Archiver& );
 
-  typedef std::list<PlanIter_t> sub_iter_list_t;
+  typedef std::vector<PlanIter_t> sub_iter_list_t;
 
   FTContainsIterator(
     static_context*,

=== modified file 'src/runtime/full_text/icu_tokenizer.cpp'
--- src/runtime/full_text/icu_tokenizer.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/icu_tokenizer.cpp	2012-04-24 22:19:24 +0000
@@ -16,6 +16,7 @@
 #include "stdafx.h"
 
 #include <cctype>
+#include <cstring>
 #include <unicode/unistr.h>
 
 #define DEBUG_TOKENIZER 0
@@ -54,6 +55,8 @@
 public:
   typedef Tokenizer::size_type size_type;
 
+  temp_token( iso639_1::type lang ) : lang_( lang ) { }
+
   void append( char const *s, size_type slen ) {
     value_.append( s, slen );
   }
@@ -66,12 +69,14 @@
     return value_.empty();
   }
 
-  void send( void *payload, Tokenizer::Callback &callback ) {
+  void send( Item const *item, Tokenizer::Callback &callback ) {
     if ( !empty() ) {
 #     if DEBUG_TOKENIZER
       cout << "TOKEN: \"" << value_ << "\" (" << pos_ << ',' << sent_ << ',' << para_ << ")\n";
 #     endif
-      callback( value_.data(), value_.size(), pos_, sent_, para_, payload );
+      callback.token(
+        value_.data(), value_.size(), lang_, pos_, sent_, para_, item
+      );
       clear();
     }
   }
@@ -87,6 +92,7 @@
 
 private:
   string value_;
+  iso639_1::type const lang_;
   size_type pos_, sent_, para_;
 };
 
@@ -158,6 +164,21 @@
   delete this;
 }
 
+void ICU_Tokenizer::properties( Properties *p ) const {
+  p->comments_separate_tokens = true;
+  p->elements_separate_tokens = true;
+  p->processing_instructions_separate_tokens = true;
+
+  p->languages.clear();
+  for ( int32_t n = ubrk_countAvailable(), i = 0; i < n; ++i ) {
+    if ( char const *const icu_locale = ubrk_getAvailable( i ) )
+      if ( iso639_1::type const lang = find_lang( icu_locale ) )
+        p->languages.push_back( lang );
+  }
+
+  p->uri = "http://www.zorba-xquery.com/full-text/tokenizer/icu";
+}
+
 #define HANDLE_BACKSLASH()            \
   if ( !got_backslash ) ; else {      \
     got_backslash = in_wild = false;  \
@@ -174,9 +195,9 @@
 #define IS_WORD_BREAK(TYPE,STATUS) \
   ( (STATUS) >= UBRK_WORD_##TYPE && (STATUS) < UBRK_WORD_##TYPE##_LIMIT )
 
-void ICU_Tokenizer::tokenize( char const *utf8_s, size_type utf8_len,
-                              iso639_1::type lang, bool wildcards,
-                              Callback &callback, void *payload ) {
+void ICU_Tokenizer::tokenize_string( char const *utf8_s, size_type utf8_len,
+                                     iso639_1::type lang, bool wildcards,
+                                     Callback &callback, Item const *item ) {
   ZORBA_ASSERT( lang == lang_ );
 
   unicode::char_type *utf16_buf;
@@ -206,7 +227,7 @@
   sent_it_->setText( utf16_s );
   unicode::size_type sent_end = sent_it_->first(); sent_end = sent_it_->next();
 
-  temp_token t;
+  temp_token t( lang );
 
   // True only if the previous token was a backslash ('\').
   bool got_backslash = false;
@@ -295,8 +316,8 @@
     else if ( IS_WORD_BREAK( NUMBER, rule_status ) ) {
       //
       // "NUMBER" tokens are obviously for numbers.  Note that a sequence of
-      // digits containing a ',' (e.g., "1,2") is considered a single token by
-      // ICU.
+      // digits containing either a '.' (e.g., "98.6") or a ',' (e.g., "1,2")
+      // is considered a single token by ICU.
       //
 #     if DEBUG_TOKENIZER
       cout << "(NUMBER)" << endl;
@@ -346,7 +367,7 @@
     }
 
     if ( !in_wild && !got_backslash )
-      t.send( payload, callback );
+      t.send( item, callback );
 
 set_token:
 #   if DEBUG_TOKENIZER
@@ -395,7 +416,7 @@
     throw XQUERY_EXCEPTION(
       err::FTDY0020, ERROR_PARAMS( "", ZED( UnbalancedChar_3 ), '}' )
     );
-  t.send( payload, callback );
+  t.send( item, callback );
   // Incrementing "sent" here fixes:
   // https://bugs.launchpad.net/bugs/897800
   ++numbers().sent;
@@ -406,10 +427,18 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
-Tokenizer::ptr
-ICU_TokenizerProvider::getTokenizer( iso639_1::type lang,
-                                     Tokenizer::Numbers &no ) const {
-  return Tokenizer::ptr( new ICU_Tokenizer( lang, no ) );
+bool ICU_TokenizerProvider::getTokenizer( iso639_1::type lang,
+                                          Tokenizer::Numbers *num,
+                                          Tokenizer::ptr *t ) const {
+  for ( int32_t n = ubrk_countAvailable(), i = 0; i < n; ++i ) {
+    if ( char const *const icu_locale = ubrk_getAvailable( i ) )
+      if ( lang == find_lang( icu_locale ) ) {
+        if ( num && t )
+          t->reset( new ICU_Tokenizer( lang, *num ) );
+        return true;
+      }
+  }
+  return false;
 }
 
 ///////////////////////////////////////////////////////////////////////////////

=== modified file 'src/runtime/full_text/icu_tokenizer.h'
--- src/runtime/full_text/icu_tokenizer.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/icu_tokenizer.h	2012-04-24 22:19:24 +0000
@@ -48,8 +48,9 @@
 
   // inherited
   void destroy() const;
-  void tokenize( char const*, size_type, locale::iso639_1::type, bool,
-                 Callback&, void* );
+  void properties( Properties* ) const;
+  void tokenize_string( char const*, size_type, locale::iso639_1::type, bool,
+                        Callback&, Item const* );
 
 private:
   typedef std::unique_ptr<RuleBasedBreakIterator> rbbi_ptr;
@@ -63,10 +64,11 @@
 
 class ICU_TokenizerProvider : public TokenizerProvider {
 public:
-  ICU_TokenizerProvider () {}
+  ICU_TokenizerProvider() { }           // needed to work-around compiler bug
+
   // inherited
-  Tokenizer::ptr
-  getTokenizer( locale::iso639_1::type, Tokenizer::Numbers& ) const;
+  bool getTokenizer( locale::iso639_1::type, Tokenizer::Numbers* = 0,
+                     Tokenizer::ptr* = 0 ) const;
 };
 
 ///////////////////////////////////////////////////////////////////////////////

=== modified file 'src/runtime/full_text/latin_tokenizer.cpp'
--- src/runtime/full_text/latin_tokenizer.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/latin_tokenizer.cpp	2012-04-24 22:19:24 +0000
@@ -82,15 +82,26 @@
   return ++s < end ? *s : '\0';
 }
 
+void LatinTokenizer::properties( Properties *p ) const {
+  p->comments_separate_tokens = true;
+  p->elements_separate_tokens = true;
+  p->processing_instructions_separate_tokens = true;
+
+  p->languages.clear();
+  p->languages.push_back( iso639_1::en );
+
+  p->uri = "http://www.zorba-xquery.com/full-text/tokenizer/latin";
+}
+
 #define HANDLE_BACKSLASH()            \
   if ( !got_backslash ) ; else {      \
     got_backslash = in_wild = false;  \
     break;                            \
   }
 
-void LatinTokenizer::tokenize( char const *s, size_type s_len,
-                                 iso639_1::type lang, bool wildcards,
-                                 Callback &callback, void *payload ) {
+void LatinTokenizer::tokenize_string( char const *s, size_type s_len,
+                                      iso639_1::type lang, bool wildcards,
+                                      Callback &callback, Item const *item ) {
   bool got_backslash = false;
   bool in_wild = false;
   string_type token;
@@ -167,7 +178,7 @@
     } else {
       if ( is_word_char( *s ) )
         token += *s;
-      else if ( send_token( token, callback, payload ) ) {
+      else if ( send_token( token, lang, callback, item ) ) {
         token.clear();
         t_type_ = t_generic;
       }
@@ -203,13 +214,13 @@
       }
   } // for
 
-  send_token( token, callback, payload );
+  send_token( token, lang, callback, item );
 }
 
 #define PRINT_TOKENS 0
 
-bool LatinTokenizer::send_token( string_type const &token,
-                                   Callback &callback, void *payload ) {
+bool LatinTokenizer::send_token( string_type const &token, iso639_1::type lang,
+                                 Callback &callback, Item const *item ) {
   if ( !token.empty() ) {
 #if PRINT_TOKENS
     cout <<   "t=" << setw(2) << numbers().token
@@ -219,8 +230,8 @@
 #endif /* PRINT_TOKENS */
 
     callback(
-      token.data(), token.size(),
-      numbers().token, numbers().sent, numbers().para, payload
+      token.data(), token.size(), lang,
+      numbers().token, numbers().sent, numbers().para, item
     );
     ++numbers().token;
     return true;
@@ -230,10 +241,17 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
-Tokenizer::ptr
-LatinTokenizerProvider::getTokenizer( iso639_1::type lang,
-                                      Tokenizer::Numbers &num ) const {
-  return Tokenizer::ptr( new LatinTokenizer( num ) );
+bool LatinTokenizerProvider::getTokenizer( iso639_1::type lang,
+                                           Tokenizer::Numbers *num,
+                                           Tokenizer::ptr *t ) const {
+  switch ( lang ) {
+    case iso639_1::en:
+      if ( num && t )
+        t->reset( new LatinTokenizer( *num ) );
+      return true;
+    default:
+      return false;
+  }
 }
 
 ///////////////////////////////////////////////////////////////////////////////

=== modified file 'src/runtime/full_text/latin_tokenizer.h'
--- src/runtime/full_text/latin_tokenizer.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/latin_tokenizer.h	2012-04-24 22:19:24 +0000
@@ -38,8 +38,9 @@
 
   // inherited
   void destroy() const;
-  void tokenize( char const*, size_type, locale::iso639_1::type, bool,
-                 Callback&, void* );
+  void properties( Properties* ) const;
+  void tokenize_string( char const*, size_type, iso639_1::type, bool, Callback&,
+                        Item const* );
 
 private:
   typedef zstring string_type;
@@ -56,7 +57,8 @@
   static bool is_word_begin_char( char );
   bool is_word_char( char );
   static char peek( char const *s, char const *end );
-  bool send_token( string_type const &token, Callback&, void* );
+  bool send_token( string_type const &token, locale::iso639_1::type, Callback&,
+                   Item const* );
 };
 
 ///////////////////////////////////////////////////////////////////////////////
@@ -64,8 +66,8 @@
 class LatinTokenizerProvider : public TokenizerProvider {
 public:
   // inherited
-  Tokenizer::ptr getTokenizer( locale::iso639_1::type,
-                               Tokenizer::Numbers& ) const;
+  bool getTokenizer( locale::iso639_1::type, Tokenizer::Numbers* = 0,
+                     Tokenizer::ptr* = 0 ) const;
 };
 
 ///////////////////////////////////////////////////////////////////////////////
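
Reviewer note: the reworked getTokenizer() signature doubles as a capability query. It returns whether the language is supported and constructs a tokenizer only when both out-parameters are supplied. A stand-alone sketch of that shape, with hypothetical names throughout (`lang_sketch` for iso639_1::type, `numbers_sketch` for Tokenizer::Numbers, `tokenizer_sketch` for Tokenizer, `english_only_provider` for a provider like the Latin one):

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for iso639_1::type.
enum lang_sketch { lang_unknown, lang_en, lang_zh };

// Hypothetical stand-in for Tokenizer::Numbers.
struct numbers_sketch { int token, sent, para; };

// Hypothetical stand-in for Tokenizer.
class tokenizer_sketch {
public:
  typedef std::unique_ptr<tokenizer_sketch> ptr;
  explicit tokenizer_sketch( numbers_sketch &n ) : numbers_( n ) { }
  numbers_sketch& numbers() { return numbers_; }
private:
  numbers_sketch &numbers_;
};

class english_only_provider {
public:
  // Returns true only if lang is supported.  A tokenizer is created only
  // when the caller passes both num and t; otherwise this is a pure query.
  bool get_tokenizer( lang_sketch lang, numbers_sketch *num = 0,
                      tokenizer_sketch::ptr *t = 0 ) const {
    switch ( lang ) {
      case lang_en:
        if ( num && t )
          t->reset( new tokenizer_sketch( *num ) );
        return true;
      default:
        return false;
    }
  }
};
```

Calling `get_tokenizer( lang )` with the defaults answers "is this language supported?" without allocating anything, which is presumably what the new is-tokenizer-lang-supported function needs.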

=== added directory 'src/runtime/full_text/pregenerated'
=== added file 'src/runtime/full_text/pregenerated/ft_module.cpp'
--- src/runtime/full_text/pregenerated/ft_module.cpp	1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/pregenerated/ft_module.cpp	2012-04-24 22:19:24 +0000
@@ -0,0 +1,362 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+ 
+// ******************************************
+// *                                        *
+// * THIS IS A GENERATED FILE. DO NOT EDIT! *
+// * SEE .xml FILE WITH SAME NAME           *
+// *                                        *
+// ******************************************
+
+#include "stdafx.h"
+#include "zorbatypes/rchandle.h"
+#include "zorbatypes/zstring.h"
+#include "runtime/visitors/planiter_visitor.h"
+#include "runtime/full_text/ft_module.h"
+#include "system/globalenv.h"
+
+
+#include "store/api/iterator.h"
+
+namespace zorba {
+
+#ifndef ZORBA_NO_FULL_TEXT
+// <CurrentLangIterator>
+CurrentLangIterator::class_factory<CurrentLangIterator>
+CurrentLangIterator::g_class_factory;
+
+
+void CurrentLangIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+CurrentLangIterator::~CurrentLangIterator() {}
+
+// </CurrentLangIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <HostLangIterator>
+HostLangIterator::class_factory<HostLangIterator>
+HostLangIterator::g_class_factory;
+
+
+void HostLangIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+HostLangIterator::~HostLangIterator() {}
+
+// </HostLangIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStemLangSupportedIterator>
+IsStemLangSupportedIterator::class_factory<IsStemLangSupportedIterator>
+IsStemLangSupportedIterator::g_class_factory;
+
+
+void IsStemLangSupportedIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+IsStemLangSupportedIterator::~IsStemLangSupportedIterator() {}
+
+// </IsStemLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStopWordIterator>
+IsStopWordIterator::class_factory<IsStopWordIterator>
+IsStopWordIterator::g_class_factory;
+
+
+void IsStopWordIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+IsStopWordIterator::~IsStopWordIterator() {}
+
+// </IsStopWordIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStopWordLangSupportedIterator>
+IsStopWordLangSupportedIterator::class_factory<IsStopWordLangSupportedIterator>
+IsStopWordLangSupportedIterator::g_class_factory;
+
+
+void IsStopWordLangSupportedIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+IsStopWordLangSupportedIterator::~IsStopWordLangSupportedIterator() {}
+
+// </IsStopWordLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsThesaurusLangSupportedIterator>
+IsThesaurusLangSupportedIterator::class_factory<IsThesaurusLangSupportedIterator>
+IsThesaurusLangSupportedIterator::g_class_factory;
+
+
+void IsThesaurusLangSupportedIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+IsThesaurusLangSupportedIterator::~IsThesaurusLangSupportedIterator() {}
+
+// </IsThesaurusLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsTokenizerLangSupportedIterator>
+IsTokenizerLangSupportedIterator::class_factory<IsTokenizerLangSupportedIterator>
+IsTokenizerLangSupportedIterator::g_class_factory;
+
+
+void IsTokenizerLangSupportedIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+IsTokenizerLangSupportedIterator::~IsTokenizerLangSupportedIterator() {}
+
+// </IsTokenizerLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <StemIterator>
+StemIterator::class_factory<StemIterator>
+StemIterator::g_class_factory;
+
+
+void StemIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+StemIterator::~StemIterator() {}
+
+// </StemIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <StripDiacriticsIterator>
+StripDiacriticsIterator::class_factory<StripDiacriticsIterator>
+StripDiacriticsIterator::g_class_factory;
+
+
+void StripDiacriticsIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+StripDiacriticsIterator::~StripDiacriticsIterator() {}
+
+// </StripDiacriticsIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <ThesaurusLookupIterator>
+ThesaurusLookupIterator::class_factory<ThesaurusLookupIterator>
+ThesaurusLookupIterator::g_class_factory;
+
+
+void ThesaurusLookupIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+ThesaurusLookupIterator::~ThesaurusLookupIterator() {}
+
+ThesaurusLookupIteratorState::ThesaurusLookupIteratorState() {}
+
+ThesaurusLookupIteratorState::~ThesaurusLookupIteratorState() {}
+
+
+void ThesaurusLookupIteratorState::reset(PlanState& planState) {
+  PlanIteratorState::reset(planState);
+}
+// </ThesaurusLookupIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeIterator>
+TokenizeIterator::class_factory<TokenizeIterator>
+TokenizeIterator::g_class_factory;
+
+
+void TokenizeIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+TokenizeIterator::~TokenizeIterator() {}
+
+TokenizeIteratorState::TokenizeIteratorState() {}
+
+TokenizeIteratorState::~TokenizeIteratorState() {}
+
+
+void TokenizeIteratorState::reset(PlanState& planState) {
+  PlanIteratorState::reset(planState);
+}
+// </TokenizeIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizerPropertiesIterator>
+TokenizerPropertiesIterator::class_factory<TokenizerPropertiesIterator>
+TokenizerPropertiesIterator::g_class_factory;
+
+
+void TokenizerPropertiesIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+TokenizerPropertiesIterator::~TokenizerPropertiesIterator() {}
+
+// </TokenizerPropertiesIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeStringIterator>
+TokenizeStringIterator::class_factory<TokenizeStringIterator>
+TokenizeStringIterator::g_class_factory;
+
+
+void TokenizeStringIterator::accept(PlanIterVisitor& v) const {
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+TokenizeStringIterator::~TokenizeStringIterator() {}
+
+TokenizeStringIteratorState::TokenizeStringIteratorState() {}
+
+TokenizeStringIteratorState::~TokenizeStringIteratorState() {}
+
+
+void TokenizeStringIteratorState::reset(PlanState& planState) {
+  PlanIteratorState::reset(planState);
+}
+// </TokenizeStringIterator>
+
+#endif
+
+}
+
+

=== added file 'src/runtime/full_text/pregenerated/ft_module.h'
--- src/runtime/full_text/pregenerated/ft_module.h	1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/pregenerated/ft_module.h	2012-04-24 22:19:24 +0000
@@ -0,0 +1,561 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+ 
+// ******************************************
+// *                                        *
+// * THIS IS A GENERATED FILE. DO NOT EDIT! *
+// * SEE .xml FILE WITH SAME NAME           *
+// *                                        *
+// ******************************************
+#ifndef ZORBA_RUNTIME_FULL_TEXT_FT_MODULE_H
+#define ZORBA_RUNTIME_FULL_TEXT_FT_MODULE_H
+
+
+#include "common/shared_types.h"
+
+
+
+#include "runtime/base/narybase.h"
+#include "runtime/full_text/ft_token_seq_iterator.h"
+#include "runtime/full_text/thesaurus.h"
+
+
+namespace zorba {
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class CurrentLangIterator : public NaryBaseIterator<CurrentLangIterator, PlanIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(CurrentLangIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(CurrentLangIterator,
+    NaryBaseIterator<CurrentLangIterator, PlanIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<CurrentLangIterator, PlanIteratorState>*)this);
+  }
+
+  CurrentLangIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<CurrentLangIterator, PlanIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~CurrentLangIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class HostLangIterator : public NaryBaseIterator<HostLangIterator, PlanIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(HostLangIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(HostLangIterator,
+    NaryBaseIterator<HostLangIterator, PlanIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<HostLangIterator, PlanIteratorState>*)this);
+  }
+
+  HostLangIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<HostLangIterator, PlanIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~HostLangIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class IsStemLangSupportedIterator : public NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(IsStemLangSupportedIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(IsStemLangSupportedIterator,
+    NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>*)this);
+  }
+
+  IsStemLangSupportedIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~IsStemLangSupportedIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class IsStopWordIterator : public NaryBaseIterator<IsStopWordIterator, PlanIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(IsStopWordIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(IsStopWordIterator,
+    NaryBaseIterator<IsStopWordIterator, PlanIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<IsStopWordIterator, PlanIteratorState>*)this);
+  }
+
+  IsStopWordIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<IsStopWordIterator, PlanIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~IsStopWordIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class IsStopWordLangSupportedIterator : public NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(IsStopWordLangSupportedIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(IsStopWordLangSupportedIterator,
+    NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>*)this);
+  }
+
+  IsStopWordLangSupportedIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~IsStopWordLangSupportedIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class IsThesaurusLangSupportedIterator : public NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(IsThesaurusLangSupportedIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(IsThesaurusLangSupportedIterator,
+    NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>*)this);
+  }
+
+  IsThesaurusLangSupportedIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~IsThesaurusLangSupportedIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class IsTokenizerLangSupportedIterator : public NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(IsTokenizerLangSupportedIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(IsTokenizerLangSupportedIterator,
+    NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>*)this);
+  }
+
+  IsTokenizerLangSupportedIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~IsTokenizerLangSupportedIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class StemIterator : public NaryBaseIterator<StemIterator, PlanIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(StemIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(StemIterator,
+    NaryBaseIterator<StemIterator, PlanIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<StemIterator, PlanIteratorState>*)this);
+  }
+
+  StemIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<StemIterator, PlanIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~StemIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class StripDiacriticsIterator : public NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(StripDiacriticsIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(StripDiacriticsIterator,
+    NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>*)this);
+  }
+
+  StripDiacriticsIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~StripDiacriticsIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class ThesaurusLookupIteratorState : public PlanIteratorState
+{
+public:
+  zstring phrase_; //
+  zstring relationship_; //
+  internal::Thesaurus::level_type at_least_; //
+  internal::Thesaurus::level_type at_most_; //
+  internal::Thesaurus::ptr thesaurus_; //
+  internal::Thesaurus::iterator::ptr tresult_; //
+
+  ThesaurusLookupIteratorState();
+
+  ~ThesaurusLookupIteratorState();
+
+  void reset(PlanState&);
+};
+
+class ThesaurusLookupIterator : public NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(ThesaurusLookupIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(ThesaurusLookupIterator,
+    NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>*)this);
+  }
+
+  ThesaurusLookupIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~ThesaurusLookupIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+
+  void resetImpl(PlanState&) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class TokenizeIteratorState : public PlanIteratorState
+{
+public:
+  store::Item_t doc_item_; //
+  FTTokenIterator_t doc_tokens_; //
+  store::Item_t token_qname_; //
+
+  TokenizeIteratorState();
+
+  ~TokenizeIteratorState();
+
+  void reset(PlanState&);
+};
+
+class TokenizeIterator : public NaryBaseIterator<TokenizeIterator, TokenizeIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(TokenizeIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(TokenizeIterator,
+    NaryBaseIterator<TokenizeIterator, TokenizeIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<TokenizeIterator, TokenizeIteratorState>*)this);
+  }
+
+  TokenizeIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<TokenizeIterator, TokenizeIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~TokenizeIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+
+  void resetImpl(PlanState&) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class TokenizerPropertiesIterator : public NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(TokenizerPropertiesIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(TokenizerPropertiesIterator,
+    NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>*)this);
+  }
+
+  TokenizerPropertiesIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~TokenizerPropertiesIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class TokenizeStringIteratorState : public PlanIteratorState
+{
+public:
+  FTTokenSeqIterator string_tokens_; //
+
+  TokenizeStringIteratorState();
+
+  ~TokenizeStringIteratorState();
+
+  void reset(PlanState&);
+};
+
+class TokenizeStringIterator : public NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(TokenizeStringIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(TokenizeStringIterator,
+    NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar)
+  {
+    serialize_baseclass(ar,
+    (NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>*)this);
+  }
+
+  TokenizeStringIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~TokenizeStringIterator();
+
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+
+  void resetImpl(PlanState&) const;
+};
+
+#endif
+
+}
+#endif
+/*
+ * Local variables:
+ * mode: c++
+ * End:
+ */ 

=== modified file 'src/runtime/full_text/stemmer.cpp'
--- src/runtime/full_text/stemmer.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/stemmer.cpp	2012-04-24 22:19:24 +0000
@@ -43,7 +43,8 @@
   return default_provider;
 }
 
-Stemmer::ptr StemmerProvider::get_stemmer( iso639_1::type lang ) const {
+bool StemmerProvider::getStemmer( iso639_1::type lang,
+                                  Stemmer::ptr *result ) const {
   typedef unique_ptr<SnowballStemmer const> cache_ptr;
 
   static cache_ptr cached_stemmers[ iso639_1::NUM_ENTRIES ];
@@ -56,7 +57,12 @@
   cache_ptr &ptr_ref = cached_stemmers[ lang ];
   if ( !ptr_ref )
     ptr_ref.reset( SnowballStemmer::create( lang ) );
-  return Stemmer::ptr( ptr_ref.get() );
+  if ( ptr_ref ) {
+    if ( result )
+      result->reset( ptr_ref.get() );
+    return true;
+  }
+  return false;
 }
 
 ///////////////////////////////////////////////////////////////////////////////
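
The change above replaces a getter that returned a possibly-null pointer with one that returns bool and fills an optional out-parameter, so callers can ask "is this language supported?" without requesting an object. A minimal sketch of that calling convention follows. ToyStemmer, ToyStemmerProvider and demo() are hypothetical names invented for illustration, and std::shared_ptr is swapped in for Zorba's unique_ptr with a destroy_delete deleter, so this is not the actual Zorba code.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a stemmer; not Zorba's Stemmer class.
struct ToyStemmer {
  std::string lang;
};

// Hypothetical provider with the same "bool result + optional out-parameter"
// shape as StemmerProvider::getStemmer() in the patch.
class ToyStemmerProvider {
public:
  ToyStemmerProvider() {
    // Pre-populated cache; the patch instead creates stemmers lazily.
    cache_[ "en" ] = std::make_shared<ToyStemmer>( ToyStemmer{ "en" } );
    cache_[ "fr" ] = std::make_shared<ToyStemmer>( ToyStemmer{ "fr" } );
  }

  // Returns true only if a stemmer exists for lang.  If result is non-null
  // and a stemmer exists, *result is set to point to it.
  bool getStemmer( std::string const &lang,
                   std::shared_ptr<ToyStemmer const> *result = nullptr ) const {
    auto const it = cache_.find( lang );
    if ( it == cache_.end() )
      return false;
    if ( result )
      *result = it->second;
    return true;
  }

private:
  std::map<std::string, std::shared_ptr<ToyStemmer const>> cache_;
};

// Usage: probing for support versus actually fetching a stemmer.
inline void demo() {
  ToyStemmerProvider const provider;
  assert( provider.getStemmer( "fr" ) );            // probe only
  std::shared_ptr<ToyStemmer const> stemmer;
  assert( provider.getStemmer( "en", &stemmer ) );  // fetch
  assert( stemmer->lang == "en" );
  assert( !provider.getStemmer( "xx", &stemmer ) ); // unsupported language
  assert( stemmer->lang == "en" );                  // result left untouched
}
```

The probe-only form is presumably what lets a function such as is-stem-lang-supported answer without constructing anything; that reading is an inference from the iterator names in this proposal, not something the diff states.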

=== modified file 'src/runtime/full_text/stemmer.h'
--- src/runtime/full_text/stemmer.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/stemmer.h	2012-04-24 22:19:24 +0000
@@ -41,11 +41,28 @@
                           ztd::destroy_delete<Stemmer const> > ptr;
 
   /**
+   * Various properties of this %Stemmer.
+   */
+  struct Properties {
+    /**
+     * The URI that uniquely identifies this %Stemmer.
+     */
+    char const * uri;
+  };
+
+  /**
    * Destroys this %Stemmer.
    */
   virtual void destroy() const = 0;
 
   /**
+   * Gets the Properties of this %Stemmer.
+   *
+   * @param result The Properties to populate.
+   */
+  virtual void properties( Properties *result ) const = 0;
+
+  /**
    * Gets the stem of the given word.
    *
    * @param word The word to stem.
@@ -74,13 +91,15 @@
   static StemmerProvider const& get_default();
 
   /**
-   * Gets an instance of a Stemmer for the given language.
+   * Gets a Stemmer for the given language.
    *
-   * @param lang The language for the stemmer.
-   * @return Returns said Stemmer or \c nullptr if no stemmer is availabe for
-   * the given language.
+   * @param lang The language to get a Stemmer for.
+   * @param s If not \c null, set to point to a Stemmer for \a lang.
+   * @return Returns \c true only if this provider can provide a stemmer for
+   * \a lang.
    */
-  virtual Stemmer::ptr get_stemmer( locale::iso639_1::type lang ) const;
+  virtual bool getStemmer( locale::iso639_1::type lang,
+                           Stemmer::ptr *s = 0 ) const;
 };
 
 ///////////////////////////////////////////////////////////////////////////////

=== modified file 'src/runtime/full_text/stemmer/sb_stemmer.cpp'
--- src/runtime/full_text/stemmer/sb_stemmer.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/stemmer/sb_stemmer.cpp	2012-04-24 22:19:24 +0000
@@ -32,18 +32,21 @@
 static bool is_lang_supported( iso639_1::type lang ) {
   using namespace iso639_1;
   switch ( lang ) {
-    case da:
-    case de:
-    case en:
-    case es:
-    case fi:
-    case hu:
-    case it:
-    case nl:
-    case no:
-    case pt:
-    case sv:
-    case ru:
+    case da:  // Danish
+    case de:  // German
+    case en:  // English
+    case es:  // Spanish
+    case fi:  // Finnish
+    case fr:  // French
+    case hu:  // Hungarian
+    case it:  // Italian
+    case nl:  // Dutch
+    case no:  // Norwegian
+    case pt:  // Portuguese
+    case ro:  // Romanian
+    case ru:  // Russian
+    case sv:  // Swedish
+    case tr:  // Turkish
       return true;
     default:
       return false;
@@ -70,7 +73,11 @@
   // Do nothing since built-in stemmers are cached for re-use.
 }
 
-void SnowballStemmer::stem( zstring const &word, iso639_1::type lang,
+void SnowballStemmer::properties( Properties *p ) const {
+  p->uri = "http://www.zorba-xquery.com/full-text/stemmer/snowball";
+}
+
+void SnowballStemmer::stem( zstring const &word, iso639_1::type,
                             zstring *result ) const {
   //
   // We need a mutex since the libstemmer library is not thread-safe.

=== modified file 'src/runtime/full_text/stemmer/sb_stemmer.h'
--- src/runtime/full_text/stemmer/sb_stemmer.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/stemmer/sb_stemmer.h	2012-04-24 22:19:24 +0000
@@ -43,6 +43,7 @@
 
   // inherited
   void destroy() const;
+  void properties( Properties* ) const;
   void stem( zstring const &word, locale::iso639_1::type lang,
              zstring *result ) const;
 

=== modified file 'src/runtime/full_text/thesauri/wn_thesaurus.cpp'
--- src/runtime/full_text/thesauri/wn_thesaurus.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/thesauri/wn_thesaurus.cpp	2012-04-24 22:19:24 +0000
@@ -23,6 +23,8 @@
 
 #include <zorba/util/path.h>
 
+#include <context/static_context.h>
+
 #include "util/cxx_util.h"
 #include "util/fs_util.h"
 #include "util/less.h"
@@ -56,6 +58,18 @@
 ////////// Helper functions ///////////////////////////////////////////////////
 
 /**
+ * Appends the name of the file Zorba uses for a WordNet thesaurus file.
+ *
+ * @param path The path to append to.
+ * @param lang The language of the thesaurus file.
+ */
+static void append_wordnet_file( zstring &path, iso639_1::type lang ) {
+  fs::append( path, "wordnet-" );
+  path += iso639_1::string_of[ lang ];
+  path += ".zth";
+}
+
+/**
  * "Fixes" the "at most" parameter.  The Full Text specification section 3.4.3
  * says in part:
  *
@@ -70,8 +84,10 @@
  * broad, hence if at_most specifies "all levels" (max int), clamp it at 2
  * (which seems to work well in practice).
  */
-inline ft_int fix_at_most( ft_int at_most ) {
-  return at_most == numeric_limits<ft_int>::max() ? 2 : at_most;
+inline internal::Thesaurus::level_type
+fix_at_most( internal::Thesaurus::level_type at_most ) {
+  return at_most == numeric_limits<internal::Thesaurus::level_type>::max() ?
+    2 : at_most;
 }
 
 /**
@@ -191,9 +207,7 @@
   for ( bool loop = true; loop; ) {
     switch ( fs::get_type( path ) ) {
       case fs::directory:
-        fs::append( path, "wordnet-" );
-        path += iso639_1::string_of[ iso639_1::en ];
-        path += ".zth";
+        append_wordnet_file( path, iso639_1::en );
         break;
       case fs::file:
         loop = false;
@@ -216,7 +230,7 @@
  *
  * @param relationship The XQuery thesaurus relationship.
  * @param lang The language of the relationship.
- * @return Returns the corresponding Wordnet pointer type.
+ * @return Returns the corresponding WordNet pointer type.
  */
 static pointer::type map_xquery_rel( zstring const &relationship,
                                      iso639_1::type lang ) {
@@ -233,8 +247,8 @@
   thesaurus::iterator::LevelMarker = make_pair( ~0u, iso2788::neutral );
 
 thesaurus::iterator::iterator( thesaurus const &t, char const *p,
-                               pointer::type ptr_type, ft_int at_least,
-                               ft_int at_most ) :
+                               pointer::type ptr_type, level_type at_least,
+                               level_type at_most ) :
   thesaurus_( t ), query_ptr_type_( ptr_type ),
   at_least_( at_least ), at_most_( fix_at_most( at_most ) ), level_( 0 )
 {
@@ -506,7 +520,7 @@
 
 thesaurus::iterator::ptr
 thesaurus::lookup( zstring const &phrase, zstring const &relationship,
-                   ft_int at_least, ft_int at_most ) const {
+                   level_type at_least, level_type at_most ) const {
   iterator::ptr result;
 # if DEBUG_FT_THESAURUS
   cout << "==================================================" << endl;
@@ -524,6 +538,62 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
+provider::provider( zstring const &path ) : path_( path ) {
+  ZORBA_ASSERT( !path.empty() );
+}
+
+bool provider::getThesaurus( iso639_1::type lang,
+                             internal::Thesaurus::ptr *t ) const {
+#ifdef ZORBA_WITH_FILE_ACCESS
+  zstring file_path;
+
+  switch ( lang ) {
+    case iso639_1::unknown:
+      lang = iso639_1::en;
+      // no break;
+    case iso639_1::en:
+      file_path = path_;
+      append_wordnet_file( file_path, lang );
+      break;
+    default:
+      return false;
+  }
+
+  //
+  // We want to look for the WordNet thesaurus file on the library path.
+  // Unfortunately every static_context can have its own library path and we
+  // don't have direct access to the query's static_context here.  So, for now
+  // we only look on the root static_context's library path.
+  //
+  static_context &sctx = GENV.getRootStaticContext();
+  std::vector<zstring> lib_path_components;
+  sctx.get_full_lib_path( lib_path_components );
+  MUTATE_EACH( std::vector<zstring>, path, lib_path_components ) {
+    fs::append( *path, file_path );
+    if ( fs::get_type( *path ) == fs::file ) {
+      if ( t )
+        t->reset( new thesaurus( *path, lang ) );
+      return true;
+    }
+  }
+  return false;
+#else
+  switch ( lang ) {
+    case iso639_1::unknown:
+      lang = iso639_1::en;
+      // no break;
+    case iso639_1::en:
+      if ( t )
+        t->reset();
+      return true;
+    default:
+      return false;
+  }
+#endif /* ZORBA_WITH_FILE_ACCESS */
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
 } // namespace wordnet
 } // namespace zorba
 /* vim:set et sw=2 ts=2: */
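
Two small helpers in this file are easy to check in isolation: fix_at_most(), which clamps "all levels" to 2, and append_wordnet_file(), which builds the wordnet-LL.zth file name. The sketch below restates both with standard-library types only. It assumes level_type is an unsigned integer and that '/' is the path separator (Zorba's fs::append is not reproduced here); wordnet_file() is a hypothetical name.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Assumption: stands in for internal::Thesaurus::level_type (ft_int).
typedef unsigned level_type;

// If at_most means "all levels" (the maximum value), clamp it to 2;
// otherwise pass it through unchanged.
inline level_type fix_at_most( level_type at_most ) {
  return at_most == std::numeric_limits<level_type>::max() ? 2 : at_most;
}

// Builds "<dir>/wordnet-<lang>.zth".  Hypothetical helper: it assumes '/'
// as the separator and takes the language code as a plain string.
inline std::string wordnet_file( std::string dir,
                                 std::string const &lang_code ) {
  assert( !dir.empty() );
  if ( dir.back() != '/' )
    dir += '/';
  dir += "wordnet-";
  dir += lang_code;
  dir += ".zth";
  return dir;
}
```

With these definitions, wordnet_file( "lib", "en" ) yields "lib/wordnet-en.zth", and any explicit at_most value other than the maximum is returned as-is.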

=== modified file 'src/runtime/full_text/thesauri/wn_thesaurus.h'
--- src/runtime/full_text/thesauri/wn_thesaurus.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/thesauri/wn_thesaurus.h	2012-04-24 22:19:24 +0000
@@ -34,7 +34,7 @@
 ///////////////////////////////////////////////////////////////////////////////
 
 /**
- * A %wordnet::thesaurus is an ft_thesaurus for Wordnet.
+ * A %wordnet::thesaurus is a Thesaurus for Wordnet.
  * See: http://wordnet.princeton.edu/
  */
 class thesaurus : public internal::Thesaurus {
@@ -44,7 +44,8 @@
 
   // inherited
   void destroy() const;
-  iterator::ptr lookup( zstring const&, zstring const&, ft_int, ft_int ) const;
+  iterator::ptr lookup( zstring const&, zstring const&, level_type,
+                        level_type ) const;
 
 private:
   /**
@@ -86,7 +87,7 @@
 
   private:
     iterator( thesaurus const&, char const *lemma, pointer::type,
-              ft_int at_least, ft_int at_most );
+              level_type at_least, level_type at_most );
     ~iterator();
 
     thesaurus const &thesaurus_;
@@ -97,8 +98,8 @@
      */
     pointer::type query_ptr_type_;
   
-    ft_int const at_least_, at_most_;
-    ft_int level_;
+    level_type const at_least_, at_most_;
+    level_type level_;
   
     typedef std::pair<synset_id_t,iso2788::rel_dir> candidate_t;
     typedef std::deque<candidate_t> candidate_queue_t;
@@ -124,6 +125,29 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
+/**
+ * A %wordnet::provider is a ThesaurusProvider for Wordnet.
+ */
+class provider : public internal::ThesaurusProvider {
+public:
+  /**
+   * Constructs a %provider.
+   *
+   * @param path The relative path of where the wordnet-LL.zth file is located
+   * (where LL is the ISO 639-1 language code of the language).
+   */
+  provider( zstring const &path );
+
+  // inherited
+  bool getThesaurus( locale::iso639_1::type,
+                     internal::Thesaurus::ptr* = nullptr ) const;
+
+private:
+  zstring const path_;
+};
+
+///////////////////////////////////////////////////////////////////////////////
+
 } // namespace wordnet
 } // namespace zorba
 

=== modified file 'src/runtime/full_text/thesauri/xqftts_thesaurus.cpp'
--- src/runtime/full_text/thesauri/xqftts_thesaurus.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/thesauri/xqftts_thesaurus.cpp	2012-04-24 22:19:24 +0000
@@ -60,8 +60,8 @@
     make_pair( static_cast<synonym*>(0), iso2788::neutral );
 
 thesaurus::iterator::iterator( thesaurus_t const &t, zstring const &phrase,
-                               zstring const &rel_string, ft_int at_least,
-                               ft_int at_most ) :
+                               zstring const &rel_string, level_type at_least,
+                               level_type at_most ) :
   thesaurus_( t ), at_least_( at_least ), at_most_( at_most ), level_( 1 )
 {
   using namespace iso2788;
@@ -217,7 +217,7 @@
 
 thesaurus::iterator::ptr
 thesaurus::lookup( zstring const &phrase, zstring const &relationship,
-                   ft_int at_least, ft_int at_most ) const {
+                   level_type at_least, level_type at_most ) const {
 # if DEBUG_THESAURUS
   cout << "==================================================" << endl;
   cout << "query phrase: " << phrase << endl;
@@ -364,6 +364,31 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
+provider::provider( zstring const &path ) : path_( path ) {
+}
+
+bool provider::getThesaurus( iso639_1::type lang,
+                             internal::Thesaurus::ptr *t ) const {
+  switch ( lang ) {
+    case iso639_1::unknown:
+      lang = iso639_1::en;
+      // no break;
+    case iso639_1::en:
+      if ( t ) {
+#ifdef ZORBA_WITH_FILE_ACCESS
+        t->reset( new thesaurus( path_, lang ) );
+#else
+        t->reset();
+#endif /* ZORBA_WITH_FILE_ACCESS */
+      }
+      return true;
+    default:
+      return false;
+  }
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
 } // namespace xqftts
 } // namespace zorba
 /* vim:set et sw=2 ts=2: */

=== modified file 'src/runtime/full_text/thesauri/xqftts_thesaurus.h'
--- src/runtime/full_text/thesauri/xqftts_thesaurus.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/thesauri/xqftts_thesaurus.h	2012-04-24 22:19:24 +0000
@@ -44,7 +44,8 @@
 
   // inherited
   void destroy() const;
-  iterator::ptr lookup( zstring const&, zstring const&, ft_int, ft_int ) const;
+  iterator::ptr lookup( zstring const&, zstring const&, level_type,
+                        level_type ) const;
 
 private:
   //
@@ -123,13 +124,14 @@
 
   private:
     iterator( thesaurus_t const&, zstring const &phrase,
-              zstring const &relationship, ft_int at_least, ft_int at_most );
+              zstring const &relationship, level_type at_least,
+              level_type at_most );
     ~iterator();
 
     thesaurus_t const &thesaurus_;
 
-    ft_int const at_least_, at_most_;
-    ft_int level_;
+    level_type const at_least_, at_most_;
+    level_type level_;
 
     typedef std::pair<synonym const*,iso2788::rel_dir> candidate_t;
     typedef std::deque<candidate_t> candidate_queue_t;
@@ -155,6 +157,28 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
+/**
+ * A %xqftts::provider is a ThesaurusProvider for XQFTTS.
+ */
+class provider : public internal::ThesaurusProvider {
+public:
+  /**
+   * Constructs a %provider.
+   *
+   * @param path The absolute path of the thesaurus XML file.
+   */
+  provider( zstring const &path );
+
+  // inherited
+  bool getThesaurus( locale::iso639_1::type,
+                     internal::Thesaurus::ptr* = nullptr ) const;
+
+private:
+  zstring const path_;
+};
+
+///////////////////////////////////////////////////////////////////////////////
+
 } // namespace xqftts
 } // namespace zorba
 

=== modified file 'src/runtime/full_text/thesaurus.cpp'
--- src/runtime/full_text/thesaurus.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/thesaurus.cpp	2012-04-24 22:19:24 +0000
@@ -31,6 +31,7 @@
 #include "thesaurus.h"
 #ifdef ZORBA_WITH_FILE_ACCESS
 # include "thesauri/wn_thesaurus.h"
+# include "zorbatypes/URI.h"
 #endif
 #include "thesauri/xqftts_thesaurus.h"
 
@@ -57,7 +58,8 @@
   type const DEFAULT = wordnet;
 
   /**
-   * Given a thesaurus implementation name, finds its corresponding type.
+   * Given a thesaurus implementation name (as identified by the URI scheme),
+   * finds its corresponding type.
    *
    * @param name The thesaurus implementation's name.
    * @return Returns the implementation's type or \c unknown.
@@ -66,7 +68,6 @@
     typedef map<char const*,type> impl_map_t;
     static impl_map_t impl_map;
     if ( impl_map.empty() ) {
-      impl_map[ "default" ] = DEFAULT;
       impl_map[ "wordnet" ] = wordnet;
       impl_map[ "xqftts"  ] = xqftts;
     }
@@ -78,28 +79,6 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
-/**
- * Parses a thesaurus mapping string.  A mapping string is of the form:
- *
- *  [implementation_name|]URI
- *
- * @param mapping The mapping to parse.
- * @param t A pointer to receive the implementation type.
- * @param uri A pointer to the string to receive the URI.
- */
-static void parse_mapping( zstring const &mapping, thesaurus_impl::type *t,
-                           zstring *uri ) {
-  zstring impl_name;
-  if ( zorba::ztd::split( mapping, '|', &impl_name, uri ) ) {
-    *t = thesaurus_impl::find( impl_name );
-  } else {
-    *t = thesaurus_impl::DEFAULT;
-    *uri = mapping;
-  }
-}
-
-///////////////////////////////////////////////////////////////////////////////
-
 Thesaurus::iterator::~iterator() {
   // out-of-line since it's virtual
 }
@@ -112,36 +91,41 @@
 
 Resource*
 ThesaurusURLResolver::resolveURL( zstring const &url, EntityData const *data ) {
+  // Only resolve thesaurus URLs
   if ( data->getKind() != internal::EntityData::THESAURUS )
     return nullptr;
-  ThesaurusEntityData const *const t_data =
-    dynamic_cast<ThesaurusEntityData const*>( data );
-  iso639_1::type const lang = t_data->getLanguage();
-
-  thesaurus_impl::type t_impl;
-  zstring mapped_url;
-  parse_mapping( url, &t_impl, &mapped_url );
-
-  zstring t_path;
-  switch ( uri::get_scheme( mapped_url ) ) {
-    case uri::file:
-    case uri::none:
-      t_path = fs::get_normalized_path( mapped_url );
-      break;
-    default:
-      throw XQUERY_EXCEPTION(
-        zerr::ZXQP0004_NOT_IMPLEMENTED,
-        ERROR_PARAMS( ZED( NonFileThesaurusURI ) )
-      );
-  }
-
-  switch ( t_impl ) {
+
+  zstring const url_copy(
+    url == "##default" ? "wordnet://wordnet.princeton.edu/": url
+  );
+
+  zstring scheme_name;
+  if ( !uri::get_scheme( url_copy, &scheme_name ) )
+    return nullptr;
+
+  switch ( thesaurus_impl::find( scheme_name ) ) {
+    case thesaurus_impl::xqftts: {
+      //
+      // Currently, we presume that an "xqftts:" URL should be used exactly
+      // like a "file:" URL.
+      //
+      zstring t_uri( url_copy );
+      t_uri.replace( 0, 6, "file" );    // xqftts -> file
+      zstring const t_path( fs::get_normalized_path( t_uri ) );
+      return new xqftts::provider( t_path );
+    }
 #   ifdef ZORBA_WITH_FILE_ACCESS
-    case thesaurus_impl::wordnet:
-      return new wordnet::thesaurus( t_path, lang );
+    case thesaurus_impl::wordnet: {
+      //
+      // Wordnet, on the other hand, needs to find its data file in Zorba's
+      // library path using the mangled form of the original URI.  So, mangle
+      // here for convenience.
+      //
+      URI const t_uri( url_copy );
+      zstring const t_path( t_uri.toPathNotation() );
+      return new wordnet::provider( t_path );
+    }
 #   endif /* ZORBA_WITH_FILE_ACCESS */
-    case thesaurus_impl::xqftts:
-      return new xqftts::thesaurus( t_path, lang );
     default:
       throw XQUERY_EXCEPTION( err::FTST0018, ERROR_PARAMS( url ) );
   }
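
The rewritten resolveURL() selects the thesaurus implementation from the URL scheme rather than from an "implementation|URI" mapping string. The steps (substitute the default URL, extract the scheme, map the scheme to an implementation, rewrite xqftts: as file:) can be sketched with std::string alone. The helper names and the Impl enum below are hypothetical, and get_scheme() here is a simplified stand-in for Zorba's uri::get_scheme, whose exact behavior is not shown in this diff.

```cpp
#include <cassert>
#include <string>

enum class Impl { unknown, wordnet, xqftts };

// Maps the "##default" placeholder to the WordNet URL used in the patch.
inline std::string resolve_default( std::string const &url ) {
  return url == "##default" ? "wordnet://wordnet.princeton.edu/" : url;
}

// Simplified scheme extraction: everything before the first ':'.
// Returns false if there is no non-empty scheme.
inline bool get_scheme( std::string const &url, std::string *scheme ) {
  std::string::size_type const colon = url.find( ':' );
  if ( colon == std::string::npos || colon == 0 )
    return false;
  *scheme = url.substr( 0, colon );
  return true;
}

// Maps a scheme name to a thesaurus implementation.
inline Impl find_impl( std::string const &scheme ) {
  if ( scheme == "wordnet" ) return Impl::wordnet;
  if ( scheme == "xqftts" )  return Impl::xqftts;
  return Impl::unknown;
}

// Treats an "xqftts:" URL exactly like a "file:" URL by replacing the
// first 6 characters ("xqftts") with "file", as the patch does.
inline std::string xqftts_to_file( std::string url ) {
  assert( url.compare( 0, 7, "xqftts:" ) == 0 );
  url.replace( 0, 6, "file" );
  return url;
}
```

Under these assumptions, "##default" resolves to the wordnet scheme, and "xqftts:///tmp/t.xml" becomes "file:///tmp/t.xml" before path normalization.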

=== modified file 'src/runtime/full_text/thesaurus.h'
--- src/runtime/full_text/thesaurus.h	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/thesaurus.h	2012-04-24 22:19:24 +0000
@@ -34,9 +34,14 @@
 /**
  * A %Thesaurus is the abstract base class for thesaurus implementations.
  */
-class Thesaurus : public internal::Resource {
+class Thesaurus {
 public:
-  typedef std::unique_ptr<Thesaurus,ztd::destroy_delete<Thesaurus> > ptr;
+  typedef ft_int level_type;
+
+  typedef std::unique_ptr<
+            Thesaurus const,ztd::destroy_delete<Thesaurus const>
+          >
+          ptr;
 
   /**
    * An %iterator is used to iterate over lookup results.
@@ -82,8 +87,9 @@
    * the phrase was not found.
    */
   virtual iterator::ptr lookup( zstring const &phrase,
-                                zstring const &relationship, ft_int at_least,
-                                ft_int at_most ) const = 0;
+                                zstring const &relationship,
+                                level_type at_least,
+                                level_type at_most ) const = 0;
 
 protected:
   Thesaurus() { }
@@ -97,6 +103,26 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
+/**
+ * A %ThesaurusProvider is-a Resource for providing thesauri for a given
+ * language.
+ */
+class ThesaurusProvider : public internal::Resource {
+public:
+  /**
+   * Gets a Thesaurus for the given language.
+   *
+   * @param lang The desired language of the thesaurus.
+   * @param t If not \c null, set to point to a Thesaurus for \a lang.
+   * @return Returns \c true only if this provider can provide a thesaurus for
+   * \a lang.
+   */
+  virtual bool getThesaurus( locale::iso639_1::type lang,
+                             Thesaurus::ptr *t = nullptr ) const = 0;
+};
+
+///////////////////////////////////////////////////////////////////////////////
+
 } // namespace internal
 } // namespace zorba
 #endif  /* ZORBA_THESAURUS_H */

=== modified file 'src/runtime/full_text/tokenizer.cpp'
--- src/runtime/full_text/tokenizer.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/full_text/tokenizer.cpp	2012-04-24 22:19:24 +0000
@@ -15,24 +15,98 @@
  */
 #include "stdafx.h"
 
+#include <zorba/item.h>
+#include <zorba/iterator.h>
+#include <zorba/store_consts.h>
 #include <zorba/tokenizer.h>
+#include <zorba/zorba_string.h>
+
+#include "diagnostics/assert.h"
+#include "store/api/store.h"
+#include "system/globalenv.h"
+#include "zorbamisc/ns_consts.h"
+#include "zorbautils/locale.h"
+
+using namespace zorba::locale;
 
 namespace zorba {
 
 ///////////////////////////////////////////////////////////////////////////////
 
-Tokenizer::Tokenizer( Numbers &no, int trace_options ) :
-  trace_options_( trace_options ),
-  no_( &no )
-{
-}
-
 Tokenizer::~Tokenizer() {
   // out-of-line since it's virtual
 }
 
-void Tokenizer::element( Item const&, int ) {
-  // do nothing
+bool Tokenizer::find_lang_attribute( Item const &item, iso639_1::type *lang ) {
+  bool found_lang = false;
+  if ( item.getNodeKind() == store::StoreConsts::elementNode ) {
+    Iterator_t i( item.getAttributes() );
+    i->open();
+    for ( Item attr; i->next( attr ); ) {
+      Item qname;
+      if ( attr.getNodeName( qname ) &&
+          qname.getLocalName() == "lang" && qname.getNamespace() == XML_NS ) {
+        *lang = locale::find_lang( attr.getStringValue().c_str() );
+        found_lang = true;
+        break;
+      }
+    }
+    i->close();
+  }
+  return found_lang;
+}
+
+void Tokenizer::item( Item const &item, bool entering ) {
+  if ( entering && item.isNode() &&
+       item.getNodeKind() == store::StoreConsts::elementNode ) {
+    ++numbers().para;
+  }
+}
+
+void Tokenizer::tokenize_node_impl( Item const &item, iso639_1::type lang,
+                                    Callback &callback, bool tokenize_acp ) {
+  if ( item.isNode() ) {
+    Iterator_t i;
+    Tokenizer *t_raw = this;
+    Tokenizer::ptr t_ptr;
+
+    this->item( item, true );
+    callback.item( item, true );
+
+    switch ( item.getNodeKind() ) {
+      case store::StoreConsts::elementNode:
+        if ( find_lang_attribute( item, &lang ) ) {
+          TokenizerProvider const *const p = GENV_STORE.getTokenizerProvider();
+          ZORBA_ASSERT( p );
+          if ( !p->getTokenizer( lang, numbers_, &t_ptr ) )
+            break;
+          t_raw = t_ptr.get();
+        }
+        // no break;
+
+      case store::StoreConsts::documentNode:
+        i = item.getChildren();
+        i->open();
+        for ( Item child; i->next( child ); )
+          t_raw->tokenize_node_impl( child, lang, callback, false );
+        i->close();
+        break;
+
+      case store::StoreConsts::attributeNode:
+      case store::StoreConsts::commentNode:
+      case store::StoreConsts::piNode:
+        if ( !tokenize_acp )
+          break;
+      case store::StoreConsts::textNode: {
+        String const s( item.getStringValue() );
+        tokenize_string( s.data(), s.size(), lang, false, callback, &item );
+        break;
+      }
+    } // switch
+
+    this->item( item, false );
+    callback.item( item, false );
+  }
 }
 
 Tokenizer::Numbers::Numbers() {
@@ -44,6 +118,10 @@
   // out-of-line since it's virtual
 }
 
+void Tokenizer::Callback::item( Item const&, bool ) {
+  // out-of-line since it's virtual
+}
+
 ///////////////////////////////////////////////////////////////////////////////
 
 TokenizerProvider::~TokenizerProvider() {

=== modified file 'src/runtime/spec/codegen-cpp.xq'
--- src/runtime/spec/codegen-cpp.xq	2012-04-24 12:39:38 +0000
+++ src/runtime/spec/codegen-cpp.xq	2012-04-24 22:19:24 +0000
@@ -95,7 +95,8 @@
 
             if (exists($iter/@preprocessorGuard))
             then
-              concat($gen:newline, "#endif")
+              concat($gen:newline, "#endif
+")
             else 
               ""
             )
@@ -194,7 +195,7 @@
     if (count($function/zorba:signature) = 0)
     then 
       (: TODO user fn:error :)
-      'Error: could not find "prefix" and "localname" attributes for "zorba:function" element'
+      'Error: could not find \"prefix\" and \"localname\" attributes for \"zorba:function\" element'
     else
       let $name := concat(local:function-name($function), $suffix)
       let $ret := if($iter/@name = "") then "return NULL;"
@@ -275,7 +276,8 @@
           ($gen:newline,
            if (exists($iter/@preprocessorGuard))
            then
-             concat($gen:newline, $iter/@preprocessorGuard)
+             concat($gen:newline, $iter/@preprocessorGuard, "
+")
            else 
              "",
            $gen:indent,
@@ -336,7 +338,13 @@
               '),', $gen:newline, gen:indent(4), 
               'FunctionConsts::', gen:function-kind($sig) ,');',
               $gen:newline, $gen:newline, $gen:indent,
-            '}', $gen:newline, $gen:newline
+            '}', $gen:newline, $gen:newline,
+            if (exists($iter/@preprocessorGuard))
+            then
+              concat($gen:newline, "#endif
+")
+            else 
+              ""
             ),
         ''),
       '')
@@ -351,7 +359,7 @@
     then 
       $tmp/@uri
     else  (: TODO user fn:error :)
-      'Error: could not find "prefix" and "localname" attributes for "zorba:function" element'  
+      'Error: could not find \"prefix\" and \"localname\" attributes for \"zorba:function\" element'  
 };
 
 

=== modified file 'src/runtime/spec/codegen-h.xq'
--- src/runtime/spec/codegen-h.xq	2012-04-24 12:39:38 +0000
+++ src/runtime/spec/codegen-h.xq	2012-04-24 22:19:24 +0000
@@ -146,7 +146,7 @@
     if(count($function/zorba:signature) = 0)
     then
       (: TODO user fn:error :)
-      'Error: could not find "prefix" and "localname" attributes for "zorba:function" element'
+      'Error: could not find \"prefix\" and \"localname\" attributes for \"zorba:function\" element'
     else
       local:create-function-XQuery-30($iter, $function)
       (: local:create-function-arity($iter, $function, xs:integer(1)) :)

=== added directory 'src/runtime/spec/full_text'
=== added file 'src/runtime/spec/full_text/ft_module.xml'
--- src/runtime/spec/full_text/ft_module.xml	1970-01-01 00:00:00 +0000
+++ src/runtime/spec/full_text/ft_module.xml	2012-04-24 22:19:24 +0000
@@ -0,0 +1,247 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<zorba:iterators
+  xmlns:zorba="http://www.zorba-xquery.com"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://www.zorba-xquery.com ../runtime.xsd">
+
+<zorba:header>
+  <zorba:include form="Quoted">runtime/full_text/ft_token_seq_iterator.h</zorba:include>
+  <zorba:include form="Quoted">runtime/full_text/thesaurus.h</zorba:include>
+</zorba:header>
+
+<zorba:source>
+  <zorba:include form="Quoted">store/api/iterator.h</zorba:include>
+</zorba:source>
+
+<zorba:iterator name="CurrentLangIterator"
+                preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+  <zorba:function>
+    <zorba:signature localname="current-lang" prefix="full-text">
+      <zorba:output>xs:language</zorba:output>
+    </zorba:signature>
+  </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="HostLangIterator"
+                      preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+  <zorba:function>
+    <zorba:signature localname="host-lang" prefix="full-text">
+      <zorba:output>xs:language</zorba:output>
+    </zorba:signature>
+  </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="IsStemLangSupportedIterator"
+                      preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+  <zorba:function>
+    <zorba:signature localname="is-stem-lang-supported" prefix="full-text">
+      <zorba:param>xs:language</zorba:param>
+      <zorba:output>xs:boolean</zorba:output>
+    </zorba:signature>
+  </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="IsStopWordIterator"
+                      preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+  <zorba:function>
+    <zorba:signature localname="is-stop-word" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- word -->
+      <zorba:output>xs:boolean</zorba:output>
+    </zorba:signature>
+    <zorba:signature localname="is-stop-word" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- word -->
+      <zorba:param>xs:language</zorba:param>  <!-- lang -->
+      <zorba:output>xs:boolean</zorba:output>
+    </zorba:signature>
+  </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="IsStopWordLangSupportedIterator"
+                preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+  <zorba:function>
+    <zorba:signature localname="is-stop-word-lang-supported" prefix="full-text">
+      <zorba:param>xs:language</zorba:param>
+      <zorba:output>xs:boolean</zorba:output>
+    </zorba:signature>
+  </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="IsThesaurusLangSupportedIterator"
+                      preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+  <zorba:function>
+    <zorba:signature localname="is-thesaurus-lang-supported" prefix="full-text">
+      <zorba:param>xs:language</zorba:param>
+      <zorba:output>xs:boolean</zorba:output>
+    </zorba:signature>
+    <zorba:signature localname="is-thesaurus-lang-supported" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- URI -->
+      <zorba:param>xs:language</zorba:param>
+      <zorba:output>xs:boolean</zorba:output>
+    </zorba:signature>
+  </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="IsTokenizerLangSupportedIterator"
+                      preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+  <zorba:function>
+    <zorba:signature localname="is-tokenizer-lang-supported" prefix="full-text">
+      <zorba:param>xs:language</zorba:param>
+      <zorba:output>xs:boolean</zorba:output>
+    </zorba:signature>
+  </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="StemIterator"
+                preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+  <zorba:function>
+    <zorba:signature localname="stem" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- word -->
+      <zorba:output>xs:string</zorba:output>
+    </zorba:signature>
+    <zorba:signature localname="stem" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- word -->
+      <zorba:param>xs:language</zorba:param>  <!-- lang -->
+      <zorba:output>xs:string</zorba:output>
+    </zorba:signature>
+  </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="StripDiacriticsIterator"
+                preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+  <zorba:function>
+    <zorba:signature localname="strip-diacritics" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- phrase -->
+      <zorba:output>xs:string</zorba:output>
+    </zorba:signature>
+  </zorba:function>
+</zorba:iterator>
+
+<zorba:iterator name="ThesaurusLookupIterator"
+                generateResetImpl="true"
+                preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+  <zorba:function>
+    <zorba:signature localname="thesaurus-lookup" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- phrase -->
+      <zorba:output>xs:string+</zorba:output>
+    </zorba:signature>
+    <zorba:signature localname="thesaurus-lookup" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- URI -->
+      <zorba:param>xs:string</zorba:param>    <!-- phrase -->
+      <zorba:output>xs:string+</zorba:output>
+    </zorba:signature>
+    <zorba:signature localname="thesaurus-lookup" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- URI -->
+      <zorba:param>xs:string</zorba:param>    <!-- phrase -->
+      <zorba:param>xs:language</zorba:param>  <!-- lang -->
+      <zorba:output>xs:string+</zorba:output>
+    </zorba:signature>
+    <zorba:signature localname="thesaurus-lookup" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- URI -->
+      <zorba:param>xs:string</zorba:param>    <!-- phrase -->
+      <zorba:param>xs:language</zorba:param>  <!-- lang -->
+      <zorba:param>xs:string</zorba:param>    <!-- relationship -->
+      <zorba:output>xs:string+</zorba:output>
+    </zorba:signature>
+    <zorba:signature localname="thesaurus-lookup" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- URI -->
+      <zorba:param>xs:string</zorba:param>    <!-- phrase -->
+      <zorba:param>xs:language</zorba:param>  <!-- lang -->
+      <zorba:param>xs:string</zorba:param>    <!-- relationship -->
+      <zorba:param>xs:integer</zorba:param>   <!-- level-least -->
+      <zorba:param>xs:integer</zorba:param>   <!-- level-most -->
+      <zorba:output>xs:string+</zorba:output>
+    </zorba:signature>
+  </zorba:function>
+  <zorba:state generateInit="use-default">
+    <zorba:member type="zstring" name="phrase_"/>
+    <zorba:member type="zstring" name="relationship_"/>
+    <zorba:member type="internal::Thesaurus::level_type" name="at_least_"/>
+    <zorba:member type="internal::Thesaurus::level_type" name="at_most_"/>
+    <zorba:member type="internal::Thesaurus::ptr" name="thesaurus_"/>
+    <zorba:member type="internal::Thesaurus::iterator::ptr" name="tresult_"/>
+  </zorba:state>
+</zorba:iterator>
+
+<zorba:iterator name="TokenizeIterator"
+                generateResetImpl="true"
+                preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+
+  <zorba:function generateCodegen="false" generateDECL="false">
+
+    <zorba:signature localname="tokenize" prefix="full-text">
+      <zorba:param>node()</zorba:param>       <!-- doc -->
+      <zorba:output>node()*</zorba:output>
+    </zorba:signature>
+
+    <zorba:signature localname="tokenize" prefix="full-text">
+      <zorba:param>node()</zorba:param>       <!-- doc -->
+      <zorba:param>xs:language</zorba:param>  <!-- lang -->
+      <zorba:output>node()*</zorba:output>
+    </zorba:signature>
+
+  </zorba:function>
+
+  <zorba:state generateInit="use-default">
+    <zorba:member type="store::Item_t" name="doc_item_"/>
+    <zorba:member type="FTTokenIterator_t" name="doc_tokens_"/>
+    <zorba:member type="store::Item_t" name="token_qname_"/>
+  </zorba:state>
+
+</zorba:iterator>
+
+<zorba:iterator name="TokenizerPropertiesIterator"
+                preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+
+  <zorba:function generateCodegen="false" generateDECL="false">
+
+    <zorba:signature localname="tokenizer-properties" prefix="full-text">
+      <zorba:output>node()</zorba:output>
+    </zorba:signature>
+
+    <zorba:signature localname="tokenizer-properties" prefix="full-text">
+      <zorba:param>xs:language</zorba:param>  <!-- lang -->
+      <zorba:output>node()</zorba:output>
+    </zorba:signature>
+
+    <zorba:methods>
+      <!-- 
+       ! Mark the function as accessing the dyn ctx so that it won't be
+       ! const-folded. We must prevent const-folding because the function
+       ! returns a node that is validated with a schema that may not be
+       ! imported in the module where the function is invoked from.
+      -->
+      <zorba:accessesDynCtx returnValue="true"/>
+    </zorba:methods>
+
+  </zorba:function>
+
+</zorba:iterator>
+
+<zorba:iterator name="TokenizeStringIterator"
+                generateResetImpl="true"
+                preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+
+  <zorba:function>
+
+    <zorba:signature localname="tokenize-string" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- string -->
+      <zorba:output>xs:string*</zorba:output>
+    </zorba:signature>
+
+    <zorba:signature localname="tokenize-string" prefix="full-text">
+      <zorba:param>xs:string</zorba:param>    <!-- string -->
+      <zorba:param>xs:language</zorba:param>  <!-- lang -->
+      <zorba:output>xs:string*</zorba:output>
+    </zorba:signature>
+
+  </zorba:function>
+
+  <zorba:state generateInit="use-default">
+    <zorba:member type="FTTokenSeqIterator" name="string_tokens_"/>
+  </zorba:state>
+
+</zorba:iterator>
+
+</zorba:iterators>
+<!-- vim:set et sw=2 ts=2: -->

=== modified file 'src/runtime/spec/mappings.xml'
--- src/runtime/spec/mappings.xml	2012-04-24 12:39:38 +0000
+++ src/runtime/spec/mappings.xml	2012-04-24 22:19:24 +0000
@@ -82,6 +82,11 @@
       define="ZORBA_STORE_DYNAMIC_UNORDERED_MAP_FN_NS"
       prefix="zorba-store-data-structure-unordered-map"/>
 
+    <zorba:namespace
+      uri="http://www.zorba-xquery.com/modules/full-text"
+      define="ZORBA_FULL_TEXT_FN_NS"
+      prefix="full-text"/>
+
     <zorba:namespace uri="http://www.zorba-xquery.com/modules/xqdoc" 
                      define="ZORBA_XQDOC_FN_NS" 
                      prefix="fn-zorba-xqdoc"/>
@@ -150,9 +155,9 @@
     <zorba:type zorbaType="ANY_NODE">node()</zorba:type>
     <zorba:type zorbaType="ELEMENT">element()</zorba:type>
 
-    
     <zorba:type zorbaType="ANY_ATOMIC">xs:anyAtomicType</zorba:type>
     <zorba:type zorbaType="UNTYPED_ATOMIC">xs:untypedAtomic</zorba:type>
+
     <zorba:type zorbaType="STRING">xs:string</zorba:type>
     <zorba:type zorbaType="NORMALIZED_STRING">xs:normalizedString</zorba:type>
     <zorba:type zorbaType="TOKEN">xs:token</zorba:type>
@@ -160,21 +165,25 @@
     <zorba:type zorbaType="NMTOKEN">xs:NMTOKEN</zorba:type>
     <zorba:type zorbaType="NAME">xs:Name</zorba:type>
     <zorba:type zorbaType="NCNAME">xs:NCName</zorba:type>
+
     <zorba:type zorbaType="ID">xs:ID</zorba:type>
     <zorba:type zorbaType="IDREF">xs:IDREF</zorba:type>
+
     <zorba:type zorbaType="ENTITY">xs:ENTITY</zorba:type>
+
     <zorba:type zorbaType="DATETIME">xs:dateTime</zorba:type>
     <zorba:type zorbaType="DATE">xs:date</zorba:type>
     <zorba:type zorbaType="TIME">xs:time</zorba:type>
     <zorba:type zorbaType="DURATION">xs:duration</zorba:type>
     <zorba:type zorbaType="DT_DURATION">xs:dayTimeDuration</zorba:type>
     <zorba:type zorbaType="YM_DURATION">xs:yearMonthDuration</zorba:type>
+
     <zorba:type zorbaType="FLOAT">xs:float</zorba:type>
     <zorba:type zorbaType="DOUBLE">xs:double</zorba:type>
     <zorba:type zorbaType="DECIMAL">xs:decimal</zorba:type>
     <zorba:type zorbaType="INTEGER">xs:integer</zorba:type>
     <zorba:type zorbaType="NON_POSITIVE_INTEGER">xs:nonPositiveInteger</zorba:type>
-    <zorba:type zorbaType="NEGATIVE_INTEGER">xs:nonNegativeInteger</zorba:type>
+    <zorba:type zorbaType="NEGATIVE_INTEGER">xs:negativeInteger</zorba:type>
     <zorba:type zorbaType="LONG">xs:long</zorba:type>
     <zorba:type zorbaType="INT">xs:int</zorba:type>
     <zorba:type zorbaType="SHORT">xs:short</zorba:type>
@@ -185,14 +194,17 @@
     <zorba:type zorbaType="UNSIGNED_SHORT">xs:unsignedShort</zorba:type>
     <zorba:type zorbaType="UNSIGNED_BYTE">xs:unsignedByte</zorba:type>
     <zorba:type zorbaType="POSITIVE_INTEGER">xs:positiveInteger</zorba:type>
+
     <zorba:type zorbaType="GYEAR_MONTH">xs:gYearMonth</zorba:type>
     <zorba:type zorbaType="GYEAR">xs:gYear</zorba:type>
     <zorba:type zorbaType="GMONTH_DAY">xs:gMonthDay</zorba:type>
     <zorba:type zorbaType="GDAY">xs:gDay</zorba:type>
     <zorba:type zorbaType="GMONTH">xs:gMonth</zorba:type>
+
     <zorba:type zorbaType="BOOLEAN">xs:boolean</zorba:type>
     <zorba:type zorbaType="BASE64BINARY">xs:base64Binary</zorba:type>
     <zorba:type zorbaType="HEXBINARY">xs:hexBinary</zorba:type>
+
     <zorba:type zorbaType="ANY_URI">xs:anyURI</zorba:type>
     <zorba:type zorbaType="QNAME">xs:QName</zorba:type>
     <zorba:type zorbaType="NOTATION">xs:NOTATION</zorba:type>  

=== modified file 'src/runtime/visitors/pregenerated/planiter_visitor.h'
--- src/runtime/visitors/pregenerated/planiter_visitor.h	2012-04-24 12:39:38 +0000
+++ src/runtime/visitors/pregenerated/planiter_visitor.h	2012-04-24 22:19:24 +0000
@@ -193,6 +193,45 @@
 
     class FnPutIterator;
 
+#ifndef ZORBA_NO_FULL_TEXT
+    class CurrentLangIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    class HostLangIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    class IsStemLangSupportedIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    class IsStopWordIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    class IsStopWordLangSupportedIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    class IsThesaurusLangSupportedIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    class IsTokenizerLangSupportedIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    class StemIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    class StripDiacriticsIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    class ThesaurusLookupIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    class TokenizeIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    class TokenizerPropertiesIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    class TokenizeStringIterator;
+#endif
     class FunctionNameIterator;
 
     class FunctionArityIterator;
@@ -862,6 +901,58 @@
     virtual void beginVisit ( const FnPutIterator& ) = 0;
     virtual void endVisit   ( const FnPutIterator& ) = 0;
 
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const CurrentLangIterator& ) = 0;
+    virtual void endVisit   ( const CurrentLangIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const HostLangIterator& ) = 0;
+    virtual void endVisit   ( const HostLangIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const IsStemLangSupportedIterator& ) = 0;
+    virtual void endVisit   ( const IsStemLangSupportedIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const IsStopWordIterator& ) = 0;
+    virtual void endVisit   ( const IsStopWordIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const IsStopWordLangSupportedIterator& ) = 0;
+    virtual void endVisit   ( const IsStopWordLangSupportedIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const IsThesaurusLangSupportedIterator& ) = 0;
+    virtual void endVisit   ( const IsThesaurusLangSupportedIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const IsTokenizerLangSupportedIterator& ) = 0;
+    virtual void endVisit   ( const IsTokenizerLangSupportedIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const StemIterator& ) = 0;
+    virtual void endVisit   ( const StemIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const StripDiacriticsIterator& ) = 0;
+    virtual void endVisit   ( const StripDiacriticsIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const ThesaurusLookupIterator& ) = 0;
+    virtual void endVisit   ( const ThesaurusLookupIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const TokenizeIterator& ) = 0;
+    virtual void endVisit   ( const TokenizeIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const TokenizerPropertiesIterator& ) = 0;
+    virtual void endVisit   ( const TokenizerPropertiesIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const TokenizeStringIterator& ) = 0;
+    virtual void endVisit   ( const TokenizeStringIterator& ) = 0;
+#endif
     virtual void beginVisit ( const FunctionNameIterator& ) = 0;
     virtual void endVisit   ( const FunctionNameIterator& ) = 0;
 

=== modified file 'src/runtime/visitors/pregenerated/printer_visitor.cpp'
--- src/runtime/visitors/pregenerated/printer_visitor.cpp	2012-04-24 12:39:38 +0000
+++ src/runtime/visitors/pregenerated/printer_visitor.cpp	2012-04-24 22:19:24 +0000
@@ -47,6 +47,7 @@
 #include "runtime/errors_and_diagnostics/other_diagnostics.h"
 #include "runtime/fetch/fetch.h"
 #include "runtime/fnput/fnput.h"
+#include "runtime/full_text/ft_module.h"
 #include "runtime/function_item/function_item_iter.h"
 #include "runtime/indexing/ic_ddl.h"
 #include "runtime/introspection/sctx.h"
@@ -1245,6 +1246,201 @@
 }
 // </FnPutIterator>
 
+#ifndef ZORBA_NO_FULL_TEXT
+// <CurrentLangIterator>
+void PrinterVisitor::beginVisit ( const CurrentLangIterator& a) {
+  thePrinter.startBeginVisit("CurrentLangIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const CurrentLangIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </CurrentLangIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <HostLangIterator>
+void PrinterVisitor::beginVisit ( const HostLangIterator& a) {
+  thePrinter.startBeginVisit("HostLangIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const HostLangIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </HostLangIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStemLangSupportedIterator>
+void PrinterVisitor::beginVisit ( const IsStemLangSupportedIterator& a) {
+  thePrinter.startBeginVisit("IsStemLangSupportedIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const IsStemLangSupportedIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </IsStemLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStopWordIterator>
+void PrinterVisitor::beginVisit ( const IsStopWordIterator& a) {
+  thePrinter.startBeginVisit("IsStopWordIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const IsStopWordIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </IsStopWordIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStopWordLangSupportedIterator>
+void PrinterVisitor::beginVisit ( const IsStopWordLangSupportedIterator& a) {
+  thePrinter.startBeginVisit("IsStopWordLangSupportedIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const IsStopWordLangSupportedIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </IsStopWordLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsThesaurusLangSupportedIterator>
+void PrinterVisitor::beginVisit ( const IsThesaurusLangSupportedIterator& a) {
+  thePrinter.startBeginVisit("IsThesaurusLangSupportedIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const IsThesaurusLangSupportedIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </IsThesaurusLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsTokenizerLangSupportedIterator>
+void PrinterVisitor::beginVisit ( const IsTokenizerLangSupportedIterator& a) {
+  thePrinter.startBeginVisit("IsTokenizerLangSupportedIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const IsTokenizerLangSupportedIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </IsTokenizerLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <StemIterator>
+void PrinterVisitor::beginVisit ( const StemIterator& a) {
+  thePrinter.startBeginVisit("StemIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const StemIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </StemIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <StripDiacriticsIterator>
+void PrinterVisitor::beginVisit ( const StripDiacriticsIterator& a) {
+  thePrinter.startBeginVisit("StripDiacriticsIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const StripDiacriticsIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </StripDiacriticsIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <ThesaurusLookupIterator>
+void PrinterVisitor::beginVisit ( const ThesaurusLookupIterator& a) {
+  thePrinter.startBeginVisit("ThesaurusLookupIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const ThesaurusLookupIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </ThesaurusLookupIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeIterator>
+void PrinterVisitor::beginVisit ( const TokenizeIterator& a) {
+  thePrinter.startBeginVisit("TokenizeIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const TokenizeIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </TokenizeIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizerPropertiesIterator>
+void PrinterVisitor::beginVisit ( const TokenizerPropertiesIterator& a) {
+  thePrinter.startBeginVisit("TokenizerPropertiesIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const TokenizerPropertiesIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </TokenizerPropertiesIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeStringIterator>
+void PrinterVisitor::beginVisit ( const TokenizeStringIterator& a) {
+  thePrinter.startBeginVisit("TokenizeStringIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const TokenizeStringIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </TokenizeStringIterator>
+
+#endif
 
 // <FunctionNameIterator>
 void PrinterVisitor::beginVisit ( const FunctionNameIterator& a) {

=== modified file 'src/runtime/visitors/pregenerated/printer_visitor.h'
--- src/runtime/visitors/pregenerated/printer_visitor.h	2012-04-24 12:39:38 +0000
+++ src/runtime/visitors/pregenerated/printer_visitor.h	2012-04-24 22:19:24 +0000
@@ -292,6 +292,71 @@
     void beginVisit( const FnPutIterator& );
     void endVisit  ( const FnPutIterator& );
 
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const CurrentLangIterator& );
+    void endVisit  ( const CurrentLangIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const HostLangIterator& );
+    void endVisit  ( const HostLangIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const IsStemLangSupportedIterator& );
+    void endVisit  ( const IsStemLangSupportedIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const IsStopWordIterator& );
+    void endVisit  ( const IsStopWordIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const IsStopWordLangSupportedIterator& );
+    void endVisit  ( const IsStopWordLangSupportedIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const IsThesaurusLangSupportedIterator& );
+    void endVisit  ( const IsThesaurusLangSupportedIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const IsTokenizerLangSupportedIterator& );
+    void endVisit  ( const IsTokenizerLangSupportedIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const StemIterator& );
+    void endVisit  ( const StemIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const StripDiacriticsIterator& );
+    void endVisit  ( const StripDiacriticsIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const ThesaurusLookupIterator& );
+    void endVisit  ( const ThesaurusLookupIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const TokenizeIterator& );
+    void endVisit  ( const TokenizeIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const TokenizerPropertiesIterator& );
+    void endVisit  ( const TokenizerPropertiesIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const TokenizeStringIterator& );
+    void endVisit  ( const TokenizeStringIterator& );
+#endif
+
     void beginVisit( const FunctionNameIterator& );
     void endVisit  ( const FunctionNameIterator& );
 

=== modified file 'src/store/naive/atomic_items.cpp'
--- src/store/naive/atomic_items.cpp	2012-04-24 12:39:38 +0000
+++ src/store/naive/atomic_items.cpp	2012-04-24 22:19:24 +0000
@@ -1657,10 +1657,13 @@
 {
   typedef NaiveFTTokenIterator::container_type tokens_t;
   unique_ptr<tokens_t> tokens( new tokens_t );
+  AtomicItemTokenizerCallback callback( *tokens );
 
-  Tokenizer::ptr t( provider.getTokenizer( lang, numbers ) );
-  AtomicItemTokenizerCallback cb( *t, lang, *tokens );
-  cb.tokenize( theValue.data(), theValue.size(), wildcards );
+  Tokenizer::ptr tokenizer;
+  if ( provider.getTokenizer( lang, &numbers, &tokenizer ) )
+    tokenizer->tokenize_string(
+      theValue.data(), theValue.size(), lang, wildcards, callback
+    );
 
   return FTTokenIterator_t( new NaiveFTTokenIterator( tokens.release() ) );
 }
@@ -3588,25 +3591,22 @@
 ********************************************************************************/
 
 AtomicItemTokenizerCallback::AtomicItemTokenizerCallback( 
-  Tokenizer &tokenizer,
-  locale::iso639_1::type lang,
   container_type &tokens
 ) :
-  tokenizer_( tokenizer ),
-  lang_( lang ),
   tokens_( tokens )
 {
 }
 
-void AtomicItemTokenizerCallback::operator()(
+void AtomicItemTokenizerCallback::token(
   char const *utf8_s,
   size_type utf8_len,
+  iso639_1::type lang,
   size_type token_no, 
   size_type sent_no,
   size_type para_no,
-  void*
+  Item const*
 ) {
-  FTToken const t( utf8_s, utf8_len, token_no, lang_ );
+  FTToken const t( utf8_s, utf8_len, token_no, lang );
   tokens_.push_back( t );
 }
 

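Note on the callback change above: the callback no longer holds a tokenizer or a language; it only receives tokens, and the language arrives with each token() call. A minimal sketch of that shape follows. It is not Zorba's API: Lang, Token, TokenCallback, CollectingCallback and tokenize_on_spaces are hypothetical stand-ins, and the toy tokenizer only splits on spaces.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are Zorba types.
enum class Lang { unknown, en };

struct Token {
  std::string value;
  Lang lang;
  std::size_t token_no;
};

// The callback only receives tokens; it does not own a tokenizer or a
// language, mirroring the slimmer callback constructor in the diff.
struct TokenCallback {
  virtual ~TokenCallback() { }
  virtual void token( char const *s, std::size_t len, Lang lang,
                      std::size_t token_no ) = 0;
};

struct CollectingCallback : TokenCallback {
  explicit CollectingCallback( std::vector<Token> &tokens ) :
    tokens_( tokens ) { }

  void token( char const *s, std::size_t len, Lang lang,
              std::size_t token_no ) override {
    tokens_.push_back( Token{ std::string( s, len ), lang, token_no } );
  }
private:
  std::vector<Token> &tokens_;
};

// Toy tokenizer: splits on spaces and passes the language through with
// every token.  Runs of spaces produce no empty tokens.
inline void tokenize_on_spaces( std::string const &text, Lang lang,
                                TokenCallback &callback ) {
  std::size_t token_no = 0, begin = 0;
  while ( begin <= text.size() ) {
    std::size_t end = text.find( ' ', begin );
    if ( end == std::string::npos )
      end = text.size();
    if ( end > begin )
      callback.token( text.data() + begin, end - begin, lang, token_no++ );
    begin = end + 1;
  }
}
```

A caller would construct a CollectingCallback over its own container and hand it to the tokenizer, much as getTokens() does with AtomicItemTokenizerCallback in the hunk above.
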
=== modified file 'src/store/naive/atomic_items.h'
--- src/store/naive/atomic_items.h	2012-04-24 12:39:38 +0000
+++ src/store/naive/atomic_items.h	2012-04-24 22:19:24 +0000
@@ -1461,7 +1461,7 @@
 
   xs_integer getIntegerValue() const { return theValue; }
 
-  xs_long getLongValue() const;
+  xs_long getLongValue() const; 
 
   xs_unsignedInt getUnsignedIntValue() const;
 
@@ -2603,28 +2603,15 @@
 public:
   typedef FTTokenStore::container_type container_type;
 
-  AtomicItemTokenizerCallback( 
-      Tokenizer &tokenizer,
-      locale::iso639_1::type lang,
-      container_type &tokens );
-
-  void operator()(
-      char const *utf8_s,
-      size_type utf8_len,
-      size_type token_no,
-      size_type sent_no,
-      size_type para_no,
-      void* = 0 );
-
-  void tokenize( char const *utf8_s, size_t len, bool wildcards = false ) 
-  {
-    tokenizer_.tokenize( utf8_s, len, lang_, wildcards, *this );
-  }
+  AtomicItemTokenizerCallback( container_type &tokens );
+
+  // inherited
+  void token( char const *utf8_s, size_type utf8_len, locale::iso639_1::type,
+              size_type token_no, size_type sent_no, size_type para_no,
+              Item const* );
 
 private:
-  Tokenizer                    & tokenizer_;
-  locale::iso639_1::type const   lang_;
-  container_type               & tokens_;
+  container_type &tokens_;
 };
 #endif /* ZORBA_NO_FULL_TEXT */
 

=== modified file 'src/store/naive/node_items.cpp'
--- src/store/naive/node_items.cpp	2012-04-24 12:39:38 +0000
+++ src/store/naive/node_items.cpp	2012-04-24 22:19:24 +0000
@@ -21,6 +21,7 @@
 #include <zorba/config.h>
 #include <zorba/item.h>
 
+#include "api/unmarshaller.h"
 #include "diagnostics/assert.h"
 #include "diagnostics/xquery_diagnostics.h"
 #include "zorbatypes/URI.h"
@@ -4761,108 +4762,57 @@
  ******************************************************************************/
 
 XmlNodeTokenizerCallback::XmlNodeTokenizerCallback(
-  TokenizerProvider const &provider,
-  Tokenizer::Numbers &numbers,
-  iso639_1::type lang,
   FTTokenStore &token_store
 ) :
-  provider_( provider ),
-  numbers_( numbers ),
   token_store_( &token_store ),
   tokens_( token_store.getDocumentTokens() )
 {
-  push_lang( lang );
 }
 
 
 XmlNodeTokenizerCallback::XmlNodeTokenizerCallback(
-  TokenizerProvider const &provider,
-  Tokenizer::Numbers &numbers,
-  iso639_1::type lang,
   container_type &tokens
 ) :
-  provider_( provider ),
-  numbers_( numbers ),
-  token_store_( NULL ),
+  token_store_( nullptr ),
   tokens_( tokens )
 {
-  push_lang( lang );
-}
-
-
-XmlNodeTokenizerCallback::~XmlNodeTokenizerCallback()
-{
-  while ( !tokenizer_stack_.empty() )
-    ztd::pop_stack( tokenizer_stack_ )->destroy();
-}
-
-
-inline XmlNodeTokenizerCallback::begin_type
-XmlNodeTokenizerCallback::beginTokenization() const 
-{
-  return token_store_->getDocumentTokens().size();
-}
-
-
-inline void XmlNodeTokenizerCallback::endTokenization(
-  XmlNode const *node,
-  XmlNodeTokenizerCallback::begin_type begin )
-{
-  token_store_->putRange(node, begin, token_store_->getDocumentTokens().size());
-}
-
-
-void XmlNodeTokenizerCallback::pop_lang() 
-{
-  lang_stack_.pop();
-  ztd::pop_stack( tokenizer_stack_ )->destroy();
-}
-
-
-void XmlNodeTokenizerCallback::push_lang( iso639_1::type lang ) 
-{
-  lang_stack_.push( lang );
-  Tokenizer::ptr t( provider_.getTokenizer( lang, numbers_ ) );
-  ZORBA_ASSERT( t.get() );
-  tokenizer_stack_.push( t.get() );
-  t.release();
+}
+
+
+void XmlNodeTokenizerCallback::item( Item const &api_item, bool entering ) {
+  if ( token_store_ ) {
+    store::Item const *const item = Unmarshaller::getInternalItem( api_item );
+    if ( entering ) {
+      push_item( item );
+      range_stack_.push( token_store_->getDocumentTokens().size() );
+    } else {
+      pop_item();
+      token_store_->putRange(
+        item,
+        ztd::pop_stack( range_stack_ ),
+        token_store_->getDocumentTokens().size()
+      );
+    }
+  }
 }
 
 
 void XmlNodeTokenizerCallback::
-operator()( char const *utf8_s, size_type utf8_len, size_type pos,
-            size_type sent, size_type para, void *payload )
+token( char const *utf8_s, size_type utf8_len, iso639_1::type lang,
+       size_type pos, size_type sent, size_type para, Item const *api_item )
 {
-  store::Item const *const item = static_cast<store::Item*>( payload );
-  FTToken t( utf8_s, utf8_len, pos, sent, para, item, get_lang() );
+  store::Item const *const item = Unmarshaller::getInternalItem( *api_item );
+  FTToken t( utf8_s, utf8_len, pos, sent, para, item, lang );
   tokens_.push_back( t );
 }
 
 
-inline void XmlNodeTokenizerCallback::tokenize( char const *utf8_s,
-                                                size_t len ) 
-{
-  tokenizer().tokenize(
-    utf8_s, len, get_lang(), false, *this,
-    element_stack_.empty() ? NULL : static_cast<void*>( get_element() )
-  );
-}
-
-
 void XmlNode::tokenize( XmlNodeTokenizerCallback& )
 {
   // do nothing
 }
 
 
-void AttributeNode::tokenize( XmlNodeTokenizerCallback &cb )
-{
-  zstring text;
-  getStringValue2( text );
-  cb.tokenize( text.data(), text.size() );
-}
-
-
 FTTokenIterator_t
 AttributeNode::getTokens( TokenizerProvider const &provider,
                           Tokenizer::Numbers &numbers, iso639_1::type lang,
@@ -4875,62 +4825,21 @@
       return FTTokenIterator_t(
         new NaiveFTTokenIterator( *tokens, 0, tokens->size() )
       );
+
     FTTokenStore::container_type att_tokens;
-    XmlNodeTokenizerCallback cb( provider, numbers, lang, att_tokens );
-    const_cast<AttributeNode*>( this )->tokenize( cb );
-    token_store.putAttr( this, att_tokens );
-  }
-}
-
-
-void InternalNode::tokenize( XmlNodeTokenizerCallback& cb )
-{
-  XmlNodeTokenizerCallback::begin_type const begin = cb.beginTokenization();
-  for ( csize i = 0; i < numChildren(); ++i )
-    getChild( i )->tokenize( cb );
-  cb.endTokenization( this, begin );
-}
-
-
-void ElementNode::tokenize( XmlNodeTokenizerCallback& cb )
-{
-  Tokenizer &tokenizer = cb.tokenizer();
-
-  zorba::Item element_name;
-  if ( tokenizer.trace_options() )
-    element_name = getNodeName();
-
-  if ( tokenizer.trace_options() & Tokenizer::trace_begin )
-    tokenizer.element( element_name, Tokenizer::trace_begin );
-  else if ( !tokenizer.trace_options() )
-    ++tokenizer.numbers().para;
-
-  //
-  // See if this XML element has an xml:lang attribute: if so, switch to that
-  // language.
-  //
-  bool pushed_lang = false;
-  for ( ulong i = 0; i < numAttrs(); ++i ) {
-    AttributeNode *const at = getAttr( i );
-    Item const *const name = at->getNodeName();
-    if ( name->getLocalName() == "lang" && name->getNamespace() == XML_NS ) {
-      cb.push_lang( locale::find_lang( at->getStringValue().c_str() ) );
-      pushed_lang = true;
-      break;
+    XmlNodeTokenizerCallback callback( att_tokens );
+
+    zorba::Item const api_attr( this );
+    Tokenizer::ptr tokenizer;
+    if ( provider.getTokenizer( lang, &numbers, &tokenizer ) ) {
+      tokenizer->tokenize_node( api_attr, lang, callback );
+      token_store.putAttr( this, att_tokens );
     }
   }
-
-  cb.push_element( this );
-  InternalNode::tokenize( cb );
-  cb.pop_element();
-  if ( pushed_lang )
-    cb.pop_lang();
-
-  if ( tokenizer.trace_options() & Tokenizer::trace_end )
-    tokenizer.element( element_name, Tokenizer::trace_end );
 }
 
 
+#if 0
 void TextNode::tokenize( XmlNodeTokenizerCallback &cb )
 {
   const zstring* text;
@@ -4986,6 +4895,7 @@
   cb.tokenize( text->data(), text->size() );
   cb.endTokenization( this, begin );
 }
+#endif
 
 
 FTTokenIterator_t
@@ -4998,8 +4908,11 @@
 
   if ( tokens.empty() )
   {
-    XmlNodeTokenizerCallback cb( provider, numbers, lang, token_store );
-    getRoot()->tokenize( cb );
+    zorba::Item const api_root( getRoot() );
+    XmlNodeTokenizerCallback callback( token_store );
+    Tokenizer::ptr tokenizer;
+    if ( provider.getTokenizer( lang, &numbers, &tokenizer ) )
+      tokenizer->tokenize_node( api_root, lang, callback );
   }
 
   FTTokenStore::range_type const &r = token_store.getRange( this );
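
Note on XmlNodeTokenizerCallback::item() above: entering a node pushes the current token count, and leaving it pops that count and stores [begin, end) for the node. A minimal sketch of that bookkeeping, under stated assumptions: Node, Range, RangeRecorder and walk are hypothetical and stand in for the store item, FTTokenStore and the tokenizer's node traversal.

```cpp
#include <cstddef>
#include <map>
#include <stack>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for illustration only.
struct Node {
  std::string text;               // token contributed directly by this node
  std::vector<Node> children;
};

typedef std::pair<std::size_t,std::size_t> Range;   // [begin, end)

class RangeRecorder {
public:
  // Called when the traversal enters or leaves a node.
  void item( Node const &node, bool entering ) {
    if ( entering ) {
      begin_stack_.push( tokens_.size() );
    } else {
      std::size_t const begin = begin_stack_.top();
      begin_stack_.pop();
      ranges_[ &node ] = Range( begin, tokens_.size() );
    }
  }

  void token( std::string const &t ) { tokens_.push_back( t ); }

  std::vector<std::string> const& tokens() const { return tokens_; }
  Range range_of( Node const &node ) const { return ranges_.at( &node ); }

private:
  std::vector<std::string> tokens_;
  std::stack<std::size_t> begin_stack_;
  std::map<Node const*,Range> ranges_;
};

// Depth-first walk: one token per non-empty node text, then the children.
inline void walk( Node const &node, RangeRecorder &rec ) {
  rec.item( node, true );
  if ( !node.text.empty() )
    rec.token( node.text );
  for ( Node const &child : node.children )
    walk( child, rec );
  rec.item( node, false );
}
```

Because every enter is paired with a leave, the stack depth tracks the nesting depth, and each node's range covers the tokens of all its descendants.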

=== modified file 'src/store/naive/node_items.h'
--- src/store/naive/node_items.h	2012-04-24 12:39:38 +0000
+++ src/store/naive/node_items.h	2012-04-24 22:19:24 +0000
@@ -884,10 +884,6 @@
   const OrdPath* getFirstChildOrdPathAfter(csize pos) const;
 
   const OrdPath* getFirstChildOrdPathBefore(csize pos) const;
-
-#ifndef ZORBA_NO_FULL_TEXT
-  void tokenize( XmlNodeTokenizerCallback& );
-#endif /* ZORBA_NO_FULL_TEXT */
 };
 
 
@@ -1147,10 +1143,6 @@
         zstring& absUri,
         zstring& relUri);
 
-#ifndef ZORBA_NO_FULL_TEXT
-  void tokenize( XmlNodeTokenizerCallback& );
-#endif /* ZORBA_NO_FULL_TEXT */
-
 private:
   //disable default copy constructor
   ElementNode(const ElementNode& src);
@@ -1264,7 +1256,8 @@
 
 #ifndef ZORBA_NO_FULL_TEXT
   FTTokenIterator_t getTokens( TokenizerProvider const&, Tokenizer::Numbers&,
-                               locale::iso639_1::type, bool = false ) const;
+                               locale::iso639_1::type,
+                               bool wildcards = false ) const;
 #endif /* ZORBA_NO_FULL_TEXT */
 
 protected:
@@ -1279,10 +1272,6 @@
   {
     return *reinterpret_cast<ItemVector*>(theTypedValue.getp());
   }
-
-#ifndef ZORBA_NO_FULL_TEXT
-  void tokenize( XmlNodeTokenizerCallback& );
-#endif
   
   store::Iterator_t getChildren() const;
 };
@@ -1441,10 +1430,6 @@
   void setValue(store::Item_t& val) { theContent.setValue(val); }
 
   void setValue(store::Item* val) { theContent.setValue(val); }
-
-#ifndef ZORBA_NO_FULL_TEXT
-  void tokenize( XmlNodeTokenizerCallback& );
-#endif /* ZORBA_NO_FULL_TEXT */
   
   store::Iterator_t getChildren() const;
 };
@@ -1680,59 +1665,26 @@
 {
 public:
   typedef FTTokenStore::container_type container_type;
-  typedef FTTokenStore::size_type begin_type;
-
-  XmlNodeTokenizerCallback( TokenizerProvider const &provider,
-                            Tokenizer::Numbers &numbers,
-                            locale::iso639_1::type lang,
-                            FTTokenStore &token_store );
-
-  XmlNodeTokenizerCallback( TokenizerProvider const &provider,
-                            Tokenizer::Numbers &numbers,
-                            locale::iso639_1::type lang,
-                            container_type &tokens );
-
-  ~XmlNodeTokenizerCallback();
-
-  begin_type beginTokenization() const;
-
-  void endTokenization( XmlNode const*, begin_type );
-
-  void push_element( ElementNode *element ) { element_stack_.push( element ); }
-
-  void pop_element() { element_stack_.pop(); }
-
-  void push_lang( locale::iso639_1::type lang );
-
-  void pop_lang();
-
-  void tokenize( char const *utf8_s, size_t len );
-
-  Tokenizer& tokenizer() const { return *tokenizer_stack_.top(); }
+
+  XmlNodeTokenizerCallback( FTTokenStore &token_store );
+  XmlNodeTokenizerCallback( container_type &tokens );
 
   // inherited
-  void operator()( char const *utf8_s, size_type utf8_len,
-                   size_type pos, size_type sent, size_type para, void* );
+  void item( Item const&, bool );
+  void token( char const *utf8_s, size_type utf8_len, locale::iso639_1::type,
+              size_type pos, size_type sent, size_type para, Item const* );
 private:
-  typedef std::stack<ElementNode*> element_stack_t;
-  typedef std::stack<locale::iso639_1::type> lang_stack_t;
-  typedef std::stack<Tokenizer*> tokenizer_stack_t;
-
-  ElementNode* get_element() const {
-    return element_stack_.top();
-  }
-
-  locale::iso639_1::type get_lang() const {
-    return lang_stack_.top();
-  }
-
-  TokenizerProvider const &provider_;
-  Tokenizer::Numbers &numbers_;
+  typedef std::stack<store::Item const*> item_stack_t;
+  typedef std::stack<FTTokenStore::size_type> range_begin_stack_t;
+
+  store::Item const* get_item() const { return item_stack_.top(); }
+  void push_item( store::Item const *item ) { item_stack_.push( item ); }
+  void pop_item() { item_stack_.pop(); }
+
   FTTokenStore *token_store_;
   container_type &tokens_;
-  element_stack_t element_stack_;
-  lang_stack_t lang_stack_;
-  tokenizer_stack_t tokenizer_stack_;
+  item_stack_t item_stack_;
+  range_begin_stack_t range_stack_;
 };
 #endif /* ZORBA_NO_FULL_TEXT */
 

=== modified file 'src/unit_tests/stemmer.cpp'
--- src/unit_tests/stemmer.cpp	2012-04-24 12:39:38 +0000
+++ src/unit_tests/stemmer.cpp	2012-04-24 22:19:24 +0000
@@ -37,6 +37,7 @@
 public:
   // inherited
   void destroy() const;
+  void properties( Properties* ) const;
   void stem( String const &word, iso639_1::type lang, String *result ) const;
 };
 
@@ -44,6 +45,10 @@
   destroy_called = true;
 }
 
+void TestStemmer::properties( Properties *p ) const {
+  p->uri = "http://www.zorba-xquery.com/full-text/unit-tests/stemmer";
+}
+
 void TestStemmer::stem( String const &word, iso639_1::type lang,
                         String *result ) const {
   if ( word == "foobar" )
@@ -56,21 +61,20 @@
 
 class TestStemmerProvider : public StemmerProvider {
 public:
-  Stemmer::ptr getStemmer( iso639_1::type lang ) const;
+  bool getStemmer( iso639_1::type lang, Stemmer::ptr* = 0 ) const;
 };
 
-Stemmer::ptr TestStemmerProvider::getStemmer( iso639_1::type lang ) const {
+bool TestStemmerProvider::getStemmer( iso639_1::type lang,
+                                      Stemmer::ptr *result ) const {
   static TestStemmer stemmer;
-  Stemmer::ptr result;
   switch ( lang ) {
     case iso639_1::en:
     case iso639_1::unknown:
-      result.reset( &stemmer );
-      break;
+      if ( result ) result->reset( &stemmer );
+      return true;
     default:
-      break;
+      return false;
   }
-  return std::move( result );
 }
 
 ///////////////////////////////////////////////////////////////////////////////
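
Note on the new provider signature above: the return value says whether the language is supported, and the out-parameter defaults to null so a caller can probe for support without obtaining an object. A minimal sketch of that contract, not Zorba's classes: Lang, Stemmer, ToyStemmer and StemmerProvider here are hypothetical, and std::unique_ptr stands in for Stemmer::ptr.

```cpp
#include <memory>
#include <string>

enum class Lang { unknown, en, de };   // hypothetical language codes

struct Stemmer {
  virtual ~Stemmer() { }
  virtual std::string stem( std::string const &word ) const = 0;
};

// Toy stemmer: strips a single trailing 's'.
struct ToyStemmer : Stemmer {
  std::string stem( std::string const &word ) const override {
    if ( word.size() > 1 && word.back() == 's' )
      return word.substr( 0, word.size() - 1 );
    return word;
  }
};

struct StemmerProvider {
  // Returns true only if lang is supported.  If result is non-null and the
  // language is supported, *result is set to a stemmer for it.
  bool get_stemmer( Lang lang,
                    std::unique_ptr<Stemmer> *result = nullptr ) const {
    switch ( lang ) {
      case Lang::en:
      case Lang::unknown:
        if ( result )               // callers may only be probing
          result->reset( new ToyStemmer );
        return true;
      default:
        return false;
    }
  }
};
```

Calling get_stemmer( lang ) with no second argument answers "is this language supported?"; passing a pointer additionally yields the stemmer.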

=== modified file 'src/unit_tests/string.cpp'
--- src/unit_tests/string.cpp	2012-04-24 12:39:38 +0000
+++ src/unit_tests/string.cpp	2012-04-24 22:19:24 +0000
@@ -542,6 +542,19 @@
 }
 
 template<class StringType>
+static void test_strip_diacritics() {
+  StringType result;
+
+  StringType const s1( "x " utf8_aeiou_acute " x" );
+  utf8::strip_diacritics( s1, &result );
+  ASSERT_TRUE( result == "x aeiou x" );
+
+  StringType const s2( "x " utf8_AEIOU_acute " x" );
+  utf8::strip_diacritics( s2, &result );
+  ASSERT_TRUE( result == "x AEIOU x" );
+}
+
+template<class StringType>
 static void test_to_codepoints( char const *s ) {
   StringType const s1( s );
 
@@ -866,6 +879,9 @@
   test_split<zstring>( "a", "" );
   test_split<String>( "a", "" );
 
+  test_strip_diacritics<string>();
+  test_strip_diacritics<zstring>();
+
   test_to_codepoints<string>( "hello" );
   test_to_codepoints<string>( utf8_aeiou_acute );
   test_to_codepoints<zstring>( "hello" );

=== modified file 'src/unit_tests/thesaurus.cpp'
--- src/unit_tests/thesaurus.cpp	2012-04-24 12:39:38 +0000
+++ src/unit_tests/thesaurus.cpp	2012-04-24 22:19:24 +0000
@@ -42,47 +42,48 @@
   iterator::ptr lookup( String const &phrase, String const &relationship,
                         range_type at_least, range_type at_most ) const;
 private:
-  typedef std::list<String> synonyms_t;
-  typedef std::map<String,synonyms_t const*> thesaurus_t;
+  typedef std::list<String> synonyms_type;
+  typedef std::map<String,synonyms_type const*> thesaurus_data_type;
 
-  static thesaurus_t const& get_thesaurus();
+  static thesaurus_data_type const& get_thesaurus_data();
 
   class iterator : public Thesaurus::iterator {
   public:
-    iterator( synonyms_t const &s ) : synonyms_( s ), i_( s.begin() ) { }
+    iterator( synonyms_type const &s ) : synonyms_( s ), i_( s.begin() ) { }
     void destroy() const;
     bool next( String *synonym );
   private:
-    synonyms_t const &synonyms_;
-    synonyms_t::const_iterator i_;
+    synonyms_type const &synonyms_;
+    synonyms_type::const_iterator i_;
   };
 };
 
-TestThesaurus::thesaurus_t const& TestThesaurus::get_thesaurus() {
-  static thesaurus_t thesaurus;
-  if ( thesaurus.empty() ) {
-    static synonyms_t synonyms;
+void TestThesaurus::destroy() const {
+  destroy_called = true;
+}
+
+TestThesaurus::thesaurus_data_type const& TestThesaurus::get_thesaurus_data() {
+  static thesaurus_data_type thesaurus_data;
+  if ( thesaurus_data.empty() ) {
+    static synonyms_type synonyms;
     synonyms.push_back( "foo" );
     synonyms.push_back( "foobar" );
 
-    thesaurus[ "foo"    ] = &synonyms;
-    thesaurus[ "foobar" ] = &synonyms;
+    thesaurus_data[ "foo"    ] = &synonyms;
+    thesaurus_data[ "foobar" ] = &synonyms;
   }
-  return thesaurus;
-}
-
-void TestThesaurus::destroy() const {
-  destroy_called = true;
+  return thesaurus_data;
 }
 
 Thesaurus::iterator::ptr
 TestThesaurus::lookup( String const &phrase, String const &relationship,
                        range_type at_least, range_type at_most ) const {
-  static thesaurus_t const &thesaurus = get_thesaurus();
-  thesaurus_t::const_iterator const i = thesaurus.find( phrase );
-  Thesaurus::iterator::ptr result;
-  if ( i != thesaurus.end() )
-    result.reset( new iterator( *i->second ) );
+  static thesaurus_data_type const &thesaurus_data = get_thesaurus_data();
+  thesaurus_data_type::const_iterator const entry =
+    thesaurus_data.find( phrase );
+  iterator::ptr result;
+  if ( entry != thesaurus_data.end() )
+    result.reset( new iterator( *entry->second ) );
   return std::move( result );
 }
 
@@ -101,6 +102,28 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
+class TestThesaurusProvider : public ThesaurusProvider {
+public:
+  bool getThesaurus( iso639_1::type lang, Thesaurus::ptr* = 0 ) const;
+
+  // inherited
+  void destroy() const;
+};
+
+void TestThesaurusProvider::destroy() const {
+  // do nothing
+}
+
+bool TestThesaurusProvider::getThesaurus( iso639_1::type lang,
+                                          Thesaurus::ptr *result ) const {
+  static TestThesaurus thesaurus;
+  if ( result )
+    result->reset( &thesaurus );
+  return true;
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
 class TestThesaurusResolver : public URLResolver {
 public:
   TestThesaurusResolver( String const &uri ) : uri_( uri ) { }
@@ -112,9 +135,13 @@
 };
 
 Resource*
-TestThesaurusResolver::resolveURL( String const &uri, EntityData const *ed ) {
-  static TestThesaurus thesaurus;
-  return uri == uri_ ? &thesaurus : 0;
+TestThesaurusResolver::resolveURL( String const &uri, EntityData const *data ) {
+  if ( data->getKind() == EntityData::THESAURUS ) {
+    static TestThesaurusProvider provider;
+    if ( uri == uri_ )
+      return &provider;
+  }
+  return 0;
 }
 
 ///////////////////////////////////////////////////////////////////////////////

=== modified file 'src/unit_tests/tokenizer.cpp'
--- src/unit_tests/tokenizer.cpp	2012-04-24 12:39:38 +0000
+++ src/unit_tests/tokenizer.cpp	2012-04-24 22:19:24 +0000
@@ -24,6 +24,7 @@
 #include <iostream>
 
 #include <zorba/diagnostic_list.h>
+#include <zorba/store_consts.h>
 #include <zorba/store_manager.h>
 #include <zorba/tokenizer.h>
 #include <zorba/user_exception.h>
@@ -59,14 +60,18 @@
 
 class TestTokenizer : public Tokenizer {
 public:
-  TestTokenizer( Numbers &num ) : Tokenizer( num, trace_begin ) { }
+  TestTokenizer( Numbers &num ) : Tokenizer( num ) { }
   ~TestTokenizer();
 
   // inherited
   void destroy() const;
-  void element( Item const&, int );
-  void tokenize( char const*, size_type, iso639_1::type, bool, Callback&,
-                 void* );
+  void properties( Properties* ) const;
+  void tokenize_string( char const*, size_type, iso639_1::type, bool,
+                        Callback&, Item const* );
+
+protected:
+  // inherited
+  void item( Item const&, bool );
 
 private:
   typedef std::string token_t;
@@ -83,7 +88,8 @@
   static bool is_word_begin_char( char );
   bool is_word_char( char );
   static char peek( char const *s, char const *end );
-  bool send_token( token_t const &token, Callback&, void* );
+  bool send_token( token_t const &token, iso639_1::type, Callback&,
+                   Item const* );
 };
 
 TestTokenizer::~TestTokenizer() {
@@ -95,10 +101,7 @@
   delete this;
 }
 
-void TestTokenizer::element( Item const &qname, int trace_options ) {
-  if ( trace_options & trace_end )
-    return;
-
+void TestTokenizer::item( Item const &item, bool entering ) {
   static char const *const block_elements[] = {
     "address",
     "blockquote",
@@ -116,10 +119,14 @@
   static char const *const *const end =
     block_elements + sizeof( block_elements ) / sizeof( char* );
 
-  String const name( qname.getLocalName() );
-  if ( ::binary_search( block_elements, end, name.c_str(),
-                        less<char const*>() ) ) {
-    ++numbers().para;
+  if ( entering && item.isNode() &&
+       item.getNodeKind() == store::StoreConsts::elementNode ) {
+    Item qname;
+    item.getNodeName( qname );
+    if ( ::binary_search( block_elements, end, qname.getLocalName().c_str(),
+                          less<char const*>() ) ) {
+      ++numbers().para;
+    }
   }
 }
 
@@ -170,15 +177,24 @@
   return ++s < end ? *s : '\0';
 }
 
+void TestTokenizer::properties( Properties *p ) const {
+  p->comments_separate_tokens = true;
+  p->elements_separate_tokens = true;
+  p->processing_instructions_separate_tokens = true;
+  p->languages.clear();
+  p->languages.push_back( iso639_1::en );
+  p->uri = "http://www.zorba-xquery.com/full-text/tokenizer/unit-test";
+}
+
 #define HANDLE_BACKSLASH()            \
   if ( !got_backslash ) ; else {      \
     got_backslash = in_wild = false;  \
     break;                            \
   }
 
-void TestTokenizer::tokenize( char const *s, size_type s_len,
-                              iso639_1::type lang, bool wildcards,
-                              Callback &callback, void *payload ) {
+void TestTokenizer::tokenize_string( char const *s, size_type s_len,
+                                     iso639_1::type lang, bool wildcards,
+                                     Callback &callback, Item const *item ) {
   bool got_backslash = false;
   bool in_wild = false;
   token_t token;
@@ -247,7 +263,7 @@
     } else {
       if ( is_word_char( *s ) )
         token += *s;
-      else if ( send_token( token, callback, payload ) ) {
+      else if ( send_token( token, lang, callback, item ) ) {
         token.clear();
         t_type_ = t_generic;
       }
@@ -279,7 +295,7 @@
       }
   } // for
 
-  send_token( token, callback, payload );
+  send_token( token, lang, callback, item );
 }
 
 static char const *const tokens[] = {
@@ -304,8 +320,8 @@
 
 #define PRINT_TOKENS 0
 
-bool TestTokenizer::send_token( token_t const &token, Callback &callback,
-                                void *payload ) {
+bool TestTokenizer::send_token( token_t const &token, iso639_1::type lang,
+                                Callback &callback, Item const *item ) {
   if ( !token.empty() ) {
 #if PRINT_TOKENS
     cout <<   "t=" << setw(2) << numbers().token
@@ -316,9 +332,9 @@
 
     check_token( token.c_str(), numbers().token );
 
-    callback(
-      token.data(), token.size(),
-      numbers().token, numbers().sent, numbers().para, payload
+    callback.token(
+      token.data(), token.size(), lang,
+      numbers().token, numbers().sent, numbers().para, item
     );
     ++numbers().token;
     return true;
@@ -331,13 +347,16 @@
 class TestTokenizerProvider : public TokenizerProvider {
 public:
   // inherited
-  Tokenizer::ptr getTokenizer( iso639_1::type, Tokenizer::Numbers& ) const;
+  bool getTokenizer( iso639_1::type, Tokenizer::Numbers* = 0,
+                     Tokenizer::ptr* = 0 ) const;
 };
 
-Tokenizer::ptr
-TestTokenizerProvider::getTokenizer( iso639_1::type lang,
-                                     Tokenizer::Numbers &num ) const {
-  return Tokenizer::ptr( new TestTokenizer( num ) );
+bool TestTokenizerProvider::getTokenizer( iso639_1::type lang,
+                                          Tokenizer::Numbers *num,
+                                          Tokenizer::ptr *t ) const {
+  if ( num && t )
+    t->reset( new TestTokenizer( *num ) );
+  return true;
 }
 
 ///////////////////////////////////////////////////////////////////////////////
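
Note on the block-element lookup in TestTokenizer::item() above: a binary search over a sorted array of C strings only works if the comparator orders by string content rather than by pointer value. The sketch below assumes that is what the less<char const*>() in the diff does and substitutes an explicit strcmp-based functor. CStrLess and is_block_element are hypothetical names, and the element list is an illustrative subset, not the test's full table.

```cpp
#include <algorithm>
#include <cstring>
#include <iterator>

// Illustrative subset; must stay sorted in strcmp() order for
// std::binary_search to be valid.
static char const *const block_elements[] = {
  "address", "blockquote", "div", "p", "pre"
};

// Orders C strings by content, not by address.
struct CStrLess {
  bool operator()( char const *a, char const *b ) const {
    return std::strcmp( a, b ) < 0;
  }
};

inline bool is_block_element( char const *local_name ) {
  return std::binary_search(
    std::begin( block_elements ), std::end( block_elements ),
    local_name, CStrLess()
  );
}
```

A tokenizer could then bump its paragraph counter whenever it enters an element for which is_block_element() returns true.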

=== modified file 'src/util/fs_util.h'
--- src/util/fs_util.h	2012-04-24 12:39:38 +0000
+++ src/util/fs_util.h	2012-04-24 22:19:24 +0000
@@ -503,6 +503,7 @@
  * @param path The path to normalize.
  * @param base The base path, if any.
  * @return Returns the normalized path.
+ * @throws XQueryException err::XPTY0004 for malformed paths.
  */
 zstring get_normalized_path( char const *path, char const *base = nullptr );
 
@@ -513,6 +514,7 @@
  * @param path The path to normalize.
  * @param base The base path, if any.
  * @return Returns the normalized path.
+ * @throws XQueryException err::XPTY0004 for malformed paths.
  */
 template<class PathStringType> inline
 zstring get_normalized_path( PathStringType const &path,
@@ -527,6 +529,7 @@
  * @param path The path to normalize.
  * @param base The base path, if any.
  * @return Returns the normalized path.
+ * @throws XQueryException err::XPTY0004 for malformed paths.
  */
 template<class PathStringType> inline
 void normalize_path( PathStringType &path, PathStringType const &base = "" ) {

=== modified file 'src/util/unicode_util.cpp'
--- src/util/unicode_util.cpp	2012-04-24 12:39:38 +0000
+++ src/util/unicode_util.cpp	2012-04-24 22:19:24 +0000
@@ -24,6 +24,7 @@
 
 #ifndef ZORBA_NO_ICU
 # include <unicode/normlzr.h>
+# include <unicode/uchar.h>
 # include <unicode/ustring.h>
 #endif /* ZORBA_NO_ICU */
 
@@ -2228,6 +2229,19 @@
   return U_SUCCESS( status ) == TRUE;
 }
 
+bool strip_diacritics( string const &in, string *out ) {
+  string in_normalized;
+  if ( !normalize( in, normalization::NFKD, &in_normalized ) )
+    return false;
+  out->truncate( 0 );
+  for ( size_type len = in_normalized.length(), i = 0; i < len; ++i ) {
+    UChar32 const uc32 = in_normalized.char32At( i );
+    if ( u_charType( uc32 ) != U_NON_SPACING_MARK )
+      out->append( uc32 );
+  }
+  return true;
+}
+
 bool to_char( char const *in, char_type *out ) {
   UErrorCode status = U_ZERO_ERROR;
   u_strFromUTF8WithSub(
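
Note on strip_diacritics() above: the approach is to decompose the string (NFKD) so that accents become separate combining characters, then drop those characters. A minimal ICU-free sketch of the second step, under stated assumptions: the input is already decomposed, and is_combining_mark is a simplified stand-in that only recognizes U+0300..U+036F rather than every non-spacing mark. Both function names are hypothetical.

```cpp
#include <string>

// Simplified stand-in for a "non-spacing mark" test: only the Combining
// Diacritical Marks block (U+0300..U+036F) is recognized.
inline bool is_combining_mark( char32_t c ) {
  return c >= 0x0300 && c <= 0x036F;
}

// Expects input that is already decomposed (e.g. NFD or NFKD); this sketch
// performs no normalization itself.
inline std::u32string strip_marks( std::u32string const &decomposed ) {
  std::u32string out;
  out.reserve( decomposed.size() );
  for ( char32_t c : decomposed )   // iterates whole code points
    if ( !is_combining_mark( c ) )
      out.push_back( c );
  return out;
}
```

Working on whole code points keeps the filter independent of how many code units a character occupies in UTF-8 or UTF-16.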

=== modified file 'src/util/unicode_util.h'
--- src/util/unicode_util.h	2012-04-24 12:39:38 +0000
+++ src/util/unicode_util.h	2012-04-24 22:19:24 +0000
@@ -189,6 +189,7 @@
 ////////// normalization //////////////////////////////////////////////////////
 
 #ifndef ZORBA_NO_ICU
+
 /**
  * Normalizes the given string.
  *
@@ -197,6 +198,17 @@
  * @return Returns \c true only if the normalization succeeded.
  */
 bool normalize( string const &in, normalization::type n, string *out );
+
+/**
+ * Strips all diacritical marks from all characters converting them to their
+ * closest non-diacritical equivalents.
+ *
+ * @param in The input string.
+ * @param out The output string.
+ * @return Returns \c true only if the strip succeeded.
+ */
+bool strip_diacritics( string const &in, string *out );
+
 #endif /* ZORBA_NO_ICU */
 
 ////////// string conversion //////////////////////////////////////////////////

=== modified file 'src/util/uri_util.h'
--- src/util/uri_util.h	2012-04-24 12:39:38 +0000
+++ src/util/uri_util.h	2012-04-24 22:19:24 +0000
@@ -54,8 +54,8 @@
  * @param uri The URI to get the scheme of.
  * @param colon If not \c nullptr, this pointer is set to the position of the
  * ':' (if any) that follows the scheme name.
- * @return Returns the URI's scheme, or scheme::none if none, or
- * scheme::unknown if unknown.
+ * @return Returns the URI's scheme (if known), scheme::unknown (if unknown),
+ * or scheme::none (if none).
  */
 scheme get_scheme( char const *uri, char const **colon = nullptr );
 
@@ -64,10 +64,10 @@
  *
  * @tparam StringType The string type.
  * @param uri The URI to get the scheme of.
- * @param sname If not \c nullptr and the scheme is known, this is set to the
- * scheme's name.
- * @return Returns the URI's scheme, or scheme::none if none, or
- * scheme::unknown if unknown.
+ * @param sname If not \c nullptr and the scheme is not \c none, this is set to
+ * the scheme's name.
+ * @return Returns the URI's scheme (if known), scheme::unknown (if unknown),
+ * or scheme::none (if none).
  */
 template<class StringType> inline
 scheme get_scheme( StringType const &uri, StringType *sname = nullptr ) {

=== modified file 'src/util/utf8_util.h'
--- src/util/utf8_util.h	2012-04-24 12:39:38 +0000
+++ src/util/utf8_util.h	2012-04-24 22:19:24 +0000
@@ -759,9 +759,10 @@
  * @tparam OutputStringType The output string type.
  * @param in The input string.
  * @param out The output string.
+ * @return Returns \c true only if the strip succeeded.
  */
 template<class InputStringType,class OutputStringType>
-void strip_diacritics( InputStringType const &in, OutputStringType *out );
+bool strip_diacritics( InputStringType const &in, OutputStringType *out );
 
 /**
  *

=== modified file 'src/util/utf8_util.tcc'
--- src/util/utf8_util.tcc	2012-04-24 12:39:38 +0000
+++ src/util/utf8_util.tcc	2012-04-24 22:19:24 +0000
@@ -123,19 +123,26 @@
 #endif /* ZORBA_NO_ICU */
 
 template<class InputStringType,class OutputStringType>
-void strip_diacritics( InputStringType const &in, OutputStringType *out ) {
-  InputStringType in_normalized;
+bool strip_diacritics( InputStringType const &in, OutputStringType *out ) {
 #ifndef ZORBA_NO_ICU
-  normalize( in, unicode::normalization::NFKD, &in_normalized );
+  unicode::string u_in;
+  if ( !unicode::to_string( in, &u_in ) )
+    return false;
+  unicode::string u_out;
+  if ( !unicode::strip_diacritics( u_in, &u_out ) ) return false;
+  storage_type *temp;
+  size_type temp_len;
+  if ( !utf8::to_string( u_out.getBuffer(), u_out.length(), &temp, &temp_len ) )
+    return false;
+  out->assign( temp, temp_len );
+  if ( !string_traits<OutputStringType>::takes_pointer_ownership )
+    delete[] temp;
 #else
-  in_normalized = in.c_str();
-#endif /* ZORBA_NO_ICU */
   out->clear();
-  out->reserve( in_normalized.size() );
-  std::copy(
-    in_normalized.begin(), in_normalized.end(),
-    ascii::back_ascii_inserter( *out )
-  );
+  out->reserve( in.size() );
+  std::copy( in.begin(), in.end(), ascii::back_ascii_inserter( *out ) );
+#endif /* ZORBA_NO_ICU */
+  return true;
 }
 
 #ifndef ZORBA_NO_ICU

=== modified file 'src/zorbatypes/ft_token.cpp'
--- src/zorbatypes/ft_token.cpp	2012-04-24 12:39:38 +0000
+++ src/zorbatypes/ft_token.cpp	2012-04-24 22:19:24 +0000
@@ -175,7 +175,7 @@
 ///////////////////////////////////////////////////////////////////////////////
 
 std::ostream& operator<<( ostream &o, FTToken const &t ) {
-  return  o << "[FTToken: \"" << t.value() << "\" ("
+  return  o << "[\"" << t.value() << "\" ("
             << iso639_1::string_of[ t.lang() ] << ") "
             << t.pos() << ',' << t.sent() << ',' << t.para() << ']';
 }

=== modified file 'src/zorbatypes/ft_token.h'
--- src/zorbatypes/ft_token.h	2012-04-24 12:39:38 +0000
+++ src/zorbatypes/ft_token.h	2012-04-24 22:19:24 +0000
@@ -286,7 +286,7 @@
    */
   mutable mod_values_t *mod_values_;
 
-  inline bool is_query_token() const {
+  bool is_query_token() const {
     return sent_ == QueryTokenMagicValue;
   }
 

=== modified file 'src/zorbatypes/numconversions.cpp'
--- src/zorbatypes/numconversions.cpp	2012-04-24 12:39:38 +0000
+++ src/zorbatypes/numconversions.cpp	2012-04-24 22:19:24 +0000
@@ -15,6 +15,8 @@
  */
 #include "stdafx.h"
 
+#include <stdexcept>
+
 #include "common/common.h"
 #include "util/string_util.h"
 #include "zorbatypes/numconversions.h"
@@ -23,6 +25,9 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
+#define RANGE_ERROR(N,TYPE) \
+  std::range_error( BUILD_STRING( '"', (N), "\": number can not be represented as an " TYPE ) )
+
 xs_int to_xs_int( xs_double const &d ) {
   zstring const temp( d.toIntegerString() );
   return ztd::aton<xs_int>( temp.c_str() );
@@ -33,7 +38,9 @@
   zstring const temp( i.toString() );
   return ztd::aton<xs_int>( temp.c_str() );
 #else
-  return static_cast<xs_int>( i.value_ );
+  if ( i.is_xs_int() )
+    return static_cast<xs_int>( i.value_ );
+  throw RANGE_ERROR( i, "xs:int" );
 #endif /* ZORBA_WITH_BIG_INTEGER */
 }
 
@@ -42,9 +49,7 @@
     zstring const temp( d.toString() );
     return ztd::aton<xs_long>( temp.c_str() );
   }
-  throw std::range_error(
-    BUILD_STRING( '"', d, "\": number can not be represented as an xs:long" )
-  );
+  throw RANGE_ERROR( d, "xs:long" );
 }
 
 xs_long to_xs_long( xs_integer const &i ) {
@@ -52,7 +57,9 @@
   zstring const temp( i.toString() );
   return ztd::aton<xs_long>( temp.c_str() );
 #else
-  return static_cast<xs_long>( i.value_ );
+  if ( i.is_xs_long() )
+    return static_cast<xs_long>( i.value_ );
+  throw RANGE_ERROR( i, "xs:long" );
 #endif /* ZORBA_WITH_BIG_INTEGER */
 }
 
@@ -68,7 +75,9 @@
   zstring const temp( i.toString() );
   return ztd::aton<xs_unsignedInt>( temp.c_str() );
 #else
-  return static_cast<xs_unsignedInt>( i.value_ );
+  if ( i.is_xs_unsignedInt() )
+    return static_cast<xs_unsignedInt>( i.value_ );
+  throw RANGE_ERROR( i, "xs:unsignedInt" );
 #endif /* ZORBA_WITH_BIG_INTEGER */
 }
 
@@ -77,7 +86,9 @@
   zstring const temp( i.toString() );
   return ztd::aton<xs_unsignedLong>( temp.c_str() );
 #else
-  return static_cast<xs_unsignedLong>( i.value_ );
+  if ( i.is_xs_unsignedLong() )
+    return static_cast<xs_unsignedLong>( i.value_ );
+  throw RANGE_ERROR( i, "xs:unsignedLong" );
 #endif /* ZORBA_WITH_BIG_INTEGER */
 }
 

=== modified file 'src/zorbautils/locale.cpp'
--- src/zorbautils/locale.cpp	2012-04-24 12:39:38 +0000
+++ src/zorbautils/locale.cpp	2012-04-24 22:19:24 +0000
@@ -36,10 +36,10 @@
 
 #define DEF_END(CHAR_ARRAY)                             \
   static char const *const *const end =                 \
-    CHAR_ARRAY + sizeof( CHAR_ARRAY ) / sizeof( char* );
+    CHAR_ARRAY + sizeof( CHAR_ARRAY ) / sizeof( char* )
 
-#define FIND(what) \
-  static_cast<type>( find_index( string_of, end, what ) )
+#define FIND(WHAT) \
+  static_cast<type>( find_index( string_of, end, WHAT ) )
 
 using namespace std;
 
@@ -70,10 +70,10 @@
 static char* get_win32_locale_info( int constant ) {
   int bytes = ::GetLocaleInfoA( LOCALE_USER_DEFAULT, constant, NULL, 0 );
   ZORBA_FATAL( bytes, "GetLocaleInfoA() failed" );
-  char *const info = new char[ bytes ];
-  bytes = ::GetLocaleInfoA( LOCALE_USER_DEFAULT, constant, info, bytes );
+  unique_ptr<char[]> info( new char[ bytes ] );
+  bytes = ::GetLocaleInfoA( LOCALE_USER_DEFAULT, constant, info.get(), bytes );
   ZORBA_FATAL( bytes, "GetLocaleInfoA() failed" );
-  return info;
+  return info.release();
 }
 
 #else /* WIN32 */
@@ -379,21 +379,192 @@
 
 char const *const string_of[] = {
   "#UNKNOWN",                           // starts with '#' for sorting
+  "aa", // Afar
+  "ab", // Abkhazian
+  "ae", // Avestan
+  "af", // Afrikaans
+  "ak", // Akan
+  "am", // Amharic
+  "an", // Aragonese
+  "ar", // Arabic
+  "as", // Assamese
+  "av", // Avaric
+  "ay", // Aymara
+  "az", // Azerbaijani
+  "ba", // Bashkir
+  "be", // Byelorussian
+  "bg", // Bulgarian
+  "bh", // Bihari
+  "bi", // Bislama
+  "bm", // Bambara
+  "bn", // Bengali; Bangla
+  "bo", // Tibetan
+  "br", // Breton
+  "bs", // Bosnian
+  "ca", // Catalan
+  "ce", // Chechen
+  "ch", // Chamorro
+  "co", // Corsican
+  "cr", // Cree
+  "cs", // Czech
+  "cu", // Church Slavic; Church Slavonic
+  "cv", // Chuvash
+  "cy", // Welsh
   "da", // Danish
   "de", // German
+  "dv", // Divehi
+  "dz", // Bhutani
+  "ee", // Ewe
+  "el", // Greek
   "en", // English
+  "eo", // Esperanto
   "es", // Spanish
+  "et", // Estonian
+  "eu", // Basque
+  "fa", // Persian
+  "ff", // Fulah
   "fi", // Finnish
+  "fj", // Fiji
+  "fo", // Faroese
   "fr", // French
+  "fy", // Frisian
+  "ga", // Irish
+  "gd", // Scots Gaelic
+  "gl", // Galician
+  "gn", // Guarani
+  "gu", // Gujarati
+  "gv", // Manx
+  "ha", // Hausa
+  "he", // Hebrew (formerly iw)
+  "hi", // Hindi
+  "ho", // Hiri Motu
+  "hr", // Croatian
+  "ht", // Haitian Creole
   "hu", // Hungarian
+  "hy", // Armenian
+  "hz", // Herero
+  "ia", // Interlingua
+  "id", // Indonesian (formerly in)
+  "ie", // Interlingue
+  "ig", // Igbo
+  "ii", // Nuosu
+  "ik", // Inupiak
+  "io", // Ido
+  "is", // Icelandic
   "it", // Italian
+  "iu", // Inuktitut
+  "ja", // Japanese
+  "jv", // Javanese
+  "ka", // Georgian
+  "kg", // Kongo
+  "ki", // Gikuyu
+  "kj", // Kuanyama
+  "kk", // Kazakh
+  "kl", // Greenlandic
+  "km", // Cambodian
+  "kn", // Kannada
+  "ko", // Korean
+  "kr", // Kanuri
+  "ks", // Kashmiri
+  "ku", // Kurdish
+  "kv", // Komi
+  "kw", // Cornish
+  "ky", // Kirghiz
+  "la", // Latin
+  "lb", // Letzeburgesch
+  "lg", // Ganda
+  "li", // Limburgan; Limburger; Limburgish
+  "ln", // Lingala
+  "lo", // Laothian
+  "lt", // Lithuanian
+  "lu", // Luba-Katanga
+  "lv", // Latvian, Lettish
+  "mg", // Malagasy
+  "mh", // Marshallese
+  "mi", // Maori
+  "mk", // Macedonian
+  "ml", // Malayalam
+  "mn", // Mongolian
+  "mo", // Moldavian
+  "mr", // Marathi
+  "ms", // Malay
+  "mt", // Maltese
+  "my", // Burmese
+  "na", // Nauru
+  "nb", // Norwegian Bokmal
+  "nd", // Ndebele, North
+  "ne", // Nepali
+  "ng", // Ndonga
   "nl", // Dutch
+  "nn", // Norwegian Nynorsk
   "no", // Norwegian
+  "nr", // Ndebele, South
+  "nv", // Navajo; Navaho
+  "ny", // Chichewa; Chewa; Nyanja
+  "oc", // Occitan
+  "oj", // Ojibwa
+  "om", // (Afan) Oromo
+  "or", // Oriya
+  "os", // Ossetian; Ossetic
+  "pa", // Panjabi; Punjabi
+  "pi", // Pali
+  "pl", // Polish
+  "ps", // Pashto; Pushto
   "pt", // Portuguese
+  "qu", // Quechua
+  "rm", // Romansh
+  "rn", // Kirundi
   "ro", // Romanian
   "ru", // Russian
+  "rw", // Kinyarwanda
+  "sa", // Sanskrit
+  "sc", // Sardinian
+  "sd", // Sindhi
+  "se", // Northern Sami
+  "sg", // Sangho
+  "sh", // Serbo-Croatian
+  "si", // Sinhalese
+  "sk", // Slovak
+  "sl", // Slovenian
+  "sm", // Samoan
+  "sn", // Shona
+  "so", // Somali
+  "sq", // Albanian
+  "sr", // Serbian
+  "ss", // Siswati
+  "st", // Sesotho
+  "su", // Sundanese
   "sv", // Swedish
+  "sw", // Swahili
+  "ta", // Tamil
+  "te", // Telugu
+  "tg", // Tajik
+  "th", // Thai
+  "ti", // Tigrinya
+  "tk", // Turkmen
+  "tl", // Tagalog
+  "tn", // Setswana
+  "to", // Tonga
   "tr", // Turkish
+  "ts", // Tsonga
+  "tt", // Tatar
+  "tw", // Twi
+  "ty", // Tahitian
+  "ug", // Uighur
+  "uk", // Ukrainian
+  "ur", // Urdu
+  "uz", // Uzbek
+  "ve", // Venda
+  "vi", // Vietnamese
+  "vo", // Volapuk
+  "wa", // Walloon
+  "wo", // Wolof
+  "xh", // Xhosa
+  "yi", // Yiddish (formerly ji)
+  "yo", // Yoruba
+  "za", // Zhuang
+  "zh", // Chinese
+  "zu", // Zulu
 };
 
 type find( char const *lang ) {
@@ -409,18 +580,110 @@
 
 char const *const string_of[] = {
   "#UNKNOWN",                           // starts with '#' for sorting
+  "aar",  // Afar
+  "abk",  // Abkhazian
+  "afr",  // Afrikaans
+  "aka",  // Akan
+  "alb",  // Albanian
+  "amh",  // Amharic
+  "ara",  // Arabic
+  "arg",  // Aragonese
+  "arm",  // Armenian
+  "asm",  // Assamese [without '_', it's a C++ keyword]
+  "ava",  // Avaric
+  "ave",  // Avestan
+  "aym",  // Aymara
+  "aze",  // Azerbaijani
+  "bak",  // Bashkir
+  "bam",  // Bambara
+  "baq",  // Basque
+  "bel",  // Belarusian
+  "ben",  // Bengali
+  "bih",  // Bihari
+  "bis",  // Bislama
+  "bos",  // Bosnian
+  "bre",  // Breton
+  "bul",  // Bulgarian
+  "bur",  // Burmese
+  "cat",  // Catalan
+  "cha",  // Chamorro
+  "che",  // Chechen
+  "chi",  // Chinese
+  "chu",  // Church Slavic; Old Slavonic; Church Slavonic
+  "cym",  // Welsh
   "dan",  // Danish
   "deu",  // German (T)
+  "div",  // Divehi; Dhivehi; Maldivian
   "dut",  // Dutch (B)
+  "dzo",  // Dzongkha
+  "ell",  // Modern Greek
   "eng",  // English
+  "epo",  // Esperanto
+  "est",  // Estonian
+  "ewe",  // Ewe
+  "fao",  // Faroese
+  "fij",  // Fijian
   "fin",  // Finnish
   "fra",  // French (T)
   "fre",  // French (B)
+  "fry",  // Western Frisian
+  "ful",  // Fulah
+  "geo",  // Georgian
   "ger",  // German (B)
+  "gla",  // Scottish Gaelic; Gaelic
+  "gle",  // Irish
+  "glg",  // Galician
+  "glv",  // Manx
+  "gre",  // Modern Greek
+  "grn",  // Guarani
+  "guj",  // Gujarati
+  "hat",  // Haitian Creole; Haitian
+  "hau",  // Hausa
+  "heb",  // Hebrew
+  "her",  // Herero
+  "hin",  // Hindi
+  "hmo",  // Hiri Motu
+  "hrv",  // Croatian
   "hun",  // Hungarian
+  "ibo",  // Igbo
+  "ice",  // Icelandic
+  "ido",  // Ido
+  "iku",  // Inuktitut
+  "ile",  // Interlingue; Occidental
+  "ina",  // Interlingua
+  "ind",  // Indonesian
+  "ipk",  // Inupiaq
+  "isl",  // Icelandic
   "ita",  // Italian
+  "jav",  // Javanese
+  "jpn",  // Japanese
+  "kal",  // Kalaallisut; Greenlandic
+  "kan",  // Kannada
+  "kas",  // Kashmiri
+  "kat",  // Georgian
+  "kau",  // Kanuri
+  "kaz",  // Kazakh
+  "khm",  // Central Khmer
+  "kik",  // Kikuyu; Gikuyu
+  "kin",  // Kinyarwanda
+  "kir",  // Kirghiz; Kyrgyz
+  "kom",  // Komi
+  "kon",  // Kongo
+  "kor",  // Korean
+  "kua",  // Kuanyama; Kwanyama
+  "kur",  // Kurdish
+  "lao",  // Lao
+  "lat",  // Latin
+  "lav",  // Latvian
+  "lim",  // Limburgan; Limburger; Limburgish
+  "lin",  // Lingala
+  "lit",  // Lithuanian
+  "ltz",  // Luxembourgish; Letzeburgesch
+  "lub",  // Luba-Katanga
+  "mya",  // Burmese
   "nld",  // Dutch (T)
   "nor",  // Norwegian
+  "nya",  // Chichewa; Chewa; Nyanja
   "por",  // Portuguese
   "ron",  // Romanian (T)
   "rum",  // Romanian (B)
@@ -428,6 +691,18 @@
   "spa",  // Spanish
   "swe",  // Swedish
   "tur",  // Turkish
+  "ven",  // Venda
+  "vie",  // Vietnamese
+  "vol",  // Volapuk
+  "wel",  // Welsh
+  "wln",  // Walloon
+  "wol",  // Wolof
+  "xho",  // Xhosa
+  "yid",  // Yiddish
+  "yor",  // Yoruba
+  "zha",  // Zhuang; Chuang
+  "zho",  // Chinese
+  "zul",  // Zulu
 };
 
 type find( char const *lang ) {
@@ -447,18 +722,110 @@
 
   static type const iso639_2_to_639_1[] = {
     unknown,
+    aa, // aar
+    ab, // abk
+    af, // afr
+    ak, // aka
+    sq, // alb
+    am, // amh
+    ar, // ara
+    an, // arg
+    hy, // arm
+    as, // asm
+    av, // ava
+    ae, // ave
+    ay, // aym
+    az, // aze
+    ba, // bak
+    bm, // bam
+    eu, // baq
+    be, // bel
+    bn, // ben
+    bh, // bih
+    bi, // bis
+    bs, // bos
+    br, // bre
+    bg, // bul
+    my, // bur
+    ca, // cat
+    ch, // cha
+    ce, // che
+    zh, // chi
+    cu, // chu
+    cy, // cym
     da, // dan
     de, // deu
+    dv, // div
     nl, // dut
+    dz, // dzo
+    el, // ell
     en, // eng
+    eo, // epo
+    et, // est
+    ee, // ewe
+    fo, // fao
+    fj, // fij
     fi, // fin
     fr, // fra
     fr, // fre
+    fy, // fry
+    ff, // ful
+    ka, // geo
     de, // ger
+    gd, // gla
+    ga, // gle
+    gl, // glg
+    gv, // glv
+    el, // gre
+    gn, // grn
+    gu, // guj
+    ht, // hat
+    ha, // hau
+    he, // heb
+    hz, // her
+    hi, // hin
+    ho, // hmo
+    hr, // hrv
     hu, // hun
+    ig, // ibo
+    is, // ice
+    io, // ido
+    iu, // iku
+    ie, // ile
+    ia, // ina
+    id, // ind
+    ik, // ipk
+    is, // isl
     it, // ita
+    jv, // jav
+    ja, // jpn
+    kl, // kal
+    kn, // kan
+    ks, // kas
+    ka, // kat
+    kr, // kau
+    kk, // kaz
+    km, // khm
+    ki, // kik
+    rw, // kin
+    ky, // kir
+    kv, // kom
+    kg, // kon
+    ko, // kor
+    kj, // kua
+    ku, // kur
+    lo, // lao
+    la, // lat
+    lv, // lav
+    li, // lim
+    ln, // lin
+    lt, // lit
+    lb, // ltz
+    lu, // lub
+    my, // mya
     nl, // nld
     no, // nor
+    ny, // nya
     pt, // por
     ro, // ron
     ro, // rum
@@ -466,6 +833,18 @@
     es, // spa
     sv, // swe
     tr, // tur
+    ve, // ven
+    vi, // vie
+    vo, // vol
+    cy, // wel
+    wa, // wln
+    wo, // wol
+    xh, // xho
+    yi, // yid
+    yo, // yor
+    za, // zha
+    zh, // zho
+    zu, // zul
   };
   return iso639_2_to_639_1[ iso639_2::find( lang ) ];
 }

=== modified file 'src/zorbautils/locale.h'
--- src/zorbautils/locale.h	2012-04-24 12:39:38 +0000
+++ src/zorbautils/locale.h	2012-04-24 22:19:24 +0000
@@ -29,252 +29,252 @@
     namespace iso3166_1 {
       enum type {
         unknown,
-        AD,   // Andorra
-        AE,   // United Arab Emirates
-        AF,   // Afghanistan
-        AG,   // Antigua and Barbuda
-        AI,   // Anguilla
-        AL,   // Albania
-        AM,   // Armenia
-        AN,   // Netherlands Antilles
-        AO,   // Angola
-        AQ,   // AntarcticA
-        AR,   // ArgentinA
-        AS,   // American Samoa
-        AT,   // Austria
-        AU,   // Australia
-        AW,   // Aruba
-        AX,   // Aland Islands
-        AZ,   // Azerbaijan
-        BA,   // Bosnia and Herzegovina
-        BB,   // Barbados
-        BD,   // Bangladesh
-        BE,   // Belgium
-        BF,   // Burkina Faso
-        BG,   // Bulgaria
-        BH,   // Bahrain
-        BI,   // Burundi
-        BJ,   // Benin
-        BL,   // Saint Barthelemy
-        BM,   // Bermuda
-        BN,   // Brunei Darussalam
-        BO,   // Bolivia
-        BR,   // Brazil
-        BS,   // Bahamas
-        BT,   // Bhutan
-        BV,   // Bouvet Island
-        BW,   // Botswana
-        BY,   // Belarus
-        BZ,   // Belize
-        CA,   // Canada
-        CC,   // Cocos Islands
-        CD,   // Congo
-        CF,   // Central African Republic
-        CG,   // Congo
-        CH,   // Switzerland
-        CI,   // Cote D'Ivoire
-        CK,   // Cook Islands
-        CL,   // Chile
-        CM,   // Cameroon
-        CN,   // China
-        CO,   // Colombia
-        CR,   // Costa Rica
-        CU,   // Cuba
-        CV,   // Cape Verde
-        CX,   // Christmas Island
-        CY,   // Cyprus
-        CZ,   // Czech Republic
-        DE,   // Germany
-        DJ,   // Djibouti
-        DK,   // Denmark
-        DM,   // Dominica
-        DO,   // Dominican Republic
-        DZ,   // Algeria
-        EC,   // Ecuador
-        EE,   // Estonia
-        EG,   // Egypt
-        EH,   // Western Sahara
-        ER,   // Eritrea
-        ES,   // Spain
-        ET,   // Ethiopia
-        FI,   // Finland
-        FJ,   // Fiji
-        FK,   // Falkland Islands
-        FM,   // Micronesia
-        FO,   // Faroe Islands
-        FR,   // France
-        GA,   // Gabon
-        GB,   // United Kingdom
-        GD,   // Grenada
-        GE,   // Georgia
-        GF,   // French Guiana
-        GG,   // Guernsey
-        GH,   // Ghana
-        GI,   // Gibraltar
-        GL,   // Greenland
-        GM,   // Gambia
-        GN,   // Guinea
-        GP,   // Guadeloupe
-        GQ,   // Equatorial Guinea
-        GR,   // Greece
-        GS,   // South Georgia and the South Sandwich Islands
-        GT,   // Guatemala
-        GU,   // Guam
-        GW,   // Guinea-Bissau
-        GY,   // Guyana
-        HK,   // Hong Kong
-        HM,   // Heard Island and Mcdonald Islands
-        HN,   // Honduras
-        HR,   // Croatia
-        HT,   // Haiti
-        HU,   // Hungary
-        ID,   // Indonesia
-        IE,   // Ireland
-        IL,   // Israel
-        IM,   // Isle of Man
-        IN_,  // India [without '_', it clashes with an identifier on Windows]
-        IO,   // British Indian Ocean Territory
-        IQ,   // Iraq
-        IR,   // Iran
-        IS,   // Iceland
-        IT,   // Italy
-        JE,   // Jersey
-        JM,   // Jamaica
-        JO,   // Jordan
-        JP,   // Japan
-        KE,   // Kenya
-        KG,   // Kyrgyzstan
-        KH,   // Cambodia
-        KI,   // Kiribati
-        KM,   // Comoros
-        KN,   // Saint Kitts and Nevis
-        KP,   // Korea (Democratic People's Republic)
-        KR,   // Korea
-        KW,   // Kuwait
-        KY,   // Cayman Islands
-        KZ,   // Kazakhstan
-        LA,   // Lao
-        LB,   // Lebanon
-        LC,   // Saint Lucia
-        LI,   // Liechtenstein
-        LK,   // Sri Lanka
-        LR,   // Liberia
-        LS,   // Lesotho
-        LT,   // Lithuania
-        LU,   // Luxembourg
-        LV,   // Latvia
-        LY,   // Libyan Arab Jamahiriya
-        MA,   // Morocco
-        MC,   // Monaco
-        MD,   // Moldova
-        ME,   // Montenegro
-        MF,   // Saint Martin
-        MG,   // Madagascar
-        MH,   // Marshall Islands
-        MK,   // Macedonia
-        ML,   // Mali
-        MM,   // Myanmar
-        MN,   // Mongolia
-        MO,   // Macao
-        MP,   // Northern Mariana Islands
-        MQ,   // Martinique
-        MR,   // Mauritania
-        MS,   // Montserrat
-        MT,   // Malta
-        MU,   // Mauritius
-        MV,   // Maldives
-        MW,   // Malawi
-        MX,   // Mexico
-        MY,   // Malaysia
-        MZ,   // Mozambique
-        NA,   // Namibia
-        NC,   // New Caledonia
-        NE,   // Niger
-        NF,   // Norfolk Island
-        NG,   // Nigeria
-        NI,   // Nicaragua
-        NL,   // Netherlands
-        NO,   // Norway
-        NP,   // Nepal
-        NR,   // Nauru
-        NU,   // Niue
-        NZ,   // New Zealand
-        OM,   // Oman
-        PA,   // Panama
-        PE,   // Peru
-        PF,   // French Polynesia
-        PG,   // Papua New Guinea
-        PH,   // Philippines
-        PK,   // Pakistan
-        PL,   // Poland
-        PM,   // Saint Pierre and Miquelon
-        PN,   // Pitcairn
-        PR,   // Puerto Rico
-        PS,   // Palestinian Territory
-        PT,   // Portugal
-        PW,   // Palau
-        PY,   // Paraguay
-        QA,   // Qatar
-        RE,   // Reunion
-        RO,   // Romania
-        RS,   // Serbia
-        RU,   // Russian Federation
-        RW,   // Rwanda
-        SA,   // Saudi Arabia
-        SB,   // Solomon Islands
-        SC,   // Seychelles
-        SD,   // Sudan
-        SE,   // Sweden
-        SG,   // Singapore
-        SH,   // Saint Helena
-        SI,   // Slovenia
-        SJ,   // Svalbard and Jan Mayen
-        SK,   // Slovakia
-        SL,   // Sierra Leone
-        SM,   // San Marino
-        SN,   // Senegal
-        SO,   // Somalia
-        SR,   // Suriname
-        ST,   // Sao Tome and Principe
-        SV,   // El Salvador
-        SY,   // Syria
-        SZ,   // Swaziland
-        TC,   // Turks and Caicos Islands
-        TD,   // Chad
-        TF,   // French Southern Territories
-        TG,   // Togo
-        TH,   // Thailand
-        TJ,   // Tajikistan
-        TK,   // Tokelau
-        TL,   // Timor-Leste
-        TM,   // Turkmenistan
-        TN,   // Tunisia
-        TO,   // Tonga
-        TR,   // Turkey
-        TT,   // Trinidad and Tobago
-        TV,   // Tuvalu
-        TW,   // Taiwan
-        TZ,   // Tanzania
-        UA,   // Ukraine
-        UG,   // Uganda
-        UM,   // United states Minor Outlying Islands
-        US,   // United States
-        UY,   // Uruguay
-        UZ,   // Uzbekistan
-        VA,   // Vatican
-        VC,   // Saint Vincent and the Grenadines
-        VE,   // Venezuela
-        VG,   // Virgin Islands (British)
-        VI,   // Virgin Islands (USA)
-        VN,   // Viet Nam
-        VU,   // Vanuatu
-        WF,   // Wallis and Futuna
-        WS,   // Samoa
-        YE,   // Yemen
-        YT,   // Mayotte
-        ZA,   // South Africa
-        ZM,   // Zambia
-        ZW,   // Zimbabwe
+        AD,   ///< Andorra
+        AE,   ///< United Arab Emirates
+        AF,   ///< Afghanistan
+        AG,   ///< Antigua and Barbuda
+        AI,   ///< Anguilla
+        AL,   ///< Albania
+        AM,   ///< Armenia
+        AN,   ///< Netherlands Antilles
+        AO,   ///< Angola
+        AQ,   ///< Antarctica
+        AR,   ///< Argentina
+        AS,   ///< American Samoa
+        AT,   ///< Austria
+        AU,   ///< Australia
+        AW,   ///< Aruba
+        AX,   ///< Aland Islands
+        AZ,   ///< Azerbaijan
+        BA,   ///< Bosnia and Herzegovina
+        BB,   ///< Barbados
+        BD,   ///< Bangladesh
+        BE,   ///< Belgium
+        BF,   ///< Burkina Faso
+        BG,   ///< Bulgaria
+        BH,   ///< Bahrain
+        BI,   ///< Burundi
+        BJ,   ///< Benin
+        BL,   ///< Saint Barthelemy
+        BM,   ///< Bermuda
+        BN,   ///< Brunei Darussalam
+        BO,   ///< Bolivia
+        BR,   ///< Brazil
+        BS,   ///< Bahamas
+        BT,   ///< Bhutan
+        BV,   ///< Bouvet Island
+        BW,   ///< Botswana
+        BY,   ///< Belarus
+        BZ,   ///< Belize
+        CA,   ///< Canada
+        CC,   ///< Cocos Islands
+        CD,   ///< Congo (Democratic Republic)
+        CF,   ///< Central African Republic
+        CG,   ///< Congo (Republic)
+        CH,   ///< Switzerland
+        CI,   ///< Cote D'Ivoire
+        CK,   ///< Cook Islands
+        CL,   ///< Chile
+        CM,   ///< Cameroon
+        CN,   ///< China
+        CO,   ///< Colombia
+        CR,   ///< Costa Rica
+        CU,   ///< Cuba
+        CV,   ///< Cape Verde
+        CX,   ///< Christmas Island
+        CY,   ///< Cyprus
+        CZ,   ///< Czech Republic
+        DE,   ///< Germany
+        DJ,   ///< Djibouti
+        DK,   ///< Denmark
+        DM,   ///< Dominica
+        DO,   ///< Dominican Republic
+        DZ,   ///< Algeria
+        EC,   ///< Ecuador
+        EE,   ///< Estonia
+        EG,   ///< Egypt
+        EH,   ///< Western Sahara
+        ER,   ///< Eritrea
+        ES,   ///< Spain
+        ET,   ///< Ethiopia
+        FI,   ///< Finland
+        FJ,   ///< Fiji
+        FK,   ///< Falkland Islands
+        FM,   ///< Micronesia
+        FO,   ///< Faroe Islands
+        FR,   ///< France
+        GA,   ///< Gabon
+        GB,   ///< United Kingdom
+        GD,   ///< Grenada
+        GE,   ///< Georgia
+        GF,   ///< French Guiana
+        GG,   ///< Guernsey
+        GH,   ///< Ghana
+        GI,   ///< Gibraltar
+        GL,   ///< Greenland
+        GM,   ///< Gambia
+        GN,   ///< Guinea
+        GP,   ///< Guadeloupe
+        GQ,   ///< Equatorial Guinea
+        GR,   ///< Greece
+        GS,   ///< South Georgia and the South Sandwich Islands
+        GT,   ///< Guatemala
+        GU,   ///< Guam
+        GW,   ///< Guinea-Bissau
+        GY,   ///< Guyana
+        HK,   ///< Hong Kong
+        HM,   ///< Heard Island and McDonald Islands
+        HN,   ///< Honduras
+        HR,   ///< Croatia
+        HT,   ///< Haiti
+        HU,   ///< Hungary
+        ID,   ///< Indonesia
+        IE,   ///< Ireland
+        IL,   ///< Israel
+        IM,   ///< Isle of Man
+        IN_,  ///< India [without '_', it clashes with an identifier on Windows]
+        IO,   ///< British Indian Ocean Territory
+        IQ,   ///< Iraq
+        IR,   ///< Iran
+        IS,   ///< Iceland
+        IT,   ///< Italy
+        JE,   ///< Jersey
+        JM,   ///< Jamaica
+        JO,   ///< Jordan
+        JP,   ///< Japan
+        KE,   ///< Kenya
+        KG,   ///< Kyrgyzstan
+        KH,   ///< Cambodia
+        KI,   ///< Kiribati
+        KM,   ///< Comoros
+        KN,   ///< Saint Kitts and Nevis
+        KP,   ///< Korea (Democratic People's Republic)
+        KR,   ///< Korea (Republic)
+        KW,   ///< Kuwait
+        KY,   ///< Cayman Islands
+        KZ,   ///< Kazakhstan
+        LA,   ///< Lao
+        LB,   ///< Lebanon
+        LC,   ///< Saint Lucia
+        LI,   ///< Liechtenstein
+        LK,   ///< Sri Lanka
+        LR,   ///< Liberia
+        LS,   ///< Lesotho
+        LT,   ///< Lithuania
+        LU,   ///< Luxembourg
+        LV,   ///< Latvia
+        LY,   ///< Libyan Arab Jamahiriya
+        MA,   ///< Morocco
+        MC,   ///< Monaco
+        MD,   ///< Moldova
+        ME,   ///< Montenegro
+        MF,   ///< Saint Martin
+        MG,   ///< Madagascar
+        MH,   ///< Marshall Islands
+        MK,   ///< Macedonia
+        ML,   ///< Mali
+        MM,   ///< Myanmar
+        MN,   ///< Mongolia
+        MO,   ///< Macao
+        MP,   ///< Northern Mariana Islands
+        MQ,   ///< Martinique
+        MR,   ///< Mauritania
+        MS,   ///< Montserrat
+        MT,   ///< Malta
+        MU,   ///< Mauritius
+        MV,   ///< Maldives
+        MW,   ///< Malawi
+        MX,   ///< Mexico
+        MY,   ///< Malaysia
+        MZ,   ///< Mozambique
+        NA,   ///< Namibia
+        NC,   ///< New Caledonia
+        NE,   ///< Niger
+        NF,   ///< Norfolk Island
+        NG,   ///< Nigeria
+        NI,   ///< Nicaragua
+        NL,   ///< Netherlands
+        NO,   ///< Norway
+        NP,   ///< Nepal
+        NR,   ///< Nauru
+        NU,   ///< Niue
+        NZ,   ///< New Zealand
+        OM,   ///< Oman
+        PA,   ///< Panama
+        PE,   ///< Peru
+        PF,   ///< French Polynesia
+        PG,   ///< Papua New Guinea
+        PH,   ///< Philippines
+        PK,   ///< Pakistan
+        PL,   ///< Poland
+        PM,   ///< Saint Pierre and Miquelon
+        PN,   ///< Pitcairn
+        PR,   ///< Puerto Rico
+        PS,   ///< Palestinian Territory
+        PT,   ///< Portugal
+        PW,   ///< Palau
+        PY,   ///< Paraguay
+        QA,   ///< Qatar
+        RE,   ///< Reunion
+        RO,   ///< Romania
+        RS,   ///< Serbia
+        RU,   ///< Russian Federation
+        RW,   ///< Rwanda
+        SA,   ///< Saudi Arabia
+        SB,   ///< Solomon Islands
+        SC,   ///< Seychelles
+        SD,   ///< Sudan
+        SE,   ///< Sweden
+        SG,   ///< Singapore
+        SH,   ///< Saint Helena
+        SI,   ///< Slovenia
+        SJ,   ///< Svalbard and Jan Mayen
+        SK,   ///< Slovakia
+        SL,   ///< Sierra Leone
+        SM,   ///< San Marino
+        SN,   ///< Senegal
+        SO,   ///< Somalia
+        SR,   ///< Suriname
+        ST,   ///< Sao Tome and Principe
+        SV,   ///< El Salvador
+        SY,   ///< Syria
+        SZ,   ///< Swaziland
+        TC,   ///< Turks and Caicos Islands
+        TD,   ///< Chad
+        TF,   ///< French Southern Territories
+        TG,   ///< Togo
+        TH,   ///< Thailand
+        TJ,   ///< Tajikistan
+        TK,   ///< Tokelau
+        TL,   ///< Timor-Leste
+        TM,   ///< Turkmenistan
+        TN,   ///< Tunisia
+        TO,   ///< Tonga
+        TR,   ///< Turkey
+        TT,   ///< Trinidad and Tobago
+        TV,   ///< Tuvalu
+        TW,   ///< Taiwan
+        TZ,   ///< Tanzania
+        UA,   ///< Ukraine
+        UG,   ///< Uganda
+        UM,   ///< United States Minor Outlying Islands
+        US,   ///< United States
+        UY,   ///< Uruguay
+        UZ,   ///< Uzbekistan
+        VA,   ///< Vatican
+        VC,   ///< Saint Vincent and the Grenadines
+        VE,   ///< Venezuela
+        VG,   ///< Virgin Islands (British)
+        VI,   ///< Virgin Islands (USA)
+        VN,   ///< Viet Nam
+        VU,   ///< Vanuatu
+        WF,   ///< Wallis and Futuna
+        WS,   ///< Samoa
+        YE,   ///< Yemen
+        YT,   ///< Mayotte
+        ZA,   ///< South Africa
+        ZM,   ///< Zambia
+        ZW,   ///< Zimbabwe
         NUM_ENTRIES
       };
       extern char const *const string_of[];
@@ -294,7 +294,7 @@
        * Finds the ISO 3166-1 country code enumeration from the given string.
        *
        * @param country An ISO 3166-1 country code.
-       * @return Returns said enumeration or <code>unknown</code>.
+       * @return Returns said enumeration or \c unknown.
        */
       type find( char const *country );
     }
@@ -319,7 +319,7 @@
        * Finds the ISO 639-1 language code enumeration from the given string.
        *
        * @param lang An ISO 639-1 langauge code.
-       * @return Returns said enumeration or <code>unknown</code>.
+       * @return Returns said enumeration or \c unknown.
        */
       type find( char const *lang );
     }
@@ -329,25 +329,129 @@
     namespace iso639_2 {
       enum type {
         unknown,
-        dan,  // Danish
-        deu,  // German (T)
-        dut,  // Dutch (B)
-        eng,  // English
-        fin,  // Finnish
-        fra,  // French (T)
-        fre,  // French (B)
-        ger,  // German (B)
-        hun,  // Hungarian
-        ita,  // Italian
-        nld,  // Dutch (T)
-        nor,  // Norwegian
-        por,  // Portuguese
-        ron,  // Romanian (T)
-        rum,  // Romanian (B)
-        rus,  // Russian
-        spa,  // Spanish
-        swe,  // Swedish
-        tur,  // Turkish
+        aar,  ///< Afar
+        abk,  ///< Abkhazian
+        afr,  ///< Afrikaans
+        aka,  ///< Akan
+        alb,  ///< Albanian
+        amh,  ///< Amharic
+        ara,  ///< Arabic
+        arg,  ///< Aragonese
+        arm,  ///< Armenian
+        asm_, ///< Assamese [without '_', it's a C++ keyword]
+        ava,  ///< Avaric
+        ave,  ///< Avestan
+        aym,  ///< Aymara
+        aze,  ///< Azerbaijani
+        bak,  ///< Bashkir
+        bam,  ///< Bambara
+        baq,  ///< Basque
+        bel,  ///< Belarusian
+        ben,  ///< Bengali
+        bih,  ///< Bihari
+        bis,  ///< Bislama
+        bos,  ///< Bosnian
+        bre,  ///< Breton
+        bul,  ///< Bulgarian
+        bur,  ///< Burmese
+        cat,  ///< Catalan
+        cha,  ///< Chamorro
+        che,  ///< Chechen
+        chi,  ///< Chinese
+        chu,  ///< Church Slavic; Old Slavonic; Church Slavonic
+        cym,  ///< Welsh
+        dan,  ///< Danish
+        deu,  ///< German (T)
+        div,  ///< Divehi; Dhivehi; Maldivian
+        dut,  ///< Dutch (B)
+        dzo,  ///< Dzongkha
+        ell,  ///< Modern Greek
+        eng,  ///< English
+        epo,  ///< Esperanto
+        est,  ///< Estonian
+        ewe,  ///< Ewe
+        fao,  ///< Faroese
+        fij,  ///< Fijian
+        fin,  ///< Finnish
+        fra,  ///< French (T)
+        fre,  ///< French (B)
+        fry,  ///< Western Frisian
+        ful,  ///< Fulah
+        geo,  ///< Georgian
+        ger,  ///< German (B)
+        gla,  ///< Scottish Gaelic; Gaelic
+        gle,  ///< Irish
+        glg,  ///< Galician
+        glv,  ///< Manx
+        gre,  ///< Modern Greek
+        grn,  ///< Guarani
+        guj,  ///< Gujarati
+        hat,  ///< Haitian Creole; Haitian
+        hau,  ///< Hausa
+        heb,  ///< Hebrew
+        her,  ///< Herero
+        hin,  ///< Hindi
+        hmo,  ///< Hiri Motu
+        hrv,  ///< Croatian
+        hun,  ///< Hungarian
+        ibo,  ///< Igbo
+        ice,  ///< Icelandic
+        ido,  ///< Ido
+        iku,  ///< Inuktitut
+        ile,  ///< Interlingue; Occidental
+        ina,  ///< Interlingua
+        ind,  ///< Indonesian
+        ipk,  ///< Inupiaq
+        isl,  ///< Icelandic
+        ita,  ///< Italian
+        jav,  ///< Javanese
+        jpn,  ///< Japanese
+        kal,  ///< Kalaallisut; Greenlandic
+        kan,  ///< Kannada
+        kas,  ///< Kashmiri
+        kat,  ///< Georgian
+        kau,  ///< Kanuri
+        kaz,  ///< Kazakh
+        khm,  ///< Central Khmer
+        kik,  ///< Kikuyu; Gikuyu
+        kin,  ///< Kinyarwanda
+        kir,  ///< Kirghiz; Kyrgyz
+        kom,  ///< Komi
+        kon,  ///< Kongo
+        kor,  ///< Korean
+        kua,  ///< Kuanyama; Kwanyama
+        kur,  ///< Kurdish
+        lao,  ///< Lao
+        lat,  ///< Latin
+        lav,  ///< Latvian
+        lim,  ///< Limburgan; Limburger; Limburgish
+        lin,  ///< Lingala
+        lit,  ///< Lithuanian
+        ltz,  ///< Luxembourgish; Letzeburgesch
+        lub,  ///< Luba-Katanga
+        mya,  ///< Burmese
+        nld,  ///< Dutch (T)
+        nor,  ///< Norwegian
+        nya,  ///< Chichewa; Chewa; Nyanja
+        por,  ///< Portuguese
+        ron,  ///< Romanian (T)
+        rum,  ///< Romanian (B)
+        rus,  ///< Russian
+        spa,  ///< Spanish
+        swe,  ///< Swedish
+        tur,  ///< Turkish
+        ven,  ///< Venda
+        vie,  ///< Vietnamese
+        vol,  ///< Volapuk
+        wel,  ///< Welsh
+        wln,  ///< Walloon
+        wol,  ///< Wolof
+        xho,  ///< Xhosa
+        yid,  ///< Yiddish
+        yor,  ///< Yoruba
+        zha,  ///< Zhuang; Chuang
+        zho,  ///< Chinese
+        zul,  ///< Zulu
         NUM_ENTRIES
       };
       extern char const *const string_of[];
@@ -367,7 +471,7 @@
        * Finds the ISO 639-2 language code enumeration from the given string.
        *
        * @param lang An ISO 639-2 langauge code.
-       * @return Returns said enumeration or <code>unknown</code>.
+       * @return Returns said enumeration or \c unknown.
        */
       type find( char const *lang );
     }
@@ -378,21 +482,21 @@
      * Finds the ISO 639-1 language code enumeration from the given string.
      *
      * @param lang Either an ISO 639-1 or an ISO 639-2 langauge code.
-     * @return Returns said enumeration or <code>unknown</code>.
+     * @return Returns said enumeration or \c unknown.
      */
     iso639_1::type find_lang( char const *lang );
 
     /**
      * Gets the ISO 3166-1 country code enumeration for the host system.
      *
-     * @return Returns said enumeration or <code>unknown</code>.
+     * @return Returns said enumeration or \c unknown.
      */
     iso3166_1::type get_host_country();
 
     /**
      * Gets the ISO 639-1 language code enumeration for the host system.
      *
-     * @return Returns said enumeration defaulting to <code>en</code>.
+     * @return Returns said enumeration defaulting to \c en.
      */
     iso639_1::type get_host_lang();
 

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-current-lang-true-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-current-lang-true-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-current-lang-true-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-da-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-da-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-da-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-de-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-de-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-de-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-en-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-en-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-en-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-fi-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-fi-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-fi-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-hu-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-hu-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-hu-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-it-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-it-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-it-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-nl-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-nl-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-nl-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-no-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-no-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-no-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-pt-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-pt-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-pt-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-ru-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-ru-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-ru-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-supported-false-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-supported-false-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-supported-false-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+false

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-supported-false-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-supported-false-2.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-supported-false-2.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+false

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-sv-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-sv-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stem-lang-sv-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-false-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-false-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-false-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+false

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-da-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-da-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-da-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-de-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-de-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-de-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-es-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-es-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-es-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-fi-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-fi-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-fi-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-fr-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-fr-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-fr-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-hu-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-hu-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-hu-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-it-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-it-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-it-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-nl-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-nl-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-nl-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-no-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-no-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-no-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-pt-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-pt-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-pt-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+false

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+false

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-sv-supported-true.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-sv-supported-true.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-lang-sv-supported-true.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-2.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-2.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-3.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-3.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-3.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-4.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-4.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-stop-word-true-4.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+false

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-2.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-2.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+false

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+flavor

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-2.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-2.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+chic

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-3.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-3.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-3.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+flavor

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-4.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-4.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-stem-4.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+chic

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-strip-diacritics-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-strip-diacritics-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-strip-diacritics-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+e

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-2.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-2.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-3.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-3.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-3.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-4.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-4.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-4.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-5.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-5.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-thesaurus-lookup-5.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-2.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-2.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-3.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-3.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-3.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-4.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-4.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-4.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-string-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-string-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-string-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-string-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-string-2.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-string-2.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenizer-properties-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenizer-properties-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenizer-properties-1.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,112 @@
+<tokenizer-properties xmlns="http://www.zorba-xquery.com/modules/full-text" uri="http://www.zorba-xquery.com/full-text/tokenizer/icu">
+  <comments-separate-tokens value="true"/>
+  <elements-separate-tokens value="true"/>
+  <processing-instructions-separate-tokens value="true"/>
+  <supported-languages>
+    <lang>af</lang>
+    <lang>ak</lang>
+    <lang>am</lang>
+    <lang>ar</lang>
+    <lang>as</lang>
+    <lang>az</lang>
+    <lang>be</lang>
+    <lang>bg</lang>
+    <lang>bm</lang>
+    <lang>bn</lang>
+    <lang>bo</lang>
+    <lang>br</lang>
+    <lang>bs</lang>
+    <lang>ca</lang>
+    <lang>cs</lang>
+    <lang>cy</lang>
+    <lang>da</lang>
+    <lang>de</lang>
+    <lang>ee</lang>
+    <lang>el</lang>
+    <lang>en</lang>
+    <lang>eo</lang>
+    <lang>es</lang>
+    <lang>et</lang>
+    <lang>eu</lang>
+    <lang>fa</lang>
+    <lang>ff</lang>
+    <lang>fi</lang>
+    <lang>fo</lang>
+    <lang>fr</lang>
+    <lang>ga</lang>
+    <lang>gl</lang>
+    <lang>gu</lang>
+    <lang>gv</lang>
+    <lang>ha</lang>
+    <lang>he</lang>
+    <lang>hi</lang>
+    <lang>hr</lang>
+    <lang>hu</lang>
+    <lang>hy</lang>
+    <lang>id</lang>
+    <lang>ig</lang>
+    <lang>ii</lang>
+    <lang>is</lang>
+    <lang>it</lang>
+    <lang>ja</lang>
+    <lang>ka</lang>
+    <lang>ki</lang>
+    <lang>kk</lang>
+    <lang>kl</lang>
+    <lang>km</lang>
+    <lang>kn</lang>
+    <lang>ko</lang>
+    <lang>kw</lang>
+    <lang>lg</lang>
+    <lang>ln</lang>
+    <lang>lt</lang>
+    <lang>lu</lang>
+    <lang>lv</lang>
+    <lang>mg</lang>
+    <lang>mk</lang>
+    <lang>ml</lang>
+    <lang>mr</lang>
+    <lang>ms</lang>
+    <lang>mt</lang>
+    <lang>my</lang>
+    <lang>nb</lang>
+    <lang>nd</lang>
+    <lang>ne</lang>
+    <lang>nl</lang>
+    <lang>nn</lang>
+    <lang>om</lang>
+    <lang>or</lang>
+    <lang>pa</lang>
+    <lang>pl</lang>
+    <lang>ps</lang>
+    <lang>pt</lang>
+    <lang>rm</lang>
+    <lang>rn</lang>
+    <lang>ro</lang>
+    <lang>ru</lang>
+    <lang>rw</lang>
+    <lang>sg</lang>
+    <lang>si</lang>
+    <lang>sk</lang>
+    <lang>sl</lang>
+    <lang>sn</lang>
+    <lang>so</lang>
+    <lang>sq</lang>
+    <lang>sr</lang>
+    <lang>sv</lang>
+    <lang>sw</lang>
+    <lang>ta</lang>
+    <lang>te</lang>
+    <lang>th</lang>
+    <lang>ti</lang>
+    <lang>to</lang>
+    <lang>tr</lang>
+    <lang>uk</lang>
+    <lang>ur</lang>
+    <lang>uz</lang>
+    <lang>vi</lang>
+    <lang>yo</lang>
+    <lang>zh</lang>
+    <lang>zu</lang>
+  </supported-languages>
+</tokenizer-properties>

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenizer-properties-2.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenizer-properties-2.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenizer-properties-2.xml.res	2012-04-24 22:19:24 +0000
@@ -0,0 +1,112 @@
+<tokenizer-properties xmlns="http://www.zorba-xquery.com/modules/full-text" uri="http://www.zorba-xquery.com/full-text/tokenizer/icu">
+  <comments-separate-tokens value="true"/>
+  <elements-separate-tokens value="true"/>
+  <processing-instructions-separate-tokens value="true"/>
+  <supported-languages>
+    <lang>af</lang>
+    <lang>ak</lang>
+    <lang>am</lang>
+    <lang>ar</lang>
+    <lang>as</lang>
+    <lang>az</lang>
+    <lang>be</lang>
+    <lang>bg</lang>
+    <lang>bm</lang>
+    <lang>bn</lang>
+    <lang>bo</lang>
+    <lang>br</lang>
+    <lang>bs</lang>
+    <lang>ca</lang>
+    <lang>cs</lang>
+    <lang>cy</lang>
+    <lang>da</lang>
+    <lang>de</lang>
+    <lang>ee</lang>
+    <lang>el</lang>
+    <lang>en</lang>
+    <lang>eo</lang>
+    <lang>es</lang>
+    <lang>et</lang>
+    <lang>eu</lang>
+    <lang>fa</lang>
+    <lang>ff</lang>
+    <lang>fi</lang>
+    <lang>fo</lang>
+    <lang>fr</lang>
+    <lang>ga</lang>
+    <lang>gl</lang>
+    <lang>gu</lang>
+    <lang>gv</lang>
+    <lang>ha</lang>
+    <lang>he</lang>
+    <lang>hi</lang>
+    <lang>hr</lang>
+    <lang>hu</lang>
+    <lang>hy</lang>
+    <lang>id</lang>
+    <lang>ig</lang>
+    <lang>ii</lang>
+    <lang>is</lang>
+    <lang>it</lang>
+    <lang>ja</lang>
+    <lang>ka</lang>
+    <lang>ki</lang>
+    <lang>kk</lang>
+    <lang>kl</lang>
+    <lang>km</lang>
+    <lang>kn</lang>
+    <lang>ko</lang>
+    <lang>kw</lang>
+    <lang>lg</lang>
+    <lang>ln</lang>
+    <lang>lt</lang>
+    <lang>lu</lang>
+    <lang>lv</lang>
+    <lang>mg</lang>
+    <lang>mk</lang>
+    <lang>ml</lang>
+    <lang>mr</lang>
+    <lang>ms</lang>
+    <lang>mt</lang>
+    <lang>my</lang>
+    <lang>nb</lang>
+    <lang>nd</lang>
+    <lang>ne</lang>
+    <lang>nl</lang>
+    <lang>nn</lang>
+    <lang>om</lang>
+    <lang>or</lang>
+    <lang>pa</lang>
+    <lang>pl</lang>
+    <lang>ps</lang>
+    <lang>pt</lang>
+    <lang>rm</lang>
+    <lang>rn</lang>
+    <lang>ro</lang>
+    <lang>ru</lang>
+    <lang>rw</lang>
+    <lang>sg</lang>
+    <lang>si</lang>
+    <lang>sk</lang>
+    <lang>sl</lang>
+    <lang>sn</lang>
+    <lang>so</lang>
+    <lang>sq</lang>
+    <lang>sr</lang>
+    <lang>sv</lang>
+    <lang>sw</lang>
+    <lang>ta</lang>
+    <lang>te</lang>
+    <lang>th</lang>
+    <lang>ti</lang>
+    <lang>to</lang>
+    <lang>tr</lang>
+    <lang>uk</lang>
+    <lang>ur</lang>
+    <lang>uz</lang>
+    <lang>vi</lang>
+    <lang>yo</lang>
+    <lang>zh</lang>
+    <lang>zu</lang>
+  </supported-languages>
+</tokenizer-properties>

=== modified file 'test/rbkt/Queries/CMakeLists.txt'
--- test/rbkt/Queries/CMakeLists.txt	2012-04-24 12:39:38 +0000
+++ test/rbkt/Queries/CMakeLists.txt	2012-04-24 22:19:24 +0000
@@ -109,10 +109,15 @@
 # depend on module features into the modules themselves.
 
 
-# Check if WordNet thesaurus is installed in the location tests expect
-IF(EXISTS "${CMAKE_BINARY_DIR}/test/rbkt/thesauri/wordnet-en.zth")
+# Check if WordNet thesaurus is installed
+SET(WORDNET_THESAURUS_FILE 
+  "${CMAKE_BINARY_DIR}/LIB_PATH/edu/princeton/wordnet/wordnet-en.zth")
+IF(EXISTS "${WORDNET_THESAURUS_FILE}")
   SET(ZORBA_WORDNET_FOUND 1)
-ENDIF(EXISTS "${CMAKE_BINARY_DIR}/test/rbkt/thesauri/wordnet-en.zth")
+  # Kind of a weird place to put this directive, but convenient
+  INSTALL(FILES "${WORDNET_THESAURUS_FILE}"
+    DESTINATION "${ZORBA_CORE_LIB_DIR}/edu/princeton/wordnet")
+ENDIF(EXISTS "${WORDNET_THESAURUS_FILE}")
 
 IF(ZORBA_SUPPRESS_CURL)
   MESSAGE(STATUS "ZORBA_SUPPRESS_CURL is true - not searching for cURL library")
@@ -273,6 +278,7 @@
   EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-SCORE/score-queries-results-q1 866923)
   EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-SCORE/score-queries-results-q3 866923)
   EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-SCORE/score-queries-results-q3b 866923)
+  EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-FULL-TEXT-COMPOSABILITY/full-text-composability-queries-results-q1 987632)
   EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-FULL-TEXT-COMPOSABILITY/full-text-composability-queries-results-q4 866926)
   EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-XQUERY-XPATH-COMPOSABILITY/xquery-xpath-composability-queries-results-q9 866926)
   EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQuery/UseCase/UseCase-XQUERY-XPATH-COMPOSABILITY/xquery-xpath-composability-queries-results-q9b 866926)
@@ -355,6 +361,7 @@
     EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-SCORE/score-queries-results-q1 866923)
     EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-SCORE/score-queries-results-q3 866923)
     EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-SCORE/score-queries-results-q3b 866923)
+    EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-FULL-TEXT-COMPOSABILITY/full-text-composability-queries-results-q1 987632)
     EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-FULL-TEXT-COMPOSABILITY/full-text-composability-queries-results-q4 866926)
     EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-XQUERY-XPATH-COMPOSABILITY/xquery-xpath-composability-queries-results-q9 866926)
     EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-XQUERY-XPATH-COMPOSABILITY/xquery-xpath-composability-queries-results-q9b 866926)
@@ -432,6 +439,7 @@
     EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/UseCase/UseCase-STOP-WORD/stop-word-queries-results-q3 909375)
     EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/Examples/3.4.7-StopWordOption/ft-5.2.11-examples-q5 909375)
     EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/Examples/3.4.7-StopWordOption/ft-5.2.11-examples-q4 909375)
+    EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/Examples/3.4.3-ThesaurusOption/ft-3.4.3-examples-q4 909375)
     EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/Examples/3.4.3-ThesaurusOption/ft-3.4.3-examples-q3 909375)
     EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/Examples/3.4.3-ThesaurusOption/ft-3.4.3-examples-q2 909375)
     EXPECTED_FAILURE(test/rbkt/w3c_full_text_testsuite/XQueryX/Examples/3.4.3-ThesaurusOption/ft-3.4.3-examples-q1 909375)
@@ -523,7 +531,9 @@
 EXPECTED_FAILURE(test/rbkt/zorba/http-client/put/put3_binary_element 3391756)
 EXPECTED_FAILURE(test/rbkt/zorba/http-client/post/post3_binary_element 3391756)
 IF(NOT ZORBA_NO_ICU)
-  EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_err17 974477)
+  IF ( "${ICU_VERSION}" VERSION_LESS 4.0.0 )
+    EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_err17 974477)
+  ENDIF ( "${ICU_VERSION}" VERSION_LESS 4.0.0 )
   EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_m11 866874)
   EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_m40 866874)
   EXPECTED_FAILURE(test/rbkt/zorba/string/Regex/regex_m41 866874)

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-current-lang-true-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-current-lang-true-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-current-lang-true-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,5 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "zu";
+
+ft:current-lang() eq "zu"

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-da-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-da-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-da-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("da") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-de-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-de-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-de-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("de") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-en-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-en-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-en-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("en") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-es-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("es") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-fi-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-fi-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-fi-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("fi") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-hu-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-hu-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-hu-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("hu") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-it-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-it-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-it-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("it") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-nl-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-nl-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-nl-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("nl") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-no-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-no-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-no-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("no") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-pt-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-pt-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-pt-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("pt") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-ru-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-ru-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-ru-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("ru") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-supported-false-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-supported-false-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-supported-false-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Valid, but unsupported ISO 639-1 code. :)
+ft:is-stem-lang-supported( xs:language("zu") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-supported-false-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-supported-false-2.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-supported-false-2.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Invalid ISO 639-1 code. :)
+ft:is-stem-lang-supported( xs:language("XX") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-sv-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-sv-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stem-lang-sv-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stem-lang-supported( xs:language("sv") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-false-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-false-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-false-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word( "flavor", xs:language("en") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-da-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-da-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-da-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("da") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-de-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-de-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-de-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("de") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-en-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("en") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-es-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-es-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-es-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("es") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-fi-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-fi-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-fi-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("fi") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-fr-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-fr-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-fr-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("fr") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-hu-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-hu-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-hu-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("hu") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-it-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-it-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-it-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("it") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-nl-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-nl-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-nl-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("nl") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-no-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-no-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-no-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("no") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-pt-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-pt-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-pt-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("pt") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Valid, but unsupported ISO 639-1 code. :)
+ft:is-stop-word-lang-supported( xs:language("zu") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-supported-false-2.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Invalid ISO 639-1 code. :)
+ft:is-stop-word-lang-supported( xs:language("XX") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-sv-supported-true.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-sv-supported-true.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-lang-sv-supported-true.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word-lang-supported( xs:language("sv") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word( "the", xs:language("en") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-2.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-2.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,5 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "en";
+
+ft:is-stop-word( "the" )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-3.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-3.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-3.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-stop-word( "el", xs:language("es") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-4.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-4.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-stop-word-true-4.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,5 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "es";
+
+ft:is-stop-word( "el" )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Valid, but unsupported ISO 639-1 code. :)
+ft:is-thesaurus-lang-supported( xs:language("zu") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-2.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-2.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Invalid ISO 639-1 code. :)
+ft:is-thesaurus-lang-supported( xs:language("XX") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-3.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-3.spec	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-3.spec	2012-04-24 22:19:24 +0000
@@ -0,0 +1,1 @@
+Error: http://www.w3.org/2005/xqt-errors:FTST0018

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-3.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-3.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-false-3.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,4 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Thesaurus URI that is not statically known. :)
+ft:is-thesaurus-lang-supported( "http://www.example.com/", xs:language("en") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-thesaurus-lang-supported( xs:language("en") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.spec	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.spec	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+Args: 
+--thesaurus 
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-is-thesaurus-lang-supported-true-2.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,6 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:is-thesaurus-lang-supported( "http://wordnet.princeton.edu",
+                                xs:language("en") )
+
+(: vim:set et sw=2 ts=2: :)

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-stem-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-stem-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-stem-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:stem( "flavoring", xs:language("en") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-stem-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-stem-2.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-stem-2.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:stem( "chico", xs:language("es") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-stem-3.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-stem-3.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-stem-3.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,5 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "en";
+
+ft:stem( "flavoring" )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-stem-4.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-stem-4.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-stem-4.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,5 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "es";
+
+ft:stem( "chico" )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-strip-diacritics-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-strip-diacritics-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-strip-diacritics-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:strip-diacritics( "é" )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,6 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "en";
+
+let $synonyms := ft:thesaurus-lookup( "marmite" )
+return $synonyms = "pot"

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.spec	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.spec	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+Args: 
+--thesaurus 
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-2.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,6 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+let $synonyms := ft:thesaurus-lookup( "http://wordnet.princeton.edu",
+                                      "marmite",
+                                      xs:language("en") )
+return $synonyms = "pot"

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.spec	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.spec	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+Args: 
+--thesaurus 
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-3.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,7 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "en";
+
+let $synonyms := ft:thesaurus-lookup( "http://wordnet.princeton.edu",
+                                      "marmite" )
+return $synonyms = "pot"

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.spec	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.spec	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+Args: 
+--thesaurus 
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-4.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,7 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+let $synonyms := ft:thesaurus-lookup( "http://wordnet.princeton.edu",
+                                      "breakfast",
+                                      xs:language("en"),
+                                      "BT" )
+return $synonyms = "meal"

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.spec	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.spec	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+Args: 
+--thesaurus 
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-thesaurus-lookup-5.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,8 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+let $synonyms := ft:thesaurus-lookup( "http://wordnet.princeton.edu",
+                                      "breakfast",
+                                      xs:language("en"),
+                                      "USE",
+                                      2, 2 )
+return $synonyms = "nourishment"

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,18 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+let $doc := <msg>hello, world</msg>
+let $tokens := ft:tokenize( $doc, xs:language("en") )
+let $t1 := $tokens[1]
+let $t2 := $tokens[2]
+
+return  $t1/@value = "hello"
+    and $t1/@lang = "en"
+    and $t1/@paragraph = 1
+    and $t1/@sentence = 1
+
+    and $t2/@value = "world"
+    and $t2/@lang = "en"
+    and $t2/@paragraph = 1
+    and $t2/@sentence = 1
+
+(: vim:set et sw=2 ts=2: :)

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-2.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-2.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,18 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+let $doc := <msg xml:lang="es">hola, mundo</msg>
+let $tokens := ft:tokenize( $doc )
+let $t1 := $tokens[1]
+let $t2 := $tokens[2]
+
+return  $t1/@value = "hola"
+    and $t1/@lang = "es"
+    and $t1/@paragraph = 1
+    and $t1/@sentence = 1
+
+    and $t2/@value = "mundo"
+    and $t2/@lang = "es"
+    and $t2/@paragraph = 1
+    and $t2/@sentence = 1
+
+(: vim:set et sw=2 ts=2: :)

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-3.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-3.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-3.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,10 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+import module namespace ref = "http://www.zorba-xquery.com/modules/node-reference";
+
+let $x := <p xml:lang="en">Houston, we have a <em>problem</em>!</p>
+let $tokens := ft:tokenize( $x )
+let $node-ref := $tokens[5]/@node-ref
+let $node := ref:node-by-reference( $node-ref )
+return $node instance of text()
+
+(: vim:set et sw=2 ts=2: :)

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-4.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-4.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-4.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,10 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+import module namespace ref = "http://www.zorba-xquery.com/modules/node-reference";
+
+let $x := <msg xml:lang="en" content="Houston, we have a problem!"/>
+let $tokens := ft:tokenize( $x/@content )
+let $node-ref := $tokens[5]/@node-ref
+let $node := ref:node-by-reference( $node-ref )
+return $node instance of attribute(content)
+
+(: vim:set et sw=2 ts=2: :)

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,8 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+let $x := "hello, world"
+let $tokens := ft:tokenize-string( $x, xs:language("en") )
+return $tokens[1] = "hello"
+   and $tokens[2] = "world"
+
+(: vim:set et sw=2 ts=2: :)

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-2.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-string-2.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,10 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "en";
+
+let $x := "hello, world"
+let $tokens := ft:tokenize-string( $x )
+return $tokens[1] = "hello"
+   and $tokens[2] = "world"
+
+(: vim:set et sw=2 ts=2: :)

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,3 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+ft:tokenizer-properties( xs:language("en") )

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-2.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-2.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenizer-properties-2.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,5 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+declare ft-option using language "en";
+
+ft:tokenizer-properties()

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-thesaurus-FOCA0003-1.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-thesaurus-FOCA0003-1.spec	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-thesaurus-FOCA0003-1.spec	2012-04-24 22:19:24 +0000
@@ -0,0 +1,4 @@
+Error: http://www.w3.org/2005/xqt-errors:FOCA0003
+Args: 
+--thesaurus 
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-thesaurus-FOCA0003-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-thesaurus-FOCA0003-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-thesaurus-FOCA0003-1.xq	2012-04-24 22:19:24 +0000
@@ -0,0 +1,10 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+
+(: Invalid at least/most range. :)
+ft:thesaurus-lookup( "http://wordnet.princeton.edu",
+                     "affluent",
+                     xs:language("en"),
+                     "use",
+                     -1, 2 )
+
+(: vim:set et sw=2 ts=2: :)

=== removed file 'test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-1.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-1.spec	2012-04-24 12:39:38 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-1.spec	1970-01-01 00:00:00 +0000
@@ -1,3 +0,0 @@
-Args: 
---thesaurus 
-##default:=$RBKT_BINARY_DIR/thesauri/wordnet-en.zth

=== removed file 'test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-2.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-2.spec	2012-04-24 12:39:38 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-2.spec	1970-01-01 00:00:00 +0000
@@ -1,3 +0,0 @@
-Args: 
---thesaurus 
-##default:=$RBKT_BINARY_DIR/thesauri/wordnet-en.zth

=== modified file 'test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-3.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-3.spec	2012-04-24 12:39:38 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-3.spec	2012-04-24 22:19:24 +0000
@@ -1,3 +1,3 @@
 Args: 
 --thesaurus 
-http://wordnet.princeton.edu:=$RBKT_BINARY_DIR/thesauri/wordnet-en.zth
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu

=== modified file 'test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-4.spec'
--- test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-4.spec	2012-04-24 12:39:38 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-thesaurus-true-4.spec	2012-04-24 22:19:24 +0000
@@ -1,3 +1,3 @@
 Args: 
 --thesaurus 
-http://wordnet.princeton.edu:=$RBKT_BINARY_DIR/thesauri/wordnet-en.zth
+http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu

=== modified file 'test/rbkt/Scripts/w3c/import_w3c_full_text_testsuite.sh'
--- test/rbkt/Scripts/w3c/import_w3c_full_text_testsuite.sh	2012-04-24 12:39:38 +0000
+++ test/rbkt/Scripts/w3c/import_w3c_full_text_testsuite.sh	2012-04-24 22:19:24 +0000
@@ -148,7 +148,7 @@
   # does not understand $RBKT_SRC_DIR. Should change specification.h to
   # do that replacement universally and eliminate the numerous other places
   # that do it.
-  $thesauri {$id} = "$uri:=xqftts|$test_src_path/$path";
+  $thesauri {$id} = "$uri:=xqftts://$test_src_path/$path";
   next;
 }
 if (m/^%stop /) {

=== modified file 'test/rbkt/testdriver.cpp'
--- test/rbkt/testdriver.cpp	2012-04-24 12:39:38 +0000
+++ test/rbkt/testdriver.cpp	2012-04-24 22:19:24 +0000
@@ -26,7 +26,7 @@
 #include <time.h>
 #endif
 
-//#define ZORBA_TEST_PLAN_SERIALIZATION
+/*#define ZORBA_TEST_PLAN_SERIALIZATION /**/
 
 #include "testdriverconfig.h" // SRC and BIN dir definitions
 #include "specification.h" // parsing spec files
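
Note on the thesaurus mappings touched above: the .spec files and the
import script now pass "--thesaurus" values of the form URI:=SCHEME://LOCATION
(e.g. "wordnet://..." and "xqftts://..."), where the older entries used a bare
path or an "xqftts|" prefix. The following is only a rough sketch, in Python
rather than Zorba's own C++, of how such a value could be split apart. The
function name and the returned tuple are hypothetical and are not part of
Zorba's API; how Zorba itself parses these values is not shown in this diff.

```python
def parse_thesaurus_mapping(arg):
    """Split a 'URI:=TARGET' mapping into (uri, scheme, location).

    Hypothetical helper for illustration only; it mirrors the shape of the
    values seen in the .spec files, not Zorba's actual parsing code.
    """
    # The thesaurus URI and its target are separated by the first ':='.
    uri, sep, target = arg.partition(":=")
    if not sep or not uri or not target:
        raise ValueError("expected URI:=TARGET, got %r" % arg)

    # New-style targets carry a scheme such as 'wordnet' or 'xqftts'.
    scheme, sep, location = target.partition("://")
    if not sep:
        # Old-style target: a bare path with no scheme.
        return uri, "", target
    return uri, scheme, location


if __name__ == "__main__":
    print(parse_thesaurus_mapping(
        "http://wordnet.princeton.edu:=wordnet://wordnet.princeton.edu"))
```

Splitting on the first ":=" is safe for the values in this diff because the
thesaurus URI contains "://" but never ":=".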
