zorba-coders team mailing list archive

[Merge] lp:~paul-lucas/zorba/feature-ft_bw into lp:zorba

To: mp+112811@xxxxxxxxxxxxxxxxxx
From: "Paul J. Lucas" <paul@xxxxxxxxxxxxx>
Date: Fri, 29 Jun 2012 16:45:23 -0000
Reply-to: mp+112811@xxxxxxxxxxxxxxxxxx
Sender: bounces@xxxxxxxxxxxxx

Paul J. Lucas has proposed merging lp:~paul-lucas/zorba/feature-ft_bw into lp:zorba.

Requested reviews:
  Paul J. Lucas (paul-lucas)
Related bugs:
  Bug #1014999 in Zorba: "Implement full-text black/white feature"
  https://bugs.launchpad.net/zorba/+bug/1014999

For more details, see:
https://code.launchpad.net/~paul-lucas/zorba/feature-ft_bw/+merge/112811

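Added the ft:tokenize-nodes() function, which tokenizes a set of included nodes (and their descendants) while skipping a set of excluded nodes (and their descendants).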
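The traversal that ft:tokenize-nodes() performs can be modeled without the Zorba store. It expands included document and element nodes into their children, skips anything at or below an excluded node, and scopes xml:lang with a language stack that a null sentinel in the work list closes. Below is a simplified, hypothetical C++ sketch, not the Zorba implementation. Node, Token, add_child() and tokenize_nodes() are illustrative names, tokenization is plain whitespace splitting, and exclusion follows the function's documentation: a node is skipped when it is, or lies under, an excluded node.

```cpp
#include <cassert>
#include <list>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy node model (hypothetical stand-in for Zorba's store::Item).
struct Node {
  enum Kind { Document, Element, Text, Comment };
  Kind kind;
  std::string text;              // content of Text/Comment nodes
  std::string lang;              // xml:lang value of an Element, if any
  std::vector<Node*> children;
  Node* parent = nullptr;

  explicit Node(Kind k, std::string t = "", std::string l = "")
    : kind(k), text(std::move(t)), lang(std::move(l)) {}
};

void add_child(Node& parent, Node& child) {
  child.parent = &parent;
  parent.children.push_back(&child);
}

// True if n is root or lies somewhere below it.
bool in_subtree(Node const* n, Node const* root) {
  for (; n; n = n->parent)
    if (n == root) return true;
  return false;
}

struct Token {
  std::string value;
  std::string lang;
};

std::vector<Token> tokenize_nodes(std::vector<Node*> const& includes,
                                  std::vector<Node*> const& excludes,
                                  std::string const& default_lang) {
  std::vector<Token> out;
  std::vector<std::string> langs{default_lang};   // innermost scope at back
  std::list<Node*> work(includes.begin(), includes.end());

  while (!work.empty()) {
    Node* n = work.front();
    work.pop_front();
    if (!n) {                     // sentinel: leaving an xml:lang scope
      langs.pop_back();
      continue;
    }
    bool excluded = false;
    for (Node const* e : excludes)
      if (in_subtree(n, e)) { excluded = true; break; }
    if (excluded) continue;

    switch (n->kind) {
      case Node::Document:
      case Node::Element: {
        bool const scoped = n->kind == Node::Element && !n->lang.empty();
        if (scoped) langs.push_back(n->lang);
        // Queue the children ahead of the remaining work, in document order;
        // comments are never included implicitly.
        auto pos = work.begin();
        for (Node* c : n->children)
          if (c->kind != Node::Comment) work.insert(pos, c);
        if (scoped) work.insert(pos, nullptr);    // closes the scope
        break;
      }
      case Node::Text:
      case Node::Comment: {       // a Comment gets here only if explicit
        assert(!langs.empty());
        std::istringstream words(n->text);
        for (std::string w; words >> w;) out.push_back({w, langs.back()});
        break;
      }
    }
  }
  return out;
}
```

Given an element holding "hello world", an xml:lang="fr" child holding "bonjour", and an excluded child, the sketch yields "hello" and "world" tagged "en", then "bonjour" tagged "fr", and nothing from the excluded subtree. Inserting children at the front of a single work list keeps the output in document order without recursion, which is also why the sentinel approach works for closing language scopes.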

=== modified file 'ChangeLog'
--- ChangeLog	2012-06-29 13:25:20 +0000
+++ ChangeLog	2012-06-29 16:44:26 +0000
@@ -4,8 +4,10 @@
 version 2.x
 
 New Features:
+
   * Item::isSeekable API extension for streamable content (xs:string and xs:base64Binary).
   * Implemented the latest W3C specification for the group by clause
+  * Added ft:tokenize-nodes() function to the full-text module
   * New XQuery 3.0 functions
     - fn:parse-xml-fragment#1
   * Added support for transient maps to the http://www.zorba-xquery.com/modules/store/data-structures/unordered-map module.

=== modified file 'include/zorba/tokenizer.h'
--- include/zorba/tokenizer.h	2012-06-28 04:14:03 +0000
+++ include/zorba/tokenizer.h	2012-06-29 16:44:26 +0000
@@ -79,7 +79,7 @@
 
     /**
      * This member-function is called whenever an item that is being tokenized
-     * is entered or exited.
+     * is entered or exited.  The default implementation does nothing.
      *
      * @param item The item being entered or exited.
      * @param entering If \c true, the item is being entered; if \c false, the

=== modified file 'modules/com/zorba-xquery/www/modules/full-text.xq'
--- modules/com/zorba-xquery/www/modules/full-text.xq	2012-06-28 04:14:03 +0000
+++ modules/com/zorba-xquery/www/modules/full-text.xq	2012-06-29 16:44:26 +0000
@@ -767,14 +767,14 @@
   as xs:string* external;
 
 (:~
  : Tokenizes the given node and all of its descendants.
  :
  : @param $node The node to tokenize.
  : @param $lang The default
  : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
  : of <code>$node</code>.
  : @return a (possibly empty) sequence of tokens.
- : @error err:FTST0009 if <code>$lang</code> is not supported in general.
+ : @error err:FTST0009 if <code>$lang</code> is not supported.
  : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-node-1.xq
  :)
 declare function ft:tokenize-node( $node as node(), $lang as xs:language )
@@ -784,12 +784,11 @@
  : Tokenizes the given node and all of its descendants.
  :
  : @param $node The node to tokenize.
- : The document's default
+ : The node's default
  : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
  : is assumed to be the one returned by <code>ft:current-lang()</code>.
  : @return a (possibly empty) sequence of tokens.
- : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported in
- : general.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported.
  : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-node-2.xq
  : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-node-3.xq
  : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-node-4.xq
@@ -798,10 +797,47 @@
   as element(ft-schema:token)* external;
 
 (:~
+ : Tokenizes the set of nodes comprising <code>$includes</code> (and all of their
+ : descendants), excluding the set of nodes comprising <code>$excludes</code>
+ : (and all of their descendants), if any.
+ :
+ : @param $includes The set of nodes (and their descendants) to include.
+ : The default
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : is assumed to be the one returned by <code>ft:current-lang()</code>.
+ : @param $excludes The set of nodes (and their descendants) to exclude.
+ : @return a (possibly empty) sequence of tokens.
+ : @error err:FTST0009 if <code>ft:current-lang()</code> is not supported.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq
+ :)
+declare function ft:tokenize-nodes( $includes as node()+,
+                                    $excludes as node()* )
+  as element(ft-schema:token)* external;
+
+(:~
+ : Tokenizes the set of nodes comprising <code>$includes</code> (and all of their
+ : descendants), excluding the set of nodes comprising <code>$excludes</code>
+ : (and all of their descendants), if any.
+ :
+ : @param $includes The set of nodes (and their descendants) to include.
+ : @param $excludes The set of nodes (and their descendants) to exclude.
+ : @param $lang The default
+ : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
+ : for nodes not covered by an <code>xml:lang</code> attribute.
+ : @return a (possibly empty) sequence of tokens.
+ : @error err:FTST0009 if <code>$lang</code> is not supported.
+ : @example test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq
+ :)
+declare function ft:tokenize-nodes( $includes as node()+,
+                                    $excludes as node()*,
+                                    $lang as xs:language )
+  as element(ft-schema:token)* external;
+
+(:~
  : Tokenizes the given string.
  :
  : @param $string The string to tokenize.
- : @param $lang The default
+ : @param $lang The
  : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
  : of <code>$string</code>.
  : @return a (possibly empty) sequence of tokens.
@@ -816,7 +852,7 @@
  : Tokenizes the given string.
  :
  : @param $string The string to tokenize.
- : The string's default
+ : The string's
  : <a href="http://www.w3.org/TR/xmlschema-2/#language">language</a>
  : is assumed to be the one returned by <code>ft:current-lang()</code>.
  : @return a (possibly empty) sequence of tokens.

=== modified file 'src/functions/func_ft_module_impl.cpp'
--- src/functions/func_ft_module_impl.cpp	2012-06-28 04:14:03 +0000
+++ src/functions/func_ft_module_impl.cpp	2012-06-29 16:44:26 +0000
@@ -36,6 +36,17 @@
 }
 
 
+PlanIter_t full_text_tokenize_nodes::codegen(
+  CompilerCB*,
+  static_context* sctx,
+  const QueryLoc& loc,
+  std::vector<PlanIter_t>& argv,
+  expr& ann) const
+{
+  return new TokenizeNodesIterator(sctx, loc, argv);
+}
+
+
 PlanIter_t full_text_tokenizer_properties::codegen(
   CompilerCB*,
   static_context* sctx,
@@ -59,7 +70,6 @@
 
 #endif // ZORBA_NO_FULL_TEXT
 
-
 ///////////////////////////////////////////////////////////////////////////////
 
 void populate_context_ft_module_impl(static_context* sctx) 
@@ -105,6 +115,25 @@
                     tokenize_return_type),
                    FunctionConsts::FULL_TEXT_TOKENIZE_NODE_2);
   }
+  {
+    DECL_WITH_KIND(sctx,
+                   full_text_tokenize_nodes,
+                   (createQName( FT_MODULE_NS, "", "tokenize-nodes"),
+                    GENV_TYPESYSTEM.ANY_NODE_TYPE_PLUS,
+                    GENV_TYPESYSTEM.ANY_NODE_TYPE_STAR,
+                    tokenize_return_type),
+                   FunctionConsts::FULL_TEXT_TOKENIZE_NODES_2);
+  }
+  {
+    DECL_WITH_KIND(sctx,
+                   full_text_tokenize_nodes,
+                   (createQName( FT_MODULE_NS, "", "tokenize-nodes"),
+                    GENV_TYPESYSTEM.ANY_NODE_TYPE_PLUS,
+                    GENV_TYPESYSTEM.ANY_NODE_TYPE_STAR,
+                    GENV_TYPESYSTEM.LANGUAGE_TYPE_ONE,
+                    tokenize_return_type),
+                   FunctionConsts::FULL_TEXT_TOKENIZE_NODES_3);
+  }
 
   xqtref_t tokenizer_properties_return_type =
   GENV_TYPESYSTEM.create_node_type(store::StoreConsts::elementNode,
@@ -128,10 +157,10 @@
                     tokenizer_properties_return_type),
                    FunctionConsts::FULL_TEXT_TOKENIZER_PROPERTIES_1);
   }
-#endif // ZORBA_NO_FULL_TEXT
+#endif /* ZORBA_NO_FULL_TEXT */
 }
 
-
+///////////////////////////////////////////////////////////////////////////////
 
 } // namespace zorba
 /* vim:set et sw=2 ts=2: */

=== modified file 'src/functions/func_ft_module_impl.h'
--- src/functions/func_ft_module_impl.h	2012-06-28 04:14:03 +0000
+++ src/functions/func_ft_module_impl.h	2012-06-29 16:44:26 +0000
@@ -49,6 +49,26 @@
 };
 
 
+//full-text:tokenize-nodes
+class full_text_tokenize_nodes : public function
+{
+public:
+  full_text_tokenize_nodes(const signature& sig,
+                           FunctionConsts::FunctionKind kind) : 
+    function(sig, kind)
+  {
+
+  }
+
+  // Mark the function as accessing the dyn ctx so that it won't be
+  // const-folded. We must prevent const-folding because the function
+  // uses the store to get access to the tokenizer provider.
+  bool accessesDynCtx() const { return true; }
+
+  CODEGEN_DECL();
+};
+
+
 //full-text:tokenizer-properties
 class full_text_tokenizer_properties : public function
 {

=== modified file 'src/functions/function_consts.h'
--- src/functions/function_consts.h	2012-06-28 04:14:03 +0000
+++ src/functions/function_consts.h	2012-06-29 16:44:26 +0000
@@ -238,7 +238,9 @@
   FULL_TEXT_TOKENIZER_PROPERTIES_0,
   FULL_TEXT_TOKENIZE_NODE_2,
   FULL_TEXT_TOKENIZE_NODE_1,
-#endif
+  FULL_TEXT_TOKENIZE_NODES_3,
+  FULL_TEXT_TOKENIZE_NODES_2,
+#endif /* ZORBA_NO_FULL_TEXT */
 
 #include "functions/function_enum.h"
 

=== modified file 'src/runtime/full_text/CMakeLists.txt'
--- src/runtime/full_text/CMakeLists.txt	2012-06-28 04:14:03 +0000
+++ src/runtime/full_text/CMakeLists.txt	2012-06-29 16:44:26 +0000
@@ -41,6 +41,7 @@
     thesaurus.cpp
     tokenizer.cpp
     default_tokenizer.cpp
+    ft_module_util.cpp
     ft_module.cpp
     )
 

=== modified file 'src/runtime/full_text/apply.h'
--- src/runtime/full_text/apply.h	2012-06-28 04:14:03 +0000
+++ src/runtime/full_text/apply.h	2012-06-29 16:44:26 +0000
@@ -24,6 +24,8 @@
 
 namespace zorba {
 
+///////////////////////////////////////////////////////////////////////////////
+
 void apply_ftand( ft_all_matches const&, ft_all_matches const&,
                   ft_all_matches &result );
 
@@ -52,6 +54,8 @@
 void apply_ftwindow( ft_all_matches const&, ft_int window_size, ft_unit::type,
                      ft_all_matches &result );
 
+///////////////////////////////////////////////////////////////////////////////
+
 } // namespace zorba
 #endif  /* ZORBA_FULL_TEXT_APPLY_H */
 /* vim:set et sw=2 ts=2: */

=== modified file 'src/runtime/full_text/ft_module_impl.cpp'
--- src/runtime/full_text/ft_module_impl.cpp	2012-06-28 04:14:03 +0000
+++ src/runtime/full_text/ft_module_impl.cpp	2012-06-29 16:44:26 +0000
@@ -13,7 +13,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-#include "stdafx.h"
+
 #include <zorba/config.h>
 
 //
@@ -23,6 +23,8 @@
 //
 #ifndef ZORBA_NO_FULL_TEXT
 
+#include "stdafx.h"
+
 #include <limits>
 #include <typeinfo>
 
@@ -42,10 +44,12 @@
 #include "types/casting.h"
 #include "types/typeimpl.h"
 #include "types/typeops.h"
+#include "util/stl_util.h"
 #include "util/utf8_util.h"
 #include "zorbatypes/URI.h"
 #include "zorbautils/locale.h"
 
+#include "ft_module_util.h"
 #include "ft_stop_words_set.h"
 #include "ft_token_seq_iterator.h"
 #include "ft_util.h"
@@ -87,6 +91,85 @@
   );
 }
 
+static Tokenizer::ptr get_tokenizer( iso639_1::type lang,
+                                     Tokenizer::State *t_state,
+                                     QueryLoc const &loc ) {
+  TokenizerProvider const *const provider = GENV_STORE.getTokenizerProvider();
+  ZORBA_ASSERT( provider );
+  Tokenizer::ptr tokenizer;
+  if ( !provider->getTokenizer( lang, t_state, &tokenizer ) )
+    throw XQUERY_EXCEPTION(
+      err::FTST0009 /* lang not supported */,
+      ERROR_PARAMS(
+        iso639_1::string_of[ lang ], ZED( FTST0009_BadTokenizerLang )
+      ),
+      ERROR_LOC( loc )
+    );
+  return std::move( tokenizer );
+}
+
+static void make_token_element( FTToken const &token,
+                                TokenQNames const &qnames,
+                                store::Item_t &result ) {
+  zstring base_uri = static_context::ZORBA_FULL_TEXT_FN_NS;
+  store::Item_t item, attr_node, node_name, type_name;
+  store::NsBindings const ns_bindings;
+  zstring value_string;
+
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+  node_name = qnames.token;
+  GENV_ITEMFACTORY->createElementNode(
+    result, nullptr, node_name, type_name, false, false,
+    ns_bindings, base_uri
+  );
+
+  if ( token.lang() ) {
+    value_string = iso639_1::string_of[ token.lang() ];
+    GENV_ITEMFACTORY->createString( item, value_string );
+    type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+    node_name = qnames.lang;
+    GENV_ITEMFACTORY->createAttributeNode(
+      attr_node, result, node_name, type_name, item
+    );
+  }
+
+  ztd::to_string( token.para(), &value_string );
+  GENV_ITEMFACTORY->createString( item, value_string );
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+  node_name = qnames.paragraph;
+  GENV_ITEMFACTORY->createAttributeNode(
+    attr_node, result, node_name, type_name, item
+  );
+
+  ztd::to_string( token.sent(), &value_string );
+  GENV_ITEMFACTORY->createString( item, value_string );
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+  node_name = qnames.sentence;
+  GENV_ITEMFACTORY->createAttributeNode(
+    attr_node, result, node_name, type_name, item
+  );
+
+  value_string = token.value();
+  GENV_ITEMFACTORY->createString( item, value_string );
+  type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+  node_name = qnames.value;
+  GENV_ITEMFACTORY->createAttributeNode(
+    attr_node, result, node_name, type_name, item
+  );
+
+  if ( store::Item const *const token_item = token.item() ) {
+    if ( GENV_STORE.getNodeReference( item, token_item ) ) {
+      item->getStringValue2( value_string );
+      GENV_ITEMFACTORY->createString( item, value_string );
+      type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
+      node_name = qnames.node_ref;
+      GENV_ITEMFACTORY->createAttributeNode(
+        attr_node, result, node_name, type_name, item
+      );
+    }
+  }
+}
+
 ///////////////////////////////////////////////////////////////////////////////
 
 bool CurrentCompareOptionsIterator::nextImpl( store::Item_t &result,
@@ -296,10 +379,9 @@
   }
 
   try {
-    static_context const *const sctx = getStaticContext();
-    ZORBA_ASSERT( sctx );
     iso639_1::type const lang = get_lang_from( item, loc );
-
+    static_context const *const sctx = getStaticContext();
+    ZORBA_ASSERT( sctx );
     zstring error_msg;
     auto_ptr<internal::Resource> rsrc = sctx->resolve_uri(
       uri, internal::EntityData::THESAURUS, error_msg
@@ -369,7 +451,6 @@
   PlanIteratorState *state;
   DEFAULT_STACK_INIT( PlanIteratorState, state, plan_state );
 
-
   consumeNext( item, theChildren[0], plan_state );
   item->getStringValue2( word );
   utf8::to_lower( word );
@@ -535,45 +616,12 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
-TokenizeNodeIterator::TokenizeNodeIterator( static_context *sctx,
-                                            QueryLoc const &loc,
-                                            std::vector<PlanIter_t>& children ):
-  NaryBaseIterator<TokenizeNodeIterator,TokenizeNodeIteratorState>(sctx, loc, children)
-{
-  initMembers();
-}
-
-void TokenizeNodeIterator::initMembers() {
-  GENV_ITEMFACTORY->createQName(
-    token_qname_, static_context::ZORBA_FULL_TEXT_FN_NS, "", "token" );
-
-  GENV_ITEMFACTORY->createQName(
-    lang_qname_, "", "", "lang" );
-
-  GENV_ITEMFACTORY->createQName(
-    para_qname_, "", "", "paragraph" );
-
-  GENV_ITEMFACTORY->createQName(
-    sent_qname_, "", "", "sentence" );
-
-  GENV_ITEMFACTORY->createQName(
-    value_qname_, "", "", "value" );
-
-  GENV_ITEMFACTORY->createQName(
-    ref_qname_, "", "", "node-ref" );
-}
-
 bool TokenizeNodeIterator::nextImpl( store::Item_t &result,
                                      PlanState &plan_state ) const {
-  store::Item_t node_name, attr_node;
-  zstring base_uri;
   store::Item_t item;
   iso639_1::type lang;
   Tokenizer::State t_state;
-  store::NsBindings const ns_bindings;
   TokenizerProvider const *tokenizer_provider;
-  store::Item_t type_name;
-  zstring value_string;
 
   TokenizeNodeIteratorState *state;
   DEFAULT_STACK_INIT( TokenizeNodeIteratorState, state, plan_state );
@@ -594,66 +642,11 @@
       state->doc_item_->getTokens( *tokenizer_provider, t_state, lang );
 
     while ( state->doc_tokens_->hasNext() ) {
-      FTToken const *token;
-      token = state->doc_tokens_->next();
-      ZORBA_ASSERT( token );
-
-      base_uri = static_context::ZORBA_FULL_TEXT_FN_NS;
-      type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
-      node_name = token_qname_;
-      GENV_ITEMFACTORY->createElementNode(
-        result, nullptr, node_name, type_name, false, false,
-        ns_bindings, base_uri
-      );
-
-      if ( token->lang() ) {
-        value_string = iso639_1::string_of[ token->lang() ];
-        GENV_ITEMFACTORY->createString( item, value_string );
-        type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
-        node_name = lang_qname_;
-        GENV_ITEMFACTORY->createAttributeNode(
-          attr_node, result, node_name, type_name, item
-        );
-      }
-
-      ztd::to_string( token->para(), &value_string );
-      GENV_ITEMFACTORY->createString( item, value_string );
-      type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
-      node_name = para_qname_;
-      GENV_ITEMFACTORY->createAttributeNode(
-        attr_node, result, node_name, type_name, item
-      );
-
-      ztd::to_string( token->sent(), &value_string );
-      GENV_ITEMFACTORY->createString( item, value_string );
-      type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
-      node_name = sent_qname_;
-      GENV_ITEMFACTORY->createAttributeNode(
-        attr_node, result, node_name, type_name, item
-      );
-
-      value_string = token->value();
-      GENV_ITEMFACTORY->createString( item, value_string );
-      type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
-      node_name = value_qname_;
-      GENV_ITEMFACTORY->createAttributeNode(
-        attr_node, result, node_name, type_name, item
-      );
-
-      if ( store::Item const *const token_item = token->item() ) {
-        if ( GENV_STORE.getNodeReference( item, token_item ) ) {
-          item->getStringValue2( value_string );
-          GENV_ITEMFACTORY->createString( item, value_string );
-          type_name = GENV_TYPESYSTEM.XS_UNTYPED_QNAME;
-          node_name = ref_qname_;
-          GENV_ITEMFACTORY->createAttributeNode(
-            attr_node, result, node_name, type_name, item
-          );
-        }
-      }
-
+      make_token_element(
+        *state->doc_tokens_->next(), state->token_qnames_, result
+      );
       STACK_PUSH( true, state );
-    } // while
+    }
   }
 
   STACK_END( state );
@@ -669,12 +662,140 @@
   state->doc_tokens_->reset();
 }
 
-void TokenizeNodeIterator::serialize( serialization::Archiver &ar ) {
-  serialize_baseclass(
-    ar, (NaryBaseIterator<TokenizeNodeIterator,TokenizeNodeIteratorState>*)this
-  );
-  if ( !ar.is_serializing_out() )
-    initMembers();
+///////////////////////////////////////////////////////////////////////////////
+
+bool TokenizeNodesIterator::nextImpl( store::Item_t &result,
+                                      PlanState &plan_state ) const {
+  store::Item_t item;
+  iso639_1::type lang;
+  Tokenizer::State t_state;
+  Tokenizer::ptr tokenizer;
+
+  TokenizeNodesIteratorState *state;
+  DEFAULT_STACK_INIT( TokenizeNodesIteratorState, state, plan_state );
+
+  if ( theChildren.size() > 2 ) {
+    consumeNext( item, theChildren[2], plan_state );
+    lang = get_lang_from( item, loc );
+  } else {
+    static_context const *const sctx = getStaticContext();
+    ZORBA_ASSERT( sctx );
+    lang = get_lang_from( sctx );
+  }
+
+  tokenizer = get_tokenizer( lang, &state->t_state_, loc );
+
+  // $includes
+  while ( consumeNext( item, theChildren[0], plan_state ) )
+    state->includes_.push_back( item );
+  state->includes_.push_back( store::Item_t() );  // sentinel
+
+  // $excludes
+  while ( consumeNext( item, theChildren[1], plan_state ) ) {
+    store::Item_t exc_si;
+    GENV_STORE.getStructuralInformation( exc_si, item.getp() );
+    state->excludes_.push_back( exc_si );
+  }
+
+  state->callback_.set_tokens( state->tokens_ );
+  state->langs_.push( lang );
+  state->tokenizers_.push( tokenizer.release() );
+
+  while ( true ) {
+    if ( state->tokens_.empty() ) {
+      if ( state->includes_.empty() )
+        break;
+
+      store::Item_t inc( state->includes_.front() );
+      state->includes_.pop_front();
+      if ( inc.isNull() ) {             // sentinel
+        state->langs_.pop();
+        Tokenizer::ptr deleter( ztd::pop_stack( state->tokenizers_ ) );
+        continue;
+      }
+
+      store::Item_t inc_si;
+      GENV_STORE.getStructuralInformation( inc_si, inc.getp() );
+      bool excluded = false;
+      FOR_EACH( vector<store::Item_t>, exc, state->excludes_ ) {
+        if ( inc_si->equals( *exc ) || (*exc)->isInSubtreeOf( inc_si ) ) {
+          excluded = true;
+          break;
+        }
+      }
+      if ( excluded )
+        continue;
+
+      bool add_sentinel = false;
+      switch ( inc->getNodeKind() ) {
+        case store::StoreConsts::elementNode:
+          ++state->t_state_.para;
+          if ( find_lang_attribute( *inc, &lang ) ) {
+            state->langs_.push( lang );
+            tokenizer = get_tokenizer( lang, &state->t_state_, loc );
+            state->tokenizers_.push( tokenizer.release() );
+            add_sentinel = true;
+          }
+          // no break;
+        case store::StoreConsts::documentNode: {
+          list<store::Item_t>::iterator pos = state->includes_.begin();
+          store::Iterator_t i = inc->getChildren();
+          i->open();
+          for ( store::Item_t child; i->next( child ); ) {
+            switch ( child->getNodeKind() ) {
+              case store::StoreConsts::attributeNode:
+              case store::StoreConsts::commentNode:
+              case store::StoreConsts::piNode:
+                continue;               // never include these implicitly
+              default:
+                pos = state->includes_.insert( pos, child );
+                ++pos;
+            }
+          }
+          i->close();
+          if ( add_sentinel )           // sentinel
+            state->includes_.insert( pos, store::Item_t() );
+          continue;
+        }
+
+        case store::StoreConsts::attributeNode:
+        case store::StoreConsts::commentNode:
+        case store::StoreConsts::piNode:
+          // tokenize these because they were included explicitly
+        case store::StoreConsts::textNode: {
+          zstring const s( inc->getStringValue() );
+          Item const temp( inc.getp() );
+          state->tokenizers_.top()->tokenize_string(
+            s.data(), s.size(), state->langs_.top(), false, state->callback_,
+            &temp
+          );
+          break;
+        }
+
+        default:
+          break;
+      } // switch
+      continue;
+    } // if ( state->tokens_.empty() )
+
+    make_token_element(
+      state->tokens_.front(), state->token_qnames_, result
+    );
+    state->tokens_.pop_front();
+    STACK_PUSH( true, state );
+  } // while
+
+  STACK_END( state );
+}
+
+void TokenizeNodesIterator::resetImpl( PlanState &plan_state ) const {
+  NaryBaseIterator<TokenizeNodesIterator,TokenizeNodesIteratorState>::
+    resetImpl( plan_state );
+  TokenizeNodesIteratorState *const state =
+    StateTraitsImpl<TokenizeNodesIteratorState>::getState(
+      plan_state, this->theStateOffset
+    );
+  state->doc_tokens_->reset();
 }
 
 ///////////////////////////////////////////////////////////////////////////////
@@ -689,7 +810,6 @@
   Tokenizer::ptr tokenizer;
   store::Item_t type_name;
   Tokenizer::Properties props;
-  TokenizerProvider const *tokenizer_provider;
   zstring value_string;
 
   PlanIteratorState *state;
@@ -704,15 +824,7 @@
     lang = get_lang_from( sctx );
   }
 
-  tokenizer_provider = GENV_STORE.getTokenizerProvider();
-  ZORBA_ASSERT( tokenizer_provider );
-  if ( !tokenizer_provider->getTokenizer( lang, &t_state, &tokenizer ) )
-    throw XQUERY_EXCEPTION(
-      err::FTST0009 /* lang not supported */,
-      ERROR_PARAMS(
-        iso639_1::string_of[ lang ], ZED( FTST0009_BadTokenizerLang )
-      )
-    );
+  tokenizer = get_tokenizer( lang, &t_state, loc );
   tokenizer->properties( &props );
 
   GENV_ITEMFACTORY->createQName(
@@ -840,19 +952,8 @@
     }
 
     { // local scope
-    TokenizerProvider const *const tokenizer_provider =
-      GENV_STORE.getTokenizerProvider();
-    ZORBA_ASSERT( tokenizer_provider );
     Tokenizer::State t_state;
-    Tokenizer::ptr tokenizer;
-    if ( !tokenizer_provider->getTokenizer( lang, &t_state, &tokenizer ) )
-      throw XQUERY_EXCEPTION(
-        err::FTST0009 /* lang not supported */,
-        ERROR_PARAMS(
-          iso639_1::string_of[ lang ], ZED( FTST0009_BadTokenizerLang )
-        )
-      );
-
+    Tokenizer::ptr const tokenizer( get_tokenizer( lang, &t_state, loc ) );
     TokenizeStringIteratorCallback callback;
     tokenizer->tokenize_string(
       value_string.data(), value_string.size(), lang, false, callback

=== added file 'src/runtime/full_text/ft_module_util.cpp'
--- src/runtime/full_text/ft_module_util.cpp	1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/ft_module_util.cpp	2012-06-29 16:44:26 +0000
@@ -0,0 +1,57 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "api/unmarshaller.h"
+#include "context/static_context.h"
+#include "store/api/item_factory.h"
+#include "system/globalenv.h"
+
+#include "ft_module_util.h"
+
+using namespace std;
+using namespace zorba::locale;
+
+namespace zorba {
+
+///////////////////////////////////////////////////////////////////////////////
+
+void TokenizeNodesCallback::token( char const *utf8_s, size_type utf8_len,
+                                   iso639_1::type lang, size_type token_no,
+                                   size_type sent_no, size_type para_no,
+                                   Item const *api_item ) {
+  store::Item const *const item = Unmarshaller::getInternalItem( *api_item );
+  tokens_->push_back(
+    FTToken( utf8_s, utf8_len, token_no, sent_no, para_no, item )
+  );
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+TokenQNames::TokenQNames() {
+  GENV_ITEMFACTORY->createQName(
+    token, static_context::ZORBA_FULL_TEXT_FN_NS, "", "token"
+  );
+  GENV_ITEMFACTORY->createQName( lang, "", "", "lang" );
+  GENV_ITEMFACTORY->createQName( paragraph, "", "", "paragraph" );
+  GENV_ITEMFACTORY->createQName( sentence, "", "", "sentence" );
+  GENV_ITEMFACTORY->createQName( value, "", "", "value" );
+  GENV_ITEMFACTORY->createQName( node_ref, "", "", "node-ref" );
+}
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace zorba
+/* vim:set et sw=2 ts=2: */

=== added file 'src/runtime/full_text/ft_module_util.h'
--- src/runtime/full_text/ft_module_util.h	1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/ft_module_util.h	2012-06-29 16:44:26 +0000
@@ -0,0 +1,80 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ * 
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef ZORBA_FT_MODULE_UTIL_H
+#define ZORBA_FT_MODULE_UTIL_H
+
+//
+// This header (and the related .cpp) is necessary (instead of just putting
+// this code into ft_module.h/.cpp directly) because it needs to be
+// #include'd into the .cpp generated from the ft_module.xml file.
+//
+
+#include <zorba/tokenizer.h>
+
+#include <deque>
+
+#include "store/api/item.h"
+#include "util/cxx_util.h"
+#include "zorbatypes/ft_token.h"
+
+#include "ft_module_util.h"
+
+namespace zorba {
+
+///////////////////////////////////////////////////////////////////////////////
+
+/**
+ * A %TokenizeNodesCallback is-a Tokenizer::Callback that's used exclusively by
+ * the TokenizeNodesIterator that implements the ft:tokenize-nodes() full-text
+ * module function.
+ */
+class TokenizeNodesCallback : public Tokenizer::Callback {
+public:
+  TokenizeNodesCallback() : tokens_( nullptr ) { }
+  TokenizeNodesCallback( std::deque<FTToken> &tokens ) : tokens_( &tokens ) { }
+
+  void set_tokens( std::deque<FTToken> &tokens ) {
+    tokens_ = &tokens;
+  }
+
+  // inherited
+  void token( char const *utf8_s, size_type utf8_len,
+              locale::iso639_1::type lang, size_type token_no,
+              size_type sent_no, size_type para_no, Item const *item = 0 );
+
+private:
+  std::deque<FTToken> *tokens_;
+};
+
+///////////////////////////////////////////////////////////////////////////////
+
+struct TokenQNames {
+  store::Item_t token;
+  store::Item_t lang;
+  store::Item_t paragraph;
+  store::Item_t sentence;
+  store::Item_t value;
+  store::Item_t node_ref;
+
+  TokenQNames();
+};
+
+///////////////////////////////////////////////////////////////////////////////
+
+} // namespace zorba
+#endif /* ZORBA_FT_MODULE_UTIL_H */
+/* vim:set et sw=2 ts=2: */

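TokenizeNodesCallback bridges two styles: the tokenizer pushes tokens through a callback, while a plan iterator must hand out one result per nextImpl() call. Buffering the pushed tokens in a std::deque reconciles the two. A minimal, hypothetical C++ sketch of that push-to-pull adaptation follows; tokenize_string() here is a stand-in whitespace splitter, not Zorba's Tokenizer API, and TokenPuller is an illustrative name.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical push-style tokenizer: calls emit() once per
// whitespace-separated word (a stand-in for Tokenizer::tokenize_string()).
void tokenize_string(std::string const& s,
                     std::function<void(std::string const&)> const& emit) {
  std::istringstream in(s);
  for (std::string w; in >> w;)
    emit(w);
}

// Pull-style adapter: the callback only appends to buffer_, and next()
// hands tokens out one per call, tokenizing the next input only once the
// buffer has been drained.
class TokenPuller {
public:
  explicit TokenPuller(std::vector<std::string> const& inputs)
    : inputs_(inputs.begin(), inputs.end()) {}

  bool next(std::string* token) {
    assert(token != nullptr);
    while (buffer_.empty()) {
      if (inputs_.empty())
        return false;
      std::string const s = inputs_.front();
      inputs_.pop_front();
      tokenize_string(s, [this](std::string const& w) {
        buffer_.push_back(w);
      });
    }
    *token = buffer_.front();
    buffer_.pop_front();
    return true;
  }

private:
  std::deque<std::string> inputs_;   // not yet tokenized
  std::deque<std::string> buffer_;   // tokenized, not yet returned
};
```

An input that produces no tokens (such as an empty string) simply causes the loop in next() to move on to the following input, which matches how the patch's iterator keeps consuming nodes until its token deque is non-empty.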
=== modified file 'src/runtime/full_text/ft_util.cpp'
--- src/runtime/full_text/ft_util.cpp	2012-04-27 17:07:47 +0000
+++ src/runtime/full_text/ft_util.cpp	2012-06-29 16:44:26 +0000
@@ -19,14 +19,38 @@
 #include <stdexcept>
 
 #include "diagnostics/xquery_diagnostics.h"
+#include "zorbamisc/ns_consts.h"
 #include "zorbatypes/numconversions.h"
+#include "zorbautils/locale.h"
 
 #include "ft_util.h"
 
+using namespace zorba::locale;
+
 namespace zorba {
 
 ///////////////////////////////////////////////////////////////////////////////
 
+bool find_lang_attribute( store::Item const &item, iso639_1::type *lang ) {
+  bool found_lang = false;
+  if ( item.getNodeKind() == store::StoreConsts::elementNode ) {
+    store::Iterator_t i( item.getAttributes() );
+    i->open();
+    for ( store::Item_t attr; i->next( attr ); ) {
+      store::Item const *const qname = attr->getNodeName();
+      if ( qname &&
+           qname->getLocalName() == "lang" &&
+           qname->getNamespace() == XML_NS ) {
+        *lang = locale::find_lang( attr->getStringValue().c_str() );
+        found_lang = true;
+        break;
+      }
+    }
+    i->close();
+  }
+  return found_lang;
+}
+
 ft_int to_ft_int( xs_integer const &i ) {
   try {
     return to_xs_unsignedInt( i );

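The matching rule in find_lang_attribute() is that an attribute counts only if its local name is "lang" and its namespace is the XML namespace. That namespace is http://www.w3.org/XML/1998/namespace, bound to the reserved xml prefix. The following hypothetical C++ sketch shows the same rule over a toy attribute list. Attribute and find_xml_lang() are illustrative names, and unlike the patch, which maps the value to an iso639_1::type via locale::find_lang(), the sketch returns the raw attribute value.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Namespace URI permanently bound to the reserved "xml" prefix.
const std::string XML_NS_URI = "http://www.w3.org/XML/1998/namespace";

// Hypothetical stand-in for an attribute node of an element.
struct Attribute {
  std::string ns_uri;
  std::string local_name;
  std::string value;
};

// Sets *lang and returns true only if some attribute is xml:lang; an
// attribute named "lang" in no (or another) namespace does not match.
bool find_xml_lang(std::vector<Attribute> const& attrs, std::string* lang) {
  assert(lang != nullptr);
  for (Attribute const& a : attrs) {
    if (a.local_name == "lang" && a.ns_uri == XML_NS_URI) {
      *lang = a.value;
      return true;
    }
  }
  return false;
}
```

Checking the namespace as well as the local name matters: a plain lang="de" attribute must not switch the tokenizer's language.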
=== modified file 'src/runtime/full_text/ft_util.h'
--- src/runtime/full_text/ft_util.h	2012-06-28 04:14:03 +0000
+++ src/runtime/full_text/ft_util.h	2012-06-29 16:44:26 +0000
@@ -17,11 +17,13 @@
 #ifndef ZORBA_FULL_TEXT_UTIL_H
 #define ZORBA_FULL_TEXT_UTIL_H
 
+#include <zorba/item.h>
 #include <zorba/locale.h>
 
 #include "compiler/expression/ftnode.h"
+#include "store/api/item.h"
+#include "util/cxx_util.h"
 #include "zorbatypes/schema_types.h"
-#include "util/cxx_util.h"
 
 #include "ft_match.h"
 
@@ -44,6 +46,16 @@
 ////////// Functions //////////////////////////////////////////////////////////
 
 /**
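+ * Finds the \c xml:lang attribute of the given element, if any.
+ *
+ * @param item The item to search.  Only element nodes are examined.
+ * @param lang If found, set to the language of the \c xml:lang attribute.
+ * @return Returns \c true only if an \c xml:lang attribute was found.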
+ */
+bool find_lang_attribute( store::Item const &item,
+                          locale::iso639_1::type *lang );
+
+/**
  * Gets the language from the given ftmatch_options, if any.
  *
  * @param options The ftmatch_options to get the language from.  This may be \c
@@ -98,6 +110,8 @@
  */
 ft_int to_ft_int( xs_integer const &i );
 
+///////////////////////////////////////////////////////////////////////////////
+
 } // namespace zorba
 #endif /* ZORBA_FULL_TEXT_UTIL_H */
 /* vim:set et sw=2 ts=2: */

=== added file 'src/runtime/full_text/pregenerated/ft_module.cpp'
--- src/runtime/full_text/pregenerated/ft_module.cpp	1970-01-01 00:00:00 +0000
+++ src/runtime/full_text/pregenerated/ft_module.cpp	2012-06-29 16:44:26 +0000
@@ -0,0 +1,506 @@
+/*
+ * Copyright 2006-2008 The FLWOR Foundation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+ 
+// ******************************************
+// *                                        *
+// * THIS IS A GENERATED FILE. DO NOT EDIT! *
+// * SEE .xml FILE WITH SAME NAME           *
+// *                                        *
+// ******************************************
+
+#include "stdafx.h"
+#include "zorbatypes/rchandle.h"
+#include "zorbatypes/zstring.h"
+#include "runtime/visitors/planiter_visitor.h"
+#include "runtime/full_text/ft_module.h"
+#include "system/globalenv.h"
+
+
+#include "store/api/iterator.h"
+
+namespace zorba {
+
+#ifndef ZORBA_NO_FULL_TEXT
+// <CurrentCompareOptionsIterator>
+SERIALIZABLE_CLASS_VERSIONS(CurrentCompareOptionsIterator)
+
+void CurrentCompareOptionsIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<CurrentCompareOptionsIterator, PlanIteratorState>*)this);
+}
+
+
+void CurrentCompareOptionsIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+CurrentCompareOptionsIterator::~CurrentCompareOptionsIterator() {}
+
+// </CurrentCompareOptionsIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <CurrentLangIterator>
+SERIALIZABLE_CLASS_VERSIONS(CurrentLangIterator)
+
+void CurrentLangIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<CurrentLangIterator, PlanIteratorState>*)this);
+}
+
+
+void CurrentLangIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+CurrentLangIterator::~CurrentLangIterator() {}
+
+// </CurrentLangIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <HostLangIterator>
+SERIALIZABLE_CLASS_VERSIONS(HostLangIterator)
+
+void HostLangIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<HostLangIterator, PlanIteratorState>*)this);
+}
+
+
+void HostLangIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+HostLangIterator::~HostLangIterator() {}
+
+// </HostLangIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStemLangSupportedIterator>
+SERIALIZABLE_CLASS_VERSIONS(IsStemLangSupportedIterator)
+
+void IsStemLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>*)this);
+}
+
+
+void IsStemLangSupportedIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+IsStemLangSupportedIterator::~IsStemLangSupportedIterator() {}
+
+// </IsStemLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStopWordIterator>
+SERIALIZABLE_CLASS_VERSIONS(IsStopWordIterator)
+
+void IsStopWordIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<IsStopWordIterator, PlanIteratorState>*)this);
+}
+
+
+void IsStopWordIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+IsStopWordIterator::~IsStopWordIterator() {}
+
+// </IsStopWordIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsStopWordLangSupportedIterator>
+SERIALIZABLE_CLASS_VERSIONS(IsStopWordLangSupportedIterator)
+
+void IsStopWordLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>*)this);
+}
+
+
+void IsStopWordLangSupportedIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+IsStopWordLangSupportedIterator::~IsStopWordLangSupportedIterator() {}
+
+// </IsStopWordLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsThesaurusLangSupportedIterator>
+SERIALIZABLE_CLASS_VERSIONS(IsThesaurusLangSupportedIterator)
+
+void IsThesaurusLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>*)this);
+}
+
+
+void IsThesaurusLangSupportedIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+IsThesaurusLangSupportedIterator::~IsThesaurusLangSupportedIterator() {}
+
+// </IsThesaurusLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <IsTokenizerLangSupportedIterator>
+SERIALIZABLE_CLASS_VERSIONS(IsTokenizerLangSupportedIterator)
+
+void IsTokenizerLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>*)this);
+}
+
+
+void IsTokenizerLangSupportedIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+IsTokenizerLangSupportedIterator::~IsTokenizerLangSupportedIterator() {}
+
+// </IsTokenizerLangSupportedIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <StemIterator>
+SERIALIZABLE_CLASS_VERSIONS(StemIterator)
+
+void StemIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<StemIterator, PlanIteratorState>*)this);
+}
+
+
+void StemIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+StemIterator::~StemIterator() {}
+
+// </StemIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <StripDiacriticsIterator>
+SERIALIZABLE_CLASS_VERSIONS(StripDiacriticsIterator)
+
+void StripDiacriticsIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>*)this);
+}
+
+
+void StripDiacriticsIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+StripDiacriticsIterator::~StripDiacriticsIterator() {}
+
+// </StripDiacriticsIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <ThesaurusLookupIterator>
+SERIALIZABLE_CLASS_VERSIONS(ThesaurusLookupIterator)
+
+void ThesaurusLookupIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>*)this);
+}
+
+
+void ThesaurusLookupIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+ThesaurusLookupIterator::~ThesaurusLookupIterator() {}
+
+ThesaurusLookupIteratorState::ThesaurusLookupIteratorState() {}
+
+ThesaurusLookupIteratorState::~ThesaurusLookupIteratorState() {}
+
+
+void ThesaurusLookupIteratorState::reset(PlanState& planState) {
+  PlanIteratorState::reset(planState);
+}
+// </ThesaurusLookupIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeNodeIterator>
+SERIALIZABLE_CLASS_VERSIONS(TokenizeNodeIterator)
+
+void TokenizeNodeIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<TokenizeNodeIterator, TokenizeNodeIteratorState>*)this);
+}
+
+
+void TokenizeNodeIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+TokenizeNodeIterator::~TokenizeNodeIterator() {}
+
+TokenizeNodeIteratorState::TokenizeNodeIteratorState() {}
+
+TokenizeNodeIteratorState::~TokenizeNodeIteratorState() {}
+
+
+void TokenizeNodeIteratorState::reset(PlanState& planState) {
+  PlanIteratorState::reset(planState);
+}
+// </TokenizeNodeIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeNodesIterator>
+SERIALIZABLE_CLASS_VERSIONS(TokenizeNodesIterator)
+
+void TokenizeNodesIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<TokenizeNodesIterator, TokenizeNodesIteratorState>*)this);
+}
+
+
+void TokenizeNodesIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+TokenizeNodesIterator::~TokenizeNodesIterator() {}
+
+TokenizeNodesIteratorState::TokenizeNodesIteratorState() {}
+
+TokenizeNodesIteratorState::~TokenizeNodesIteratorState() {}
+
+
+void TokenizeNodesIteratorState::reset(PlanState& planState) {
+  PlanIteratorState::reset(planState);
+}
+// </TokenizeNodesIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizerPropertiesIterator>
+SERIALIZABLE_CLASS_VERSIONS(TokenizerPropertiesIterator)
+
+void TokenizerPropertiesIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>*)this);
+}
+
+
+void TokenizerPropertiesIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+TokenizerPropertiesIterator::~TokenizerPropertiesIterator() {}
+
+// </TokenizerPropertiesIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeStringIterator>
+SERIALIZABLE_CLASS_VERSIONS(TokenizeStringIterator)
+
+void TokenizeStringIterator::serialize(::zorba::serialization::Archiver& ar)
+{
+  serialize_baseclass(ar,
+  (NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>*)this);
+}
+
+
+void TokenizeStringIterator::accept(PlanIterVisitor& v) const
+{
+  v.beginVisit(*this);
+
+  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
+  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
+  for ( ; lIter != lEnd; ++lIter ){
+    (*lIter)->accept(v);
+  }
+
+  v.endVisit(*this);
+}
+
+TokenizeStringIterator::~TokenizeStringIterator() {}
+
+TokenizeStringIteratorState::TokenizeStringIteratorState() {}
+
+TokenizeStringIteratorState::~TokenizeStringIteratorState() {}
+
+
+void TokenizeStringIteratorState::reset(PlanState& planState) {
+  PlanIteratorState::reset(planState);
+}
+// </TokenizeStringIterator>
+
+#endif
+
+}
+
+

=== removed file 'src/runtime/full_text/pregenerated/ft_module.cpp'
--- src/runtime/full_text/pregenerated/ft_module.cpp	2012-05-22 19:09:20 +0000
+++ src/runtime/full_text/pregenerated/ft_module.cpp	1970-01-01 00:00:00 +0000
@@ -1,463 +0,0 @@
-/*
- * Copyright 2006-2008 The FLWOR Foundation.
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
- 
-// ******************************************
-// *                                        *
-// * THIS IS A GENERATED FILE. DO NOT EDIT! *
-// * SEE .xml FILE WITH SAME NAME           *
-// *                                        *
-// ******************************************
-
-#include "stdafx.h"
-#include "zorbatypes/rchandle.h"
-#include "zorbatypes/zstring.h"
-#include "runtime/visitors/planiter_visitor.h"
-#include "runtime/full_text/ft_module.h"
-#include "system/globalenv.h"
-
-
-#include "store/api/iterator.h"
-
-namespace zorba {
-
-#ifndef ZORBA_NO_FULL_TEXT
-// <CurrentCompareOptionsIterator>
-SERIALIZABLE_CLASS_VERSIONS(CurrentCompareOptionsIterator)
-
-void CurrentCompareOptionsIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<CurrentCompareOptionsIterator, PlanIteratorState>*)this);
-}
-
-
-void CurrentCompareOptionsIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-CurrentCompareOptionsIterator::~CurrentCompareOptionsIterator() {}
-
-// </CurrentCompareOptionsIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <CurrentLangIterator>
-SERIALIZABLE_CLASS_VERSIONS(CurrentLangIterator)
-
-void CurrentLangIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<CurrentLangIterator, PlanIteratorState>*)this);
-}
-
-
-void CurrentLangIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-CurrentLangIterator::~CurrentLangIterator() {}
-
-// </CurrentLangIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <HostLangIterator>
-SERIALIZABLE_CLASS_VERSIONS(HostLangIterator)
-
-void HostLangIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<HostLangIterator, PlanIteratorState>*)this);
-}
-
-
-void HostLangIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-HostLangIterator::~HostLangIterator() {}
-
-// </HostLangIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <IsStemLangSupportedIterator>
-SERIALIZABLE_CLASS_VERSIONS(IsStemLangSupportedIterator)
-
-void IsStemLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<IsStemLangSupportedIterator, PlanIteratorState>*)this);
-}
-
-
-void IsStemLangSupportedIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-IsStemLangSupportedIterator::~IsStemLangSupportedIterator() {}
-
-// </IsStemLangSupportedIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <IsStopWordIterator>
-SERIALIZABLE_CLASS_VERSIONS(IsStopWordIterator)
-
-void IsStopWordIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<IsStopWordIterator, PlanIteratorState>*)this);
-}
-
-
-void IsStopWordIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-IsStopWordIterator::~IsStopWordIterator() {}
-
-// </IsStopWordIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <IsStopWordLangSupportedIterator>
-SERIALIZABLE_CLASS_VERSIONS(IsStopWordLangSupportedIterator)
-
-void IsStopWordLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<IsStopWordLangSupportedIterator, PlanIteratorState>*)this);
-}
-
-
-void IsStopWordLangSupportedIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-IsStopWordLangSupportedIterator::~IsStopWordLangSupportedIterator() {}
-
-// </IsStopWordLangSupportedIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <IsThesaurusLangSupportedIterator>
-SERIALIZABLE_CLASS_VERSIONS(IsThesaurusLangSupportedIterator)
-
-void IsThesaurusLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<IsThesaurusLangSupportedIterator, PlanIteratorState>*)this);
-}
-
-
-void IsThesaurusLangSupportedIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-IsThesaurusLangSupportedIterator::~IsThesaurusLangSupportedIterator() {}
-
-// </IsThesaurusLangSupportedIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <IsTokenizerLangSupportedIterator>
-SERIALIZABLE_CLASS_VERSIONS(IsTokenizerLangSupportedIterator)
-
-void IsTokenizerLangSupportedIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<IsTokenizerLangSupportedIterator, PlanIteratorState>*)this);
-}
-
-
-void IsTokenizerLangSupportedIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-IsTokenizerLangSupportedIterator::~IsTokenizerLangSupportedIterator() {}
-
-// </IsTokenizerLangSupportedIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <StemIterator>
-SERIALIZABLE_CLASS_VERSIONS(StemIterator)
-
-void StemIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<StemIterator, PlanIteratorState>*)this);
-}
-
-
-void StemIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-StemIterator::~StemIterator() {}
-
-// </StemIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <StripDiacriticsIterator>
-SERIALIZABLE_CLASS_VERSIONS(StripDiacriticsIterator)
-
-void StripDiacriticsIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<StripDiacriticsIterator, PlanIteratorState>*)this);
-}
-
-
-void StripDiacriticsIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-StripDiacriticsIterator::~StripDiacriticsIterator() {}
-
-// </StripDiacriticsIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <ThesaurusLookupIterator>
-SERIALIZABLE_CLASS_VERSIONS(ThesaurusLookupIterator)
-
-void ThesaurusLookupIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<ThesaurusLookupIterator, ThesaurusLookupIteratorState>*)this);
-}
-
-
-void ThesaurusLookupIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-ThesaurusLookupIterator::~ThesaurusLookupIterator() {}
-
-ThesaurusLookupIteratorState::ThesaurusLookupIteratorState() {}
-
-ThesaurusLookupIteratorState::~ThesaurusLookupIteratorState() {}
-
-
-void ThesaurusLookupIteratorState::reset(PlanState& planState) {
-  PlanIteratorState::reset(planState);
-}
-// </ThesaurusLookupIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <TokenizeNodeIterator>
-SERIALIZABLE_CLASS_VERSIONS(TokenizeNodeIterator)
-
-
-void TokenizeNodeIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-TokenizeNodeIterator::~TokenizeNodeIterator() {}
-
-TokenizeNodeIteratorState::TokenizeNodeIteratorState() {}
-
-TokenizeNodeIteratorState::~TokenizeNodeIteratorState() {}
-
-
-void TokenizeNodeIteratorState::reset(PlanState& planState) {
-  PlanIteratorState::reset(planState);
-}
-// </TokenizeNodeIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <TokenizerPropertiesIterator>
-SERIALIZABLE_CLASS_VERSIONS(TokenizerPropertiesIterator)
-
-void TokenizerPropertiesIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<TokenizerPropertiesIterator, PlanIteratorState>*)this);
-}
-
-
-void TokenizerPropertiesIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-TokenizerPropertiesIterator::~TokenizerPropertiesIterator() {}
-
-// </TokenizerPropertiesIterator>
-
-#endif
-#ifndef ZORBA_NO_FULL_TEXT
-// <TokenizeStringIterator>
-SERIALIZABLE_CLASS_VERSIONS(TokenizeStringIterator)
-
-void TokenizeStringIterator::serialize(::zorba::serialization::Archiver& ar)
-{
-  serialize_baseclass(ar,
-  (NaryBaseIterator<TokenizeStringIterator, TokenizeStringIteratorState>*)this);
-}
-
-
-void TokenizeStringIterator::accept(PlanIterVisitor& v) const
-{
-  v.beginVisit(*this);
-
-  std::vector<PlanIter_t>::const_iterator lIter = theChildren.begin();
-  std::vector<PlanIter_t>::const_iterator lEnd = theChildren.end();
-  for ( ; lIter != lEnd; ++lIter ){
-    (*lIter)->accept(v);
-  }
-
-  v.endVisit(*this);
-}
-
-TokenizeStringIterator::~TokenizeStringIterator() {}
-
-TokenizeStringIteratorState::TokenizeStringIteratorState() {}
-
-TokenizeStringIteratorState::~TokenizeStringIteratorState() {}
-
-
-void TokenizeStringIteratorState::reset(PlanState& planState) {
-  PlanIteratorState::reset(planState);
-}
-// </TokenizeStringIterator>
-
-#endif
-
-}
-
-

=== modified file 'src/runtime/full_text/pregenerated/ft_module.h'
--- src/runtime/full_text/pregenerated/ft_module.h	2012-06-28 04:14:03 +0000
+++ src/runtime/full_text/pregenerated/ft_module.h	2012-06-29 16:44:26 +0000
@@ -29,6 +29,11 @@
 
 
 #include "runtime/base/narybase.h"
+#include <deque>
+#include <list>
+#include <stack>
+#include <vector>
+#include "runtime/full_text/ft_module_util.h"
 #include "runtime/full_text/ft_token_seq_iterator.h"
 #include "runtime/full_text/thesaurus.h"
 
@@ -416,6 +421,7 @@
 public:
   store::Item_t doc_item_; //
   FTTokenIterator_t doc_tokens_; //
+  TokenQNames token_qnames_; //
 
   TokenizeNodeIteratorState();
 
@@ -426,13 +432,6 @@
 
 class TokenizeNodeIterator : public NaryBaseIterator<TokenizeNodeIterator, TokenizeNodeIteratorState>
 { 
-protected:
-  store::Item_t token_qname_; //
-  store::Item_t lang_qname_; //
-  store::Item_t para_qname_; //
-  store::Item_t sent_qname_; //
-  store::Item_t value_qname_; //
-  store::Item_t ref_qname_; //
 public:
   SERIALIZABLE_CLASS(TokenizeNodeIterator);
 
@@ -445,12 +444,67 @@
     static_context* sctx,
     const QueryLoc& loc,
     std::vector<PlanIter_t>& children)
-    ;
+    : 
+    NaryBaseIterator<TokenizeNodeIterator, TokenizeNodeIteratorState>(sctx, loc, children)
+  {}
 
   virtual ~TokenizeNodeIterator();
 
-public:
-  void initMembers();
+  void accept(PlanIterVisitor& v) const;
+
+  bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;
+
+  void resetImpl(PlanState&) const;
+};
+
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
+/**
+ * 
+ * Author: 
+ */
+class TokenizeNodesIteratorState : public PlanIteratorState
+{
+public:
+  store::Item_t doc_item_; //
+  FTTokenIterator_t doc_tokens_; //
+  TokenQNames token_qnames_; //
+  std::list<store::Item_t> includes_; //
+  std::vector<store::Item_t> excludes_; //
+  std::stack<Tokenizer*> tokenizers_; //
+  std::stack<locale::iso639_1::type> langs_; //
+  TokenizeNodesCallback callback_; //
+  Tokenizer::State t_state_; //
+  std::deque<FTToken> tokens_; //
+
+  TokenizeNodesIteratorState();
+
+  ~TokenizeNodesIteratorState();
+
+  void reset(PlanState&);
+};
+
+class TokenizeNodesIterator : public NaryBaseIterator<TokenizeNodesIterator, TokenizeNodesIteratorState>
+{ 
+public:
+  SERIALIZABLE_CLASS(TokenizeNodesIterator);
+
+  SERIALIZABLE_CLASS_CONSTRUCTOR2T(TokenizeNodesIterator,
+    NaryBaseIterator<TokenizeNodesIterator, TokenizeNodesIteratorState>);
+
+  void serialize( ::zorba::serialization::Archiver& ar);
+
+  TokenizeNodesIterator(
+    static_context* sctx,
+    const QueryLoc& loc,
+    std::vector<PlanIter_t>& children)
+    : 
+    NaryBaseIterator<TokenizeNodesIterator, TokenizeNodesIteratorState>(sctx, loc, children)
+  {}
+
+  virtual ~TokenizeNodesIterator();
+
   void accept(PlanIterVisitor& v) const;
 
   bool nextImpl(store::Item_t& result, PlanState& aPlanState) const;

=== modified file 'src/runtime/full_text/tokenizer.cpp'
--- src/runtime/full_text/tokenizer.cpp	2012-06-28 04:14:03 +0000
+++ src/runtime/full_text/tokenizer.cpp	2012-06-29 16:44:26 +0000
@@ -21,12 +21,15 @@
 #include <zorba/tokenizer.h>
 #include <zorba/zorba_string.h>
 
+#include "api/unmarshaller.h"
 #include "diagnostics/assert.h"
 #include "store/api/store.h"
 #include "system/globalenv.h"
 #include "zorbamisc/ns_consts.h"
 #include "zorbautils/locale.h"
 
+#include "ft_util.h"
+
 using namespace zorba::locale;
 
 namespace zorba {
@@ -38,22 +41,9 @@
 }
 
 bool Tokenizer::find_lang_attribute( Item const &item, iso639_1::type *lang ) {
-  bool found_lang = false;
-  if ( item.getNodeKind() == store::StoreConsts::elementNode ) {
-    Iterator_t i( item.getAttributes() );
-    i->open();
-    for ( Item attr; i->next( attr ); ) {
-      Item qname;
-      if ( attr.getNodeName( qname ) &&
-          qname.getLocalName() == "lang" && qname.getNamespace() == XML_NS ) {
-        *lang = locale::find_lang( attr.getStringValue().c_str() );
-        found_lang = true;
-        break;
-      }
-    }
-    i->close();
-  }
-  return found_lang;
+  return zorba::find_lang_attribute(
+    *Unmarshaller::getInternalItem( item ), lang
+  );
 }
 
 void Tokenizer::item( Item const &item, bool entering ) {

=== modified file 'src/runtime/json/jsonml_array.cpp'
--- src/runtime/json/jsonml_array.cpp	2012-06-28 04:14:03 +0000
+++ src/runtime/json/jsonml_array.cpp	2012-06-29 16:44:26 +0000
@@ -30,6 +30,7 @@
 #include "util/omanip.h"
 #include "util/oseparator.h"
 #include "util/stl_util.h"
+#include "util/xml_util.h"
 
 #include "jsonml_array.h"
 
@@ -39,20 +40,12 @@
 
 ///////////////////////////////////////////////////////////////////////////////
 
-static void split_name( zstring const &name, zstring *prefix, zstring *local ) {
-  zstring::size_type const colon = name.find( ':' );
-  if ( colon != zstring::npos ) {
-    *prefix = name.substr( 0, colon );
-    *local = name.substr( colon + 1 );
-    if ( prefix->empty() || local->empty() )
-      throw XQUERY_EXCEPTION(
-        zerr::ZJPE0008_ILLEGAL_QNAME,
-        ERROR_PARAMS( name )
-      );
-  } else {
-    prefix->clear();
-    *local = name;
-  }
+inline void split_name( zstring const &name, zstring *prefix, zstring *local ) {
+  if ( !xml::split_name( name, prefix, local ) )
+    throw XQUERY_EXCEPTION(
+      zerr::ZJPE0008_ILLEGAL_QNAME,
+      ERROR_PARAMS( name )
+    );
 }
 
 namespace expect {

=== modified file 'src/runtime/pregenerated/iterator_enum.h'
--- src/runtime/pregenerated/iterator_enum.h	2012-06-28 21:54:08 +0000
+++ src/runtime/pregenerated/iterator_enum.h	2012-06-29 16:44:26 +0000
@@ -114,6 +114,7 @@
   TYPE_StripDiacriticsIterator,
   TYPE_ThesaurusLookupIterator,
   TYPE_TokenizeNodeIterator,
+  TYPE_TokenizeNodesIterator,
   TYPE_TokenizerPropertiesIterator,
   TYPE_TokenizeStringIterator,
   TYPE_FunctionNameIterator,

=== modified file 'src/runtime/spec/full_text/ft_module.xml'
--- src/runtime/spec/full_text/ft_module.xml	2012-06-28 04:14:03 +0000
+++ src/runtime/spec/full_text/ft_module.xml	2012-06-29 16:44:26 +0000
@@ -6,6 +6,12 @@
   xsi:schemaLocation="http://www.zorba-xquery.com ../runtime.xsd">
 
 <zorba:header>
+  <zorba:include form="Angle-bracket">deque</zorba:include>
+  <zorba:include form="Angle-bracket">list</zorba:include>
+  <zorba:include form="Angle-bracket">stack</zorba:include>
+  <zorba:include form="Angle-bracket">vector</zorba:include>
+  <zorba:include form="Angle-bracket">zorba/locale.h</zorba:include>
+  <zorba:include form="Quoted">runtime/full_text/ft_module_util.h</zorba:include>
   <zorba:include form="Quoted">runtime/full_text/ft_token_seq_iterator.h</zorba:include>
   <zorba:include form="Quoted">runtime/full_text/thesaurus.h</zorba:include>
 </zorba:header>
@@ -14,6 +20,8 @@
   <zorba:include form="Quoted">store/api/iterator.h</zorba:include>
 </zorba:source>
 
+<!--========================================================================-->
+
 <zorba:iterator name="CurrentCompareOptionsIterator"
                 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
 </zorba:iterator>
@@ -27,6 +35,8 @@
   </zorba:function>
 </zorba:iterator>
 
+<!--========================================================================-->
+
 <zorba:iterator name="HostLangIterator"
                       preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
   <zorba:function>
@@ -36,6 +46,8 @@
   </zorba:function>
 </zorba:iterator>
 
+<!--========================================================================-->
+
 <zorba:iterator name="IsStemLangSupportedIterator"
                       preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
   <zorba:function>
@@ -46,6 +58,8 @@
   </zorba:function>
 </zorba:iterator>
 
+<!--========================================================================-->
+
 <zorba:iterator name="IsStopWordIterator"
                       preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
   <zorba:function>
@@ -61,6 +75,8 @@
   </zorba:function>
 </zorba:iterator>
 
+<!--========================================================================-->
+
 <zorba:iterator name="IsStopWordLangSupportedIterator"
                 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
   <zorba:function>
@@ -71,6 +87,8 @@
   </zorba:function>
 </zorba:iterator>
 
+<!--========================================================================-->
+
 <zorba:iterator name="IsThesaurusLangSupportedIterator"
                       preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
   <zorba:function>
@@ -86,6 +104,8 @@
   </zorba:function>
 </zorba:iterator>
 
+<!--========================================================================-->
+
 <zorba:iterator name="IsTokenizerLangSupportedIterator"
                       preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
   <zorba:function>
@@ -96,6 +116,8 @@
   </zorba:function>
 </zorba:iterator>
 
+<!--========================================================================-->
+
 <zorba:iterator name="StemIterator"
                 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
   <zorba:function>
@@ -111,6 +133,8 @@
   </zorba:function>
 </zorba:iterator>
 
+<!--========================================================================-->
+
 <zorba:iterator name="StripDiacriticsIterator"
                 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
   <zorba:function>
@@ -121,6 +145,8 @@
   </zorba:function>
 </zorba:iterator>
 
+<!--========================================================================-->
+
 <zorba:iterator name="ThesaurusLookupIterator"
                 generateResetImpl="true"
                 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
@@ -167,56 +193,69 @@
   </zorba:state>
 </zorba:iterator>
 
+<!--========================================================================-->
+
 <zorba:iterator name="TokenizeNodeIterator"
                 generateResetImpl="true"
-                generateSerialize="false"
-                generateConstructor="false"
-                preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
-
-  <zorba:state generateInit="use-default">
-    <zorba:member type="store::Item_t" name="doc_item_"/>
-    <zorba:member type="FTTokenIterator_t" name="doc_tokens_"/>
-  </zorba:state>
-
-  <zorba:member type="store::Item_t" name="token_qname_"/>
-  <zorba:member type="store::Item_t" name="lang_qname_"/>
-  <zorba:member type="store::Item_t" name="para_qname_"/>
-  <zorba:member type="store::Item_t" name="sent_qname_"/>
-  <zorba:member type="store::Item_t" name="value_qname_"/>
-  <zorba:member type="store::Item_t" name="ref_qname_"/>
-
-  <zorba:method name="initMembers" return="void"/>
-
-</zorba:iterator>
+                preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+  <zorba:state generateInit="use-default">
+    <zorba:member type="store::Item_t" name="doc_item_"/>
+    <zorba:member type="FTTokenIterator_t" name="doc_tokens_"/>
+    <zorba:member type="TokenQNames" name="token_qnames_"/>
+  </zorba:state>
+</zorba:iterator>
+
+<!--========================================================================-->
+
+<zorba:iterator name="TokenizeNodesIterator"
+                generateResetImpl="true"
+                preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
+  <zorba:state generateInit="use-default">
+    <zorba:member type="store::Item_t" name="doc_item_"/>
+    <zorba:member type="FTTokenIterator_t" name="doc_tokens_"/>
+
+    <zorba:member type="TokenQNames" name="token_qnames_"/>
+
+    <zorba:member type="std::list&lt;store::Item_t&gt;" name="includes_"/>
+    <zorba:member type="std::vector&lt;store::Item_t&gt;" name="excludes_"/>
+
+    <zorba:member type="std::stack&lt;Tokenizer*&gt;" name="tokenizers_"/>
+    <zorba:member type="std::stack&lt;locale::iso639_1::type&gt;" name="langs_"/>
+    <zorba:member type="TokenizeNodesCallback" name="callback_"/>
+    <zorba:member type="Tokenizer::State" name="t_state_"/>
+    <zorba:member type="std::deque&lt;FTToken&gt;" name="tokens_"/>
+  </zorba:state>
+</zorba:iterator>
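The `TokenizeNodesIterator` state pairs `includes_`/`excludes_` node lists with a `langs_` stack. The test query added in this merge includes `//chapter` and excludes `//quote`, and it expects the tokens `Notes`, `to`, `the`, `Reader`. The sketch below models those semantics, but it is not the iterator's implementation and rests on several assumptions:

* `Node` and `Token` are hypothetical types.
* Text is split on whitespace rather than by a real `Tokenizer`.
* Excluded nodes are matched by pointer identity.
* Plain vectors stand in for the `std::list`/`std::vector`/`std::stack` members.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal node model: NOT Zorba's store API.
struct Node {
  std::string text;                   // non-empty only for text content
  std::string lang;                   // this element's own xml:lang, if any
  std::vector<Node const*> children;
};

struct Token {
  std::string value;
  std::string lang;
};

// Skips any subtree rooted at an excluded node. A language stack gives
// each token the innermost xml:lang in effect.
void walk( Node const *n, std::vector<Node const*> const &excludes,
           std::vector<std::string> &langs, std::vector<Token> &out ) {
  if ( std::find( excludes.begin(), excludes.end(), n ) != excludes.end() )
    return;
  bool const pushed = !n->lang.empty();
  if ( pushed )
    langs.push_back( n->lang );       // "entering" an element with xml:lang
  std::istringstream iss( n->text );
  for ( std::string w; iss >> w; )
    out.push_back( Token{ w, langs.back() } );
  for ( Node const *c : n->children )
    walk( c, excludes, langs, out );
  if ( pushed )
    langs.pop_back();                 // "exiting" that element
}

std::vector<Token> tokenize_nodes( std::vector<Node const*> const &includes,
                                   std::vector<Node const*> const &excludes,
                                   std::string const &default_lang ) {
  std::vector<Token> out;
  std::vector<std::string> langs{ default_lang };  // never empty
  for ( Node const *n : includes )
    walk( n, excludes, langs, out );
  return out;
}
```

In this sketch, a language is pushed only when an element carries its own `xml:lang` and popped on exit. That is one way to get the inheritance the `langs_` stack suggests. The real iterator may handle this differently.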
+
+<!--========================================================================-->
 
 <zorba:iterator name="TokenizerPropertiesIterator"
                 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
 </zorba:iterator>
 
+<!--========================================================================-->
+
 <zorba:iterator name="TokenizeStringIterator"
                 generateResetImpl="true"
                 preprocessorGuard="#ifndef ZORBA_NO_FULL_TEXT">
 
   <zorba:function>
-
     <zorba:signature localname="tokenize-string" prefix="full-text">
       <zorba:param>xs:string</zorba:param>    <!-- string -->
       <zorba:output>xs:string*</zorba:output>
     </zorba:signature>
-
     <zorba:signature localname="tokenize-string" prefix="full-text">
       <zorba:param>xs:string</zorba:param>    <!-- string -->
       <zorba:param>xs:language</zorba:param>  <!-- lang -->
       <zorba:output>xs:string*</zorba:output>
     </zorba:signature>
-
   </zorba:function>
-
   <zorba:state generateInit="use-default">
     <zorba:member type="FTTokenSeqIterator" name="string_tokens_"/>
   </zorba:state>
-
 </zorba:iterator>
 
+<!--========================================================================-->
+
 </zorba:iterators>
 <!-- vim:set et sw=2 ts=2: -->

=== modified file 'src/runtime/visitors/pregenerated/planiter_visitor.h'
--- src/runtime/visitors/pregenerated/planiter_visitor.h	2012-06-28 21:54:08 +0000
+++ src/runtime/visitors/pregenerated/planiter_visitor.h	2012-06-29 16:44:26 +0000
@@ -232,6 +232,9 @@
     class TokenizeNodeIterator;
 #endif
 #ifndef ZORBA_NO_FULL_TEXT
+    class TokenizeNodesIterator;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
     class TokenizerPropertiesIterator;
 #endif
 #ifndef ZORBA_NO_FULL_TEXT
@@ -1015,6 +1018,10 @@
     virtual void endVisit   ( const TokenizeNodeIterator& ) = 0;
 #endif
 #ifndef ZORBA_NO_FULL_TEXT
+    virtual void beginVisit ( const TokenizeNodesIterator& ) = 0;
+    virtual void endVisit   ( const TokenizeNodesIterator& ) = 0;
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
     virtual void beginVisit ( const TokenizerPropertiesIterator& ) = 0;
     virtual void endVisit   ( const TokenizerPropertiesIterator& ) = 0;
 #endif

=== modified file 'src/runtime/visitors/pregenerated/printer_visitor.cpp'
--- src/runtime/visitors/pregenerated/printer_visitor.cpp	2012-06-28 21:54:08 +0000
+++ src/runtime/visitors/pregenerated/printer_visitor.cpp	2012-06-29 16:44:26 +0000
@@ -1442,6 +1442,21 @@
 
 #endif
 #ifndef ZORBA_NO_FULL_TEXT
+// <TokenizeNodesIterator>
+void PrinterVisitor::beginVisit ( const TokenizeNodesIterator& a) {
+  thePrinter.startBeginVisit("TokenizeNodesIterator", ++theId);
+  printCommons( &a, theId );
+  thePrinter.endBeginVisit( theId );
+}
+
+void PrinterVisitor::endVisit ( const TokenizeNodesIterator& ) {
+  thePrinter.startEndVisit();
+  thePrinter.endEndVisit();
+}
+// </TokenizeNodesIterator>
+
+#endif
+#ifndef ZORBA_NO_FULL_TEXT
 // <TokenizerPropertiesIterator>
 void PrinterVisitor::beginVisit ( const TokenizerPropertiesIterator& a) {
   thePrinter.startBeginVisit("TokenizerPropertiesIterator", ++theId);

=== modified file 'src/runtime/visitors/pregenerated/printer_visitor.h'
--- src/runtime/visitors/pregenerated/printer_visitor.h	2012-06-28 21:54:08 +0000
+++ src/runtime/visitors/pregenerated/printer_visitor.h	2012-06-29 16:44:26 +0000
@@ -356,6 +356,11 @@
 #endif
 
 #ifndef ZORBA_NO_FULL_TEXT
+    void beginVisit( const TokenizeNodesIterator& );
+    void endVisit  ( const TokenizeNodesIterator& );
+#endif
+
+#ifndef ZORBA_NO_FULL_TEXT
     void beginVisit( const TokenizerPropertiesIterator& );
     void endVisit  ( const TokenizerPropertiesIterator& );
 #endif

=== modified file 'src/util/xml_util.h'
--- src/util/xml_util.h	2012-06-28 04:14:03 +0000
+++ src/util/xml_util.h	2012-06-29 16:44:26 +0000
@@ -40,12 +40,14 @@
   return o << version_string_of[ v ];
 }
 
-////////// "James Clark notation" universal name functions ////////////////////
+////////// XML name handling //////////////////////////////////////////////////
 
 /**
  * Attempts to extract the local name from a "universal name".
  * See: http://www.jclark.com/xml/xmlns.htm
  *
+ * @tparam InputStringType The input string type.
+ * @tparam OutputStringType The output string type.
  * @param uname The universal name.
  * @param local A pointer to the string to receive the local name.
  * @return Returns \c true only if the extraction was successful.
@@ -64,6 +66,8 @@
  * Attempts to extract the URI from a "universal name".
  * See: http://www.jclark.com/xml/xmlns.htm
  *
+ * @tparam InputStringType The input string type.
+ * @tparam OutputStringType The output string type.
  * @param uname The universal name.
  * @param uri A pointer to the string to receive the URI.
  * @return Returns \c true only if the extraction was successful.
@@ -80,11 +84,39 @@
   return false;
 }
 
+/**
+ * Splits an XML name at a \c : if present.
+ *
+ * @tparam InputStringType The input string type.
+ * @tparam PrefixStringType The output prefix string type.
+ * @tparam LocalStringType The output local string type.
+ * @param name The XML name to be split.
+ * @param prefix The prefix is put here, if any.
+ * @param local The local name is put here.
+ * @return Returns \c false only if \a name contains a \c : and either the
+ * resulting prefix or local name is empty; otherwise returns \c true.
+ */
+template<class InputStringType,class PrefixStringType,class LocalStringType>
+inline bool split_name( InputStringType const &name, PrefixStringType *prefix,
+                        LocalStringType *local ) {
+  typename InputStringType::size_type const colon = name.find( ':' );
+  if ( colon != InputStringType::npos ) {
+    prefix->assign( name, 0, colon );
+    local->assign( name, colon + 1, LocalStringType::npos );
+    return !( prefix->empty() || local->empty() );
+  } else {
+    prefix->clear();
+    *local = name;
+    return true;
+  }
+}
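The new `xml::split_name()` template can be exercised outside Zorba by instantiating it with `std::string`; Zorba itself passes `zstring`. The sketch below copies the template as shown. It also adds a hypothetical `split_name_or_throw()` that mirrors the `jsonml_array.cpp` wrapper, with `std::invalid_argument` standing in for `XQUERY_EXCEPTION( zerr::ZJPE0008_ILLEGAL_QNAME, ... )`.

```cpp
#include <stdexcept>
#include <string>

// Same logic as xml::split_name() above, usable with std::string.
template<class InputStringType,class PrefixStringType,class LocalStringType>
inline bool split_name( InputStringType const &name, PrefixStringType *prefix,
                        LocalStringType *local ) {
  typename InputStringType::size_type const colon = name.find( ':' );
  if ( colon != InputStringType::npos ) {
    prefix->assign( name, 0, colon );                        // before ':'
    local->assign( name, colon + 1, LocalStringType::npos ); // after ':'
    return !( prefix->empty() || local->empty() );
  }
  prefix->clear();  // unprefixed name
  *local = name;
  return true;
}

// Hypothetical stand-in for the jsonml_array.cpp wrapper: turns the bool
// result into an exception, as Zorba does with ZJPE0008_ILLEGAL_QNAME.
inline void split_name_or_throw( std::string const &name, std::string *prefix,
                                 std::string *local ) {
  if ( !split_name( name, prefix, local ) )
    throw std::invalid_argument( "illegal QName: " + name );
}
```

Keeping the pure `bool` check in `xml_util.h` lets each caller decide which error to raise. That is why the JSONML code keeps only a thin throwing wrapper.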
+
 ////////// Character validity /////////////////////////////////////////////////
 
 /**
  * Checks whether the given code-point is valid for the given XML version.
  *
+ * @tparam CodePointType The integral Unicode code-point type.
  * @param v The XML version to use.
  * @return Returns \c true only if the code-point is valid.
  */
@@ -196,7 +228,7 @@
 /**
  * Parses an XML entity reference.
  *
- * @tparam StringType The type of the input string.
+ * @tparam StringType The input string type.
  * @param ref The string pointing to the start of the entity reference.
  * @param c A pointer to the code-point result.
  * @return If successful, returns the number of characters parsed; otherwise
@@ -211,7 +243,7 @@
  * Parses an XML entity reference and appends the UTF-8 encoding of the
  * resulting code-point to the given string.
  *
- * @tparam StringType The type of the output string.
+ * @tparam StringType The output string type.
  * @param ref The C string pointing to the start of the entity reference.
  * @param out A string to append to.
  * @return If successful, returns the number of characters parsed; otherwise
@@ -230,8 +262,8 @@
  * Parses an XML entity reference and appends the UTF-8 encoding of the
  * resulting code-point to the given string.
  *
- * @tparam InputStringType The type of the input string.
- * @tparam OutputStringType The type of the output string.
+ * @tparam InputStringType The input string type.
+ * @tparam OutputStringType The output string type.
  * @param ref The string pointing to the start of the entity reference.
  * @param out A string to append to.
  * @return If successful, returns the number of characters parsed; otherwise

=== added file 'test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-nodes-1.xml.res'
--- test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-nodes-1.xml.res	1970-01-01 00:00:00 +0000
+++ test/rbkt/ExpQueryResults/zorba/fulltext/ft-module-tokenize-nodes-1.xml.res	2012-06-29 16:44:26 +0000
@@ -0,0 +1,1 @@
+true

=== added file 'test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq'
--- test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq	1970-01-01 00:00:00 +0000
+++ test/rbkt/Queries/zorba/fulltext/ft-module-tokenize-nodes-1.xq	2012-06-29 16:44:26 +0000
@@ -0,0 +1,42 @@
+import module namespace ft = "http://www.zorba-xquery.com/modules/full-text";
+import schema namespace fts = "http://www.zorba-xquery.com/modules/full-text";
+
+let $book :=
+  <book>
+    <title>The C++ Programming Language</title>
+    <authors>
+      <author>Bjarne Stroustrup</author>
+    </authors>
+    <chapters>
+      <chapter>
+        <title>Notes to the Reader</title>
+        <content>
+          <quote>
+            <content>
+              "The time has come," the Walrus said,
+              "to talk of many things."
+            </content>
+            <source>Lewis Carroll</source>
+          </quote>
+          <!-- more content -->
+        </content>
+      </chapter>
+    </chapters>
+  </book>
+
+let $includes := $book//chapter
+let $excludes := $book//quote
+
+let $tokens := ft:tokenize-nodes( $includes, $excludes, xs:language("en") )
+
+let $t1 := validate { $tokens[1] }
+let $t2 := validate { $tokens[2] }
+let $t3 := validate { $tokens[3] }
+let $t4 := validate { $tokens[4] }
+
+return  $t1/@value = "Notes"
+    and $t2/@value = "to"
+    and $t3/@value = "the"
+    and $t4/@value = "Reader"
+
+(: vim:set et sw=2 ts=2: :)
